Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 1-9    DOI: 10.11925/infotech.2096-3467.2021.0179
Research on Knowledge Base Error Detection Method Based on Confidence Learning
Li Wenna1,2, Zhang Zhixiong1,2,3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
[Objective] This paper explores an error detection method for knowledge bases based on confidence learning, aiming to reduce the influence of noisy data. [Methods] We used the TransE model to represent knowledge base triples and a multi-layer perceptron to detect errors. We then cleaned the dataset with confidence learning and reduced the influence of noisy data through multiple rounds of iterative training. [Results] We evaluated the method on DBpedia datasets; the best F1 value reached 0.7364, which is better than that of the control group. [Limitations] The noisy data in the experiment was artificially generated and differs in distribution from real-world data. More research is needed to evaluate the method on larger knowledge bases. [Conclusions] The proposed method reduces the influence of noisy data through confidence learning and detects knowledge base errors more effectively.
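As a rough illustration of the representation step, the sketch below scores a triple with the standard TransE distance ||h + r - t||, where a lower score means a more plausible triple. The embedding values and the function name `transe_score` are made up for illustration; the article's actual embedding dimension, training setup, and the exact features fed to the multi-layer perceptron are not given here.

```python
import numpy as np

def transe_score(head, relation, tail, norm=1):
    """Return the TransE distance ||h + r - t|| (lower = more plausible)."""
    diff = np.asarray(head, dtype=float) + np.asarray(relation, dtype=float) - np.asarray(tail, dtype=float)
    return float(np.linalg.norm(diff, ord=norm))

# Toy 2-dimensional embeddings (hypothetical values, not from the article).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_true = [1.0, 1.0]      # h + r lands exactly on this tail
t_corrupt = [0.0, 0.0]   # a corrupted tail sits farther away

score_true = transe_score(h, r, t_true)
score_corrupt = transe_score(h, r, t_corrupt)
```

A corrupted triple receives a larger distance than the consistent one, which is the kind of signal a downstream classifier could use.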

Key words: Knowledge Base; Error Detection; Confidence Learning
Received: 23 January 2021      Published: 15 October 2021
ZTFLH:  TP393  
Fund: Project of Chinese Academy of Sciences' Literature and Information Capacity Building (2019WQZX0017)
Corresponding Author: Zhang Zhixiong

Cite this article:

Li Wenna, Zhang Zhixiong. Research on Knowledge Base Error Detection Method Based on Confidence Learning. Data Analysis and Knowledge Discovery, 2021, 5(9): 1-9.

Framework of Knowledge Base Error Detection Method Based on Confidence Learning
Flow of Confidence Learning Module
| $C_{\tilde{y},y}$ | $y=0$ | $y=1$ |
| --- | --- | --- |
| $\tilde{y}=0$ | $C_{0,0}$ | $C_{0,1}$ |
| $\tilde{y}=1$ | $C_{1,0}$ | $C_{1,1}$ |

Confusion Matrix of Model Prediction
| $Q_{\tilde{y},y}$ | $y=0$ | $y=1$ |
| --- | --- | --- |
| $\tilde{y}=0$ | $Q_{0,0}$ | $Q_{0,1}$ |
| $\tilde{y}=1$ | $Q_{1,0}$ | $Q_{1,1}$ |

Joint Probability Distribution Matrix
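The two matrices above can be illustrated with a minimal sketch, assuming the general confident learning recipe: count examples by observed label (rows, the noisy label) and confidently predicted label (columns), using per-class thresholds, then calibrate and normalize the counts into a joint distribution. The function names and the toy probabilities are hypothetical, and the article's exact procedure may differ.

```python
import numpy as np

def confusion_counts(noisy_labels, pred_probs):
    """Count matrix C: rows = observed (noisy) label, columns = confident prediction."""
    noisy_labels = np.asarray(noisy_labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]
    # Threshold for class j: mean predicted probability of j over examples labeled j.
    thresholds = np.array(
        [pred_probs[noisy_labels == j, j].mean() for j in range(n_classes)]
    )
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for label, probs in zip(noisy_labels, pred_probs):
        above = np.flatnonzero(probs >= thresholds)
        if above.size == 0:
            continue  # no class is predicted confidently; skip this example
        guess = above[np.argmax(probs[above])]
        counts[label, guess] += 1
    return counts

def joint_distribution(counts, noisy_labels):
    """Joint matrix Q: rescale each row to its observed label count, then normalize."""
    label_counts = np.bincount(np.asarray(noisy_labels), minlength=counts.shape[0])
    row_sums = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    calibrated = counts / row_sums * label_counts[:, None]
    return calibrated / calibrated.sum()

# Toy data (hypothetical): 0 = correct triple, 1 = erroneous triple.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
         [0.1, 0.9], [0.3, 0.7], [0.7, 0.3]]

C = confusion_counts(labels, probs)
Q = joint_distribution(C, labels)
```

In this toy case the off-diagonal cells of `C` hold the examples whose observed label disagrees with a confident prediction; these are the candidates a cleaning step would remove before retraining.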
Flow of Construction for Experimental Dataset
| Dataset | TransE Precision | TransE Recall | TransE F1 | C-TransE Precision | C-TransE Recall | C-TransE F1 (relative change vs. TransE) |
| --- | --- | --- | --- | --- | --- | --- |
| E1 | 0.7879 | 0.7210 | 0.7038 | 0.7978 | 0.7475 | 0.7364 (+4.63%) |
| E2 | 0.7931 | 0.7195 | 0.7007 | 0.7908 | 0.7360 | 0.7230 (+3.18%) |
| E5 | 0.7864 | 0.7010 | 0.6769 | 0.7852 | 0.7310 | 0.7176 (+6.01%) |
| E10 | 0.7716 | 0.6565 | 0.6158 | 0.7584 | 0.6925 | 0.6716 (+9.06%) |
| E15 | 0.7459 | 0.5520 | 0.4420 | 0.7561 | 0.6795 | 0.6536 (+47.87%) |
| E20 | 0.2500 | 0.5000 | 0.3333 | 0.7312 | 0.6615 | 0.6339 (+90.18%) |

Result of Control Experiment on Datasets with Different Noise Ratios
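The percentages in parentheses appear to be the relative change in F1 from TransE to C-TransE. A small sketch that recomputes them from the F1 values in the table (the helper name `relative_gain` is illustrative; because the table values are already rounded to four decimals, a recomputed figure can differ from the published one in the last digit):

```python
# F1 values copied from the table: dataset -> (TransE F1, C-TransE F1)
f1_scores = {
    "E1": (0.7038, 0.7364),
    "E2": (0.7007, 0.7230),
    "E5": (0.6769, 0.7176),
    "E10": (0.6158, 0.6716),
    "E15": (0.4420, 0.6536),
    "E20": (0.3333, 0.6339),
}

def relative_gain(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

gains = {name: relative_gain(base, ours) for name, (base, ours) in f1_scores.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}%")
```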
The Results of Models on Datasets with Different Noise Ratios
Stability of Models on Datasets with Different Noise Ratios
The Results of Manual Evaluation of Top100 Errors on DBpedia Dataset
| Head entity | Relation | Tail entity | Error type |
| --- | --- | --- | --- |
| Bertram Kelly | significant project | Isle of Man | Relation error |
| Chandigarh | government type | Government of | |
| George Latham | team | Newtown A.F.C. | Outdated data |
| Northwest Airlines | lounge | Northwest Airlines | Entity error |
| Hammersmith | borough | Fulham | Entity error |
| South African Military Health Service | garrison | Pretoria | Entity error |
| Stuart Boardley | team | Long Melford F.C. | Entity error |
| Jong Ajax | chairman | AFC Ajax | Relation error |
| Philadelphia Union | chairman | Philadelphia Union | Entity error |
| Burt Bacharach | instrument | McGill University | Entity error |

False Triples Detected from DBpedia Datasets
