Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 1-9    DOI: 10.11925/infotech.2096-3467.2021.0179
Research on Knowledge Base Error Detection Method Based on Confidence Learning
Li Wenna1,2, Zhang Zhixiong1,2,3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
Abstract  

[Objective] This paper explores an error detection method for knowledge bases based on confidence learning, aiming to reduce the influence of noisy data. [Methods] We used the TransE model to represent knowledge base triples and a multi-layer perceptron to detect errors. We then cleaned the dataset with confidence learning and reduced the influence of noisy data through multiple rounds of iterative training. [Results] On DBpedia datasets, the proposed method reached a best F1 value of 0.7364, outperforming the control group. [Limitations] The noisy data in the experiments was artificially generated, and its distribution differs from that of real-world data. More research is needed to evaluate the method on larger knowledge bases. [Conclusions] By applying confidence learning, the proposed method reduces the influence of noisy data and detects knowledge base errors more effectively.
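As a rough illustration of the triple representation mentioned in the abstract, the sketch below computes the standard TransE distance ||h + r - t||, under which a plausible triple has a tail embedding close to head plus relation. The function name `transe_score` and the two-dimensional toy embeddings are assumptions made for illustration; this page does not state how the embeddings are passed to the multi-layer perceptron.

```python
import math


def transe_score(head, relation, tail, norm=2):
    """Return ||h + r - t|| under the L1 or L2 norm; lower means more plausible."""
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))


# Toy embeddings, invented for this example (not taken from the paper).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_good = [1.0, 1.0]  # exactly h + r, so the distance is zero
t_bad = [4.0, 5.0]   # far from h + r, so the distance is large

print(transe_score(h, r, t_good))  # 0.0
print(transe_score(h, r, t_bad))   # 5.0
```

A triple with a large distance is a candidate error, although a distance alone does not say whether the head, the relation, or the tail is at fault.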

Key words: Knowledge Base; Error Detection; Confidence Learning
Received: 23 January 2021      Published: 15 October 2021
ZTFLH:  TP393  
Fund: Project of Chinese Academy of Sciences' Literature and Information Capacity Building (2019WQZX0017)
Corresponding Authors: Zhang Zhixiong     E-mail: zhangzhx@mail.las.ac.cn

Cite this article:

Li Wenna,Zhang Zhixiong. Research on Knowledge Base Error Detection Method Based on Confidence Learning. Data Analysis and Knowledge Discovery, 2021, 5(9): 1-9.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0179     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I9/1

Framework of Knowledge Base Error Detection Method Based on Confidence Learning
Flow of Confidence Learning Module
| $C_{\tilde{y},y}$ | $y=0$ | $y=1$ |
| --- | --- | --- |
| $\tilde{y}=0$ | $C_{0,0}$ | $C_{0,1}$ |
| $\tilde{y}=1$ | $C_{1,0}$ | $C_{1,1}$ |
Confusion Matrix of Model Prediction
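A minimal sketch of how such a count matrix can be built, following the usual confident learning formulation (Northcutt et al.) rather than anything stated on this page: each class gets a threshold equal to the mean predicted probability of that class among examples carrying that label, and an example is counted in row $\tilde{y}$ (its given label) and column $y$ (the class whose probability clears its threshold). The function names and the six toy predictions are assumptions for illustration only.

```python
def class_thresholds(noisy_labels, probs, n_classes=2):
    """Mean predicted probability of class j over examples labelled j.

    Assumes every class has at least one labelled example.
    """
    thresholds = []
    for j in range(n_classes):
        vals = [p[j] for lab, p in zip(noisy_labels, probs) if lab == j]
        thresholds.append(sum(vals) / len(vals))
    return thresholds


def confident_joint(noisy_labels, probs, n_classes=2):
    """Count matrix C: rows are given labels, columns are confident classes."""
    thresholds = class_thresholds(noisy_labels, probs, n_classes)
    C = [[0] * n_classes for _ in range(n_classes)]
    for lab, p in zip(noisy_labels, probs):
        candidates = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not candidates:
            continue  # no class is confident enough; the example is not counted
        best = max(candidates, key=lambda j: p[j])
        C[lab][best] += 1
    return C


# Toy data: given labels and predicted [P(y=0), P(y=1)] per example.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
         [0.1, 0.9], [0.3, 0.7], [0.7, 0.3]]

print(confident_joint(labels, probs))  # [[2, 1], [1, 2]]
```

The off-diagonal cells hold the examples whose given label disagrees with the confidently predicted class, which makes them the candidates for cleaning.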
| $Q_{\tilde{y},y}$ | $y=0$ | $y=1$ |
| --- | --- | --- |
| $\tilde{y}=0$ | $Q_{0,0}$ | $Q_{0,1}$ |
| $\tilde{y}=1$ | $Q_{1,0}$ | $Q_{1,1}$ |
Joint Probability Distribution Matrix
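The joint distribution can be estimated from a count matrix by rescaling each row to the observed number of examples with that given label and then dividing by the grand total, which is the calibration step commonly used in confident learning. The sketch below assumes that formulation; the function name `estimate_joint` and the counts are invented for illustration and are not taken from the paper.

```python
def estimate_joint(C, label_counts):
    """Turn a count matrix C into an estimated joint distribution Q.

    C[i][j] counts examples with given label i and confident class j.
    label_counts[i] is the number of examples whose given label is i.
    """
    calibrated = []
    for row, n in zip(C, label_counts):
        row_sum = sum(row)
        if row_sum == 0:
            calibrated.append([0.0] * len(row))  # nothing counted for this label
        else:
            calibrated.append([c / row_sum * n for c in row])
    total = sum(sum(row) for row in calibrated)
    return [[v / total for v in row] for row in calibrated]


# Hypothetical counts: 100 examples, 60 labelled 0 and 40 labelled 1.
C = [[40, 10], [5, 45]]
Q = estimate_joint(C, label_counts=[60, 40])

print([[round(v, 2) for v in row] for row in Q])  # [[0.48, 0.12], [0.04, 0.36]]
```

In this toy case the off-diagonal mass is 0.12 + 0.04 = 0.16, which would be read as an estimated 16% of labels being wrong.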
Flow of Construction for Experimental Dataset
| Dataset | TransE Precision | TransE Recall | TransE F1 | C-TransE Precision | C-TransE Recall | C-TransE F1 (relative gain) |
| --- | --- | --- | --- | --- | --- | --- |
| E1 | 0.7879 | 0.7210 | 0.7038 | 0.7978 | 0.7475 | 0.7364 (+4.63%) |
| E2 | 0.7931 | 0.7195 | 0.7007 | 0.7908 | 0.7360 | 0.7230 (+3.18%) |
| E5 | 0.7864 | 0.7010 | 0.6769 | 0.7852 | 0.7310 | 0.7176 (+6.01%) |
| E10 | 0.7716 | 0.6565 | 0.6158 | 0.7584 | 0.6925 | 0.6716 (+9.06%) |
| E15 | 0.7459 | 0.5520 | 0.4420 | 0.7561 | 0.6795 | 0.6536 (+47.87%) |
| E20 | 0.2500 | 0.5000 | 0.3333 | 0.7312 | 0.6615 | 0.6339 (+90.18%) |
Result of Control Experiment on Datasets with Different Noise Ratios
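The percentages attached to the C-TransE F1 values appear to be relative gains over the TransE F1 on the same dataset, that is, (F1 of C-TransE minus F1 of TransE) divided by F1 of TransE. The short check below recomputes them from the rounded table entries; the helper name `relative_gain` is an assumption, and the page itself does not spell out the formula.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


# (TransE F1, C-TransE F1, reported gain in percent), copied from the table.
rows = {
    "E1": (0.7038, 0.7364, 4.63),
    "E2": (0.7007, 0.7230, 3.18),
    "E5": (0.6769, 0.7176, 6.01),
    "E10": (0.6158, 0.6716, 9.06),
    "E15": (0.4420, 0.6536, 47.87),
    "E20": (0.3333, 0.6339, 90.18),
}

for name, (f1_transe, f1_ctranse, reported) in rows.items():
    gain = relative_gain(f1_transe, f1_ctranse)
    print(f"{name}: computed {gain:+.2f}%, reported {reported:+.2f}%")
```

The recomputed values agree with the reported ones to within about 0.01 percentage points; the small difference for E20 is presumably due to rounding of the published F1 values.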
The Results of Models on Datasets with Different Noise Ratios
Stability of Models on Datasets with Different Noise Ratios
The Results of Manual Evaluation of Top100 Errors on DBpedia Dataset
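| Head entity | Relation | Tail entity | Error type |
| --- | --- | --- | --- |
| Bertram Kelly | significant project | Isle of Man | Relation error |
| Chandigarh | government type | Government of India | Entity error |
| George Latham (footballer) | team | Newtown A.F.C. | Outdated data |
| Northwest Airlines | lounge | Northwest Airlines | Entity error |
| Hammersmith | borough | Fulham | Entity error |
| South African Military Health Service | garrison | Pretoria | Entity error |
| Stuart Boardley | team | Long Melford F.C. | Entity error |
| Jong Ajax | chairman | AFC Ajax | Relation error |
| Philadelphia Union | chairman | Philadelphia Union | Entity error |
| Burt Bacharach | instrument | McGill University | Entity error |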
False Triples Detected from DBpedia Datasets