Research on Knowledge Base Error Detection Method Based on Confidence Learning
Li Wenna1,2, Zhang Zhixiong1,2,3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
[Objective] This paper explores a knowledge base error detection method that uses confidence learning to reduce the influence of noisy data. [Methods] We used the TransE model to represent knowledge base triples and a multi-layer perceptron to detect errors. We then cleaned the dataset with confidence learning and reduced the influence of noisy data through multiple rounds of iterative training. [Results] We evaluated the method on DBpedia datasets; the best F1 value reached 0.7364, outperforming the control group. [Limitations] The noisy data in the experiment was artificially generated, so its distribution differs from that of real-world data. More research is needed to evaluate the method on larger knowledge bases. [Conclusions] The proposed method reduces the influence of noisy data through confidence learning and detects knowledge base errors more effectively.
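The pipeline summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probabilities, and the label convention (1 = correct triple, 0 = erroneous triple) are assumptions made for illustration. The pruning step is a simplified form of the per-class thresholding used in confident learning (Northcutt et al.). The classifier outputs are hard-coded here, whereas the paper obtains them from a multi-layer perceptron.

```python
import math

def transe_score(h, r, t):
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def class_thresholds(probs, labels, n_classes=2):
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    thresholds = []
    for k in range(n_classes):
        vals = [p[k] for p, y in zip(probs, labels) if y == k]
        # A class with no labeled examples gets a threshold of 1.0 (hard to exceed).
        thresholds.append(sum(vals) / len(vals) if vals else 1.0)
    return thresholds

def find_suspect_labels(probs, labels, n_classes=2):
    """Return indices whose given label disagrees with a confidently predicted class."""
    thr = class_thresholds(probs, labels, n_classes)
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [k for k in range(n_classes) if p[k] >= thr[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                suspects.append(i)
    return suspects

# Hypothetical classifier outputs [P(erroneous), P(correct)] and noisy labels.
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
labels = [0, 0, 1, 1, 1]
suspects = find_suspect_labels(probs, labels)
# Drop the suspect examples before the next training round.
cleaned = [(p, y) for i, (p, y) in enumerate(zip(probs, labels)) if i not in suspects]
```

In an iterative setup, the classifier would be retrained on `cleaned` and the pruning repeated. The number of rounds and the stopping rule are not specified in the abstract.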
Li Wenna, Zhang Zhixiong. Research on Knowledge Base Error Detection Method Based on Confidence Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 1-9.