Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (7): 1-9    DOI: 10.11925/infotech.2096-3467.2021.0143
Entity Alignment Method for Different Knowledge Repositories with Joint Semantic Representation
Li Wenna1,2, Zhang Zhixiong1,2,3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
Abstract  

[Objective] This paper combines the structural and semantic information of knowledge to build a better entity alignment method for different knowledge repositories. [Methods] First, we used the TransE model to represent the structure of entities and the BERT model to represent their semantic information. Then, we designed an entity alignment method based on the BTJE model (BERT and TransE Joint model for Entity alignment). Finally, we used a siamese network to perform the entity alignment task. [Results] We evaluated the new method on the DBP-WD and DBP-YG datasets. Its best MRR values reached 0.521 and 0.413, and its Hits@1 values reached 0.542 and 0.478, respectively. These results were better than those of the traditional models. [Limitations] The experimental datasets need to be expanded to further evaluate the performance of the proposed method. [Conclusions] The new method can effectively perform entity alignment across different knowledge bases.
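
To make the pipeline in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard TransE energy ||h + r - t|| and one possible way to pair a structure vector with a text vector. Two things here are assumptions: the joint representation is a plain concatenation (the abstract does not say how BTJE fuses the two views), and a simple cosine similarity stands in for the trained siamese network. The toy vectors and the names `joint_vector`, `align`, `wd:Q1`, and `wd:Q2` are hypothetical.

```python
import math

def transe_score(head, relation, tail):
    """TransE energy ||h + r - t||_2; lower means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def joint_vector(structure_vec, text_vec):
    """Assumed joint representation: concatenate the structure and text views."""
    return list(structure_vec) + list(text_vec)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align(source_vec, candidates):
    """Return the candidate key whose joint vector is most similar to source_vec.

    Cosine similarity is a stand-in here for a trained siamese network.
    """
    return max(candidates, key=lambda name: cosine(source_vec, candidates[name]))

# Toy vectors made up for illustration; real ones would come from trained
# TransE and BERT models.
src = joint_vector([1.0, 0.0], [0.5, 0.5])
cands = {
    "wd:Q1": joint_vector([0.9, 0.1], [0.4, 0.6]),
    "wd:Q2": joint_vector([0.0, 1.0], [0.9, -0.2]),
}
best = align(src, cands)
```

In this toy setup `best` is the candidate closest to `src` in both views. A trained siamese network would replace `cosine` with a learned similarity over the paired encodings.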

Key words: Entity Alignment; Joint Semantic Representation; BERT
Received: 11 February 2021      Published: 11 August 2021
Chinese Library Classification (ZTFLH): TP393
Fund: Project of Literature and Information Capacity Building, Chinese Academy of Sciences (2019WQZX0017)
Corresponding Author: Zhang Zhixiong, ORCID: 0000-0003-1596-7487     E-mail: zhangzhx@mail.las.ac.cn

Cite this article:

Li Wenna, Zhang Zhixiong. Entity Alignment Method for Different Knowledge Repositories with Joint Semantic Representation. Data Analysis and Knowledge Discovery, 2021, 5(7): 1-9.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0143     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I7/1

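Method type | Features used | Models
Statistical methods | Entity attribute similarity | RDF-AI[5], SILK[6], LIMES[7]
Machine learning methods | Entity description information | Decision tree[8], SVM[9]
Machine learning methods | Latent topic features | LDA-EA[10], DPVL[11]
Machine learning methods | Graph structure information | GCN-Align[13], RDGCN[14], MultiKE[15]
Deep learning methods | Embeddings based on structural information | TransE[17], TransR[18], TransH[19], MTransE[20], IPTransE[21], BootEA[23]
Deep learning methods | Embeddings combined with attribute information | JAPE[24], KDCoE[25], AttrE[27]
Overview of Research on Knowledge Base Entity Alignment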
Model Structure of BTJE
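Dataset | Source | Entities | Relations | Attributes | Relation triples | Attribute triples
DBP-WD | DBpedia | 100,000 | 330 | 351 | 463,294 | 381,166
DBP-WD | Wikidata | 100,000 | 220 | 729 | 448,774 | 789,815
DBP-YG | DBpedia | 100,000 | 302 | 334 | 428,952 | 451,646
DBP-YG | YAGO | 100,000 | 31 | 23 | 502,563 | 118,376
Dataset Information for the Experiments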
BTJE Training Loss for Different Learning Rates
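Information used | Model | Hits@1 | Hits@10 | MRR
Structure only | MTransE | 0.281 | 0.520 | 0.363
Structure only | IPTransE | 0.348 | 0.638 | 0.447
Structure + attributes | JAPE | 0.318 | 0.588 | 0.411
Structure + attributes | AttrE | 0.389 | 0.667 | 0.487
Joint representation | BTJE | 0.542 | 0.785 | 0.521
Experiment Results on DBP-WD Dataset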
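Information used | Model | Hits@1 | Hits@10 | MRR
Structure only | MTransE | 0.252 | 0.493 | 0.334
Structure only | IPTransE | 0.297 | 0.557 | 0.386
Structure + attributes | JAPE | 0.235 | 0.484 | 0.320
Structure + attributes | AttrE | 0.232 | 0.427 | 0.300
Joint representation | BTJE | 0.478 | 0.692 | 0.413
Experiment Results on DBP-YG Dataset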
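
For readers unfamiliar with the metrics reported above, here is a minimal sketch of how Hits@k and MRR are commonly computed from the rank of each true counterpart in a candidate list. It uses the standard definitions and is not taken from the paper's evaluation code. The function names and the `example_ranks` values are hypothetical.

```python
def hits_at_k(ranks, k):
    """Fraction of test entities whose true counterpart is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Made-up ranks of the correct counterpart for four test entities.
example_ranks = [1, 2, 1, 4]
hits1 = hits_at_k(example_ranks, 1)
hits10 = hits_at_k(example_ranks, 10)
mrr = mean_reciprocal_rank(example_ranks)
```

Higher values are better for all three metrics. Hits@1 counts only exact top-ranked matches, while MRR also rewards near misses.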
[1] Bollacker K, Evans C, Paritosh P, et al. FreeBase: A Collaboratively Created Graph Database for Structuring Human Knowledge[C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008: 1247-1250.
[2] Suchanek F M, Kasneci G, Weikum G. YAGO: A Core of Semantic Knowledge[C]// Proceedings of the 16th International Conference on World Wide Web. 2007: 697-706.
[3] Auer S, Bizer C, Kobilarov G, et al. DBpedia: A Nucleus for a Web of Open Data[C]// Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. Springer, Berlin, Heidelberg, 2007: 722-735.
[4] Dong X, Gabrilovich E, Heitz G, et al. Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014: 601-610.
[5] Scharffe F, Liu Y B, Zhou C G. RDF-AI: An Architecture for RDF Datasets Matching, Fusion and Interlink[C]// Proceedings of IJCAI 2009 Workshop on Identity, Reference, and Knowledge Representation(IR-KR). 2009.
[6] Volz J, Bizer C, Gaedke M, et al. Discovering and Maintaining Links on the Web of Data[C]// Proceedings of the 8th International Semantic Web Conference. 2009: 650-665.
[7] Ngomo A C N, Auer S. LIMES: A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data[C]// Proceedings of the 22nd International Joint Conference on Artificial Intelligence. AAAI Press, 2011: 2312-2317.
[8] Han J W, Kamber M, Pei J. Data Mining: Concepts and Techniques[M]. 3rd Edition. Morgan Kaufmann, 2011.
[9] Vapnik V. The Nature of Statistical Learning Theory[M]. Springer Science & Business Media, 2013.
[10] Bhattacharya I, Getoor L. A Latent Dirichlet Model for Unsupervised Entity Resolution[C]// Proceedings of the 6th SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2006: 47-58.
[11] Hall R, Sutton C, McCallum A. Unsupervised Deduplication Using Cross-field Dependencies[C]// Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008: 310-317.
[12] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[13] Wang Z C, Lv Q S, Lan X H, et al. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 349-357.
[14] Wu Y T, Liu X, Feng Y S, et al. Relation-aware Entity Alignment for Heterogeneous Knowledge Graphs[OL]. arXiv Preprint, arXiv:1908.08210.
[15] Zhang Q H, Sun Z Q, Hu W, et al. Multi-view Knowledge Graph Embedding for Entity Alignment[OL]. arXiv Preprint, arXiv:1906.02390.
[16] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]// Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.
[17] Bordes A, Usunier N, Garcia-Duran A, et al. Translating Embeddings for Modeling Multi-relational Data[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 2787-2795.
[18] Lin Y K, Liu Z Y, Sun M S, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 2181-2187.
[19] Wang Z, Zhang J W, Feng J L, et al. Knowledge Graph Embedding by Translating on Hyperplanes[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence. 2014: 1112-1119.
[20] Chen M H, Tian Y T, Yang M H, et al. Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment[OL]. arXiv Preprint, arXiv:1611.03954.
[21] Zhu H, Xie R B, Liu Z Y, et al. Iterative Entity Alignment via Joint Knowledge Embeddings[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017: 4258-4264.
[22] Lin Y K, Liu Z Y, Sun M S. Modeling Relation Paths for Representation Learning of Knowledge Bases[OL]. arXiv Preprint, arXiv:1506.00379.
[23] Sun Z Q, Hu W, Zhang Q H, et al. Bootstrapping Entity Alignment with Knowledge Graph Embedding[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018: 4396-4402.
[24] Sun Z Q, Hu W, Li C K. Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding[C]// Proceedings of International Semantic Web Conference. Springer, Cham, 2017: 628-644.
[25] Chen M H, Tian Y T, Chang K W, et al. Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment[OL]. arXiv Preprint, arXiv:1806.06478.
[26] Chung J, Gulcehre C, Cho K H, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling[OL]. arXiv Preprint, arXiv:1412.3555.
[27] Trisedya B D, Qi J Z, Zhang R. Entity Alignment Between Knowledge Graphs Using Attribute Embeddings[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019: 297-304.
[28] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. pmid: 9377276
[29] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of NAACL-HLT 2019. 2019: 4171-4186.
[30] Bromley J, Bentz J W, Bottou L, et al. Signature Verification Using a “Siamese” Time Delay Neural Network[J]. International Journal of Pattern Recognition and Artificial Intelligence, 1993, 7(4): 669-688. doi: 10.1142/S0218001493000339
[31] Vrandečić D. Wikidata: A New Platform for Collaborative Data Collection[C]// Proceedings of the 21st International Conference on World Wide Web. 2012: 1063-1064.