Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue (7): 1-9     https://doi.org/10.11925/infotech.2096-3467.2021.0143
Research Article
Entity Alignment Method for Different Knowledge Repositories with Joint Semantic Representation
Li Wenna1,2,Zhang Zhixiong1,2,3()
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
Abstract

[Objective] This paper addresses entity alignment across different knowledge repositories by exploring how to exploit both the structural and the semantic information of a knowledge base. [Methods] First, we used the TransE model to represent the structural information of entities and the BERT model to represent their semantic information. Then, we designed an entity alignment method based on the BTJE model (BERT and TransE Joint model for Entity alignment). Finally, we used a Siamese network to perform the entity alignment task. [Results] We examined the new method on the DBP-WD and DBP-YG datasets: the best MRR values reached 0.521 and 0.413, and Hits@1 reached 0.542 and 0.478, outperforming the traditional models. [Limitations] The experimental datasets are limited in size, so the method's generality on larger knowledge bases remains to be verified. [Conclusions] By jointly modeling the structural and semantic information of entities, the proposed method improves entity representation and performs well on entity alignment tasks across different knowledge bases.

Key words: Entity Alignment; Joint Semantic Representation; BERT
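
The abstract describes a pipeline that represents each entity twice, with a TransE structural embedding and a BERT semantic vector, and then compares candidate pairs through a Siamese network. The paper's BTJE implementation is not reproduced on this page, so the following PyTorch fragment is only a minimal sketch of that idea; the module names, dimensions, the use of cosine similarity, and the contrastive loss are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEntityEncoder(nn.Module):
    """Illustrative joint encoder: concatenates a structural (TransE-style)
    embedding with a semantic vector (e.g. a pooled BERT output) and projects
    them into one alignment space. Dimensions are arbitrary choices."""
    def __init__(self, num_entities, struct_dim=100, sem_dim=768, out_dim=200):
        super().__init__()
        self.struct_emb = nn.Embedding(num_entities, struct_dim)  # learned from triples
        self.project = nn.Linear(struct_dim + sem_dim, out_dim)

    def forward(self, entity_ids, semantic_vecs):
        # semantic_vecs: precomputed description vectors, shape (batch, sem_dim)
        joint = torch.cat([self.struct_emb(entity_ids), semantic_vecs], dim=-1)
        return F.normalize(self.project(joint), dim=-1)

def siamese_contrastive_loss(enc, ids_a, sem_a, ids_b, sem_b, labels, margin=0.5):
    """Siamese-style loss: aligned pairs (label 1) are pulled together,
    non-aligned pairs (label 0) are pushed apart by at least `margin`."""
    za, zb = enc(ids_a, sem_a), enc(ids_b, sem_b)
    sim = F.cosine_similarity(za, zb)        # similarity in [-1, 1]
    dist = 1.0 - sim
    loss_pos = labels * dist.pow(2)
    loss_neg = (1 - labels) * F.relu(margin - dist).pow(2)
    return (loss_pos + loss_neg).mean()

if __name__ == "__main__":
    enc = JointEntityEncoder(num_entities=1000)
    ids_a = torch.randint(0, 1000, (8,))
    ids_b = torch.randint(0, 1000, (8,))
    sem_a = torch.randn(8, 768)   # stand-in for BERT [CLS] vectors
    sem_b = torch.randn(8, 768)
    labels = torch.randint(0, 2, (8,)).float()
    print(siamese_contrastive_loss(enc, ids_a, sem_a, ids_b, sem_b, labels))
```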
Received: 2021-02-11      Published: 2021-08-11
CLC Number: TP393
Funding: *Special Project on Literature and Information Capacity Building of the Chinese Academy of Sciences (2019WQZX0017)
Corresponding author: Zhang Zhixiong, ORCID: 0000-0003-1596-7487, E-mail: zhangzhx@mail.las.ac.cn
Cite this article:
李文娜, 张智雄. 基于联合语义表示的不同知识库中的实体对齐方法研究*[J]. 数据分析与知识发现, 2021, 5(7): 1-9.
Li Wenna, Zhang Zhixiong. Entity Alignment Method for Different Knowledge Repositories with Joint Semantic Representation. Data Analysis and Knowledge Discovery, 2021, 5(7): 1-9.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0143      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I7/1
Method type | Features used | Models
Statistical methods | Entity attribute similarity | RDF-AI[5], SILK[6], LIMES[7]
Machine learning methods | Entity description information | Decision tree[8], SVM[9]
Machine learning methods | Latent topic features | LDA-EA[10], DPVL[11]
Machine learning methods | Graph structure information | GCN-Align[13], RDGCN[14], MultiKE[15]
Deep learning methods | Structure-based embedding representations | TransE[17], TransR[18], TransH[19], MTransE[20], IPTransE[21], BootEA[23]
Deep learning methods | Embedding representations combined with attribute information | JAPE[24], KDCoE[25], AttrE[27]
Table 1  Summary of entity alignment methods for knowledge bases
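
The deep-learning rows of Table 1 build on translational embeddings such as TransE[17], which models a relation as a vector translation between the head and tail entity. For reference, its triple score and margin-based ranking loss take the standard form (γ is the margin and S′ the set of corrupted triples):

```latex
% TransE: a triple (h, r, t) is plausible when h + r \approx t
f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{1/2}

% Margin-based ranking loss over corrupted triples (h', r, t')
\mathcal{L} = \sum_{(h,r,t)\in S} \sum_{(h',r,t')\in S'} \big[\gamma + f(h,r,t) - f(h',r,t')\big]_{+}
```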
Fig. 1  Structure of the entity alignment model based on joint semantic representation
Dataset | Source | Entities | Relations | Attributes | Relation triples | Attribute triples
DBP-WD | DBpedia | 100,000 | 330 | 351 | 463,294 | 381,166
DBP-WD | Wikidata | 100,000 | 220 | 729 | 448,774 | 789,815
DBP-YG | DBpedia | 100,000 | 302 | 334 | 428,952 | 451,646
DBP-YG | YAGO | 100,000 | 31 | 23 | 502,563 | 118,376
Table 2  Statistics of the experimental datasets
Fig. 2  Training loss curves of the BTJE model under different learning rates
Category | Model | Hits@1 | Hits@10 | MRR
Structure information only | MTransE | 0.281 | 0.520 | 0.363
Structure information only | IPTransE | 0.348 | 0.638 | 0.447
Structure + attribute information | JAPE | 0.318 | 0.588 | 0.411
Structure + attribute information | AttrE | 0.389 | 0.667 | 0.487
Joint representation | BTJE | 0.542 | 0.785 | 0.521
Table 3  Experimental results on the DBP-WD dataset
Category | Model | Hits@1 | Hits@10 | MRR
Structure information only | MTransE | 0.252 | 0.493 | 0.334
Structure information only | IPTransE | 0.297 | 0.557 | 0.386
Structure + attribute information | JAPE | 0.235 | 0.484 | 0.320
Structure + attribute information | AttrE | 0.232 | 0.427 | 0.300
Joint representation | BTJE | 0.478 | 0.692 | 0.413
Table 4  Experimental results on the DBP-YG dataset
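
Tables 3 and 4 report Hits@1, Hits@10, and MRR. These are standard ranking metrics: for each test entity, candidates from the other knowledge base are ranked by similarity, Hits@k is the fraction of test entities whose true counterpart appears in the top k, and MRR is the mean reciprocal rank of the true counterpart. A minimal helper (assuming 1-based ranks of the true counterparts) illustrates the computation:

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Example: ranks of the true aligned entities for five test queries
ranks = [1, 3, 2, 15, 1]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), mean_reciprocal_rank(ranks))
```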
[1] Bollacker K, Evans C, Paritosh P, et al. FreeBase: A Collaboratively Created Graph Database for Structuring Human Knowledge[C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008: 1247-1250.
[2] Suchanek F M, Kasneci G, Weikum G. YAGO: A Core of Semantic Knowledge[C]// Proceedings of the 16th International Conference on World Wide Web. 2007: 697-706.
[3] Auer S, Bizer C, Kobilarov G, et al. DBpedia: A Nucleus for a Web of Open Data[C]// Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. Springer, Berlin, Heidelberg, 2007: 722-735.
[4] Dong X, Gabrilovich E, Heitz G, et al. Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014: 601-610.
[5] Scharffe F, Liu Y B, Zhou C G. RDF-AI: An Architecture for RDF Datasets Matching, Fusion and Interlink[C]// Proceedings of IJCAI 2009 Workshop on Identity, Reference, and Knowledge Representation(IR-KR). 2009.
[6] Volz J, Bizer C, Gaedke M, et al. Discovering and Maintaining Links on the Web of Data[C]// Proceedings of the 8th International Semantic Web Conference. 2009: 650-665.
[7] Ngomo A C N, Auer S. LIMES : A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data[C]// Proceedings of the 22nd International Joint Conference on Artificial Intelligence. AAAI Press, 2011: 2312-2317.
[8] Han J W, Kamber M, Pei J. Data Mining: Concepts and Techniques[M]. 3rd Edition. Morgan Kaufmann, 2011.
[9] Vapnik V. The Nature of Statistical Learning Theory[M]. Springer Science & Business Media, 2013.
[10] Bhattacharya I, Getoor L. A Latent Dirichlet Model for Unsupervised Entity Resolution[C]// Proceedings of the 6th SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2006: 47-58.
[11] Hall R, Sutton C, McCallum A. Unsupervised Deduplication Using Cross-field Dependencies[C]// Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008: 310-317.
[12] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[13] Wang Z C, Lv Q S, Lan X H, et al. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 349-357.
[14] Wu Y T, Liu X, Feng Y S, et al. Relation-aware Entity Alignment for Heterogeneous Knowledge Graphs[OL]. arXiv Preprint, arXiv:1908.08210.
[15] Zhang Q H, Sun Z Q, Hu W, et al. Multi-view Knowledge Graph Embedding for Entity Alignment[OL]. arXiv Preprint, arXiv:1906.02390.
[16] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]// Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.
[17] Bordes A, Usunier N, Garcia-Duran A, et al. Translating Embeddings for Modeling Multi-relational Data[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 2787-2795.
[18] Lin Y K, Liu Z Y, Sun M S, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 2181-2187.
[19] Wang Z, Zhang J W, Feng J L, et al. Knowledge Graph Embedding by Translating on Hyperplanes[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence. 2014: 1112-1119.
[20] Chen M H, Tian Y T, Yang M H, et al. Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment[OL]. arXiv Preprint, arXiv:1611.03954.
[21] Zhu H, Xie R B, Liu Z Y, et al. Iterative Entity Alignment via Joint Knowledge Embeddings[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017: 4258-4264.
[22] Lin Y K, Liu Z Y, Sun M S. Modeling Relation Paths for Representation Learning of Knowledge Bases[OL]. arXiv Preprint, arXiv:1506.00379.
[23] Sun Z Q, Hu W, Zhang Q H, et al. Bootstrapping Entity Alignment with Knowledge Graph Embedding[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018: 4396-4402.
[24] Sun Z Q, Hu W, Li C K. Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding[C]// Proceedings of International Semantic Web Conference. Springer, Cham, 2017: 628-644.
[25] Chen M H, Tian Y T, Chang K W, et al. Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment[OL]. arXiv Preprint, arXiv:1806.06478.
[26] Chung J, Gulcehre C, Cho K H, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling[OL]. arXiv Preprint, arXiv:1412.3555.
[27] Trisedya B D, Qi J Z, Zhang R. Entity Alignment Between Knowledge Graphs Using Attribute Embeddings[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019: 297-304.
[28] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. PMID: 9377276.
[29] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of NAACL-HLT 2019. 2019: 4171-4186.
[30] Bromley J, Bentz J W, Bottou L, et al. Signature Verification Using a "Siamese" Time Delay Neural Network[J]. International Journal of Pattern Recognition and Artificial Intelligence, 1993, 7(4): 669-688. DOI: 10.1142/S0218001493000339.
[31] Vrandečić D. Wikidata: A New Platform for Collaborative Data Collection[C]// Proceedings of the 21st International Conference on World Wide Web. 2012: 1063-1064.