Chinese Named Entity Disambiguation Based on Multivariate Similarity Fusion
Shi Shuiqian, Jin Jing, Shen Gengyu, Wang Baojia, Ren Ni
Institute of Agricultural Information, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China

Abstract [Objective] This paper addresses the ambiguity that arises when multiple entities with the same name but different meanings are mapped to a knowledge base, with the goal of improving the accuracy of entity disambiguation. [Methods] We propose a multi-dimensional similarity fusion method that characterizes entities with three signals: the semantic similarity of the entity context, the background similarity of the entity attributes, and the semantic similarity of the topic words. [Results] We evaluated the method on an agricultural dataset drawn from Wikipedia, where it achieved an accuracy of 89.7%, outperforming traditional methods. [Limitations] The proposed method is applicable only to specific fields. [Conclusions] The method addresses entity disambiguation in specific fields and can be applied to a broader range of entity disambiguation scenarios.
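
The fusion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three similarities are combined, so the weighted sum, the weight values, the vector inputs, and the names `cosine`, `fused_similarity`, and `disambiguate` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature names for the three signals named in the abstract.
FEATURES = ("context", "attributes", "topics")


def fused_similarity(mention, candidate, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of per-feature cosine similarities (weights are assumed)."""
    return sum(
        w * cosine(mention[name], candidate[name])
        for w, name in zip(weights, FEATURES)
    )


def disambiguate(mention, candidates, weights=(0.4, 0.3, 0.3)):
    """Return the key of the candidate entity with the highest fused score."""
    return max(
        candidates,
        key=lambda key: fused_similarity(mention, candidates[key], weights),
    )


# Toy vectors standing in for real embeddings of each feature.
mention = {"context": [1.0, 0.0], "attributes": [1.0, 0.0], "topics": [1.0, 0.0]}
candidates = {
    "fruit": {"context": [1.0, 0.0], "attributes": [1.0, 0.0], "topics": [1.0, 0.0]},
    "company": {"context": [0.0, 1.0], "attributes": [0.0, 1.0], "topics": [0.0, 1.0]},
}
best = disambiguate(mention, candidates)
```

In this sketch each feature would in practice come from a separate representation (for example word vectors for context and a topic model for topic words), and the weights would be tuned on held-out data rather than fixed by hand.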

Received: 12 November 2022
Published: 28 April 2023

Fund: National Social Science Fund of China (19BTQ032)
Corresponding Author: Ren Ni, ORCID: 0000-0002-3789-7347, E-mail: rn@jaas.ac.cn