Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (2): 56-64    DOI: 10.11925/infotech.2096-3467.2022.1190
Chinese Named Entity Disambiguation Based on Multivariate Similarity Fusion
Shi Shuiqian, Jin Jing, Shen Gengyu, Wang Baojia, Ren Ni
Institute of Agricultural Information, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
Abstract  

[Objective] This paper addresses the ambiguity that arises when multiple entities sharing a name but differing in meaning are mapped to a knowledge base, with the goal of improving the accuracy of entity disambiguation. [Methods] We propose a multi-dimensional similarity fusion method that characterizes entities using three features: the semantic similarity of the entity context, the background similarity of the entity attributes, and the semantic similarity of the topic words. [Results] We evaluated the model on an agricultural dataset from Wikipedia. The proposed method achieved an accuracy of 89.7%, outperforming traditional methods. [Limitations] The proposed method is only applicable in specific fields. [Conclusions] The new method addresses entity disambiguation issues in specific fields. It can be applied to a broader range of entity disambiguation scenarios.
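
The fusion idea in the abstract can be illustrated with a minimal sketch. It assumes the three similarities are cosine similarities over pre-computed vectors and that they are combined by a fixed weighted sum; the abstract does not state the exact fusion rule. The weights, feature keys, function names, and toy vectors below are all hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Sequence

Features = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical weights (assumption): the paper's actual weighting is not given here.
WEIGHTS = {"context": 0.4, "attributes": 0.3, "topic": 0.3}


def fused_similarity(mention: Features, candidate: Features,
                     weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the per-feature cosine similarities."""
    return sum(w * cosine(mention[k], candidate[k]) for k, w in weights.items())


def disambiguate(mention: Features, candidates: Dict[str, Features]) -> str:
    """Return the name of the candidate entity with the highest fused similarity."""
    return max(candidates, key=lambda name: fused_similarity(mention, candidates[name]))


# Toy vectors (assumption): real inputs would come from word embeddings and a topic model.
mention = {"context": [1.0, 0.0], "attributes": [1.0, 0.0], "topic": [0.0, 1.0]}
candidates = {
    "apple_fruit": {"context": [1.0, 0.0], "attributes": [1.0, 0.0], "topic": [0.0, 1.0]},
    "apple_company": {"context": [0.0, 1.0], "attributes": [0.0, 1.0], "topic": [1.0, 0.0]},
}
best = disambiguate(mention, candidates)
```

In this toy setup the mention's vectors coincide with those of the first candidate and are orthogonal to those of the second, so the first candidate is selected. A weighted sum is only one possible fusion rule; the weights would normally be tuned on labeled data.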

Key words: Entity Disambiguation; Similarity; Contextual Word Vector; Entity Properties; Topic Word Vector
Received: 12 November 2022      Published: 28 April 2023
ZTFLH:  TP393 G250  
Fund: National Social Science Fund of China (19BTQ032)
Corresponding Author: Ren Ni, ORCID: 0000-0002-3789-7347, E-mail: rn@jaas.ac.cn

Cite this article:

Shi Shuiqian, Jin Jing, Shen Gengyu, Wang Baojia, Ren Ni. Chinese Named Entity Disambiguation Based on Multivariate Similarity Fusion. Data Analysis and Knowledge Discovery, 2024, 8(2): 56-64.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1190     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I2/56

Flowchart of Chinese Named Entity Disambiguation Algorithm Based on Multivariate Similarity Fusion
LDA Model
Weight Matrix
Similarity combination    Accuracy (%)
A                         78.2
B                         77.1
C                         75.6
A+C                       81.1
B+C                       82.3
A+B+C                     89.7
Experimental Results of Combining Different Similarity Features
Disambiguation Results for the Number of Different Topics
Performance of Different Approaches