Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 15-25    DOI: 10.11925/infotech.2096-3467.2020.0382
Developments of Named Entity Disambiguation
Wen Pingmei1, Ye Zhiwei1, Ding Wenjian1, Liu Ying2, Xu Jian1
1School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
2Sun Yat-Sen University Library, Guangzhou 510275, China
Abstract  

[Objective] This paper reviews research and resources in the field of named entity disambiguation (NED), focusing on NED methods.
[Coverage] We retrieved 57 representative papers and electronic resources from CNKI, the Wanfang Data Knowledge Service Platform, and EBSCO.
[Methods] First, we summarized the principles and methods of NED from five perspectives: entity prominence, context similarity, entity relatedness, deep learning, and special identification resources. Then, we examined useful knowledge bases, open-source tools, and international evaluation conferences on NED.
[Results] Traditional, classic methods were easy to use, while newer ones such as deep learning significantly improved NED performance. Effective models often integrated several methods to achieve the best results.
[Limitations] Comparing methods on the basis of results reported in different papers involves some subjectivity.
[Conclusions] NED methods are still developing and could be further improved with artificial intelligence and domain-specific resources.

Key words: Named Entity Disambiguation; Knowledge Base; Entity Linking; Clustering
Received: 04 May 2020; Published: 14 October 2020
CLC Number: TP393
Corresponding Author: Liu Ying, E-mail: pusly@mail.sysu.edu.cn

Cite this article:

Wen Pingmei, Ye Zhiwei, Ding Wenjian, Liu Ying, Xu Jian. Developments of Named Entity Disambiguation. Data Analysis and Knowledge Discovery, 2020, 4(9): 15-25.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0382     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I9/15

Named Entity Disambiguation Results Based on Knowledge Base and Text Similarity
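A minimal sketch of disambiguation that combines a knowledge base with text similarity. It assumes a bag-of-words cosine for context similarity, a link-count prior for entity prominence, and a linear interpolation of the two; the entity names, descriptions, link counts and the `weight` parameter are hypothetical illustrations, not the method behind the reported results.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lower-cased alphanumeric tokens; no stop-word removal or TF-IDF weighting.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prominence_prior(link_counts: dict[str, int]) -> dict[str, float]:
    # Entity prominence as P(entity | mention): the share of links with this
    # mention text that point to each candidate entity.
    total = sum(link_counts.values())
    return {entity: count / total for entity, count in link_counts.items()}


def link_mention(context: str, kb_descriptions: dict[str, str],
                 link_counts: dict[str, int], weight: float = 0.5) -> str:
    # score = weight * prior + (1 - weight) * context similarity
    prior = prominence_prior(link_counts)
    ctx = bag_of_words(context)

    def score(entity: str) -> float:
        sim = cosine(ctx, bag_of_words(kb_descriptions[entity]))
        return weight * prior.get(entity, 0.0) + (1 - weight) * sim

    return max(sorted(kb_descriptions), key=score)


# Hypothetical example data.
KB = {
    "Michael Jordan": "American basketball player who played for the Chicago Bulls in the NBA",
    "Michael I. Jordan": "American scientist and professor working on machine learning and statistics",
}
LINK_COUNTS = {"Michael Jordan": 900, "Michael I. Jordan": 100}
CONTEXT = "Jordan published a paper on machine learning and statistics"

if __name__ == "__main__":
    for w in (0.0, 0.3, 1.0):
        print(w, link_mention(CONTEXT, KB, LINK_COUNTS, weight=w))
```

With `weight=0.0` only the context counts and "Michael I. Jordan" is chosen; with `weight=1.0` only the prior counts and the more popular "Michael Jordan" wins. In the deep learning approach, the sparse bag-of-words vectors would be replaced by dense embeddings while the cosine comparison stays the same.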
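| Disambiguation feature | Feature source | Disambiguation method | Disambiguation approach |
| --- | --- | --- | --- |
| Entity prominence | Strings, popularity, commonness | Rule matching, prior probability | Compute the prominence of each candidate entity and take the most prominent candidate in the list as the disambiguation result for the ambiguous mention |
| Context similarity | Whole text, selected nouns, entities | Classification, clustering, topic models, probabilistic language models | Compare the context of the entity mention with the context of each candidate entity and take the most similar candidate as the result; the comparison can optionally draw on an external knowledge base |
| Entity relatedness | Textual descriptions and category information; entity annotations, co-occurrence and distribution; entity relation graphs | Clustering, graph-based relational inference | Entities that co-occur in the same text tend to share a topic or be related in some way, so the semantic links among them are used for collective disambiguation |
| Deep learning | Word embeddings | Neural network models | Represent mentions, texts and entities as distributed vectors carrying semantic features, feed these vectors into deep learning models, and disambiguate by vector similarity |
| Special identification resources | IPC, ORCID | Special identifiers, unique identifiers | Use identification resources shared across a domain to reduce the difficulty of disambiguation |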
Comparative Analysis Based on Different Disambiguation Characteristics
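Identifier-based disambiguation only helps if the identifiers themselves are sound. ORCID iDs [34] end in a check character computed with ISO 7064 MOD 11-2, so malformed iDs can be rejected before author records are merged. A minimal sketch, assuming records arrive as hypothetical (name, ORCID-or-None) pairs; `0000-0002-1825-0097` is the sample iD that ORCID uses in its own documentation.

```python
import re
from collections import defaultdict
from typing import Optional

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_char(base_digits: str) -> str:
    # ISO 7064 MOD 11-2 over the first 15 digits of the iD.
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    # Check the hyphenated 16-character layout, then the final check character.
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


def group_by_orcid(records: list[tuple[str, Optional[str]]]) -> dict[str, list[str]]:
    # Name variants sharing a valid ORCID iD are merged into one person;
    # records with a missing or malformed iD are left for other features.
    groups: dict[str, list[str]] = defaultdict(list)
    for name, orcid in records:
        key = orcid if orcid and is_valid_orcid(orcid) else "UNRESOLVED"
        groups[key].append(name)
    return dict(groups)
```

Records placed under `"UNRESOLVED"` still need the prominence, context or relatedness features from the other rows of the table.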
Entity Page
Disambiguation Page
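Entity pages and disambiguation pages are common sources for candidate generation: each page title, redirect and disambiguation-page entry contributes a surface form that can refer to an entity. A minimal sketch, assuming these have already been extracted into plain Python structures; the titles are illustrative, the `" (disambiguation)"` suffix follows Wikipedia's naming convention, and `str.removesuffix` requires Python 3.9 or later.

```python
from collections import defaultdict


def build_candidate_index(entity_pages: list[str], redirects: dict[str, str],
                          disambiguation_pages: dict[str, list[str]]) -> dict[str, set[str]]:
    index: dict[str, set[str]] = defaultdict(set)
    # An entity page's own title is a surface form of that entity.
    for title in entity_pages:
        index[title.lower()].add(title)
    # A redirect (alias -> target) adds another surface form for the target.
    for alias, target in redirects.items():
        index[alias.lower()].add(target)
    # A disambiguation page lists every entity its base title can refer to.
    for title, entities in disambiguation_pages.items():
        surface = title.removesuffix(" (disambiguation)").lower()
        index[surface].update(entities)
    return dict(index)


def candidates(index: dict[str, set[str]], mention: str) -> list[str]:
    # Case-insensitive lookup; unknown mentions get an empty candidate list.
    return sorted(index.get(mention.lower(), set()))


if __name__ == "__main__":
    idx = build_candidate_index(
        ["Michael Jordan", "Michael I. Jordan", "Michael B. Jordan"],
        {"His Airness": "Michael Jordan"},
        {"Michael Jordan (disambiguation)": ["Michael Jordan", "Michael I. Jordan", "Michael B. Jordan"]},
    )
    print(candidates(idx, "michael jordan"))
```

The resulting candidate list is what the prominence and context-similarity scores then rank.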
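| Knowledge base | Developer | Number of entities | Knowledge sources | Description |
| --- | --- | --- | --- | --- |
| DBpedia[35] | Leipzig University and University of Mannheim, Germany | 4.58 million | Wikipedia | Large-scale multilingual knowledge base supporting up to 125 languages; covers people, places, albums, games, organizations, diseases, species and other domains |
| Freebase[36] | Metaweb (later acquired by Google) | 68 million | Wikipedia, IMDB, Flickr | Large-scale open structured database with a three-level structure: Domain, Type, Topic |
| YAGO[37] | Max Planck Institute, Germany | 10 million | Wikipedia, WordNet, GeoNames | Large-scale multilingual semantic knowledge base covering people, organizations, cities and other domains |
| WordNet[39] | Princeton University | 150,000 | Manually built by experts | Hand-edited English lexical database organized by word sense; its noun synsets and hypernym/hyponym relations help disambiguation |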
Examples of General Knowledge Bases
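WordNet's noun hypernym links, which YAGO [37] unifies with Wikipedia, let a system discard candidates whose type conflicts with what the context suggests. A minimal sketch, assuming each candidate already carries one type label and the expected type comes from an upstream named entity recognizer; the taxonomy below is a hypothetical toy, not WordNet's actual synsets.

```python
from collections import deque


def is_a(concept: str, target: str, hypernyms: dict[str, list[str]]) -> bool:
    # Breadth-first walk up the hypernym (is-a) links starting from `concept`.
    queue, seen = deque([concept]), {concept}
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for parent in hypernyms.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def filter_by_type(candidate_types: dict[str, str], expected: str,
                   hypernyms: dict[str, list[str]]) -> list[str]:
    # Keep candidates whose type is (transitively) a kind of the expected type.
    return [e for e, t in candidate_types.items() if is_a(t, expected, hypernyms)]


# Toy taxonomy and candidates (hypothetical).
HYPERNYMS = {
    "basketball_player": ["athlete"], "athlete": ["person"],
    "river": ["body_of_water"], "body_of_water": ["location"],
    "country": ["location"],
}
CANDIDATES = {"Michael Jordan": "basketball_player",
              "Jordan River": "river", "Jordan (country)": "country"}

if __name__ == "__main__":
    print(filter_by_type(CANDIDATES, "location", HYPERNYMS))
```

If the recognizer tags "Jordan" as a location, only the river and the country survive; the remaining ambiguity is left to context similarity or entity relatedness.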
Web Interface of Text Analytics
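| Service | Developer | Open source? | Link target | Notes |
| --- | --- | --- | --- | --- |
| Text Analytics[40] | Microsoft | API provided | Wikipedia | The API offers sentiment analysis, key phrase extraction, named entity recognition and other functions |
| AGDISTIS[41,48] | Leipzig University | Open source | DBpedia | Open-source entity linking framework |
| TAGME[43] | University of Pisa | Open-source API | Wikipedia | Open-source entity linking and annotation tool |
| FEL[44] | Yahoo | Open-source toolkit | Wikipedia | Multilingual, lightweight entity linking toolkit |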
Examples of Entity Linking Service
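The entity-relatedness approach and Wikipedia-based tools such as TAGME [43] rest on a relatedness measure between knowledge-base entities. A common choice is the Milne-Witten measure, computed from the sets of pages that link to each entity. A minimal sketch, assuming in-link sets are already available; the voting in `collective_link` is a simplified illustration of collective disambiguation, not TAGME's exact algorithm, and the page data are hypothetical.

```python
import math


def mw_relatedness(inlinks_a: set[str], inlinks_b: set[str], total_pages: int) -> float:
    # Milne-Witten style relatedness from the sets of pages linking to a and b.
    # Assumes total_pages exceeds the size of either in-link set.
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    distance = (math.log(big) - math.log(common)) / (math.log(total_pages) - math.log(small))
    return max(0.0, 1.0 - distance)


def collective_link(mention_candidates: dict[str, list[str]],
                    inlinks: dict[str, set[str]], total_pages: int) -> dict[str, str]:
    # Score each candidate by its summed relatedness to the candidates of the
    # other mentions in the same text, then keep the best-supported one.
    choice = {}
    for mention, cands in mention_candidates.items():
        others = [c for m, cs in mention_candidates.items() if m != mention for c in cs]

        def support(candidate: str) -> float:
            return sum(mw_relatedness(inlinks[candidate], inlinks[o], total_pages) for o in others)

        choice[mention] = max(cands, key=support)
    return choice


# Hypothetical in-link sets and mentions.
INLINKS = {
    "Michael Jordan": {"p1", "p2", "p3", "p4"},
    "Chicago Bulls": {"p2", "p3", "p4", "p5"},
    "Jordan (country)": {"p6", "p7", "p8"},
    "Bull (animal)": {"p9", "p10"},
}
MENTIONS = {"Jordan": ["Jordan (country)", "Michael Jordan"],
            "Bulls": ["Bull (animal)", "Chicago Bulls"]}

if __name__ == "__main__":
    print(collective_link(MENTIONS, INLINKS, total_pages=1000))
```

Because "Michael Jordan" and "Chicago Bulls" share in-links, each mention supports the other's basketball reading, which is the intuition behind collective disambiguation in the comparison table.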
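| Evaluation campaign | Language | Task requirement | Link target | Notes |
| --- | --- | --- | --- | --- |
| TAC KBP[49,50] | English | Linking and clustering | Knowledge base extracted from Wikipedia | Open training corpora are available on the official website |
| WePS[51,52,53] | English | Clustering | None | No named entity knowledge base is used |
| CLP-2012[55,56] | Chinese | Linking and clustering | Knowledge base | Mentions are linked to definitions in the knowledge base |
| NLP&CC[57] | Chinese | Linking | Encyclopedia knowledge base | Links entities in short texts |
International Evaluation Conferences on Named Entity Disambiguation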
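Clustering-style campaigns such as WePS [51] score a system's grouping of documents against gold person clusters with purity-based measures. A minimal sketch of purity, inverse purity and their F-combination, assuming every document belongs to exactly one system cluster and one gold class (WePS data could contain overlaps, which this sketch ignores); the document IDs are hypothetical.

```python
def _matched_fraction(a: list[set[str]], b: list[set[str]], n: int) -> float:
    # For each group in `a`, take its largest overlap with any group in `b`.
    return sum(max(len(x & y) for y in b) for x in a) / n


def purity_scores(clusters: list[set[str]], gold: list[set[str]],
                  alpha: float = 0.5) -> tuple[float, float, float]:
    n = sum(len(c) for c in clusters)
    purity = _matched_fraction(clusters, gold, n)          # penalizes mixed clusters
    inverse_purity = _matched_fraction(gold, clusters, n)  # penalizes split people
    f = 1.0 / (alpha / purity + (1 - alpha) / inverse_purity)
    return purity, inverse_purity, f


if __name__ == "__main__":
    gold = [{"d1", "d2", "d3"}, {"d4", "d5", "d6"}]
    print(purity_scores([{"d1", "d2"}, {"d3", "d4", "d5", "d6"}], gold))
    print(purity_scores([{"d1", "d2", "d3", "d4", "d5", "d6"}], gold))
```

The all-in-one baseline in the second call reaches perfect inverse purity but only 0.5 purity, which is why the two measures are reported together.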
[1] Zhao Jun. A Survey on Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution[J]. Journal of Chinese Information Processing, 2009,23(2):3-17. (in Chinese)
[2] Gao Yanhong, Li Aiping, Duan Liguo. Entity Disambiguation Method Based on Multi-Feature Fusion Graph Model for Entity Linking[J]. Application Research of Computers, 2017,34(10):2909-2914. (in Chinese)
[3] Shen W, Wang J, Han J. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions[J]. IEEE Transactions on Knowledge and Data Engineering, 2015,27(2):443-460.
doi: 10.1109/TKDE.2014.2327028
[4] Dredze M, McNamee P, Rao D, et al. Entity Disambiguation for Knowledge Base Population[C] // Proceedings of the 23rd International Conference on Computational Linguistics. 2010: 277-285.
[5] Zhu G, Iglesias C A. Exploiting Semantic Similarity for Named Entity Disambiguation in Knowledge Graphs[J]. Expert Systems with Applications, 2018,101:8-24.
doi: 10.1016/j.eswa.2018.02.011
[6] Zuo Naiche. Named Entity Disambiguation Based on Chinese and English Wikipedia Knowledge Base[D]. Beijing: Beijing University of Posts and Telecommunications, 2015. (in Chinese)
[7] Gattani A, Lamba D S, Garera N, et al. Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-based Approach[J]. Proceedings of the VLDB Endowment, 2013,6(11):1126-1137.
doi: 10.14778/2536222.2536237
[8] Wang Jing, Tan Shaofeng, He Dongdong, et al. Entity Disambiguation Algorithm for Domain Document Based on Context Feature[J]. Beijing Biomedical Engineering, 2018,37(4):398-402, 409. (in Chinese)
[9] Guo S, Chang M W, Kiciman E. To Link or Not to Link? A Study on End-to-End Tweet Entity Linking[C] // Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013: 1020-1030.
[10] Xian Yantuan, Yu Zhengtao, Hong Xudong, et al. Collaborative Entity Disambiguation Method Based on Weighted Feature Overlap Relatedness for Chinese[J]. Journal of Chinese Information Processing, 2017,31(2):36-41. (in Chinese)
[11] Elmacioglu E, Tan Y, Yan S, et al. PSNUS: Web People Name Disambiguation by Simple Clustering with Rich Features[C] //Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 268-271.
[12] Hoffart J, Yosef M A, Bordino I, et al. Robust Disambiguation of Named Entities in Text[C] // Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011: 782-792.
[13] Zhang W, Su J, Tan C L, et al. Entity Linking Leveraging Automatically Generated Annotation[C] // Proceedings of the 23rd International Conference on Computational Linguistics. 2010: 1290-1298.
[14] Li Guangyi, Wang Houfeng. Chinese Named Entity Recognition and Disambiguation Based on Multi-Stage Clustering[J]. Journal of Chinese Information Processing, 2013,27(5):29-34, 42. (in Chinese)
[15] Tan Yongmei, Yang Xue. A Named Entity Disambiguation Algorithm Combining Entity Linking and Entity Clustering[J]. Journal of Beijing University of Posts and Telecommunications, 2014,37(5):36-40. (in Chinese)
[16] Huai Baoxing, Bao Tengfei, Zhu Hengshu, et al. Topic Modeling Approach to Named Entity Linking[J]. Journal of Software, 2014,25(9):2076-2087. (in Chinese)
[17] Han X, Sun L. A Generative Entity-Mention Model for Linking Entities with Knowledge Base[C] // Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011: 945-954.
[18] Meij E, Bron M, Hollink L, et al. Mapping Queries to the Linking Open Data Cloud: A Case Study Using DBpedia[J]. Journal of Web Semantics, 2011,9(4):418-433.
doi: 10.1016/j.websem.2011.04.001
[19] Sun Y, Ji Z, Lin L, et al. Entity Disambiguation with Decomposable Neural Networks[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2017,7(5):e1215.
doi: 10.1002/widm.2017.7.issue-5
[20] Yang Guang, Liu Bingquan, Liu Ming. Graph-based Method for Named Entity Disambiguation[J]. Intelligent Computer and Applications, 2015,5(5):52-55. (in Chinese)
[21] Cucerzan S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data[C] // Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic. 2007: 708-716.
[22] Alhelbawy A, Gaizauskas R. Named Entity Disambiguation Using HMMs[C] // Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies. IEEE Computer Society, 2013: 159-162.
[23] Han X, Sun L, Zhao J. Collective Entity Linking in Web Text: A Graph-Based Method[C] // Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011: 765-774.
[24] Phan M C, Sun A, Tay Y, et al. Pair-Linking for Collective Entity Disambiguation: Two Could be Better than All[J]. IEEE Transactions on Knowledge and Data Engineering, 2018,31(7):1383-1396.
doi: 10.1109/TKDE.69
[25] Niu L, Wu J, Shi Y. Entity Disambiguation with Textual and Connection Information[J]. Procedia Computer Science, 2012,9:1249-1255.
doi: 10.1016/j.procs.2012.04.136
[26] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[C] //Proceedings of the 1st International Conference on Learning Representations. 2013.
[27] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation (Code and Pre-trained Data)[EB/OL]. [2019-12-21]. https://nlp.stanford.edu/projects/glove/.
[28] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C] //Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014.
[29] Zuheros C, Tabik S, Valdivia A, et al. Deep Recurrent Neural Network for Geographical Entities Disambiguation on Social Media Data[J]. Knowledge-Based Systems, 2019,173:117-127.
doi: 10.1016/j.knosys.2019.02.030
[30] He Z, Liu S, Li M, et al. Learning Entity Representation for Entity Disambiguation[C] //Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013.
[31] Francis-Landau M, Durrett G, Klein D. Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks[C] //Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
[32] Wang Yanyan, Wang Peiyan, Cai Dongfeng, et al. An Entity Disambiguation Method for Patent Entity[J]. Journal of Shenyang Aerospace University, 2015,32(1):77-83. (in Chinese)
[33] Lerchenmueller M J, Sorenson O. Author Disambiguation in PubMed: Evidence on the Precision and Recall of Author-ity Among NIH-Funded Scientists[J]. PLoS ONE, 2016,11(7):e0158731.
doi: 10.1371/journal.pone.0158731 pmid: 27367860
[34] Haak L L, Fenner M, Paglione L, et al. ORCID: A System to Uniquely Identify Researchers[J]. Learned Publishing, 2012,25(4):259-264.
doi: 10.1087/20120404
[35] Auer S, Bizer C, Kobilarov G, et al. DBpedia: A Nucleus for a Web of Open Data[A]//Aberer K, Choi K, Noy N, et al. The Semantic Web[M]. Springer Berlin Heidelberg, 2007: 722-735.
[36] Bollacker K, Evans C, Paritosh P, et al. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge[C] //Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008: 1247-1250.
[37] Suchanek F M, Kasneci G, Weikum G. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia[C] //Proceedings of the 16th International World Wide Web Conference. 2007: 697-706.
[38] Huang Hengqi, Yu Juan, Liao Xiao, et al. Review on Knowledge Graphs[J]. Computer Systems & Applications, 2019,28(6):1-12. (in Chinese)
[39] Miller G A. WordNet: A Lexical Database for English[J]. Communications of the ACM, 1995,38(11):39-41.
[40] Microsoft Azure. Text Analytics: Detect Sentiment, Key Phrases, Named Entities and Language from Your Text[EB/OL]. [2019-12-21]. https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/.
[41] Usbeck R, Ngomo A C N, Auer S, et al. AGDISTIS-Agnostic Disambiguation of Named Entities Using Linked Open Data[C] // Proceedings of the 12th International Semantic Web Conference, Sydney, Australia. 2013.
[42] AGDISTIS. Agnostic Disambiguation of Named Entities Using Linked Open Data[EB/OL]. [2019-12-21]. http://aksw.org/Projects/AGDISTIS.html.
[43] Ferragina P, Scaiella U. TAGME[EB/OL]. [2019-12-21]. https://tagme.d4science.org/tagme/.
[44] Blanco R, Pappu A. FEL GitHub[DB/OL]. [2019-12-21]. https://github.com/yahoo/FEL.
[45] Blanco R, Ottaviano G, Meij E. Fast and Space-Efficient Entity Linking in Queries[C] // Proceedings of the 8th ACM International Conference on Web Search and Data Mining. 2015: 179-188.
[46] Dexter. Dexter, an Open Source Framework for Entity Linking[EB/OL]. [2019-12-21]. http://dexter.isti.cnr.it/.
[47] Ceccarelli D, Lucchese C, Orlando S, et al. Dexter: An Open Source Framework for Entity Linking[C] // Proceedings of the 6th International Workshop on Exploiting Semantic Annotations in Information Retrieval. 2013.
[48] AGDISTIS. AGDISTIS-Agnostic Named Entity Disambiguation[DB/OL]. [2019-12-21]. https://github.com/dice-group/AGDISTIS.
[49] Ji H, Grishman R, Dang H T, et al. Overview of the TAC 2010 Knowledge Base Population Track[C] //Proceedings of the 3rd Text Analysis Conference. 2010.
[50] Ji H, Grishman R, Dang H T. Overview of the TAC 2011 Knowledge Base Population Track[C] //Proceedings of the 4th Text Analysis Conference. 2011.
[51] Artiles J, Gonzalo J, Sekine S. The SemEval-2007 WePS Evaluation: Establishing a Benchmark for the Web People Search Task[C] //Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 64-69.
[52] Artiles J, Gonzalo J, Sekine S. WePS 2 Evaluation Campaign: Overview of the Web People Search Clustering Task[C] //Proceedings of the 2nd Web People Search Evaluation Workshop. 2009.
[53] Artiles J, Borthwick A, Gonzalo J, et al. WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks[C] //Proceedings of the 2010 CLEF LABs & Workshops. 2010.
[54] TAC. Past TAC Data[DB/OL]. [2019-12-21]. https://tac.nist.gov/data/index.html.
[55] He Z, Wang H, Li S. The Task 2 of CIPS-SIGHAN 2012 Named Entity Recognition and Disambiguation in Chinese Bakeoff[C] //Proceedings of the 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing. 2012: 108-144.
[56] CIPS-SIGHAN2012. The Second CIPS-SIGHAN Joint Conference on Chinese Language Processing[EB/OL]. [2019-12-21]. http://www.cipsc.org.cn/clp2012/bakeoff-cn.html.
[57] NLP&CC. The 2nd Conference on Natural Language Processing and Chinese Computing (NLP&CC 2013): Evaluation Test Data Download[DB/OL]. [2019-12-21]. http://tcci.ccf.org.cn/conference/2013/pages/page04_tdata.html. (in Chinese)
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn