Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 9: 15-25. https://doi.org/10.11925/infotech.2096-3467.2020.0382
Section: Review
Developments of Named Entity Disambiguation*
Wen Pingmei¹, Ye Zhiwei¹, Ding Wenjian¹, Liu Ying² (corresponding author), Xu Jian¹
¹ School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
² Sun Yat-Sen University Library, Guangzhou 510275, China
Abstract

[Objective] This paper reviews recent research and resources in the field of named entity disambiguation (NED), with a focus on NED methods. [Coverage] We searched CNKI, the Wanfang Data Knowledge Service Platform, and EBSCO for literature on NED and selected 57 representative papers and electronic resources. [Methods] We summarized NED methods and ideas from five perspectives: entity prominence, context similarity, entity relatedness, deep learning, and special identifier resources. We also surveyed the available auxiliary knowledge bases, open-source tools, and international evaluation campaigns. [Results] Traditional methods are classic and easy to use, while newer methods such as deep learning have clearly improved disambiguation performance. Effective models often integrate several types of methods to achieve the best results. [Limitations] The comparison of methods, which is based on the existing literature, is to some extent subjective. [Conclusions] NED methods are still developing; artificial intelligence methods and domain resources could further improve disambiguation performance.

Keywords: Named Entity Disambiguation; Knowledge Base; Entity Linking; Clustering
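Received: 2020-05-04      Published: 2020-10-14
CLC number: TP393
Funding: *This work is one of the research outcomes of the Natural Science Foundation of Guangdong Province project "Research on a Quantitative Model of Sentiment Divergence and Its Applications" (2018A030313981).
Corresponding author: Liu Ying     E-mail: pusly@mail.sysu.edu.cn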
Cite this article:
Wen Pingmei, Ye Zhiwei, Ding Wenjian, Liu Ying, Xu Jian. Developments of Named Entity Disambiguation[J]. Data Analysis and Knowledge Discovery, 2020, 4(9): 15-25.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0382      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I9/15
Fig.1  Named entity disambiguation results based on a knowledge base and text similarity [19]
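To make the idea of disambiguation by text similarity against a knowledge base concrete, the following is a minimal sketch, not the method of [19]. It assumes a hypothetical two-entry knowledge base (`TOY_KB`), a simple bag-of-words representation, and cosine similarity; all names and descriptions are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: candidate entity -> short description.
# These entries are illustrative assumptions, not data from the paper.
TOY_KB = {
    "Apple Inc.": "American technology company that designs the iPhone and Mac computers",
    "Apple (fruit)": "edible fruit produced by the apple tree, eaten fresh or pressed into juice",
}


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(mention_context, candidates):
    """Pick the candidate whose description is most similar to the mention context.

    Returns the best candidate name and the similarity score of every candidate.
    """
    context_vec = bag_of_words(mention_context)
    scores = {
        name: cosine(context_vec, bag_of_words(description))
        for name, description in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `disambiguate("Apple released a new iPhone and updated its Mac computers", TOY_KB)` selects "Apple Inc.", because that description shares more tokens with the context. A practical system would likely replace raw token counts with TF-IDF weights or embeddings.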
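| Disambiguation feature | Feature source | Method | Approach |
| --- | --- | --- | --- |
| Entity prominence | Strings, popularity, commonness | Rule matching, prior probability | Compute the prominence of each entity and take the most prominent candidate in the candidate list as the disambiguation result for the ambiguous mention |
| Context similarity | Whole text, selected nouns, entities | Classification, clustering, topic-based methods, probabilistic language models | Compare the context of the entity mention with the context of each candidate entity and take the most similar candidate as the result; the comparison may optionally draw on an external knowledge base |
| Entity relatedness | Text descriptions and text category information; entity annotations, entity co-occurrence, and entity distribution; entity relation graphs | Clustering, graph-based relational inference | Entities that co-occur in the same text often belong to the same topic or are otherwise related; use the semantic links among entities in the same text for collective disambiguation |
| Deep learning | Word vectors | Neural network models | Represent the mentions, text, and entities of the disambiguation task as distributed vectors that carry semantic features, feed these vectors into a deep learning model, and disambiguate by vector similarity |
| Special identifier resources | IPC, ORCID | Special identifiers, unique identifiers | Use identifier resources shared within a domain to reduce the difficulty of disambiguation |

Table 1  Comparison of methods that use different disambiguation features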
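The entity prominence row can be illustrated with a short sketch of a prior-probability ("commonness") baseline: estimate P(entity | mention) from how often a surface string links to each entity, then pick the most frequent one. The link pairs in `TOY_LINKS` and the function names are hypothetical and stand in for anchor-text statistics that a real system might harvest from a source such as Wikipedia.

```python
from collections import Counter, defaultdict

# Hypothetical (mention, entity) pairs; illustrative assumptions only.
TOY_LINKS = [
    ("Jordan", "Michael Jordan"),
    ("Jordan", "Michael Jordan"),
    ("Jordan", "Jordan (country)"),
    ("Jordan", "Michael Jordan"),
    ("Jordan", "Jordan (country)"),
    ("Paris", "Paris (France)"),
]


def build_prior(links):
    """Estimate P(entity | mention) from link counts."""
    counts = defaultdict(Counter)
    for mention, entity in links:
        counts[mention][entity] += 1
    prior = {}
    for mention, entity_counts in counts.items():
        total = sum(entity_counts.values())
        prior[mention] = {e: c / total for e, c in entity_counts.items()}
    return prior


def most_prominent(mention, prior):
    """Return the candidate with the highest prior, or None for an unseen mention."""
    candidates = prior.get(mention)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

With these toy counts, `most_prominent("Jordan", build_prior(TOY_LINKS))` returns "Michael Jordan" regardless of context. This shows why prominence alone is usually combined with context-based features.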
Fig.2  Entry page
Fig.3  Disambiguation page
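| Knowledge base | Developer | Number of entities | Knowledge sources | Description |
| --- | --- | --- | --- | --- |
| DBpedia[35] | Leipzig University and University of Mannheim, Germany | 4.58 million | Wikipedia | Large-scale cross-lingual knowledge base supporting up to 125 languages; covers people, places, music albums, games, organizations, diseases, species, and other domains |
| Freebase[36] | MetaWeb, later acquired by Google | 68 million | Wikipedia, IMDB, Flickr | Large-scale open structured database with a three-level structure: Domain, Type, Topic |
| YAGO[37] | Max Planck Institute, Germany | 10 million | Wikipedia, WordNet, GeoNames | Large-scale cross-lingual semantic knowledge base covering people, organizations, cities, and other domains |
| WordNet[39] | Princeton University | 150,000 | Built manually by experts | Manually edited English dictionary organized by word sense; its noun synonym sets and hypernym/hyponym relations support disambiguation |

Table 2  Selected general-purpose knowledge bases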
Fig.4  Text Analytics web interface
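| Service | Developer | Open source? | Link source | Notes |
| --- | --- | --- | --- | --- |
| Text Analytics[40] | Microsoft | API provided | Wikipedia | The API offers sentiment analysis, key phrase extraction, named entity recognition, and other functions |
| AGDISTIS[41,48] | Leipzig University | Open source | DBpedia | Open-source entity linking framework |
| TAGME[43] | University of Pisa | Open-source API | Wikipedia | Open-source entity linking and annotation tool |
| FEL[44] | Yahoo | Open-source toolkit | Wikipedia | Lightweight multilingual entity linking toolkit |

Table 3  Selected entity linking services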
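| Evaluation campaign | Language | Disambiguation requirement | Link source | Notes |
| --- | --- | --- | --- | --- |
| TAC KBP[49,50] | English | Linking and clustering | Knowledge base extracted from Wikipedia | The official site provides open training corpora |
| WePS[51,52,53] | English | Clustering | No named entity knowledge base | |
| CLP-2012[55,56] | Chinese | Linking and clustering | Knowledge base | Link to the definitions in the knowledge base |
| NLP&CC[57] | Chinese | Linking | Encyclopedia knowledge base | Link entities in short texts |

Table 4  International evaluation campaigns for named entity disambiguation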
[1] Zhao Jun. A Survey on Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution[J]. Journal of Chinese Information Processing, 2009,23(2):3-17. (in Chinese)
[2] Gao Yanhong, Li Aiping, Duan Liguo. Entity Disambiguation Method Based on Multi-Feature Fusion Graph Model for Entity Linking[J]. Application Research of Computers, 2017,34(10):2909-2914. (in Chinese)
[3] Shen W, Wang J, Han J. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions[J]. IEEE Transactions on Knowledge and Data Engineering, 2015,27(2):443-460.
doi: 10.1109/TKDE.2014.2327028
[4] Dredze M, McNamee P, Rao D, et al. Entity Disambiguation for Knowledge Base Population[C] // Proceedings of the 23rd International Conference on Computational Linguistics. 2010: 277-285.
[5] Zhu G, Iglesias C A. Exploiting Semantic Similarity for Named Entity Disambiguation in Knowledge Graphs[J]. Expert Systems with Applications, 2018,101:8-24.
doi: 10.1016/j.eswa.2018.02.011
[6] Zuo Naiche. Named Entity Disambiguation Based on Chinese and English Wikipedia Knowledge Base[D]. Beijing: Beijing University of Posts and Telecommunications, 2015. (in Chinese)
[7] Gattani A, Lamba D S, Garera N, et al. Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-based Approach[J]. Proceedings of the VLDB Endowment, 2013,6(11):1126-1137.
doi: 10.14778/2536222.2536237
[8] Wang Jing, Tan Shaofeng, He Dongdong, et al. Entity Disambiguation Algorithm for Domain Document Based on Context Feature[J]. Beijing Biomedical Engineering, 2018,37(4):398-402, 409. (in Chinese)
[9] Guo S, Chang M W, Kiciman E. To Link or Not to Link? A Study on End-to-End Tweet Entity Linking[C] // Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013: 1020-1030.
[10] Xian Yantuan, Yu Zhengtao, Hong Xudong, et al. Collaborative Entity Disambiguation Method Based on Weighted Feature Overlap Relatedness for Chinese[J]. Journal of Chinese Information Processing, 2017,31(2):36-41. (in Chinese)
[11] Elmacioglu E, Tan Y, Yan S, et al. PSNUS: Web People Name Disambiguation by Simple Clustering with Rich Features[C] //Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 268-271.
[12] Hoffart J, Yosef M A, Bordino I, et al. Robust Disambiguation of Named Entities in Text[C] // Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011: 782-792.
[13] Zhang W, Su J, Tan C L, et al. Entity Linking Leveraging Automatically Generated Annotation[C] // Proceedings of the 23rd International Conference on Computational Linguistics. 2010: 1290-1298.
[14] Li Guangyi, Wang Houfeng. Chinese Named Entity Recognition and Disambiguation Based on Multi-Stage Clustering[J]. Journal of Chinese Information Processing, 2013,27(5):29-34, 42. (in Chinese)
[15] Tan Yongmei, Yang Xue. An Named Entity Disambiguation Algorithm Combining Entity Linking and Entity Clustering[J]. Journal of Beijing University of Posts and Telecommunications, 2014,37(5):36-40. (in Chinese)
[16] Huai Baoxing, Bao Tengfei, Zhu Hengshu, et al. Topic Modeling Approach to Named Entity Linking[J]. Journal of Software, 2014,25(9):2076-2087. (in Chinese)
[17] Han X, Sun L. A Generative Entity-Mention Model for Linking Entities with Knowledge Base[C] // Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011: 945-954.
[18] Meij E, Bron M, Hollink L, et al. Mapping Queries to the Linking Open Data Cloud: A Case Study Using DBpedia[J]. Journal of Web Semantics, 2011,9(4):418-433.
doi: 10.1016/j.websem.2011.04.001
[19] Sun Y, Ji Z, Lin L, et al. Entity Disambiguation with Decomposable Neural Networks[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2017,7(5):e1215.
doi: 10.1002/widm.2017.7.issue-5
[20] Yang Guang, Liu Bingquan, Liu Ming. Graph-based Method for Named Entity Disambiguation[J]. Intelligent Computer and Applications, 2015,5(5):52-55. (in Chinese)
[21] Cucerzan S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data[C] // Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic. DBLP, 2007: 708-716.
[22] Alhelbawy A, Gaizauskas R. Named Entity Disambiguation Using HMMs[C] // Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies. IEEE Computer Society, 2013: 159-162.
[23] Han X, Sun L, Zhao J. Collective Entity Linking in Web Text: A Graph-Based Method[C] // Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011: 765-774.
[24] Phan M C, Sun A, Tay Y, et al. Pair-Linking for Collective Entity Disambiguation: Two Could be Better than All[J]. IEEE Transactions on Knowledge and Data Engineering, 2018,31(7):1383-1396.
doi: 10.1109/TKDE.69
[25] Niu L, Wu J, Shi Y. Entity Disambiguation with Textual and Connection Information[J]. Procedia Computer Science, 2012,9:1249-1255.
doi: 10.1016/j.procs.2012.04.136
[26] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[C] //Proceedings of the 1st International Conference on Learning Representations. 2013.
[27] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation (Code and Pre-trained Data)[EB/OL] [2019-12-21]. https://nlp.stanford.edu/projects/glove/.
[28] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C] //Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014.
[29] Zuheros C, Tabik S, Valdivia A, et al. Deep Recurrent Neural Network for Geographical Entities Disambiguation on Social Media Data[J]. Knowledge-Based Systems, 2019,173:117-127.
doi: 10.1016/j.knosys.2019.02.030
[30] He Z, Liu S, Li M, et al. Learning Entity Representation for Entity Disambiguation[C] //Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013.
[31] Francis-Landau M, Durrett G, Klein D. Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks[C] //Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
[32] Wang Yanyan, Wang Peiyan, Cai Dongfeng, et al. An Entity Disambiguation Method for Patent Entity[J]. Journal of Shenyang Aerospace University, 2015,32(1):77-83. (in Chinese)
[33] Lerchenmueller M J, Sorenson O. Author Disambiguation in PubMed: Evidence on the Precision and Recall of Author-ity Among NIH-Funded Scientists[J]. PLoS ONE, 2016,11(7):e0158731.
doi: 10.1371/journal.pone.0158731 pmid: 27367860
[34] Haak L L, Fenner M, Paglione L, et al. ORCID: A System to Uniquely Identify Researchers[J]. Learned Publishing, 2012,25(4):259-264.
doi: 10.1087/20120404
[35] Auer S, Bizer C, Kobilarov G, et al. DBpedia: A Nucleus for a Web of Open Data[A]//Aberer K, Choi K, Noy N, et al. The Semantic Web[M]. Springer Berlin Heidelberg, 2007: 722-735.
[36] Bollacker K, Evans C, Paritosh P, et al. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge[C] //Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008: 1247-1250.
[37] Suchanek F M, Kasneci G, Weikum G. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia[C] //Proceedings of the 16th International World Wide Web Conference. 2007: 697-706.
[38] Huang Hengqi, Yu Juan, Liao Xiao, et al. Review on Knowledge Graphs[J]. Computer Systems & Applications, 2019,28(6):1-12. (in Chinese)
[39] Miller G A. WordNet: A Lexical Database for English[J]. Communications of the ACM, 1995,38(11):39-41.
[40] Microsoft Azure. Text Analytics: Detect Sentiment, Key Phrases, Named Entities and Language from Your Text[EB/OL]. [2019-12-21]. https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/.
[41] Usbeck R, Ngomo A C N, Auer S, et al. AGDISTIS-Agnostic Disambiguation of Named Entities Using Linked Open Data[C] // Proceedings of the 12th International Semantic Web Conference, Sydney, Australia. 2013.
[42] AGDISTIS. Agnostic Disambiguation of Named Entities Using Linked Open Data[EB/OL]. [2019-12-21]. http://aksw.org/Projects/AGDISTIS.html.
[43] Ferragina P, Scaiella U. TAGME[EB/OL]. [2019-12-21]. https://tagme.d4science.org/tagme/.
[44] Blanco R, Pappu A. FEL GitHub[DB/OL]. [2019-12-21]. https://github.com/yahoo/FEL.
[45] Blanco R, Ottaviano G, Meij E. Fast and Space-Efficient Entity Linking in Queries[C] // Proceedings of the 8th ACM International Conference on Web Search and Data Mining. 2015: 179-188.
[46] Dexter. Dexter, an Open Source Framework for Entity Linking[EB/OL]. [2019-12-21]. http://dexter.isti.cnr.it/.
[47] Ceccarelli D, Lucchese C, Orlando S, et al. Dexter: An Open Source Framework for Entity Linking[C] // Proceedings of the 6th International Workshop on Exploiting Semantic Annotations in Information Retrieval. 2013.
[48] AGDISTIS. AGDISTIS-Agnostic Named Entity Disambiguation[DB/OL]. [2019-12-21]. https://github.com/dice-group/AGDISTIS.
[49] Ji H, Grishman R, Dang H T, et al. Overview of the TAC 2010 Knowledge Base Population Track[C] //Proceedings of the 3rd Text Analysis Conference. 2010.
[50] Ji H, Grishman R, Dang H T. Overview of the TAC 2011 Knowledge Base Population Track[C] //Proceedings of the 4th Text Analysis Conference. 2011.
[51] Artiles J, Gonzalo J, Sekine S. The SemEval-2007 WePS Evaluation: Establishing a Benchmark for the Web People Search Task[C] //Proceedings of the 4th International Workshop on Semantic Evaluations. 2007: 64-69.
[52] Artiles J, Gonzalo J, Sekine S. WePS 2 Evaluation Campaign: Overview of the Web People Search Clustering Task[C] //Proceedings of the 2nd Web People Search Evaluation Workshop. 2009.
[53] Artiles J, Borthwick A, Gonzalo J, et al. WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks[C] //Proceedings of the 2010 CLEF LABs & Workshops. DBLP, 2010.
[54] TAC. Past TAC Data[DB/OL].[2019-12-21]. https://tac.nist.gov/data/index.html.
[55] He Z, Wang H, Li S. The Task 2 of CIPS-SIGHAN 2012 Named Entity Recognition and Disambiguation in Chinese Bakeoff[C] //Proceedings of the 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing. 2012: 108-144.
[56] CIPS-SIGHAN2012. The Second CIPS-SIGHAN Joint Conference on Chinese Language Processing[EB/OL]. [2019-12-21]. http://www.cipsc.org.cn/clp2012/bakeoff-cn.html.
[57] NLP&CC. The 2nd Conference on Natural Language Processing and Chinese Computing (NLP&CC 2013) Evaluation Test Data Download[DB/OL]. [2019-12-21]. http://tcci.ccf.org.cn/conference/2013/pages/page04_tdata.html. (in Chinese)