New Technology of Library and Information Service  2016, Vol. 32 Issue (6): 1-11    DOI: 10.11925/infotech.1003-3513.2016.06.01
Research Paper
Entity Linking Method for Short Texts with Multi-Knowledge Bases: Case Study of Wikipedia and Freebase*
Zhou Pengcheng1, Wu Chuan1, Lu Wei1,2
1School of Information Management, Wuhan University, Wuhan 430072, China
2Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract

[Objective] This paper proposes an entity linking method that draws on multiple knowledge bases, addressing the low coverage of entity linking based on a single knowledge base. [Methods] First, we generate the n-grams of the input text and obtain candidate mentions using part-of-speech information and multiple mention-entity dictionaries. Second, we generate mention combinations and retain those with the highest coverage that are not contained in any other combination. Third, we generate candidate entity sequences and calculate their relevance using information from multiple knowledge bases. Finally, we select the entity sequence with the highest relevance as the result. [Results] Experiments with Wikipedia and Freebase show that the precision, recall, and F-value of entity linking based on Wikipedia+Freebase reach 71.81%, 76.86%, and 74.25%, respectively. [Limitations] Filtering n-grams by part of speech lacks a theoretical foundation, and the FACC1 dataset is characterized by high precision but low recall. [Conclusions] Using entity information from multiple knowledge bases can improve entity linking performance.
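
The four steps in the Methods summary can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the part-of-speech rule or the relevance formula, so the sketch assumes a "contains at least one noun tag" filter, merges candidate priors across dictionaries by taking the maximum, and scores a sequence by its mean prior. The dictionaries, tags, entity identifiers, and all function names are hypothetical toy values, and the two dictionaries are assumed to share entity identifiers.

```python
from itertools import combinations, product

def ngrams(tokens, max_n=3):
    """Yield (start, end, surface) for every n-gram with n <= max_n."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            yield start, end, " ".join(tokens[start:end]).lower()

def candidate_mentions(tagged, dictionaries, max_n=3):
    """Step 1: keep n-grams that contain a noun and occur in some dictionary."""
    tokens = [tok for tok, _ in tagged]
    tags = [tag for _, tag in tagged]
    mentions = []
    for start, end, surface in ngrams(tokens, max_n):
        has_noun = any(t.startswith("N") for t in tags[start:end])  # assumed POS rule
        if has_noun and any(surface in d for d in dictionaries):
            mentions.append((start, end, surface))
    return mentions

def overlaps(a, b):
    """True if two (start, end, surface) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

def best_combinations(mentions):
    """Step 2: non-overlapping mention sets that are not contained in any
    other valid set and that cover the largest number of tokens."""
    valid = []
    for size in range(1, len(mentions) + 1):
        for combo in combinations(mentions, size):
            if not any(overlaps(a, b) for a, b in combinations(combo, 2)):
                valid.append(frozenset(combo))
    maximal = [c for c in valid if not any(c < other for other in valid)]
    if not maximal:
        return []
    def coverage(c):
        return sum(end - start for start, end, _ in c)
    top = max(coverage(c) for c in maximal)
    return [sorted(c) for c in maximal if coverage(c) == top]

def link(tagged, dictionaries, max_n=3):
    """Steps 3-4: enumerate entity sequences and return the best-scoring one."""
    best, best_score = [], float("-inf")
    mentions = candidate_mentions(tagged, dictionaries, max_n)
    for combo in best_combinations(mentions):
        options = []
        for _, _, surface in combo:
            merged = {}  # entity -> highest prior seen in any dictionary
            for d in dictionaries:
                for entity, prior in d.get(surface, {}).items():
                    merged[entity] = max(merged.get(entity, 0.0), prior)
            options.append(sorted(merged.items()))
        for sequence in product(*options):
            # Toy stand-in for the paper's relevance degree: mean prior.
            score = sum(prior for _, prior in sequence) / len(sequence)
            if score > best_score:
                best_score = score
                best = [(m[2], entity) for m, (entity, _) in zip(combo, sequence)]
    return best

# Hypothetical mention-entity dictionaries with made-up prior scores.
wiki = {"new york": {"New_York_City": 0.7, "New_York_(state)": 0.3},
        "times": {"The_Times": 0.5}}
freebase = {"new york": {"New_York_City": 0.6},
            "new york times": {"The_New_York_Times": 0.9}}
tagged = [("new", "JJ"), ("york", "NNP"), ("times", "NNPS"), ("subscription", "NN")]

single_kb = link(tagged, [wiki])
multi_kb = link(tagged, [wiki, freebase])
print(single_kb)
print(multi_kb)
```

In this toy run, the mention "new york times" is found only in the second dictionary, so adding it changes which mention combinations are available; that is the coverage effect the abstract attributes to using several knowledge bases. Exhaustive enumeration of combinations is only practical for short texts, which matches the short-text setting of the paper.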

Key words: Entity linking; Knowledge base; Wikipedia; Freebase
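Received: 2016-01-13
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Modeling and Framework Implementation of Generic Entity Retrieval Based on Language Models" (Grant No. 71173164) and of the cooperative project between Wuhan University and the Institute of Scientific and Technical Information of China, "Semantic Function Recognition and Deep Utilization of Scientific Literature".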
Cite this article:
Zhou Pengcheng, Wu Chuan, Lu Wei. Entity Linking Method for Short Texts with Multi-Knowledge Bases: Case Study of Wikipedia and Freebase[J]. New Technology of Library and Information Service, 2016, 32(6): 1-11. DOI: 10.11925/infotech.1003-3513.2016.06.01.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.06.01