New Technology of Library and Information Service  2016, Vol. 32 Issue (6): 1-11    DOI: 10.11925/infotech.1003-3513.2016.06.01
Original Article
Entity Linking Method for Short Texts with Multi-Knowledge Bases: Case Study of Wikipedia and Freebase
Zhou Pengcheng1, Wu Chuan1, Lu Wei1,2
1School of Information Management, Wuhan University, Wuhan 430072, China
2Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper proposes an entity linking method that draws on multiple knowledge bases, to address the low coverage that results from linking against a single knowledge base. [Methods] First, we generated n-grams from the input text and obtained candidate mentions using part-of-speech filtering and a multi-mention-entity dictionary. Second, we generated mention combinations and retained those with the highest coverage that are not contained in any other mention combination. Third, we generated entity sequences and calculated their relevance degree using information from the multiple knowledge bases, taking the entity sequence with the highest relevance degree as the final result. [Results] In the case study, entity linking based on Wikipedia+Freebase reached a Precision, Recall, and F-value of 71.81%, 76.86%, and 74.25%, respectively. [Limitations] Filtering n-grams by part of speech lacks a theoretical foundation, and the FACC1 dataset has high precision but low recall. [Conclusions] Utilizing entity information from multiple knowledge bases can improve the performance of entity linking.
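
The first two steps of the method can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation: the function names, the toy dictionary, the `max_n` limit, and the brute-force enumeration are all assumptions made for illustration. Part-of-speech filtering is omitted, and the relevance-degree calculation of the third step is not sketched because the abstract does not give its formula. "Not contained by other mention combinations" is interpreted here as "no further non-overlapping mention can be added", and coverage as the number of tokens covered.

```python
from itertools import combinations


def ngrams(tokens, max_n=3):
    """Return (start, end, text) for every n-gram of up to max_n tokens."""
    spans = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            spans.append((i, i + n, " ".join(tokens[i:i + n])))
    return spans


def candidate_mentions(tokens, dictionary, max_n=3):
    """Keep the n-grams that appear in a (hypothetical) mention-entity dictionary."""
    return [s for s in ngrams(tokens, max_n) if s[2].lower() in dictionary]


def overlaps(a, b):
    """True if two token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def maximal_combinations(mentions):
    """Non-overlapping mention sets that no remaining mention could extend."""
    combos = []
    for r in range(1, len(mentions) + 1):
        for combo in combinations(mentions, r):
            if any(overlaps(a, b) for a, b in combinations(combo, 2)):
                continue  # mentions inside one combination must not overlap
            rest = [m for m in mentions if m not in combo]
            if all(any(overlaps(m, c) for c in combo) for m in rest):
                combos.append(combo)
    return combos


def best_coverage(combos):
    """Retain the combinations that cover the most tokens."""
    def coverage(combo):
        return sum(end - start for start, end, _ in combo)
    top = max(coverage(c) for c in combos)
    return [c for c in combos if coverage(c) == top]


if __name__ == "__main__":
    tokens = "new york times square".split()
    # Toy dictionary of known mentions (an assumption for this example).
    dictionary = {"new york", "new york times", "times square"}
    mentions = candidate_mentions(tokens, dictionary)
    for combo in best_coverage(maximal_combinations(mentions)):
        print([text for _, _, text in combo])  # ['new york', 'times square']
```

In this toy input, "new york times" alone covers three tokens, while "new york" plus "times square" covers all four, so only the latter combination is passed on to entity-sequence generation. The exhaustive enumeration is exponential in the number of mentions, which is tolerable only because the method targets short texts.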

Key words: Entity linking; Knowledge base; Wikipedia; Freebase
Received: 13 January 2016      Published: 18 July 2016

Cite this article:

Zhou Pengcheng,Wu Chuan,Lu Wei. Entity Linking Method for Short Texts with Multi-Knowledge Bases: Case Study of Wikipedia and Freebase. New Technology of Library and Information Service, 2016, 32(6): 1-11.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.06.01     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I6/1

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn