A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia
Ren Haiying, Yu Liting
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract [Objective] This paper proposes a multi-strategy method for Word Sense Disambiguation (WSD) based on Wikipedia that makes full use of the latent knowledge in Wikipedia. [Methods] We design three indicators (category commonness, content relatedness, and word sense importance), fuse them with an entropy-based dynamic linear combination, and then apply re-disambiguation to choose the best sense of an ambiguous term in its context. [Results] Experiments show an average precision of 74.82%, which supports the feasibility and effectiveness of the method. [Limitations] The method mainly targets English WSD with fine-grained candidate senses, and its generality to other languages is limited. [Conclusions] Drawing on Wikipedia gives the method more semantic knowledge and background information, which improves the precision of disambiguation.
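The abstract does not give the fusion formula, so the following is only a minimal sketch of what an entropy-based linear fusion of three indicator scores could look like, using the standard entropy weight method as a stand-in. The function names (`entropy_weights`, `fuse`, `best_sense`), the score layout, and the reading of "dynamic" as "weights recomputed for each ambiguous term" are all assumptions made for illustration and are not taken from the paper. The re-disambiguation step is not covered.

```python
import math

def entropy_weights(scores):
    """Entropy weight method (assumed stand-in, not the paper's formula).

    scores: one row per candidate sense, one non-negative value per
    indicator (e.g. category commonness, content relatedness, importance).
    Returns one weight per indicator; the weights sum to 1.
    """
    n = len(scores)
    m = len(scores[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        if total == 0 or n < 2:
            # An indicator that carries no signal gets zero divergence.
            divergences.append(0.0)
            continue
        probs = [v / total for v in column]
        # Shannon entropy normalised to [0, 1] by log(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Low entropy means the indicator separates the senses well.
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence <= 1e-12:
        # No indicator discriminates: fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_divergence for d in divergences]

def fuse(scores):
    """Linear fusion: weighted sum of indicator values for each sense."""
    weights = entropy_weights(scores)
    return [sum(w * v for w, v in zip(weights, row)) for row in scores]

def best_sense(candidates, scores):
    """Return the candidate sense with the highest fused score."""
    fused = fuse(scores)
    return max(zip(candidates, fused), key=lambda pair: pair[1])[0]
```

Under this reading, the weights are recomputed from the candidate senses of each ambiguous term, so an indicator that scores all candidates alike contributes little to the final choice for that term.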
Received: 21 April 2015
Published: 06 April 2016