New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 18-25    DOI: 10.11925/infotech.1003-3513.2015.11.04
A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia
Ren Haiying, Yu Liting
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract  

[Objective] This paper proposes a multi-strategy method for Word Sense Disambiguation (WSD) that makes full use of the latent knowledge in Wikipedia. [Methods] We design three indicators: category commonness, content relatedness, and word sense importance. The three indicators are fused by an entropy-based dynamic linear combination and, together with a re-disambiguation step, are used to choose the best sense of an ambiguous term in its context. [Results] Experiments show an average precision of 74.82%, which supports the feasibility and effectiveness of the method. [Limitations] The method mainly targets English WSD with fine-grained candidate senses and may not generalize to other languages. [Conclusions] By drawing on Wikipedia, the method supplies additional semantic knowledge and background information, which improves the precision of disambiguation tasks.
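
The abstract does not give the fusion formulas, so the following is only a minimal sketch of how an entropy-based linear fusion of three indicator scores could work. It assumes the standard entropy weight method: an indicator whose scores vary more across the candidate senses has lower entropy and receives a larger weight. Recomputing the weights for each ambiguous term is one possible reading of "dynamic". The function names, the sample scores, and the sense labels are hypothetical, and the re-disambiguation step is not shown.

```python
import math


def entropy_weights(scores):
    """Compute one weight per indicator with the entropy weight method.

    scores: list of rows, one row per candidate sense; each row holds
    non-negative values for (category commonness, content relatedness,
    sense importance). This layout is an assumption for illustration.
    """
    n = len(scores)          # number of candidate senses
    m = len(scores[0])       # number of indicators
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        if total == 0 or n < 2:
            # An all-zero indicator (or a single candidate) carries no information.
            divergences.append(0.0)
            continue
        probs = [v / total for v in column]
        # Shannon entropy normalized to [0, 1] by log(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0:
        # No indicator discriminates between senses: fall back to equal weights.
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]


def fuse(scores):
    """Linearly combine the indicators of each candidate sense."""
    weights = entropy_weights(scores)
    return [sum(w * v for w, v in zip(weights, row)) for row in scores]


def best_sense(candidates, scores):
    """Return the candidate sense with the highest fused score."""
    fused = fuse(scores)
    return max(zip(candidates, fused), key=lambda pair: pair[1])[0]


# Hypothetical scores for two candidate senses of "bank".
candidates = ["bank (finance)", "bank (river)"]
scores = [
    [0.9, 0.5, 0.5],
    [0.1, 0.5, 0.5],
]
weights = entropy_weights(scores)
chosen = best_sense(candidates, scores)
```

In this sample, only the first indicator separates the two senses, so it receives almost all of the weight, while the two indicators that score both senses equally contribute almost nothing to the decision.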

Received: 21 April 2015      Published: 06 April 2016
CLC Number: TP391; G35

Cite this article:

Ren Haiying, Yu Liting. A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia. New Technology of Library and Information Service, 2015, 31(11): 18-25.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.11.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I11/18

