New Technology of Library and Information Service, 2015, Vol. 31, Issue 11: 18-25. DOI: 10.11925/infotech.1003-3513.2015.11.04
Research Paper
A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia
Ren Haiying, Yu Liting
School of Economics and Management, Beijing University of Technology, Beijing 100124, China

Abstract

[Objective] This paper proposes a multi-strategy Word Sense Disambiguation (WSD) method based on Wikipedia that makes full use of the latent knowledge in Wikipedia. [Methods] Three indicators are designed: category commonness, content relatedness, and the importance of the word sense. Their values are combined by a linear fusion with dynamic entropy-based weights, and a re-disambiguation step is then applied to choose the best sense of an ambiguous term in its context. [Results] Experiments show an average precision of 74.82%, which validates the feasibility and effectiveness of the method. [Limitations] The candidate senses are fine-grained, and the method mainly targets WSD in English, so it lacks generality for other languages. [Conclusions] Wikipedia supplies additional semantic knowledge and background information for disambiguation, which improves disambiguation precision.
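The abstract names an entropy-based dynamic linear fusion but does not give its formulas. The textbook entropy weight method, for m candidate senses and n indicators with non-negative scores x_ij, computes p_ij = x_ij / Σ_i x_ij, e_j = −(1/ln m) Σ_i p_ij ln p_ij, and w_j = (1 − e_j) / Σ_k (1 − e_k), so an indicator that separates the candidates sharply receives more weight. The Python sketch below implements that standard formulation under stated assumptions: "dynamic" is read as recomputing the weights for each ambiguous word's candidate set, fusion is applied to the column-normalized scores, and the function names, example senses and all scores are hypothetical. The paper's re-disambiguation step is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple


def entropy_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each indicator (column) over the candidate senses (rows).

    Assumes all scores are non-negative. An indicator whose scores are spread
    evenly across candidates has normalized entropy near 1 and gets little
    weight; one that sharply separates candidates gets more weight.
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        return [1.0 / n] * n  # entropy is undefined for a single candidate
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            divergences.append(0.0)  # an all-zero indicator carries no information
            continue
        entropy = -sum((x / total) * math.log(x / total) for x in column if x > 0)
        divergences.append(max(0.0, 1.0 - k * entropy))  # clamp float noise
    s = sum(divergences)
    if s < 1e-12:
        return [1.0 / n] * n  # no indicator discriminates: use equal weights
    return [d / s for d in divergences]


def rank_senses(scores: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Linearly fuse per-indicator scores with entropy weights; best sense first."""
    if not scores:
        return []
    senses = list(scores)
    matrix = [list(scores[s]) for s in senses]
    weights = entropy_weights(matrix)  # recomputed for this word's candidates
    totals = [sum(row[j] for row in matrix) for j in range(len(weights))]
    fused = []
    for sense, row in zip(senses, matrix):
        value = sum(w * (x / t if t > 0 else 0.0) for w, x, t in zip(weights, row, totals))
        fused.append((sense, value))
    return sorted(fused, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for the ambiguous word "bank":
    # (category commonness, content relatedness, sense importance)
    candidates = {
        "Bank": (0.8, 0.6, 0.9),
        "Bank (geography)": (0.2, 0.7, 0.3),
        "Bank (aviation)": (0.0, 0.1, 0.05),
    }
    for sense, score in rank_senses(candidates):
        print(f"{sense}: {score:.3f}")
```

Because each column of proportions sums to 1 and the weights sum to 1, the fused scores also sum to 1 across the candidates and can be read as a relative preference. The paper's own normalization and weighting details may differ from this sketch.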

Received: 2015-04-21
CLC Number: TP391; G35
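Funding: This work is one of the research outcomes of the Beijing Natural Science Foundation Pre-exploration Project "Research on Concept Map Representation of Invention Processes and Mechanisms" (Grant No. 9153020) and the 2015 Beijing Municipal Education Commission Social Science Program General Project "A Concept Map-based Method for Describing Invention Process Mechanisms" (Grant No. SM201510005001).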

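Corresponding author: Yu Liting, ORCID: 0000-0003-1555-9846, E-mail: yuliting@emails.bjut.edu.cn
Author contributions: Ren Haiying and Yu Liting proposed and designed the research question, carried out the research, and collected and analyzed the data; Yu Liting drafted the paper; Ren Haiying revised the final version.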
Cite this article:
Ren Haiying, Yu Liting. A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia [J]. New Technology of Library and Information Service, 2015, 31(11): 18-25. DOI: 10.11925/infotech.1003-3513.2015.11.04.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.11.04


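Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing, Postcode: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn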