New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 18-25     https://doi.org/10.11925/infotech.1003-3513.2015.11.04
Research Paper
A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia
Ren Haiying, Yu Liting
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract

[Objective] This paper proposes a multi-strategy method for Word Sense Disambiguation (WSD) based on Wikipedia that makes full use of the latent knowledge in Wikipedia. [Methods] Three indicators are designed: category commonness, content relatedness, and the importance of the word sense. Their values are combined through a dynamic, entropy-weighted linear fusion, followed by a second round of disambiguation, to choose the best sense of an ambiguous term in its context. [Results] Experiments show an average precision of 74.82%, which supports the feasibility and effectiveness of the method. [Limitations] The candidate senses are fine-grained and the method mainly targets WSD in English, so it does not generalize readily to other languages. [Conclusions] Wikipedia supplies additional semantic knowledge and background information for disambiguation, which can improve precision.
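
The abstract names an entropy-weighted linear fusion of three indicators but does not give the formulas. The sketch below is therefore only an illustration: it assumes the standard entropy weight method, in which an indicator whose values differ more across the candidate senses receives a larger weight, and it recomputes the weights for each ambiguous term, which is one possible reading of "dynamic". The function names, the sense labels, and the indicator values are hypothetical, and the second round of disambiguation is not modeled.

```python
import math


def entropy_weights(scores):
    """Weight each indicator by how much it discriminates between senses.

    scores: one row per candidate sense, one non-negative value per indicator.
    Assumption: standard entropy weight method, not the paper's exact formula.
    """
    n = len(scores)
    m = len(scores[0])
    if n < 2:
        # Entropy is undefined for a single candidate; fall back to equal weights.
        return [1.0 / m] * m

    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        if total == 0:
            # An all-zero indicator carries no information.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalize to the range [0, 1]
        divergences.append(max(0.0, 1.0 - entropy))

    d_sum = sum(divergences)
    if d_sum < 1e-12:
        # No indicator discriminates; use equal weights.
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]


def fuse(scores):
    """Linear fusion: weighted sum of indicator values for each sense."""
    weights = entropy_weights(scores)
    return [sum(w * x for w, x in zip(weights, row)) for row in scores]


def best_sense(senses, scores):
    """Return the candidate sense with the highest fused score."""
    fused = fuse(scores)
    return max(zip(senses, fused), key=lambda pair: pair[1])[0]


# Hypothetical candidate senses for one ambiguous term, with made-up values for
# (category commonness, content relatedness, sense importance).
example_senses = ["sense_a", "sense_b"]
example_scores = [
    [0.9, 0.5, 0.5],
    [0.1, 0.5, 0.5],
]
print(best_sense(example_senses, example_scores))
```

In this made-up example only the first indicator separates the two senses, so it receives almost all of the weight. Recomputing the weights for every ambiguous term lets the most discriminating indicator vary from one context to the next.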

Received: 2015-04-21      Published: 2016-04-06
CLC number: TP391; G35
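Funding: This work is one of the research outcomes of the Beijing Natural Science Foundation pre-exploration project "Research on Concept Map Representation of Invention Processes and Mechanisms" (No. 9153020) and the 2015 Beijing Municipal Education Commission Social Science Program general project "A Concept Map-Based Method for Describing the Mechanisms of Invention Processes" (No. SM201510005001).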

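Corresponding author: Yu Liting, ORCID: 0000-0003-1555-9846, E-mail: yuliting@emails.bjut.edu.cn
Author contributions: Ren Haiying and Yu Liting proposed and designed the research question, carried out the research, and collected and analyzed the data; Yu Liting drafted the paper; Ren Haiying revised the final version.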
Cite this article:
Ren Haiying, Yu Liting. A Multi-strategy Method for Word Sense Disambiguation Based on Wikipedia. New Technology of Library and Information Service, 2015, 31(11): 18-25.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.11.04      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I11/18

