New Technology of Library and Information Service  2014, Vol. 30 Issue (4): 41-47    DOI: 10.11925/infotech.1003-3513.2014.04.07
Knowledge Organization and Knowledge Management
Chinese Synonyms Discovery Based on the Term Definition
Yin Xihong, Qiao Xiaodong, Zhang Yunliang
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract

[Objective] Drawing on Lesk's approach to word sense disambiguation, this paper proposes and implements a method that uses term definitions to discover Chinese synonyms. [Methods] Terms and their definitions from the Chinese Scientific and Technical Vocabulary System (New Energy Vehicles) serve as the test set. The term definitions are first segmented and part-of-speech tagged, with manual correction. Content words tagged as verbs or nouns are then extracted, and the similarity of two terms is calculated from the number of content words their definitions share and the positions of those words. Finally, synonym relations are recommended according to the similarity and a given threshold. [Results] Precision, recall, and F-value are used to evaluate synonym discovery and to demonstrate the method's effectiveness. The results show that the method achieves high precision but low recall. [Limitations] The method cannot exclude term pairs that stand in antonymous or merely related relationships, which results in low recall. [Conclusions] The method is simple, fast, and effective and achieves high precision, but its recall needs improvement.
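
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact similarity formula, so the position-weighted overlap below is an assumption made for illustration, not the authors' formula. The function names, the default threshold, and the toy definitions are likewise hypothetical, and the content words are assumed to have been segmented, tagged, and filtered to nouns and verbs beforehand.

```python
from itertools import combinations


def position_weighted_overlap(words_a, words_b):
    """Similarity in [0, 1] from shared content words and their positions.

    Assumed formula (not from the paper): each shared word contributes
    1 - |relative position in A - relative position in B|, and the sum is
    divided by the larger number of distinct content words.
    """
    if not words_a or not words_b:
        return 0.0
    # Record the first position at which each content word occurs.
    pos_a, pos_b = {}, {}
    for i, w in enumerate(words_a):
        pos_a.setdefault(w, i)
    for i, w in enumerate(words_b):
        pos_b.setdefault(w, i)
    score = 0.0
    for w in pos_a.keys() & pos_b.keys():
        rel_a = pos_a[w] / len(words_a)
        rel_b = pos_b[w] / len(words_b)
        score += 1.0 - abs(rel_a - rel_b)
    return score / max(len(pos_a), len(pos_b))


def recommend_synonyms(definitions, threshold=0.5):
    """Return term pairs whose definition similarity reaches the threshold."""
    pairs = set()
    for a, b in combinations(sorted(definitions), 2):
        if position_weighted_overlap(definitions[a], definitions[b]) >= threshold:
            pairs.add((a, b))
    return pairs


def precision_recall_f(recommended, gold):
    """Precision, recall, and F-value of recommended pairs against gold pairs."""
    true_positives = len(recommended & gold)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy data (made up): term -> content words of its definition, in order.
definitions = {
    "electric vehicle": ["vehicle", "use", "battery", "drive", "motor"],
    "battery car": ["vehicle", "use", "battery", "drive", "motor"],
    "fuel cell": ["device", "convert", "hydrogen", "electricity"],
}
# Hypothetical gold-standard synonym pairs, each stored in sorted order.
gold = {("battery car", "electric vehicle"), ("fuel cell", "hydrogen cell")}

recommended = recommend_synonyms(definitions, threshold=0.5)
precision, recall, f_value = precision_recall_f(recommended, gold)
```

In this toy run the only recommended pair is also a gold pair, while the second gold pair is missed, which mirrors the high-precision, low-recall pattern the abstract reports. Raising the threshold in a sketch like this can only shrink the recommended set, so it trades recall for precision.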

Key words: Term definition; Similarity algorithm; Synonym discovery; Content words; Position of occurrence
Received: 2014-01-06
CLC number: G254; TP391
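Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Key Issues in the Rapid Construction of Knowledge Organization Systems for Specific Intelligence Analysis Applications" (No. 71203208), the National Key Technology R&D Program of the 12th Five-Year Plan project "Construction of a Super Scientific and Technical Thesaurus and Ontology for Foreign-Language Scientific and Technical Literature" (No. 2011BAH10B01), and the Institute of Scientific & Technical Information of China key project "Construction and Application of the Chinese Scientific and Technical Vocabulary System" (No. ZD2012-3-2).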

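Corresponding author: Yin Xihong, E-mail: zzuxxglyxh@163.com
Author contributions: Qiao Xiaodong and Zhang Yunliang proposed the research idea and designed the research plan; Yin Xihong carried out the experiments; Yin Xihong and Zhang Yunliang collected, cleaned, and analyzed the data; Yin Xihong drafted the paper; Zhang Yunliang and Yin Xihong revised the final version.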
Cite this article:
Yin Xihong, Qiao Xiaodong, Zhang Yunliang. Chinese Synonyms Discovery Based on the Term Definition[J]. New Technology of Library and Information Service, 2014, 30(4): 41-47. DOI: 10.11925/infotech.1003-3513.2014.04.07.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.04.07

[1] Wei X, Peng F C, Tseng H. Search with Synonyms: Problems and Solutions[C]. In: Proceedings of the International Conference on Computational Linguistics (COLING). 2010: 1318-1326.
[2] Gnoli C. Ten Long-term Research Questions in Knowledge Organization[J]. Knowledge Organization, 2008, 35(2-3): 137-149.
[3] Lu Yong, Hou Hanqing. Synonyms Automatic Identification and Progress for Information Retrieval[J]. Journal of Nanjing Agricultural University: Social Science Edition, 2004, 4(3): 87-93. (in Chinese)
[4] Zhong Weijin. "Mutual Exclusion and Mutual Trust" Co-Occurrence Principle-Based Identification of Synonyms[J]. Chinese Journal of Medical Library and Information Science, 2012, 21(5): 1-4. (in Chinese)
[5] Grushetsky O, Baker S D. Document-based Synonym Generation: United States, US7890521 B1[P]. (2011-02-15). [2013-11-15]. http://www.google.com.tw/patents/US7890521.
[6] Song Dan, Shi Qinghui, Xue Dejun, et al. Automation Extraction of Similar Term[C]. In: Proceedings of the 3rd National Information Retrieval and Content Security Conference, Suzhou, Jiangsu. 2007. (in Chinese)
[7] Sun Xia, Dong Lehong. Automatic Extraction Method of Synonym Relationship Based on Supervised Learning[J]. Journal of Northwest University: Natural Science Edition, 2008, 38(1): 35-39. (in Chinese)
[8] Muller P, Langlais P. Comparing Distributional and Mirror Translation Similarities for Extracting Synonyms[C]. In: Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence. Berlin, Heidelberg: Springer-Verlag, 2011: 323-334.
[9] Van der Plas L, Tiedemann J, Manguin J L. Automatic Acquisition of Synonyms for French Using Parallel Corpora[C]. In: Proceedings of the 4th International Workshop on Distributed Agent-based Retrieval Tools. 2010.
[10] Meusel R, Niepert M, Eckert K, et al. Thesaurus Extension Using Web Search Engines[C]. In: Proceedings of the Role of Digital Libraries in a Time of Global Change, and 12th International Conference on Asia-Pacific Digital Libraries. 2010: 198-207.
[11] Zhang Yunliang, Qiao Xiaodong, Zhu Lijun, et al. Rapid Construction Method of Synonym Relationship Based on Terms Translation Information[J]. Library and Information Service, 2013, 57(8): 109-113. (in Chinese)
[12] Muller P, Hathout N, Gaume B. Synonym Extraction Using a Semantic Distance on a Dictionary[C]. In: Proceedings of the 1st Workshop on Graph Based Methods for Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2006: 65-72.
[13] Lu Yong, Hou Hanqing. Automatic Recognition of Chinese Synonyms Based on PageRank Algorithm[J]. Journal of Xihua University: Natural Science Edition, 2008, 27(2): 13-15, 94. (in Chinese)
[14] Wu H, Zhou M. Optimizing Synonym Extraction Using Monolingual and Bilingual Resource[C]. In: Proceedings of the 2nd International Workshop on Paraphrasing. Stroudsburg: Association for Computational Linguistics, 2003: 72-79.
[15] Lesk M. Information in Data: Using the Oxford English Dictionary on a Computer[J]. SIGIR Forum, 1986, 20(1-4): 18-21.
[16] Banerjee S, Pedersen T. Extended Gloss Overlaps as a Measure of Semantic Relatedness[C]. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03). 2003: 805-810.
[17] Zhu Yihua. Automatic Recognition of Synonym in Construction of Intelligent Search Engine[D]. Nanjing: Nanjing Agricultural University, 2001. (in Chinese)
[18] He Defang, Qiao Xiaodong, Zhu Lijun, et al. Chinese Scientific and Technical Vocabulary System (New Energy Vehicles)[M]. Beijing: Scientific and Technical Documentation Press, 2012. (in Chinese)
