New Technology of Library and Information Service  2014, Vol. 30 Issue (4): 41-47    DOI: 10.11925/infotech.1003-3513.2014.04.07
Chinese Synonyms Discovery Based on the Term Definition
Yin Xihong, Qiao Xiaodong, Zhang Yunliang
Institute of Scientific & Technical Information of China, Beijing 100038, China

[Objective] Inspired by Lesk's research on word sense disambiguation, this paper proposes an approach that discovers synonyms from term definitions. [Methods] The test set is built from the Chinese Scientific and Technical Vocabulary System (New Energy Vehicles). First, the term definitions are segmented into words, tagged with parts of speech, and manually corrected. Then the content words (nouns and verbs) are extracted, and the similarity of two terms is calculated from the number of content words their definitions share and the positions of those shared words. Finally, synonym relations are recommended according to the similarity and a given threshold. [Results] Precision, recall, and F-measure are used to evaluate the discovered synonyms and to demonstrate the effectiveness of the method. The results show that the method achieves high precision but low recall. [Limitations] The method cannot exclude terms that stand in antonymous or merely related relationships, resulting in a lower recall rate. [Conclusions] The method is simple and effective and achieves high precision, while its recall rate still needs to be improved.
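The abstract gives no formula for the similarity, so the following Python sketch is only one possible reading of the described steps, not the authors' implementation. It assumes that segmentation, part-of-speech tagging, and content-word extraction have already been done, so each definition arrives as an ordered list of content words. The position weight, the cosine-style normalization, the 0.6 threshold, and the toy English terms are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def weighted_vector(content_words):
    """Map each content word to a position weight (hypothetical scheme).

    Words nearer the start of the definition get weights closer to 1.0;
    only the first occurrence of a repeated word is counted.
    """
    n = len(content_words)
    weights = {}
    for i, word in enumerate(content_words):
        if word not in weights:
            weights[word] = 1.0 - i / (2.0 * n)
    return weights


def similarity(def_a, def_b):
    """Cosine of the position-weighted content-word vectors (an assumption)."""
    va, vb = weighted_vector(def_a), weighted_vector(def_b)
    if not va or not vb:
        return 0.0
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b)


def recommend_synonyms(definitions, threshold=0.6):
    """Return (term_a, term_b, score) for every pair at or above the threshold."""
    pairs = []
    for (ta, da), (tb, db) in combinations(sorted(definitions.items()), 2):
        score = similarity(da, db)
        if score >= threshold:
            pairs.append((ta, tb, score))
    return pairs


def precision_recall_f(predicted, gold):
    """Precision, recall, and F-measure over sets of term pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Invented toy data: content words of three definitions, in definition order.
definitions = {
    "electric vehicle": ["vehicle", "drive", "motor", "battery"],
    "battery car": ["vehicle", "drive", "motor", "battery", "power"],
    "fuel cell": ["device", "convert", "hydrogen", "electricity"],
}
found = [(a, b) for a, b, _ in recommend_synonyms(definitions)]
# Invented gold standard containing one pair the toy data cannot reach.
gold = [("battery car", "electric vehicle"), ("fuel cell", "hydrogen cell")]
precision, recall, f_measure = precision_recall_f(found, gold)
```

On this toy data the single recommended pair is correct but one gold pair is missed, which mirrors the high-precision, low-recall pattern reported in the abstract.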

Key words: Term definition; Similarity algorithm; Synonyms found; Content words; Position
Received: 06 January 2014      Published: 19 May 2014
CLC number: G254

Cite this article:

Yin Xihong, Qiao Xiaodong, Zhang Yunliang. Chinese Synonyms Discovery Based on the Term Definition. New Technology of Library and Information Service, 2014, 30(4): 41-47.


[1] Wei X,Peng F C,Tseng H.Search with Synonyms:Problems and Solutions[C].In:Proceedings of the International Conference on Computational Linguistics (COLING).2010:1318-1326.
[2] Gnoli C.Ten Long-term Research Questions in Knowledge Organization[J].Knowledge Organization,2008,35(2-3):137-149.
[3] 陆勇,侯汉清.用于信息检索的同义词自动识别及其进展[J].南京农业大学学报:社会科学版,2004,4(3):87-93.(Lu Yong,Hou Hanqing.Synonyms Automatic Identification and Progress for Information Retrieval[J].Journal of Nanjing Agricultural University:Social Science Edition,2004,4(3):87-93.)
[4] 钟伟金.基于共现"互斥互信"原理的同义词识别[J].中华医学图书情报杂志,2012,21(5):1-4.(Zhong Weijin."Mutual Exclusion and Mutual Trust" Co-Occurrence Principle-Based Identification of Synonyms[J].Chinese Journal of Medical Library and Information Science,2012,21(5):1-4.)
[5] Grushetsky O,Baker S D.Document-based Synonym Generation:United States,US7890521 B1[P].(2011-02-15).[2013-11-15].
[6] 宋丹,师庆辉,薛德军,等.术语同义词的自动抽取[C].见:第三届全国信息检索与内容安全学术会议,苏州,江苏.2007.(Song Dan,Shi Qinghui,Xue Dejun,et al.Automation Extraction of Similar Term[C].In:Proceedings of the 3rd National Information Retrieval and Content Security Conference,Suzhou,Jiangsu.2007.)
[7] 孙霞,董乐红.基于监督学习的同义关系自动抽取方法[J].西北大学学报:自然科学版,2008,38(1):35-39.(Sun Xia,Dong Lehong.Automatic Extraction Method of Synonym Relationship Based on Supervised Learning[J].Journal of Northwest University:Natural Science Edition,2008,38(1):35-39.)
[8] Muller P,Langlais P.Comparing Distributional and Mirror Translation Similarities for Extracting Synonyms[C].In:Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence.Berlin,Heidelberg:Springer-Verlag,2011:323-334.
[9] Van der Plas L,Tiedemann J,Manguin J L.Automatic Acquisition of Synonyms for French Using Parallel Corpora[C].In:Proceedings of the 4th International Workshop on Distributed Agent-based Retrieval Tools.2010.
[10] Meusel R,Niepert M,Eckert K,et al.Thesaurus Extension Using Web Search Engines[C].In:Proceedings of the Role of Digital Libraries in a Time of Global Change,and 12th International Conference on Asia-Pacific Digital Libraries.2010:198-207.
[11] 张运良,乔晓东,朱礼军,等.基于术语翻译信息的同义关系快速构建方法研究[J].图书情报工作,2013,57(8):109-113.(Zhang Yunliang,Qiao Xiaodong,Zhu Lijun,et al.Rapid Construction Method of Synonym Relationship Based on Terms Translation Information[J].Library and Information Service,2013,57(8):109-113.)
[12] Muller P,Hathout N,Gaume B.Synonym Extraction Using a Semantic Distance on a Dictionary[C].In:Proceedings of the 1st Workshop on Graph Based Methods for Natural Language Processing.Stroudsburg:Association for Computational Linguistics,2006:65-72.
[13] 陆勇,侯汉清.基于PageRank值的汉语同义词自动识别[J].西华大学学报:自然科学版,2008,27(2):13-15,94.(Lu Yong,Hou Hanqing.Automatic Recognition of Chinese Synonyms Based on PageRank Algorithm[J].Journal of Xihua University:Natural Science Edition,2008,27(2):13-15,94.)
[14] Wu H,Zhou M.Optimizing Synonym Extraction Using Monolingual and Bilingual Resource[C].In:Proceedings of the 2nd International Workshop on Paraphrasing.Stroudsburg:Association for Computational Linguistics,2003:72-79.
[15] Lesk M.Information in Data:Using the Oxford English Dictionary on a Computer[J].SIGIR Forum,1986,20(1-4):18-21.
[16] Banerjee S,Pedersen T.Extended Gloss Overlaps as a Measure of Semantic Relatedness[C].In:Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03).2003:805-810.
[17] 朱毅华.智能搜索引擎中的同义词识别算法研究[D].南京:南京农业大学,2001.(Zhu Yihua.Automatic Recognition of Synonym in Construction of Intelligent Search Engine[D].Nanjing:Nanjing Agriculture University,2001.)
[18] 贺德方,乔晓东,朱礼军,等.汉语科技词系统(新能源汽车卷)[M].北京:科学技术文献出版社,2012.(He Defang,Qiao Xiaodong,Zhu Lijun,et al.Chinese Scientific and Technical Vocabulary System (New Energy Vehicles)[M].Beijing:Scientific and Technical Documentation Press,2012.)
