New Technology of Library and Information Service  2012, Vol. 28 Issue (7): 82-89    DOI: 10.11925/infotech.1003-3513.2012.07.13
Survey of the State of the Art in Word Similarity
Liu Ping, Chen Ye
Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  This paper reviews methods for measuring word similarity in two categories: those that use background knowledge and those that do not. Statistical methods that use no background knowledge cannot reveal the semantic relations between words, while thesaurus-based methods cover only a limited vocabulary. Wikipedia, a large corpus that also serves as a semantic knowledge base, has become a new source for measuring semantic similarity between words. Three recent Wikipedia-based methods, WikiWalk, conceptual graphs, and temporal semantic analysis, are described in detail. A likely future direction for the field is to combine Wikipedia with other background information as complementary semantic resources. Characterizing the relatedness between words through complex network analysis also remains an open challenge.
Key words  Word similarity, Semantic relatedness, Similarity measures
Received: 27 May 2012      Published: 11 October 2012
CLC Number: TP391

Cite this article:

Liu Ping, Chen Ye. Survey of the State of the Art in Word Similarity. New Technology of Library and Information Service, 2012, 28(7): 82-89.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.07.13     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I7/82
