Calculating Word Similarities Based on Formal Concept Analysis
Liu Ping1,2, Peng Xiaofang1
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 Institute for Digital Library, Wuhan University, Wuhan 430072, China

Abstract [Objective] This paper adds a topic layer between the document layer and the word layer in order to calculate word similarities more effectively. [Methods] First, we proposed a topic definition and representation model based on the theory of formal concept analysis. Then, we mapped words to the topic layer. Finally, we developed an algorithm that calculates word similarities from topic-to-topic relationships. [Results] We applied the proposed method to papers from the SIGIR conference from 2006 to 2016 to calculate word similarities in the field of information retrieval. The precision and recall of the proposed method were up to 30% and 21% higher, respectively, than those of the FastText method. [Limitations] The proposed method relies on the quality of the feature words extracted from the documents. [Conclusions] The proposed method utilizes the semantic relations among associated topics and effectively calculates word similarities.
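
The pipeline summarized in the abstract (documents, then topics derived with formal concept analysis, then words) can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy context, the function names, and the Jaccard overlap used as the similarity score are all assumptions made for illustration. Each formal concept (a set of documents paired with the words they all share) stands in for a "topic", and two words count as similar when they appear in many of the same topics.

```python
from itertools import combinations

# Hypothetical toy context (not from the paper): document -> feature words.
CONTEXT = {
    "d1": {"query", "ranking", "retrieval"},
    "d2": {"query", "ranking"},
    "d3": {"retrieval", "index"},
}
ALL_WORDS = frozenset().union(*CONTEXT.values())


def intent(docs):
    """Words shared by every document in `docs` (all words if `docs` is empty)."""
    if not docs:
        return ALL_WORDS
    return frozenset(set.intersection(*(CONTEXT[d] for d in docs)))


def extent(words):
    """Documents whose feature words include every word in `words`."""
    return frozenset(d for d, ws in CONTEXT.items() if words <= ws)


def formal_concepts():
    """Enumerate all formal concepts by closing every subset of documents.

    Brute force: exponential in the number of documents, fine for a toy context.
    """
    concepts = set()
    docs = sorted(CONTEXT)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts


CONCEPTS = formal_concepts()


def topics_of(word):
    """Concepts ("topics") with a non-empty extent whose intent contains `word`."""
    return {c for c in CONCEPTS if c[0] and word in c[1]}


def similarity(w1, w2):
    """Assumed stand-in measure: Jaccard overlap of the two words' topic sets."""
    t1, t2 = topics_of(w1), topics_of(w2)
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


if __name__ == "__main__":
    for pair in [("query", "ranking"), ("query", "retrieval"), ("query", "index")]:
        print(pair, similarity(*pair))
```

In this toy context "query" and "ranking" occur in exactly the same topics, so they score highest, while "query" and "index" share no topic. The paper's method additionally exploits relationships between topics, which this sketch does not model.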
Received: 03 December 2019
Published: 15 June 2020
Corresponding Authors:
Liu Ping
E-mail: pliuleeds@126.com