[Objective] This paper adds a topic layer between the document and word layers in order to calculate word similarities effectively. [Methods] First, we proposed a topic definition and representation model based on formal concept analysis. Then, we mapped words to the topic layer. Finally, we developed an algorithm that calculates word similarities using topic-to-topic relationships. [Results] We applied the proposed method to papers from the SIGIR conference (2006 to 2016) to calculate word similarities in the field of information retrieval. The precision and recall of the proposed method were up to 30% and 21% higher, respectively, than those of the FastText method. [Limitations] The proposed method relies on the quality of the feature words extracted from documents. [Conclusions] The proposed method utilizes the semantic relations among associated topics and effectively calculates word similarities.
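The pipeline summarized in the abstract (documents, then topics derived by formal concept analysis, then word similarity) can be illustrated with a minimal sketch. The abstract does not give the paper's topic definition or similarity formula, so everything below is an assumption for illustration: the toy `CONTEXT`, the helper names, the choice to treat each formal concept with a non-empty extent and intent as a "topic", and the Jaccard overlap used in place of the paper's topic-to-topic measure.

```python
from itertools import combinations

# Hypothetical toy formal context: document id -> set of feature words.
CONTEXT = {
    "d1": {"query", "ranking", "retrieval"},
    "d2": {"query", "ranking"},
    "d3": {"retrieval", "index"},
    "d4": {"index", "compression"},
}


def common_words(docs, context):
    """Words shared by every document in `docs` (all words if `docs` is empty)."""
    result = set().union(*context.values())
    for d in docs:
        result &= context[d]
    return result


def docs_having(words, context):
    """Documents whose feature words include every word in `words`."""
    return {d for d, ws in context.items() if words <= ws}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every document subset.

    Brute force and exponential in the number of documents; toy data only.
    """
    concepts = set()
    docs = sorted(context)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_words(subset, context)
            extent = docs_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def topics_from(concepts):
    """Assumption: a 'topic' is a concept with non-empty extent and intent."""
    return {c for c in concepts if c[0] and c[1]}


def word_similarity(w1, w2, topics):
    """Jaccard overlap of the topic sets containing each word (a stand-in measure)."""
    t1 = {t for t in topics if w1 in t[1]}
    t2 = {t for t in topics if w2 in t[1]}
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


concepts = formal_concepts(CONTEXT)
topics = topics_from(concepts)
sim_query_ranking = word_similarity("query", "ranking", topics)
sim_query_compression = word_similarity("query", "compression", topics)
```

In this toy context, "query" and "ranking" always occur in the same topics, while "query" and "compression" share none. A practical implementation would replace the brute-force enumeration with a dedicated concept-lattice algorithm and would use the relationships between topics rather than simple overlap.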
Liu Ping, Peng Xiaofang. Calculating Word Similarities Based on Formal Concept Analysis. Data Analysis and Knowledge Discovery, 2020, 4(5): 66-74.