Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (6): 1-11    DOI: 10.11925/infotech.2096-3467.2017.06.01
Original Article
Review of Studies on Text Similarity Measures
Chen Erjing1,2, Jiang Enbo1
1Chengdu Documentation and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper analyzes popular text similarity measures and discusses their latest developments. [Coverage] We retrieved 69 key articles from the CNKI and Web of Science databases by searching “TI: ‘text similarity’ or ‘semantic similarity’ or ‘lexical similarity’” in Chinese and English, respectively. [Methods] We systematically reviewed the text similarity measures, focusing on their basic concepts, characteristics and future directions. [Results] We identified four types of text similarity measures: string-based, corpus-based, knowledge-based and others. Neural network-based measures, knowledge-based measures and interdisciplinary measures are likely future research directions. [Limitations] We did not discuss the applications of these measures. [Conclusions] This paper provides a comprehensive review of research on text similarity measures.

Key words: Text Similarity; Semantic Similarity; Ontology; Bag of Words Model; Neural Network
Received: 09 May 2017      Published: 25 August 2017
ZTFLH:  TP391 G35  

Cite this article:

Chen Erjing,Jiang Enbo. Review of Studies on Text Similarity Measures. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.06.01     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I6/1

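| Type | Method | Basic idea | Basis | Characteristics and limitations |
|---|---|---|---|---|
| Character-based | Edit distance | The minimum number of deletion, insertion and substitution operations needed to transform $S_A$ into $S_B$. | Character composition | Accurate, but time-consuming. |
| Character-based | Hamming distance[13] | $1-\left( \sum\limits_{k=1}^{n}{{{x}_{k}}\oplus {{y}_{k}}} \right)/n$, where $x_k$ and $y_k$ are the $k$-th components of the codewords of strings $S_A$ and $S_B$, respectively. | Character composition | Uses modulo-2 addition, which simplifies the computation for long texts; efficient. |
| Character-based | LCS | The longest substring shared by both strings. | Character order | Simple in principle; works well for derived words and short texts, but is unsuitable for long texts. |
| Character-based | Jaro-Winkler | ${{d}_{j}}=\frac{1}{3}\left( \frac{m}{\lvert {{S}_{A}}\rvert }+\frac{m}{\lvert {{S}_{B}}\rvert }+\frac{m-t}{m} \right)$, where $m$ is the number of matching characters and $t$ is the number of transpositions. The similarity is ${{d}_{j}}+lp(1-{{d}_{j}})$, where $d_j$ is the Jaro distance of the two strings, $l$ is the length of the common prefix (at most 4), and Winkler set $p$ to 0.1. | Character order | Accounts for the importance of a shared prefix; works well for short texts, but is unsuitable for long texts. |
| Character-based | N-gram | $\frac{n}{n}$ | Set-based | $n$ is adjustable, so the method is flexible, but it is unsuitable for long texts. |
| Word-based | Cosine similarity | $\frac{\overrightarrow{{{S}_{A}}}\cdot \overrightarrow{{{S}_{B}}}}{\lVert {{S}_{A}}\rVert \ \lVert {{S}_{B}}\rVert }$ | Word composition | Places texts in a vector space; easy to interpret and widely used, but unsuitable for long texts. |
| Word-based | Dice coefficient[14] | $\frac{2\times comm({{S}_{A}},{{S}_{B}})}{leng({{S}_{A}})+leng({{S}_{B}})}$ | Word composition | Strengthens the contribution of the shared part; effectively captures short shared text. |
| Word-based | Euclidean distance | $\sqrt{\sum\limits_{i}{{{({{S}_{A,i}}-{{S}_{B,i}})}^{2}}}}$, where $S_{A,i}$ and $S_{B,i}$ are the $i$-th components of the two text vectors. | Word composition | Simple and direct, but coarse; unsuitable for long texts. |
| Word-based | Jaccard | $\frac{\lvert {{S}_{A}}\ \bigcap {{S}_{B}}\rvert }{\lvert {{S}_{A}}\ \bigcup {{S}_{B}}\rvert }$ | Set-based | Unsuitable for long texts. |
| Word-based | Overlap Coefficient | $\frac{\lvert {{S}_{A}}\ \bigcap {{S}_{B}}\rvert }{\min (\lvert {{S}_{A}}\rvert ,\lvert {{S}_{B}}\rvert )}$ | Set-based | Similarity reaches its maximum when one string is a substring of the other. |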
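Several of the string-based measures tabulated above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of any cited work. It rests on a few assumptions: texts are compared as plain strings or as sets and lists of whitespace-separated tokens, the Jaro matching window and the convention of counting $t$ as half the number of out-of-order matched characters follow common practice rather than anything stated in the table, and all function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaro(a: str, b: str) -> float:
    """Jaro measure d_j; the window size is the commonly used one (an assumption)."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    m = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    # Assumption: t is half the number of matched characters that are out of order.
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """d_j + l * p * (1 - d_j), with the common prefix length l capped at 4."""
    dj = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb or prefix == max_prefix:
            break
        prefix += 1
    return dj + prefix * p * (1 - dj)


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0


def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def overlap(a: set, b: set) -> float:
    return len(a & b) / min(len(a), len(b)) if (a and b) else 0.0


def cosine(a_tokens: list, b_tokens: list) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

For example, `edit_distance("kitten", "sitting")` counts two substitutions and one insertion, and `overlap({"a", "b"}, {"a", "b", "c"})` reaches the maximum because the first set is contained in the second, matching the remark in the table.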
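| | Distance-based | Content-based | Attribute-based | Hybrid |
|---|---|---|---|---|
| Basic principle | The path length between concepts represents their semantic distance. | The information shared by two concept words quantifies their semantic similarity. | The number of common attributes between two concept words measures their similarity. | Combines the distance-based, content-based and attribute-based methods to compute the similarity between concepts. |
| Representative methods | Shortest Path[38], Wu et al.[39], Weighted Links[40], Li et al.[41], Liu Qun et al.[10] | Lin[42], Resnik[43], Lord et al.[44], Bian Zhenxing[45] | Tversky[46] | Ge Bin et al.[47], Wang Yanna et al.[48], Li Wenqing et al.[49] |
| Characteristics | The calculation incorporates influence factors such as node depth, density, strength, width and taxonomy level. | Methods differ in which nodes' information content they use and in the formula that expresses information content. | Performance depends on the completeness of the ontology's attribute set. | The weight parameters are mostly set by domain experts. |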
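The distance-based and attribute-based ideas in the table above can be illustrated on a toy taxonomy. The sketch below is only an assumption-laden example: the `PARENT` hierarchy and the feature sets are invented, depth is counted in nodes with the root at depth 1, the Wu-Palmer score is written in its common form $2\cdot depth(lcs)/(depth(c_1)+depth(c_2))$, and the attribute-based score uses Tversky's ratio model with hypothetical weights `alpha` and `beta`.

```python
# Hypothetical is-a taxonomy: child -> parent. "entity" is the root.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def ancestors(concept: str) -> list:
    """Path from the concept up to the root, starting with the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(concept))


def lowest_common_subsumer(c1: str, c2: str) -> str:
    seen = set(ancestors(c1))
    for node in ancestors(c2):
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")


def path_length(c1: str, c2: str) -> int:
    """Number of edges on the shortest path between two concepts."""
    lcs = lowest_common_subsumer(c1, c2)
    return (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))


def wu_palmer(c1: str, c2: str) -> float:
    """Deeper shared ancestors give higher similarity."""
    lcs = lowest_common_subsumer(c1, c2)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def tversky(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Ratio model: shared features weighed against each side's distinctive features."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0
```

In this toy hierarchy, "dog" and "cat" meet at "mammal" and are two edges apart, whereas "dog" and "plant" meet only at the root, so both the path-based and the Wu-Palmer scores rank the first pair as more similar.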
[1] Gomaa W H, Fahmy A A. A Survey of Text Similarity Approaches[J]. International Journal of Computer Applications, 2013, 68(13): 13-18.
doi: 10.5120/11638-7118
[2] Pradhan N, Gyanchandani M, Wadhvani R. A Review on Text Similarity Technique Used in IR and Its Application[J]. International Journal of Computer Applications, 2015, 120(9): 29-34.
doi: 10.5120/21257-4109
[3] Qin Chunxiu, Zhao Pengwei, Liu Huailiang. Research on Word Similarity Measurement[J]. Information Studies: Theory & Application, 2007, 30(1): 105-108.
[4] Liu Ping, Chen Ye. Survey of the State of the Art in Word Similarity[J]. New Technology of Library and Information Service, 2012(7-8): 82-89.
[5] Li Hui. A Review on the Research of Word Similarity Algorithms[J]. Journal of Modern Information, 2015, 35(4): 172-177.
[6] Han Pu, Wang Dongbo, Wang Zimin. Research Advancement in Word Similarity Calculation and Mining[J]. Information Science, 2016, 34(9): 161-165.
[7] Sun Haixia, Qian Qing, Cheng Ying. Review of Ontology-based Semantic Similarity Measuring[J]. New Technology of Library and Information Service, 2010(1): 51-56.
[8] Liu Hongzhe, Xu De. Ontology Based Semantic Similarity and Relatedness Measures Review[J]. Computer Science, 2012, 39(2): 8-13.
doi: 10.3969/j.issn.1002-137X.2012.02.002
[9] Lin D. An Information-theoretic Definition of Similarity[C]// Proceedings of the 15th International Conference on Machine Learning. 1998.
[10] Liu Qun, Li Sujian. Word Similarity Computing Based on How-Net[J]. Chinese Computational Linguistics, 2002, 7(2): 59-76.
[11] Dong Zhendong, Dong Qiang. HowNet[EB/OL]. [2016-12-08].
[12] Tian Jiule, Zhao Wei. Words Similarity Algorithm Based on Tongyici Cilin in Semantic Web Adaptive Learning System[J]. Journal of Jilin University: Information Science Edition, 2010, 28(6): 602-608.
doi: 10.3969/j.issn.1671-5896.2010.06.011
[13] Zhang Huanjiong, Wang Guosheng, Zhong Yixin. Text Similarity Computing Based on Hamming Distance[J]. Computer Engineering and Applications, 2001, 37(19): 21-22.
[14] Dice L R. Measures of the Amount of Ecologic Association Between Species[J]. Ecology, 1945, 26(3): 297-302.
[15] Harris Z S. Distributional Structure[A]// Papers in Structural and Transformational Linguistics[M]. Springer, Dordrecht, 1970.
[16] Salton G, Wong A, Yang C S. A Vector Space Model for Automatic Indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
doi: 10.1145/361219.361220
[17] Guo Qinglin, Li Yanmei, Tang Qi. Similarity Computing of Documents Based on VSM[J]. Application Research of Computers, 2008, 25(11): 3256-3258.
doi: 10.3969/j.issn.1001-3695.2008.11.015
[18] Li Lian, Zhu Aihong, Su Tao. Research and Implementation of An Improved VSM-based Text Similarity Algorithm[J]. Computer Applications and Software, 2012, 29(2): 282-284.
doi: 10.3969/j.issn.1000-386X.2012.02.082
[19] Landauer T K, Dumais S T. A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge[J]. Psychological Review, 1997, 104(2): 211-240.
doi: 10.1037//0033-295X.104.2.211
[20] Hofmann T. Probabilistic Latent Semantic Analysis[C]// Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. 1999.
[21] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[22] Wang Zhenzhen, He Ming, Du Yongping. Text Similarity Computing Based on Topic Model LDA[J]. Computer Science, 2013, 40(12): 229-232.
doi: 10.3969/j.issn.1002-137X.2013.12.049
[23] Xiong Daping, Wang Jian, Lin Hongfei. An LDA-based Approach to Finding Similar Questions for Community Question Answer[J]. Journal of Chinese Information Processing, 2012, 26(5): 40-45.
doi: 10.3969/j.issn.1003-0077.2012.05.007
[24] Zhang Chao, Chen Li, Li Qiong. Chinese Text Similarity Algorithm Based on PST_LDA[J]. Application Research of Computers, 2016, 33(2): 375-377,383.
doi: 10.3969/j.issn.1001-3695.2016.02.012
[25] Hinton G E. Learning Distributed Representations of Concepts[C]// Proceedings of the 8th Annual Conference of the Cognitive Science Society. 1986.
[26] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3(6): 1137-1155.
doi: 10.1007/3-540-33486-6_6
[27] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013.
[28] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[29] Kenter T, Rijke M D. Short Text Similarity with Word Embeddings[C]// Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015: 1411-1420.
[30] Kusner M J, Sun Y, Kolkin N I, et al. From Word Embeddings to Document Distances[C]// Proceedings of the 32nd International Conference on Machine Learning. 2015.
[31] Huang G, Guo C, Kusner M J, et al. Supervised Word Mover’s Distance[C]// Proceedings of the 30th Conference on Neural Information Processing Systems. 2016.
[32] Cilibrasi R L, Vitanyi P M B. The Google Similarity Distance[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 370-383.
[33] Liu Shengjiu, Li Tianrui, Jia Zhen, et al. Research and Application of Similarity Based on Search Engine[J]. Computer Science, 2014, 41(4): 211-214.
doi: 10.3969/j.issn.1002-137X.2014.04.044
[34] Sahami M, Heilman T D. A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets[C]// Proceedings of the 15th International Conference on World Wide Web. 2006: 377-386.
[35] Chen Haiyan. Measuring Semantic Similarity Between Words Using Web Search Engines[J]. Computer Science, 2015, 42(1): 261-267.
[36] Hliaoutakis A. Semantic Similarity Measures in MeSH Ontology and Their Application to Information Retrieval on Medline[EB/OL]. [2016-12-08].
[37] Batet M, Sanchez D, Valls A. An Ontology-based Measure to Compute Semantic Similarity in Biomedicine[J]. Journal of Biomedical Informatics, 2011, 44(1): 118-125.
doi: 10.1016/j.jbi.2010.09.002 pmid: 20837160
[38] Rada R, Mili H, Bicknell E, et al. Development and Application of a Metric on Semantic Nets[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1989, 19(1): 17-30.
doi: 10.1109/21.24528
[39] Wu Z, Palmer M. Verb Semantics and Lexical Selection[C]// Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. 1994: 133-138.
[40] Richardson R, Smeaton A F, Murphy J. Using WordNet as a Knowledge Base for Measuring Semantic Similarity Between Words[EB/OL]. [2016-12-08].
[41] Li Y, Bandar Z A, McLean D. An Approach for Measuring Semantic Similarity Between Words Using Multiple Information Sources[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(4): 871-882.
doi: 10.1109/TKDE.2003.1209005
[42] Lin D. Principle-based Parsing without Overgeneration[C]// Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. 1993.
[43] Resnik P. Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language[J]. Journal of Artificial Intelligence Research, 1999, 11: 95-130.
doi: 10.1613/jair.514
[44] Lord P W, Stevens R D, Brass A, et al. Investigating Semantic Similarity Measures across the Gene Ontology: The Relationship Between Sequence and Annotation[J]. Bioinformatics, 2003, 19(10): 1275-1283.
doi: 10.1093/bioinformatics/btg153 pmid: 12835272
[45] Bian Zhenxing. Research on Model of IC Parameter for Semantic Similarity of Concept in WordNet[J]. Computer Engineering and Applications, 2011, 47(19): 128-131.
doi: 10.3778/j.issn.1002-8331.2011.19.035
[46] Tversky A. Features of Similarity[J]. Psychological Review, 1977, 84(4): 327-352.
[47] Ge Bin, Li Fangfang, Guo Silu, et al. Word’s Semantic Similarity Computation Method Based on Hownet[J]. Application Research of Computers, 2010, 27(9): 3329-3333.
doi: 10.3969/j.issn.1001-3695.2010.09.034
[48] Wang Yanna, Zhou Zili, He Yan. Concept Semantic Similarity Algorithm in WordNet Based on Information Content[J]. Computer Engineering, 2011, 37(22): 42-44.
doi: 10.3969/j.issn.1000-3428.2011.22.011
[49] Li Wenqing, Sun Xin, Zhang Changyou, et al. A Semantic Similarity Measure Between Ontological Concepts[J]. Acta Automatica Sinica, 2012, 38(2): 229-235.
doi: 10.3724/SP.J.1004.2012.00229
[50] Sun Chenchen, Shen Derong, Shan Jing, et al. WSR: A Semantic Relatedness Measure Based on Wikipedia Structure[J]. Chinese Journal of Computers, 2012, 35(11): 2361-2370.
doi: 10.3724/SP.J.1016.2012.02361
[51] Strube M, Ponzetto S P. WikiRelate! Computing Semantic Relatedness Using Wikipedia[C]// Proceedings of the 21st National Conference on Artificial Intelligence. 2006.
[52] Gabrilovich E, Markovitch S. Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis[C]// Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007.
[53] Milne D, Witten I H. An Effective, Low-cost Measure of Semantic Relatedness Obtained from Wikipedia Links[C]// Proceedings of the 23rd Association for the Advancement of Artificial Intelligence. 2008.
[54] Sheng Zhichao, Tao Xiaopeng. Semantic Similarity Computing Method Based on Wikipedia[J]. Computer Engineering, 2011, 37(7): 193-195.
doi: 10.3969/j.issn.1000-3428.2011.07.065
[55] Peng Lizhen, Wu Yangyang. Semantic Similarity Computing Based on Community Mining of Wikipedia[J]. Computer Science, 2016, 43(4): 45-49.
doi: 10.11896/j.issn.1002-137X.2016.4.009
[56] Lizorkin D, Medelyan O, Grineva M. Analysis of Community Structure in Wikipedia[C]// Proceedings of the 18th International Conference on World Wide Web. 2009: 1221-1222.
[57] Zhan Zhijian, Liang Li’na, Yang Xiaoping. Word Similarity Measurement Based on BaiduBaike[J]. Computer Science, 2013, 40(6): 199-202.
doi: 10.3969/j.issn.1002-137X.2013.06.043
[58] Yin Kun, Yin Hongfeng, Yang Yan, et al. Semantic Similarity Computation of Baidu Encyclopedia Entries Based on SimRank[J]. Journal of Shandong University: Engineering Science, 2014, 44(3): 29-35.
doi: 10.6040/j.issn.1672-3961.2.2013.282
[59] Sui Zhifang, Yu Shiwen. The Skeletal-Dependency-Tree-Based Computational Model for the Sentence Similarity[C]// Proceedings of the International Conference on Chinese Computing. 1998.
[60] Li Bin, Liu Ting, Qin Bing, et al. Chinese Sentence Similarity Computing Based on Semantic Dependency Relationship Analysis[J]. Application Research of Computers, 2003, 20(12): 15-17.
doi: 10.3969/j.issn.1001-3695.2003.12.005
[61] Li Ru, Wang Zhiqiang, Li Shuanghong, et al. Chinese Sentence Similarity Computing Based on Frame Semantic Parsing[J]. Journal of Computer Research and Development, 2013, 50(8): 1728-1736.
[62] Blanco E, Moldovan D. A Semantic Logic-Based Approach to Determine Textual Similarity[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(4): 683-693.
doi: 10.1109/TASLP.2015.2403613
[63] Jiang J J, Conrath D W. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy[C]// Proceedings of the International Conference on Research in Computational Linguistics. 1997.
[64] Islam A, Inkpen D. Semantic Text Similarity Using Corpus-based Word Similarity and String Similarity[J]. ACM Transactions on Knowledge Discovery from Data, 2008, 2(2): 1-25.
doi: 10.1145/1376815.1376819
[65] Tasi C S, Huang Y M, Liu C H, et al. Applying VSM and LCS to Develop an Integrated Text Retrieval Mechanism[J]. Expert Systems with Applications, 2012, 39(4): 3974-3982.
doi: 10.1016/j.eswa.2011.09.039
[66] Wei Wei, Xiang Yang, Chen Qian. Combined Measurement Approach for Semantic Similarity of Terms[J]. Journal of Computer Applications, 2010, 30(6): 1668-1670.
[67] Liu G, Wang R, Buckley J, et al. A WordNet-based Semantic Similarity Measure Enhanced by Internet-based Knowledge[C]// Proceedings of the International Conference on Software Engineering & Knowledge Engineering. 2011.
[68] Wang Xiaolin, Xiao Hui, Tai Weipeng. Research on Text Similarity Detection System Based on Hadoop[J]. Computer Technology and Development, 2015, 25(8): 90-93.
[69] Atoum I, Otoom A. Efficient Hybrid Semantic Text Similarity Using Wordnet and a Corpus[J]. International Journal of Advanced Computer Science and Applications, 2016, 7(9): 124-130.
doi: 10.14569/IJACSA.2016.070917