Data Analysis and Knowledge Discovery (数据分析与知识发现), 2017, Vol. 1, Issue 6: 1-11. https://doi.org/10.11925/infotech.2096-3467.2017.06.01
Section: Review (综述评介)
Review of Studies on Text Similarity Measures (文本相似度计算方法研究综述)
Chen Erjing (陈二静)1,2, Jiang Enbo (姜恩波)1
1 Chengdu Documentation and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
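Abstract (translated from the Chinese 摘要)

[Objective] To analyze methods for computing text similarity and to understand how the field is developing. [Coverage] CNKI and Web of Science were searched with the queries “篇名: 文本相似度 OR 篇名: 词汇相似度 OR 篇名: 语义相似度” (title: text similarity OR title: lexical similarity OR title: semantic similarity) and “TI: ‘text similarity’ or ‘semantic similarity’ or ‘lexical similarity’” respectively, with the document type restricted, which yielded 69 key publications. [Methods] Text similarity measures are reviewed systematically; the basic ideas and characteristics of the main methods are analyzed and future directions are summarized. [Results] A fairly comprehensive classification scheme is obtained. Text similarity measures fall into four categories: string-based methods, corpus-based methods, world-knowledge-based methods, and other methods. Neural-network-based methods, world-knowledge-based methods, and similarity computation for cross-domain text are expected to become the main trends in the field. [Limitations] Only the methods themselves are discussed; how they are applied is not analyzed further. [Conclusions] The review helps readers gain a comprehensive and in-depth understanding of the current state and future trends of research on text similarity measures.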

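Keywords (Chinese terms with English equivalents): 文本相似度 (text similarity); 语义相似度 (semantic similarity); 本体 (ontology); 词袋模型 (bag-of-words model); 神经网络 (neural network)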
Abstract

[Objective] This paper analyzes the main text similarity measures and discusses their latest developments. [Coverage] We retrieved 69 key articles from the CNKI and Web of Science databases by searching “TI: ‘text similarity’ or ‘semantic similarity’ or ‘lexical similarity’” in Chinese and English respectively. [Methods] We systematically reviewed the text similarity measures, focusing on their basic concepts, characteristics and future directions. [Results] Text similarity measures fall into four types: string-based, corpus-based, knowledge-based and others. Neural-network-based measures, knowledge-based measures and measures for cross-domain text could be the future research directions. [Limitations] We did not discuss the applications of those measures. [Conclusions] This paper is a comprehensive review of research on text similarity measures.

Keywords: Text Similarity; Semantic Similarity; Ontology; Bag of Words Model; Neural Network
Received: 2017-05-09      Published: 2017-08-25
CLC classification (ZTFLH): TP391, G35
Cite this article:
Chen Erjing, Jiang Enbo. Review of Studies on Text Similarity Measures[J]. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.06.01 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I6/1
Figure: Classification of text similarity measures (image not reproduced)
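| Category | Method | Basic idea | Basis | Characteristics and limitations |
| --- | --- | --- | --- | --- |
| Character-based | Edit distance | Minimum number of deletion, insertion and substitution operations needed to transform $S_A$ into $S_B$. | Character composition | Accurate but time-consuming. |
| Character-based | Hamming distance [13] | $1-\left( \sum\limits_{k=1}^{n} x_k \oplus y_k \right)/n$, where $x_k$ and $y_k$ are the $k$-th components of the codewords corresponding to strings $S_A$ and $S_B$. | Character composition | Uses modulo-2 addition, which simplifies computation on long texts; efficient. |
| Character-based | LCS | The longest substring that occurs in both strings. | Character order | Simple principle; works well for derived words and short texts, but is not suitable for long texts. |
| Character-based | Jaro-Winkler | $d_j=\frac{1}{3}\left( \frac{m}{\lvert S_A \rvert}+\frac{m}{\lvert S_B \rvert}+\frac{m-t}{m} \right)$, where $m$ is the number of matching characters and $t$ is the number of transpositions. The similarity is $d_j+l\,p\,(1-d_j)$, where $d_j$ is the Jaro distance of the two strings and $l$ is the length of the common prefix, capped at 4. Winkler sets $p=0.1$. | Character order | Takes the importance of a shared prefix into account; works well for short texts, but is not suitable for long texts. |
| Character-based | N-gram | $\frac{n}{n}$ | Set-based | $n$ is adjustable, so the method is flexible, but it is not suitable for long texts. |
| Word-based | Cosine similarity | $\frac{\overrightarrow{S_A}\cdot \overrightarrow{S_B}}{\lVert \overrightarrow{S_A} \rVert\,\lVert \overrightarrow{S_B} \rVert}$ | Word composition | Places texts in a vector space; easy to interpret and widely used, but not suitable for long texts. |
| Word-based | Dice coefficient [14] | $\frac{2\times comm(S_A,S_B)}{leng(S_A)+leng(S_B)}$ | Word composition | Strengthens the contribution of the shared part and pays effective attention to short shared text. |
| Word-based | Euclidean distance | $\sqrt{\sum\limits_{i}\left( S_{A,i}-S_{B,i} \right)^{2}}$, where $S_{A,i}$ and $S_{B,i}$ are the $i$-th components of the two text vectors. | Word composition | Simple and direct, but coarse; not suitable for long texts. |
| Word-based | Jaccard | $\frac{\lvert S_A \cap S_B \rvert}{\lvert S_A \cup S_B \rvert}$ | Set-based | Not suitable for long texts. |
| Word-based | Overlap coefficient | $\frac{\lvert S_A \cap S_B \rvert}{\min (\lvert S_A \rvert,\lvert S_B \rvert)}$ | Set-based | Similarity is maximal when one string is a substring of the other. |

Table: Representative string-based methods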
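The character-based rows of the table above can be illustrated with a minimal Python sketch. It is not taken from the reviewed papers: all function names are hypothetical, LCS is read as the longest common substring (as the table describes it), and the Jaro part uses the common convention that characters match only within a window of half the longer length minus one, with $t$ counted as half the number of matched characters that appear in a different order.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def hamming_similarity(x: str, y: str) -> float:
    """1 - (number of differing positions) / n, for equal-length codewords."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length inputs")
    if not x:
        return 1.0
    return 1 - sum(cx != cy for cx, cy in zip(x, y)) / len(x)


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, start=1):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def jaro(a: str, b: str) -> float:
    """Jaro similarity d_j = (m/|a| + m/|b| + (m - t)/m) / 3."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)  # assumed matching window
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    m = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    t = sum(ca != cb for ca, cb in zip(a_seq, b_seq)) / 2
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """d_j + l * p * (1 - d_j), with l the common-prefix length capped at max_prefix."""
    dj = jaro(a, b)
    l = 0
    for ca, cb in zip(a, b):
        if ca != cb or l == max_prefix:
            break
        l += 1
    return dj + l * p * (1 - dj)
```

For the pair "MARTHA" and "MARHTA", this sketch finds $m=6$ and $t=1$, so $d_j=(1+1+5/6)/3\approx 0.944$; with a shared prefix of length 3 and $p=0.1$, the Jaro-Winkler value is about 0.961.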
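The word-based and set-based rows can be sketched in the same hedged way. The code below assumes whitespace tokenization and raw term counts as vector components, and it reads $comm$ and $leng$ in the Dice formula as the number of shared distinct tokens and the number of distinct tokens; these are illustrative choices, not the reviewed authors' definitions. The helper `char_ngrams` is likewise hypothetical and only shows how a set measure can be applied to character n-grams.

```python
import math
from collections import Counter


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between the term-count vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a: list, b: list) -> float:
    """Square root of the summed squared differences of term counts."""
    va, vb = Counter(a), Counter(b)
    return math.sqrt(sum((va[w] - vb[w]) ** 2 for w in va.keys() | vb.keys()))


def dice(a: list, b: list) -> float:
    """2 * |A ∩ B| / (|A| + |B|) over sets of distinct items."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def jaccard(a: list, b: list) -> float:
    """|A ∩ B| / |A ∪ B| over sets of distinct items."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def overlap_coefficient(a: list, b: list) -> float:
    """|A ∩ B| / min(|A|, |B|); equals 1 when one set contains the other."""
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


def char_ngrams(text: str, n: int = 2) -> list:
    """All contiguous character n-grams of text (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


doc_a = "the cat sat on the mat".split()
doc_b = "the cat lay on the rug".split()
print(cosine_similarity(doc_a, doc_b), jaccard(doc_a, doc_b), dice(doc_a, doc_b))
print(jaccard(char_ngrams("night"), char_ngrams("nacht")))
```

The two example sentences share three of seven distinct words, so the Jaccard value is 3/7 while the count-based cosine is 0.75: the repeated word "the" raises the cosine but not the set-based scores.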
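| | Distance-based | Content-based | Property-based | Hybrid |
| --- | --- | --- | --- | --- |
| Basic principle | Semantic distance is expressed by the path length between concepts. | Semantic similarity is quantified by the information that the concept words share. | Similarity is measured by the number of properties that the concept words have in common. | Combines the distance-based, content-based and property-based methods to compute the similarity between concepts. |
| Representative methods | Shortest Path [38], Wu et al. [39], Weighted Links [40], Li et al. [41], Liu Qun et al. [10] | Lin [42], Resnik [43], Lord et al. [44], Bian Zhenxing [45] | Tversky [46] | Ge Bin et al. [47], Wang Yanna et al. [48], Li Wenqing et al. [49] |
| Characteristics | The computation adds influencing factors such as node depth, density, strength and width, and the level in the classification hierarchy. | The computation uses the information content of different nodes and different formulas for expressing information content. | The result depends on the completeness of the ontology's property set. | The weight parameters in the computation mostly have to be set by domain experts. |

Table: Ontology-based methods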
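To make the distance-based and property-based columns concrete, here is a small Python sketch on a toy is-a hierarchy. The taxonomy, the property sets and all function names are invented for illustration. The Wu-Palmer style score uses the commonly cited form $2\cdot depth(lcs)/(depth(c_1)+depth(c_2))$ with the root at depth 1, and the property-based score uses Tversky's ratio model; neither is claimed to match the exact variants in the cited papers.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def ancestors(concept: str) -> list:
    """Path from the concept up to the root, both ends included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(ancestors(concept))


def lowest_common_subsumer(c1: str, c2: str) -> str:
    """Deepest concept that is an ancestor of (or equal to) both inputs."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):
        if node in seen:
            return node
    raise ValueError("concepts do not share a root")


def path_length(c1: str, c2: str) -> int:
    """Number of edges on the shortest path between c1 and c2 in the tree."""
    lcs_depth = depth(lowest_common_subsumer(c1, c2))
    return (depth(c1) - lcs_depth) + (depth(c2) - lcs_depth)


def wu_palmer(c1: str, c2: str) -> float:
    """Depth-normalised similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    lcs_depth = depth(lowest_common_subsumer(c1, c2))
    return 2 * lcs_depth / (depth(c1) + depth(c2))


def tversky(props_a: set, props_b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Ratio model: common / (common + alpha * only_a + beta * only_b)."""
    common = len(props_a & props_b)
    denom = common + alpha * len(props_a - props_b) + beta * len(props_b - props_a)
    return common / denom if denom else 0.0


print(path_length("dog", "cat"), wu_palmer("dog", "cat"))
print(path_length("dog", "sparrow"), wu_palmer("dog", "sparrow"))
print(tversky({"fur", "tail", "barks"}, {"fur", "tail", "meows"}, alpha=1, beta=1))
```

In this toy hierarchy "dog" and "cat" meet at "mammal" (two edges apart), while "dog" and "sparrow" only meet at "animal" (four edges apart), so both path-based scores rank the first pair as more similar. The `alpha` and `beta` weights stand in for the kind of parameters that, according to the table, usually have to be set by domain experts.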
[1] Gomaa W H, Fahmy A A. A Survey of Text Similarity Approaches[J]. International Journal of Computer Applications, 2013, 68(13): 13-18. doi: 10.5120/11638-7118
[2] Pradhan N, Gyanchandani M, Wadhvani R. A Review on Text Similarity Technique Used in IR and Its Application[J]. International Journal of Computer Applications, 2015, 120(9): 29-34. doi: 10.5120/21257-4109
[3] Qin Chunxiu, Zhao Pengwei, Liu Huailiang. Research on Word Similarity Measurement[J]. Information Studies: Theory & Application, 2007, 30(1): 105-108. (in Chinese)
[4] Liu Ping, Chen Ye. Survey of the State of the Art in Word Similarity[J]. New Technology of Library and Information Service, 2012(7-8): 82-89. (in Chinese)
[5] Li Hui. A Review on the Research of Word Similarity Algorithms[J]. Journal of Modern Information, 2015, 35(4): 172-177. (in Chinese)
[6] Han Pu, Wang Dongbo, Wang Zimin. Research Advancement in Word Similarity Calculation and Mining[J]. Information Science, 2016, 34(9): 161-165. (in Chinese)
[7] Sun Haixia, Qian Qing, Cheng Ying. Review of Ontology-based Semantic Similarity Measuring[J]. New Technology of Library and Information Service, 2010(1): 51-56. (in Chinese)
[8] Liu Hongzhe, Xu De. Ontology Based Semantic Similarity and Relatedness Measures Review[J]. Computer Science, 2012, 39(2): 8-13. (in Chinese) doi: 10.3969/j.issn.1002-137X.2012.02.002
[9] Lin D. An Information-theoretic Definition of Similarity[C]// Proceedings of the 15th International Conference on Machine Learning. 1998.
[10] Liu Qun, Li Sujian. Word Similarity Computing Based on How-Net[J]. Chinese Computational Linguistics, 2002, 7(2): 59-76. (in Chinese)
[11] Dong Zhendong, Dong Qiang. HowNet[EB/OL]. [2016-12-08]. (in Chinese)
[12] Tian Jiule, Zhao Wei. Words Similarity Algorithm Based on Tongyici Cilin in Semantic Web Adaptive Learning System[J]. Journal of Jilin University: Information Science Edition, 2010, 28(6): 602-608. (in Chinese) doi: 10.3969/j.issn.1671-5896.2010.06.011
[13] Zhang Huanjiong, Wang Guosheng, Zhong Yixin. Text Similarity Computing Based on Hamming Distance[J]. Computer Engineering and Applications, 2001, 37(19): 21-22. (in Chinese)
[14] Dice L R. Measures of the Amount of Ecologic Association Between Species[J]. Ecology, 1945, 26(3): 297-302.
[15] Harris Z S. Distributional Structure[A]// Papers in Structural and Transformational Linguistics[M]. Springer, Dordrecht, 1970.
[16] Salton G, Wong A, Yang C S. A Vector Space Model for Automatic Indexing[J]. Communications of the ACM, 1975, 18(11): 613-620. doi: 10.1145/361219.361220
[17] Guo Qinglin, Li Yanmei, Tang Qi. Similarity Computing of Documents Based on VSM[J]. Application Research of Computers, 2008, 25(11): 3256-3258. (in Chinese) doi: 10.3969/j.issn.1001-3695.2008.11.015
[18] Li Lian, Zhu Aihong, Su Tao. Research and Implementation of An Improved VSM-based Text Similarity Algorithm[J]. Computer Applications and Software, 2012, 29(2): 282-284. (in Chinese) doi: 10.3969/j.issn.1000-386X.2012.02.082
[19] Landauer T K, Dumais S T. A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge[J]. Psychological Review, 1997, 104(2): 211-240. doi: 10.1037//0033-295X.104.2.211
[20] Hofmann T. Probabilistic Latent Semantic Analysis[C]// Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. 1999.
[21] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[22] Wang Zhenzhen, He Ming, Du Yongping. Text Similarity Computing Based on Topic Model LDA[J]. Computer Science, 2013, 40(12): 229-232. (in Chinese) doi: 10.3969/j.issn.1002-137X.2013.12.049
[23] Xiong Daping, Wang Jian, Lin Hongfei. An LDA-based Approach to Finding Similar Questions for Community Question Answer[J]. Journal of Chinese Information Processing, 2012, 26(5): 40-45. (in Chinese) doi: 10.3969/j.issn.1003-0077.2012.05.007
[24] Zhang Chao, Chen Li, Li Qiong. Chinese Text Similarity Algorithm Based on PST_LDA[J]. Application Research of Computers, 2016, 33(2): 375-377, 383. (in Chinese) doi: 10.3969/j.issn.1001-3695.2016.02.012
[25] Hinton G E. Learning Distributed Representations of Concepts[C]// Proceedings of the 8th Annual Conference of the Cognitive Science Society. 1986.
[26] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3(6): 1137-1155. doi: 10.1007/3-540-33486-6_6
[27] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013.
[28] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[29] Kenter T, Rijke M D. Short Text Similarity with Word Embeddings[C]// Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015: 1411-1420.
[30] Kusner M J, Sun Y, Kolkin N I, et al. From Word Embeddings to Document Distances[C]// Proceedings of the 32nd International Conference on Machine Learning. 2015.
[31] Huang G, Guo C, Kusner M J, et al. Supervised Word Mover’s Distance[C]// Proceedings of the 30th Conference on Neural Information Processing Systems. 2016.
[32] Cilibrasi R L, Vitanyi P M B. The Google Similarity Distance[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 370-383.
[33] Liu Shengjiu, Li Tianrui, Jia Zhen, et al. Research and Application of Similarity Based on Search Engine[J]. Computer Science, 2014, 41(4): 211-214. (in Chinese) doi: 10.3969/j.issn.1002-137X.2014.04.044
[34] Sahami M, Heilman T D. A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets[C]// Proceedings of the 15th International Conference on World Wide Web. 2006: 377-386.
[35] Chen Haiyan. Measuring Semantic Similarity Between Words Using Web Search Engines[J]. Computer Science, 2015, 42(1): 261-267. (in Chinese)
[36] Hliaoutakis A. Semantic Similarity Measures in MeSH Ontology and Their Application to Information Retrieval on Medline[EB/OL]. [2016-12-08].
[37] Batet M, Sanchez D, Valls A. An Ontology-based Measure to Compute Semantic Similarity in Biomedicine[J]. Journal of Biomedical Informatics, 2011, 44(1): 118-125. doi: 10.1016/j.jbi.2010.09.002 pmid: 20837160
[38] Rada R, Mili H, Bicknell E, et al. Development and Application of a Metric on Semantic Nets[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1989, 19(1): 17-30. doi: 10.1109/21.24528
[39] Wu Z, Palmer M. Verb Semantic and Lexical Selection[C]// Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. 1994: 133-138.
[40] Richardson R, Smeaton A F, Murphy J. Using WordNet as a Knowledge Base for Measuring Semantic Similarity Between Words[EB/OL]. [2016-12-08].
[41] Li Y, Bandar Z A, McLean D. An Approach for Measuring Semantic Similarity Between Words Using Multiple Information Sources[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(4): 871-882. doi: 10.1109/TKDE.2003.1209005
[42] Lin D. Principle-based Parsing without Overgeneration[C]// Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. 1993.
[43] Resnik P. Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language[J]. Journal of Artificial Intelligence Research, 1999, 11: 95-130. doi: 10.1613/jair.514
[44] Lord P W, Stevens R D, Brass A, et al. Investigating Semantic Similarity Measures across the Gene Ontology: The Relationship Between Sequence and Annotation[J]. Bioinformatics, 2003, 19(10): 1275-1283. doi: 10.1093/bioinformatics/btg153 pmid: 12835272
[45] Bian Zhenxing. Research on Model of IC Parameter for Semantic Similarity of Concept in WordNet[J]. Computer Engineering and Applications, 2011, 47(19): 128-131. (in Chinese) doi: 10.3778/j.issn.1002-8331.2011.19.035
[46] Tversky A. Features of Similarity[J]. Psychological Review, 1977, 84(4): 327-352.
[47] Ge Bin, Li Fangfang, Guo Silu, et al. Word’s Semantic Similarity Computation Method Based on Hownet[J]. Application Research of Computers, 2010, 27(9): 3329-3333. (in Chinese) doi: 10.3969/j.issn.1001-3695.2010.09.034
[48] Wang Yanna, Zhou Zili, He Yan. Concept Semantic Similarity Algorithm in WordNet Based on Information Content[J]. Computer Engineering, 2011, 37(22): 42-44. (in Chinese) doi: 10.3969/j.issn.1000-3428.2011.22.011
[49] Li Wenqing, Sun Xin, Zhang Changyou, et al. A Semantic Similarity Measure Between Ontological Concepts[J]. Acta Automatica Sinica, 2012, 38(2): 229-235. (in Chinese) doi: 10.3724/SP.J.1004.2012.00229
[50] Sun Chenchen, Shen Derong, Shan Jing, et al. WSR: A Semantic Relatedness Measure Based on Wikipedia Structure[J]. Chinese Journal of Computers, 2012, 35(11): 2361-2370. (in Chinese) doi: 10.3724/SP.J.1016.2012.02361
[51] Strube M, Ponzetto S P. WikiRelate! Computing Semantic Relatedness Using Wikipedia[C]// Proceedings of the 21st National Conference on Artificial Intelligence. 2006.
[52] Gabrilovich E, Markovitch S. Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis[C]// Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007.
[53] Milne D, Witten I H. An Effective, Low-cost Measure of Semantic Relatedness Obtained from Wikipedia Links[C]// Proceedings of the 23rd Association for the Advancement of Artificial Intelligence. 2008.
[54] Sheng Zhichao, Tao Xiaopeng. Semantic Similarity Computing Method Based on Wikipedia[J]. Computer Engineering, 2011, 37(7): 193-195. (in Chinese) doi: 10.3969/j.issn.1000-3428.2011.07.065
[55] Peng Lizhen, Wu Yangyang. Semantic Similarity Computing Based on Community Mining of Wikipedia[J]. Computer Science, 2016, 43(4): 45-49. (in Chinese) doi: 10.11896/j.issn.1002-137X.2016.4.009
[56] Lizorkin D, Medelyan O, Grineva M. Analysis of Community Structure in Wikipedia[C]// Proceedings of the 18th International Conference on World Wide Web. 2009: 1221-1222.
[57] Zhan Zhijian, Liang Li’na, Yang Xiaoping. Word Similarity Measurement Based on BaiduBaike[J]. Computer Science, 2013, 40(6): 199-202. (in Chinese) doi: 10.3969/j.issn.1002-137X.2013.06.043
[58] Yin Kun, Yin Hongfeng, Yang Yan, et al. Semantic Similarity Computation of Baidu Encyclopedia Entries Based on SimRank[J]. Journal of Shandong University: Engineering Science, 2014, 44(3): 29-35. (in Chinese) doi: 10.6040/j.issn.1672-3961.2.2013.282
[59] Sui Zhifang, Yu Shiwen. The Skeletal-Dependency-Tree-Based Computational Model for the Sentence Similarity[C]// Proceedings of the International Conference on Chinese Computing. 1998. (in Chinese)
[60] Li Bin, Liu Ting, Qin Bing, et al. Chinese Sentence Similarity Computing Based on Semantic Dependency Relationship Analysis[J]. Application Research of Computers, 2003, 20(12): 15-17. (in Chinese) doi: 10.3969/j.issn.1001-3695.2003.12.005
[61] Li Ru, Wang Zhiqiang, Li Shuanghong, et al. Chinese Sentence Similarity Computing Based on Frame Semantic Parsing[J]. Journal of Computer Research and Development, 2013, 50(8): 1728-1736. (in Chinese)
[62] Blanco E, Moldovan D. A Semantic Logic-Based Approach to Determine Textual Similarity[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(4): 683-693. doi: 10.1109/TASLP.2015.2403613
[63] Jiang J J, Conrath D W. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy[C]// Proceedings of the International Conference on Research in Computational Linguistics. 1997.
[64] Islam A, Inkpen D. Semantic Text Similarity Using Corpus-based Word Similarity and String Similarity[J]. ACM Transactions on Knowledge Discovery from Data, 2008, 2(2): 1-25. doi: 10.1145/1376815.1376819
[65] Tasi C S, Huang Y M, Liu C H, et al. Applying VSM and LCS to Develop an Integrated Text Retrieval Mechanism[J]. Expert Systems with Applications, 2012, 39(4): 3974-3982. doi: 10.1016/j.eswa.2011.09.039
[66] Wei Wei, Xiang Yang, Chen Qian. Combined Measurement Approach for Semantic Similarity of Terms[J]. Journal of Computer Applications, 2010, 30(6): 1668-1670. (in Chinese)
[67] Liu G, Wang R, Buckley J, et al. A WordNet-based Semantic Similarity Measure Enhanced by Internet-based Knowledge[C]// Proceedings of the International Conference on Software Engineering & Knowledge Engineering. 2011.
[68] Wang Xiaolin, Xiao Hui, Tai Weipeng. Research on Text Similarity Detection System Based on Hadoop[J]. Computer Technology and Development, 2015, 25(8): 90-93. (in Chinese)
[69] Atoum I, Otoom A. Efficient Hybrid Semantic Text Similarity Using Wordnet and a Corpus[J]. International Journal of Advanced Computer Science and Applications, 2016, 7(9): 124-130. doi: 10.14569/IJACSA.2016.070917