%A Chen Erjing, Jiang Enbo
%T Review of Studies on Text Similarity Measures
%0 Journal Article
%D 2017
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.2096-3467.2017.06.01
%P 1-11
%V 1
%N 6
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4392.shtml}
%8 2017-06-25
%X

[Objective] This paper analyzes popular text similarity measures and discusses their latest developments. [Coverage] We retrieved 69 key articles from the CNKI and Web of Science databases by searching “TI: ‘text similarity’ or ‘semantic similarity’ or ‘lexical similarity’” in Chinese and English, respectively. [Methods] We systematically reviewed the text similarity measures, focusing on their basic concepts, characteristics, and future directions. [Results] Text similarity measures fall into four types: string-based, corpus-based, knowledge-based, and others. Neural network-based measures, knowledge-based measures, and interdisciplinary measures could be future research directions. [Limitations] We did not discuss the applications of these measures. [Conclusions] This paper provides a comprehensive review of research on text similarity measures.