Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (6): 1-11    DOI: 10.11925/infotech.2096-3467.2017.06.01
Review of Studies on Text Similarity Measures
Erjing Chen1,2, Enbo Jiang1
1Chengdu Documentation and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2University of Chinese Academy of Sciences, Beijing 100049, China
[Objective] This paper analyzes widely used text similarity measures and discusses their latest developments. [Coverage] We retrieved 69 key articles from the CNKI and Web of Science databases by searching “TI: ‘text similarity’ or ‘semantic similarity’ or ‘lexical similarity’” in Chinese and English, respectively. [Methods] We systematically reviewed text similarity measures, focusing on their basic concepts, characteristics, and future directions. [Results] Text similarity measures fall into four types: string-based, corpus-based, knowledge-based, and others. Neural network-based measures, knowledge-based measures, and interdisciplinary measures are likely directions for future research. [Limitations] We did not discuss the applications of these measures. [Conclusions] This paper provides a comprehensive review of research on text similarity measures.

Key words: Text Similarity, Semantic Similarity, Ontology, Bag of Words Model, Neural Network
Received: 09 May 2017      Published: 25 August 2017

Erjing Chen, Enbo Jiang. Review of Studies on Text Similarity Measures. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11.

