Data Analysis and Knowledge Discovery  2016, Vol. 32 Issue (12): 9-16    DOI: 10.11925/infotech.1003-3513.2016.12.02
A New Text Clustering Method Based on Semantic Similarity
Qiang Bi1, Jian Liu1, Yulai Bao1,2
1School of Management, Jilin University, Changchun 130022, China
2Inner Mongolia University Library, Hohhot 010021, China
[Objective] This paper proposes a text clustering algorithm based on semantic similarity to extract more information from textual resources. [Methods] First, we calculated the semantic similarity of words using the Tongyici Cilin Extended Edition (an extended dictionary of synonyms) and built a semantic similarity matrix. Second, we clustered the texts based on this matrix. [Results] The proposed algorithm was evaluated on text corpora from Fudan University and the search engine Sogou. Compared with traditional methods, the proposed algorithm achieved the highest precision rates and purity values (cluster number = 10). [Limitations] Some similarity calculation results were manually adjusted because of the incomplete coverage of the Tongyici Cilin Extended Edition. [Conclusions] The proposed algorithm can extract more latent information from texts and is an effective method for clustering and recommending textual documents.
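The two steps in [Methods] and the purity measure in [Results] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the word-similarity formula nor the way word scores are combined into text scores, so the toy lookup table `TOY_WORD_SIM`, the best-match averaging in `text_sim`, and all function names here are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a thesaurus lookup. The paper derives word
# similarity from the Tongyici Cilin Extended Edition; these values are made up.
TOY_WORD_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("bank", "finance")): 0.7,
}

def word_sim(a, b):
    """Similarity in [0, 1]: identical words score 1, unknown pairs score 0."""
    if a == b:
        return 1.0
    return TOY_WORD_SIM.get(frozenset((a, b)), 0.0)

def text_sim(doc_a, doc_b):
    """Symmetric average of best-match word similarities (assumed aggregation)."""
    if not doc_a or not doc_b:
        return 0.0
    a_to_b = sum(max(word_sim(w, v) for v in doc_b) for w in doc_a) / len(doc_a)
    b_to_a = sum(max(word_sim(v, w) for w in doc_a) for v in doc_b) / len(doc_b)
    return (a_to_b + b_to_a) / 2

def similarity_matrix(docs):
    """Pairwise text similarity for a list of tokenized documents."""
    n = len(docs)
    return [[text_sim(docs[i], docs[j]) for j in range(n)] for i in range(n)]

def purity(cluster_ids, true_labels):
    """Fraction of documents that fall in the majority class of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)

docs = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["bank", "loan"],
    ["finance", "loan"],
]
S = similarity_matrix(docs)
```

In this toy example the first two documents share no surface form for "car" and "automobile", yet they score higher against each other than against the finance documents, which is the kind of latent relation a synonym dictionary is meant to expose. A matrix like `S` could then be handed to any clustering routine that accepts a precomputed affinity matrix, and the resulting cluster assignments scored with `purity`.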

Key words: Tongyici Cilin Extended Edition; Semantic similarity; Spectral clustering; Text mining
Received: 12 September 2016      Published: 22 January 2017

Cite this article:

Qiang Bi, Jian Liu, Yulai Bao. A New Text Clustering Method Based on Semantic Similarity. Data Analysis and Knowledge Discovery, 2016, 32(12): 9-16.

