%0 Journal Article
%A Qiang Bi, Jian Liu, Yulai Bao
%T A New Text Clustering Method Based on Semantic Similarity
%D 2016
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.1003-3513.2016.12.02
%P 9-16
%V 32
%N 12
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4307.shtml}
%8 2016-12-25
%X

[Objective] This paper proposes a text clustering algorithm based on semantic similarity to extract more information from textual resources. [Methods] First, we calculated the semantic similarity of words using the Extended Dictionary of Synonyms (Tongyici Cilin Extended Edition) and built a semantic similarity matrix. Second, we clustered the texts based on this semantic similarity matrix. [Results] The proposed algorithm was evaluated on text corpora from Fudan University and the search engine Sogou. Compared with traditional methods, the proposed algorithm achieved the highest precision rates and purity values (number of clusters = 10). [Limitations] Some similarity calculation results were manually adjusted because the coverage of the Tongyici Cilin Extended Edition is incomplete. [Conclusions] The proposed algorithm can extract more latent information from texts, making it an effective method for clustering and recommending textual documents.
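
[Illustration] The two steps in [Methods] and the purity measure in [Results] can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `TOY_CODES` table and its codes are invented stand-ins for a synonym dictionary, the prefix-overlap word similarity and best-match text similarity are assumed formulas, and average-link agglomerative clustering is swapped in because the abstract does not name the clustering procedure.

```python
from itertools import combinations

# Hypothetical dictionary codes (made up for this sketch): words that
# share a longer code prefix are treated as closer in meaning.
TOY_CODES = {
    "car": "Bo21A01", "automobile": "Bo21A01", "truck": "Bo21A05",
    "engine": "Bo25B02",
    "football": "Hj19C01", "soccer": "Hj19C01", "goal": "Hj19C07",
    "match": "Hj19D03",
}

def word_similarity(w1, w2):
    """Assumed measure: shared leading code characters / code length."""
    if w1 == w2:
        return 1.0
    c1, c2 = TOY_CODES.get(w1), TOY_CODES.get(w2)
    if c1 is None or c2 is None:
        return 0.0  # word not covered by the dictionary
    shared = 0
    for a, b in zip(c1, c2):
        if a != b:
            break
        shared += 1
    return shared / max(len(c1), len(c2))

def text_similarity(doc_a, doc_b):
    """Symmetric average of each word's best match in the other text."""
    def directed(src, dst):
        return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)
    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2

def similarity_matrix(docs):
    """Step 1: pairwise semantic similarity between all texts."""
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = text_similarity(docs[i], docs[j])
    return sim

def cluster(sim, k):
    """Step 2: merge the most similar pair of clusters until k remain."""
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        # Average pairwise similarity between two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # x < y, so index x is unaffected
    return clusters

def purity(clusters, labels):
    """Fraction of texts that belong to the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    hits = 0
    for c in clusters:
        members = [labels[i] for i in c]
        hits += max(members.count(label) for label in set(members))
    return hits / total

# Toy tokenized texts with known categories (illustrative data only).
docs = [["car", "engine"], ["automobile", "truck"],
        ["football", "goal"], ["soccer", "match"]]
labels = ["auto", "auto", "sport", "sport"]

sim = similarity_matrix(docs)
clusters = cluster(sim, 2)
print(clusters, purity(clusters, labels))
```

On this toy data the two vehicle texts and the two sports texts fall into separate clusters even though they share no surface words, which is the kind of latent information a purely term-matching representation would miss. Words missing from the dictionary score 0.0 here, which mirrors the coverage problem noted in [Limitations].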