New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 88-92    DOI: 10.11925/infotech.1003-3513.2013.09.14
Research on Short Text Clustering Algorithm for User Generated Content
Zhao Hui, Liu Huailiang
School of Economics & Management, Xidian University, Xi’an 710071, China
Abstract  Short-text features in user-generated content describe semantics weakly, and the traditional K-means algorithm for document clustering is sensitive to the choice of initial cluster centers. To address both problems, this paper supplements the semantic feature information of short texts through feature extension based on the concepts, link structure, and category system of Wikipedia. A weighted complex network of the short-text set is then built from the semantic relations between texts, and clustering is achieved by partitioning the network nodes into communities with a K-means algorithm whose initial cluster centers are chosen according to the synthetic characteristics of the network nodes. Experimental results show that the proposed algorithm improves the effectiveness of short-text clustering.
Key words  Short text clustering      Feature extension      Complex network      K-means algorithm      User generated content     
Received: 02 July 2013      Published: 27 September 2013
Zhao Hui, Liu Huailiang. Research on Short Text Clustering Algorithm for User Generated Content. New Technology of Library and Information Service, 2013, 29(9): 88-92.
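
The pipeline outlined in the abstract (build a weighted similarity network, choose K-means seeds from node characteristics, then cluster) might be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not define the "synthetic characteristics" of a node, so node strength (the sum of incident edge weights) is used here as a stand-in. The Wikipedia-based feature extension is omitted and the texts are assumed to be already represented as sparse term-weight dictionaries. K-means is run on the text vectors rather than on the network itself, which is a simplification. All function names, the similarity threshold, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_network(vectors, threshold=0.1):
    """Weighted graph: link texts i and j when their similarity exceeds
    the (assumed) threshold; the similarity is the edge weight."""
    weights = defaultdict(dict)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s > threshold:
                weights[i][j] = s
                weights[j][i] = s
    return weights


def pick_initial_centers(vectors, weights, k):
    """Choose k seeds by node strength (assumed stand-in for the paper's
    'synthetic characteristics'), skipping neighbours of earlier seeds."""
    strength = {i: sum(weights[i].values()) for i in range(len(vectors))}
    ranked = sorted(strength, key=lambda i: (-strength[i], i))
    seeds = []
    for i in ranked:
        if len(seeds) < k and all(i not in weights[s] for s in seeds):
            seeds.append(i)
    # Fall back to the strongest remaining nodes if too few were found.
    for i in ranked:
        if len(seeds) < k and i not in seeds:
            seeds.append(i)
    return seeds


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in members:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, seeds, max_iter=20):
    """K-means with cosine similarity, started from the given seed texts."""
    centers = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels


# Toy data (hypothetical): two groups of short texts with disjoint vocabularies.
docs = [
    {"wiki": 1, "concept": 1},
    {"wiki": 1, "concept": 1, "link": 1},
    {"wiki": 1, "link": 1},
    {"kmeans": 1, "center": 1},
    {"kmeans": 1, "center": 1, "seed": 1},
    {"kmeans": 1, "seed": 1},
]
net = build_network(docs)
seeds = pick_initial_centers(docs, net, k=2)
labels = kmeans(docs, seeds)
print(seeds, labels)
```

Seeding from well-connected but mutually non-adjacent nodes is one plausible way to reduce the sensitivity to initial centers that the abstract mentions. The selection criterion actually used in the paper may differ.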

