%A Gu Xiaoxue, Zhang Chengzhi %T Using Content and Tags for Web Text Clustering %0 Journal Article %D 2014 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.1003-3513.2014.11.07 %P 45-52 %V 30 %N 11 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_3972.shtml} %8 2014-11-25 %X

[Objective] This paper explores how combining social tags with text content influences the clustering of Web text. [Methods] Using English and Chinese blogs as examples, the paper applies TF×IDF, TextRank and TextRank×IDF as text feature extraction methods. It combines tag similarity with text content similarity using two weighting methods, linear and Sigmoid, and clusters the samples with the AP (Affinity Propagation) clustering algorithm. [Results] The results show that … performs best in clustering among the three feature extraction methods. Weighting content similarity with tag similarity improves the clustering of English blogs to varying degrees. It does not improve the clustering of Chinese blogs when the Sigmoid method is used. Of the two similarity-weighting methods, the linear method performs better than the Sigmoid method. [Limitations] The authors did not find the optimal weighting coefficient between tag similarity and content similarity. The AP clustering algorithm cannot be applied to big data, and the large number of clusters it produces hinders visualization of the results. [Conclusions] Weighting the similarity of social tags together with the similarity of text content can improve the clustering of Web text.
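The abstract does not give the exact formulas, so the following is only a minimal sketch of the linear similarity weighting under stated assumptions. It assumes raw term frequency × log(N/df) for TF×IDF and cosine similarity for text content. It also assumes Jaccard overlap for tags and a combined score of alpha × content + (1 − alpha) × tag. The helper names, the Jaccard tag measure, the toy data and the coefficient `alpha` are hypothetical illustrations, not the paper's method. The Sigmoid weighting variant is not reconstructed because its form is not stated.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Assumed TF x IDF variant: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(a, b):
    """Hypothetical tag similarity: size of tag-set intersection over size of union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(docs, tags, alpha=0.5):
    """Linear weighting (assumed form): alpha * content similarity + (1 - alpha) * tag similarity."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    return [[alpha * cosine(vecs[i], vecs[j]) + (1 - alpha) * jaccard(tags[i], tags[j])
             for j in range(n)] for i in range(n)]

# Toy tokenized posts and their tags (illustrative data only).
docs = [["apple", "fruit", "juice"], ["apple", "fruit", "pie"],
        ["python", "code", "bug"], ["python", "code", "test"]]
tags = [["food"], ["food", "baking"], ["programming"], ["programming", "testing"]]
S = combined_similarity(docs, tags, alpha=0.5)
```

Here the first two posts share two of their three terms, giving a content cosine of 1/3, and share one of two tags, giving a Jaccard score of 1/2. With alpha = 0.5 their combined similarity is 5/12. The resulting matrix is the kind of precomputed similarity input that an affinity propagation implementation can accept, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`. Choosing `alpha` is exactly the open question the abstract lists as a limitation.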