New Technology of Library and Information Service  2016, Vol. 32 Issue (10): 50-58    DOI: 10.11925/infotech.1003-3513.2016.10.06
Clustering Blog Posts with Co-occurrence Analysis
Gong Kaile, Cheng Ying, Sun Jianjun
School of Information Management, Nanjing University, Nanjing 210023, China
[Objective] This study investigates the co-occurrence of blog comment contributors and explores its role in clustering blog posts. [Methods] We developed a two-step clustering method. First, we constructed the co-occurrence matrix of the contributors from different blog posts, transformed it into a correlation matrix, and completed the first-step clustering with the Affinity Propagation (AP) algorithm. Second, we calculated the terms’ position weights based on the centers of the AP clustering and completed the second-step clustering of blog post content with the K-means algorithm. [Results] The average precision and recall of the proposed method were 0.66 and 0.57, significantly higher than those of the traditional methods. [Limitations] The co-occurrence of blog comment contributors improved the quality of clustering, but it has limited value for blog posts with few comments. [Conclusions] The proposed method improves the quality of blog post clustering by combining term information with contributor co-occurrence. The two-step clustering method is a better option for selecting the initial cluster centers of the K-means algorithm.
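
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the co-occurrence matrix is post-by-post (counting commenters shared by two posts) and that the Ochiai coefficient is used as the correlation measure; the abstract specifies neither. The AP step and the position weighting of terms are omitted: the exemplar indices and the term vectors below are hypothetical stand-ins for their outputs, and all function names are illustrative.

```python
from math import sqrt

def cooccurrence_matrix(post_commenters):
    """Count commenters shared by each pair of posts.

    The diagonal holds the number of distinct commenters per post.
    """
    n = len(post_commenters)
    return [[len(post_commenters[i] & post_commenters[j]) for j in range(n)]
            for i in range(n)]

def ochiai_matrix(cooc):
    """Normalize counts to [0, 1] as c_ij / sqrt(c_ii * c_jj).

    Assumption: the abstract does not name the normalization it uses.
    """
    n = len(cooc)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = sqrt(cooc[i][i] * cooc[j][j])
            sim[i][j] = cooc[i][j] / denom if denom else 0.0
    return sim

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def seeded_kmeans(vectors, seed_indices, iterations=20):
    """K-means whose initial centers are the vectors at seed_indices."""
    centers = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda k: squared_distance(v, centers[k]))
                  for v in vectors]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (hypothetical): commenters per post.
posts = [{"ann", "bob", "cat"}, {"ann", "bob"},
         {"dan", "eve"}, {"dan", "eve", "fay"}]
similarity = ochiai_matrix(cooccurrence_matrix(posts))

# Hypothetical AP exemplars (posts 0 and 2) and term-weight vectors.
exemplars = [0, 2]
term_vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
labels = seeded_kmeans(term_vectors, exemplars)
```

In a fuller version, the similarity matrix would be passed to an AP implementation that accepts precomputed similarities, and the exemplars it returns would replace the hard-coded indices above.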

Key words: Co-occurrence analysis; Text clustering; Blog comments contributor; Initial cluster centers
Received: 04 May 2016      Published: 23 November 2016

Gong Kaile, Cheng Ying, Sun Jianjun. Clustering Blog Posts with Co-occurrence Analysis. New Technology of Library and Information Service, 2016, 32(10): 50-58.

