New Technology of Library and Information Service  2013, Vol. 29 Issue (2): 57-62    DOI: 10.11925/infotech.1003-3513.2013.02.09
Research on Chinese Micro-blog Bursty Topics Detection
Wang Yong1, Xiao Shibin1,2, Guo Yixiu1, Lv Xueqiang1,2
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Abstract  Accurately and efficiently mining bursty topics from micro-blogs has attracted considerable attention. In this paper, a set of burst terms is extracted by counting term frequencies, calculating the growth rate of each term, and weighting the terms with the Term Frequency-Proportional Document Frequency (TF-PDF) algorithm. Micro-blog texts are then represented by these burst terms. Based on an analysis of how bursty topics propagate on the micro-blog platform, the authors filter out texts that do not contribute to bursty topic detection. The paper proposes a novel clustering strategy, “Absolute Clustering”, to cluster the micro-blog texts. The hotness of each text is computed as a weighted combination of its reply and retweet counts, and the top 5 texts are extracted as the result of bursty topic detection. Experiments show a precision of 92.60%, a recall of 85.51%, and an F-measure of 0.89. A comparison with the traditional method demonstrates the validity of the proposed method.
Key words  Bursty topics      Burst terms      Filter      Absolute clustering
Received: 18 January 2013      Published: 24 April 2013
Wang Yong, Xiao Shibin, Guo Yixiu, Lv Xueqiang. Research on Chinese Micro-blog Bursty Topics Detection. New Technology of Library and Information Service, 2013, 29(2): 57-62.
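
The abstract names two computable steps: weighting terms with TF-PDF, and ranking texts by a weighted combination of reply and retweet counts. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It uses the commonly cited TF-PDF form, in which a term's weight is the sum over channels of its normalized frequency in the channel multiplied by exp(n/N), where n is the number of documents in the channel containing the term and N is the channel's document count. The abstract does not say what a "channel" is for micro-blog data, so the grouping is left to the caller. The function names `tf_pdf` and `top_hot`, the field names `replies` and `retweets`, and the equal weights of 0.5 are all hypothetical.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Return {term: TF-PDF weight}.

    channels: list of channels; each channel is a list of documents;
    each document is a list of already-segmented terms.
    (How posts are grouped into channels is an assumption of this sketch.)
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        # Raw term frequency within this channel.
        tf = Counter(term for doc in docs for term in doc)
        # Number of documents in this channel that contain each term.
        df = Counter(term for doc in docs for term in set(doc))
        # Normalize frequencies by the channel's Euclidean norm.
        norm = math.sqrt(sum(f * f for f in tf.values()))
        if norm == 0:
            continue
        for term, f in tf.items():
            weights[term] += (f / norm) * math.exp(df[term] / len(docs))
    return dict(weights)


def top_hot(posts, k=5, w_reply=0.5, w_retweet=0.5):
    """Rank posts by a weighted sum of reply and retweet counts.

    The weights are placeholders; the abstract does not state the values used.
    """
    def hotness(post):
        return w_reply * post["replies"] + w_retweet * post["retweets"]

    return sorted(posts, key=hotness, reverse=True)[:k]


if __name__ == "__main__":
    channels = [
        [["quake", "news"], ["quake"]],
        [["quake", "sale"]],
    ]
    print(tf_pdf(channels))
    posts = [
        {"id": 1, "replies": 10, "retweets": 0},
        {"id": 2, "replies": 0, "retweets": 30},
        {"id": 3, "replies": 2, "retweets": 2},
    ]
    print([p["id"] for p in top_hot(posts, k=2)])
```

A term that occurs in a large share of the documents in several channels accumulates a high weight, which is why TF-PDF is often used to surface terms tied to widely discussed topics. The growth-rate filter, the text filtering step, and "Absolute Clustering" are not sketched here because the abstract does not define them.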

