New Technology of Library and Information Service  2013, Vol. 29 Issue (2): 57-62    DOI: 10.11925/infotech.1003-3513.2013.02.09
Research on Chinese Micro-blog Bursty Topics Detection
Wang Yong1, Xiao Shibin1,2, Guo Yixiu1, Lv Xueqiang1,2
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Abstract  Mining bursty topics accurately and efficiently from micro-blogs has attracted much attention. In this paper, a set of burst terms is extracted by counting term frequency, calculating each term's growth rate, and weighting the terms with the Term Frequency-Proportional Document Frequency (TF-PDF) algorithm. Micro-blog texts are then represented by these burst terms. Based on an analysis of how bursty topics propagate on the micro-blog platform, the authors filter out texts that do not contribute to bursty topic detection. The paper proposes a novel clustering strategy, "Absolute Clustering", to cluster the micro-blog texts. The hotness of each text is computed as a weighted combination of its reply and retweet counts, and the top 5 texts are extracted as the result of bursty topic detection. Experiments show a precision of 92.60%, a recall of 85.51%, and an F-measure of 0.89. A comparison with the traditional method demonstrates the validity of the proposed method.
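The scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TF-PDF weight follows the commonly cited formulation by Bun and Ishizuka, while the grouping of posts into "channels", the smoothed growth-rate definition, the equal reply/retweet weights, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_pdf_weights(channels):
    """TF-PDF weight per term.

    channels: list of channels; each channel is a list of documents;
    each document is a list of terms. How the paper groups micro-blog
    posts into channels is not stated in the abstract (assumption).
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        tf = Counter(t for doc in docs for t in doc)       # term frequency in channel
        df = Counter(t for doc in docs for t in set(doc))  # documents containing term
        norm = math.sqrt(sum(f * f for f in tf.values()))  # L2 norm of frequencies
        for term, f in tf.items():
            weights[term] += (f / norm) * math.exp(df[term] / len(docs))
    return dict(weights)


def growth_rate(prev_count, curr_count):
    """Relative growth between two time windows.

    Add-one smoothing in the denominator is an assumption, used here
    to avoid division by zero for terms unseen in the previous window.
    """
    return (curr_count - prev_count) / (prev_count + 1)


def hot_score(replies, retweets, w_reply=0.5, w_retweet=0.5):
    """Weighted sum of reply and retweet counts (weights are hypothetical)."""
    return w_reply * replies + w_retweet * retweets


def top_k(posts, k=5):
    """Return the k posts with the highest hot score."""
    return sorted(
        posts,
        key=lambda p: hot_score(p["replies"], p["retweets"]),
        reverse=True,
    )[:k]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)
```

With the reported precision and recall, `f_measure(0.9260, 0.8551)` evaluates to roughly 0.889, which is consistent with the F-measure of 0.89 stated in the abstract.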
Key words: Bursty topics, Burst terms, Filter, Absolute clustering
Received: 18 January 2013      Published: 24 April 2013
CLC number: TP311.6

Cite this article:

Wang Yong, Xiao Shibin, Guo Yixiu, Lv Xueqiang. Research on Chinese Micro-blog Bursty Topics Detection. New Technology of Library and Information Service, 2013, 29(2): 57-62.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.02.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I2/57

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn