New Technology of Library and Information Service  2013, Vol. 29 Issue (2): 57-62    DOI: 10.11925/infotech.1003-3513.2013.02.09
Research on Chinese Micro-blog Bursty Topics Detection
Wang Yong1, Xiao Shibin1,2, Guo Yixiu1, Lv Xueqiang1,2
1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Abstract  Mining bursty topics from micro-blogs accurately and efficiently has attracted considerable attention. This paper extracts a set of burst terms by counting term frequency, calculating each term's growth rate, and weighting terms with the Term Frequency-Proportional Document Frequency (TF-PDF) algorithm. Micro-blog texts are then represented by these burst terms. Based on an analysis of how bursty topics propagate on micro-blog platforms, the authors filter out texts that do not contribute to bursty topic detection. The paper proposes a novel clustering strategy, “Absolute Clustering”, to cluster the micro-blog texts. The hotness of each text is computed as a weighted combination of its reply and retweet counts, and the top 5 texts are extracted as the detected bursty topics. Experiments show a precision of 92.60%, a recall of 85.51%, and an F-measure of 0.89. A comparison with the traditional method confirms the validity of the proposed method.
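The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the abstract uses the standard balanced F-measure (F1, the harmonic mean of precision and recall). The abstract does not state the formula, so the function name `f_measure` and the `beta` parameter are illustrative choices.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract: precision 92.60%, recall 85.51%.
f1 = f_measure(0.9260, 0.8551)
print(round(f1, 2))  # 0.89
```

Under that assumption the three reported figures are mutually consistent.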
Key words  Bursty topics, Burst terms, Filter, Absolute clustering
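The burst-term step described in the abstract can be illustrated with a minimal sketch. It assumes the TF-PDF weighting as commonly stated for the algorithm: a term's weight is the sum, over channels, of its normalized frequency in the channel times the exponential of the fraction of that channel's documents containing it. How the paper maps micro-blog data onto channels, and how it defines growth rate, is not given in the abstract. The channel grouping, the `growth_rate` formula, and its `smoothing` parameter are therefore hypothetical.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """TF-PDF term weights.

    channels: list of channels; each channel is a list of documents;
    each document is a list of tokens (already word-segmented).
    Returns a dict mapping term -> weight.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        # Frequency of each term in this channel.
        tf = Counter(tok for doc in docs for tok in doc)
        # Number of documents in this channel that contain each term.
        df = Counter(tok for doc in docs for tok in set(doc))
        # Euclidean norm of the channel's term-frequency vector.
        norm = math.sqrt(sum(f * f for f in tf.values()))
        for term, f in tf.items():
            weights[term] += (f / norm) * math.exp(df[term] / len(docs))
    return dict(weights)


def growth_rate(current, previous, smoothing=1.0):
    """Hypothetical growth rate of a term's frequency between two
    time windows; smoothing avoids division by zero for new terms."""
    return (current - previous) / (previous + smoothing)


# Toy data: two channels (an assumed grouping, e.g. two time slices).
channels = [
    [["quake", "rescue"], ["quake"]],
    [["quake", "concert"]],
]
w = tf_pdf(channels)
ranked = sorted(w, key=w.get, reverse=True)
print(ranked)
```

Terms that are both frequent and widespread within a channel receive the highest weight. A burst-term selector could then keep terms whose weight and growth rate both exceed chosen thresholds, though the thresholds used in the paper are not given here.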
Received: 18 January 2013      Published: 24 April 2013
CLC Number: TP311.6

Cite this article:

Wang Yong, Xiao Shibin, Guo Yixiu, Lv Xueqiang. Research on Chinese Micro-blog Bursty Topics Detection. New Technology of Library and Information Service, 2013, 29(2): 57-62.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.02.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I2/57
