New Technology of Library and Information Service  2016, Vol. 32 Issue (7-8): 12-20    DOI: 10.11925/infotech.1003-3513.2016.07.03
Original Article
A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm
Ding Shengchun, Gong Silan, Li Hongmei
School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract  

[Objective] This paper proposes a method for detecting bursty events in massive micro-blog streams accurately, efficiently, and in real time, in order to provide decision-making information for public opinion emergency management. [Methods] First, we introduced a reference time window mechanism and designed an algorithm that processes word frequency, document frequency, hashtags, and word frequency growth rates. Second, we used this dynamic-threshold-based algorithm to extract bursty words. Third, we transformed the micro-blog texts into feature vectors over the bursty words. Finally, we detected bursty events with an agglomerative hierarchical clustering algorithm. [Results] Compared with real-world cases, the detection method reached an accuracy of 80%, which indicates that the proposed method is feasible and effective. [Limitations] Owing to the limited data and scale of the current study, we could not describe the detected emergencies automatically. More research is needed to analyze users' emotions and the semantic relationships among bursty events. [Conclusions] Our study addresses gaps left by previous research and improves the efficiency of retrieving bursty events from micro-blog posts.
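The four steps listed under [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula or the threshold rule, so the growth score (smoothed frequency ratio against the reference window, weighted by document frequency), the threshold (mean plus k standard deviations of the scores), the binary feature vectors, the average-linkage cosine similarity, and all function and parameter names (bursty_words, to_vector, agglomerative, detect_events, k, min_sim) are assumptions chosen for illustration. Hashtag weighting is omitted for brevity, and posts are assumed to be already segmented into word lists.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def bursty_words(current_posts, reference_posts, k=0.5):
    """Return words whose growth score exceeds a dynamic threshold.

    Assumed score: (tf_now + 1) / (tf_ref + 1) * df_now.
    Assumed threshold: mean + k * population std of all scores.
    """
    tf_now = Counter(w for post in current_posts for w in post)
    df_now = Counter(w for post in current_posts for w in set(post))
    tf_ref = Counter(w for post in reference_posts for w in post)
    scores = {w: (tf_now[w] + 1) / (tf_ref[w] + 1) * df_now[w] for w in tf_now}
    if not scores:
        return []
    values = list(scores.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(w for w, s in scores.items() if s > threshold)


def to_vector(post, vocab):
    """Binary feature vector of a post over the bursty-word vocabulary."""
    return [1.0 if w in post else 0.0 for w in vocab]


def cosine(a, b):
    """Cosine similarity; zero vectors are treated as similar to nothing."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def agglomerative(vectors, min_sim=0.5):
    """Average-linkage agglomerative clustering over post indices.

    Repeatedly merges the most similar pair of clusters and stops once
    the best available similarity falls below min_sim.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        return mean(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (i, j) = max(
            (link(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < min_sim:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def detect_events(current_posts, reference_posts, k=0.5, min_sim=0.5):
    """Run the pipeline: bursty words, feature vectors, then clustering."""
    vocab = bursty_words(current_posts, reference_posts, k)
    vectors = [to_vector(post, vocab) for post in current_posts]
    return vocab, agglomerative(vectors, min_sim)
```

Calling detect_events with the posts of the current window and the posts of the reference window returns the extracted bursty words and a list of clusters, each a list of post indices that can be read as one candidate event. Posts that contain no bursty word map to zero vectors and therefore remain in singleton clusters. The pairwise search is quadratic per merge, which is acceptable for a sketch but not for a massive stream.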

Key words: Bursty events detection; Bursty topic words; Agglomerative hierarchical clustering algorithm; Public opinion; Micro-blog
Received: 12 June 2016      Published: 29 September 2016

Cite this article:

Ding Shengchun, Gong Silan, Li Hongmei. A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm. New Technology of Library and Information Service, 2016, 32(7-8): 12-20.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.07.03     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I7-8/12

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn