A New Method to Detect Bursty Events from Micro-blog Posts Based on Bursty Topic Words and Agglomerative Hierarchical Clustering Algorithm
Ding Shengchun, Gong Silan, Li Hongmei
School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China

Abstract [Objective] This paper proposes a new method to detect bursty events from massive volumes of micro-blog posts accurately, efficiently, and in real time, in order to provide decision-making information for public opinion emergency management. [Methods] First, we introduced a reference time window mechanism and designed a dynamic-threshold algorithm that processes word frequency, document frequency, hashtags, and word frequency growth rate. Second, we used this algorithm to extract bursty words. Third, we transformed the micro-blog texts into feature vectors over the bursty words. Finally, we detected bursty events with an agglomerative hierarchical clustering algorithm. [Results] Compared with real-world cases, the proposed bursty event detection method achieved an accuracy of 80%, which indicates that it is feasible and effective. [Limitations] Owing to the limited data and scope of the current study, the detected emergencies could not be described automatically. More research is needed to analyze users' emotions and the semantic relationships among bursty events. [Conclusions] This study fills knowledge gaps left by previous research and improves the efficiency of retrieving bursty events from micro-blog posts.
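The abstract names the four steps of the method but gives neither the burst-scoring formula nor the clustering parameters, so the Python sketch below is only one plausible reading of the pipeline, not the authors' implementation. All function names, formulas, and parameters in it are hypothetical: posts are assumed to be already word-segmented token lists; hashtags are assumed to be tokens starting with "#" and are given a weight of 1.5; the dynamic threshold is assumed to be the mean plus k = 2 standard deviations of a word's document frequency across the reference windows; growth is measured against the most recent reference window; posts become binary vectors over the bursty-word vocabulary; and clustering uses average-linkage cosine similarity with a stopping threshold of 0.5.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sketch of the abstract's pipeline: every formula and parameter
# (k, min_df, hashtag_weight, sim_threshold) is an illustrative assumption.
# A post is a list of pre-segmented tokens; tokens starting with "#" are hashtags.

def term_stats(posts):
    """Return (term frequency, document frequency) Counters for one window."""
    tf, df = Counter(), Counter()
    for tokens in posts:
        tf.update(tokens)
        df.update(set(tokens))
    return tf, df

def extract_bursty_words(current, references, k=2.0, min_df=2, hashtag_weight=1.5):
    """Flag a word as bursty when its hashtag-weighted document frequency exceeds
    a per-word dynamic threshold (mean + k * std of its document frequency over
    the reference windows) and its term frequency grew versus the latest
    reference window. Returns {word: burst score}."""
    tf_cur, df_cur = term_stats(current)
    tf_prev, _ = term_stats(references[-1])
    ref_dfs = [term_stats(window)[1] for window in references]
    bursty = {}
    for word, df in df_cur.items():
        if df < min_df:
            continue
        history = [ref_df[word] for ref_df in ref_dfs]  # unseen words count as 0
        threshold = mean(history) + k * pstdev(history)
        growth = (tf_cur[word] - tf_prev[word]) / (tf_prev[word] + 1)
        weight = hashtag_weight if word.startswith("#") else 1.0
        if weight * df > threshold and growth > 0:
            bursty[word] = weight * df * (1 + growth)
    return bursty

def to_vectors(posts, bursty):
    """Binary feature vectors over the bursty-word vocabulary; posts that
    contain no bursty word are dropped. Returns (kept post indices, vectors)."""
    vocab = sorted(bursty)
    kept, vectors = [], []
    for idx, tokens in enumerate(posts):
        present = set(tokens)
        vec = [1.0 if w in present else 0.0 for w in vocab]
        if any(vec):
            kept.append(idx)
            vectors.append(vec)
    return kept, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def agglomerative_clusters(vectors, sim_threshold=0.5):
    """Repeatedly merge the two most similar clusters until the best
    average cosine similarity falls below sim_threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # average linkage: mean pairwise cosine similarity
                sim = mean(cosine(vectors[i], vectors[j])
                           for i in clusters[x] for j in clusters[y])
                if sim > best:
                    best, pair = sim, (x, y)
        if best < sim_threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so position x is unaffected
    return clusters

def detect_events(current, references, sim_threshold=0.5):
    """Return detected events as lists of indices into `current`."""
    bursty = extract_bursty_words(current, references)
    kept, vectors = to_vectors(current, bursty)
    clusters = agglomerative_clusters(vectors, sim_threshold)
    return [[kept[i] for i in cluster] for cluster in clusters]

references = [  # two earlier windows of routine chatter
    [["weather", "nice"], ["lunch", "weather"], ["traffic", "road"]],
    [["weather", "rain"], ["lunch", "coffee"], ["traffic", "jam"]],
]
current = [
    ["earthquake", "#quake", "city"],
    ["earthquake", "#quake", "rescue"],
    ["earthquake", "rescue", "city"],
    ["concert", "#gig", "ticket"],
    ["concert", "#gig", "stage"],
    ["weather", "lunch"],
]
print(detect_events(current, references))  # [[0, 1, 2], [3, 4]]
```

On this toy data, the three earthquake posts and the two concert posts form separate events, while the last post is dropped because neither "weather" nor "lunch" passes the burst test. Average linkage is used here only for illustration; single or complete linkage would fit the abstract's description equally well, and for larger data a library routine such as SciPy's hierarchical clustering could replace the hand-written merge loop.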
Received: 12 June 2016
Published: 29 September 2016