Identifying Trending Events Based on Time Series Anomaly Detection
Yang Xinyi1,2, Ma Haiyun1,2, Zhu Hengmin3
1School of Information Management, Nanjing University, Nanjing 210023, China
2Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
[Objective] This study aims to discover discussion topics and identify the real-world events that stimulate public discussion, in order to support timely responses and reduce risks. [Methods] We first constructed a co-word network and detected communities in it, with each community representing a topic. Second, we calculated a topic vector for each document from the overlap between the document's words and the words of each topic community. Third, we built a popularity time series for each topic from the document timestamps. Finally, we decomposed each topic popularity time series with STL (seasonal-trend decomposition based on Loess) and applied the 3σ rule to detect anomalies. We identified the real-world events that stimulated discussion by examining high-frequency words and highly correlated documents at the anomalous time points. [Results] We tested the method on Sina Weibo posts about the heavy rainstorm in Henan and discovered topics in three groups: disaster situations, emergency management, and social response. Anomaly detection and analysis show that the disaster situation topics received the highest public attention, with rainfall warnings and flood control actions being trending events. In emergency management, rescue and relief efforts and accident investigations stimulated discussion. In social response, stories of mutual aid among victims and of public donations attracted attention. [Limitations] The dataset is relatively small, so the anomaly detection threshold was set manually. Larger datasets will need an automatic method. [Conclusions] Anomaly detection on topic time series can identify trending events on social platforms. In crisis response, government agencies need to address rescue, prevention, and recovery, issue timely warnings, release information on disaster relief and accident investigations to answer public concerns, and guide public opinion in a constructive direction by promoting rescue, mutual aid, and donation activities.
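The last three steps of the method can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the topic vector is taken as the share of a document's distinct words found in each community, a classical additive decomposition (moving-average trend plus per-phase seasonal means) stands in for STL, and the 3σ rule is applied to the residual component. All function names are hypothetical.

```python
from statistics import mean, pstdev


def topic_vector(doc_words, topic_communities):
    """Assumed overlap measure: share of a document's distinct words
    that belong to each topic community (a set of words)."""
    words = set(doc_words)
    if not words:
        return [0.0] * len(topic_communities)
    return [len(words & community) / len(words) for community in topic_communities]


def residuals(series, period):
    """Simplified stand-in for STL (not STL itself): subtract a centered
    moving-average trend and per-phase seasonal means from the series."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average about one period wide (shorter at the edges).
    trend = [mean(series[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]
    detrended = [x - t for x, t in zip(series, trend)]
    # Seasonal component: average detrended value at each position in the cycle.
    seasonal = [mean(detrended[p::period]) for p in range(period)]
    return [d - seasonal[i % period] for i, d in enumerate(detrended)]


def three_sigma_anomalies(values, k=3.0):
    """Indices whose value lies more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical daily popularity of one topic: a weekly cycle over eight weeks,
# with a burst of 100 extra posts injected on day 30.
weekly_pattern = [10, 12, 11, 13, 12, 20, 22]
popularity = [weekly_pattern[day % 7] for day in range(56)]
popularity[30] += 100

anomalous_days = three_sigma_anomalies(residuals(popularity, period=7))
```

In practice, a library STL implementation would replace `residuals`, and the multiplier `k` plays the role of the manually set threshold mentioned under Limitations. The flagged days are where high-frequency words and highly correlated documents would then be inspected.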
Yang Xinyi, Ma Haiyun, Zhu Hengmin. Identifying Trending Events Based on Time Series Anomaly Detection[J]. Data Analysis and Knowledge Discovery, 2024, 8(2): 131-142.