[Objective] This paper constructs a topic drift index for trending network events to describe how their topics change over time. [Methods] We used the LDA model to extract the topics of trending online events and analyzed their drift using word weights. We then proposed a procedure for constructing the topic drift index. Finally, we took the “death of Gao Yixiang” event as the sample for an empirical analysis. [Results] In the early stage of the case, the number of topics increased from 11 to 18 and the topic drift index was 41%, which then fell to 22%. Finally, the number of topics dropped to 5 and the topic drift index turned to -41%. [Limitations] The proposed method cannot effectively generate early warnings when only a small number of topics change, or for multimedia content. It also cannot detect changes in topic semantics. [Conclusions] The topic drift index for trending network events could predict the timing of online public opinion outbreaks and their recurrences.
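The abstract does not state the formula of the topic drift index, so the following is only an illustrative sketch of the general idea of comparing topics across two time windows by their word weights. It is not the authors' index. The function names `cosine` and `new_topic_share`, the `threshold` parameter, and the toy topics are all assumptions made for this example; in practice the word-weight dictionaries would come from an LDA model fitted on each window.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors (dicts)."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def new_topic_share(prev_topics, curr_topics, threshold=0.5):
    """Fraction of current topics with no similar topic in the previous window.

    Hypothetical measure for illustration only: a current topic counts as
    "new" when its cosine similarity to every previous topic is below
    `threshold` (an assumed cut-off, not a value from the paper).
    """
    if not curr_topics:
        return 0.0
    unmatched = sum(
        1
        for topic in curr_topics
        if all(cosine(topic, old) < threshold for old in prev_topics)
    )
    return unmatched / len(curr_topics)


# Toy word-weight topics for two consecutive time windows (made-up data).
prev_topics = [
    {"price": 0.6, "market": 0.4},
    {"storm": 0.7, "rescue": 0.3},
]
curr_topics = [
    {"price": 0.5, "market": 0.5},   # close to the first previous topic
    {"policy": 0.8, "debate": 0.2},  # shares no words with earlier topics
]

print(new_topic_share(prev_topics, curr_topics))
```

In this toy input one of the two current topics has no counterpart in the previous window, so half of the current topics count as new. A signed index such as the one reported in the abstract would need additional terms, for example for topics that disappear, which this sketch does not attempt to reproduce.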
黄微,赵江元,闫璐. 网络热点事件话题漂移指数构建与实证研究*[J]. 数据分析与知识发现, 2020, 4(11): 92-101.
Huang Wei, Zhao Jiangyuan, Yan Lu. Empirical Research on Topic Drift Index for Trending Network Events[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 92-101.