Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 92-101    DOI: 10.11925/infotech.2096-3467.2020.0230
Empirical Research on Topic Drift Index for Trending Network Events
Huang Wei, Zhao Jiangyuan, Yan Lu
School of Management, Jilin University, Changchun 130022, China
Abstract  

[Objective] This paper constructs a topic drift index for trending network events to describe how their topics change over time. [Methods] We used the LDA model to extract the topics of trending online events and analyzed their drift using word weights. We then proposed a procedure for constructing the topic drift index. Finally, we took the event “Gao Yixiang passes away” as a case for empirical analysis. [Results] In the early stage of the case, the number of topics increased from 11 to 18 and the topic drift index was 41%; it later fell to 22%. Finally, the number of topics dropped to 5 and the topic drift index turned to -41%. [Limitations] The proposed method cannot effectively generate early warnings for a small number of topic changes or for multimedia content, and it cannot detect changes in topic semantics. [Conclusions] The topic drift index for trending network events could predict the timing of online public opinion outbreaks and their recurrences.

Key words: Network Hot Events; Hot Topics Drift Index; LDA; Early Warning of Internet Public Opinion
Received: 22 March 2020      Published: 04 December 2020
ZTFLH:  G203  
Corresponding Authors: Zhao Jiangyuan     E-mail: 402595270@qq.com

Cite this article:

Huang Wei, Zhao Jiangyuan, Yan Lu. Empirical Research on Topic Drift Index for Trending Network Events. Data Analysis and Knowledge Discovery, 2020, 4(11): 92-101.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0230     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I11/92

Figure: Construction Process of the Hot Topics Drift Index
Figure: Perplexity Curve Averaged over Multiple Test Sets
Figure: Trend Chart of the Sina Weibo Heat Index with “Gao Yixiang” as the Keyword
Date | Weibo posts | Optimal number of topics | Date | Weibo posts | Optimal number of topics
Nov 24 | 412 | 13 | Dec 6 | 4,041 | 17
Nov 25 | 397 | 10 | Dec 7 | 6,253 | 18
Nov 26 | 343 | 11 | Dec 8 | 5,629 | 16
Nov 27 | 5,316 | 18 | Dec 9 | 6,511 | 19
Nov 28 | 4,453 | 17 | Dec 10 | 5,165 | 19
Nov 29 | 5,254 | 15 | Dec 11 | 5,390 | 15
Nov 30 | 5,750 | 17 | Dec 12 | 6,816 | 18
Dec 1 | 4,241 | 17 | Dec 13 | 5,273 | 18
Dec 2 | 5,065 | 14 | Dec 14 | 4,598 | 17
Dec 3 | 4,443 | 16 | Dec 15 | 5,129 | 10
Dec 4 | 4,331 | 14 | Dec 16 | 4,459 | 5
Dec 5 | 4,385 | 17 | Dec 17 | 6,032 | 6
Table: Optimal Number of Topics and Number of Sina Weibo Posts per Day
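The figure caption above indicates that perplexity, averaged over multiple test sets, was used when choosing the number of LDA topics. The page does not give the exact procedure, so the following is only a minimal sketch under stated assumptions: `perplexity` uses the standard held-out definition, and `pick_topic_number` is a hypothetical helper that selects the candidate with the lowest mean perplexity. The input values in the usage lines are invented for illustration and are not the paper's results.

```python
import math


def perplexity(docs, theta, phi):
    """Standard held-out perplexity: exp(-sum(log p(w | d)) / token count).

    docs  : list of documents, each a list of word ids
    theta : theta[d][k] = P(topic k | document d)
    phi   : phi[k][w]   = P(word w | topic k)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Mixture probability of word w in document d
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def pick_topic_number(perplexities_by_k):
    """Hypothetical helper: average each candidate's perplexity over the
    test sets and return the topic number with the lowest mean."""
    means = {k: sum(v) / len(v) for k, v in perplexities_by_k.items()}
    return min(means, key=means.get)


# Illustrative values only (not taken from the paper).
example = {10: [120.0, 130.0], 13: [100.0, 110.0], 18: [115.0, 105.0]}
best_k = pick_topic_number(example)
```

A single uniform topic over a four-word vocabulary gives a perplexity of 4, which is a quick sanity check on the formula.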
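Date | Topic keywords | Topic meaning
Nov 25 | 2019, Gao Yixiang, TV drama, awards, vote, boost, support, popularity, series, works, participate, ... | Netizens vote for the TV dramas starring Gao Yixiang in the 2019 “TV Drama Awards”.
 | Gao Yixiang, rainbow, gravity, excessively, beautiful, Weibo, TV drama, meet, Wang Lichuan, Ji Huang, mature, handsome, ... | Netizens discuss the TV dramas Gao Yixiang starred in and the characters he played.
 | male idol, Gao Yixiang, influence, raise, contribution, Super Topic, chart voting, ranking, popularity, fan support, surpass, ... | Netizens vote and rally support for Gao Yixiang on the celebrity Super Topic chart.
Nov 26 | 2019, favorite, series, top 100, excellent, good drama, Weibo, TV drama, awards, boost, vote, ... | TV dramas starring Gao Yixiang are shortlisted for the Weibo “TV Drama Awards”, and netizens take part in the voting.
 | Gao Yixiang, rainbow, gravity, shortlisted, Weibo, TV drama, awards, top 100, excellent, series, looking forward, more, wonderful, ... | Netizens congratulate the drama 《彩虹的重力》 (The Gravity of a Rainbow), starring Gao Yixiang, on being shortlisted among the top 100 excellent series of the TV Drama Awards.
 | Gao Yixiang, rainbow, gravity, excessively, beautiful, follow, Guanguan, official, Weibo, online publicity team, vote, TV drama, cheer on, ... | Netizens repost the voting link organized by Gao Yixiang's official account.
Nov 27 | Wang Lichuan, Gao Yixiang, passed away, hard to accept, really, journey, farewell, like, forever, sad, gentle, gentleman, ... | Netizens find Gao Yixiang's death hard to accept and mourn him by reposting the name of his character, “Wang Lichuan”.
 | Gao Yixiang, passed away, recording, variety, show, Chase Me, 35 years old, Zhejiang, Satellite TV, China Blue, 27th, sudden death, heatstroke, ... | Gao Yixiang's death while recording the Zhejiang Satellite TV variety show 《追我吧》 (Chase Me).
 | Gao Yixiang, passed away, hope, all right, too fake, something happened, accident, safe, news, tomorrow, first aid, critical window, cherish, ... | Some netizens hear the news of Gao Yixiang's death, doubt it, and send their wishes.
Nov 28 | Gao Yixiang, memorial service, passed away, Xu Zheng, rebuke, Zhejiang, Satellite TV, Chase Me, production team, safety, precautions, too poor, ... | Netizens discuss Gao Yixiang's memorial service and repost Xu Zheng's rebuke of Zhejiang Satellite TV's safety awareness.
 | Gao Yixiang, farewell, Wang Lichuan, really, sad, goodbye, hope, journey, forever, pity, world, heaven, gentle, ... | Netizens grieve over Gao Yixiang's death and send their blessings.
 | Gao Yixiang, memorial service, Zhejiang, Satellite TV, Blue Channel, Chase Me, production team, comfort, parents, father, statement, apology, ... | The Chase Me production team of Zhejiang Satellite TV issues a statement on Gao Yixiang's death and consoles his family.
Table: Keywords and Meaning of the First Three Topics from November 25 to November 28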
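The keyword lists in the table above change sharply between November 26 and November 27. As a rough illustration of that shift, the sketch below compares the first-listed topic of each day using Jaccard overlap of the keyword sets, with the keywords rendered in English. This is an assumption made for illustration only: the paper's index is described as built from LDA word weights, and its actual formula is not given on this page. Only the keywords shown in the table are used; the table truncates each list.

```python
def jaccard(a, b):
    """Jaccard overlap of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# First-listed topic of each day, keywords as shown in the table (truncated there).
nov25 = ["2019", "Gao Yixiang", "TV drama", "awards", "vote", "boost",
         "support", "popularity", "series", "works", "participate"]
nov26 = ["2019", "favorite", "series", "top 100", "excellent", "good drama",
         "Weibo", "TV drama", "awards", "boost", "vote"]
nov27 = ["Wang Lichuan", "Gao Yixiang", "passed away", "hard to accept",
         "really", "journey", "farewell", "like", "forever", "sad",
         "gentle", "gentleman"]

overlap_25_26 = jaccard(nov25, nov26)  # 6 shared keywords out of 16 distinct
overlap_26_27 = jaccard(nov26, nov27)  # no shared keywords
```

With these lists the overlap is 0.375 between November 25 and 26 and 0.0 between November 26 and 27, in line with the change of subject from award voting to the news of the death.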
Time interval | Topic drift index | Time interval | Topic drift index | Time interval | Topic drift index
[11.24 0:00, 11.25 24:00) | -0.015 | [12.02 0:00, 12.03 24:00) | 0.224 | [12.10 0:00, 12.11 24:00) | -0.005
[11.25 0:00, 11.26 24:00) | 0.045 | [12.03 0:00, 12.04 24:00) | -0.014 | [12.11 0:00, 12.12 24:00) | -0.058
[11.26 0:00, 11.27 24:00) | 0.409 | [12.04 0:00, 12.05 24:00) | 0.002 | [12.12 0:00, 12.13 24:00) | 0.039
[11.27 0:00, 11.28 24:00) | 0.069 | [12.05 0:00, 12.06 24:00) | -0.010 | [12.13 0:00, 12.14 24:00) | -0.065
[11.28 0:00, 11.29 24:00) | -0.025 | [12.06 0:00, 12.07 24:00) | 0.054 | [12.14 0:00, 12.15 24:00) | -0.024
[11.29 0:00, 11.30 24:00) | 0.026 | [12.07 0:00, 12.08 24:00) | -0.056 | [12.15 0:00, 12.16 24:00) | -0.412
[11.30 0:00, 12.01 24:00) | -0.061 | [12.08 0:00, 12.09 24:00) | -0.007 | [12.16 0:00, 12.17 24:00) | 0.076
[12.01 0:00, 12.02 24:00) | -0.078 | [12.09 0:00, 12.10 24:00) | 0.037 | |
Table: Hot Topic Drift Index of “Gao Yixiang Passes Away”
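To show how the drift-index series above might feed an early-warning check, the sketch below flags intervals whose absolute index exceeds a cutoff. The values are transcribed from the table, keyed by the start date of each two-day interval. The 0.2 threshold and the `flag_large_drifts` helper are assumptions for illustration; the page does not state the warning rule the authors used.

```python
# Drift index per interval, keyed by interval start date (from the table above).
DRIFT_INDEX = [
    ("11.24", -0.015), ("11.25", 0.045), ("11.26", 0.409), ("11.27", 0.069),
    ("11.28", -0.025), ("11.29", 0.026), ("11.30", -0.061), ("12.01", -0.078),
    ("12.02", 0.224), ("12.03", -0.014), ("12.04", 0.002), ("12.05", -0.010),
    ("12.06", 0.054), ("12.07", -0.056), ("12.08", -0.007), ("12.09", 0.037),
    ("12.10", -0.005), ("12.11", -0.058), ("12.12", 0.039), ("12.13", -0.065),
    ("12.14", -0.024), ("12.15", -0.412), ("12.16", 0.076),
]


def flag_large_drifts(series, threshold=0.2):
    """Return the (start_date, index) pairs whose magnitude reaches the
    threshold. The default threshold is an illustrative assumption."""
    return [(start, value) for start, value in series if abs(value) >= threshold]


flagged = flag_large_drifts(DRIFT_INDEX)
```

With this cutoff, three intervals are flagged: the one starting 11.26 (0.409), the one starting 12.02 (0.224), and the one starting 12.15 (-0.412). These correspond to the 41%, 22%, and -41% values reported in the abstract.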
Figure: Hot Topic Drift Index and Heat Index of “Gao Yixiang Passes Away”