Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 47-57    DOI: 10.11925/infotech.2096-3467.2020.0127
Constructing Topic Graph for Weibo Users Based on LDA: Case Study of “Egypt Air Disaster”
Wang Xiwei1,2,3, Zhang Liu1, Huang Bo4, Wei Ya’nan1
1School of Management, Jilin University, Changchun 130022, China
2Research Center for Big Data Management, Jilin University, Changchun 130022, China
3Cyberspace Governance Research Center, Jilin University, Changchun 130022, China
4School of Computer Science and Technology, Jilin University, Changchun 130022, China
Abstract  

[Objective] This paper constructs a topic graph of Weibo users to identify the characteristics of user groups and their opinion leaders, with the aim of helping to guide online public opinion and reduce monitoring costs. [Methods] First, we built an LDA-based processing model for the topic graph of Weibo users. Second, we used perplexity to determine the optimal number of topics and the users’ topic distributions. Third, we measured the similarity of user topics with Jensen-Shannon (JS) divergence and constructed the topic graph. Finally, we tested the proposed method on Weibo data about the “Egypt air disaster”. [Results] The LDA-based topic graph clustered users by topic and identified the opinion leaders. [Limitations] More research is needed on how to determine the optimal number of LDA topics. [Conclusions] The proposed method helps identify the characteristics of different topic groups and their opinion leaders.
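
The similarity step in the abstract can be illustrated with a minimal sketch. It assumes each user is represented by a document-topic probability vector from LDA and that two users are linked when their JS divergence falls below a cutoff. The user names, the vectors, and the `max_divergence` value of 0.1 are hypothetical and are not taken from the paper, which does not state its linking rule on this page.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] for log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def build_topic_graph(user_topics, max_divergence=0.1):
    """Return (user, user, divergence) edges for sufficiently similar users.

    The cutoff is an illustrative assumption, not the paper's setting.
    """
    edges = []
    for (u, p), (v, q) in combinations(user_topics.items(), 2):
        d = js_divergence(p, q)
        if d <= max_divergence:
            edges.append((u, v, d))
    return edges


# Hypothetical document-topic distributions for three users over three topics.
user_topics = {
    "user_a": [0.80, 0.10, 0.10],
    "user_b": [0.75, 0.15, 0.10],
    "user_c": [0.05, 0.05, 0.90],
}
edges = build_topic_graph(user_topics)
```

With these made-up vectors, only `user_a` and `user_b` are linked, because both concentrate on the first topic while `user_c` concentrates on the third.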

Key words: LDA; Weibo User; Topic Map
Received: 22 February 2020      Published: 10 July 2020
ZTFLH:  TP393  
Corresponding Authors: Zhang Liu     E-mail: 598837913@qq.com

Cite this article:

Wang Xiwei,Zhang Liu,Huang Bo,Wei Ya’nan. Constructing Topic Graph for Weibo Users Based on LDA: Case Study of “Egypt Air Disaster”. Data Analysis and Knowledge Discovery, 2020, 4(10): 47-57.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0127     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/47

Processing Model of Weibo User Topic Map Based on LDA
Baidu Index of “Egypt Air Disaster”
Perplexity-Topic Line Chart
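
As a rough illustration of the quantity behind a perplexity-topic chart, the sketch below computes perplexity from a fitted model's document-topic matrix `theta` and topic-word matrix `phi`. Both names, and the toy values, are assumptions made for this example; the paper's own implementation is not shown on this page. Lower perplexity on held-out text is usually read as a better fit, so candidate topic numbers can be compared by this score.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a topic model on tokenized documents.

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k] = P(topic k | document d)
    phi:   phi[k][w]   = P(word w | topic k)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, words in enumerate(docs):
        for w in words:
            # Marginalize over topics to get P(word | document).
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


# Toy check: one topic that is uniform over a 4-word vocabulary.
docs = [[0, 1, 2, 3]]
theta = [[1.0]]
phi = [[0.25, 0.25, 0.25, 0.25]]
score = perplexity(docs, theta, phi)
```

In the toy check the score equals the vocabulary size, 4, which matches the intuition that perplexity measures how many words the model is effectively choosing among.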
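Topic 0 | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6
Ethiopia (0.042) | grounding (0.038) | photo (0.028) | announce (0.027) | press conference (0.044) | Boeing (0.062) | family members (0.019)
pilot (0.031) | malfunction (0.032) | passport (0.026) | video (0.024) | China Eastern Airlines (0.036) | system (0.039) | sue (0.017)
release (0.021) | announce (0.016) | accident (0.016) | condition (0.021) | Air China (0.036) | defect (0.039) | victims (0.016)
plane crash (0.022) | China (0.013) | information (0.016) | rummage (0.020) | belongings of the deceased (0.030) | airliner (0.038) | the deceased (0.015)
release (0.021) | global (0.012) | employee (0.015) | maneuver (0.019) | girl (0.020) | aircraft (0.037) | remains (0.015)
Topic High-Frequency Word Distribution (each cell: word, with its probability in parentheses)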
Random Weibo Users’ Document-Topic Distribution
User Topic Map of “Egypt Air Disaster”
Document-Topic Average Probability
The Number of Weibo Users and Authenticated Users
User Node Distribution and Opinion Leader Identification in Topic 3
No. | User node | Degree centrality
1 | 凤凰网视频 | 62
2 | 安徽反邪教 | 54
3 | S丶Rachel | 53
4 | 高庆一 | 49
5 | 时间国际视频 | 26
6 | 眉山残联 | 26
7 | 快科技2018 | 22
8 | 火勺看点 | 20
9 | 新浪天津 | 11
10 | 潘清華005 | 5
User Degree Centrality in Topic 3 (Top 10)
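
A ranking like the one above can be sketched as follows. The sketch assumes that degree centrality here means the raw number of edges attached to a user node, which the integer values in the table suggest but this page does not confirm. The edge list and node names are hypothetical.

```python
from collections import Counter


def degree_centrality(edges):
    """Count the edges attached to each node of an undirected graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def top_users(edges, n=10):
    """Return the n nodes with the highest degree as (node, degree) pairs."""
    return degree_centrality(edges).most_common(n)


# Hypothetical edges within one topic group.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
ranking = top_users(edges, n=2)
```

In this toy graph `hub` ranks first with degree 3, so it would be the candidate opinion leader of the group.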