Constructing Topic Graph for Weibo Users Based on LDA: Case Study of “Egypt Air Disaster”
Wang Xiwei1,2,3, Zhang Liu1, Huang Bo4, Wei Ya’nan1
1 School of Management, Jilin University, Changchun 130022, China
2 Research Center for Big Data Management, Jilin University, Changchun 130022, China
3 Cyberspace Governance Research Center, Jilin University, Changchun 130022, China
4 School of Computer Science and Technology, Jilin University, Changchun 130022, China
Abstract [Objective] This paper constructs a topic graph for Weibo users to identify the characteristics of user groups and their opinion leaders, with the aim of helping to guide online public opinion and reduce surveillance costs. [Methods] First, we built a processing model for the topic graph of Weibo users based on LDA. Second, we used perplexity to determine the optimal number of topics and the topic distribution of each user. Third, we used JS divergence to measure the similarity between users' topic distributions and constructed the topic graph. Finally, we tested the proposed method on data about the “Egypt Air disaster”. [Results] The topic graph generated with LDA clustered the user topics and identified the opinion leaders. [Limitations] More research is needed on how to determine the optimal number of LDA topics. [Conclusions] The proposed method could help identify the characteristics of different topic groups and their opinion leaders.
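The similarity step in the Methods can be illustrated with a minimal sketch. It assumes each user is already represented by a topic distribution from a fitted LDA model; the user names, the distributions, and the `max_divergence` threshold below are hypothetical and are not taken from the paper, which does not state its edge criterion in the abstract.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def build_topic_graph(user_topics, max_divergence=0.1):
    """Link two users when their topic distributions are close.

    max_divergence is an illustrative threshold (an assumption),
    not a value reported by the authors.
    """
    edges = []
    for (u, p), (v, q) in combinations(user_topics.items(), 2):
        d = js_divergence(p, q)
        if d <= max_divergence:
            edges.append((u, v, d))
    return edges


# Hypothetical per-user topic distributions over three LDA topics.
user_topics = {
    "user_a": [0.70, 0.20, 0.10],
    "user_b": [0.65, 0.25, 0.10],
    "user_c": [0.05, 0.15, 0.80],
}

edges = build_topic_graph(user_topics)
for u, v, d in edges:
    print(f"{u} -- {v}  (JS divergence = {d:.4f})")
```

JS divergence is used here rather than plain KL divergence because it is symmetric and always finite, so the resulting graph can be undirected. In this toy input only `user_a` and `user_b` fall under the threshold and are linked.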
Received: 22 February 2020
Published: 10 July 2020
Corresponding Authors:
Zhang Liu
E-mail: 598837913@qq.com