Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 18-28    DOI: 10.11925/infotech.2096-3467.2019.0720
Predicting Retweets of Government Microblogs with Deep-combined Features
Xu Yuemei, Liu Yunwen, Cai Lianqiao
School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China
Abstract  

[Objective] This paper predicts the number of retweets of government microblogs and evaluates which features most affect retweets and public opinion. [Methods] First, we used a Convolutional Neural Network (CNN) and a Gradient Boosting Decision Tree (GBDT) to combine user, time and content features. Then, we predicted the retweet numbers of government microblogs. Finally, we ranked the importance of every feature to find the most important one for retweets. [Results] The proposed model raised the accuracy of retweet prediction to 0.933. The semantic feature of microblog texts was the most important one. [Limitations] We did not study the impact of indirect retweeting behaviors. [Conclusions] The CNN-GBDT model with deep-combined features can effectively predict retweets of government microblogs.

Key words: Government Microblogs; Retweeting Scale Prediction; Convolutional Neural Network; Text Classification
Received: 20 June 2019      Published: 26 April 2020
ZTFLH:  TP393  
Corresponding Authors: Yuemei Xu     E-mail: xuyuemei@bfsu.edu.cn

Cite this article:

Xu Yuemei, Liu Yunwen, Cai Lianqiao. Predicting Retweets of Government Microblogs with Deep-combined Features. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 18-28.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0720     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/18

Flowchart of Retweeting Scale Prediction of Government Microblogs Based on Deep-combined Features
The Procedure of Microblogs Text Semantic Calculation Based on CNN
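Microblog ID | Propagation scale | Microblog content | Posted at | Likes | Retweets | Comments | Publisher | Followers
1 |  | Get home safely for the New Year | 2019-01-18 07:30 | 536 | 5,813 | 474 | 公安部交通安全微发布 (Ministry of Public Security traffic-safety account) | 5,309,399
2 |  | Caring escort for exam-takers, with traffic police alongside | 2018-06-07 15:13 | 32 | 64 | 6 | 公安部交通安全微发布 (Ministry of Public Security traffic-safety account) | 5,309,399
3 |  | Once, at the Palace Museum, viewing paintings... | 2018-07-25 11:34 | 10,753 | 5,138 | 1,149 | 故宫博物院 (The Palace Museum) | 6,282,823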
Examples of the Raw Dataset
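Parameter | Value
Word-vector dimension | 300
Number of convolution kernels | 256
Convolution kernel size | 5
Dropout | 0.5
batch_size | 64
Number of iterations | 20
Activation function | ReLU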
Parameter Settings of CNN
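The settings above (kernel size 5, ReLU activation) describe a convolutional text encoder. The following is a minimal pure-Python sketch of one forward pass, assuming each kernel spans the full word-vector width and is followed by max-over-time pooling; that design is common for sentence-classification CNNs but is not confirmed on this page. The function names and the toy sizes are hypothetical, and training (dropout, batch size, iterations) and the final scoring layer are omitted.

```python
import random


def relu(x):
    """Rectified linear unit, the activation listed in the parameter table."""
    return x if x > 0.0 else 0.0


def conv_max_pool(sentence, kernel, bias=0.0):
    """Slide one kernel over the sentence and keep the strongest response.

    sentence: list of n word vectors, each of length d
    kernel:   list of h weight vectors, each of length d (h = kernel size)
    """
    h = len(kernel)
    if len(sentence) < h:
        raise ValueError("sentence is shorter than the kernel")
    feature_map = []
    for start in range(len(sentence) - h + 1):
        total = bias
        for offset in range(h):
            word = sentence[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, word))
        feature_map.append(relu(total))
    # Max-over-time pooling (assumed): one number per kernel.
    return max(feature_map)


def encode_sentence(sentence, kernels, biases):
    """Return one pooled feature per kernel."""
    return [conv_max_pool(sentence, k, b) for k, b in zip(kernels, biases)]


# Toy sizes for illustration; the table lists 300 dimensions and 256 kernels.
EMBED_DIM = 4
NUM_KERNELS = 3
KERNEL_SIZE = 5  # matches the kernel size in the table

rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


sentence = random_matrix(8, EMBED_DIM)  # 8 hypothetical word vectors
kernels = [random_matrix(KERNEL_SIZE, EMBED_DIM) for _ in range(NUM_KERNELS)]
biases = [0.0] * NUM_KERNELS
features = encode_sentence(sentence, kernels, biases)
print(features)
```

With the full settings, the encoder would yield 256 pooled features per microblog, which a classification layer could then turn into a single semantic score.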
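Propagation class | CNN text semantic score | Keyword similarity | Followers | Publisher's average daily posts | Publisher's high-retweet rate | Time feature
 | 1.000 | 0.317 | 0.167 | 0.357 | 0.147 | -1.204
 | 1.000 | 0.553 | 0.167 | 0.357 | 0.147 | -0.223
 | 0.125 | 0.030 | 0.024 | 0.571 | 0.018 | -0.223
 | 0.476 | 0.065 | 0.024 | 0.571 | 0.018 | -0.223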
Examples of Input Dataset in the Retweeting-Scale-Prediction Model
Confusion matrix | Predicted: high retweet | Predicted: low retweet
Actual: high retweet | TT | TF
Actual: low retweet | FT | FF
Confusion Matrix
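The four evaluation metrics reported in the results can be derived from the cells of this matrix. The sketch below uses the standard definitions with "high retweet" as the positive class. This page does not reproduce the paper's formulas, so the definitions, the `ConfusionCounts` class, and the example counts are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tt: int  # actual high retweet, predicted high retweet
    tf: int  # actual high retweet, predicted low retweet
    ft: int  # actual low retweet, predicted high retweet
    ff: int  # actual low retweet, predicted low retweet


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Standard metrics with 'high retweet' treated as the positive class."""
    total = c.tt + c.tf + c.ft + c.ff
    accuracy = safe_div(c.tt + c.ff, total)
    recall = safe_div(c.tt, c.tt + c.tf)
    precision = safe_div(c.tt, c.tt + c.ft)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Hypothetical counts, not taken from the paper.
example = ConfusionCounts(tt=40, tf=10, ft=5, ff=45)
scores = evaluate(example)
print(scores)
```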
Algorithm | Accuracy | Recall | Precision | F1
CNN+SVM | 0.905 | 0.823 | 0.886 | 0.861
SVM | 0.833 | 0.695 | 0.781 | 0.737
CNN+GBDT | 0.933 | 0.869 | 0.925 | 0.918
GBDT | 0.842 | 0.683 | 0.817 | 0.768
Experiment Results
Accuracy of the Four Algorithms
Recall of the Four Algorithms
Precision of the Four Algorithms
F1-value of the Four Algorithms
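Feature combination | Accuracy | Recall | Precision | F1
Publisher + content + time features | 0.933 | 0.869 | 0.925 | 0.918
Publisher + time features | 0.832 | 0.667 | 0.800 | 0.733
Content + time features | 0.886 | 0.787 | 0.861 | 0.852
Publisher + content features | 0.931 | 0.867 | 0.922 | 0.912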
Performance of GBDT Model Using Different Feature Settings
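Feature combination | Accuracy | Recall | Precision | F1
Publisher + content + time features | 0.905 | 0.823 | 0.886 | 0.861
Publisher + time features | 0.814 | 0.681 | 0.742 | 0.712
Content + time features | 0.852 | 0.693 | 0.837 | 0.760
Publisher + content features | 0.897 | 0.806 | 0.877 | 0.843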
Performance of SVM Model Using Different Feature Settings
Importance Ranking of Different Features Measured by GBDT
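To illustrate how a boosted-tree model can rank features, here is a simplified from-scratch sketch: gradient boosting with depth-1 trees (stumps) on log-loss residuals, where each feature's importance is the share of total squared-error reduction its splits contribute. It is not the authors' implementation, and this page does not say which importance measure they used. The function names, the learning rate, and the two-feature toy data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single split that most reduces the squared error of residuals.

    Returns (gain, feature, threshold, left_value, right_value) or None.
    """
    n = len(X)
    total = sum(residuals)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = (left_sum ** 2 / k + right_sum ** 2 / (n - k)
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2,
                        left_sum / k, right_sum / (n - k))
    return best


def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3):
    """Boost stumps on the log-loss gradient; accumulate gain per feature."""
    scores = [0.0] * len(X)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, f, thr, left, right = stump
        left, right = learning_rate * left, learning_rate * right
        stumps.append((f, thr, left, right))
        importance[f] += gain
        for i, row in enumerate(X):
            scores[i] += left if row[f] <= thr else right
    total_gain = sum(importance)
    if total_gain > 0:
        importance = [g / total_gain for g in importance]
    return stumps, importance


def predict_proba(stumps, row):
    """Probability of the positive (high-retweet) class for one row."""
    score = sum(left if row[f] <= thr else right
                for f, thr, left, right in stumps)
    return sigmoid(score)


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.7, 0.4], [0.8, 0.8], [0.9, 0.2]]
y = [0, 0, 0, 1, 1, 1]
stumps, importance = fit_gbdt(X, y)
ranking = sorted(range(len(importance)), key=lambda f: -importance[f])
print(importance, ranking)
```

In this toy setup the informative feature takes nearly all of the importance, which is the kind of signal a feature ranking is meant to surface.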