Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 29-38    DOI: 10.11925/infotech.2096-3467.2019.0735
Identifying Potential Trending Topics of Online Public Opinion
Ding Shengchun1,2, Yu Fengyang1, Li Zhen1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Social Public Security Science and Technology Collaborative Innovation Center, Nanjing 210094, China
Abstract  

[Objective] This paper aims to identify potential trending topics in online data, helping governments and enterprises monitor and guide public opinion. [Methods] First, we collected public opinion topics from a microblog real-time data stream. Then, we identified the features of trending topics. Finally, we compared the performance of Logistic Regression and SVM models in predicting potential trending topics. [Results] The Logistic Regression model is more capable of finding potential trending topics (recall=0.89) than SVM. [Limitations] More research is needed to examine our model on other social media platforms. [Conclusions] The proposed model can effectively identify potential trending topics of online public opinion.

Keywords: Internet Public Opinion; Identification of Potential Hot Topics; Logistic Regression; Support Vector Machine
Received: 24 June 2019      Published: 26 April 2020
ZTFLH:  TP391 N99  
Corresponding Authors: Ding Shengchun     E-mail: todingding@163.com

Cite this article:

Ding Shengchun, Yu Fengyang, Li Zhen. Identifying Potential Trending Topics of Online Public Opinion. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 29-38.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0735     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/29

The Framework for Identifying Potential Hot Topics of Network Public Opinion
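No. Feature quantification
1 Increment of topic-related microblogs per unit time
2 Increment of comments on topic-related microblogs per unit time
3 Increment of reposts of topic-related microblogs per unit time
4 Increment of likes on topic-related microblogs per unit time
5 Average daily number of microblogs posted by topic-related users over the last 30 days
6 Fan-interaction h-index of topic-related users over the last 30 days
7 Number of high-quality fans of topic-related users
8 Average number of comments per microblog of topic-related users over the last 30 days
9 Average number of reposts per microblog of topic-related users over the last 30 days
10 Average number of likes per microblog of topic-related users over the last 30 days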
Identification Characteristics of Potential Hot Topics
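
Feature 6 in the table above is an h-index, the measure Hirsch introduced in [19]. This page does not define how the fan-interaction variant is computed, so the following Python sketch assumes one plausible reading: a user has index h if h of their recent microblogs each received at least h fan interactions. The function name `h_index` and the sample counts are hypothetical and are not taken from the paper.

```python
def h_index(interaction_counts):
    """Return the largest h such that at least h items have a count >= h.

    Assumption: each entry is the number of fan interactions received by
    one microblog posted during the observation period.
    """
    ranked = sorted(interaction_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` posts all have at least `rank` interactions
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Hypothetical interaction counts for six microblogs of one user.
sample_counts = [25, 8, 5, 3, 3, 1]
example_h = h_index(sample_counts)
```

Sorting in descending order lets the loop stop at the first post whose count falls below its rank, which keeps the sketch short at the cost of an O(n log n) sort.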
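Opinion leader category | Reference source
Government affairs | Weibo rankings and the 2017 People's Daily·Government Affairs Index Weibo Impact Report [20]
Traditional media (including newspapers, magazines, and media websites) | Weibo rankings
Internet
Popular self-media "Big V" accounts
Entertainment
Finance and economics | Sina All-Media Influence Ranking (http://blog.sina.com.cn/lm/bang/)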
Selection Categories and Reference Sources of Opinion Leaders
Execution Result of Subject Detection Program (Part)
Example of Topic Content Assignment for Three Categories of Rules
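Filtered topic type | Reason for filtering
Topics already on the Weibo hot search list | No predictive value
General news roundups or event reviews | No longer timely
Real-time reports on traffic, weather, stocks, etc. | Routine or periodic events with low suddenness; little value for online public opinion monitoring and little need for early warning
Routine coverage of serial activities
Periodic events
Entertainment news and celebrity gossip | Outside the target user group served by this study
Recommendations and shares of cities, books, films, TV shows, music, etc. | Mostly everyday sharing that microblog users post to attract followers; contains no major social events or emergencies, so monitoring value is low
Job postings and commercial advertisements
Interviews, personal profiles, and famous quotations
Jokes and inspirational ("chicken soup") texts
Fan giveaways and routine interaction
Recipes, life tips, and popular science
Public convenience notices and safety reminders
Sports events such as the World Cup | Such events draw nationwide attention and very easily reach the Weibo hot search list, so early warning is rarely needed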
Types and Reasons of Artificial Filtering of Public Opinion Topics
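No. Feature
1 Increment of topic-related microblogs within t1
2 Increment of comments on topic-related microblogs within t1
3 Increment of reposts of topic-related microblogs within t1
4 Increment of likes on topic-related microblogs within t1
5 Increment of topic-related microblogs within t2
6 Increment of comments on topic-related microblogs within t2
7 Increment of reposts of topic-related microblogs within t2
8 Increment of likes on topic-related microblogs within t2
9 Increment of topic-related microblogs within t3
10 Increment of comments on topic-related microblogs within t3
11 Increment of reposts of topic-related microblogs within t3
12 Increment of likes on topic-related microblogs within t3
13 Increment of topic-related microblogs within t4
14 Increment of comments on topic-related microblogs within t4
15 Increment of reposts of topic-related microblogs within t4
16 Increment of likes on topic-related microblogs within t4
17 Fan-interaction h-index of topic-related users within t1
18 Number of high-quality fans of topic-related users within t1
19 Activity level of topic-related users within t1
20 Microblog influence of topic-related users within t1
21 Fan-interaction h-index of topic-related users newly added within t2
22 Number of high-quality fans of topic-related users newly added within t2
23 Activity level of topic-related users newly added within t2
24 Microblog influence of topic-related users newly added within t2
25 Fan-interaction h-index of topic-related users within t3
26 Number of high-quality fans of topic-related users within t3
27 Activity level of topic-related users within t3
28 Microblog influence of topic-related users within t3
29 Fan-interaction h-index of topic-related users newly added within t4
30 Number of high-quality fans of topic-related users newly added within t4
31 Activity level of topic-related users newly added within t4
32 Microblog influence of topic-related users newly added within t4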
Potential Hot Topic Identification Feature Items
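
Features 1 to 16 in the table above are per-window increments of four counters (microblogs, comments, reposts, likes) over t1 to t4. This page does not state how long the windows are or how the counters are sampled, so the Python sketch below assumes that t1 to t4 are consecutive windows and that cumulative counts are recorded at each window boundary. The names `METRICS` and `window_increments` and the sample snapshots are hypothetical.

```python
# Counter names in the order the feature table lists them within each window.
METRICS = ("posts", "comments", "reposts", "likes")


def window_increments(snapshots):
    """Turn cumulative snapshots into per-window increments.

    `snapshots` holds one dict of cumulative counts per window boundary,
    so k windows need k + 1 snapshots. The result is a flat list ordered
    window by window, matching features 1-16 when k = 4.
    """
    features = []
    for before, after in zip(snapshots, snapshots[1:]):
        for metric in METRICS:
            features.append(after[metric] - before[metric])
    return features


# Hypothetical cumulative counts at the start of t1 and the end of t1..t4.
snapshots = [
    {"posts": 0, "comments": 0, "reposts": 0, "likes": 0},
    {"posts": 3, "comments": 10, "reposts": 4, "likes": 20},
    {"posts": 7, "comments": 35, "reposts": 15, "likes": 80},
    {"posts": 12, "comments": 90, "reposts": 40, "likes": 200},
    {"posts": 20, "comments": 210, "reposts": 95, "likes": 520},
]
features = window_increments(snapshots)
```

With five snapshots the function returns sixteen values; the first four are the t1 increments and the last four are the t4 increments.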
Example of Feature Item Extraction Results
Example of Manual Labeling Results
Experiment LR-Precision LR-Recall LR-F1 SVM-Precision SVM-Recall SVM-F1
1 0.66 0.88 0.75 0.82 0.67 0.74
2 0.69 0.83 0.75 0.75 0.71 0.73
3 0.65 0.80 0.72 0.78 0.64 0.70
4 0.67 0.84 0.75 0.84 0.65 0.73
5 0.63 0.86 0.73 0.69 0.76 0.72
6 0.70 0.89 0.78 0.70 0.67 0.69
7 0.67 0.89 0.77 0.87 0.73 0.79
8 0.66 0.81 0.73 0.90 0.64 0.74
9 0.67 0.83 0.74 0.78 0.65 0.71
10 0.67 0.89 0.77 0.88 0.67 0.76
11 0.75 0.85 0.79 0.83 0.64 0.72
12 0.68 0.79 0.73 0.84 0.65 0.73
13 0.65 0.88 0.75 0.76 0.85 0.80
14 0.71 0.86 0.78 0.73 0.73 0.73
15 0.63 0.85 0.72 0.79 0.67 0.73
Mean 0.67 0.85 0.75 0.80 0.69 0.73
Results of Potential Hot Topic Identification
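
The figures in the results table can be cross-checked with a few lines of Python. The sketch below copies the fifteen Logistic Regression and SVM rows from the table, treats the first metric column of each model as precision, recomputes F1 as the harmonic mean of precision and recall, and averages each column. Variable and function names are illustrative. Because the table reports two-decimal values, the recomputed F1 is only expected to agree with the reported F1 to within rounding.

```python
from statistics import mean

# (precision, recall, F1) for experiments 1-15, copied from the results table.
LR = [
    (0.66, 0.88, 0.75), (0.69, 0.83, 0.75), (0.65, 0.80, 0.72),
    (0.67, 0.84, 0.75), (0.63, 0.86, 0.73), (0.70, 0.89, 0.78),
    (0.67, 0.89, 0.77), (0.66, 0.81, 0.73), (0.67, 0.83, 0.74),
    (0.67, 0.89, 0.77), (0.75, 0.85, 0.79), (0.68, 0.79, 0.73),
    (0.65, 0.88, 0.75), (0.71, 0.86, 0.78), (0.63, 0.85, 0.72),
]
SVM = [
    (0.82, 0.67, 0.74), (0.75, 0.71, 0.73), (0.78, 0.64, 0.70),
    (0.84, 0.65, 0.73), (0.69, 0.76, 0.72), (0.70, 0.67, 0.69),
    (0.87, 0.73, 0.79), (0.90, 0.64, 0.74), (0.78, 0.65, 0.71),
    (0.88, 0.67, 0.76), (0.83, 0.64, 0.72), (0.84, 0.65, 0.73),
    (0.76, 0.85, 0.80), (0.73, 0.73, 0.73), (0.79, 0.67, 0.73),
]


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def column_means(rows):
    """Mean of each column, rounded to two decimals like the table."""
    return tuple(round(mean(column), 2) for column in zip(*rows))


lr_means = column_means(LR)
svm_means = column_means(SVM)

# Largest difference between the recomputed and the reported F1 values.
max_f1_gap = max(abs(f1(p, r) - reported) for p, r, reported in LR + SVM)
```

Under these assumptions the recomputed column means match the mean row of the table, and every reported F1 lies within 0.01 of the value recomputed from the rounded precision and recall.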
[1] He Enfeng, Zhuang Linyuan, Xu Wengen. The Construction and Application of Potential Influence Index System for Network Public Opinions[J]. Journal of Intelligence, 2014, 33(1): 114-119. (in Chinese)
[2] Gao Junbo, An Bowen, Wang Xiaofeng. Study on Potential Influence Topic in On-line Community[J]. Journal of Computer Applications, 2008, 28(1): 140-142. (in Chinese)
[3] Jamali S, Rangwala H. Digging Digg: Comment Mining, Popularity Prediction, and Social Network Analysis[C]// Proceedings of the 2009 International Conference on Web Information Systems and Mining. IEEE, 2009: 32-38.
[4] Hong L, Dan O, Davison B D. Predicting Popular Messages in Twitter[C]// Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India. 2011: 57-58.
[5] Bandari R, Asur S, Huberman B A. The Pulse of News in Social Media: Forecasting Popularity[C]// Proceedings of the 6th International AAAI Conference on Weblogs and Social Media. 2012.
[6] Jiang Yuting. Microblogging Hot Topic Prediction Based on Correcting ARIMA Error by Support Vector Machine[J]. Computer Applications and Software, 2014, 31(9): 187-190. (in Chinese)
[7] Xu F, Liu J, He Y, et al. Hot Topic Trend Prediction of Topic Based on Markov Chain and Dynamic Backtracking[C]// Proceedings of the 18th Pacific-Rim Conference on Multimedia. Springer, 2017: 517-528.
[8] Shi Rui, Chen Fuji, Zhang Jinhua. Prediction of Online Public Opinion Based on Combination Grey Model[J]. Journal of Intelligence, 2018, 37(7): 101-106. (in Chinese)
[9] He Yanxiang, Liu Jianbo, Sun Songtao. Neural Network-Based Public Opinion Prediction Method for Microblog[J]. Journal of South China University of Technology: Natural Science Edition, 2016, 44(9): 47-52. (in Chinese)
[10] Chen Jiang, Liu Wei, Chao Wenhan, et al. Research on Weibo Forwarding Prediction Based on Hot Topics[J]. Journal of Chinese Information Processing, 2015, 29(6): 150-158. (in Chinese)
[11] Li Yongxing. Research on Technologies of Hot Topic Detection and Topic Trend Prediction[D]. Tianjin: Tianjin University, 2016. (in Chinese)
[12] Yao Haibo. Detection and Trend Prediction Research of Hot Topic of Micro-Blogging[D]. Guangzhou: South China University of Technology, 2013. (in Chinese)
[13] Huang Jiaoping. Based on Microblogging Early Forecast and Analyze Negative Hot News[D]. Guangzhou: South China University of Technology, 2013. (in Chinese)
[14] Liu Yuejie. Design and Implementation of Trending Topic Prediction System Based on Chinese Microblogging[D]. Beijing: Beijing University of Posts and Telecommunications, 2014. (in Chinese)
[15] Nikolov S. Trend or No Trend: A Novel Nonparametric Method for Classifying Time Series[D]. Massachusetts Institute of Technology, 2012.
[16] Yuan S, Tao Z, Zhu T, et al. Realtime Online Hot Topics Prediction in Sina Weibo for News Earlier Report[C]// Proceedings of the 2017 IEEE 31st International Conference on Advanced Information Networking & Applications. IEEE, 2017: 599-650.
[17] Yuan Fuyong, Feng Jing, Fu Qianqian. Influence Index Model of Micro-blog User[J]. New Technology of Library and Information Service, 2012(6): 60-64. (in Chinese)
[18] Brin S, Page L. Reprint of: The Anatomy of a Large-scale Hypertextual Web Search Engine[J]. Computer Networks, 2012, 56(18): 3825-3833.
[19] Hirsch J E. An Index to Quantify an Individual’s Scientific Research Output[J]. Proceedings of the National Academy of Sciences, 2005, 102(46): 16569-16572.
[20] People’s Daily Public Opinion Monitoring Office. 2017 People’s Daily·Government Affairs Index Weibo Impact Report[EB/OL]. [2018-03-03]. http://yuqing.people.com.cn/NMediaFile/2018/0123/MAIN201801231606000362822292002.pdf. (in Chinese)