Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (7): 55-62    DOI: 10.11925/infotech.2096-3467.2018.0003
Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Jia Longjia1,2, Zhang Bangzuo3
1School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China

[Objective] This paper proposes a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address the high dimensionality and sparsity of the feature matrix. [Methods] First, we calculated the probability that a term belongs to each category, and then predicted the probability of each category for a document. Next, we converted the word-based features into a class-based matrix, which was classified with a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, the new method increased MicroF1/MacroF1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than the term weighting approach used in this paper. [Conclusions] The proposed method effectively reduces the dimension of the feature matrix and improves classification efficiency for Internet public opinion studies.
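
The abstract does not give the exact weighting formulas, so the following is only a minimal sketch of one possible reading of the pipeline: estimate how strongly each term is associated with each category, then replace a document's word-based features with a vector that has one entry per category. The function names, the toy data, and the choice of simple relative frequencies and averaging are all assumptions made for illustration, not the authors' definitions. The SVM step is omitted; the resulting low-dimensional vectors would be passed to any SVM implementation.

```python
from collections import Counter, defaultdict


def term_class_probs(docs, labels):
    """Estimate P(class | term) from tokenized training documents.

    Assumption: plain relative frequency of a term's occurrences per class.
    """
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> {class: occurrences}
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    probs = {}
    for term, by_class in counts.items():
        total = sum(by_class.values())
        probs[term] = [by_class[c] / total for c in classes]
    return classes, probs


def to_class_features(tokens, classes, probs):
    """Map a document to a |C|-dimensional vector.

    Assumption: average P(class | term) over the terms seen in training.
    """
    vec = [0.0] * len(classes)
    known = [t for t in tokens if t in probs]
    if not known:
        return vec  # no known terms: all-zero vector
    for t in known:
        for i, p in enumerate(probs[t]):
            vec[i] += p
    return [v / len(known) for v in vec]


# Hypothetical toy corpus with two topics.
train_docs = [["exam", "library"], ["exam", "grade"],
              ["canteen", "food"], ["food", "price"]]
train_labels = ["study", "study", "life", "life"]

classes, probs = term_class_probs(train_docs, train_labels)
features = [to_class_features(d, classes, probs) for d in train_docs]
```

Whatever the vocabulary size, each document is reduced to as many features as there are categories, which is the dimension reduction the abstract refers to.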

Key words: Internet Public Opinion Security; Theme Classification; Term Weighting; Machine Learning
Received: 02 January 2018      Published: 15 August 2018
ZTFLH:  TP391.1  

Cite this article:

Jia Longjia, Zhang Bangzuo. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo. Data Analysis and Knowledge Discovery, 2018, 2(7): 55-62.

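Contingency table for a single category $c_i$:

| | Actually in $c_i$ | Actually not in $c_i$ |
|---|---|---|
| Predicted in $c_i$ | TP | FP |
| Predicted not in $c_i$ | FN | TN |

Pooled contingency table over all categories in $C$:

| | Actually in category | Actually not in category |
|---|---|---|
| Predicted in category | $TP=\sum_{i=1}^{|C|} TP_i$ | $FP=\sum_{i=1}^{|C|} FP_i$ |
| Predicted not in category | $FN=\sum_{i=1}^{|C|} FN_i$ | $TN=\sum_{i=1}^{|C|} TN_i$ |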
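
As a small illustration of how the two contingency tables above relate to the MicroF1 and MacroF1 values reported in the abstract, the sketch below uses the standard definitions: micro-averaging pools TP, FP and FN over all categories before computing F1, while macro-averaging computes F1 per category and then takes the mean. The function names and the example counts are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_class):
    """per_class: list of (TP_i, FP_i, FN_i) tuples, one per category."""
    # Micro: sum the counts over categories, then compute a single F1.
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(*c) for c in per_class) / len(per_class)
    return micro, macro


# Hypothetical counts: one frequent, well-classified category
# and one rare, poorly classified category.
counts = [(8, 2, 2), (1, 1, 3)]
micro, macro = micro_macro_f1(counts)
```

Because macro-averaging weights every category equally, it is more sensitive to performance on rare categories, which is why the two measures are usually reported together.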