Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Jia Longjia1,2, Zhang Bangzuo3
1 School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2 Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3 School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China
[Objective] This paper introduces a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address the high dimensionality and sparsity of the feature space. [Methods] First, we calculated the probability that a term falls into each category and used these probabilities to predict the probability of each category for a document. We then converted the word-based features into a class-based matrix, which was classified with a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, the new method increased the MicroF1/MacroF1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than the term weighting approach used in this paper. [Conclusions] The proposed method could effectively reduce the dimension of the feature matrix and improve classification efficiency in Internet public opinion studies.
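The conversion from word-based features to class-based features described under [Methods] can be illustrated with a minimal Python sketch. The abstract does not give the exact probability estimates, so the add-one smoothing, the averaging step, the toy posts and the function names `term_class_probs` and `class_features` are all assumptions made for illustration, not the authors' formulas.

```python
from collections import Counter, defaultdict


def term_class_probs(docs, labels):
    """Estimate P(class | term) from per-class term counts.

    Add-one smoothing is an assumption of this sketch, not taken from the paper.
    """
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> Counter mapping class -> occurrences
    for tokens, label in zip(docs, labels):
        for term in tokens:
            counts[term][label] += 1
    probs = {}
    for term, per_class in counts.items():
        total = sum(per_class.values()) + len(classes)
        probs[term] = [(per_class[c] + 1) / total for c in classes]
    return classes, probs


def class_features(tokens, probs, n_classes):
    """Map a document to one value per class by averaging P(class | term)
    over the terms seen in training; unseen terms are ignored."""
    known = [probs[t] for t in tokens if t in probs]
    if not known:
        # No known term: fall back to a uniform vector.
        return [1.0 / n_classes] * n_classes
    return [sum(p[k] for p in known) / len(known) for k in range(n_classes)]


# Hypothetical tokenized posts with topic labels.
docs = [["exam", "library", "exam"], ["canteen", "food"],
        ["exam", "grade"], ["food", "price"]]
labels = ["study", "life", "study", "life"]

classes, probs = term_class_probs(docs, labels)
vec = class_features(["exam", "food", "unknown"], probs, len(classes))
print(classes, [round(v, 3) for v in vec])
```

Each document is reduced from a vocabulary-sized vector to a vector with one entry per class, which is the dimension reduction the abstract refers to. The rows produced this way would then be passed to a support vector machine for classification.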
Jia Longjia, Zhang Bangzuo. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 55-62.