Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Longjia Jia1,2, Bangzuo Zhang3
1 School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2 Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3 School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China
[Objective] This paper introduces a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address the high dimensionality and sparsity of the feature space. [Methods] First, we calculated the probability that a term belongs to each category and used these values to predict the probability of each category for a document. Then, we converted the word-based features into a class-based matrix, which was classified by a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, the proposed method increased the MicroF1/MacroF1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than the term weighting approach used in this paper. [Conclusions] The proposed method could effectively reduce the dimension of the feature matrix and improve classification efficiency for Internet public opinion studies.
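The two steps described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the add-alpha smoothing, the averaging of per-term probabilities, and the names `term_class_probs` and `class_features` are all assumptions made for illustration. The sketch only shows how word-based features can be collapsed into one feature per class, so that the matrix passed to a classifier such as a support vector machine has as many columns as there are categories.

```python
from collections import Counter, defaultdict


def term_class_probs(docs, labels, alpha=1.0):
    """Estimate P(class | term) from term counts (add-alpha smoothing is an assumption)."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> Counter mapping class -> occurrences
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    probs = {}
    for term, per_class in counts.items():
        total = sum(per_class.values()) + alpha * len(classes)
        probs[term] = [(per_class[c] + alpha) / total for c in classes]
    return classes, probs


def class_features(tokens, classes, probs):
    """Map a tokenized document to one feature per class.

    Assumption: the document score for a class is the mean of P(class | term)
    over the document's known terms; unseen terms are ignored.
    """
    known = [probs[t] for t in tokens if t in probs]
    if not known:
        # No known term: fall back to a uniform vector.
        return [1.0 / len(classes)] * len(classes)
    return [sum(p[i] for p in known) / len(known) for i in range(len(classes))]


# Hypothetical toy data: four tokenized posts in two made-up categories.
train_docs = [["exam", "library"], ["exam", "grade"],
              ["canteen", "food"], ["food", "price"]]
train_labels = ["study", "study", "life", "life"]

classes, probs = term_class_probs(train_docs, train_labels)
vec = class_features(["exam", "grade"], classes, probs)
# vec has len(classes) entries instead of one entry per vocabulary term.
```

With a vocabulary of tens of thousands of terms and only a handful of topic categories, each document is reduced to a short dense vector, which is the kind of dimension reduction the abstract refers to.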
Longjia Jia, Bangzuo Zhang. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 55-62. DOI: 10.11925/infotech.2096-3467.2018.0003.