Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (7): 55-62    DOI: 10.11925/infotech.2096-3467.2018.0003
Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Longjia Jia1,2, Bangzuo Zhang3
1School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China
[Objective] This paper introduces a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address high-dimensionality and sparsity issues. [Methods] First, we calculated the probability that a term belongs to each category and used these probabilities to predict the category probabilities of a document. We then converted the word-based features into a class-based matrix, which was classified with a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, the new method increased the MicroF1/MacroF1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than the term weighting method used in this paper. [Conclusions] The proposed method could effectively reduce the dimension of the feature matrix and improve classification efficiency for Internet public opinion studies.
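
The conversion from word-based features to a class-based matrix described in [Methods] can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so everything here is an assumption for illustration: the helper names `fit_term_class_probs` and `to_class_features`, the smoothing parameter `alpha`, the estimate of P(class | term) from smoothed occurrence counts, and the averaging of those values over a document's terms. The toy posts and labels are invented.

```python
from collections import Counter, defaultdict

def fit_term_class_probs(docs, labels, alpha=1.0):
    """Estimate P(class | term) from tokenized training documents.

    Assumption: add-alpha smoothed occurrence counts; the paper's exact
    term weighting formula is not stated in the abstract.
    """
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> Counter(class -> occurrences)
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    probs = {}
    for term, per_class in counts.items():
        total = sum(per_class.values()) + alpha * len(classes)
        probs[term] = [(per_class[c] + alpha) / total for c in classes]
    return classes, probs

def to_class_features(tokens, classes, probs):
    """Map a document to one score per class (one row of the class-based matrix).

    Assumption: the score is the mean of P(class | term) over the document's
    known terms; a document with no known terms gets a uniform row.
    """
    known = [probs[t] for t in tokens if t in probs]
    if not known:
        return [1.0 / len(classes)] * len(classes)
    return [sum(p[k] for p in known) / len(known) for k in range(len(classes))]

# Invented toy data: four tokenized posts in two topics.
train_docs = [["exam", "library", "exam"], ["concert", "ticket"],
              ["exam", "grade"], ["ticket", "movie"]]
train_labels = ["study", "leisure", "study", "leisure"]

classes, probs = fit_term_class_probs(train_docs, train_labels)
features = [to_class_features(d, classes, probs) for d in train_docs]
unseen = to_class_features(["weather"], classes, probs)
```

Each document is reduced from a vocabulary-sized vector to one value per class, which is where the dimensionality reduction comes from. The rows of `features` could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.LinearSVC`; that choice of library is also an assumption, not something the abstract specifies.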

Key words: Internet Public Opinion Security; Theme Classification; Term Weighting; Machine Learning
Received: 02 January 2018      Published: 15 August 2018

Longjia Jia, Bangzuo Zhang. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo. Data Analysis and Knowledge Discovery, 2018, 2(7): 55-62.

