Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Jia Longjia1,2, Zhang Bangzuo3
1School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China
Abstract [Objective] This paper introduces a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address the high dimensionality and sparsity of the feature space. [Methods] First, we calculated the probability that each term belongs to each category and used these probabilities to predict the probability that a document belongs to each category. Then, we converted the word-based feature matrix into a class-based matrix, which was classified with a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, the new method increased the MicroF1/MacroF1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than the term weighting approach used in this paper. [Conclusions] The proposed method can effectively reduce the dimension of the feature matrix and improve classification efficiency for Internet public opinion studies.
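The two steps in [Methods] can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the Laplace-smoothed estimate of the term-category probability, the averaging rule used to obtain a document's category scores, the function names `term_class_probs` and `to_class_features`, and the toy data. It is not the authors' weighting scheme. The point it shows is the dimension reduction: each document ends up with one feature per category instead of one feature per term.

```python
from collections import Counter, defaultdict


def term_class_probs(docs, labels, classes, alpha=1.0):
    """Estimate P(class | term) from tokenised, labelled documents.

    Assumed estimator (not from the paper): term occurrence counts per
    class with additive (Laplace) smoothing controlled by `alpha`.
    """
    counts = defaultdict(Counter)  # term -> class -> occurrence count
    for tokens, label in zip(docs, labels):
        for term in tokens:
            counts[term][label] += 1
    probs = {}
    for term, per_class in counts.items():
        total = sum(per_class.values()) + alpha * len(classes)
        probs[term] = [(per_class[c] + alpha) / total for c in classes]
    return probs


def to_class_features(tokens, probs, classes):
    """Map a document to one feature per class.

    Assumed rule (not from the paper): average the class probabilities
    of the document's known terms; fall back to a uniform vector when
    no term was seen during training.
    """
    known = [probs[t] for t in tokens if t in probs]
    if not known:
        return [1.0 / len(classes)] * len(classes)
    return [sum(p[i] for p in known) / len(known) for i in range(len(classes))]


# Hypothetical toy corpus: four tokenised posts in two topic categories.
classes = ["study", "life"]
docs = [["exam", "library"], ["exam", "grade"],
        ["canteen", "food"], ["food", "price"]]
labels = ["study", "study", "life", "life"]

probs = term_class_probs(docs, labels, classes)
# Class-based matrix: one row per document, one column per category.
matrix = [to_class_features(d, probs, classes) for d in docs]
```

In this sketch the vocabulary has six terms but each row of `matrix` has only two columns, one per category. Rows of this kind would then be passed to a support vector machine for the final classification.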
Received: 02 January 2018
Published: 15 August 2018