Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 7: 55-62. DOI: 10.11925/infotech.2096-3467.2018.0003
Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo
Longjia Jia1,2, Bangzuo Zhang3
1 School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
2 Department of Planning and Development, Northeast Normal University, Changchun 130024, China
3 School of Computer Science and Information Technology, Northeast Normal University, Changchun 130024, China
Abstract  

[Objective] This paper introduces a term weighting method for classifying the topics of Sina Weibo posts by college students, aiming to address the high dimensionality and sparsity of text features. [Methods] First, we calculated the probability that a term falls into each category and then predicted the probability that a document belongs to each category. Next, we converted the word-based feature matrix into a class-based matrix, which was then classified with a support vector machine (SVM). [Results] Compared with the traditional tf, tf×idf and tf×rf weighting methods, our method increased the Micro-F1/Macro-F1 values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] Further research is needed to explore topic classification methods beyond the term weighting approach proposed here. [Conclusions] The proposed method can effectively reduce the dimensionality of the feature matrix and improve classification efficiency for Internet public opinion studies.
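The abstract only outlines the class-based transformation, so the following is a minimal sketch of one way it could work, not the authors' formula. It assumes a multinomial naive Bayes estimate of P(term | class) with Laplace smoothing, then maps each document to its vector of class posteriors P(class | document). Each post's |V|-dimensional word vector becomes a |C|-dimensional class vector, which is where the dimensionality reduction comes from. The function names, the smoothing parameter `alpha`, and the toy posts and topic labels are all hypothetical.

```python
import math
from collections import Counter


def train_term_class_probs(docs, labels, alpha=1.0):
    # Hypothetical helper: estimate P(class) and Laplace-smoothed P(term | class)
    # from tokenized training documents (multinomial naive Bayes assumption).
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        cond[c] = {t: (counts[c][t] + alpha) / denom for t in vocab}
    return classes, prior, cond


def doc_class_vector(doc, classes, prior, cond):
    # Map one tokenized document to a |C|-dimensional vector of P(class | doc).
    # Terms not seen during training are skipped.
    tf = Counter(doc)
    log_scores = []
    for c in classes:
        score = math.log(prior[c])
        for term, n in tf.items():
            if term in cond[c]:
                score += n * math.log(cond[c][term])
        log_scores.append(score)
    top = max(log_scores)
    # Subtract the max before exponentiating to avoid underflow on long posts.
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_class_matrix(docs, classes, prior, cond):
    # Replace the n_docs x |V| word matrix with an n_docs x |C| class matrix.
    return [doc_class_vector(d, classes, prior, cond) for d in docs]


# Toy example with hypothetical, already-segmented posts.
train_docs = [["exam", "library", "study"], ["canteen", "food", "price"],
              ["exam", "grade"], ["food", "dorm"]]
train_labels = ["academic", "campus_life", "academic", "campus_life"]
classes, prior, cond = train_term_class_probs(train_docs, train_labels)
X = to_class_matrix([["exam", "study"], ["food"]], classes, prior, cond)
print(classes)  # ['academic', 'campus_life']
print([[round(p, 3) for p in row] for row in X])  # [[0.857, 0.143], [0.25, 0.75]]
```

Each row sums to 1 and has one column per topic, so the downstream SVM sees |C| features instead of |V|. In a real pipeline the probabilities would be fitted on training posts only, and the resulting matrix would then be passed to an SVM implementation such as LIBSVM or scikit-learn's `SVC`.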

Key words: Internet Public Opinion Security; Theme Classification; Term Weighting; Machine Learning
Received: 02 January 2018      Published: 15 August 2018

Cite this article:

Longjia Jia, Bangzuo Zhang. Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo. Data Analysis and Knowledge Discovery, 2018, 2(7): 55-62.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0003     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I7/55
