Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (2): 73-79    DOI: 10.11925/infotech.2096-3467.2017.02.10
Original Article
Analyzing Sentiments of Micro-blog Posts Based on Support Vector Machine
Yang Shuang, Chen Fen
School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract  

[Objective] This paper proposes a new method based on the Support Vector Machine to monitor online public opinion. [Methods] We extracted fourteen linguistic characteristics of the micro-blog posts and analyzed their sentiments with a Support Vector Machine. [Results] The precision, recall and F value of the proposed method were 82.40%, 81.91%, and 82.10%, respectively. [Limitations] The size of the training corpus needs to be expanded. [Conclusions] The proposed method could effectively analyze sentiments of micro-blog posts.

Key words: Microblog; Sentiment Analysis; Support Vector Machine; Parsing
Received: 29 August 2016      Published: 27 March 2017
ZTFLH:  G35 TP391  

Cite this article:

Yang Shuang, Chen Fen. Analyzing Sentiments of Micro-blog Posts Based on Support Vector Machine. Data Analysis and Knowledge Discovery, 2017, 1(2): 73-79.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.02.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I2/73

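| Weight | Examples | Count |
|---|---|---|
| 2.0 | 百分之百 (one hundred percent), 绝对 (absolutely), 非常 (very), 超 (super), 过于 (excessively), ... | 99 |
| 1.5 | 很 (very), 多么 (how), 更加 (even more), 不胜 (extremely), ... | 78 |
| 1.0 | 比较 (relatively), 较为 (comparatively), 多多少少 (more or less), ... | 13 |
| 0.5 | 稍微 (slightly), 略为 (somewhat), 不怎么 (not very), 不为过 (not excessive), ... | 54 |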
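As a minimal sketch of how the degree-adverb weights above could feed the "highest degree-adverb weight" feature (F6), the lookup below takes a pre-segmented list of tokens and returns the largest weight found. The dictionary holds only the example words shown in the table, not the full lexicon; the function name `max_degree_weight` and the fallback of 0.0 when no degree adverb is present are assumptions for illustration, not details confirmed by the source.

```python
# Illustrative subset of the degree-adverb lexicon: only the example
# words listed in the table (the full lexicon is not reproduced here).
DEGREE_WEIGHTS = {
    "百分之百": 2.0, "绝对": 2.0, "非常": 2.0, "超": 2.0, "过于": 2.0,
    "很": 1.5, "多么": 1.5, "更加": 1.5, "不胜": 1.5,
    "比较": 1.0, "较为": 1.0, "多多少少": 1.0,
    "稍微": 0.5, "略为": 0.5, "不怎么": 0.5, "不为过": 0.5,
}


def max_degree_weight(tokens):
    """Return the highest degree-adverb weight among the tokens.

    Assumption: a post with no degree adverb gets 0.0.
    """
    return max((DEGREE_WEIGHTS.get(t, 0.0) for t in tokens), default=0.0)
```

For example, `max_degree_weight(["今天", "非常", "开心"])` returns `2.0`, because 非常 carries the highest weight in that token list.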
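| Feature type | Meaning |
|---|---|
| Part-of-speech features | Number of verbs in the post (F1) |
| Part-of-speech features | Number of adjectives in the post (F2) |
| Part-of-speech features | Number of adverbs in the post (F3) |
| Sentiment features | Number of positive sentiment words in the post (F4) |
| Sentiment features | Number of negative sentiment words in the post (F5) |
| Sentiment features | Highest weight among the degree adverbs in the post (F6) |
| Sentiment features | Sentiment score of the post (F7) |
| Sentence-pattern features | Number of negation words (F8) |
| Sentence-pattern features | Number of exclamation marks (F9) |
| Sentence-pattern features | Number of question marks (F10) |
| Semantic features | Adverbial modifiers related to sentiment words (F11) |
| Semantic features | Adjectival modifiers related to sentiment words (F12) |
| Semantic features | Nominal subjects related to sentiment words (F13) |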
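The sentence-pattern features (F8 to F10) are plain counts, so they can be sketched without any NLP toolkit. The snippet below is only an illustration: the negation word list is hypothetical (the source does not give one), the post is assumed to be already word-segmented, and counting both full-width and half-width punctuation is an assumption about how the marks appear in micro-blog text.

```python
# Hypothetical negation lexicon; the source does not list the words it used.
NEGATION_WORDS = {"不", "没", "没有", "别", "未"}


def sentence_pattern_features(text, tokens):
    """Count negation words (F8), exclamation marks (F9), question marks (F10).

    `text` is the raw post; `tokens` is the same post after word segmentation.
    """
    f8 = sum(1 for t in tokens if t in NEGATION_WORDS)
    # Count both half-width and full-width punctuation.
    f9 = text.count("!") + text.count("！")
    f10 = text.count("?") + text.count("？")
    return {"F8": f8, "F9": f9, "F10": f10}
```

Called on the post "这个不好！！真的吗？" with its segmented tokens, this yields one negation word, two exclamation marks and one question mark.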
| Category | Count |
|---|---|
| Very positive | 217 |
| Positive | 1,149 |
| Neutral | 2,081 |
| Negative | 1,239 |
| Very negative | 304 |
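| Sentiment value | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| +2 | 1: 2 | 2: 0 | 3: 2 | 4: 2 | 5: 0 | 6: 2.0 | 7: 4.0 | 8: 0 | 9: 3 | 10: 0 | 11: 1 | 12: 0 | 13: 1 |
| +1 | 1: 4 | 2: 2 | 3: 3 | 4: 3 | 5: 0 | 6: 0.0 | 7: 1.0 | 8: 0 | 9: 1 | 10: 0 | 11: 0 | 12: 2 | 13: 2 |
| -2 | 1: 2 | 2: 2 | 3: 0 | 4: 0 | 5: 2 | 6: 2.0 | 7: -4.0 | 8: 1 | 9: 0 | 10: 1 | 11: 2 | 12: 0 | 13: 0 |
| -1 | 1: 3 | 2: 5 | 3: 3 | 4: 1 | 5: 4 | 6: 1.0 | 7: -2.0 | 8: 3 | 9: 0 | 10: 6 | 11: 1 | 12: 3 | 13: 0 |
| 0 | 1: 3 | 2: 2 | 3: 3 | 4: 1 | 5: 0 | 6: 0 | 7: 1.0 | 8: 2 | 9: 1 | 10: 1 | 11: 2 | 12: 3 | 13: 0 |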
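The rows above follow the `label index:value` layout used by LibSVM data files. As a small sketch, the helper below (the name `to_libsvm_line` is hypothetical) serializes a sentiment value and its thirteen feature values into one such line; it assumes a dense feature list whose first element is F1, and it prints whole-number floats without a decimal point.

```python
def to_libsvm_line(label, features):
    """Format one sample as a LibSVM text line: '<label> 1:<v1> 2:<v2> ...'.

    `label` is an integer sentiment value (-2..+2); `features` is a list of
    numbers in feature order, so index 1 corresponds to F1.
    """
    # Show an explicit sign for non-zero labels, as in the table rows.
    label_text = f"{label:+d}" if label != 0 else "0"
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{label_text} {pairs}"
```

For the first row of the table, `to_libsvm_line(2, [2, 0, 2, 2, 0, 2.0, 4.0, 0, 3, 0, 1, 0, 1])` produces `+2 1:2 2:0 3:2 4:2 5:0 6:2 7:4 8:0 9:3 10:0 11:1 12:0 13:1`.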
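| Experiment | Feature combination | Precision |
|---|---|---|
| 1 | Part of speech | 57.60% |
| 2 | Part of speech + sentiment words | 80.93% |
| 3 | Part of speech + sentiment words + degree-adverb weight | 81.76% |
| 4 | Part of speech + sentiment words + degree-adverb weight + sentiment score | 81.95% |
| 5 | Part of speech + sentiment words + degree-adverb weight + sentiment score + negation words | 82.14% |
| 6 | Part of speech + sentiment words + degree-adverb weight + sentiment score + negation words + question and exclamation marks | 82.22% |
| 7 | Part of speech + sentiment words + degree-adverb weight + sentiment score + negation words + question and exclamation marks + semantic features | 82.40% |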
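To make the contribution of each feature group easier to read, the sketch below computes the gain of every experiment over the previous one, using only the figures reported in the table. The short group labels and the function name `incremental_gains` are illustrative choices, not terms from the paper.

```python
# Scores from the seven experiments, in the order they were run.
RESULTS = [
    ("part of speech", 57.60),
    ("+ sentiment words", 80.93),
    ("+ degree-adverb weight", 81.76),
    ("+ sentiment score", 81.95),
    ("+ negation words", 82.14),
    ("+ question and exclamation marks", 82.22),
    ("+ semantic features", 82.40),
]


def incremental_gains(results):
    """Return (added feature group, gain in percentage points) per step."""
    gains = []
    for (_, previous), (name, current) in zip(results, results[1:]):
        gains.append((name, round(current - previous, 2)))
    return gains


for name, gain in incremental_gains(RESULTS):
    print(f"{name}: +{gain:.2f} points")
```

Read this way, adding sentiment words accounts for the bulk of the improvement (about 23.33 points), while each later feature group adds less than one point.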
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Proposed method | 82.40% | 81.91% | 82.10% |
| Cascaded CRFs method | 75.31% | 73.30% | 74.30% |
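For readers who want to relate the three columns, the sketch below computes F1 as the harmonic mean of precision and recall. This is the standard definition, but the page does not say how the reported scores were averaged across the five sentiment classes, so treating the table values as a single precision/recall pair is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


proposed = f1_score(82.40, 81.91)
cascaded_crfs = f1_score(75.31, 73.30)
print(f"proposed method: {proposed:.2f}")
print(f"cascaded CRFs:   {cascaded_crfs:.2f}")
```

Under that assumption the harmonic means come out at roughly 82.15 and 74.29, close to but not identical to the reported 82.10% and 74.30%. The small gap may reflect per-class averaging or rounding, which the source does not specify.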