Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (7): 61-72    DOI: 10.11925/infotech.2096-3467.2017.0516
Original Article
Fine-grained Sentiment Analysis Based on Weibo
Dun Xinhui1, Zhang Yunqiu1, Yang Kaixi2
1School of Public Health, Jilin University, Changchun 130021, China
2International School of Information Science & Engineering, Dalian University of Technology, Dalian 116620, China
Abstract  

[Objective] This paper conducts a fine-grained sentiment analysis of Weibo posts by classifying sentiments into eight categories and calculating their intensity values. [Methods] First, we analyzed the Weibo corpus to construct a question word list, adding “suspected” as an eighth category to the seven sentiment categories defined by DUTIR. Then, we used the Pointwise Mutual Information method to construct the emoticon dictionary and took into account the effects of negation words and degree adverbs. We used Python to retrieve the data from Weibo and the jiebaR package to segment the text into words. Finally, we classified the sentiments and calculated their intensity. [Results] We obtained the proportions of the eight sentiment categories and the sentiment intensity for commonly used diabetes drugs. Precision was highest for “angry” and “sad” (85.73% and 83.05%), while the Recall and F values were highest for “happy” and “like” (above 81%). The Precision, Recall and F values of “suspected” were 77.33%, 78.58% and 77.95%, respectively. [Limitations] The sentiment dictionary needs to be expanded. [Conclusions] The proposed model could analyze the sentiment of Weibo posts more effectively than traditional methods.

Keywords: Microblog; Fine-grained Sentiment Analysis; Drug
Received: 31 May 2017      Published: 26 July 2017
ZTFLH:  TP393  

Cite this article:

Dun Xinhui, Zhang Yunqiu, Yang Kaixi. Fine-grained Sentiment Analysis Based on Weibo. Data Analysis and Knowledge Discovery, 2017, 1(7): 61-72.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0516     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I7/61

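Word | Part of speech | Number of senses | Sense index | Emotion category | Intensity | Polarity | Auxiliary emotion category | Intensity | Polarity
无所畏惧 (fearless) | idiom | 1 | 1 | PH | 7 | 1 | | |
手头紧 (short of money) | idiom | 1 | 1 | NE | 7 | 0 | | |
周到 (considerate) | adj | 1 | 1 | PH | 5 | 1 | | |
言过其实 (exaggerated) | idiom | 1 | 1 | NN | 5 | -1 | | |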
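No. | Major emotion class | Emotion class | Example words
1 | | Joy 快乐 (PA) | 喜悦、欢喜、笑眯眯、欢天喜地
2 | | Ease 安心 (PE) | 踏实、宽心、定心丸、问心无愧
3 | | Respect 尊敬 (PD) | 恭敬、敬爱、毕恭毕敬、肃然起敬
4 | | Praise 赞扬 (PH) | 英俊、优秀、通情达理、实事求是
5 | | Trust 相信 (PG) | 信任、信赖、可靠、毋庸置疑
6 | | Fondness 喜爱 (PB) | 倾慕、宝贝、一见钟情、爱不释手
7 | | Wish 祝愿 (PK) | 渴望、保佑、福寿绵长、万寿无疆
8 | | Anger 愤怒 (NA) | 气愤、恼火、大发雷霆、七窍生烟
9 | | Sadness 悲伤 (NB) | 忧伤、悲苦、心如刀割、悲痛欲绝
10 | | Disappointment 失望 (NJ) | 憾事、绝望、灰心丧气、心灰意冷
11 | | Guilt 疚 (NH) | 内疚、忏悔、过意不去、问心有愧
12 | | Longing 思 (PF) | 思念、相思、牵肠挂肚、朝思暮想
13 | | Panic 慌 (NI) | 慌张、心慌、不知所措、手忙脚乱
14 | | Fear 恐惧 (NC) | 胆怯、害怕、担惊受怕、胆颤心惊
15 | | Shame 羞 (NG) | 害羞、害臊、面红耳赤、无地自容
16 | | Annoyance 烦闷 (NE) | 憋闷、烦躁、心烦意乱、自寻烦恼
17 | | Hatred 憎恶 (ND) | 反感、可耻、恨之入骨、深恶痛绝
18 | | Criticism 贬责 (NN) | 呆板、虚荣、杂乱无章、心狠手辣
19 | | Jealousy 妒忌 (NK) | 眼红、吃醋、醋坛子、嫉贤妒能
20 | | Suspicion 怀疑 (NL) | 多心、生疑、将信将疑、疑神疑鬼
21 | | Surprise 惊奇 (PC) | 奇怪、奇迹、大吃一惊、瞠目结舌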
No. | Question words | Intensity | Polarity
1 | 哪儿、哪里、怎么样、怎么着、如何、为什么、难道、'呢?'、'吧?'、'啊?'、啥、为何、怎么办、哪些、问题、请问、为神马、神马情况、为啥、干嘛、能否、何时、求问 | 7 | 1
2 | 谁、何、什么、神马、几时、怎么、怎的、怎样、岂、何尝、吗、么、多大、有没有、会不会、好不好、能不能、可不可以、行不行 | 5 | 1
3 | 几、多少、怎、难怪、反倒、何必、你知道 | 3 | 1
4 | 居然、竟然、究竟 | 1 | 1
No. | Degree adverbs | Intensity
1 | 极、极为、极其、透顶、极端、顶、最、最为、绝顶、无比 | 2
2 | 多、很、非常、甚至、十分、太、分外、特别、万分、尤其、真、格外、何等、过于、多么、更加、更为、更、越加、越发、愈加、愈、相当、好 | 1.5
3 | 颇、挺、比较、较、较为、较比 | 1.2
4 | 怪、有点、有点儿、有些、稍、稍稍、稍微、稍许、少许、略、略微 | 0.5
Negation words
白白、甭、别、并非、不、不必、不曾、不可、不要、不用、从不、从未、非、毫不、毫无、何必、何曾、何尝、何须、决不、绝不、绝非、绝无、没、没有、莫、难以、切勿、尚未、徒、徒然、枉、未、未必、未曾、未尝、未有、无从、无须、无庸、毋须、毋庸、勿
Emoticon | Emotion category | Emoticon | Emotion category
[doge] | 8 | [抱抱] | 2
[喵喵] | 1 | [坏笑] | 1
[二哈] | 1 | [舔屏] | 2
[打脸] | 4 | [污] | 1
[哆啦A梦笑] | 1 | [允悲] | 4
[哆啦A梦汗] | 7 | [笑而不语] | 1
[话筒] | 2 | [费解] | 8
[哆啦A梦开心] | 1 | [憧憬] | 2
[笑cry] | 1 | [并不简单] | 2
[摊手] | 8 | [微笑] | 1
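A dictionary like the one above can be applied to raw post text by pulling out the bracketed emoticon tokens and looking up their category codes. The following is a minimal sketch, not the authors' implementation: the function name `emoticon_categories` and the regular expression are assumptions made for illustration, and only the 20 emoticon-to-category pairs listed in the table are used.

```python
import re
from collections import Counter

# Category codes (1-8) copied from the emoticon table above.
EMOTICON_CATEGORY = {
    "[doge]": 8, "[抱抱]": 2, "[喵喵]": 1, "[坏笑]": 1,
    "[二哈]": 1, "[舔屏]": 2, "[打脸]": 4, "[污]": 1,
    "[哆啦A梦笑]": 1, "[允悲]": 4, "[哆啦A梦汗]": 7, "[笑而不语]": 1,
    "[话筒]": 2, "[费解]": 8, "[哆啦A梦开心]": 1, "[憧憬]": 2,
    "[笑cry]": 1, "[并不简单]": 2, "[摊手]": 8, "[微笑]": 1,
}

# Assumption: an emoticon is a short bracketed token without spaces,
# matching the form of the entries in the table, e.g. "[doge]".
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,12}\]")


def emoticon_categories(post: str) -> Counter:
    """Count how often each emotion category is signalled by known emoticons."""
    counts = Counter()
    for token in EMOTICON_RE.findall(post):
        category = EMOTICON_CATEGORY.get(token)
        if category is not None:  # emoticons missing from the dictionary are skipped
            counts[category] += 1
    return counts


if __name__ == "__main__":
    print(emoticon_categories("这药真的有用吗[费解][doge] 希望如此[抱抱]"))
```

Emoticons that are not in the dictionary are ignored rather than guessed, so coverage depends on how complete the dictionary is.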
Emotion category | Emoticons | Count
 | [微笑][哈哈][偷笑][太开心] | 32
 | [爱你][亲亲][鼓掌][心] | 31
 | [怒][抓狂][怒骂] | 9
 | [允悲][委屈][失望][悲伤] | 14
 | [害羞][哆啦A梦害怕][羞嗒嗒] | 8
 | [坏笑][挖鼻][闭嘴][鄙视] | 8
 | [吃惊][惊恐] | 5
 | [费解][疑问] | 6
Total | | 113
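The abstract states that Pointwise Mutual Information (PMI) was used to build the emoticon dictionary but does not spell out the procedure. One common way to do this, sketched below under stated assumptions, is to assign an unlabeled emoticon to the category whose seed emoticons it co-occurs with most strongly, using PMI(x, y) = log2(p(x, y) / (p(x) p(y))) estimated from post-level co-occurrence counts. The function names `pmi` and `assign_category`, the post-level counting, and the idea of treating each category's emoticons as seeds are all assumptions for illustration.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """PMI from co-occurrence counts over `total` posts (log base 2)."""
    if count_xy == 0:
        return float("-inf")  # never co-occur: no evidence of association
    return math.log2(count_xy * total / (count_x * count_y))


def assign_category(candidate, seeds_by_category, posts):
    """Return the category whose seed emoticons have the highest PMI with `candidate`.

    posts: list of sets, each holding the emoticons found in one post.
    seeds_by_category: dict mapping a category label to a set of seed emoticons.
    Returns None if the candidate never co-occurs with any seed.
    """
    total = len(posts)
    count_x = sum(1 for post in posts if candidate in post)
    best_category, best_score = None, float("-inf")
    for category, seeds in seeds_by_category.items():
        count_y = sum(1 for post in posts if post & seeds)
        count_xy = sum(1 for post in posts if candidate in post and post & seeds)
        if count_x == 0 or count_y == 0:
            continue
        score = pmi(count_xy, count_x, count_y, total)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

With small corpora PMI is unstable for rare emoticons, so a minimum co-occurrence threshold would be a reasonable addition in practice.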
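No. | Type | Example
1 | Sentiment word only | 热情 (enthusiastic)
2 | Negation word + sentiment word | 不 热情 (not enthusiastic)
3 | Degree adverb + sentiment word | 太 热情 (too enthusiastic)
4 | Negation word + degree adverb + sentiment word | 不 太 热情 (not too enthusiastic)
5 | Degree adverb + negation word + sentiment word | 太 不 热情 (very unenthusiastic)
6 | Negation word + negation word + sentiment word | 没有 不 热情 (not unenthusiastic)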
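The six patterns above can be scored with a small rule set. The sketch below is illustrative only, since the page does not give the authors' formulas. It assumes that a negation word flips the sign, that a degree adverb multiplies the intensity by its weight from the degree adverb table (太 = 1.5, 极 = 2), and that a negation placed before a degree adverb (pattern 4) flips the sign but also dampens the score by a factor of 0.5. That factor, the function name `phrase_score`, and the tiny lexicons are assumptions. The sentiment word 周到 (intensity 5, positive) is taken from the lexicon sample earlier on this page.

```python
# Tiny illustrative lexicons; values come from the tables on this page.
SENTIMENT = {"周到": 5.0}            # sentiment word -> signed intensity
DEGREE = {"极": 2.0, "太": 1.5, "挺": 1.2, "有点": 0.5}
NEGATION = {"不", "没有", "未"}

# Assumption: "negation + degree adverb" softens rather than fully reverses.
NEG_OVER_DEGREE = 0.5


def phrase_score(tokens):
    """Score a segmented phrase whose last token is the sentiment word."""
    *modifiers, word = tokens
    score = SENTIMENT[word]
    seen_degree = False
    # Apply modifiers from the one nearest the sentiment word outward.
    for modifier in reversed(modifiers):
        if modifier in DEGREE:
            score *= DEGREE[modifier]
            seen_degree = True
        elif modifier in NEGATION:
            score *= -NEG_OVER_DEGREE if seen_degree else -1.0
    return score


if __name__ == "__main__":
    for phrase in (["周到"], ["不", "周到"], ["太", "周到"],
                   ["不", "太", "周到"], ["太", "不", "周到"],
                   ["没有", "不", "周到"]):
        print(" ".join(phrase), phrase_score(phrase))
```

Working outward from the sentiment word lets patterns 4 and 5 produce different scores even though they contain the same tokens, which is the point of distinguishing them in the table.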
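Category | Name | Count | Total
Biguanide oral hypoglycemic agents | 二甲双胍 (metformin) | 248 | 353
 | 格华止 (Glucophage)、美迪康 (Meidikang) | 105 |
Sulfonylurea oral hypoglycemic agents | 格列吡嗪 (glipizide) | 119 | 166
 | 瑞易宁 (Glucotrol XL) | 47 |
Non-sulfonylurea oral hypoglycemic agents | 瑞格列奈 (repaglinide) | 162 | 203
 | 诺和龙 (NovoNorm) | 41 |
α-glucosidase inhibitors | 阿卡波糖 (acarbose) | 172 | 260
 | 拜糖平 (Glucobay) | 88 |
Insulin sensitizers | 罗格列酮 (rosiglitazone) | 61 | 205
 | 文迪雅 (Avandia) | 144 |
DPP-4 inhibitors | 西格列汀 (sitagliptin) | 186 | 305
 | 捷诺维 (Januvia) | 119 |
Compound preparations | 消渴丸 (Xiaoke Pill) | 212 | 212
Total | | | 1,704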
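No. | Feature word | Frequency
1 | 糖尿病 (diabetes) | 145
2 | 患者 (patient) | 121
3 | 服用 (take medication) | 89
4 | 治疗 (treatment) | 84
5 | 降糖药 (hypoglycemic drug) | 76
6 | 胰岛素 (insulin) | 59
7 | 口服 (oral) | 55
8 | 低血糖 (hypoglycemia) | 50
9 | 餐后血糖 (postprandial blood glucose) | 35
10 | 服药 (taking medicine) | 29
11 | 第一口 (first bite) | 28
12 | 餐前 (before meals) | 26
13 | 餐后 (after meals) | 25
14 | 用药 (medication use) | 25
15 | 长生不老 (immortality) | 21
16 | 副作用 (side effect) | 20
17 | 首例 (first case) | 20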
Emotion category | Precision | Recall | F
 | 79.00% | 83.15% | 81.02%
 | 77.18% | 85.56% | 81.15%
 | 85.73% | 38.83% | 53.45%
 | 83.05% | 35.65% | 49.89%
 | 53.42% | 47.12% | 50.07%
 | 64.67% | 66.96% | 65.80%
 | 54.58% | 33.37% | 41.42%
 | 77.33% | 78.58% | 77.95%
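The F column above is consistent with the standard F1 measure, the harmonic mean of Precision and Recall: F = 2PR / (P + R). The short check below recomputes F for each row from the reported Precision and Recall. The function name `f_measure` is illustrative, and the row values are copied from the table.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall, reported F) in percent, in table order.
ROWS = [
    (79.00, 83.15, 81.02),
    (77.18, 85.56, 81.15),
    (85.73, 38.83, 53.45),
    (83.05, 35.65, 49.89),
    (53.42, 47.12, 50.07),
    (64.67, 66.96, 65.80),
    (54.58, 33.37, 41.42),
    (77.33, 78.58, 77.95),
]

if __name__ == "__main__":
    for p, r, reported in ROWS:
        print(f"P={p:.2f} R={r:.2f} F={f_measure(p, r):.2f} (reported {reported:.2f})")
```

The rows with high Precision but low Recall (for example 85.73% and 38.83%) show how the harmonic mean pulls F toward the weaker of the two values.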