Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (8): 51-59    DOI: 10.11925/infotech.2096-3467.2018.0060
Sentiment Analysis for Micro-blogs with LDA and AdaBoost
Zeng Ziming, Yang Qianwen
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper aims to improve sentiment analysis of micro-blog texts by combining the LDA topic model with the AdaBoost algorithm. [Methods] First, we used the LDA topic model to extract the topics of micro-blog posts. Then, we combined these topic features with emotional and sentence pattern features. Finally, we trained the proposed sentiment analysis model with the AdaBoost ensemble classification method. [Results] The topic feature had a significant positive impact on emotion recognition, and the model using topic and emotional features yielded the best results: its precision reached 84.512% and its recall reached 83.160%. [Limitations] The sample size needs to be expanded and the sentiment dictionary improved. Emoticons in the micro-blog posts were not studied. [Conclusions] The proposed AdaBoost model with LDA can effectively identify emotional tendencies.

Key words: Micro-blog; Sentiment Analysis; LDA; AdaBoost
Received: 17 January 2018      Published: 08 September 2018
ZTFLH:  TP391.1  

Cite this article:

Zeng Ziming,Yang Qianwen. Sentiment Analysis for Micro-blogs with LDA and AdaBoost. Data Analysis and Knowledge Discovery, 2018, 2(8): 51-59.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0060     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I8/51

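Category | Examples
Positive sentiment words | 赞 (praise), 骄傲 (proud), 厉害 (awesome), 膜拜 (worship), 大神 (guru)
Negative sentiment words | 喷 (bash), 辣鸡 (trash, slang), 垃圾 (garbage), 脑残 (brain-dead), 差评 (bad review), 炒作 (hype), 屌丝 (loser, slang), 键盘侠 (keyboard warrior), 细思极恐 (chilling on reflection)
Negation words | 不该 (should not), 不好 (not good), 没有 (do not have), 绝非 (by no means)
Transition words | 但是 (but), 然而 (however), 可是 (yet), 不料 (unexpectedly), 不过 (though), 偏偏 (contrary to expectation), 否则 (otherwise), 毕竟 (after all), 可惜 (unfortunately), 只是 (it is just that)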
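Feature type | Feature | Meaning | Measure
Topic feature | Topic category | Topic category to which the comment belongs | topic=i, (i=0,1,2…)
Emotional feature | Positive sentiment words | Number of positive sentiment words in a comment | pos=n, (n=0,1,2…)
Emotional feature | Negative sentiment words | Number of negative sentiment words in a comment | neg=n, (n=0,1,2…)
Sentence pattern feature | Negation words | Number of negation words in a comment | nw=n, (n=0,1,2…)
Sentence pattern feature | Transition words | Number of transition words in a comment | adv=n, (n=0,1,2…)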
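The feature measures in this table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicons below contain only the example words listed in the lexicon tables above (the full dictionaries are not given here), plain substring matching is an assumption standing in for whatever word segmentation was actually used, and the function names `count_hits` and `extract_features` as well as the sample comment are hypothetical.

```python
# Mini-lexicons limited to the example words shown in the tables above.
# The paper's full dictionaries are not reproduced here (assumption).
POSITIVE = ["赞", "骄傲", "厉害", "膜拜", "大神"]
NEGATIVE = ["喷", "辣鸡", "垃圾", "脑残", "差评", "炒作", "屌丝", "键盘侠", "细思极恐"]
NEGATION = ["不该", "不好", "没有", "绝非"]
TRANSITION = ["但是", "然而", "可是", "不料", "不过", "偏偏", "否则", "毕竟", "可惜", "只是"]


def count_hits(text, lexicon):
    """Sum the non-overlapping substring occurrences of each lexicon entry."""
    return sum(text.count(word) for word in lexicon)


def extract_features(text, topic_id):
    """Build the topic / pos / neg / nw / adv measures for one comment.

    topic_id is supplied by the caller (e.g. from a topic model); this
    function only counts lexicon hits.
    """
    return {
        "topic": topic_id,
        "pos": count_hits(text, POSITIVE),
        "neg": count_hits(text, NEGATIVE),
        "nw": count_hits(text, NEGATION),
        "adv": count_hits(text, TRANSITION),
    }


# Invented sample comment, used only to exercise the function.
sample = "警察叔叔真厉害,但是这个网站太垃圾,没有下限"
print(extract_features(sample, topic_id=3))
# {'topic': 3, 'pos': 1, 'neg': 1, 'nw': 1, 'adv': 1}
```

Substring counting can over-count when one lexicon entry appears inside a longer unrelated word, so a segmenter-based count would likely be preferable on real data.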
Topic | Positive | Negative | Total
Topic_1 | 158 | 99 | 257
Topic_2 | 68 | 157 | 225
Topic_3 | 36 | 217 | 253
Topic_4 | 169 | 70 | 239
Topic_5 | 44 | 181 | 225
Topic_6 | 20 | 207 | 227
Total | 495 | 931 | 1,426
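Topic | Topic meaning | Topic words (probability)
Topic_1 | Background of the incident | 交流 (exchange) 0.0300, 失联 (lost contact) 0.0244, 厄巴纳 (Urbana) 0.0201, 签 (sign) 0.0193, 租房 (rent housing) 0.0192, 硕士 (master's) 0.0180
Topic_2 | Location where the victim was sighted | 小镇 (small town) 0.0367, 塞勒姆 (Salem) 0.0360, 女孩 (girl) 0.0314, 此前 (previously) 0.0139, 伊利诺伊州 (Illinois) 0.0139, 曾见 (had seen) 0.0138
Topic_3 | Arrest of the suspect | 联邦调查局 (FBI) 0.0548, 死亡 (death) 0.0476, 一名 (one) 0.0473, 男子 (man) 0.0428, 逮捕 (arrest) 0.0412, 涉嫌 (suspected of) 0.0393
Topic_4 | Hand-drawn portrait of the suspect | 警察 (police) 0.0418, 模糊 (blurry) 0.0413, 震惊 (shocked) 0.0398, 画像 (portrait) 0.0396, 林宇辉 (Lin Yuhui) 0.0318, 手绘 (hand-drawn) 0.0237
Topic_5 | Deviant website | 网站 (website) 0.0190, 疑犯 (suspect) 0.0182, 变态 (deviant) 0.0165, 潜入 (infiltrate) 0.0137, 呼吁 (appeal) 0.0127, 会员 (member) 0.0119
Topic_6 | Response to doubts about donations | 捐款 (donation) 0.0503, 家人 (family) 0.0305, 质疑 (doubt) 0.0265, 回应 (response) 0.0223, 用法 (usage) 0.0194, 用于 (used for) 0.0166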
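To make the topic=i feature concrete, the sketch below assigns a post to the topic whose listed words contribute the largest total probability. This is a deliberately crude heuristic built only from the six top words per topic shown in the table, not LDA inference and not the authors' procedure; the names `TOPIC_WORDS`, `topic_scores` and `dominant_topic` are hypothetical, and substring matching again stands in for word segmentation.

```python
# Top words and probabilities copied from the topic-word table above.
# List index 0 corresponds to Topic_1, index 5 to Topic_6.
TOPIC_WORDS = [
    {"交流": 0.0300, "失联": 0.0244, "厄巴纳": 0.0201, "签": 0.0193, "租房": 0.0192, "硕士": 0.0180},
    {"小镇": 0.0367, "塞勒姆": 0.0360, "女孩": 0.0314, "此前": 0.0139, "伊利诺伊州": 0.0139, "曾见": 0.0138},
    {"联邦调查局": 0.0548, "死亡": 0.0476, "一名": 0.0473, "男子": 0.0428, "逮捕": 0.0412, "涉嫌": 0.0393},
    {"警察": 0.0418, "模糊": 0.0413, "震惊": 0.0398, "画像": 0.0396, "林宇辉": 0.0318, "手绘": 0.0237},
    {"网站": 0.0190, "疑犯": 0.0182, "变态": 0.0165, "潜入": 0.0137, "呼吁": 0.0127, "会员": 0.0119},
    {"捐款": 0.0503, "家人": 0.0305, "质疑": 0.0265, "回应": 0.0223, "用法": 0.0194, "用于": 0.0166},
]


def topic_scores(text):
    """For each topic, add up the probabilities of its listed words found in text."""
    return [
        sum(prob for word, prob in words.items() if word in text)
        for words in TOPIC_WORDS
    ]


def dominant_topic(text):
    """Return the index of the highest-scoring topic, or None if no word matches."""
    scores = topic_scores(text)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None


post = "美国联邦调查局已经逮捕一名涉嫌绑架中国访问学者章莹颖的27岁男子"
print(dominant_topic(post))  # 2, i.e. Topic_3
```

A fitted LDA model would instead infer a full topic distribution per post from all of its words, so this lookup should be read only as an illustration of how a single topic label could be derived.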
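Topic | Micro-blog text
Topic_1 (Background of the incident) | [Urgent, please repost! Female Peking University master's graduate loses contact during an exchange visit to the US, now missing for over 50 hours] Zhang Yingying, female, 25, who holds a bachelor's degree from Sun Yat-sen University and a master's degree from Peking University and is a research assistant at the Chinese Academy of Sciences, went to the University of Illinois at Urbana-Champaign (UIUC) in the US on an exchange visit in April this year. On the 9th local time, Zhang lost contact while out to sign a rental lease; the police have been notified.
Topic_2 (Location where the victim was sighted) | In a small town called Salem, several eyewitnesses claimed to have seen Zhang Yingying there. Salem lies about 200 km southwest of the University of Illinois at Urbana-Champaign, where Zhang Yingying went missing. Zhang Yingying's family canvassed the streets of Salem, and seven people separately confirmed to the family that they had seen her.
Topic_3 (Arrest of the suspect) | The US Federal Bureau of Investigation has arrested a 27-year-old man suspected of kidnapping the Chinese visiting scholar Zhang Yingying. The FBI said it believes Zhang Yingying is dead.
Topic_4 (Hand-drawn portrait of the suspect) | On June 23, Chinese police officer Lin Yuhui hand-drew a portrait of the suspect in the "Zhang Yingying disappearance case" from very blurry surveillance footage, and its lifelike accuracy shocked the US police. On July 1, the suspect was apprehended by US police.
Topic_5 (Deviant website) | [Reporter infiltrates "the world's No. 1 deviant website" and is banned for mentioning Zhang Yingying] Recently, the #Female Peking University Master's Graduate Loses Contact in the US# case has drawn wide attention at home and abroad, and a deviant website suspected of inciting the suspect to carry out the kidnapping has also attracted notice. Reportedly, the site has more than 5 million members, and several members committed serious crimes after learning "techniques" from it. A reporter who infiltrated the site was "kicked out" after posting a question about the Zhang Yingying case.
Topic_6 (Response to doubts about donations) | [Zhang Yingying's family has raised USD 140,000; quiet change in the use of the funds draws doubts] Zhang Yingying has been missing in the US for more than 70 days. A press conference held on the afternoon of the 22nd local time presented the petition letter that the Zhang family had submitted to Trump a few days earlier, along with an explanation of the main purposes and use of the USD 144,000 in donations raised. However, many donors left comments questioning the repeated raising of the fundraising cap.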
Model | Topic feature | Emotional feature | Sentence pattern feature
1
2
3
4
5
Model | Precision | Recall | F1-score | AUC
1 | 74.808% | 75.248% | 66.667% | 0.752
2 | 81.887% | 78.150% | 71.146% | 0.781
3 | 84.512% | 83.160% | 77.778% | 0.832
4 | 84.283% | 78.651% | 72.131% | 0.787
5 | 83.313% | 82.282% | 76.471% | 0.823
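The AdaBoost ensemble step named in the abstract can be sketched in plain Python as discrete AdaBoost over one-feature threshold rules (decision stumps). The page does not state which base learner or settings the authors used, so the stump learner, the number of rounds, the function names and the six toy rows below are all assumptions made for illustration; treating the topic id as an ordered number is a further simplification, and a one-hot encoding would likely suit a categorical topic better.

```python
import math

# Toy rows in the order [topic, pos, neg, nw, adv]. Values and labels are
# invented for illustration (+1 = positive post, -1 = negative post).
X = [
    [0, 2, 0, 0, 0],
    [3, 1, 0, 0, 1],
    [3, 3, 1, 1, 0],
    [2, 0, 2, 0, 0],
    [5, 0, 1, 1, 0],
    [4, 1, 2, 0, 1],
]
y = [1, 1, 1, -1, -1, -1]


def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign


def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error to keep the log finite for a perfect stump.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] <= 1e-10:
            break  # a perfect stump leaves nothing to reweight
        # Increase the weight of misclassified rows, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X])  # [1, 1, 1, -1, -1, -1]
```

On this toy set no single stump separates the classes, but the weighted vote of three stumps does, which is the behaviour boosting is meant to provide. Dropping columns from the rows would give a rough analogue of comparing models built on different feature subsets.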