Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (8): 51-59    DOI: 10.11925/infotech.2096-3467.2018.0060
Sentiment Analysis for Micro-blogs with LDA and AdaBoost
Ziming Zeng, Qianwen Yang
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper aims to improve the performance of sentiment analysis for micro-blog texts using the LDA topic model and the AdaBoost algorithm. [Methods] First, we used the LDA topic model to extract the topics of micro-blog posts. Then, we merged the topic features with emotional and sentence-pattern features. Finally, we trained the proposed sentiment analysis model with the AdaBoost ensemble classification method. [Results] The topic features had a significant positive impact on emotion recognition, so the model combining topic and emotional features yielded the best results. The precision of the proposed model reached 84.512%, and the recall reached 83.160%. [Limitations] The sample size needs to be expanded, and the sentiment dictionary should be improved. We did not study the emoticons in the micro-blog posts. [Conclusions] The proposed AdaBoost model with LDA can effectively identify emotional tendencies.
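The pipeline summarized in the abstract (topic features merged with emotional features, then classified by a boosted ensemble and scored by precision and recall) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the toy feature values, the two-topic setup, the use of one-feature threshold stumps as weak learners, and the helper names `train_adaboost`, `predict`, and `precision_recall`. The topic proportions are assumed to come from an already fitted LDA model, which is not shown.

```python
import math

def train_adaboost(X, y, n_rounds=5):
    """Discrete AdaBoost with one-feature threshold stumps; labels are +1 / -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # each entry: (alpha, feature_index, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    preds = [polarity if row[j] >= thr else -polarity for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity, preds)
        err, j, thr, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[j] >= thr else -pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class given by `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical per-post features (assumed values, not data from the paper):
# topic proportions from a fitted 2-topic LDA model, and counts of
# positive / negative dictionary words as emotional features.
topic_features = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.7, 0.3], [0.3, 0.7]]
emotion_features = [[3, 0], [2, 1], [0, 2], [1, 3], [2, 0], [0, 1]]
labels = [1, 1, -1, -1, 1, -1]  # +1 = positive sentiment, -1 = negative

# Feature fusion: concatenate topic and emotional features per post.
X = [t + e for t, e in zip(topic_features, emotion_features)]

ensemble = train_adaboost(X, labels)
predictions = [predict(ensemble, row) for row in X]
precision, recall = precision_recall(labels, predictions)
```

The toy data are evaluated on the training set only, so the resulting scores say nothing about the figures reported above. In practice a library implementation would normally replace the hand-written loop; scikit-learn, for example, provides `LatentDirichletAllocation` and `AdaBoostClassifier`.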

Key words: Micro-blog; Sentiment Analysis; LDA; AdaBoost
Received: 17 January 2018      Published: 08 September 2018

Cite this article:

Ziming Zeng, Qianwen Yang. Sentiment Analysis for Micro-blogs with LDA and AdaBoost. Data Analysis and Knowledge Discovery, 2018, 2(8): 51-59.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0060     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I8/51
