Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (2): 98-107    DOI: 10.11925/infotech.2096-3467.2018.0578
Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text
Cuiqing Jiang1,2, Yibo Guo1, Yao Liu1
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China

[Objective] This study constructs a domain sentiment lexicon by discovering unrecognized sentiment words in user-generated content on Chinese social media, and applies the lexicon to sentiment analysis of automotive comments. [Methods] First, words from HowNet are selected as seeds, and the PMI and Word2Vec algorithms are each used to calculate the sentiment polarity of candidate words on a real automotive corpus. Then, the two polarity judgments are combined according to ensemble rules. Finally, the effectiveness of the proposed method is evaluated through comparative sentiment classification experiments. [Results] The accuracy of the lexicon constructed with the proposed method is 21.6% higher than that of HowNet; the increases associated with the lexicons constructed by PMI and by Word2Vec individually are 3.7% and 2.1%, respectively. Meanwhile, the numbers of positive and negative sentiment words are greatly increased. [Limitations] The corpus comes from a single source, so the lexicon offers only limited guidance for other domains. [Conclusions] The sentiment lexicon constructed by this method can be applied effectively to sentiment analysis of social media texts.
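As a rough illustration of the seed-based polarity step, the sketch below computes a semantic-orientation PMI (SO-PMI) score for each candidate word from document-level co-occurrence with positive and negative seeds, then combines it with a second score by a simple agreement rule. The abstract does not give the exact PMI variant or the ensemble rules, so the add-one smoothing, the `so_pmi` and `combine` names, the toy English tokens, and the "accept only when both signs agree" rule are all assumptions made for illustration; the Word2Vec score is passed in as a plain number rather than computed.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=1.0):
    """Return {candidate: SO-PMI score} from document-level co-occurrence.

    docs is a list of token lists (already segmented). The smoothing
    constant added to joint counts is an assumption that avoids log(0).
    """
    n_docs = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(frozenset(pair) for pair in combinations(vocab, 2))

    def pmi(word, seed):
        joint = pair_df[frozenset((word, seed))] + smoothing
        return math.log2(joint * n_docs / (word_df[word] * word_df[seed]))

    # Ignore seeds that never occur in the corpus.
    pos = [s for s in pos_seeds if s in word_df]
    neg = [s for s in neg_seeds if s in word_df]
    seeds = set(pos) | set(neg)
    return {
        w: sum(pmi(w, s) for s in pos) - sum(pmi(w, s) for s in neg)
        for w in word_df
        if w not in seeds
    }


def sign(score, margin=0.0):
    """Map a score to +1, -1, or 0 (undecided within the margin)."""
    if score > margin:
        return 1
    if score < -margin:
        return -1
    return 0


def combine(pmi_score, w2v_score, margin=0.0):
    """Hypothetical ensemble rule: keep a polarity only if both agree."""
    a, b = sign(pmi_score, margin), sign(w2v_score, margin)
    return a if a == b else 0


# Toy corpus with hypothetical tokens; real input would be segmented posts.
docs = [
    ["engine", "good", "smooth"],
    ["engine", "bad", "noisy"],
    ["seat", "good", "smooth"],
    ["brake", "bad", "noisy"],
]
scores = so_pmi(docs, pos_seeds={"good"}, neg_seeds={"bad"})
for word in sorted(scores):
    print(word, round(scores[word], 3))
```

In practice the second score would come from embedding similarity between the candidate and the seed words, and the margin would be tuned on labeled data; both choices are outside what the abstract specifies.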

Key words: Social Media; Sentiment Analysis; Sentiment Lexicon; PMI; Word2Vec
Received: 23 May 2018      Published: 27 March 2019

Cite this article:

Cuiqing Jiang, Yibo Guo, Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107.

