Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (2): 98-107    DOI: 10.11925/infotech.2096-3467.2018.0578
Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text
Cuiqing Jiang1,2, Yibo Guo1, Yao Liu1
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China
Abstract  

[Objective] This study constructs a domain sentiment lexicon by discovering unrecognized sentiment words in user-generated content on Chinese social media, and applies the lexicon to sentiment analysis of automotive comments. [Methods] First, words in HowNet are selected as seeds, and the PMI and Word2Vec algorithms are each used to calculate the sentiment polarity of the candidate words on a real automotive corpus. The two polarity judgments are then combined according to ensemble rules. Finally, the effectiveness of the proposed method is evaluated through comparative sentiment classification experiments. [Results] The accuracy of the lexicon constructed by the proposed method is 21.6% higher than that of HowNet. The lexicons constructed by PMI and Word2Vec alone show increases of 3.7% and 2.1%, respectively. Meanwhile, the number of positive and negative sentiment words is greatly increased. [Limitations] The corpus comes from a single source, which limits how well the results generalize to other domains. [Conclusions] The sentiment lexicon constructed by this method can be applied effectively to sentiment analysis of social media texts.
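
The pipeline summarized in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the seed words, the hand-made two-dimensional vectors (standing in for embeddings that a Word2Vec model trained on the corpus would supply), the add-one smoothing, and the agreement-based ensemble rule are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def so_pmi(candidate, docs, pos_seeds, neg_seeds, smoothing=1.0):
    """Semantic orientation: PMI with positive seeds minus PMI with negative seeds.

    Probabilities are document-level co-occurrence estimates with simple
    additive smoothing (an assumption) so that log(0) never occurs.
    """
    doc_sets = [set(d) for d in docs]
    total = len(doc_sets) + smoothing

    def prob(*words):
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return (hits + smoothing) / total

    def pmi(a, b):
        return math.log2(prob(a, b) / (prob(a) * prob(b)))

    return (sum(pmi(candidate, s) for s in pos_seeds)
            - sum(pmi(candidate, s) for s in neg_seeds))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def so_embedding(candidate, vectors, pos_seeds, neg_seeds):
    """Mean cosine similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vectors[candidate], vectors[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vectors[candidate], vectors[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

def sign_label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def ensemble(pmi_score, emb_score):
    """Hypothetical rule: accept a polarity only when both methods agree."""
    a, b = sign_label(pmi_score), sign_label(emb_score)
    return a if a == b else "undecided"

# Toy tokenized corpus and seeds (illustrative stand-ins for forum posts and HowNet seeds).
docs = [
    ["smooth", "good"],
    ["smooth", "good", "quiet"],
    ["noisy", "bad"],
    ["noisy", "bad", "stiff"],
]
pos_seeds, neg_seeds = ["good"], ["bad"]

# Hand-made vectors; a trained Word2Vec model would provide these in practice.
vectors = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "smooth": [0.9, 0.1],
    "noisy": [-0.8, 0.2],
    "stiff": [0.3, 0.9],
}

def classify(word):
    return ensemble(
        so_pmi(word, docs, pos_seeds, neg_seeds),
        so_embedding(word, vectors, pos_seeds, neg_seeds),
    )

results = {w: classify(w) for w in ["smooth", "noisy", "stiff"]}
print(results)
```

In this toy setup the two scores agree for "smooth" and "noisy", while "stiff" receives conflicting signs and is left undecided. A real ensemble rule could instead weight the two scores or apply thresholds.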

Key words: Social Media      Sentiment Analysis      Sentiment Lexicon      PMI      Word2Vec
Received: 23 May 2018      Published: 27 March 2019

Cite this article:

Cuiqing Jiang, Yibo Guo, Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0578     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I2/98
