Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text
Cuiqing Jiang1,2(),Yibo Guo1,Yao Liu1
1School of Management, Hefei University of Technology, Hefei 230009, China 2Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China
[Objective] This study aims to construct a domain sentiment lexicon by discovering unrecognized sentiment words from user-generated contents on Chinese social media to apply it to automotive comments sentiment analysis. [Methods] First, words in HowNet are selected as the seeds, and PMI and Word2Vec algorithm are used to calculate the sentiment polarity of the candidates respectively on real automative corpus. Then the results of the two discriminations are judged synthetically according to the ensemble rules. Finally the proposed method was shown effective by the comparison of the sentiment classification experiments. [Results] The accuracy rate of the lexicon constructed according to proposed method is 21.6% higher than that of HowNet. The lexicon constructed by PMI and Word2Vec respectively increase 3.7% and 2.1%. Meanwhile the number of positive and negative emotional words are greatly increased. [Limitations] The source of corpus is single, and it has certain limitations in guiding other fields. [Conclusions] The sentiment lexicon constructed by this method can be applied to sentiment analysis of social media texts effectively.
蒋翠清,郭轶博,刘尧. 基于中文社交媒体文本的领域情感词典构建方法研究*[J]. 数据分析与知识发现, 2019, 3(2): 98-107.
Cuiqing Jiang,Yibo Guo,Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107.
Liu B.Sentiment Analysis and Opinion Mining[A]//Synthesis Lectures on Human Language Technologies[M]. Morgan & Claypool Publishers, 2012: 152-153.
[2]
Hogenboom A, Heerschop B, Frasincar F, et al.Multi-lingual Support for Lexicon-based Sentiment Analysis Guided by Semantics[J]. Decision Support Systems, 2014, 62(2): 43-53.
[3]
Wu F, Huang Y, Song Y, et al.Towards Building a High-quality Microblog-specific Chinese Sentiment Lexicon[J]. Decision Support Systems, 2016, 87: 39-49.
[4]
Fellbaum C, Miller G.WordNet: An Electronic Lexical Database[M]. MIT Press, 1998.
[5]
Stone P J, Dunphy D C, Smith M S.The General Inquirer: A Computer Approach to Content Analysis[J]. Information Storage & Retrieval, 1966, 4(4): 375-376.
[6]
Dong Z, Dong Q.HowNet - A Hybrid Language and Knowledge Resource[C]// Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering. 2003.
(Wang Ke, Xia Rui.A Survey on Automatical Construction Methods of Sentiment Lexicons[J]. Acta Automatica Sinica, 2016, 42(4): 495-511.)
[8]
Loughran T, Mcdonald B.When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10‐Ks[J]. Journal of Finance, 2011, 66(1): 35-65.
[9]
Church K W, Hanks P. Word Association Norms, Mutual Information,Lexicography[J]. Computational Linguistics, 1990, 16(1): 76-83.
[10]
Turney P D, Littman M L.Measuring Praise and Criticism: Inference of Semantic Orientation from Association[J]. ACM Transactions on Information Systems, 2003, 21(4): 315-346.
[11]
Deng S, Sinha A P, Zhao H.Adapting Sentiment Lexicons to Domain-Specific Social Media Texts[J]. Decision Support Systems, 2017, 94: 65-76.
(Guo Shunli, Zhang Xiangxian.Building Sentiment Analysis Dictionary for Chinese Book Reviews[J]. New Technology of Library and Information Service, 2016(2): 67-74.)
(Zhu Yanlan, Min Jin, Zhou Yaqian, et al.Semantic Orientation Computing Based on HowNet[J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.)
[15]
Mikolov T, Chen K, Corrado G, et al.Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.378.
[16]
Mikolov T, KarafiáT M, BURGET L, et al. Recurrent Neural Network Based Language Model[C]//Proceedings of the 2010 Conference of the International Speech Communication Association, Makuhari, Chiba, Japan. 2010.
(Yang Xiaoping, Zhang Zhongxia, Wang Liang, et al.Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec[J].Computer Science, 2017, 44(1): 42-47.)
(Wang Renwu, Song Jiayi, Chen Chuanbao.Application of Sentiment Analysis Based on Word2vec in Brand Cognition[J]. Library and Information Service, 2017, 61(22): 6-12.)
[19]
Qiu G, Liu B, Bu J, et al.Expanding Domain Sentiment Lexicon Through Double Propagation[C]// Proceedings of the 21st International Jont Conference on Artifical Intelligence. 2009.
[20]
Filho J L, Canuto A P, Santiago R N.Investigating the Impact of Selection Criteria in Dynamic Ensemble Selection Methods[J]. Expert Systems with Applications, 2018,106: 141-153.
[21]
Kittler J V, Hatef M, Duin R W, et al.On Combining Classfiers[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1998, 20(3): 226-239.
(Huang Wei, Fan Lei.Semi-supervised Sentiment Classification Based on Ensemble Learning with Voting[J]. Journal of Chinese Information Processing, 2016, 30(2): 41-49.)
[23]
Sun Z, Song Q, Zhu X, et al.A Novel Ensemble Method for Classifying Imbalanced Data[J]. Pattern Recognition, 2015, 48(5): 1623-1637.
[24]
Li Y, Guo H, Liu X, et al.Adapted Ensemble Classification Algorithm Based on Multiple Classifier System and Feature Selection for Classifying Multi-class Imbalanced Data[J]. Knowledge-Based Systems, 2016, 94: 88-104.
[25]
汽车之家论坛[EB/OL]. [2018-03-01].
[25]
(AutoHome Forum[EB/OL]. [2018-03-01]..)
[26]
结巴中文分词[CP/OL]. [2018-03-01]..
[26]
(Jieba: Chinese Text Segmentation[CP/OL] . [2018-03-01]..)
Chalothorn T, Ellman J.Sentiment Analysis of Web Forums: Comparison Between Sentiwordnet and Sentistrength[C]// Proceedings of the 2012 International Conference on Software Technology and Engineering. 2012.