[Objective] This study constructs a domain sentiment lexicon by discovering unrecognized sentiment words in user-generated content on Chinese social media, and applies it to sentiment analysis of automotive comments. [Methods] First, words from HowNet are selected as seeds, and the PMI and Word2Vec algorithms are each used to calculate the sentiment polarity of candidate words on a real automotive corpus. Then, the two polarity judgments are combined according to ensemble rules. Finally, the effectiveness of the proposed method is demonstrated by comparative sentiment classification experiments. [Results] The accuracy of the lexicon constructed with the proposed method is 21.6% higher than that of HowNet. The lexicons constructed with PMI and Word2Vec alone show increases of 3.7% and 2.1%, respectively. Meanwhile, the number of positive and negative sentiment words increases greatly. [Limitations] The corpus comes from a single source, so the method offers limited guidance for other domains. [Conclusions] The sentiment lexicon constructed by this method can be applied effectively to sentiment analysis of social media texts.
Cuiqing Jiang, Yibo Guo, Yao Liu. Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 98-107.
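The PMI step named in the Methods can be illustrated with a minimal sketch of seed-based semantic orientation. This is not the authors' implementation: the function names `so_pmi` and `combine_labels`, the document-level co-occurrence window, the additive smoothing, and the toy tokens are all assumptions made for illustration. The abstract does not state the ensemble rules, so the agreement rule shown here is purely hypothetical, and the Word2Vec score is passed in as a plain number rather than computed.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, candidates, smoothing=1.0):
    """Score candidates by PMI with positive seeds minus PMI with negative seeds.

    docs is a list of token lists; co-occurrence is counted per document
    (an assumption; a sliding window would work the same way).
    """
    n_docs = len(docs)
    seeds = set(pos_seeds) | set(neg_seeds)
    cands = set(candidates)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both candidate and seed
    for tokens in docs:
        present = set(tokens)
        word_df.update(present)
        for c in present & cands:
            for s in present & seeds:
                pair_df[(c, s)] += 1

    def pmi(c, s):
        # Additive smoothing keeps the logarithm defined for unseen pairs.
        denom = n_docs + smoothing
        joint = (pair_df[(c, s)] + smoothing) / denom
        p_c = (word_df[c] + smoothing) / denom
        p_s = (word_df[s] + smoothing) / denom
        return math.log2(joint / (p_c * p_s))

    return {
        c: sum(pmi(c, s) for s in pos_seeds) - sum(pmi(c, s) for s in neg_seeds)
        for c in candidates
    }


def combine_labels(pmi_score, w2v_score):
    """Hypothetical ensemble rule: accept a polarity only when both scores agree in sign."""
    if pmi_score > 0 and w2v_score > 0:
        return "positive"
    if pmi_score < 0 and w2v_score < 0:
        return "negative"
    return "undecided"


# Toy segmented comments (English placeholders for segmented Chinese tokens).
docs = [
    ["fuel_efficient", "good"],
    ["fuel_efficient", "good"],
    ["rattle", "bad"],
    ["rattle", "bad"],
    ["fuel_efficient", "comfortable"],
]
scores = so_pmi(docs, pos_seeds=["good"], neg_seeds=["bad"],
                candidates=["fuel_efficient", "rattle"])
```

In this toy corpus, `fuel_efficient` co-occurs only with the positive seed and `rattle` only with the negative seed, so their scores come out with opposite signs. A real run would segment the forum text first and use a much larger seed set.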