Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 11: 99-107    DOI: 10.11925/infotech.2096-3467.2019.0412
Research Paper
Chinese Sentiment Classification Method with Bi-LSTM and Grammar Rules *
Qiang Lu, Zhenfang Zhu, Fuyong Xu, Qiangqiang Guo
School of Information Science and Electrical Engineering, Shandong Jiaotong University, Ji'nan 250357, China
Abstract

[Objective] This paper proposes a sentiment classification method that incorporates grammar rules, aiming to improve the accuracy of sentiment classification for Chinese texts. [Methods] Chinese grammar rules are combined with a Bi-LSTM in the form of constraints: by regularizing the outputs at adjacent positions in a sentence, the model simulates the linguistic roles of non-sentiment words, sentiment words, negation words, and degree words at the sentence level. [Results] Compared with the RNN, LSTM, and Bi-LSTM models, the Bi-LSTM model with Chinese grammar rules reaches an accuracy of up to 91.2%. [Limitations] The experimental data come from a single source, a hotel review dataset; the effectiveness of the method on other datasets needs further verification. [Conclusions] The proposed method incorporates Chinese grammar rules and further improves the accuracy of sentiment classification for Chinese.

Keywords: Grammar Rules; Sentiment Classification; Bi-LSTM
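The Methods statement describes grammar rules imposed as constraints on the outputs at adjacent sentence positions, but this page does not give the formulas. The sketch below shows one plausible way to express such a constraint: a hinge penalty on the divergence between the sentiment distributions predicted at positions t-1 and t. The function names, the margin value, the two-class `[positive, negative]` layout, and the "negation swaps the two classes" rule are all assumptions made for illustration, not the authors' definitions.

```python
import math


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sym_kl(p, q):
    """Symmetric KL divergence: the average of KL(p||q) and KL(q||p)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def adjacent_penalty(prev_dist, curr_dist, word_type, margin=0.1):
    """Hinge penalty tying the prediction at position t to position t-1.

    prev_dist, curr_dist: [positive, negative] probabilities (assumed layout).
    word_type: role of the word at position t.
    margin: divergence tolerated before a penalty applies (assumed value).
    """
    if word_type == "non_sentiment":
        # A non-sentiment word should leave the sentiment roughly unchanged.
        target = prev_dist
    elif word_type == "negation":
        # Hypothetical rule: a negation word swaps positive and negative mass.
        target = prev_dist[::-1]
    else:
        # Sentiment and degree words would need learned shifts; omitted here.
        return 0.0
    return max(0.0, sym_kl(curr_dist, target) - margin)


# A non-sentiment word that flips the prediction is penalized,
# while the same flip after a negation word is not.
print(adjacent_penalty([0.8, 0.2], [0.2, 0.8], "non_sentiment"))
print(adjacent_penalty([0.8, 0.2], [0.2, 0.8], "negation"))
```

In a full model, a term like this would be summed over positions and added to the classification loss with a weighting coefficient, which would be a further design choice.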
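Received: 2019-04-22
Chinese Library Classification (CLC) number: TP391
Funding: * This work is supported by the National Social Science Fund of China project "Semantic Recognition and Decision Support for Public Opinion Texts on Public Safety Events" (Grant No. 19BYY076); the Humanities and Social Sciences Planning Project of the Ministry of Education "Sentiment Analysis Techniques for Online Public Opinion Based on Content and User Behavior Analysis" (Grant No. 14YJC860042); and the Shandong Provincial Social Science Planning Project "Text Semantic Recognition and Reasoning Mechanisms in Online Public Opinion Analysis and Guidance" (Grant No. 19BJCJ51).
Corresponding author: Zhenfang Zhu     E-mail: zhuzf@sdjtu.edu.cn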
Cite this article:
Qiang Lu, Zhenfang Zhu, Fuyong Xu, Qiangqiang Guo. Chinese Sentiment Classification Method with Bi-LSTM and Grammar Rules[J]. Data Analysis and Knowledge Discovery, 2019, 3(11): 99-107. DOI: 10.11925/infotech.2096-3467.2019.0412.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0412
Figure 1  Standard LSTM model [22]
Figure 2  Bi-LSTM model incorporating Chinese grammar rules
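Lexicon category                     Entries
Positive Chinese sentiment words     11,229
Negative Chinese sentiment words     10,783
Negation words                       59
Degree words                         219
Table 1  Grammar lexicon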
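The four lexicon categories in Table 1 suggest a preprocessing step that labels each token with its linguistic role before any rule is applied. The following is a minimal sketch of that lookup, assuming the sentence has already been segmented into tokens. The lexicons here are tiny hypothetical stand-ins, and `tag_tokens` is an illustrative name, not a function from the paper.

```python
# Tiny hypothetical lexicons; the real ones have the sizes listed in Table 1.
POSITIVE = {"干净", "满意"}   # "clean", "satisfied"
NEGATIVE = {"脏", "失望"}     # "dirty", "disappointed"
NEGATION = {"不", "没有"}     # "not", "without"
DEGREE = {"非常", "很"}       # "extremely", "very"


def tag_tokens(tokens):
    """Label each pre-segmented token with its role in the sentence."""
    tags = []
    for tok in tokens:
        if tok in NEGATION:
            tags.append("negation")
        elif tok in DEGREE:
            tags.append("degree")
        elif tok in POSITIVE or tok in NEGATIVE:
            tags.append("sentiment")
        else:
            tags.append("non_sentiment")
    return tags


# "The room is extremely clean", already segmented into three tokens.
print(tag_tokens(["房间", "非常", "干净"]))
# ['non_sentiment', 'degree', 'sentiment']
```

Checking the negation and degree sets first is a choice made for this sketch; how a token that appears in several lexicons should be resolved is not stated on this page.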
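Item                      Configuration
Operating system          Windows 7
CPU                       Intel E5-2640 v4, 2.40 GHz
Memory                    4 × 16 GB
Programming language      Python 2.7
Word segmentation tool    Jieba 0.39
Word embedding tool       Word2Vec
Table 2  Experimental configuration and environment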
Parameter                      Value
Word vector dimension          300
Hidden layer size              300
Learning rate                  0.01
Batch size                     64
L2 regularization coefficient  0.001
Table 3  Model parameter settings
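The settings in Table 3 can be collected into a single configuration object so that experiments stay reproducible. The sketch below is one way to do that, written for Python 3 rather than the Python 2.7 listed in Table 2. The class name, field names, and the `l2_penalty` helper are illustrative assumptions, and the penalty follows one common convention (coefficient times the sum of squared weights) that may differ from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hyperparameters from Table 3 (field names are illustrative)."""
    embedding_dim: int = 300       # word vector dimension
    hidden_size: int = 300         # hidden layer size
    learning_rate: float = 0.01
    batch_size: int = 64
    l2_coefficient: float = 0.001  # L2 regularization coefficient


def l2_penalty(weights, coefficient):
    """L2 term added to the loss: coefficient * sum of squared weights."""
    return coefficient * sum(w * w for w in weights)


config = ModelConfig()
# Penalty for a toy weight vector [1.0, 2.0]: 0.001 * (1 + 4).
print(l2_penalty([1.0, 2.0], config.l2_coefficient))
```

Freezing the dataclass prevents a run from silently changing a hyperparameter after the configuration has been logged.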
Figure 3  Effect of dropout on model performance
Figure 4  Effect of the number of iterations on model performance
Model        Accuracy
RNN          0.726
CNN          0.816
LSTM         0.821
Bi-LSTM      0.845
R-Bi-LSTM    0.912
Table 4  Experimental results of the models
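To make the comparison in Table 4 concrete, the short script below computes the absolute accuracy gain of R-Bi-LSTM over each baseline, in percentage points. The accuracy values are copied from the table. The function name is an illustrative choice.

```python
# Accuracy values copied from Table 4.
ACCURACY = {
    "RNN": 0.726,
    "CNN": 0.816,
    "LSTM": 0.821,
    "Bi-LSTM": 0.845,
    "R-Bi-LSTM": 0.912,
}


def gains_over_baselines(scores, target="R-Bi-LSTM"):
    """Absolute gain of `target` over every other model, in percentage points."""
    return {
        name: round((scores[target] - acc) * 100, 1)
        for name, acc in scores.items()
        if name != target
    }


for name, gain in gains_over_baselines(ACCURACY).items():
    print(f"{name}: +{gain} points")
```

The smallest gap is the one to the plain Bi-LSTM, 6.7 percentage points (0.912 versus 0.845), which isolates the contribution of the grammar-rule constraints under the reported setup.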
Model                    Accuracy
Bi-LSTM Model            0.881
Stacked Bi-LSTM Model    0.895
CNN-Bi-LSTM Model        0.901
Bi-LSTM-CRF Model        0.875
R-Bi-LSTM                0.912
Table 5  Comparison with state-of-the-art baseline models
[1] Titov I, McDonald R. Modeling Online Reviews with Multi-Grain Topic Models[C]// Proceedings of the 17th International Conference on World Wide Web. ACM, 2008: 111-120.
[2] Zhao Yanyan, Qin Bing, Liu Ting. Sentiment Analysis[J]. Journal of Software, 2010, 21(8): 1834-1848. (in Chinese)
[3] Hu M, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004: 168-177.
[4] Miller G A, Beckwith R, Fellbaum C, et al. Introduction to WordNet: An On-Line Lexical Database[J]. International Journal of Lexicography, 1990, 3(4): 235-244.
doi: 10.1093/ijl/3.4.235
[5] Zhang S, Wei Z, Wang Y, et al. Sentiment Analysis of Chinese Micro-Blog Text Based on Extended Sentiment Dictionary[J]. Future Generation Computer Systems, 2018, 81: 395-403.
doi: 10.1016/j.future.2017.09.048
[6] Graovac J, Mladenović M, Tanasijević I. NgramSPD: Exploring Optimal N-Gram Model for Sentiment Polarity Detection in Different Languages[J]. Intelligent Data Analysis, 2019, 23(2): 279-296.
doi: 10.3233/IDA-183879
[7] Li Zekui, Zhao Yanyan, Qin Bing, et al. Feature Engineering for Chinese Microblog Sentiment Classification[J]. Journal of Shanxi University: Natural Science Edition, 2014, 37(4): 570-578. (in Chinese)
[8] Sidorov G. Vector Space Model for Texts and the TF-IDF Measure[A]// Sidorov G. Syntactic N-Grams in Computational Linguistics[M]. Springer, 2019: 11-15.
[9] Wang G, Shin S Y. An Improved Text Classification Method for Sentiment Classification[J]. Journal of Information and Communication Convergence Engineering, 2019, 17(1): 41-48.
[10] Bachhety S, Dhingra S, Jain R, et al. Improved Multinomial Naïve Bayes Approach for Sentiment Analysis on Social Media[J]. International Journal of Information Systems & Management Science, 2018, 1(1).
[11] Manek A S, Shenoy P D, Mohan M C, et al. Aspect Term Extraction for Sentiment Analysis in Large Movie Reviews Using Gini Index Feature Selection Method and SVM Classifier[J]. World Wide Web, 2017, 20(2): 135-154.
doi: 10.1007/s11280-015-0381-x
[12] Pang B, Lee L, Vaithyanathan S. Thumbs up?: Sentiment Classification Using Machine Learning Techniques[C]// Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing. ACL, 2002: 79-86.
[13] Shore J, Johnson R. Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy[J]. IEEE Transactions on Information Theory, 1980, 26(1): 26-37.
doi: 10.1109/TIT.1980.1056144
[14] Zhang Qingqing, Liu Xilin. Classifying Sentiments Based on BPSO Random Subspace[J]. Data Analysis and Knowledge Discovery, 2017, 1(5): 71-81. (in Chinese)
[15] Kim Y. Convolutional Neural Networks for Sentence Classification[OL]. arXiv Preprint, arXiv:1408.5882.
[16] Davidian D. Feed-forward Neural Network: USA, US5438646[P]. 1995-08-01.
[17] Li Hui, Chai Yaqing. Fine-Grained Sentiment Analysis Based on Convolutional Neural Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 95-103. (in Chinese)
[18] Goldberg Y, Levy O. Word2Vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method[OL]. arXiv Preprint, arXiv:1402.3722.
[19] Rumelhart D E, Hinton G E, Williams R J. Learning Representations by Back-Propagating Errors[A]// Polk T A, Seifert C M. Cognitive Modeling[M]. 1988.
[20] Abdi A, Shamsuddin S M, Hasan S, et al. Deep Learning-Based Sentiment Classification of Evaluative Text Based on Multi-Feature Fusion[J]. Information Processing & Management, 2019, 56(4): 1245-1259.
[21] Gers F A, Schmidhuber J, Cummins F. Learning to Forget: Continual Prediction with LSTM[C]// Proceedings of the 9th International Conference on Artificial Neural Networks. 1999: 850-855.
[22] Wang Y, Huang M, Zhao L. Attention-based LSTM for Aspect-level Sentiment Classification[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 606-615.
[23] Lin J. Divergence Measures Based on the Shannon Entropy[J]. IEEE Transactions on Information Theory, 1991, 37(1): 145-151.
[24] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
[25] Cai Huiping, Wang Lidan, Duan Shukai. Sentiment Classification Model Based on Word Embedding and CNN[J]. Application Research of Computers, 2016, 33(10): 2902-2905. (in Chinese)
[26] Byrkjeland M, De Lichtenberg F G, Gambäck B. Ternary Twitter Sentiment Classification with Distant Supervision and Sentiment-Specific Word Embeddings[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2018: 97-106.
[27] Duchi J, Hazan E, Singer Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[28] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[29] Xu G, Meng Y, Qiu X, et al. Sentiment Analysis of Comment Texts Based on BiLSTM[J]. IEEE Access, 2019, 7: 51522-51532.
[30] Zhou J, Lu Y, Dai H N, et al. Sentiment Analysis of Chinese Microblog Based on Stacked Bidirectional LSTM[J]. IEEE Access, 2019, 7: 38856-38866.
[31] Zhou K, Long F. Sentiment Analysis of Text Based on CNN and Bi-Directional LSTM Model[C]// Proceedings of the 24th International Conference on Automation and Computing. IEEE, 2018: 1-5.
[32] Xiong H, Yan H, Zeng Z, et al. Dependency Parsing and Bidirectional LSTM-CRF for Aspect-level Sentiment Analysis of Chinese[C]// Proceedings of the 8th Joint International Semantic Technology Conference. 2018: 90-93.
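Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn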