Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 12: 85-94     https://doi.org/10.11925/infotech.2096-3467.2020.0535
Research Paper
Sentiment Analysis of Cross-Domain Product Reviews Based on Feature Fusion and Attention Mechanism
Qi Ruihua1,2(),Jian Yue1,2,Guo Xu2,Guan Jinghua2,Yang Mingxin1,2
1Research Center for Language Intelligence, Dalian University of Foreign Languages, Dalian 116044, China
2School of Software Engineering, Dalian University of Foreign Languages, Dalian 116044, China
Abstract

[Objective] This study addresses the relative scarcity of labelled data in cross-domain sentiment classification and the problem of distinguishing the importance of sentiment features transferred from a source domain to a target domain. [Methods] We propose a cross-domain bidirectional LSTM sentiment classification model for product reviews based on fused feature representation and an attention mechanism. The model fuses BERT word vectors with cross-domain word vectors to build a unified cross-domain feature space, and a bidirectional LSTM combined with attention extracts importance weights for global and local features. [Results] In comparative experiments on the public Amazon product review dataset, the model achieved the highest average cross-domain classification accuracy among the compared models, 95.93%, which is 9.33 percentage points higher than the best baseline accuracy reported in the literature. [Limitations] The model's generalization needs further validation on large-scale multi-domain datasets, and the rules governing how source-domain knowledge contributes to target-domain sentiment classification deserve further study. [Conclusions] Learning the fused features through bidirectional LSTM layers effectively captures sentiment semantics, and across the comparative experiments the source domains most helpful to each target domain were largely consistent.
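The fused feature representation described in [Methods] can be sketched as per-token vector concatenation. The sketch below is a minimal numpy illustration, assuming concatenation as the fusion operation and a 300-dimensional cross-domain word vector (an assumption chosen so that 768 + 300 matches the 1,068-dimensional vectors listed in Table 2); the random vectors stand in for real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len = 120      # maximum review length used in the experiments (Table 2)
bert_dim = 768     # BERT token-vector dimension
cross_dim = 300    # assumed dimension of the cross-domain word vectors

# Stand-in per-token vectors for one review from the two embedding sources.
bert_vecs = rng.normal(size=(seq_len, bert_dim))
cross_vecs = rng.normal(size=(seq_len, cross_dim))

# Fusing by concatenation gives every token one vector in a
# unified cross-domain feature space.
fused = np.concatenate([bert_vecs, cross_vecs], axis=-1)
print(fused.shape)  # (120, 1068)
```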


Key words: Feature Fusion; Attention Mechanism; Cross-Domain; Sentiment Classification
Received: 2020-06-08      Published: 2020-12-25
CLC number: TP393
Funding: Liaoning Province University Innovative Talents Program (WR2019005); National Social Science Fund of China general project "Opinion Mining of Online Reviews by Overseas Readers of English Translations of Chinese Classics" (15BYY028); Liaoning Social Science Planning Fund general project "Early Warning of Emergency-Event Rumors in the Big Data Environment" (L17BTQ005)
Corresponding author: Qi Ruihua, E-mail: rhqi@dlufl.edu.cn
Cite this article:
Qi Ruihua,Jian Yue,Guo Xu,Guan Jinghua,Yang Mingxin. Sentiment Analysis of Cross-Domain Product Reviews Based on Feature Fusion and Attention Mechanism. Data Analysis and Knowledge Discovery, 2020, 4(12): 85-94.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0535      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I12/85
Fig. 1  Cross-domain review sentiment classification model based on feature fusion and attention
Domain               Positive reviews   Negative reviews   Unlabelled texts
Books                3,000              3,000              9,750
DVDs                 3,000              3,000              11,843
Electronics          3,000              3,000              17,009
Kitchen appliances   3,000              3,000              13,856
Videos               3,000              3,000              30,180
Table 1  Experimental datasets (number of reviews)
Parameter                   Value
Maximum sequence length     120
Word vector dimensions      768; 1,068; 1,068
LSTM hidden units           128
Fully connected layer 1     128, activation="tanh"
Fully connected layer 2     2, activation="softmax"
Dropout                     0.5
Optimizer                   Adam
Loss function               binary_crossentropy
batch_size                  32
Output activation           Softmax
Table 2  Parameters of the cross-domain sentiment classification experiments
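The attention layer's role, weighting each time step of the BiLSTM output before classification, can be sketched in numpy with the dimensions from Table 2 (sequence length 120, 128 LSTM units per direction). The additive tanh scoring function below is one common choice for attention-BiLSTM models and is an assumption, not necessarily the paper's exact formulation; the random matrix stands in for real BiLSTM states.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, hidden = 120, 128          # Table 2: max length 120, 128 LSTM units

# Stand-in for the BiLSTM output: forward and backward states concatenated.
H = rng.normal(size=(seq_len, 2 * hidden))

# Additive attention: score each time step, softmax the scores into
# weights, then pool the states into one sentence vector.
w = rng.normal(size=(2 * hidden,))
scores = np.tanh(H) @ w             # one scalar score per time step
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # softmax over the 120 time steps
sentence_vec = weights @ H          # attention-weighted sum of states

print(sentence_vec.shape)           # (256,)
```

The resulting 256-dimensional vector would then feed the two fully connected layers of Table 2.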
Source domain Target domain S-only SFA DANN mSDA HATN CDSA-B CDSA-F CDSA-F-Att
Book DVD 80.57% 82.85% 83.42% 86.12% 87.07% 93.20% 94.96% 95.46%
Book Electronic 73.65% 76.38% 76.27% 79.02% 85.75% 93.68% 95.46% 95.43%
Book Kitchen 71.63% 78.10% 77.90% 81.05% 87.03% 94.65% 95.20% 96.11%
Book Video 81.45% 82.95% 83.23% 84.98% 87.80% 95.56% 96.10% 96.53%
DVD Book 76.45% 80.20% 80.77% 85.17% 87.78% 93.41% 94.55% 94.86%
DVD Electronic 73.12% 76.00% 76.35% 76.17% 86.32% 92.86% 95.08% 95.35%
DVD Kitchen 73.43% 77.50% 78.15% 82.60% 87.47% 94.76% 96.28% 96.51%
DVD Video 82.75% 85.95% 85.95% 83.80% 89.12% 96.15% 96.65% 97.10%
Electronic Book 68.87% 72.35% 73.53% 79.92% 84.03% 93.15% 94.70% 94.86%
Electronic DVD 72.60% 75.93% 76.27% 82.63% 84.32% 93.38% 95.68% 96.13%
Electronic Kitchen 84.63% 86.50% 84.53% 85.80% 90.08% 95.70% 96.41% 96.60%
Electronic Video 72.48% 75.65% 77.20% 81.70% 84.08% 95.48% 96.23% 96.71%
Kitchen Book 71.53% 73.97% 74.17% 80.55% 84.88% 93.68% 94.03% 95.00%
Kitchen DVD 73.32% 75.67% 75.32% 82.18% 84.72% 93.40% 95.81% 96.03%
Kitchen Electronic 83.15% 85.38% 85.53% 88.00% 89.33% 94.63% 96.00% 96.01%
Kitchen Video 76.08% 77.97% 76.37% 81.47% 84.85% 95.90% 96.91% 96.80%
Video Book 77.03% 79.48% 80.03% 83.00% 87.10% 93.88% 94.18% 94.19%
Video DVD 82.43% 83.65% 84.15% 85.90% 87.90% 95.30% 95.96% 96.50%
Video Electronic 71.87% 75.93% 75.72% 77.67% 85.98% 94.00% 94.98% 95.86%
Video Kitchen 71.33% 74.78% 75.22% 79.52% 86.45% 95.80% 96.28% 96.71%
Average accuracy 75.92% 78.69% 79.00% 82.36% 86.60% 94.42% 95.57% 95.93%
Table 3  Experimental accuracy on the Amazon review dataset
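As a quick sanity check on Table 3, the 95.93% average reported for CDSA-F-Att can be recomputed from the 20 per-pair accuracies copied from the table:

```python
# CDSA-F-Att accuracies (%) for the 20 source->target pairs in Table 3.
cdsa_f_att = [
    95.46, 95.43, 96.11, 96.53,  # Book       -> DVD, Electronic, Kitchen, Video
    94.86, 95.35, 96.51, 97.10,  # DVD        -> Book, Electronic, Kitchen, Video
    94.86, 96.13, 96.60, 96.71,  # Electronic -> Book, DVD, Kitchen, Video
    95.00, 96.03, 96.01, 96.80,  # Kitchen    -> Book, DVD, Electronic, Video
    94.19, 96.50, 95.86, 96.71,  # Video      -> Book, DVD, Electronic, Kitchen
]
avg = sum(cdsa_f_att) / len(cdsa_f_att)
print(f"{avg:.2f}")  # 95.94 -- agrees with the reported 95.93% up to rounding
```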
Fig. 2  Average accuracy of the eight models across source-target domain experiments
Fig. 3  Average accuracy of the CDSA-F model across source-target domain experiments
Fig. 4  Average accuracy of the CDSA-F-Att model across source-target domain experiments