Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 4: 49-59     https://doi.org/10.11925/infotech.2096-3467.2020.1042
Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention
Wang Yuzhu1,Xie Jun1(),Chen Bo1,Xu Xinying2
1College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China
2College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China
Abstract

[Objective] This paper extracts speakers' opinions from videos and analyzes users' sentiments with the help of multi-modal methods. [Methods] First, we introduced cross-modal context information at the bimodal and trimodal levels to capture the interactions among the text, visual, and audio modalities. Then, we used an attention mechanism to filter out redundant information. Finally, we conducted sentiment analysis on the fused representation. [Results] We examined the proposed method on the MOSEI dataset. The accuracy and F1 score of sentiment classification reached 80.27% and 79.23%, which were 0.47 and 0.87 percentage points higher than the best results of the benchmark methods; the mean absolute error of the regression analysis was reduced to 0.66. [Limitations] The small size of the MOSI dataset caused overfitting during model training, which limited sentiment prediction performance. [Conclusions] The proposed model makes full use of the interactions among different modalities and effectively improves the accuracy of multi-modal sentiment prediction.

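The cross-modal attention described in the abstract can be sketched in plain NumPy. This is a minimal illustrative sketch, not the authors' exact architecture: the feature dimensions, the scaled dot-product scoring, and the concatenation-based bimodal fusion are all assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_mod, context_mod):
    """Attend from one modality (e.g., text) over another (e.g., audio).

    query_mod:   (n_utterances, d) features of the target modality
    context_mod: (n_utterances, d) features of the context modality
    Returns per-utterance cross-modal context vectors and the attention weights.
    """
    scores = query_mod @ context_mod.T / np.sqrt(query_mod.shape[1])
    weights = softmax(scores, axis=-1)   # which context utterances matter
    attended = weights @ context_mod     # cross-modal context vectors
    return attended, weights

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))      # 5 utterances, 16-d text features
audio = rng.standard_normal((5, 16))     # matching audio features

attended, w = cross_modal_attention(text, audio)
fused = np.concatenate([text, attended], axis=1)  # simple bimodal fusion
print(fused.shape)                    # (5, 32)
print(np.allclose(w.sum(axis=1), 1))  # True: attention rows sum to 1
```

In the paper's setting this bimodal step would be repeated for each modality pair and extended with a trimodal variant before the attention-filtered representations are fed to the classifier.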
Key words: Multi-modal; Feature Fusion; Sentiment Analysis; Context-aware; Attention Mechanism
Received: 2020-10-26      Published: 2021-05-17
CLC number: TP391
Funding: Applied Basic Research Program of Shanxi Province (201801D221190); Shanxi Province 2020 Graduate Education Innovation Project (2020SY527)
Corresponding author: Xie Jun     E-mail: xiejun@tyut.edu.cn
Cite this article:
Wang Yuzhu,Xie Jun,Chen Bo,Xu Xinying. Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention. Data Analysis and Knowledge Discovery, 2021, 5(4): 49-59.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1042      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I4/49
Fig.1  Example of multi-modal expression
Fig.2  Influence of contextual information on sentiment orientation
Fig.3  Multi-modal sentiment analysis model based on cross-modal context-aware attention
Fig.4  Context-aware trimodal fusion attention mechanism
Statistic         |  MOSI: Train / Valid / Test  |  MOSEI: Train / Valid / Test
Videos            |  52 / 10 / 31                |  2,250 / 300 / 679
Video segments    |  1,151 / 296 / 752           |  16,216 / 1,835 / 4,625
Positive samples  |  556 / 153 / 467             |  11,499 / 1,333 / 3,281
Negative samples  |  595 / 143 / 285             |  4,717 / 502 / 1,344
Table 1  Dataset statistics (unit: items)
Model      |  MOSI: Acc/%  F1/%  |  MOSEI: Acc/%  F1/%  MAE
TFN        |  74.60  74.50       |  -      -      -
MARN       |  77.10  77.00       |  -      -      -
MFN        |  77.40  77.30       |  76.00  76.00  0.72
CH-Fusion  |  80.00  -           |  -      -      -
Graph-MFN  |  -      -           |  76.90  77.00  0.71
BC-LSTM    |  80.30  -           |  77.64  -      -
MMMU-BA    |  82.31  82.27       |  79.80  78.36  -
CCA-SA     |  81.78  81.76       |  80.27  79.23  0.66
Table 2  Comparison results of different models
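The metrics reported in Table 2 are binary classification accuracy, the F1 score of the positive class, and the mean absolute error of the continuous sentiment-intensity regression. A short sketch of how such metrics are computed, using toy labels that are illustrative only (not data from the paper):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy and positive-class F1 for a binary sentiment task."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float((y_true == y_pred).mean())
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, float(f1)

def mae(scores_true, scores_pred):
    """Mean absolute error for continuous sentiment scores."""
    return float(np.mean(np.abs(np.asarray(scores_true) - np.asarray(scores_pred))))

acc, f1 = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(round(acc, 2), round(f1, 2))    # 0.75 0.8
print(mae([2.0, -1.0], [1.5, -0.5]))  # 0.5
```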
Fig.5  Confusion matrices of the CCA-SA classification results
Modality combination  |  MOSI: Acc/%  F1/%  |  MOSEI: Acc/%  F1/%
T                     |  79.39  78.98       |  77.71  76.12
A                     |  62.10  47.58       |  73.62  71.52
V                     |  64.36  55.05       |  74.01  64.56
V+T                   |  80.72  80.85       |  78.04  76.66
T+A                   |  80.32  80.26       |  78.53  77.15
V+A                   |  63.83  59.96       |  75.69  74.33
T+V+A                 |  81.78  81.76       |  80.27  79.23
Table 3  Sentiment classification results of different modality combinations
Fig.6  Ablation results on MOSI
Fig.7  Ablation results on MOSEI
References
[1] Zhang Yazhou, Rong Lu, Song Dawei, et al. A Survey on Multimodal Sentiment Analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 426-438.
[2] Morency L P, Mihalcea R, Doshi P. Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web[C]// Proceedings of the 13th International Conference on Multimodal Interfaces. Alicante, Spain: ACM, 2011: 169-176.
[3] Poria S, Cambria E, Hazarika D, et al. Context-dependent Sentiment Analysis in User-generated Videos[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada: ACL, 2017: 873-883.
[4] Tan Ying, Zhang Jin, Xia Lixin. A Survey of Sentiment Analysis on Social Media[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 1-11.
[5] Glodek M, Tschechne S, Layher G, et al. Multiple Classifier Systems for the Classification of Audio-visual Emotional States[C]// Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction. Berlin, Heidelberg: Springer, 2011: 359-368.
[6] Cai G Y, Xia B B. Convolutional Neural Networks for Multimedia Sentiment Analysis[C]// Proceedings of the 4th CCF Conference on Natural Language Processing and Chinese Computing. Berlin, Heidelberg: Springer, 2015: 159-167.
[7] Zadeh A, Zellers R, Pincus E, et al. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88. DOI: 10.1109/MIS.2016.94.
[8] Atrey P K, Hossain M A, El Saddik A, et al. Multimodal Fusion for Multimedia Analysis: A Survey[J]. Multimedia Systems, 2010, 16(6): 345-379. DOI: 10.1007/s00530-010-0182-0.
[9] Zadeh A, Chen M, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 1103-1114.
[10] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 2018 AAAI Conference on Artificial Intelligence. 2018: 5634-5641.
[11] Zadeh A, Liang P P, Poria S, et al. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 2236-2246.
[12] Ghosal D, Akhtar M S, Chauhan D, et al. Contextual Inter-modal Attention for Multi-modal Sentiment Analysis[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 3454-3466.
[13] Nojavanasghari B, Gopinath D, Koushik J, et al. Deep Multimodal Fusion for Persuasiveness Prediction[C]// Proceedings of the 18th ACM International Conference on Multimodal Interaction. 2016: 284-288.
[14] Wöllmer M, Weninger F, Knaup T, et al. YouTube Movie Reviews: Sentiment Analysis in an Audio-visual Context[J]. IEEE Intelligent Systems, 2013, 28(3): 46-53. DOI: 10.1109/MIS.2013.34.
[15] Cho K, van Merriënboer B, Gulcehre C, et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1724-1734.
[16] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[17] Karpathy A, Toderici G, Shetty S, et al. Large-scale Video Classification with Convolutional Neural Networks[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
[18] Ji S W, Xu W, Yang M, et al. 3D Convolutional Neural Networks for Human Action Recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231. DOI: 10.1109/TPAMI.2012.59.
[19] Eyben F, Wöllmer M, Schuller B. openSMILE: The Munich Versatile and Fast Open-source Audio Feature Extractor[C]// Proceedings of the 18th ACM International Conference on Multimedia. 2010: 1459-1462.
[20] Degottex G, Kane J, Drugman T, et al. COVAREP: A Collaborative Voice Analysis Repository for Speech Technologies[C]// Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. 2014: 960-964.
[21] Majumder N, Hazarika D, Gelbukh A, et al. Multimodal Sentiment Analysis Using Hierarchical Fusion with Context Modeling[J]. Knowledge-based Systems, 2018, 161: 124-133. DOI: 10.1016/j.knosys.2018.07.041.
[22] Zadeh A, Liang P P, Poria S, et al. Multi-attention Recurrent Network for Human Communication Comprehension[C]// Proceedings of the 2018 AAAI Conference on Artificial Intelligence. 2018: 5642-5649.
[23] Poria S, Cambria E, Gelbukh A. Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 2539-2544.
[24] Pérez-Rosas V, Mihalcea R, Morency L P. Utterance-Level Multimodal Sentiment Analysis[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013: 973-982.
[25] Poria S, Cambria E, Hazarika D, et al. Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis[C]// Proceedings of the 2017 IEEE International Conference on Data Mining. 2017: 1033-1038.