Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (4): 49-59    DOI: 10.11925/infotech.2096-3467.2020.1042
Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention
Wang Yuzhu1, Xie Jun1, Chen Bo1, Xu Xinying2
1College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China
2College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China

[Objective] This paper uses multi-modal methods to analyze the sentiment of the opinions that users express in videos. [Methods] First, we introduced bimodal and trimodal context information to capture the interactions among the text, visual, and audio modalities. Then, we used an attention mechanism to filter out redundant information. Finally, we performed sentiment analysis on the filtered features. [Results] We evaluated the proposed method on the MOSEI dataset. The accuracy and F1 score of sentiment classification reached 80.27% and 79.23%, which were 0.47 and 0.87 percentage points higher than the best results of the baseline methods. The mean absolute error of the regression analysis was reduced to 0.66. [Limitations] Because the MOSI dataset is small, the model overfitted when trained on it, which limited the quality of sentiment prediction. [Conclusions] The proposed model exploits the interactions among different modalities and effectively improves the accuracy of sentiment prediction.
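
The bimodal attention step named in the Methods can be pictured with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `cross_modal_attention`, the feature sizes, the random inputs, and the multiplicative gating are all hypothetical choices made only to show how one modality can attend over the utterance context of another.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(x, y):
    """Attend from modality x over the utterances of modality y.

    x, y: arrays of shape (num_utterances, dim) for one video.
    Hypothetical helper for illustration; not taken from the paper.
    """
    scores = x @ y.T                   # similarity of each x utterance to every y utterance
    weights = softmax(scores, axis=1)  # each row is a distribution over y's utterances
    context = weights @ y              # y-side context vector for each x utterance
    return context * x                 # gate: keep what agrees with x, damp the rest

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 utterances, 8-dim text features (made-up sizes)
audio = rng.normal(size=(5, 8))   # matching audio features

# Concatenate both attention directions with the original unimodal features.
fused = np.concatenate(
    [cross_modal_attention(text, audio),
     cross_modal_attention(audio, text),
     text, audio],
    axis=1,
)
print(fused.shape)  # (5, 32)
```

A trimodal variant could, under the same assumptions, first combine two modalities and then attend from the third over the combined representation.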

Key words: Multi-modal; Feature Fusion; Sentiment Analysis; Context-aware; Attention Mechanism
Received: 26 October 2020      Published: 17 May 2021
CLC number: TP391
Fund: Applied Basic Research Project of Shanxi Province (201801D221190); Graduate Education Innovation Project 2020 of Shanxi Province (2020SY527)
Corresponding Author: Xie Jun

Cite this article:

Wang Yuzhu, Xie Jun, Chen Bo, Xu Xinying. Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention. Data Analysis and Knowledge Discovery, 2021, 5(4): 49-59.


An Example of Multi-modal Data
An Example of the Effect of Context Information on Sentiment Analysis
Overall Architecture of the Proposed Framework
Context-Aware Trimodal Fusion Attention Mechanism
Dataset              MOSI     MOSI     MOSI     MOSEI    MOSEI    MOSEI
Split                Train    Valid    Test     Train    Valid    Test
Videos               52       10       31       2,250    300      679
Video segments       1,151    296      752      16,216   1,835    4,625
Positive sentiment   556      153      467      11,499   1,333    3,281
Negative sentiment   595      143      285      4,717    502      1,344
Statistics of Datasets
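
The class balance implied by these counts is worth checking, since it affects how accuracy and F1 should be read. The snippet below is a small sketch that copies the segment counts from the table. The assignment of the first three columns to MOSI and the last three to MOSEI is an assumption based on the dataset sizes, and the names `counts` and `positive_share` are illustrative.

```python
# Segment counts (positive, negative) copied from the statistics table.
# Assumption: the first three columns are MOSI and the last three are MOSEI.
counts = {
    "MOSI":  {"train": (556, 595),    "valid": (153, 143),  "test": (467, 285)},
    "MOSEI": {"train": (11499, 4717), "valid": (1333, 502), "test": (3281, 1344)},
}

def positive_share(pos, neg):
    # Fraction of segments labelled positive in one split.
    return pos / (pos + neg)

shares = {
    dataset: {split: positive_share(*pn) for split, pn in splits.items()}
    for dataset, splits in counts.items()
}

for dataset, splits in shares.items():
    for split, share in splits.items():
        print(f"{dataset:5s} {split:5s} positive share = {share:.1%}")
```

On the training splits this gives roughly 48% positive segments for MOSI and roughly 71% for MOSEI, so a majority-class guess would already score much higher on MOSEI than on MOSI.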
Dataset     MOSI     MOSI     MOSEI    MOSEI    MOSEI
Metric      Acc/%    F1/%     Acc/%    F1/%     MAE
TFN         74.60    74.50    -        -        -
MARN        77.10    77.00    -        -        -
MFN         77.40    77.30    76.00    76.00    0.72
CH-Fusion   80.00    -        -        -        -
Graph-MFN   -        -        76.90    77.00    0.71
BC-LSTM     80.30    -        77.64    -        -
MMMU-BA     82.31    82.27    79.80    78.36    -
CCA-SA      81.78    81.76    80.27    79.23    0.66
The Results of Different Models
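
The margins quoted in the abstract can be recomputed from this table. The sketch below copies the reported values and compares CCA-SA with the best baseline on each metric. The column order (MOSI accuracy and F1, then MOSEI accuracy, F1, and MAE) is an assumption that matches the MOSEI figures in the abstract, and the helper name `margin_over_best_baseline` is illustrative.

```python
# Columns: MOSI Acc, MOSI F1, MOSEI Acc, MOSEI F1, MOSEI MAE (None = not reported).
# Assumption: this column order, inferred from the abstract's MOSEI figures.
results = {
    "TFN":       (74.60, 74.50, None,  None,  None),
    "MARN":      (77.10, 77.00, None,  None,  None),
    "MFN":       (77.40, 77.30, 76.00, 76.00, 0.72),
    "CH-Fusion": (80.00, None,  None,  None,  None),
    "Graph-MFN": (None,  None,  76.90, 77.00, 0.71),
    "BC-LSTM":   (80.30, None,  77.64, None,  None),
    "MMMU-BA":   (82.31, 82.27, 79.80, 78.36, None),
    "CCA-SA":    (81.78, 81.76, 80.27, 79.23, 0.66),
}
metrics = ["MOSI Acc", "MOSI F1", "MOSEI Acc", "MOSEI F1", "MOSEI MAE"]

def margin_over_best_baseline(results, ours, idx, higher_is_better=True):
    # Difference between our score and the best reported baseline score.
    baselines = [v[idx] for k, v in results.items()
                 if k != ours and v[idx] is not None]
    best = max(baselines) if higher_is_better else min(baselines)
    return round(results[ours][idx] - best, 2)

margins = {
    name: margin_over_best_baseline(results, "CCA-SA", i,
                                    higher_is_better=(name != "MOSEI MAE"))
    for i, name in enumerate(metrics)
}
for name, value in margins.items():
    print(f"{name:10s} {value:+.2f}")
```

Under these assumptions the MOSEI margins are +0.47 (accuracy) and +0.87 (F1), and the MAE is 0.05 lower than the best reported baseline, while on MOSI the model trails MMMU-BA by about half a point on both accuracy and F1.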
Confusion Matrix Results of CCA-SA
Dataset      MOSI     MOSI     MOSEI    MOSEI
Modality     Acc/%    F1/%     Acc/%    F1/%
T            79.39    78.98    77.71    76.12
A            62.10    47.58    73.62    71.52
V            64.36    55.05    74.01    64.56
V+T          80.72    80.85    78.04    76.66
T+A          80.32    80.26    78.53    77.15
V+A          63.83    59.96    75.69    74.33
T+V+A        81.78    81.76    80.27    79.23
Sentiment Classification Results of Different Modality Combinations
Model Ablation Results on MOSI
Model Ablation Results on MOSEI