Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (4): 49-59    DOI: 10.11925/infotech.2096-3467.2020.1042
Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention
Wang Yuzhu1, Xie Jun1, Chen Bo1, Xu Xinying2
1College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China
2College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China
[Objective] This paper extracts users’ opinions from videos and analyzes their sentiment with multi-modal methods. [Methods] First, we introduced bimodal and trimodal context information to capture the interactions among the text, visual, and audio modalities. Then, we used an attention mechanism to filter out redundant information. Finally, we conducted sentiment analysis on the processed data. [Results] We evaluated the proposed method on the MOSEI dataset. The accuracy and F1 value of sentiment classification reached 80.27% and 79.23%, which were 0.47 and 0.87 percentage points higher than the best results of the benchmark methods. The mean absolute error of the regression analysis was reduced to 0.66. [Limitations] Because the MOSI dataset is small, the model overfitted during training, which limited the quality of sentiment prediction. [Conclusions] The proposed model exploits the interactions among different modalities and effectively improves the accuracy of sentiment prediction.
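
The Methods step of letting one modality attend to the context of another can be illustrated with a minimal sketch. This is not the authors' architecture: it is a generic bimodal attention over the utterances of a single video, written in plain Python. The function names, the feature values, and the multiplicative gating step are all assumptions made for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    # a is n x k, b is k x m; returns n x m.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def cross_modal_attention(query, context):
    """Hypothetical helper: each utterance in `query` attends to every
    utterance in `context`. Both inputs are (utterances x features)."""
    scores = matmul(query, transpose(context))      # utterance-to-utterance scores
    weights = [softmax(row) for row in scores]      # each row sums to 1
    attended = matmul(weights, context)             # context summary per utterance
    # Assumed gating step: scale the attended context by the query features.
    gated = [[a * q for a, q in zip(a_row, q_row)]
             for a_row, q_row in zip(attended, query)]
    return weights, gated

# Toy features for three utterances, two dimensions each (made-up values).
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.2], [0.1, 0.9], [0.4, 0.4]]

weights, fused = cross_modal_attention(text, audio)
```

In a trimodal setting the same operation could be repeated for each modality pair and the results concatenated, although the exact fusion used in the paper may differ.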

Key words: Multi-modal; Feature Fusion; Sentiment Analysis; Context-aware; Attention Mechanism
Received: 26 October 2020      Published: 17 May 2021
Chinese Library Classification (ZTFLH): TP391
Fund: Applied Basic Research Project of Shanxi Province (201801D221190); Graduate Education Innovation Project 2020 of Shanxi Province (2020SY527)
Wang Yuzhu, Xie Jun, Chen Bo, Xu Xinying. Multi-modal Sentiment Analysis Based on Cross-modal Context-aware Attention. Data Analysis and Knowledge Discovery, 2021, 5(4): 49-59.

An Example of Multi-modal Data
An Example of Effect of Context Information on Sentiment Analysis
Overall Architecture of the Proposed Framework
Context-Aware Trimodal Fusion Attention Mechanism
                    MOSI                            MOSEI
                    Training  Validation  Test      Training  Validation  Test
Videos              52        10          31        2,250     300         679
Video segments      1,151     296         752       16,216    1,835       4,625
Positive sentiment  556       153         467       11,499    1,333       3,281
Negative sentiment  595       143         285       4,717     502         1,344
Statistics of Datasets
            MOSI              MOSEI
Model       Acc/%    F1/%     Acc/%    F1/%     MAE
TFN         74.60    74.50    -        -        -
MARN        77.10    77.00    -        -        -
MFN         77.40    77.30    76.00    76.00    0.72
CH-Fusion   80.00    -        -        -        -
Graph-MFN   -        -        76.90    77.00    0.71
BC-LSTM     80.30    -        77.64    -        -
MMMU-BA     82.31    82.27    79.80    78.36    -
CCA-SA      81.78    81.76    80.27    79.23    0.66
The Results of Different Models
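
The three metrics reported in the table (accuracy, F1, and mean absolute error) can be computed as in the sketch below. The toy labels and scores are invented for illustration. The F1 shown is the binary F1 of the positive class, which is an assumption: the paper may report a weighted or averaged variant.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the reference labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    # Binary F1 for the `positive` class (assumed variant, see lead-in).
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mean_absolute_error(y_true, y_pred):
    # Average absolute gap between predicted and reference intensity scores.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up classification labels (1 = positive, 0 = negative).
labels = [1, 1, 0, 0, 1]
preds = [1, 0, 0, 1, 1]

# Made-up sentiment intensity scores for the regression metric.
scores_true = [1.0, -2.0, 0.5]
scores_pred = [0.5, -1.0, 0.5]

acc = accuracy(labels, preds)
f1 = f1_binary(labels, preds)
mae = mean_absolute_error(scores_true, scores_pred)
```

Higher is better for accuracy and F1, while lower is better for MAE, which is why the abstract describes the MAE as "reduced".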
Confusion Matrix Results of CCA-SA
(T = text, A = audio, V = visual)
            MOSI              MOSEI
Modality    Acc/%    F1/%     Acc/%    F1/%
T           79.39    78.98    77.71    76.12
A           62.10    47.58    73.62    71.52
V           64.36    55.05    74.01    64.56
V+T         80.72    80.85    78.04    76.66
T+A         80.32    80.26    78.53    77.15
V+A         63.83    59.96    75.69    74.33
T+V+A       81.78    81.76    80.27    79.23
Sentiment Classification Results of Different Modality Combinations
Model Ablation Results on MOSI
Model Ablation Results on MOSEI
