Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (5): 91-101    DOI: 10.11925/infotech.2096-3467.2023.0026
Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism
Lyu Xueqiang1, Tian Chi1, Zhang Le1, Du Yifan1, Zhang Xu1, Cai Zangtai2
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China
[Objective] This paper proposes a multimodal sentiment analysis model that integrates multiple features and attention mechanisms, addressing the insufficient extraction of multimodal features and the inadequate interaction of intra-modal and inter-modal information in existing models. [Methods] For feature extraction, we augmented the video modality with features describing the body movements, gender, and age of the people shown, and for the text modality we fused BERT-based character-level and word-level semantic vectors, thereby enriching the low-level features of the multimodal data. We then used self-attention and cross-modal attention to integrate intra-modal and inter-modal information, concatenated the modality features, and applied a soft attention mechanism to assign an attention weight to each feature. Finally, fully connected layers produced the sentiment classification results. [Results] We evaluated the model on the public CH-SIMS dataset and on the Hot Public Opinion Comments Videos (HPOC) dataset constructed for this paper. Compared with the Self-MM model, our model improved binary classification accuracy, three-class classification accuracy, and F1 score by 1.83, 1.74, and 0.69 percentage points on CH-SIMS, and by 1.03, 0.94, and 0.79 percentage points on HPOC. [Limitations] The scene around a person in a video may change constantly, and different scenes may carry different emotional information; our model does not incorporate this scene information. [Conclusions] The proposed model enriches the low-level features of multimodal data and improves the effectiveness of sentiment analysis.

Key words: Multi-features; Multi-modal; Sentiment Analysis; Attention Mechanism
Received: 11 January 2023      Published: 27 May 2024
CLC Number (ZTFLH): TP391
Fund: Key Program of the National Language Commission of China (ZDI145-10); Science and Technology Plan of the Beijing Municipal Commission of Education (KM202311232001); Innovation Platform Construction Special Project of Qinghai Province (2022-ZJ-T02)
Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X

Lyu Xueqiang, Tian Chi, Zhang Le, Du Yifan, Zhang Xu, Cai Zangtai. Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism. Data Analysis and Knowledge Discovery, 2024, 8(5): 91-101.

Overall Architecture of MFAM
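The abstract describes concatenating the modality features and using soft attention to assign a weight to each feature before the fully connected layers. A minimal plain-Python sketch of one possible reading of that weighting step follows. The scoring vector `w`, the function names, and the toy feature values are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features: List[List[float]],
                   w: List[float]) -> Tuple[List[float], List[float]]:
    """Score each feature vector, normalise the scores, and reweight.

    features: one vector per modality feature, all of the same length.
    w: hypothetical learned scoring vector (an assumption of this sketch).
    Returns (weights, fused), where fused is the concatenation of the
    reweighted feature vectors.
    """
    scores = [sum(x * wi for x, wi in zip(vec, w)) for vec in features]
    weights = softmax(scores)
    fused = [a * x for a, vec in zip(weights, features) for x in vec]
    return weights, fused


# Toy text, audio, and video feature vectors (made-up numbers).
text = [0.9, 0.1, 0.4]
audio = [0.2, 0.3, 0.1]
video = [0.5, 0.5, 0.5]
weights, fused = soft_attention([text, audio, video], w=[1.0, 0.0, 1.0])
```

In a trained model the scoring vector would be learned jointly with the classifier; here it is fixed only so that the example runs on its own.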
The Architecture of Cross-Modal Attention Module
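Cross-modal attention lets one modality query another: queries come from the target modality, while keys and values come from the source modality. The following single-head scaled dot-product sketch in plain Python illustrates the idea. It omits the learned projections and the multi-head split (the parameter table lists a dimension of 50 and 10 heads), and all names and toy values are assumptions rather than the paper's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(target: Matrix, source: Matrix) -> Matrix:
    """Let every target time step attend over all source time steps.

    target: (T_target, d) sequence that supplies the queries.
    source: (T_source, d) sequence that supplies keys and values.
    Returns a (T_target, d) sequence of source-informed features.
    """
    d = len(target[0])
    scale = math.sqrt(d)
    output = []
    for query in target:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in source]
        attn = softmax(scores)
        output.append([sum(a * value[j] for a, value in zip(attn, source))
                       for j in range(d)])
    return output


text_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three text steps
audio_seq = [[0.5, 0.5], [2.0, 0.0]]             # two audio steps
text_from_audio = cross_modal_attention(text_seq, audio_seq)
```

Each output row is a convex combination of the audio rows, so the text sequence keeps its length while taking on audio content.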
Item          CH-SIMS                       HPOC
              Train   Validation   Test     Train   Validation   Test
Utterances    1368    456          457      350     119          119
Positive      419     139          140      125     33           47
Neutral       207     69           69       87      32           37
Negative      742     248          248      138     54           35
Basic Information of the Datasets
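The class balance implied by the table can be checked with a few lines of Python. The sketch below hard-codes the counts from the table and assumes the first three columns belong to CH-SIMS and the last three to HPOC; the dictionary layout and function name are illustrative choices.

```python
from typing import Dict, List

# Counts copied from the table, ordered (train, validation, test).
# Assumption: the first column group is CH-SIMS, the second is HPOC.
COUNTS: Dict[str, Dict[str, List[int]]] = {
    "CH-SIMS": {
        "positive": [419, 139, 140],
        "neutral": [207, 69, 69],
        "negative": [742, 248, 248],
    },
    "HPOC": {
        "positive": [125, 33, 47],
        "neutral": [87, 32, 37],
        "negative": [138, 54, 35],
    },
}
SPLITS = ["train", "validation", "test"]


def class_shares(dataset: str) -> Dict[str, Dict[str, float]]:
    """Return the fraction of each sentiment class within each split."""
    per_class = COUNTS[dataset]
    shares = {}
    for i, split in enumerate(SPLITS):
        total = sum(counts[i] for counts in per_class.values())
        shares[split] = {label: counts[i] / total
                         for label, counts in per_class.items()}
    return shares


for name in COUNTS:
    for split, dist in class_shares(name).items():
        print(name, split, {k: round(v, 3) for k, v in dist.items()})
```

On these counts, negative utterances make up a little over half of the CH-SIMS training split, while the HPOC splits are more evenly spread across the three classes.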
Environment        Configuration
Operating system   Linux
CPU                Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
GPU                Tesla V100
Python             3.8.13
PyTorch            1.12.1
CUDA               11.4
Experimental Environment Information
Parameter                         Value   Parameter       Value
Cross-modal attention dimension   50      Learning_rate   0.002
Cross-modal attention heads       10      Dropout         0.3
Optimizer                         Adam    Early_stop      8
Number of iterations              20      Batch_size      16
Experimental Parameter Settings
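The settings in the table map naturally onto a small configuration object and an early-stopping loop. The sketch below assumes that the 20 iterations are training epochs, that Early_stop is a patience counted in epochs, and that validation loss is the monitored quantity; none of this is stated in the source, and `TrainConfig` and `run_with_early_stopping` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the parameter table."""
    attn_dim: int = 50          # cross-modal attention dimension
    attn_heads: int = 10        # cross-modal attention heads
    optimizer: str = "Adam"
    epochs: int = 20            # assumed to be epochs
    learning_rate: float = 0.002
    dropout: float = 0.3
    early_stop: int = 8         # assumed to be patience in epochs
    batch_size: int = 16


def run_with_early_stopping(cfg: TrainConfig,
                            validate: Callable[[int], float]) -> Tuple[int, float]:
    """Stop once cfg.early_stop consecutive epochs bring no improvement.

    validate(epoch) stands in for one training epoch plus validation and
    returns the validation loss. Returns (best_epoch, best_loss).
    """
    best_epoch, best_loss, stale = 0, float("inf"), 0
    for epoch in range(1, cfg.epochs + 1):
        loss = validate(epoch)
        if loss < best_loss:
            best_epoch, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= cfg.early_stop:
                break
    return best_epoch, best_loss


cfg = TrainConfig()
# Toy validation curve: improves until epoch 5, then gets worse.
best_epoch, best_loss = run_with_early_stopping(
    cfg, lambda epoch: abs(epoch - 5) * 0.1 + 0.4)
```

With a dimension of 50 split over 10 heads, each attention head would work on 5 dimensions, which is why the dimension must be divisible by the head count.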
Model     Acc-2/%   Acc-3/%   F1-Score/%   MAE
EF-LSTM   69.37     54.27     56.82        0.590
TFN       78.38     65.12     78.62        0.432
MFN       77.90     65.73     77.88        0.435
MulT      78.56     64.77     79.66        0.453
MISA      79.43     64.55     79.70        0.428
Self-MM   80.04     65.47     80.44        0.425
MFAM      81.87     67.21     81.13        0.416
Experimental Results of Different Models on CH-SIMS Dataset
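The column headers Acc-2, Acc-3, F1-Score, and MAE are not defined in this excerpt. The sketch below shows one common way to derive such metrics from regression outputs in [-1, 1]. The bin edges (0 for two classes, -0.1 and 0.1 for three) and the choice of a support-weighted F1 are assumptions borrowed from common CH-SIMS evaluation practice, not something stated here, and the function names are illustrative.

```python
from typing import List, Sequence

BINARY_EDGES = [0.0]          # assumed: value <= 0 is negative, else positive
TERNARY_EDGES = [-0.1, 0.1]   # assumed: negative / neutral / positive


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predictions and labels."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def to_class(value: float, edges: Sequence[float]) -> int:
    """Index of the first bin whose upper edge is >= value."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def accuracy(pred: Sequence[float], true: Sequence[float],
             edges: Sequence[float]) -> float:
    """Share of samples whose predicted bin matches the labelled bin."""
    hits = sum(to_class(p, edges) == to_class(t, edges)
               for p, t in zip(pred, true))
    return hits / len(true)


def weighted_f1(pred: Sequence[float], true: Sequence[float],
                edges: Sequence[float]) -> float:
    """Per-class F1 averaged with weights equal to class support."""
    p_cls: List[int] = [to_class(p, edges) for p in pred]
    t_cls: List[int] = [to_class(t, edges) for t in true]
    score = 0.0
    for c in set(t_cls):
        tp = sum(p == c and t == c for p, t in zip(p_cls, t_cls))
        fp = sum(p == c and t != c for p, t in zip(p_cls, t_cls))
        fn = sum(p != c and t == c for p, t in zip(p_cls, t_cls))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * t_cls.count(c) / len(t_cls)
    return score


# Toy labels and predictions (made-up numbers).
y_true = [-0.8, -0.2, 0.0, 0.4, 1.0]
y_pred = [-0.6, 0.1, 0.05, 0.3, 0.7]
acc2 = accuracy(y_pred, y_true, BINARY_EDGES)
acc3 = accuracy(y_pred, y_true, TERNARY_EDGES)
f1 = weighted_f1(y_pred, y_true, BINARY_EDGES)
error = mae(y_pred, y_true)
```

Lower MAE is better, while the three percentage columns are higher-is-better, which matches the ordering of the rows in the table.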
Model     Acc-2/%   Acc-3/%   F1-Score/%   MAE
EF-LSTM   63.26     49.50     50.17        0.632
TFN       72.45     57.14     71.85        0.593
MFN       73.38     57.65     72.04        0.589
MulT      73.32     57.43     71.90        0.591
MISA      74.03     58.21     73.26        0.578
Self-MM   74.37     58.52     73.73        0.564
MFAM      75.40     59.46     74.52        0.560
Experimental Results of Different Models on HPOC Dataset
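The gains quoted in the abstract can be reproduced from the Self-MM and MFAM rows of the two result tables. The short check below hard-codes those rows and uses `decimal.Decimal` to avoid floating-point noise; the differences are percentage points, and the variable and function names are illustrative.

```python
from decimal import Decimal
from typing import Dict

METRICS = ("Acc-2", "Acc-3", "F1-Score")

# Rows copied from the CH-SIMS and HPOC result tables (values in %).
RESULTS = {
    "CH-SIMS": {"Self-MM": ("80.04", "65.47", "80.44"),
                "MFAM": ("81.87", "67.21", "81.13")},
    "HPOC": {"Self-MM": ("74.37", "58.52", "73.73"),
             "MFAM": ("75.40", "59.46", "74.52")},
}


def gains(dataset: str) -> Dict[str, Decimal]:
    """MFAM minus Self-MM for each metric, in percentage points."""
    rows = RESULTS[dataset]
    return {metric: Decimal(ours) - Decimal(base)
            for metric, base, ours in zip(METRICS, rows["Self-MM"], rows["MFAM"])}


for name in RESULTS:
    print(name, {metric: str(delta) for metric, delta in gains(name).items()})
```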
No.   Model                 Acc-2/%   Acc-3/%   F1-Score/%   MAE
1     L-A                   80.41     66.47     79.71        0.435
2     L-V                   79.36     65.72     78.84        0.447
3     w/o Pose              80.92     66.64     79.93        0.435
4     w/o Gender            81.13     66.73     80.23        0.426
5     w/o Age               81.27     66.85     80.45        0.423
6     w/o P_G_A             80.35     66.39     79.48        0.444
7     w/o Sememe            81.05     66.70     79.95        0.432
8     w/o Cross-attention   80.18     65.96     79.26        0.446
9     w/o Soft-attention    81.35     67.03     80.56        0.421
10    MFAM                  81.87     67.21     81.13        0.416
Ablation Experiment Results on the CH-SIMS Dataset
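One way to read the ablation table is to rank each "w/o" variant by how far it falls below the full model. The sketch below does this for Acc-2 using the values from the table. The ranking criterion is an illustrative choice, and rows 1 and 2 (L-A, L-V) are left out because their exact meaning is not defined in this excerpt.

```python
from decimal import Decimal
from typing import List, Tuple

FULL_ACC2 = Decimal("81.87")  # MFAM, row 10

# Acc-2 (%) of each ablated variant, copied from rows 3 to 9.
ABLATIONS = {
    "w/o Pose": "80.92",
    "w/o Gender": "81.13",
    "w/o Age": "81.27",
    "w/o P_G_A": "80.35",
    "w/o Sememe": "81.05",
    "w/o Cross-attention": "80.18",
    "w/o Soft-attention": "81.35",
}


def ranked_drops() -> List[Tuple[str, Decimal]]:
    """Acc-2 lost by each variant, largest drop first (percentage points)."""
    drops = {name: FULL_ACC2 - Decimal(value)
             for name, value in ABLATIONS.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


for name, drop in ranked_drops():
    print(f"{name:<22}{drop}")
```

On these numbers, removing cross-modal attention costs the most Acc-2 (1.69 points), followed by removing pose, gender, and age together (1.52 points), while removing soft attention costs the least (0.52 points).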
