Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (5): 91-101    DOI: 10.11925/infotech.2096-3467.2023.0026
Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism
Lyu Xueqiang1, Tian Chi1, Zhang Le1, Du Yifan1, Zhang Xu1, Cai Zangtai2
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China
Abstract  

[Objective] This paper proposes a multimodal sentiment analysis model that integrates multiple features and attention mechanisms, addressing the insufficient extraction of multimodal features and the inadequate interaction of intra-modal and inter-modal information in existing models. [Methods] For the video modality, we enhance feature extraction with the body movements, gender, and age of the individuals on screen; for the text modality, we fuse BERT-based character-level and word-level semantic vectors, thereby enriching the low-level features of the multimodal data. Self-attention and cross-modal attention mechanisms then integrate intra-modal and inter-modal information. We concatenate the modality features, employ a soft attention mechanism to assign an attention weight to each feature, and finally generate the sentiment classification results through fully connected layers. [Results] We evaluated the proposed model on the public CH-SIMS dataset and on the Hot Public Opinion Comments Videos (HPOC) dataset constructed in this paper. Compared with the Self-MM model, our model improves binary classification accuracy, tri-class classification accuracy, and F1 score by 1.83%, 1.74%, and 0.69% on CH-SIMS, and by 1.03%, 0.94%, and 0.79% on HPOC. [Limitations] The scene surrounding a person in a video may change constantly, and different scenes may carry different emotional information; our model does not incorporate this scene information. [Conclusions] The proposed model enriches the low-level features of multimodal data and improves the effectiveness of sentiment analysis.
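To make the fusion stage concrete, the following PyTorch sketch mirrors the pipeline described in the abstract: per-modality features are concatenated, a soft attention mechanism assigns a weight to each feature, and fully connected layers produce the classification. The class name SoftAttentionFusion, all feature dimensions, and the hidden-layer sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """Illustrative fusion head: concatenate modality features, weight them
    with soft attention, then classify with fully connected layers.
    Feature dimensions are placeholders, not the paper's actual sizes."""

    def __init__(self, dim_text=768, dim_audio=33, dim_video=709, num_classes=3):
        super().__init__()
        fused_dim = dim_text + dim_audio + dim_video
        # Soft attention: one normalized weight per feature dimension.
        self.attention = nn.Sequential(
            nn.Linear(fused_dim, fused_dim),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 128),
            nn.ReLU(),
            nn.Dropout(0.3),  # Dropout = 0.3 per the parameter table below
            nn.Linear(128, num_classes),
        )

    def forward(self, text_feat, audio_feat, video_feat):
        fused = torch.cat([text_feat, audio_feat, video_feat], dim=-1)
        weights = self.attention(fused)  # attention weight for each feature
        return self.classifier(weights * fused)

# Usage on random features with batch size 16:
logits = SoftAttentionFusion()(torch.randn(16, 768),
                               torch.randn(16, 33),
                               torch.randn(16, 709))
```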

Keywords: Multi-features; Multi-modal; Sentiment Analysis; Attention Mechanism
Received: 11 January 2023      Published: 27 May 2024
CLC Number:  TP391
Fund:Key Program of the National Language Commission of China(ZDI145-10);Science and Technology Plan of the Beijing Municipal Commission of Education(KM202311232001);Innovation Platform Construction Special Project of Qinghai Province(2022-ZJ-T02)
Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X, E-mail: zhangle@bistu.edu.cn.

Cite this article:

Lyu Xueqiang, Tian Chi, Zhang Le, Du Yifan, Zhang Xu, Cai Zangtai. Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism. Data Analysis and Knowledge Discovery, 2024, 8(5): 91-101.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0026     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I5/91

Fig. 1  Overall Architecture of MFAM
Fig. 2  The Architecture of the Cross-Modal Attention Module
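The cross-modal attention module of Fig. 2 can be approximated with PyTorch's built-in nn.MultiheadAttention, letting one modality supply the query while another supplies the keys and values. The embedding dimension (50) and head count (10) follow the parameter table below; the residual connection and LayerNorm wiring are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Sketch of a cross-modal attention block: the target modality queries
    the source modality. embed_dim=50 and num_heads=10 follow the parameter
    table; residual + LayerNorm are assumptions."""

    def __init__(self, embed_dim=50, num_heads=10, dropout=0.3):
        super().__init__()
        self.attention = nn.MultiheadAttention(
            embed_dim, num_heads, dropout=dropout, batch_first=True
        )
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, target, source):
        # target: (batch, T_target, embed_dim); source: (batch, T_source, embed_dim)
        attended, _ = self.attention(query=target, key=source, value=source)
        return self.norm(target + attended)

# e.g., text features attending to video features (sequence lengths arbitrary):
block = CrossModalAttention()
out = block(torch.randn(16, 39, 50), torch.randn(16, 55, 50))
```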
Item          CH-SIMS                    HPOC
              Train    Valid    Test     Train    Valid    Test
Utterances    1,368    456      457      350      119      119
Positive      419      139      140      125      33       47
Neutral       207      69       69       87       32       37
Negative      742      248      248      138      54       35
Basic Information of the Datasets
Environment         Configuration
Operating system    Linux
CPU                 Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
GPU                 Tesla V100
Python              3.8.13
PyTorch             1.12.1
CUDA                11.4
Experimental Environment Information
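To verify a comparable setup before re-running the experiments, a quick check against the table above might look like this; the expected values in the comments come from the table, while the check itself is our addition:

```python
import torch

# Expected values come from the experimental-environment table above.
print("PyTorch:", torch.__version__)              # expected: 1.12.1
print("CUDA build:", torch.version.cuda)          # expected: 11.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expected: Tesla V100
```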
Parameter                          Value
Cross-modal attention dimension    50
Cross-modal attention heads        10
Optimizer                          Adam
Epochs                             20
Learning rate                      0.002
Dropout                            0.3
Early-stopping patience            8
Batch size                         16
Experimental Parameter Settings
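A training-loop skeleton wiring in these settings is sketched below. The model, data loaders, and loss function are placeholders, while the optimizer, learning rate, epoch count, and early-stopping patience follow the table; the checkpoint path best.pt is hypothetical.

```python
import torch

def train(model, train_loader, valid_loader, criterion, device="cuda"):
    """Skeleton loop for the settings in the table: Adam with lr=0.002,
    20 epochs, early stopping with patience 8. Batch size 16 is assumed
    to be configured in the DataLoaders."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
    best_valid_loss, patience = float("inf"), 0
    for epoch in range(20):                                  # 20 epochs
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
        model.eval()                                         # validation pass
        with torch.no_grad():
            valid_loss = sum(
                criterion(model(x.to(device)), y.to(device)).item()
                for x, y in valid_loader
            ) / len(valid_loader)
        if valid_loss < best_valid_loss:
            best_valid_loss, patience = valid_loss, 0
            torch.save(model.state_dict(), "best.pt")        # hypothetical path
        else:
            patience += 1
            if patience >= 8:                                # early stop = 8
                break
```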
Model     Acc-2/%   Acc-3/%   F1-Score/%   MAE
EF-LSTM 69.37 54.27 56.82 0.590
TFN 78.38 65.12 78.62 0.432
MFN 77.90 65.73 77.88 0.435
MulT 78.56 64.77 79.66 0.453
MISA 79.43 64.55 79.70 0.428
Self-MM 80.04 65.47 80.44 0.425
MFAM 81.87 67.21 81.13 0.416
Experimental Results of Different Models on CH-SIMS Dataset
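For context on the metrics reported above, a rough evaluation routine for CH-SIMS-style regression labels in [-1, 1] is sketched below. The thresholds (negative < 0 vs. non-negative for Acc-2, a sign-based three-way split for Acc-3, weighted F1) follow common practice for this dataset and are our assumption, not a copy of the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def evaluate(preds, labels):
    """Approximate CH-SIMS-style metrics on regression outputs in [-1, 1]."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    mae = np.abs(preds - labels).mean()                 # MAE on raw scores
    # Acc-2: negative (< 0) vs non-negative (>= 0) polarity.
    acc2 = accuracy_score(labels >= 0, preds >= 0)
    f1 = f1_score(labels >= 0, preds >= 0, average="weighted")
    # Acc-3: negative / neutral / positive via the sign of the score.
    acc3 = accuracy_score(np.sign(labels), np.sign(preds))
    return {"Acc-2": acc2, "Acc-3": acc3, "F1-Score": f1, "MAE": mae}

print(evaluate([0.4, -0.2, 0.0, 0.8], [0.6, -0.4, 0.0, 0.2]))
```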
Model     Acc-2/%   Acc-3/%   F1-Score/%   MAE
EF-LSTM 63.26 49.50 50.17 0.632
TFN 72.45 57.14 71.85 0.593
MFN 73.38 57.65 72.04 0.589
MulT 73.32 57.43 71.90 0.591
MISA 74.03 58.21 73.26 0.578
Self-MM 74.37 58.52 73.73 0.564
MFAM 75.40 59.46 74.52 0.560
Experimental Results of Different Models on HPOC Dataset
No.   Model                  Acc-2/%   Acc-3/%   F1-Score/%   MAE
1 L-A 80.41 66.47 79.71 0.435
2 L-V 79.36 65.72 78.84 0.447
3 w/o Pose 80.92 66.64 79.93 0.435
4 w/o Gender 81.13 66.73 80.23 0.426
5 w/o Age 81.27 66.85 80.45 0.423
6 w/o P_G_A 80.35 66.39 79.48 0.444
7 w/o Sememe 81.05 66.70 79.95 0.432
8 w/o Cross-attention 80.18 65.96 79.26 0.446
9 w/o Soft-attention 81.35 67.03 80.56 0.421
10 MFAM 81.87 67.21 81.13 0.416
Ablation Experiment Results on the CH-SIMS Dataset
[1] Gönen M, Alpaydin E. Multiple Kernel Learning Algorithms[J]. Journal of Machine Learning Research, 2011, 12: 2211-2268.
[2] Ghahramani Z. Learning Dynamic Bayesian Networks[M]//Adaptive Processing of Sequences and Data Structures. Berlin, Heidelberg: Springer, 1998: 168-197.
[3] Chen Y. Convolutional Neural Network for Sentence Classification[D]. Waterloo: University of Waterloo, 2015.
[4] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[5] Tang D Y, Qin B, Liu T. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1422-1432.
[6] Poria S, Cambria E, Hazarika D, et al. Context-Dependent Sentiment Analysis in User-Generated Videos[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 873-883.
[7] Williams J, Kleinegesse S, Comanescu R, et al. Recognizing Emotions in Video Using Multimodal DNN Feature Fusion[C]// Proceedings of Grand Challenge and Workshop on Human Multimodal Language. 2018: 11-19.
[8] Zadeh A, Chen M H, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 1103-1114.
[9] Hou M, Tang J J, Zhang J H, et al. Deep Multimodal Multilinear Fusion with High-Order Polynomial Pooling[C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019: 12156-12166.
[10] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2247-2256.
[11] Ma Chao, Li Gang, Chen Sijing, et al. Research on Usefulness Recognition of Tourism Online Reviews Based on Multimodal Data Semantic Fusion[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(2): 199-207. (in Chinese)
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[13] Tsai Y H H, Bai S J, Liang P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[14] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[15] Zhang Yu, Zhang Haijun, Liu Yaqing, et al. Multimodal Sentiment Analysis Based on Bidirectional Mask Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 46-55. (in Chinese)
[16] Lindsay P H, Norman D A. Human Information Processing: An Introduction to Psychology[M]. London: Academic Press, 2013.
[17] Zhang Feng, Li Xicheng, Dong Chunru, et al. Deep Emotional Arousal Network for Multimodal Sentiment Analysis and Emotion Recognition[J]. Control and Decision, 2022, 37(11): 2984-2992. (in Chinese)
[18] Wang Xuyang, Dong Shuai, Shi Jie. Multimodal Sentiment Analysis with Composite Hierarchical Fusion[J]. Journal of Frontiers of Computer Science & Technology, 2023, 17(1): 198-208. (in Chinese) DOI: 10.3778/j.issn.1673-9418.2111004.
[19] McFee B, Raffel C, Liang D W, et al. librosa: Audio and Music Signal Analysis in Python[C]// Proceedings of the 14th Python in Science Conference. 2015: 18-25.
[20] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[21] Niu Y L, Xie R B, Liu Z Y, et al. Improved Word Representation Learning with Sememes[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 2049-2058.
[22] Yu W M, Xu H, Meng F Y, et al. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-Grained Annotation of Modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3718-3727.
[23] Baltrusaitis T, Zadeh A, Lim Y C, et al. OpenFace 2.0: Facial Behavior Analysis Toolkit[C]// Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. 2018: 59-66.
[24] Xu Y F, Zhang J, Zhang Q M, et al. ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation[OL]. arXiv Preprint, arXiv:2204.12484.
[25] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context[C]// Proceedings of the 13th European Conference on Computer Vision. 2014: 740-755.
[26] Park S, Shim H S, Chatterjee M, et al. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach[C]// Proceedings of the 16th ACM International Conference on Multimodal Interaction. 2014: 50-57.
[27] Zeng Y, Mai S, Hu H F. Which is Making the Contribution: Modulating Unimodal and Cross-Modal Dynamics for Multimodal Sentiment Analysis[C]// Findings of the Association for Computational Linguistics: EMNLP 2021. 2021: 1262-1274.
[28] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018: 5634-5641.
[29] Yu W M, Xu H, Yuan Z Q, et al. Learning Modality-Specific Representations with Self-Supervised Multi-task Learning for Multimodal Sentiment Analysis[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021: 10790-10797.
Related articles in this journal:
[1] Jiang Yiping, Zhang Ting, Xia Zhengming, Li Yuhua, Zhang Zhaotong. Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training[J]. Data Analysis and Knowledge Discovery, 2024, 8(5): 102-112.
[2] Feng Lizhou, Liu Furong, Wang Youwei. Detecting Rumor Based on Graph Convolution Network and Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2024, 8(4): 125-136.
[3] Liu Chengshan, Li Puguo, Wang Zhen. A Researcher Recommendation Model for Research Teams[J]. Data Analysis and Knowledge Discovery, 2024, 8(3): 132-142.
[4] Ni Liang, Wu Peng, Zhou Xueqing. Topic Detecting on Multimodal News Data Based on Deep Learning[J]. Data Analysis and Knowledge Discovery, 2024, 8(3): 85-97.
[5] Quan Ankun, Li Honglian, Zhang Le, Lyu Xueqiang. Generating Chinese Abstracts with Content and Image Features[J]. Data Analysis and Knowledge Discovery, 2024, 8(3): 110-119.
[6] Li Xuelian, Wang Bi, Li Lixin, Han Dixuan. Sentiment Analysis with Abstract Meaning Representation and Dependency Grammar[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 55-68.
[7] Li Hui, Hu Yaohua, Xu Cunzhen. Personalized Recommendation Algorithm with Review Sentiments and Importance[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 69-79.
[8] He Li, Yang Meihua, Liu Luyao. Detecting Events with SPO Semantic and Syntactic Information[J]. Data Analysis and Knowledge Discovery, 2023, 7(9): 114-124.
[9] Han Pu, Gu Liang, Ye Dongyu, Chen Wenqi. Recognizing Chinese Medical Literature Entities Based on Multi-Task and Transfer Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(9): 136-145.
[10] Liu Yang, Ding Xingchen, Ma Lili, Wang Chunyang, Zhu Lifang. Usefulness Detection of Travel Reviews Based on Multi-dimensional Graph Convolutional Networks[J]. Data Analysis and Knowledge Discovery, 2023, 7(8): 95-104.
[11] Yan Shangyi, Wang Jingya, Liu Xiaowen, Cui Yumeng, Tao Zhizhong, Zhang Xiaofan. Microblog Sentiment Analysis with Multi-Head Self-Attention Pooling and Multi-Granularity Feature Interaction Fusion[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 32-45.
[12] Zhang Yu, Zhang Haijun, Liu Yaqing, Liang Kejin, Wang Yueyang. Multimodal Sentiment Analysis Based on Bidirectional Mask Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 46-55.
[13] Han Pu, Zhong Yule, Lu Haojie, Ma Shiwen. Identifying Named Entities of Adverse Drug Reaction with Adversarial Transfer Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 131-141.
[14] Zhao Chaoyang, Zhu Guibo, Wang Jinqiao. The Inspiration Brought by ChatGPT to LLM and the New Development Ideas of Multi-modal Large Model[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 26-35.
[15] Li Haojun, Lv Yun, Wang Xuhui, Huang Jieya. A Deep Recommendation Model with Multi-Layer Interaction and Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 43-57.