Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (4): 46-55    DOI: 10.11925/infotech.2096-3467.2022.0151
Multimodal Sentiment Analysis Based on Bidirectional Mask Attention Mechanism
Zhang Yu, Zhang Haijun, Liu Yaqing, Liang Kejin, Wang Yueyang
School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China
Abstract  

[Objective] This paper proposes a multimodal sentiment analysis model based on a bidirectional mask attention mechanism (BMAM) to exploit multimodal information and achieve more effective inter-modal interaction. [Methods] First, we model the text and speech modalities simultaneously. The mask attention dynamically adjusts each modality's attention weights by introducing information from the other modality, yielding more accurate modality representations that preserve each modality's inherent uniqueness while reducing its differences from the other modality, thereby helping the model determine the correct sentiment. [Results] The model was evaluated on IEMOCAP, a widely used multimodal sentiment analysis dataset, where it achieved a weighted accuracy of 74.1%, a significant improvement over existing mainstream methods. [Limitations] The model recognizes the Neutral and Anger categories, which account for a larger proportion of the dataset, better than it recognizes the Happy and Sad categories, which account for a smaller proportion. [Conclusions] The proposed BMAM model effectively uses inter-modal interaction to adjust the attention weights over each modality's emotional elements and determine sentiment accurately.
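The [Methods] description suggests the core operation: in each direction, one modality's self-attention weights are rescaled by a soft mask derived from the other modality and then renormalized. The PyTorch sketch below illustrates one direction (speech adjusting text). It is our reading of the mechanism, not the authors' code; the module name `MaskAttention`, the extra key projection `k_other`, and the mean-pooled sigmoid mask are all assumptions.

```python
# Illustrative sketch (assumed design, not the authors' released code) of one
# direction of bidirectional mask attention: speech features produce a soft
# mask that rescales the text modality's self-attention weights.
import torch
import torch.nn as nn

class MaskAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)        # queries from text
        self.k = nn.Linear(dim, dim)        # keys from text
        self.v = nn.Linear(dim, dim)        # values from text
        self.k_other = nn.Linear(dim, dim)  # keys from speech (hypothetical head)

    def forward(self, text: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # text: (B, T, dim), speech: (B, S, dim)
        d = text.size(-1)
        q, k, v = self.q(text), self.k(text), self.v(text)
        # Plain text self-attention weights, shape (B, T, T).
        attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        # Soft mask: relevance of each text position to the speech signal,
        # pooled over the S speech frames -> (B, T), values in (0, 1).
        cross = torch.sigmoid(q @ self.k_other(speech).transpose(-2, -1) / d ** 0.5)
        mask = cross.mean(dim=-1)
        # Rescale attention toward text tokens the speech deems relevant,
        # then renormalize each row so the weights still sum to 1.
        attn = attn * mask.unsqueeze(1)
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return attn @ v  # refined text representation, (B, T, dim)
```

The text-to-speech direction is symmetric; running both directions and fusing the two refined representations gives the bidirectional interaction the abstract describes.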

Key words: Multimodality; Sentiment Analysis; Inter-Modal Interaction; Bidirectional Mask Attention
Received: 25 February 2022      Published: 07 June 2023
CLC Number: TP393
Fund: National Natural Science Foundation of China (U1703261); Tianshan Cedars Plan of Xinjiang Uygur Autonomous Region (2019XS08)
Corresponding Author: Zhang Haijun, ORCID: 0000-0002-6823-7077, E-mail: zhjlp@163.com

Cite this article:

Zhang Yu, Zhang Haijun, Liu Yaqing, Liang Kejin, Wang Yueyang. Multimodal Sentiment Analysis Based on Bidirectional Mask Attention Mechanism. Data Analysis and Knowledge Discovery, 2023, 7(4): 46-55.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0151 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I4/46

Figure: Framework of BMAM
Figure: Masked Attention Structure for the Speech-to-Text Direction
Dataset        Session     Neutral   Sad   Happy   Anger   Total   Proportion
Training set   Session 1   384       184   278     22      4,290   77.7%
               Session 2   362       197   237     137
               Session 3   320       305   286     240
               Session 4   258       143   303     327
Test set       Session 5   384       245   442     170     1,241   22.3%
Distribution of the IEMOCAP Dataset
Model        WA/%   UA/%
M_CBLA_BLR   70.4   71.2
MDRE         71.8   -
Attn+Align   72.5   70.9
CMA+Raw      72.8   -
LFMA         72.9   71.9
IEmoNET      73.5   71.0
STAER        71.1   72.0
BMAM         74.1   72.8
Experimental Results of Different Algorithm Models on IEMOCAP
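The table reports weighted accuracy (WA) and unweighted accuracy (UA). These metrics are not defined in this excerpt; on IEMOCAP, WA conventionally means plain (class-frequency weighted) accuracy and UA the unweighted mean of per-class recalls. A minimal sketch under that assumption:

```python
# Assumption: WA = overall accuracy (implicitly weighted by class frequency),
# UA = macro-average of per-class recalls, as commonly reported on IEMOCAP.
import numpy as np

def wa_ua(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    wa = float((y_true == y_pred).mean())  # plain accuracy
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return wa, float(np.mean(recalls))     # UA: mean per-class recall

# Toy labels (hypothetical): class 0 dominates, so WA and UA diverge.
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
print(wa_ua(y_true, y_pred))  # (0.8333..., 0.8888...)
```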
Emotion Category   Precision/%   Recall/%   F1/%
Neutral            74.0          74.9       74.4
Happy              72.1          65.2       68.5
Sad                67.4          71.0       69.2
Anger              81.5          80.1       80.8
Emotion Recognition Performance by Category
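As a consistency check, the reported F1 values match the standard harmonic mean of precision and recall (an assumption about how the table was computed):

```python
# Reproduce the table's F1 column from its precision and recall columns,
# assuming F1 = 2PR / (P + R).
scores = {  # category: (precision %, recall %)
    "Neutral": (74.0, 74.9),
    "Happy":   (72.1, 65.2),
    "Sad":     (67.4, 71.0),
    "Anger":   (81.5, 80.1),
}
for label, (p, r) in scores.items():
    f1 = 2 * p * r / (p + r)
    print(f"{label}: F1 = {f1:.1f}")  # 74.4, 68.5, 69.2, 80.8, as reported
```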
Figure: Confusion Matrix of Multimodal Recognition Results
Figure: Text Modal Self-Attention Matrix γ_t
Figure: Text Modal Self-Attention Matrix γ̂_t After Interaction
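The two figures above compare the text self-attention weights before (γ_t) and after (γ̂_t) cross-modal interaction; the heatmaps themselves are not recoverable from this extraction. Purely as a rendering sketch, with random row-stochastic matrices standing in for the real weights, such a side-by-side comparison could be drawn as follows:

```python
# Rendering sketch only: random matrices stand in for the paper's actual
# attention weights gamma_t and gamma_hat_t.
import numpy as np
import matplotlib.pyplot as plt

T = 12  # number of text tokens (hypothetical)
rng = np.random.default_rng(0)
gamma = rng.random((T, T))
gamma /= gamma.sum(axis=1, keepdims=True)      # rows sum to 1, like softmax
gamma_hat = rng.random((T, T))
gamma_hat /= gamma_hat.sum(axis=1, keepdims=True)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, mat, title in zip(axes, (gamma, gamma_hat),
                          ("γ_t", "γ̂_t after interaction")):
    im = ax.imshow(mat, cmap="viridis")
    ax.set_title(title)
    fig.colorbar(im, ax=ax, fraction=0.046)
plt.tight_layout()
plt.show()
```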