Multi-task & Multi-modal Sentiment Analysis Model Based on Aware Fusion
Wu Sisi, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract [Objective] This paper develops a multi-task and multi-modal sentiment analysis model based on aware fusion, aiming to make full use of contextual information and to address the modality-invariant and modality-specific representation problem. [Methods] We established four sentiment analysis tasks: multi-modal, text, acoustic, and visual. Their features were extracted with the BERT, wav2vec 2.0, and OpenFace 2.0 models, processed by a self-attention layer, and passed to the aware fusion layer for multi-modal feature fusion. Finally, the single-modal and multi-modal representations were classified with Softmax. We also introduced a homoscedastic-uncertainty loss function to assign weights to the different tasks automatically. [Results] Compared with the baseline methods, the proposed model improved accuracy and F1 score by 1.59% and 1.67% on CH-SIMS, and by 0.55% and 0.67% on CMU-MOSI. Ablation experiments showed that multi-task learning achieved accuracy and F1 scores 4.08% and 4.18% higher than single-task learning. [Limitations] The model's performance on large-scale datasets remains to be examined. [Conclusions] The model can effectively reduce noise and improve multi-modal fusion, and the multi-task learning framework achieves better performance.
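The automatic task weighting mentioned in the abstract follows the homoscedastic-uncertainty scheme of Kendall et al. (reference [21]). The sketch below is a minimal illustration of how such a combined loss can be computed, assuming one learnable log-variance scalar per task; the function name and the exact parameterization are illustrative assumptions, not the authors' published formulation:

```python
import math

def homoscedastic_multitask_loss(task_losses, log_vars):
    """Weigh per-task losses by learned homoscedastic uncertainty
    (Kendall et al., 2018): total = sum_i exp(-s_i) * L_i + s_i,
    where s_i = log(sigma_i ** 2) is a trainable scalar for task i.
    A large s_i down-weights a noisy task's loss, while the + s_i
    term penalizes the model for claiming too much uncertainty."""
    assert len(task_losses) == len(log_vars)
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_vars))

# With all s_i = 0 (sigma_i = 1), each task simply gets weight 1:
total = homoscedastic_multitask_loss([1.0, 2.0, 0.5, 0.5],
                                     [0.0, 0.0, 0.0, 0.0])
```

In training, the `log_vars` would be registered as trainable parameters and optimized jointly with the network weights, so the relative weights of the four tasks (multi-modal, text, acoustic, visual) adapt automatically.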
Received: 26 September 2022
Published: 21 March 2023
Fund: National Natural Science Foundation of China (72174086); Nanjing University of Aeronautics and Astronautics Graduate Research and Practice Innovation Project (xcxjh20220910)
Corresponding Author:
Ma Jing, ORCID: 0000-0001-8472-2518, E-mail: majing5525@126.com.
[1] Cambria E, Hazarika D, Poria S, et al. Benchmarking Multimodal Sentiment Analysis[C]// Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing. Cham: Springer, 2017: 166-179.
[2] Bahdanau D, Cho K H, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[C]// Proceedings of the 3rd International Conference on Learning Representations. 2015.
[3] Zadeh A, Chen M, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1707.07250.
[4] Gu Y, Yang K N, Fu S Y, et al. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2225-2235.
[5] Wang Y S, Shen Y, Liu Z, et al. Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 7216-7223.
[6] Pham H, Liang P P, Manzini T, et al. Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 6892-6899.
[7] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[8] Pan Jiahui, He Zhipeng, Li Zina, et al. A Review of Multimodal Emotion Recognition[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 633-645.
[9] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2247-2256.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[11] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence Conference, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence. 2018: 5634-5641.
[12] Tsai Y H H, Bai S J, Liang P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[13] Sahay S, Okur E, Kumar S H, et al. Low Rank Fusion Based Transformers for Multimodal Sequences[C]// Proceedings of the 2nd Grand-Challenge and Workshop on Multimodal Language (Challenge-HML). 2020: 29-34.
[14] Han W, Chen H, Poria S. Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021: 9180-9192.
[15] Li Z, Xu B, Zhu C H, et al. CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection[OL]. arXiv Preprint, arXiv: 2204.05515.
[16] Akhtar M S, Chauhan D S, Ghosal D, et al. Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1905.05812.
[17] Yu W M, Xu H, Meng F Y, et al. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3718-3727.
[18] Chauhan D S, Dhanush S R, Ekbal A, et al. Sentiment and Emotion Help Sarcasm? A Multi-task Learning Framework for Multi-modal Sarcasm, Sentiment and Emotion Analysis[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 4351-4360.
[19] Yu W M, Xu H, Yuan Z Q, et al. Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2021: 10790-10797.
[20] Yang B S, Li J, Wong D F, et al. Context-Aware Self-Attention Networks[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019: 387-394.
[21] Kendall A, Gal Y, Cipolla R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7482-7491.
[22] Cui Y M, Che W X, Liu T, et al. Pre-training with Whole Word Masking for Chinese BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504-3514. DOI: 10.1109/TASLP.2021.3124365.
[23] Baevski A, Zhou H, Mohamed A, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 12449-12460.
[24] Baltrusaitis T, Zadeh A, Lim Y C, et al. OpenFace 2.0: Facial Behavior Analysis Toolkit[C]// Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. 2018: 59-66.
[25] Lu J S, Batra D, Parikh D, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks[OL]. arXiv Preprint, arXiv: 1908.02265.
[26] Zadeh A, Zellers R, Pincus E, et al. MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos[OL]. arXiv Preprint, arXiv: 1606.06259.