Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (10): 74-84    DOI: 10.11925/infotech.2096-3467.2022.1019
Multi-task & Multi-modal Sentiment Analysis Model Based on Aware Fusion
Wu Sisi, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This paper develops a multi-task, multi-modal sentiment analysis model based on aware fusion (MMAF), aiming to make full use of contextual information and to capture both modality-invariant and modality-specific characteristics. [Methods] We set up four sentiment analysis tasks: multi-modal, text, acoustic, and image. Features were extracted with BERT, wav2vec 2.0, and OpenFace 2.0, processed by a self-attention layer, and passed to the aware fusion layer for multi-modal feature fusion. Finally, the single-modal and multi-modal representations were classified with Softmax. A loss function based on homoscedastic uncertainty was introduced to assign weights to the different tasks automatically. [Results] Compared with the best-performing baseline, the proposed model improved accuracy and F1 score by 1.59% and 1.67% on CH-SIMS, and by 0.55% and 0.67% on CMU-MOSI. Ablation experiments showed that multi-task learning achieved accuracy and F1 scores 4.08% and 4.18% higher than single-task learning. [Limitations] The model's performance on large-scale datasets remains to be examined. [Conclusions] The model effectively reduces noise and improves multi-modal fusion, and the multi-task learning framework further improves performance.
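The homoscedastic-uncertainty weighting mentioned above follows Kendall et al. [21], where each task loss is scaled by a learned precision term plus a regularizer. The sketch below is a minimal PyTorch reading of that scheme, assuming four tasks (multi-modal, text, acoustic, image); the class name and the usage wiring are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combines per-task losses with learnable homoscedastic uncertainty (Kendall et al.)."""
    def __init__(self, num_tasks: int):
        super().__init__()
        # One learnable log-variance (log sigma_i^2) per task, trained jointly with the model.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])            # 1 / sigma_i^2
            total = total + precision * loss + self.log_vars[i]  # weighted loss + regularizer
        return total

# Hypothetical usage with four classification heads sharing one backbone:
# losses = [ce(logits_m, y_m), ce(logits_t, y_t), ce(logits_a, y_a), ce(logits_i, y_i)]
# total_loss = UncertaintyWeightedLoss(num_tasks=4)(losses); total_loss.backward()

Under this formulation, a task whose loss stays high drives its log-variance up and its effective weight down, which is how the task weights are assigned automatically.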

Key words: Multi-modal; Sentiment Analysis; Multi-task; Aware Fusion
Received: 26 September 2022      Published: 21 March 2023
CLC Number: TP391; G350
Fund: National Natural Science Foundation of China (72174086); Nanjing University of Aeronautics and Astronautics Graduate Research and Practice Innovation Project (xcxjh20220910)
Corresponding Authors: Ma Jing, ORCID: 0000-0001-8472-2518, E-mail: majing5525@126.com.

Cite this article:

Wu Sisi, Ma Jing. Multi-task & Multi-modal Sentiment Analysis Model Based on Aware Fusion. Data Analysis and Knowledge Discovery, 2023, 7(10): 74-84.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1019     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I10/74

Figure: Dataset Examples
Figure: Framework of MMAF
Figure: Multi-modal Aware Fusion
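The framework figure outlines per-modality encoding followed by aware fusion of the text, acoustic, and image representations. As a simplified illustration only, the PyTorch sketch below applies self-attention within each modality and then mixes the pooled modality vectors with softmax attention weights; module names, dimensions, and the fusion details are assumptions, and the paper's actual aware-fusion layer is not reproduced here.

import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Self-attention within one modality, then mean-pooling to a single vector."""
    def __init__(self, dim, heads=4, dropout=0.2):
        super().__init__()
        # In practice each modality would first be projected to a shared dim divisible by `heads`.
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (batch, seq_len, dim)
        out, _ = self.attn(x, x, x)               # intra-modality self-attention
        return self.norm(x + out).mean(dim=1)     # pooled (batch, dim) vector

class ModalityAttentionFusion(nn.Module):
    """Scores each modality vector and mixes them with softmax attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modal_vecs):                # list of (batch, dim) tensors
        stacked = torch.stack(modal_vecs, dim=1)              # (batch, n_modal, dim)
        weights = torch.softmax(self.score(stacked), dim=1)   # (batch, n_modal, 1)
        return (weights * stacked).sum(dim=1)                 # fused (batch, dim)

The fused vector and each pooled modality vector would then feed separate Softmax classifiers, giving the multi-modal and single-modal task outputs described in the abstract.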
Datasets Information
Dataset split     CH-SIMS    CMU-MOSI
Training set      1,368      1,284
Validation set    456        229
Test set          457        686
Total             2,281      2,199
Experimental Parameter Settings
Parameter                    Value
Word vector dimension        768
Acoustic vector dimension    768
Image vector dimension       709
Learning rate                0.001
Dropout                      0.2
Activation function          ReLU
Batch size                   16
Optimizer                    Adam
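For reference, a hedged sketch of how the hyper-parameters in the table above could be wired up in PyTorch; the hidden size, the three-class output, and the stand-in nn.Sequential model are assumptions rather than the authors' code.

import torch
import torch.nn as nn

TEXT_DIM, AUDIO_DIM, IMAGE_DIM = 768, 768, 709   # vector dimensions from the table
HIDDEN_DIM, NUM_CLASSES = 128, 3                 # assumed hidden size and 3-class output

model = nn.Sequential(                           # placeholder for the full MMAF network
    nn.Linear(TEXT_DIM + AUDIO_DIM + IMAGE_DIM, HIDDEN_DIM),
    nn.ReLU(),                                   # activation function from the table
    nn.Dropout(p=0.2),                           # dropout from the table
    nn.Linear(HIDDEN_DIM, NUM_CLASSES),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam, learning rate 0.001
# Training would then iterate over CH-SIMS / CMU-MOSI mini-batches of size 16.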
Algorithm Performance
Model       CH-SIMS               CMU-MOSI
            Acc-3/%     F1/%      Acc-2/%     F1/%
EF-LSTM     56.35       56.89     75.72       75.45
LF-LSTM     58.13       58.51     76.92       77.03
TFN         64.84       65.14     80.36       80.51
LMF         65.41       65.49     81.97       82.20
MulT        66.28       66.72     83.59       83.84
Self-MM     67.94       68.28     85.98       85.95
MMAF        69.53       69.95     86.53       86.62
Experimental Results of Modality Ablation on CH-SIMS
Modality combination    Acc-3/%    F1/%
T                       65.19      65.60
A                       62.87      63.04
I                       61.37      61.53
T+A                     67.54      67.91
T+I                     66.05      66.49
A+I                     62.91      63.05
T+A+I                   69.53      69.95
(T: text, A: acoustic, I: image)
Experimental Results of Task Ablation on CH-SIMS
Task combination    Acc-3/%    F1/%
M                   69.53      69.95
M+T                 70.42      70.83
M+A                 69.90      70.36
M+I                 69.64      70.02
M+T+A               72.16      73.27
M+T+I               71.21      71.29
M+A+I               70.70      70.99
M+T+A+I             73.61      74.13
(M: multi-modal task; T, A, I: text, acoustic, and image single-modal tasks)
[1] Cambria E, Hazarika D, Poria S, et al. Benchmarking Multimodal Sentiment Analysis[C]// Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing. Springer, Cham, 2017: 166-179.
[2] Bahdanau D, Cho K H, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[C]// Proceedings of the 3rd International Conference on Learning Representations. 2015.
[3] Zadeh A, Chen M, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1707.07250.
[4] Gu Y, Yang K N, Fu S Y, et al. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2225-2235.
[5] Wang Y S, Shen Y, Liu Z, et al. Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 7216-7223.
[6] Pham H, Liang P P, Manzini T, et al. Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and the 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 6892-6899.
[7] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[8] Pan Jiahui, He Zhipeng, Li Zina, et al. A Review of Multimodal Emotion Recognition[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 633-645.
[9] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2247-2256.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[11] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence. 2018: 5634-5641.
[12] Tsai Y H H, Bai S J, Liang P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[13] Sahay S, Okur E, Kumar S H, et al. Low Rank Fusion Based Transformers for Multimodal Sequences[C]// Proceedings of the 2nd Grand-Challenge and Workshop on Multimodal Language (Challenge-HML). 2020: 29-34.
[14] Han W, Chen H, Poria S. Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021: 9180-9192.
[15] Li Z, Xu B, Zhu C H, et al. CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection[OL]. arXiv Preprint, arXiv: 2204.05515.
[16] Akhtar M S, Chauhan D S, Ghosal D, et al. Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1905.05812.
[17] Yu W M, Xu H, Meng F Y, et al. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3718-3727.
[18] Chauhan D S, Dhanush S R, Ekbal A, et al. Sentiment and Emotion Help Sarcasm? A Multi-task Learning Framework for Multi-modal Sarcasm, Sentiment and Emotion Analysis[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 4351-4360.
[19] Yu W M, Xu H, Yuan Z Q, et al. Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2021: 10790-10797.
[20] Yang B S, Li J, Wong D F, et al. Context-Aware Self-Attention Networks[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019: 387-394.
[21] Kendall A, Gal Y, Cipolla R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7482-7491.
[22] Cui Y M, Che W X, Liu T, et al. Pre-training with Whole Word Masking for Chinese BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504-3514. DOI: 10.1109/TASLP.2021.3124365.
[23] Baevski A, Zhou H, Mohamed A, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 12449-12460.
[24] Baltrusaitis T, Zadeh A, Lim Y C, et al. OpenFace 2.0: Facial Behavior Analysis Toolkit[C]// Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. 2018: 59-66.
[25] Lu J S, Batra D, Parikh D, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks[OL]. arXiv Preprint, arXiv: 1908.02265.
[26] Zadeh A, Zellers R, Pincus E, et al. MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos[OL]. arXiv Preprint, arXiv: 1606.06259.