Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (10): 74-84     https://doi.org/10.11925/infotech.2096-3467.2022.1019
Research Paper
Multi-task & Multi-modal Sentiment Analysis Model Based on Aware Fusion
Wu Sisi, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract

[Objective] This paper develops a multi-task, multi-modal sentiment analysis model based on aware fusion, aiming to make fuller use of context information and to address modality-invariant and modality-specific representation issues. [Methods] We established four tasks: multi-modal, text, acoustic, and image sentiment analysis. We extracted text, acoustic, and image features with the BERT, wav2vec 2.0, and OpenFace 2.0 models, processed them with a self-attention layer, and passed them to the aware fusion layer for multi-modal feature fusion. Finally, we classified the single-modal and multi-modal information with Softmax. We also introduced a homoscedastic-uncertainty loss function to assign weights to the different tasks automatically. [Results] Compared with the baseline models, the proposed model improved accuracy and F1 score by 1.59 and 1.67 percentage points on CH-SIMS, and by 0.55 and 0.67 percentage points on CMU-MOSI. The ablation experiments showed that multi-task learning raised accuracy and F1 score by 4.08 and 4.18 percentage points over single-task learning. [Limitations] The model's performance on large-scale datasets has not yet been examined. [Conclusions] The proposed model effectively reduces noise and improves multi-modal fusion, and the multi-task learning framework further improves learning performance.
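The automatic task weighting described above follows the homoscedastic-uncertainty formulation of Kendall et al. [21]. The sketch below is a minimal PyTorch illustration of that loss, assuming four cross-entropy task heads (multi-modal, text, acoustic, image); the class count, batch size, and variable names are illustrative and not taken from the paper.

import torch
import torch.nn as nn


class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses with learnable homoscedastic-uncertainty weights."""

    def __init__(self, num_tasks=4):
        super().__init__()
        # One learnable log-variance per task, optimized jointly with the model.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])            # 1 / sigma_i^2
            total = total + precision * loss + 0.5 * self.log_vars[i]
        return total


# Usage sketch: four cross-entropy tasks (multi-modal, text, acoustic, image) with
# random tensors standing in for the Softmax heads' logits and the labels.
batch_size, num_classes = 16, 3
criterion = nn.CrossEntropyLoss()
weighting = UncertaintyWeightedLoss(num_tasks=4)
logits = [torch.randn(batch_size, num_classes, requires_grad=True) for _ in range(4)]
labels = [torch.randint(0, num_classes, (batch_size,)) for _ in range(4)]
total_loss = weighting([criterion(lg, lb) for lg, lb in zip(logits, labels)])
total_loss.backward()

Because each log-variance is learned with the network, tasks whose losses are noisier receive smaller effective weights, which is the automatic weight assignment the abstract refers to.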

Key words: Multi-modal; Sentiment Analysis; Multi-task; Aware Fusion
Received: 2022-09-26      Published: 2023-03-21
CLC Number: TP391; G350
Funding: National Natural Science Foundation of China (72174086); Postgraduate Research and Practice Innovation Program of Nanjing University of Aeronautics and Astronautics (xcxjh20220910)
Corresponding author: Ma Jing, ORCID: 0000-0001-8472-2518, E-mail: majing5525@126.com.
Cite this article:
Wu Sisi, Ma Jing. Multi-task & Multi-modal Sentiment Analysis Model Based on Aware Fusion. Data Analysis and Knowledge Discovery, 2023, 7(10): 74-84.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1019      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I10/74
Fig.1  Dataset example
Fig.2  Framework of the multi-task & multi-modal sentiment analysis model based on aware fusion
Fig.3  Multi-modal aware fusion
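Fig.2 and Fig.3 depict the pipeline summarized in the abstract: per-modality features pass through a self-attention layer, the aware fusion layer combines them, and Softmax heads score each task. The internals of the aware fusion layer are not reproduced on this page, so the module below is only a schematic sketch under assumed dimensions (text and acoustic features of size 768, image features of size 709, as in Table 2); the stack-and-attend fusion and all names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class MultiModalFusionSketch(nn.Module):
    def __init__(self, d_text=768, d_audio=768, d_image=709, d_model=128, num_classes=3):
        super().__init__()
        # Project each modality's extracted features (BERT / wav2vec 2.0 / OpenFace 2.0
        # outputs in the paper) into a shared space.
        self.proj = nn.ModuleDict({
            "t": nn.Linear(d_text, d_model),
            "a": nn.Linear(d_audio, d_model),
            "i": nn.Linear(d_image, d_model),
        })
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # One classification head per task: multi-modal plus the three single-modal tasks.
        self.heads = nn.ModuleDict({k: nn.Linear(d_model, num_classes) for k in ["m", "t", "a", "i"]})

    def forward(self, text, audio, image):
        # Each input: (batch, seq_len, feature_dim) from the respective extractor.
        feats = {"t": self.proj["t"](text), "a": self.proj["a"](audio), "i": self.proj["i"](image)}
        pooled = {}
        for k, x in feats.items():
            attn_out, _ = self.self_attn(x, x, x)     # per-modality self-attention
            pooled[k] = attn_out.mean(dim=1)          # simple mean pooling
        # Placeholder fusion: attention over the stacked modality summaries.
        stacked = torch.stack([pooled["t"], pooled["a"], pooled["i"]], dim=1)
        fused, _ = self.self_attn(stacked, stacked, stacked)
        fused = fused.mean(dim=1)
        logits = {"m": self.heads["m"](fused)}
        logits.update({k: self.heads[k](pooled[k]) for k in ["t", "a", "i"]})
        return logits  # Softmax is applied inside the cross-entropy loss during training


# Shape check with dummy inputs of different sequence lengths per modality.
model = MultiModalFusionSketch()
outputs = model(torch.randn(2, 20, 768), torch.randn(2, 40, 768), torch.randn(2, 30, 709))
print({k: v.shape for k, v in outputs.items()})       # each head: (2, num_classes)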
Dataset split    CH-SIMS    CMU-MOSI
Training set    1 368    1 284
Validation set    456    229
Test set    457    686
Total    2 281    2 199
Table 1  Dataset statistics
Parameter    Value
Word vector dimension    768
Acoustic vector dimension    768
Image vector dimension    709
Learning rate    0.001
Dropout    0.2
Activation function    ReLU
Batch size    16
Optimizer    Adam
Table 2  Experimental parameter settings
Model    CH-SIMS Acc-3/%    CH-SIMS F1/%    CMU-MOSI Acc-2/%    CMU-MOSI F1/%
EF-LSTM 56.35 56.89 75.72 75.45
LF-LSTM 58.13 58.51 76.92 77.03
TFN 64.84 65.14 80.36 80.51
LMF 65.41 65.49 81.97 82.20
MulT 66.28 66.72 83.59 83.84
Self-MM 67.94 68.28 85.98 85.95
MMAF 69.53 69.95 86.53 86.62
Table 3  Performance comparison of models
Modality combination    Acc-3/%    F1/%
T 65.19 65.60
A 62.87 63.04
I 61.37 61.53
T+A 67.54 67.91
T+I 66.05 66.49
A+I 62.91 63.05
T+A+I 69.53 69.95
Table 4  Modality ablation results on CH-SIMS
Task    Acc-3/%    F1/%
M 69.53 69.95
M+T 70.42 70.83
M+A 69.90 70.36
M+I 69.64 70.02
M+T+A 72.16 73.27
M+T+I 71.21 71.29
M+A+I 70.70 70.99
M+T+A+I 73.61 74.13
Table 5  Task ablation results on CH-SIMS
[1] Cambria E, Hazarika D, Poria S, et al. Benchmarking Multimodal Sentiment Analysis[C]// Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing. Springer, Cham, 2017: 166-179.
[2] Bahdanau D, Cho K H, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[C]// Proceedings of the 3rd International Conference on Learning Representations. 2015.
[3] Zadeh A, Chen M, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1707.07250.
[4] Gu Y, Yang K N, Fu S Y, et al. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2225-2235.
[5] Wang Y S, Shen Y, Liu Z, et al. Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 7216-7223.
[6] Pham H, Liang P P, Manzini T, et al. Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and the 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. 2019: 6892-6899.
[7] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[8] Pan Jiahui, He Zhipeng, Li Zina, et al. A Review of Multimodal Emotion Recognition[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 633-645.
[9] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2247-2256.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[11] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence. 2018: 5634-5641.
[12] Tsai Y H H, Bai S J, Liang P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[13] Sahay S, Okur E, Kumar S H, et al. Low Rank Fusion Based Transformers for Multimodal Sequences[C]// Proceedings of the 2nd Grand-Challenge and Workshop on Multimodal Language (Challenge-HML). 2020: 29-34.
[14] Han W, Chen H, Poria S. Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021: 9180-9192.
[15] Li Z, Xu B, Zhu C H, et al. CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection[OL]. arXiv Preprint, arXiv: 2204.05515.
[16] Akhtar M S, Chauhan D S, Ghosal D, et al. Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis[OL]. arXiv Preprint, arXiv: 1905.05812.
[17] Yu W M, Xu H, Meng F Y, et al. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3718-3727.
[18] Chauhan D S, Dhanush S R, Ekbal A, et al. Sentiment and Emotion Help Sarcasm? A Multi-task Learning Framework for Multi-modal Sarcasm, Sentiment and Emotion Analysis[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 4351-4360.
[19] Yu W M, Xu H, Yuan Z Q, et al. Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2021: 10790-10797.
[20] Yang B S, Li J, Wong D F, et al. Context-Aware Self-Attention Networks[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2019: 387-394.
[21] Kendall A, Gal Y, Cipolla R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7482-7491.
[22] Cui Y M, Che W X, Liu T, et al. Pre-training with Whole Word Masking for Chinese BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504-3514. DOI: 10.1109/TASLP.2021.3124365.
[23] Baevski A, Zhou H, Mohamed A, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 12449-12460.
[24] Baltrusaitis T, Zadeh A, Lim Y C, et al. OpenFace 2.0: Facial Behavior Analysis Toolkit[C]// Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. 2018: 59-66.
[25] Lu J S, Batra D, Parikh D, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks[OL]. arXiv Preprint, arXiv: 1908.02265.
[26] Zadeh A, Zellers R, Pincus E, et al. MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos[OL]. arXiv Preprint, arXiv: 1606.06259.