[Objective] This paper aims to enhance the quality of multi-modal feature extraction and improve the accuracy of netizen sentiment recognition for multi-modal public opinion. [Methods] First, we extracted text-modality features with RoBERTa and enhanced them with a knowledge phrase representation dictionary. Then, we proposed a Res-ViT model for the image modality, combining ResNet and Vision Transformer. Finally, we fused the multi-modal features with Transformer encoders and fed the fused representations to a fully connected layer for sentiment recognition. [Results] Evaluated on the MVSA-Multiple dataset, our model achieved an accuracy of 71.66% and an F1 score of 69.42% for sentiment recognition, improvements of 2.22% and 0.59% over the best scores of the baseline methods. [Limitations] Further experiments on other datasets are needed to verify the model's generalizability and robustness. [Conclusions] The proposed model can extract and fuse multi-modal features more effectively and improves the accuracy of sentiment recognition.
Yang Ruyun, Ma Jing. A Feature-Enhanced Multi-modal Emotion Recognition Model Integrating Knowledge and Res-ViT. Data Analysis and Knowledge Discovery, 2023, 7(11): 14-25.
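To make the pipeline described in the Methods section concrete, the following is a minimal sketch in PyTorch with Hugging Face Transformers, assuming the `roberta-base` and `google/vit-base-patch16-224` checkpoints. The module names (`ResViTEncoder`, `MultimodalSentimentModel`), hidden sizes, number of fusion layers, and the way the ResNet and ViT features are combined are illustrative assumptions, not the authors' implementation; the knowledge phrase representation dictionary enhancement is omitted.

```python
# Illustrative sketch only: layer sizes, checkpoints, and fusion details are
# assumptions inferred from the abstract, not the paper's released code.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import RobertaModel, ViTModel


class ResViTEncoder(nn.Module):
    """Image branch: ResNet features concatenated with ViT features."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        cnn = resnet50(weights=None)
        # Drop the final FC layer; keep the globally pooled 2048-d feature.
        self.resnet = nn.Sequential(*list(cnn.children())[:-1])
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224")
        self.proj = nn.Linear(2048 + self.vit.config.hidden_size, d_model)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        res_feat = self.resnet(pixel_values).flatten(1)                # (B, 2048)
        vit_feat = self.vit(pixel_values=pixel_values).pooler_output   # (B, 768)
        return self.proj(torch.cat([res_feat, vit_feat], dim=-1))      # (B, d_model)


class MultimodalSentimentModel(nn.Module):
    """Text (RoBERTa) + image (Res-ViT) features fused by a Transformer encoder."""

    def __init__(self, num_classes: int = 3, d_model: int = 768):
        # num_classes=3 assumes positive/neutral/negative labels as in MVSA.
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.image_encoder = ResViTEncoder(d_model)
        fusion_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                                  batch_first=True)
        self.fusion = nn.TransformerEncoder(fusion_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]                      # sentence-level token, (B, 768)
        img_feat = self.image_encoder(pixel_values)    # (B, 768)
        # Treat the two modality vectors as a length-2 sequence so self-attention
        # can mix them before classification.
        fused = self.fusion(torch.stack([text_feat, img_feat], dim=1))
        return self.classifier(fused.mean(dim=1))      # (B, num_classes)
```

In this sketch the two modality vectors form a two-token sequence for the fusion encoder; the paper's actual fusion may instead attend over full token and patch sequences, and its knowledge-enhanced text features would replace the plain RoBERTa output used here.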