[Objective] This paper designs an SC-Attention fusion mechanism to address the low prediction accuracy and the difficulty of fusing multimodal features in existing multimodal sarcasm detection models. [Methods] First, we used the CLIP and RoBERTa models to extract features from images, image attributes, and texts. Second, we combined the Co-Attention mechanism with SENet's attention mechanism to establish the SC-Attention mechanism and fuse the multimodal features. Third, we re-allocated the attention feature weights according to the original modalities. Finally, we fed the fused features into fully connected layers to detect sarcasm. [Results] The accuracy and F1 score of the proposed model reached 93.71% and 91.68%, which were 10.27 and 11.5 percentage points higher than those of the existing models. [Limitations] The model needs to be examined on more data sets. [Conclusions] The proposed model reduces information redundancy and feature loss, effectively improving the accuracy of multimodal sarcasm detection.
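The fusion pipeline described above can be sketched roughly in PyTorch. This is a minimal illustration under assumptions (the head count, reduction ratio, class names, and the exact point at which the excitation weights are applied are all hypothetical), not the authors' implementation:

```python
import torch
import torch.nn as nn


class SCAttentionFusion(nn.Module):
    """Hypothetical sketch of an SC-Attention-style fusion block:
    co-attention across two modalities followed by an SENet-style
    squeeze-and-excitation re-weighting and a fully connected classifier."""

    def __init__(self, dim=768, reduction=16):
        super().__init__()
        # Co-attention between modalities (8 heads is an assumption)
        self.co_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # SENet-style squeeze-and-excitation over the feature channels
        self.se = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(dim, 2)  # sarcastic / non-sarcastic

    def forward(self, text_feats, image_feats):
        # Co-attention: text tokens attend over image features
        fused, _ = self.co_attn(text_feats, image_feats, image_feats)
        # Squeeze: global average pooling over the sequence dimension
        squeezed = fused.mean(dim=1)
        # Excitation: per-channel weights re-scale the pooled fused features
        weights = self.se(squeezed)
        reweighted = squeezed * weights
        return self.classifier(reweighted)
```

A usage example: with 768-dimensional text and image feature sequences, the block returns two-class logits per sample.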
Fig.3 An example with the image-attribute modality added (image attributes: blue, sky, white, cloud, many; text: "What bad weather!")
Fig.4 Structure of the CLIP model's text encoder
Fig.5 Structure of the SC-Attention mechanism
Fig.6 Structure of the Parallel-Attention mechanism
Fig.7 Structure of SENet's attention mechanism
Class         | Training set | Test set
Non-sarcastic | 8,642        | 1,918
Sarcastic     | 11,174       | 2,901
Total         | 19,816       | 4,819
Table 1 Dataset annotation statistics
Parameter                | Value
Word vector dimension    | 768
Image vector dimension   | 768
Dropout                  | 0.15
Learning rate            | 0.0001
Batch size               | 32
Optimizer                | Adam
Loss function            | Cross-entropy loss
Table 2 Experimental parameter settings
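The settings in Table 2 can be wired up in PyTorch roughly as follows; the two-layer classifier here is a placeholder standing in for the full model, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Placeholder model with Dropout 0.15 on 768-dim input features (Table 2)
model = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Dropout(0.15),
    nn.Linear(256, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate 0.0001
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss

# One training step on a batch of 32 (Table 2's batch size)
batch = torch.randn(32, 768)
labels = torch.randint(0, 2, (32,))
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()
```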
Model   | ViT | RN50 | RoBERTa | BERT | CLIP text encoder | Co-Attention | SC
VRC-FC  |  √  |      |    √    |      |         √         |      √       |
VRC-CF  |  √  |      |    √    |      |         √         |              |
VRC-TFN |  √  |      |    √    |      |         √         |              |
VRC-SC  |  √  |      |    √    |      |         √         |      √       |  √
RRC-FC  |     |  √   |    √    |      |         √         |      √       |
RRC-CF  |     |  √   |    √    |      |         √         |              |
RRC-TFN |     |  √   |    √    |      |         √         |              |
RRC-SC  |     |  √   |    √    |      |         √         |      √       |  √
VBC-SC  |  √  |      |         |  √   |         √         |      √       |  √
Table 3 Comparison experiment settings
Model              | Acc/% | P/%   | R/%   | F1/%
Text(BERT)         | 80.40 | 75.67 | 77.80 | 76.72
Text(RoBERTa)      | 90.11 | 87.16 | 89.67 | 88.40
Image(ViT)         | 61.85 | 54.06 | 54.01 | 54.03
Image(ResNet50×16) | 59.56 | 52.62 | 50.32 | 51.44
Attribute          | 78.87 | 74.69 | 74.26 | 74.48
Concat(I+T)        | 91.93 | 89.02 | 90.11 | 89.56
Concat(I+A)        | 78.91 | 74.02 | 75.81 | 74.90
Concat(A+T)        | 91.23 | 88.76 | 89.34 | 89.05
VRC-SC(I+T+A)      | 93.71 | 90.28 | 93.13 | 91.68
Table 4 Results of the modality ablation experiments
Model                     | Acc/% | P/%   | R/%   | F1/%
VRC-FC                    | 91.33 | 89.17 | 92.04 | 90.23
VRC-CF                    | 90.14 | 87.43 | 88.72 | 88.07
VRC-TFN                   | 91.98 | 89.45 | 91.68 | 90.55
VRC-SC                    | 93.71 | 90.28 | 93.13 | 91.68
RRC-FC                    | 91.11 | 88.15 | 91.64 | 89.48
RRC-CF                    | 89.75 | 87.23 | 86.89 | 87.06
RRC-TFN                   | 91.33 | 88.14 | 90.07 | 89.09
RRC-SC                    | 92.42 | 89.48 | 92.66 | 90.56
Hierarchical fusion model | 83.44 | 76.57 | 84.15 | 80.18
Table 5 Experimental results of different feature fusion mechanisms
Model  | Acc/% | P/%   | R/%   | F1/%
VBC-SC | 83.83 | 77.45 | 83.46 | 80.34
VRC-SC | 93.71 | 90.28 | 93.13 | 91.68
RRC-SC | 92.42 | 89.48 | 92.66 | 90.56
Table 6 Experimental results of different feature extraction models
Fig.8 Accuracy comparison under different Dropout values
Training scheme | Optimizer | LR decay strategy                 | Epochs | Acc/%
Scheme 1        | SGD       | Manual adjustment in later stages | 129    | 93.42
Scheme 2        | Adam      | Cosine annealing                  | 112    | 93.71
Scheme 3        | RMSprop   | Adaptive LR adjustment            | 139    | 93.35
Table 7 Experimental results of the three optimizers and learning-rate decay strategies
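Scheme 2 (Adam with cosine annealing, the best-performing configuration) can be reproduced in outline with PyTorch's built-in scheduler; the single parameter tensor below is a stand-in for a real model:

```python
import torch

# Stand-in parameters; a real run would pass model.parameters()
params = [torch.nn.Parameter(torch.zeros(10))]
optimizer = torch.optim.Adam(params, lr=1e-4)  # base LR from Table 2
# Cosine annealing decays the LR from 1e-4 toward 0 over 112 epochs (Table 7)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=112)

lrs = []
for epoch in range(112):
    optimizer.step()   # one (dummy) optimization step per epoch
    scheduler.step()   # decay the learning rate along the cosine curve
    lrs.append(optimizer.param_groups[0]["lr"])
```

By the final epoch the learning rate has annealed to essentially zero, which matches the smooth late-stage decay the cosine schedule is chosen for.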
Fig.9 An example of a sarcastic tweet