Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 9: 40-51    https://doi.org/10.11925/infotech.2096-3467.2021.1362
Research Article
Detecting Multimodal Sarcasm Based on SC-Attention Mechanism
Chen Yuanyuan, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract

[Objective] This paper designs an SC-Attention fusion mechanism to address the low prediction accuracy and the difficulty of fusing multimodal features in existing multimodal sarcasm detection models. [Methods] The CLIP and RoBERTa models are used to extract features for three modalities: the image, the image attributes, and the text. The SC-Attention mechanism, built by combining SENet's attention mechanism with the Co-Attention mechanism, then fuses the multimodal features, using the original modality features as guidance to allocate feature weights appropriately. Finally, the fused features are fed into a fully connected layer for sarcasm detection. [Results] Experiments show that multimodal sarcasm detection based on the SC-Attention mechanism achieves an accuracy of 93.71% and an F1 score of 91.68%, which are 10.27 and 11.50 percentage points higher than the baseline model, respectively. [Limitations] The generalizability of the model still needs to be verified on more datasets. [Conclusions] The SC-Attention mechanism reduces information redundancy and feature loss, and effectively improves the accuracy of multimodal sarcasm detection.
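To make the feature-extraction step in [Methods] concrete, the following Python sketch shows one way the three modality features could be obtained with Hugging Face implementations of CLIP and RoBERTa. It is a minimal illustration under stated assumptions, not the authors' code: the checkpoint names, the choice of pooled outputs, and the example inputs (taken from Fig. 3) are assumptions.

# Minimal feature-extraction sketch (assumed checkpoints: "openai/clip-vit-base-patch32"
# and "roberta-base"; the paper does not name the exact pretrained weights it uses).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, RobertaModel, RobertaTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
roberta = RobertaModel.from_pretrained("roberta-base")
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

image = Image.open("tweet_image.jpg")             # the tweet's attached image
attributes = "blue sky white cloud many"          # image-attribute words (Fig. 3)
text = "What bad weather!"                        # tweet text (Fig. 3)

with torch.no_grad():
    img_inputs = clip_proc(images=image, return_tensors="pt")
    image_f = clip.get_image_features(**img_inputs)            # image feature (ViT branch)
    attr_inputs = clip_proc(text=[attributes], return_tensors="pt", padding=True)
    attr_f = clip.get_text_features(input_ids=attr_inputs["input_ids"],
                                    attention_mask=attr_inputs["attention_mask"])
    txt_inputs = tokenizer(text, return_tensors="pt")
    text_f = roberta(**txt_inputs).pooler_output               # 768-d text feature
# image_f, attr_f and text_f are then passed to the SC-Attention fusion module;
# if their dimensions differ, a linear projection to a common size is needed.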

Keywords: Multimodal Sarcasm Detection; SC-Attention Mechanism; CLIP Model
Received: 2021-12-01      Published: 2022-10-26
CLC numbers: TP393; G250
Funding: National Natural Science Foundation of China (72174086); Fundamental Research Funds for the Central Universities, Prospective Development Strategy Research Project (NW2020001)
Corresponding author: Ma Jing, ORCID: 0000-0001-8472-2518, E-mail: majing5525@126.com
Cite this article:
Chen Yuanyuan, Ma Jing. Detecting Multimodal Sarcasm Based on SC-Attention Mechanism. Data Analysis and Knowledge Discovery, 2022, 6(9): 40-51.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1362      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I9/40
Fig.1  Multimodal sarcasm detection model based on the SC-Attention mechanism
Fig.2  ViT feature extraction architecture
Fig.3  Example of adding the image-attribute modality
(Image attributes: blue, sky, white, cloud, many; Text: "What bad weather!")
Fig.4  Structure of the CLIP model's text encoder
Fig.5  Structure of the SC-Attention mechanism
Fig.6  Structure of the Parallel-Attention mechanism
Fig.7  Structure of SENet's attention mechanism
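Figs. 5-7 depict the structure of the SC-Attention mechanism and its two components, Co-Attention and SENet's attention. The sketch below is one plausible PyTorch reading of that design, not the published implementation: the module names, the 768-dimensional inputs (Table 2), the number of attention heads, and the final concatenation are all assumptions made for illustration.

# SC-Attention-style fusion head sketch (assumptions: 768-d inputs per Table 2,
# 8 attention heads, hypothetical module names; not the authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEGate(nn.Module):
    """SENet-style squeeze-and-excitation gate that reweights feature channels."""
    def __init__(self, dim, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim // reduction)
        self.fc2 = nn.Linear(dim // reduction, dim)

    def forward(self, x):                                # x: (batch, dim)
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(x))))
        return x * w                                     # channel-wise reweighting

class CoAttention(nn.Module):
    """Guided attention: one modality queries the other modalities."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=heads, batch_first=True)

    def forward(self, query, context):                   # (batch, 1, dim), (batch, n, dim)
        out, _ = self.attn(query, context, context)
        return out.squeeze(1)

class SCAttentionFusion(nn.Module):
    """Gate each modality with SE attention, fuse by text-guided co-attention, classify."""
    def __init__(self, dim=768, dropout=0.15):
        super().__init__()
        self.se_text = SEGate(dim)
        self.se_image = SEGate(dim)
        self.se_attr = SEGate(dim)
        self.co = CoAttention(dim)
        self.classifier = nn.Sequential(nn.Dropout(dropout), nn.Linear(3 * dim, 2))

    def forward(self, text_f, image_f, attr_f):           # each: (batch, dim)
        text_f = self.se_text(text_f)
        image_f = self.se_image(image_f)
        attr_f = self.se_attr(attr_f)
        context = torch.stack([image_f, attr_f], dim=1)   # (batch, 2, dim)
        guided = self.co(text_f.unsqueeze(1), context)    # text-guided visual feature
        fused = torch.cat([text_f, guided, image_f + attr_f], dim=-1)
        return self.classifier(fused)                     # sarcasm / non-sarcasm logits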
Category    Training set    Test set
Non-sarcastic    8,642    1,918
Sarcastic    11,174    2,901
Total    19,816    4,819
Table 1  Dataset annotation statistics
Parameter    Value
Word vector dimension    768
Image vector dimension    768
Dropout    0.15
Learning rate    0.0001
Batch size    32
Optimizer    Adam
Loss function    Cross-entropy loss
Table 2  Experimental parameter settings
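A minimal training-loop sketch using the Table 2 settings is given below; the dummy TensorDataset merely stands in for precomputed modality features, and SCAttentionFusion is the illustrative module sketched after Fig. 7.

# Training-setup sketch with the Table 2 hyperparameters (dropout 0.15,
# learning rate 1e-4, batch size 32, Adam optimizer, cross-entropy loss).
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors stand in for precomputed 768-d modality features and labels.
train_dataset = TensorDataset(torch.randn(256, 768), torch.randn(256, 768),
                              torch.randn(256, 768), torch.randint(0, 2, (256,)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

model = SCAttentionFusion(dim=768, dropout=0.15)     # illustrative module from above
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for text_f, image_f, attr_f, labels in train_loader:
    optimizer.zero_grad()
    logits = model(text_f, image_f, attr_f)          # (batch, 2) sarcasm logits
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()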
Model    ViT    RN50    RoBERTa    BERT    CLIP text encoder    Co-Attention    SC
VRC-FC
VRC-CF
VRC-TFN
VRC-SC
RRC-FC
RRC-CF
RRC-TFN
RRC-SC
VBC-SC
Table 3  Models in the comparison experiments
Model    Acc/%    P/%    R/%    F1/%
Text(BERT) 80.40 75.67 77.80 76.72
Text(RoBERTa) 90.11 87.16 89.67 88.40
Image(ViT) 61.85 54.06 54.01 54.03
Image(ResNet50×16) 59.56 52.62 50.32 51.44
Attribute 78.87 74.69 74.26 74.48
Concat(I+T) 91.93 89.02 90.11 89.56
Concat(I+A) 78.91 74.02 75.81 78.91
Concat(A+T) 91.23 88.76 89.34 89.05
VRC-SC(I+T+A) 93.71 90.28 93.13 91.68
Table 4  Results of the modality ablation experiments
Model    Acc/%    P/%    R/%    F1/%
VRC-FC 91.33 89.17 92.04 90.23
VRC-CF 90.14 87.43 88.72 88.07
VRC-TFN 91.98 89.45 91.68 90.55
VRC-SC 93.71 90.28 93.13 91.68
RRC-FC 91.11 88.15 91.64 89.48
RRC-CF 89.75 87.23 86.89 87.06
RRC-TFN 91.33 88.14 90.07 89.09
RRC-SC 92.42 89.48 92.66 90.56
Hierarchical fusion model 83.44 76.57 84.15 80.18
Table 5  Results of different feature fusion mechanisms
Model    Acc/%    P/%    R/%    F1/%
VBC-SC 83.83 77.45 83.46 80.34
VRC-SC 93.71 90.28 93.13 91.68
RRC-SC 92.42 89.48 92.66 90.56
Table 6  Results of different feature extraction models
Fig.8  Accuracy comparison under different dropout values
Training scheme    Optimizer    LR decay strategy    Epochs    Acc/%
Scheme 1    SGD    Manual adjustment in later stages    129    93.42
Scheme 2    Adam    Cosine annealing    112    93.71
Scheme 3    RMSprop    Adaptive learning-rate adjustment    139    93.35
Table 7  Results of the three optimizers and learning-rate decay strategies
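Scheme 2 (Adam with cosine annealing) gave the best accuracy in Table 7. A sketch of that schedule with PyTorch's built-in CosineAnnealingLR follows; the eta_min floor and the reuse of model and train_loader from the earlier sketches are assumptions.

# Scheme 2 sketch: Adam with cosine-annealing learning-rate decay over the
# reported 112 epochs (eta_min is an assumed floor, not stated in the paper).
from torch import optim

optimizer = optim.Adam(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=112, eta_min=1e-6)

for epoch in range(112):
    # ... one pass over train_loader, as in the loop after Table 2 ...
    scheduler.step()    # move the learning rate along the cosine curve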
Fig.9  Example of a sarcastic tweet