Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (9): 40-51    DOI: 10.11925/infotech.2096-3467.2021.1362
Detecting Multimodal Sarcasm Based on SC-Attention Mechanism
Chen Yuanyuan, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This paper designs an SC-Attention fusion mechanism, aiming to address the low prediction accuracy and the difficulty of fusing multimodal features in existing multimodal sarcasm detection models. [Methods] First, we used the CLIP and RoBERTa models to extract features from images, image attributes, and texts. Then, we built the SC-Attention mechanism, which combines SENet's attention mechanism with the Co-Attention mechanism, to fuse the multimodal features. Third, we re-allocated the attention weights according to the original modalities. Finally, we fed the fused features into fully connected layers to detect sarcasm. [Results] The accuracy and F1 score of the proposed model reached 93.71% and 91.68%, which were 10.27 and 11.50 percentage points higher than those of the existing models. [Limitations] The model needs to be examined on more datasets. [Conclusions] The proposed model reduces information redundancy and feature loss, effectively improving the accuracy of multimodal sarcasm detection.
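The fusion idea in the abstract — co-attention across modalities followed by SENet-style squeeze-and-excitation channel reweighting — can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes and randomly initialized weights, not the authors' implementation; the function names and dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text, image):
    """Each text token attends over image regions and gathers image context."""
    # text: (n_tokens, d), image: (n_regions, d)
    affinity = text @ image.T                 # (n_tokens, n_regions)
    attn = softmax(affinity, axis=-1)
    return attn @ image                       # (n_tokens, d) image context per token

def se_reweight(features, w1, w2):
    """SENet-style squeeze-and-excitation: gate each feature channel."""
    squeezed = features.mean(axis=0)          # (d,) global average pool ("squeeze")
    hidden = np.maximum(squeezed @ w1, 0)     # bottleneck + ReLU ("excitation")
    gate = 1 / (1 + np.exp(-(hidden @ w2)))   # sigmoid channel weights
    return features * gate                    # re-allocate channel importance

rng = np.random.default_rng(0)
d, r = 768, 16                                # feature dim, SE reduction ratio (assumed)
text = rng.standard_normal((20, d))           # e.g. 20 RoBERTa token features
image = rng.standard_normal((49, d))          # e.g. 49 CLIP/ViT patch features
w1 = rng.standard_normal((d, d // r)) * 0.01
w2 = rng.standard_normal((d // r, d)) * 0.01

fused = se_reweight(co_attention(text, image), w1, w2)
print(fused.shape)  # (20, 768)
```

The fused representation keeps one vector per text token, with channels reweighted by a learned gate, which is the general shape of the SC-Attention combination described here.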

Key words: Multimodal; Sarcasm Detection; SC-Attention Mechanism; CLIP Model
Received: 01 December 2021      Published: 26 October 2022
ZTFLH: TP393; G250
Fund: National Natural Science Foundation of China (72174086); Special Forward-looking Development Strategy Research Project of the Fundamental Research Funds for the Central Universities (NW2020001)
Corresponding Authors: Ma Jing, ORCID: 0000-0001-8472-2518, E-mail: majing5525@126.com

Cite this article:

Chen Yuanyuan, Ma Jing. Detecting Multimodal Sarcasm Based on SC-Attention Mechanism. Data Analysis and Knowledge Discovery, 2022, 6(9): 40-51.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1362     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I9/40

Multimodal Sarcasm Detection Model Based on SC-Attention Mechanism
Architecture of ViT Feature Extraction
Example of Adding Image Property Modal
Text Encoder Structure of CLIP Model
Diagram of SC-Attention Mechanism
Diagram of Parallel-Attention
Attention Mechanism Structure of SENet
Category        Training Set    Test Set
Non-sarcastic   8,642           1,918
Sarcastic       11,174          2,901
Total           19,816          4,819
Annotation Results of the Dataset
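The reported class counts (8,642/1,918 non-sarcastic and 11,174/2,901 sarcastic samples in the train/test splits) can be sanity-checked against the stated totals with a few lines of Python:

```python
# Class counts as reported for the dataset splits
train = {"non_sarcastic": 8642, "sarcastic": 11174}
test = {"non_sarcastic": 1918, "sarcastic": 2901}

print(sum(train.values()))  # 19816, matches the reported training total
print(sum(test.values()))   # 4819, matches the reported test total

# Share of sarcastic samples in each split
print(round(train["sarcastic"] / sum(train.values()), 3))  # 0.564
print(round(test["sarcastic"] / sum(test.values()), 3))    # 0.602
```

Both splits lean toward the sarcastic class, which is worth keeping in mind when reading accuracy alongside F1.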
Parameter                 Value
Word vector dimension     768
Image vector dimension    768
Dropout                   0.15
Learning rate             0.0001
Batch size                32
Optimizer                 Adam
Loss function             CrossEntropy Loss
Experimental Parameter Setting
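For reference, the reported experimental settings can be collected into a single configuration object; the key names below are illustrative assumptions, only the values come from the paper:

```python
# Experimental settings as reported (key names are illustrative)
config = {
    "word_vector_dim": 768,
    "image_vector_dim": 768,
    "dropout": 0.15,
    "learning_rate": 1e-4,
    "batch_size": 32,
    "optimizer": "Adam",
    "loss": "CrossEntropyLoss",
}

print(config["learning_rate"])  # 0.0001
print(config["batch_size"])     # 32
```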
Model     ViT   ResNet50   RoBERTa   BERT   CLIP Text Encoder   Co-Attention   SC
VRC-FC
VRC-CF
VRC-TFN
VRC-SC
RRC-FC
RRC-CF
RRC-TFN
RRC-SC
VBC-SC
Comparison Experiment Settings
Model                   Acc/%   P/%     R/%     F1/%
Text(BERT) 80.40 75.67 77.80 76.72
Text(RoBERTa) 90.11 87.16 89.67 88.40
Image(ViT) 61.85 54.06 54.01 54.03
Image(ResNet50×16) 59.56 52.62 50.32 51.44
Attribute 78.87 74.69 74.26 74.48
Concat(I+T) 91.93 89.02 90.11 89.56
Concat(I+A) 78.91 74.02 75.81 74.90
Concat(A+T) 91.23 88.76 89.34 89.05
VRC-SC(I+T+A) 93.71 90.28 93.13 91.68
Experimental Results of Modal Ablation
Model       Acc/%   P/%     R/%     F1/%
VRC-FC 91.33 89.17 92.04 90.23
VRC-CF 90.14 87.43 88.72 88.07
VRC-TFN 91.98 89.45 91.68 90.55
VRC-SC 93.71 90.28 93.13 91.68
RRC-FC 91.11 88.15 91.64 89.48
RRC-CF 89.75 87.23 86.89 87.06
RRC-TFN 91.33 88.14 90.07 89.09
RRC-SC 92.42 89.48 92.66 90.56
Hierarchical Fusion Model 83.44 76.57 84.15 80.18
Experimental Results of Different Feature Fusion Mechanisms
Model    Acc/%   P/%     R/%     F1/%
VBC-SC 83.83 77.45 83.46 80.34
VRC-SC 93.71 90.28 93.13 91.68
RRC-SC 92.42 89.48 92.66 90.56
Experimental Results of Different Feature Extraction Models
Accuracy with Different Dropout Rates
Scheme     Optimizer   LR Decay Strategy                    Epochs   Acc/%
Scheme 1   SGD         Manual adjustment in later stages    129      93.42
Scheme 2   Adam        Cosine annealing                     112      93.71
Scheme 3   RMSprop     Adaptive learning-rate adjustment    139      93.35
Experimental Results with Three Optimizers and Learning Rate Decay Strategies
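Scheme 2 (Adam with cosine annealing) achieved the best accuracy. A common form of the cosine-annealing schedule is sketched below; the maximum and minimum learning rates and the horizon are illustrative assumptions, with the 112 epochs and the 1e-4 base rate taken from the reported settings.

```python
import math

def cosine_annealing(epoch, total_epochs, lr_max=1e-4, lr_min=0.0):
    """Cosine-annealed learning rate: decays smoothly from lr_max to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_annealing(0, 112))    # 0.0001 at the start of training
print(cosine_annealing(112, 112))  # 0.0 at the end of training
```

Compared with manual step adjustments, this schedule needs no hand-tuned decay points, which fits the observation that scheme 2 also converged in the fewest epochs.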
Examples of Sarcastic Tweets