Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (10): 85-94    DOI: 10.11925/infotech.2096-3467.2022.0987
Detecting Multimodal Sarcasm Based on ADGCN-MFM
Yu Bengong1,2, Ji Xiaohan1
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization & Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei 230009, China
Abstract  

[Objective] This paper proposes a sarcasm detection model based on an affective dependency graph convolutional network with modality fusion (ADGCN-MFM). It aims to improve multimodal sarcasm detection by exploiting the sentiment information and syntactic dependencies of texts. [Methods] The model enriches the text modality with sentiment and syntactic information by building a sentiment graph and a syntactic dependency graph, applies graph convolutional networks to obtain text representations with rich sentiment semantics, and then fuses multimodal features through modality fusion. Finally, a self-attention mechanism filters redundant information, and sarcasm detection is performed on the fused representation. [Results] The model's accuracy reached 85.85%, which is 3.46, 2.25, 1.83, and 0.95 percentage points higher than the baseline models HFM, Res-BERT, D&R Net, and IIMI-MMSD, respectively. Its F1 score reached 84.80%, 1.44 percentage points above the best baseline. [Limitations] Further experiments on more datasets are needed to validate the model's generalization and robustness. [Conclusions] The proposed model thoroughly exploits the sentiment and syntactic dependencies of the text and effectively detects multimodal sarcasm.
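The pipeline described in the abstract builds a sentiment graph and a syntactic dependency graph over the text and propagates token features through graph convolution. Below is a minimal sketch of one GCN propagation step in the standard symmetrically normalized form (NumPy; the 4-token adjacency matrix and feature sizes are hypothetical illustrations, not the authors' implementation):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)            # ReLU activation

# Hypothetical 4-token sentence; edges mimic syntactic dependencies
# (token 1 is the head of tokens 0, 2, and 3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = np.random.RandomState(0).randn(4, 8)  # token features (e.g., encoder outputs)
W = np.random.RandomState(1).randn(8, 8)  # learnable weight matrix
H = gcn_layer(A, X, W)                    # updated node representations, shape (4, 8)
```

Stacking such layers lets each token aggregate sentiment and syntactic context from progressively larger graph neighborhoods.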

Key words: Multimodality; Sarcasm Detection; Sentiment-Dependency; Graph Convolutional Neural Network; Modality Fusion
Received: 20 September 2022      Published: 21 March 2023
ZTFLH: TP393; G250
Fund:National Natural Science Foundation of China(72071061)
Corresponding Author: Yu Bengong, ORCID: 0000-0003-4170-2335, E-mail: bgyu19@163.com

Cite this article:

Yu Bengong, Ji Xiaohan. Detecting Multimodal Sarcasm Based on ADGCN-MFM. Data Analysis and Knowledge Discovery, 2023, 7(10): 85-94.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0987     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I10/85

Structure of ADGCN-MFM Model
Architecture of ViT Model
Extraction and Processing of Image Attributes
Category    Training Set    Validation Set    Test Set
Positive    8,642           959               959
Negative    11,174          1,451             1,450
Total       19,816          2,410             2,409
Statistics of Datasets
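A quick consistency check of the split sizes above in plain Python (numbers copied directly from the table):

```python
# Positive/negative counts per split, from the dataset statistics table.
splits = {"train": (8642, 11174), "valid": (959, 1451), "test": (959, 1450)}

totals = {name: pos + neg for name, (pos, neg) in splits.items()}
print(totals)  # {'train': 19816, 'valid': 2410, 'test': 2409}

# Overall class balance: the dataset skews toward non-sarcastic examples.
pos_share = sum(p for p, _ in splits.values()) / sum(sum(s) for s in splits.values())
print(f"positive share: {pos_share:.2%}")
```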
Parameter                    Value
Word embedding dimension     768
Image embedding dimension    768
Bi-LSTM hidden size          128
Max sentence length          128
Dropout                      0.2
Batch size                   64
Learning rate                2e-5
Epochs                       10
Optimizer                    Adam
Loss function                Cross-entropy loss
Experimental Parameter Setting
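The table above maps directly onto a training configuration. A plain-Python sketch (the key names are illustrative, not taken from the authors' code):

```python
# Hyperparameters from the experimental parameter table.
config = {
    "word_dim": 768,        # BERT word-embedding dimension
    "image_dim": 768,       # ViT image-embedding dimension
    "lstm_hidden": 128,     # Bi-LSTM hidden size (per direction)
    "max_len": 128,         # maximum sentence length in tokens
    "dropout": 0.2,
    "batch_size": 64,
    "lr": 2e-5,
    "epochs": 10,
    "optimizer": "Adam",
    "loss": "CrossEntropyLoss",
}

# A Bi-LSTM concatenates forward and backward states, so token
# representations coming out of it are 2 * 128 = 256-dimensional.
bilstm_out_dim = 2 * config["lstm_hidden"]
```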
Modality       Model        F1/%    Precision/%    Recall/%    Accuracy/%
Image ResNet152* 65.13 54.41 70.80 64.76
ViT 65.59 60.32 71.87 66.19
Text TextCNN* 75.32 74.29 76.39 80.03
SMSD* 75.82 76.46 75.18 80.90
Bi-LSTM* 77.53 76.66 78.42 81.90
MIARN* 77.36 79.67 75.18 82.48
BERT* 80.22 78.27 82.27 83.85
Image+Text HFM 79.43 76.74 82.32 82.39
Res-BERT 82.93 82.73 83.19 83.60
D&R Net* 80.60 77.97 83.42 84.02
IIMI-MMSD 83.36 83.11 83.73 84.90
InCrossMGs* 82.84 81.38 84.36 86.10
ADGCN-MFM 84.80 84.33 85.27 85.85
Comparison Results
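All four columns in the table derive from confusion-matrix counts on the test set; the table reports each value multiplied by 100. A minimal sketch of the computation (the counts below are invented for illustration, not taken from the paper):

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for a two-class sarcasm test set.
p, r, f1, acc = metrics(tp=80, fp=15, fn=14, tn=130)
```

F1 being the harmonic mean of precision and recall explains why a model can lead on accuracy (as InCrossMGs does at 86.10%) while trailing on F1.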
Model            F1/%    Precision/%    Recall/%    Accuracy/%
ADGCN-MFM        84.80   84.33          85.27       85.85
w/o Attribute    83.63   83.41          83.93       84.18
w/o A-Graph      82.83   82.71          82.96       83.47
w/o D-Graph      83.11   83.10          83.13       83.81
w/o Fusion       82.89   83.08          82.72       83.68
w/o Attention    83.00   82.73          83.43       83.52
Results of Ablation Experiments
The Effect of GCN Layers on Model Performance
No.    Image Attributes                                 Text                                                                                                         Label
1      'frown', 'woman', 'eyes', 'white', 'hand'        I got a nice cold for the rest of winter.                                                                    sarcasm
2      'man', 'wearing', 'sitting', 'hat', 'watch'      beautiful day, not a care in the world. oh i was talking about the picture not my cold freezing world.       sarcasm
3      'child', 'cake', 'smile', 'candles', 'woman'     What a happy day!                                                                                            not sarcasm
Examples of Cases
[1] 罗观柱, 赵妍妍, 秦兵, 等. 面向社交媒体的反讽识别[J]. 智能计算机与应用, 2020, 10(2): 301-307.
[1] (Luo Guanzhu, Zhao Yanyan, Qin Bing, et al. Social Media-Oriented Sarcasm Detection[J]. Intelligent Computer and Applications, 2020, 10(2): 301-307.)
[2] Potamias R A, Siolas G, Stafylopatis A G. A Transformer-Based Approach to Irony and Sarcasm Detection[J]. Neural Computing and Applications, 2020, 32(23): 17309-17320.
doi: 10.1007/s00521-020-05102-3
[3] Cai Y T, Cai H Y, Wan X J. Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 2506-2515.
[4] Sangwan S, Akhtar M S, Behera P, et al. I Didn’t Mean What I Wrote! Exploring Multimodality for Sarcasm Detection[C]// Proceedings of 2020 International Joint Conference on Neural Networks. 2020: 1-8.
[5] Wang X Y, Sun X W, Yang T, et al. Building a Bridge: A Method for Image-Text Sarcasm Detection Without Pretraining on Image-Text Data[C]// Proceedings of the 1st International Workshop on Natural Language Processing Beyond Text. 2020: 19-29.
[6] 钟佳娃, 刘巍, 王思丽, 等. 文本情感分析方法及应用综述[J]. 数据分析与知识发现, 2021, 5(6): 1-13.
[6] (Zhong Jiawa, Liu Wei, Wang Sili, et al. Review of Methods and Applications of Text Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2021, 5(6): 1-13.)
[7] Abdu S A, Yousef A H, Salem A. Multimodal Video Sentiment Analysis Using Deep Learning Approaches, a Survey[J]. Information Fusion, 2021, 76(C): 204-226.
[8] Du Y P, Liu Y, Peng Z, et al. Gated Attention Fusion Network for Multimodal Sentiment Classification[J]. Knowledge-Based Systems, 2022, 240: 108107.
doi: 10.1016/j.knosys.2021.108107
[9] 袁景凌, 丁远远, 盛德明, 等. 基于视觉方面注意力的图像文本情感分析模型[J]. 计算机科学, 2022, 49(1): 219-224.
doi: 10.11896/jsjkx.201000074
[9] (Yuan Jingling, Ding Yuanyuan, Sheng Deming, et al. Image-Text Sentiment Analysis Model Based on Visual Aspect Attention[J]. Computer Science, 2022, 49(1): 219-224.)
[10] Wang K, Shen W Z, Yang Y Y, et al. Relational Graph Attention Network for Aspect-Based Sentiment Analysis[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3229-3238.
[11] Xue X J, Zhang C X, Niu Z D, et al. Multi-Level Attention Map Network for Multimodal Sentiment Analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 5105-5118.
[12] Yang X C, Feng S, Zhang Y F, et al. Multimodal Sentiment Detection Based on Multi-channel Graph Neural Networks[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021: 328-339.
[13] Pan H L, Lin Z, Fu P, et al. Modeling Intra and Inter-modality Incongruity for Multi-Modal Sarcasm Detection[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. 2020: 1383-1392.
[14] Xu N, Zeng Z X, Mao W J. Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3777-3786.
[15] Gupta S, Shah A, Shah M, et al. FiLMing Multimodal Sarcasm Detection with Attention[OL]. arXiv Preprint, arXiv: 2110.00416.
[16] 张继东, 蒋丽萍. 基于多模态深度学习的旅游评论反讽识别研究[J]. 情报理论与实践, 2022, 45(7): 158-164.
doi: 10.16353/j.cnki.1000-7490.2022.07.022
[16] (Zhang Jidong, Jiang Liping. Research on Irony Recognition of Travel Reviews Based on Multi-modal Deep Learning[J]. Information Studies: Theory & Application, 2022, 45(7): 158-164.)
[17] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[OL]. arXiv Preprint, arXiv: 2010.11929.
[18] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[19] Voita E, Talbot D, Moiseev F, et al. Analyzing Multi-head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 5797-5808.
[20] Gaudart J, Giusiano B, Huiart L. Comparison of the Performance of Multi-layer Perceptron and Linear Regression for Epidemiological Data[J]. Computational Statistics & Data Analysis, 2004, 44(4): 547-570.
doi: 10.1016/S0167-9473(02)00257-8
[21] Ba J L, Kiros J R, Hinton G E. Layer Normalization[OL]. arXiv Preprint, arXiv: 1607.06450.
[22] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[23] Lou C W, Liang B, Gui L, et al. Affective Dependency Graph for Sarcasm Detection[C]// Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 1844-1849.
[24] Cambria E, Li Y, Xing F Z, et al. SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis[C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. 2020: 105-114.
[25] 罗曜儒, 李智. 基于Bi-LSTM的生物医学文本语义消歧研究[J]. 软件导刊, 2019, 18(4): 57-59.
[25] (Luo Yaoru, Li Zhi. Word Sense Disambiguation in Biomedical Text Based on Bi-LSTM[J]. Software Guide, 2019, 18(4): 57-59.)
[26] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[27] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[28] Xiong T, Zhang P R, Zhu H B, et al. Sarcasm Detection with Self-matching Networks and Low-rank Bilinear Pooling[C]// Proceedings of the World Wide Web Conference. 2019: 2115-2124.
[29] Tay Y, Luu A T, Hui S C, et al. Reasoning with Sarcasm by Reading In-between[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 1010-1020.
[30] Liang B, Lou C W, Li X, et al. Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. 2021: 4707-4715.