Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 10: 85-94     https://doi.org/10.11925/infotech.2096-3467.2022.0987
Detecting Multimodal Sarcasm Based on ADGCN-MFM
Yu Bengong1,2, Ji Xiaohan1
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization & Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei 230009, China

Abstract

[Objective] This paper proposes a sarcasm detection model based on an affective-dependency graph convolutional network with modality fusion (ADGCN-MFM), addressing the incomplete treatment of textual sentiment information and syntactic dependencies in existing multimodal sarcasm detection studies. [Methods] The model enhances the sentiment and syntactic information of the text modality with sentiment graphs and syntactic dependency graphs, and applies graph convolutional networks to obtain text representations rich in sentiment semantics. It then fuses the multimodal features through modality fusion and uses a self-attention mechanism to filter redundant information, performing sarcasm detection on the fused representation. [Results] The model reaches an accuracy of 85.85%, which is 3.46, 2.25, 1.83, and 0.95 percentage points higher than the baseline models HFM, Res-BERT, D&R Net, and IIMI-MMSD, respectively. Its F1 score reaches 84.80%, 1.44 percentage points above the best baseline. [Limitations] The model's generalization and robustness have not yet been validated on more datasets. [Conclusions] The proposed model fully exploits the sentiment and syntactic dependency relations of the text and effectively improves the accuracy of multimodal sarcasm detection.
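The page only summarizes the method; as a rough, purely illustrative sketch of the graph-convolution step it describes, a standard Kipf-and-Welling-style GCN propagation over a dependency-parse adjacency matrix might look as follows (the toy graph, dimensions, and all names are assumptions, not the authors' code):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy 4-token sentence: adjacency from a hypothetical dependency parse.
# In the affective graph, edges could analogously carry sentiment weights.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))   # token embeddings (e.g., from BERT/Bi-LSTM)
W = rng.standard_normal((8, 8))   # learnable weight matrix
H1 = gcn_layer(A, H, W)
print(H1.shape)                   # (4, 8)
```

Each node's new representation aggregates its syntactic neighbours, which is how dependency structure is injected into the text features.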

Key words: Multimodality; Sarcasm Detection; Sentiment-Dependency Graph Convolutional Neural Network; Modality Fusion
Received: 2022-09-20      Published: 2023-03-21
CLC numbers: TP393; G250
Fund: National Natural Science Foundation of China (72071061)
Corresponding author: Yu Bengong, ORCID: 0000-0003-4170-2335, E-mail: bgyu19@163.com
Cite this article:
Yu Bengong, Ji Xiaohan. Detecting Multimodal Sarcasm Based on ADGCN-MFM. Data Analysis and Knowledge Discovery, 2023, 7(10): 85-94.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0987      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I10/85
Fig.1  Architecture of the ADGCN-MFM model
Fig.2  Architecture of the ViT model
Fig.3  Image attribute extraction and processing
Category | Training Set | Validation Set | Test Set
Positive | 8,642 | 959 | 959
Negative | 11,174 | 1,451 | 1,450
Total | 19,816 | 2,410 | 2,409
Table 1  Dataset statistics
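As a quick sanity check on Table 1, the split totals and class balance can be recomputed (counts copied from the table; note that the negative class dominates every split):

```python
# Counts taken from Table 1 of the paper's landing page.
splits = {
    "train": {"pos": 8642, "neg": 11174},
    "dev":   {"pos": 959,  "neg": 1451},
    "test":  {"pos": 959,  "neg": 1450},
}
for name, c in splits.items():
    total = c["pos"] + c["neg"]
    print(f"{name}: total={total}, positive ratio={c['pos'] / total:.1%}")
```

The totals match the table (19,816 / 2,410 / 2,409), and the positive (sarcastic) class makes up roughly 40-44% of each split.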
Parameter | Value
Word embedding dimension | 768
Image vector dimension | 768
Bi-LSTM hidden size | 128
Maximum sentence length | 128
Dropout | 0.2
Batch size | 64
Learning rate | 2e-5
Epochs | 10
Optimizer | Adam
Loss function | Cross-Entropy Loss
Table 2  Experimental parameter settings
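For reproduction purposes, the settings in Table 2 might be collected into a single configuration object; the field names below are illustrative, not taken from the authors' code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    # Values from Table 2; field names are hypothetical.
    word_dim: int = 768        # word embedding dimension
    image_dim: int = 768       # image vector dimension
    lstm_hidden: int = 128     # Bi-LSTM hidden size
    max_seq_len: int = 128     # maximum sentence length
    dropout: float = 0.2
    batch_size: int = 64
    learning_rate: float = 2e-5
    epochs: int = 10
    optimizer: str = "Adam"
    loss: str = "CrossEntropyLoss"

cfg = ExperimentConfig()
print(cfg.learning_rate)  # 2e-05
```

A frozen dataclass keeps the hyperparameters immutable during a run, which helps when logging experiments.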
Modality | Model | F1/% | Precision/% | Recall/% | Accuracy/%
Image ResNet152* 65.13 54.41 70.80 64.76
ViT 65.59 60.32 71.87 66.19
Text TextCNN* 75.32 74.29 76.39 80.03
SMSD* 75.82 76.46 75.18 80.90
Bi-LSTM* 77.53 76.66 78.42 81.90
MIARN* 77.36 79.67 75.18 82.48
BERT* 80.22 78.27 82.27 83.85
Image+Text HFM 79.43 76.74 82.32 82.39
Res-BERT 82.93 82.73 83.19 83.60
D&R Net* 80.60 77.97 83.42 84.02
IIMI-MMSD 83.36 83.11 83.73 84.90
InCrossMGs* 82.84 81.38 84.36 86.10
ADGCN-MFM 84.80 84.33 85.27 85.85
Table 3  Model comparison results
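The F1 values in Table 3 are consistent with the usual definition of F1 as the harmonic mean of precision and recall; for instance, the ADGCN-MFM row can be checked as follows:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (all values in %)."""
    return 2 * precision * recall / (precision + recall)

# ADGCN-MFM row of Table 3: P = 84.33, R = 85.27 -> reported F1 = 84.80
f1 = f1_score(84.33, 85.27)
print(round(f1, 2))  # 84.8
```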
Model | F1/% | Precision/% | Recall/% | Accuracy/%
ADGCN-MFM | 84.80 | 84.33 | 85.27 | 85.85
w/o Attribute | 83.63 | 83.41 | 83.93 | 84.18
w/o A-Graph | 82.83 | 82.71 | 82.96 | 83.47
w/o D-Graph | 83.11 | 83.10 | 83.13 | 83.81
w/o Fusion | 82.89 | 83.08 | 82.72 | 83.68
w/o Attention | 83.00 | 82.73 | 83.43 | 83.52
Table 4  Ablation study results
Fig.4  Effect of the number of GCN layers on model performance
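Fig.4 examines how GCN depth affects performance. One common explanation for why very deep GCNs can hurt is over-smoothing: repeated neighbourhood averaging (the linear part of a GCN layer, shown here without weights or nonlinearity) drives node features toward each other. A minimal, purely illustrative demonstration on a toy graph:

```python
import numpy as np

# Toy 4-node graph with self-loops; P is the row-normalized propagation matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
A_hat = A + np.eye(4)
P = A_hat / A_hat.sum(axis=1, keepdims=True)

H = np.random.default_rng(1).standard_normal((4, 8))
spread_before = np.std(H, axis=0).mean()   # how much nodes differ initially
for _ in range(10):                        # ten propagation steps
    H = P @ H
spread_after = np.std(H, axis=0).mean()
print(spread_after < spread_before)        # True: node features have smoothed out
```

After repeated propagation the node representations become nearly indistinguishable, which is consistent with performance peaking at a small number of GCN layers.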
No. | Image | Image Attributes | Text | Sarcastic? | Model Prediction
1 | | 'frown', 'woman', 'eyes', 'white', 'hand' | I got a nice cold for the rest of winter. | | sarcasm
2 | | 'man', 'wearing', 'sitting', 'hat', 'watch' | beautiful day, not a care in the world. oh i was talking about the picture not my cold freezing world. | | sarcasm
3 | | 'child', 'cake', 'smile', 'candles', 'woman' | What a happy day! | | not sarcasm
Table 5  Case examples
[1] Luo Guanzhu, Zhao Yanyan, Qin Bing, et al. Social Media-Oriented Sarcasm Detection[J]. Intelligent Computer and Applications, 2020, 10(2): 301-307.
[2] Potamias R A, Siolas G, Stafylopatis A G. A Transformer-Based Approach to Irony and Sarcasm Detection[J]. Neural Computing and Applications, 2020, 32(23): 17309-17320. DOI: 10.1007/s00521-020-05102-3.
[3] Cai Y T, Cai H Y, Wan X J. Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 2506-2515.
[4] Sangwan S, Akhtar M S, Behera P, et al. I Didn't Mean What I Wrote! Exploring Multimodality for Sarcasm Detection[C]// Proceedings of the 2020 International Joint Conference on Neural Networks. 2020: 1-8.
[5] Wang X Y, Sun X W, Yang T, et al. Building a Bridge: A Method for Image-Text Sarcasm Detection Without Pretraining on Image-Text Data[C]// Proceedings of the 1st International Workshop on Natural Language Processing Beyond Text. 2020: 19-29.
[6] Zhong Jiawa, Liu Wei, Wang Sili, et al. Review of Methods and Applications of Text Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2021, 5(6): 1-13.
[7] Abdu S A, Yousef A H, Salem A. Multimodal Video Sentiment Analysis Using Deep Learning Approaches, a Survey[J]. Information Fusion, 2021, 76(C): 204-226.
[8] Du Y P, Liu Y, Peng Z, et al. Gated Attention Fusion Network for Multimodal Sentiment Classification[J]. Knowledge-Based Systems, 2022, 240: 108107. DOI: 10.1016/j.knosys.2021.108107.
[9] Yuan Jingling, Ding Yuanyuan, Sheng Deming, et al. Image-Text Sentiment Analysis Model Based on Visual Aspect Attention[J]. Computer Science, 2022, 49(1): 219-224. DOI: 10.11896/jsjkx.201000074.
[10] Wang K, Shen W Z, Yang Y Y, et al. Relational Graph Attention Network for Aspect-Based Sentiment Analysis[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3229-3238.
[11] Xue X J, Zhang C X, Niu Z D, et al. Multi-Level Attention Map Network for Multimodal Sentiment Analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 5105-5118.
[12] Yang X C, Feng S, Zhang Y F, et al. Multimodal Sentiment Detection Based on Multi-channel Graph Neural Networks[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021: 328-339.
[13] Pan H L, Lin Z, Fu P, et al. Modeling Intra and Inter-modality Incongruity for Multi-Modal Sarcasm Detection[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. 2020: 1383-1392.
[14] Xu N, Zeng Z X, Mao W J. Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3777-3786.
[15] Gupta S, Shah A, Shah M, et al. FiLMing Multimodal Sarcasm Detection with Attention[OL]. arXiv Preprint, arXiv:2110.00416.
[16] Zhang Jidong, Jiang Liping. Research on Irony Recognition of Travel Reviews Based on Multi-modal Deep Learning[J]. Information Studies: Theory & Application, 2022, 45(7): 158-164. DOI: 10.16353/j.cnki.1000-7490.2022.07.022.
[17] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[OL]. arXiv Preprint, arXiv:2010.11929.
[18] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[19] Voita E, Talbot D, Moiseev F, et al. Analyzing Multi-head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 5797-5808.
[20] Gaudart J, Giusiano B, Huiart L. Comparison of the Performance of Multi-layer Perceptron and Linear Regression for Epidemiological Data[J]. Computational Statistics & Data Analysis, 2004, 44(4): 547-570. DOI: 10.1016/S0167-9473(02)00257-8.
[21] Ba J L, Kiros J R, Hinton G E. Layer Normalization[OL]. arXiv Preprint, arXiv:1607.06450.
[22] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[23] Lou C W, Liang B, Gui L, et al. Affective Dependency Graph for Sarcasm Detection[C]// Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 1844-1849.
[24] Cambria E, Li Y, Xing F Z, et al. SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis[C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. 2020: 105-114.
[25] Luo Yaoru, Li Zhi. Word Sense Disambiguation in Biomedical Text Based on Bi-LSTM[J]. Software Guide, 2019, 18(4): 57-59.
[26] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[27] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[28] Xiong T, Zhang P R, Zhu H B, et al. Sarcasm Detection with Self-matching Networks and Low-rank Bilinear Pooling[C]// Proceedings of the World Wide Web Conference. 2019: 2115-2124.
[29] Tay Y, Luu A T, Hui S C, et al. Reasoning with Sarcasm by Reading In-between[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 1010-1020.
[30] Liang B, Lou C W, Li X, et al. Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. 2021: 4707-4715.