[Objective]To address the low prediction accuracy and the difficulty of fusing multimodal features in existing multimodal sarcasm detection models, this paper designs an SC-attention fusion mechanism.
[Methods]CLIP and RoBERTa are used to extract features from three modalities: image, image attributes, and text. The SC-attention mechanism combines SENet's attention mechanism with a co-attention mechanism to fuse the multimodal features, using the original modal features to guide the allocation of attention weights. Finally, the fused features are fed into a fully connected layer for sarcasm detection.
[Results]Experiments show that multimodal sarcasm detection based on the SC-attention mechanism achieves an accuracy of 93.71% and an F1 score of 91.89%. Compared with models evaluated on the same dataset, accuracy increases by 10.27% and F1 by 11.5%.
[Limitations]The model's generalization ability still needs to be verified on more datasets.
[Conclusions]The proposed model reduces information redundancy and feature loss, and effectively improves the accuracy of multimodal sarcasm detection.
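The fusion step described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the layer layout, so the way the SE-style gate and the co-attention step are composed here is an assumption, as are the names `se_gate` and `sc_attention_sketch`, the weight matrices `w1` and `w2`, and the random vectors that stand in for CLIP and RoBERTa features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def se_gate(feats, w1, w2):
    """SE-style gate (assumed layout): squeeze, ReLU layer, sigmoid layer."""
    squeezed = [sum(f) / len(f) for f in feats]            # one scalar per modality
    hidden = [max(0.0, dot(row, squeezed)) for row in w1]  # reduction layer + ReLU
    return [sigmoid(dot(row, hidden)) for row in w2]       # one gate in (0, 1) per modality


def sc_attention_sketch(feats, w1, w2):
    """Gate each modality, then let the original features guide co-attention."""
    gates = se_gate(feats, w1, w2)
    gated = [[g * x for x in f] for g, f in zip(gates, feats)]
    scale = math.sqrt(len(feats[0]))
    fused = []
    for query in feats:
        # Attention weights come from the ORIGINAL (ungated) features.
        weights = softmax([dot(query, key) / scale for key in feats])
        attended = [
            sum(w * g[i] for w, g in zip(weights, gated))
            for i in range(len(query))
        ]
        fused.extend(attended)  # concatenate the three attended vectors
    return fused, gates


# Hypothetical stand-ins for image, image-attribute, and text features.
random.seed(0)
dim, hidden_dim = 4, 2
feats = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(hidden_dim)]
w2 = [[random.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(3)]

fused, gates = sc_attention_sketch(feats, w1, w2)
print(len(fused), [round(g, 3) for g in gates])
```

In this sketch the concatenated vector `fused` plays the role of the input to the fully connected classification layer; in a trained model `w1` and `w2` would be learned rather than random.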
Chen Yuanyuan, Ma Jing. Research on Multimodal Sarcasm Detection Based on SC-attention Mechanism[J]. Data Analysis and Knowledge Discovery, DOI: 10.11925/infotech.2096-3467.2021-1362.