Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (5): 91-101     https://doi.org/10.11925/infotech.2096-3467.2023.0026
Research Paper
Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism
Lyu Xueqiang1,Tian Chi1,Zhang Le1(),Du Yifan1,Zhang Xu1,Cai Zangtai2
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China

Abstract

[Objective] This paper proposes a multimodal sentiment analysis model integrating multiple features and attention mechanisms, addressing the insufficient extraction of multimodal features and the inadequate combination of intra-modal and inter-modal information in existing models. [Methods] In multimodal feature extraction, we added the body-movement, gender, and age features of the people in the video modality. For the text modality, we fused BERT-based character-level semantic vectors with word-level semantic vectors enriched with sememe information, thereby enriching the low-level features of the multimodal data. We utilized self-attention and cross-modal attention mechanisms to fully combine intra-modal and inter-modal information. We then concatenated the modality features, assigned attention weights to them with a soft attention mechanism, and generated the final sentiment classification results through fully connected layers. [Results] We examined the proposed model on the public CH-SIMS dataset and on the Hot Public Opinion Comments Videos (HPOC) dataset constructed in this paper. Compared with the Self-MM model, our model improved binary classification accuracy, tri-class classification accuracy, and F1 score by 1.83, 1.74, and 0.69 percentage points on the CH-SIMS dataset, and by 1.03, 0.94, and 0.79 percentage points on the HPOC dataset. [Limitations] The scenes in which people appear may change constantly, and different scenes may carry different emotional information; our model does not integrate this scene information. [Conclusions] The proposed model enriches the low-level features of multimodal data, fully combines intra-modal and inter-modal information, and effectively improves the performance of sentiment analysis.
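To make the described pipeline concrete, the following is a minimal PyTorch sketch of the fusion stages summarized above: self-attention within a modality, cross-modal attention between modalities, soft-attention weighting of the concatenated modality features, and a fully connected classifier. The class names, the 50-dimensional features with 10 heads (taken from Table 3), and the single text-attends-to-video pair are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the fusion stages described in the abstract. The single
# text-to-video attention pair stands in for the fuller set of modality pairs.
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """Assigns a soft attention weight to each modality's feature vector."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_modalities, dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (batch, n, 1)
        return (weights * feats).flatten(1)                 # (batch, n*dim)

class MFAMSketch(nn.Module):
    def __init__(self, dim: int = 50, heads: int = 10, num_classes: int = 3):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fusion = SoftAttentionFusion(dim)
        self.classifier = nn.Linear(3 * dim, num_classes)   # tri-class setting

    def forward(self, text, audio, video):
        # Intra-modal information: self-attention over the text sequence
        # (audio and video would be treated analogously).
        text, _ = self.self_attn(text, text, text)
        # Inter-modal information: text queries attend to video keys/values.
        t2v, _ = self.cross_attn(text, video, video)
        # Mean-pool each modality to one vector, stack, weight, and classify.
        feats = torch.stack([t2v.mean(1), audio.mean(1), video.mean(1)], dim=1)
        return self.classifier(self.fusion(feats))

# Example: batch of 4, sequence lengths 20/30/25, 50-d features per step.
model = MFAMSketch()
logits = model(torch.randn(4, 20, 50), torch.randn(4, 30, 50), torch.randn(4, 25, 50))
```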

Key words: Multi-features; Multi-modal; Sentiment Analysis; Attention Mechanism
Received: 2023-01-11      Published: 2024-05-27
CLC Number: TP391
Funding: Key Project of the State Language Commission (ZDI145-10); General Program of the Beijing Municipal Education Commission Science and Technology Plan (KM202311232001); Special Project for Innovation Platform Construction of Qinghai Province (2022-ZJ-T02)
Corresponding author: Zhang Le, ORCID: 0000-0002-9620-511X, E-mail: zhangle@bistu.edu.cn.
Cite this article:
Lyu Xueqiang, Tian Chi, Zhang Le, Du Yifan, Zhang Xu, Cai Zangtai. Multimodal Sentiment Analysis Model Integrating Multi-features and Attention Mechanism. Data Analysis and Knowledge Discovery, 2024, 8(5): 91-101.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0026      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I5/91
Fig.1  Architecture of MFAM
Fig.2  Structure of the cross-modal attention module
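For readers without access to the figure, a cross-modal attention block of the kind depicted in Fig.2 can be sketched as follows: the target modality supplies the query, and the source modality supplies the keys and values. The residual and feed-forward arrangement below follows the standard Transformer-style layout (as in MulT [13]) and is an assumption about the figure, not a reproduction of it.

```python
# Sketch of a cross-modal attention block: the target modality (e.g., text)
# queries the source modality (e.g., video). The exact layout of Fig.2 may
# differ; dimensions follow Table 3 (50-d features, 10 heads, dropout 0.3).
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim: int = 50, heads: int = 10, dropout: float = 0.3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # Queries come from the target sequence; keys and values from the source.
        attended, _ = self.attn(self.norm1(target), source, source)
        target = target + attended                     # residual over attention
        return target + self.ffn(self.norm2(target))   # residual over feed-forward
```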
Item         CH-SIMS              HPOC
             Train  Valid  Test   Train  Valid  Test
Utterances   1,368  456    457    350    119    119
Positive     419    139    140    125    33     47
Neutral      207    69     69     87     32     37
Negative     742    248    248    138    54     35
Table 1  Basic information of the datasets
Environment  Configuration
OS           Linux
CPU          Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
GPU          Tesla V100
Python       3.8.13
PyTorch      1.12.1
CUDA         11.4
Table 2  Experimental environment
Parameter                        Value
Cross-modal attention dimension  50
Cross-modal attention heads      10
Optimizer                        Adam
Epochs                           20
Learning_rate                    0.002
Dropout                          0.3
Early_stop                       8
Batch_size                       16
Table 3  Experimental parameter settings
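As a usage illustration, the settings in Table 3 translate into a training loop along the following lines; `model`, `train_loader`, and `evaluate` are assumed placeholders, not code from the paper.

```python
# Training-loop sketch wired to Table 3: Adam with learning rate 0.002,
# up to 20 epochs, and early stopping with patience 8. `model`,
# `train_loader` (batch_size=16), and `evaluate` are assumed to exist.
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
best_metric, patience, stale_epochs = 0.0, 8, 0

for epoch in range(20):
    model.train()
    for text, audio, video, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(text, audio, video), labels)
        loss.backward()
        optimizer.step()
    metric = evaluate(model)          # e.g., validation accuracy
    if metric > best_metric:
        best_metric, stale_epochs = metric, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:  # stop after 8 epochs without improvement
            break
```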
Model     Acc-2/%  Acc-3/%  F1-Score/%  MAE
EF-LSTM 69.37 54.27 56.82 0.590
TFN 78.38 65.12 78.62 0.432
MFN 77.90 65.73 77.88 0.435
MulT 78.56 64.77 79.66 0.453
MISA 79.43 64.55 79.70 0.428
Self-MM 80.04 65.47 80.44 0.425
MFAM 81.87 67.21 81.13 0.416
Table 4  Experimental results of different models on the CH-SIMS dataset
Model     Acc-2/%  Acc-3/%  F1-Score/%  MAE
EF-LSTM 63.26 49.50 50.17 0.632
TFN 72.45 57.14 71.85 0.593
MFN 73.38 57.65 72.04 0.589
MulT 73.32 57.43 71.90 0.591
MISA 74.03 58.21 73.26 0.578
Self-MM 74.37 58.52 73.73 0.564
MFAM 75.40 59.46 74.52 0.560
Table 5  Experimental results of different models on the HPOC dataset
No.  Model  Acc-2/%  Acc-3/%  F1-Score/%  MAE
1 L-A 80.41 66.47 79.71 0.435
2 L-V 79.36 65.72 78.84 0.447
3 w/o Pose 80.92 66.64 79.93 0.435
4 w/o Gender 81.13 66.73 80.23 0.426
5 w/o Age 81.27 66.85 80.45 0.423
6 w/o P_G_A 80.35 66.39 79.48 0.444
7 w/o Sememe 81.05 66.70 79.95 0.432
8 w/o Cross-attention 80.18 65.96 79.26 0.446
9 w/o Soft-attention 81.35 67.03 80.56 0.421
10 MFAM 81.87 67.21 81.13 0.416
Table 6  Ablation experiment results on the CH-SIMS dataset
[1] Gönen M, Alpaydin E. Multiple Kernel Learning Algorithms[J]. Journal of Machine Learning Research, 2011, 12: 2211-2268.
[2] Ghahramani Z. Learning Dynamic Bayesian Networks[M]//Adaptive Processing of Sequences and Data Structures. Berlin, Heidelberg: Springer, 1998: 168-197.
[3] Chen Y. Convolutional Neural Network for Sentence Classification[D]. Waterloo: University of Waterloo, 2015.
[4] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[5] Tang D Y, Qin B, Liu T. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1422-1432.
[6] Poria S, Cambria E, Hazarika D, et al. Context-Dependent Sentiment Analysis in User-Generated Videos[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 873-883.
[7] Williams J, Kleinegesse S, Comanescu R, et al. Recognizing Emotions in Video Using Multimodal DNN Feature Fusion[C]// Proceedings of Grand Challenge and Workshop on Human Multimodal Language. 2018: 11-19.
[8] Zadeh A, Chen M H, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 1103-1114.
[9] Hou M, Tang J J, Zhang J H, et al. Deep Multimodal Multilinear Fusion with High-Order Polynomial Pooling[C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019: 12156-12166.
[10] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2247-2256.
[11] Ma Chao, Li Gang, Chen Sijing, et al. Research on Usefulness Recognition of Tourism Online Reviews Based on Multimodal Data Semantic Fusion[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(2): 199-207.
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[13] Tsai Y H H, Bai S J, Liang P P, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[14] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[15] Zhang Yu, Zhang Haijun, Liu Yaqing, et al. Multimodal Sentiment Analysis Based on Bidirectional Mask Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2023, 7(4): 46-55.
[16] Lindsay P H, Norman D A. Human Information Processing: An Introduction to Psychology[M]. London: Academic Press, 2013.
[17] Zhang Feng, Li Xicheng, Dong Chunru, et al. Deep Emotional Arousal Network for Multimodal Sentiment Analysis and Emotion Recognition[J]. Control and Decision, 2022, 37(11): 2984-2992.
[18] Wang Xuyang, Dong Shuai, Shi Jie. Multimodal Sentiment Analysis with Composite Hierarchical Fusion[J]. Journal of Frontiers of Computer Science & Technology, 2023, 17(1): 198-208.
[19] McFee B, Raffel C, Liang D W, et al. librosa: Audio and Music Signal Analysis in Python[C]// Proceedings of the 14th Python in Science Conference. 2015: 18-25.
[20] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[21] Niu Y L, Xie R B, Liu Z Y, et al. Improved Word Representation Learning with Sememes[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 2049-2058.
[22] Yu W M, Xu H, Meng F Y, et al. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-Grained Annotation of Modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3718-3727.
[23] Baltrusaitis T, Zadeh A, Lim Y C, et al. OpenFace 2.0: Facial Behavior Analysis Toolkit[C]// Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition. 2018: 59-66.
[24] Xu Y F, Zhang J, Zhang Q M, et al. ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation [OL]. arXiv Preprint, arXiv: 2204.12484.
[25] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context[C]// Proceedings of the 13th European Conference on Computer Vision. 2014: 740-755.
[26] Park S, Shim H S, Chatterjee M, et al. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach[C]// Proceedings of the 16th ACM International Conference on Multimodal Interaction. 2014: 50-57.
[27] Zeng Y, Mai S, Hu H F. Which is Making the Contribution: Modulating Unimodal and Cross-Modal Dynamics for Multimodal Sentiment Analysis[C]// Findings of the Association for Computational Linguistics: EMNLP 2021. 2021: 1262-1274.
[28] Zadeh A, Liang P P, Mazumder N, et al. Memory Fusion Network for Multi-view Sequential Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018: 5634-5641.
[29] Yu W M, Xu H, Yuan Z Q, et al. Learning Modality-Specific Representations with Self-Supervised Multi-task Learning for Multimodal Sentiment Analysis[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021: 10790-10797.