Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (3): 85-97     https://doi.org/10.11925/infotech.2096-3467.2023.0021
Research Paper
Topic Detecting on Multimodal News Data Based on Deep Learning*
Ni Liang1, Wu Peng2 (corresponding author), Zhou Xueqing3
1School of Cyber Science & Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
2School of Intelligent Manufacturing, Nanjing University of Science & Technology, Nanjing 210094, China
3School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract

[Objective] This paper builds a multimodal topic model for news that combines text and images, using multimodal learning methods to uncover latent topics automatically. [Methods] We use a topic model that incorporates word embeddings to model topics from both images and text, and fuse features through multimodal joint representation learning and coordinated representation learning. We then visualize the discovered multimodal news topics and conduct an empirical study on the N15News dataset. [Results] Compared with Label-ETM, which uses only text features, the multimodal topic modeling approach achieves better topic interpretability and diversity, which suggests that the approach is feasible and reasonable. [Limitations] We assume that the images and text in a news item are semantically and topically related; the multimodal fusion methods still need improvement for weakly related and unrelated cases. [Conclusions] Multimodal topic modeling can discover connections between data of different modalities and improve the diversity of the discovered topics.

Key words: Topic Model; Multi-Modal Joint Representation; Multi-Modal Coordinate Representation; Topic Detecting for News
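Received: 2023-01-08      Published: 2023-05-08
CLC Number (ZTFLH): TP393; G250
Funding: * National Natural Science Foundation of China (72274096); National Natural Science Foundation of China (71774084); Jiangsu Province Qing Lan Project for Excellent Teaching Teams ([2020]10)
Corresponding author: Wu Peng, ORCID: 0000-0001-7066-5487, E-mail: wupeng@njust.edu.cn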
Cite this article:
Ni Liang, Wu Peng, Zhou Xueqing. Topic Detecting on Multimodal News Data Based on Deep Learning. Data Analysis and Knowledge Discovery, 2024, 8(3): 85-97.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0021      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I3/85
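Fig.1  Overall structure of ETM
Fig.2  Research framework for deep-learning-based multimodal topic modeling
Fig.3  Multimodal topic model based on joint representation
Fig.4  Multimodal topic model based on coordinated representation
Fig.5  Example of SIFT feature points
Fig.6  Visual dictionary construction process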
Statistic    Number of SIFT feature points
Mean         898.66
Maximum      3455.00
Minimum      47.00
Table 1  Distribution of the number of SIFT feature points per image
Pretrained model    Feature vector dimension    Feature descriptors per image
VGG19               4096                        4 (2×2)
ResNet50            2048                        4 (2×2)
Table 2  Image feature extraction results based on deep neural networks
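Columns: Model; VAE-based; Randomly initialized word embeddings; Pretrained word embeddings; Topic embeddings mapped from word embeddings
LDA[11] × \ \ \
NVDM-GSM[35] × ×
ETM[10] × ×
Label-ETM × ×
Word2Vec-ETM ×
Table 3  Comparison design for verifying the effectiveness of word embeddings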
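Candidate text features    Candidate image features    Candidate topic models
Skip-Gram                  SIFT                        Label-ETM
Fast-Text                  VGG19                       Word2Vec-ETM
                           ResNet50
Table 4  Comparison experiment scheme for verifying the optimal feature combination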
Model           TC                TD (top 25 words)
                K=50    K=200     K=50    K=200
LDA             0.115   0.127     0.453   0.315
NVDM-GSM        0.180   0.132     0.414   0.119
ETM             0.165   0.149     0.402   0.164
Label-ETM       0.183   0.145     0.722   0.372
Word2Vec-ETM    0.186   0.150     0.758   0.464
Table 5  Text topic modeling results
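The TD columns in Table 5 are computed over the top 25 words of each topic. As a minimal sketch, assuming TD follows the topic diversity definition used with ETM [10] (the share of unique words among the top-ranked words of all topics), the metric could be computed as below. The function name `topic_diversity` and the toy topics are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_n: int = 25) -> float:
    """Fraction of unique words among the top_n words of all topics.

    Assumption: each inner sequence lists a topic's words in descending
    order of probability, so topic[:top_n] is that topic's top words.
    """
    top_words = [word for topic in topics for word in topic[:top_n]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics (illustration only, not results from the paper).
toy_topics = [
    ["election", "vote", "senate", "party"],
    ["game", "team", "season", "party"],
]

# "party" appears in both topics, so 7 of the 8 top words are unique.
print(topic_diversity(toy_topics, top_n=4))
```

A value close to 1 means the topics share few top words; a value close to 0 means the topics are largely redundant.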
Comparison scheme                                 TC                           TD (top 25 words)
Word embedding  Image feature  Baseline model     Text     Image    Average    Text    Image   Average
Skip-Gram       SIFT           Label-ETM           0.096    0.064    0.078     0.214   0.678   0.479
Skip-Gram       VGG19          Label-ETM           0.164   -0.995   -0.424     0.486   0.142   0.311
Skip-Gram       ResNet50       Label-ETM           0.162   -0.997   -0.426     0.494   0.057   0.272
Skip-Gram       SIFT           Word2Vec-ETM       -0.021   -0.012   -0.016     0.120   0.118   0.119
Skip-Gram       VGG19          Word2Vec-ETM        0.111   -1.000   -0.452     0.020   0.020   0.020
Skip-Gram       ResNet50       Word2Vec-ETM        0.106   -1.000   -0.455     0.020   0.020   0.020
Fast-Text       SIFT           Label-ETM           0.040    0.066    0.055     0.242   0.742   0.527
Fast-Text       VGG19          Label-ETM           0.154   -0.991   -0.427     0.518   0.278   0.396
Fast-Text       ResNet50       Label-ETM           0.154   -0.998   -0.431     0.520   0.033   0.273
Table 6  Multimodal topic modeling results based on joint representation
Comparison scheme                                   Image-to-text retrieval     Text-to-image retrieval     Mean recall/%
Word embedding  Image feature  Text topic model     R@1/%   R@5/%   R@10/%      R@1/%   R@5/%   R@10/%
Skip-Gram       VGG19          Label-ETM            3.8     18.9    30.2        3.0     13.2    25.3        15.73
Skip-Gram       ResNet50       Label-ETM            3.8     17.0    43.4        3.4     14.3    26.4        18.05
Skip-Gram       VGG19          Word2Vec-ETM         5.7     20.8    30.2        3.4     11.3    22.3        15.62
Skip-Gram       ResNet50       Word2Vec-ETM         11.3    17.0    35.8        4.2     12.1    21.1        16.92
Fast-Text       VGG19          Label-ETM            0.0     11.3    26.4        1.1     12.5    22.3        12.27
Fast-Text       ResNet50       Label-ETM            5.7     20.8    32.1        3.8     12.5    23.8        16.45
Fast-Text       VGG19          Word2Vec-ETM         3.8     13.2    22.6        1.5     12.8    23.8        12.95
Fast-Text       ResNet50       Word2Vec-ETM         1.9     11.3    30.2        2.6     14.3    27.9        14.70
Table 7  Multimodal topic modeling results based on coordinated representation
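Table 7 reports R@1, R@5 and R@10 for both retrieval directions, plus a mean recall that is consistent with the plain average of the six values in each row. The sketch below illustrates one common way to compute these quantities, assuming a square similarity matrix in which query i matches candidate i. The function names and the toy matrix are hypothetical, and the paper's own evaluation code may differ.

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Percentage of queries whose matching item ranks in the top k.

    Assumption: similarity[i, j] scores query i against candidate j,
    and the correct candidate for query i has index i.
    """
    # Candidate indices for each query, sorted by descending similarity.
    ranking = np.argsort(-similarity, axis=1)
    targets = np.arange(similarity.shape[0])[:, None]
    hits = (ranking[:, :k] == targets).any(axis=1)
    return 100.0 * hits.mean()


def mean_recall(values):
    """Plain average of the recall values from both retrieval directions."""
    return sum(values) / len(values)


# Hypothetical image-to-text similarity matrix (illustration only).
sim = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
])
print(recall_at_k(sim, k=1))    # image-to-text direction
print(recall_at_k(sim.T, k=1))  # text-to-image direction uses the transpose

# Averaging the six recall values from the first row of Table 7
# reproduces the reported mean recall after rounding to two decimals.
print(round(mean_recall([3.8, 18.9, 30.2, 3.0, 13.2, 25.3]), 2))
```

Transposing the matrix swaps the roles of queries and candidates, so one function covers both retrieval directions.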
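Fig.7  Image features extracted by VGG19 convolutional layers at different depths
Fig.8  Image features extracted by ResNet50 convolutional layers at different depths
Fig.9  Example visualization of multimodal news topics based on joint representation
Fig.10  Example visualization of multimodal news topics based on coordinated representation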
[1] Mehrotra R, Sanner S, Buntine W, et al. Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling[C]// Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2013: 889-892.
[2] Kim E H J, Jeong Y K, Kim Y, et al. Topic-Based Content and Sentiment Analysis of Ebola Virus on Twitter and in the News[J]. Journal of Information Science, 2016, 42(6): 763-781.
doi: 10.1177/0165551515608733
[3] Xue J, Chen J X, Hu R, et al. Twitter Discussions and Emotions about the COVID-19 Pandemic: Machine Learning Approach[J]. Journal of Medical Internet Research, 2020, 22(11): e20550.
doi: 10.2196/20550
[4] Fang Y X, Zhang H X, Ren Y W. Unsupervised Cross-Modal Retrieval via Multi-Modal Graph Regularized Smooth Matrix Factorization Hashing[J]. Knowledge-Based Systems, 2019, 171: 69-80.
doi: 10.1016/j.knosys.2019.02.004
[5] Hu P, Peng D Z, Wang X, et al. Multimodal Adversarial Network for Cross-Modal Retrieval[J]. Knowledge-Based Systems, 2019, 180: 38-50.
doi: 10.1016/j.knosys.2019.05.017
[6] Li Y Q, Zhang K, Wang J Y, et al. A Cognitive Brain Model for Multimodal Sentiment Analysis Based on Attention Neural Networks[J]. Neurocomputing, 2021, 430: 159-173.
doi: 10.1016/j.neucom.2020.10.021
[7] Hong D, Gao L, Yokoya N, et al. More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(5): 4340-4354.
doi: 10.1109/TGRS.2020.3016820
[8] Li C L, Duan Y, Wang H R, et al. Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings[J]. ACM Transactions on Information Systems, 2017, 36(2): Article No.11.
[9] Wang J, He K J, Yang M. Topic Discovery by Spectral Decomposition and Clustering with Coordinated Global and Local Contexts[J]. International Journal of Machine Learning and Cybernetics, 2020, 11(11): 2475-2487.
doi: 10.1007/s13042-020-01133-3
[10] Dieng A B, Ruiz F J R, Blei D M. Topic Modeling in Embedding Spaces[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 439-453.
doi: 10.1162/tacl_a_00325
[11] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[12] Yan X H, Guo J F, Lan Y Y, et al. A Biterm Topic Model for Short Texts[C]// Proceedings of the 22nd International Conference on World Wide Web. New York: ACM, 2013: 1445-1456.
[13] Blei D M, Lafferty J D. Correlated Topic Models[C]// Proceedings of the 18th International Conference on Neural Information Processing Systems. New York: ACM, 2005: 147-154.
[14] Blei D M, Lafferty J D. Dynamic Topic Models[C]// Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 113-120.
[15] Zhu P P, Zhang L P, Wang Y B, et al. Projection Learning with Local and Global Consistency Constraints for Scene Classification[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 144: 202-216.
doi: 10.1016/j.isprsjprs.2018.07.004
[16] Li Y, Nair P, Wen Z, et al. Global Surveillance of COVID-19 by Mining News Media Using a Multi-Source Dynamic Embedded Topic Model[C]// Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. New York: ACM, 2020: Article No.34.
[17] Harandizadeh B, Priniski J H, Morstatter F. Keyword Assisted Embedded Topic Model[C]// Proceedings of the 15th ACM International Conference on Web Search and Data Mining. New York: ACM, 2022: 372-380.
[18] Bhatia S, Lau J H, Baldwin T. Topic Intrusion for Automatic Topic Model Evaluation[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2018: 844-849.
[19] Baltrušaitis T, Ahuja C, Morency L P. Multimodal Machine Learning: A Survey and Taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.
doi: 10.1109/TPAMI.2018.2798607 pmid: 29994351
[20] Du Pengfei, Li Xiaoyong, Gao Yali. Survey on Multimodal Visual Language Representation Learning[J]. Journal of Software, 2021, 32(2): 327-348. (in Chinese)
[21] Jian Songlei, Lu Kai. Survey on Representation Learning of Complex Heterogeneous Data[J]. Computer Science, 2020, 47(2): 1-9. (in Chinese)
doi: 10.11896/jsjkx.190600180
[22] Bougiatiotis K, Giannakopoulos T. Enhanced Movie Content Similarity Based on Textual, Auditory and Visual Information[J]. Expert Systems with Applications, 2018, 96: 86-102.
doi: 10.1016/j.eswa.2017.11.050
[23] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1122-1131.
[24] Du Y P, Liu Y, Peng Z, et al. Gated Attention Fusion Network for Multimodal Sentiment Classification[J]. Knowledge-Based Systems, 2022, 240: 108107.
doi: 10.1016/j.knosys.2021.108107
[25] Karpathy A, Li F F. Deep Visual-Semantic Alignments for Generating Image Descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 664-676.
doi: 10.1109/TPAMI.2016.2598339 pmid: 27514036
[26] Song G L, Wang S H, Huang Q M, et al. Harmonized Multimodal Learning with Gaussian Process Latent Variable Models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 858-872.
doi: 10.1109/TPAMI.34
[27] Li Z Y, Lu H B, Fu H, et al. Image-Text Bidirectional Learning Network Based Cross-Modal Retrieval[J]. Neurocomputing, 2022, 483: 148-159.
doi: 10.1016/j.neucom.2022.02.007
[28] Qian S S, Zhang T Z, Xu C S, et al. Multi-Modal Event Topic Model for Social Event Analysis[J]. IEEE Transactions on Multimedia, 2016, 18(2): 233-246.
doi: 10.1109/TMM.2015.2510329
[29] Liu Z, Zhang C M, Chen C X. MMDF-LDA: An Improved Multi-Modal Latent Dirichlet Allocation Model for Social Image Annotation[J]. Expert Systems with Applications, 2018, 104: 168-184.
doi: 10.1016/j.eswa.2018.03.014
[30] Wang Z, Shan X, Zhang X X, et al. N24News: A New Dataset for Multimodal News Classification[C]// Proceedings of the 13th Language Resources and Evaluation Conference. 2022: 6768-6775.
[31] Chang J, Boyd-Graber J, Gerrish S, et al. Reading Tea Leaves: How Humans Interpret Topic Models[C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems. New York: ACM, 2009: 288-296.
[32] Lau J H, Newman D, Baldwin T. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality[C]// Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2014: 530-539.
[33] Li J N, Selvaraju R R, Gotmare A D, et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation[J]. Advances in Neural Information Processing Systems, 2021, 34: 9694-9705.
[34] Radford A, Kim J W, Hallacy C, et al. Learning Transferable Visual Models from Natural Language Supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: ACM, 2021: 8748-8763.
[35] Miao Y S, Grefenstette E, Blunsom P. Discovering Discrete Latent Topics with Neural Variational Inference[C]// Proceedings of the 34th International Conference on Machine Learning. New York: ACM, 2017: 2410-2419.