Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (3): 85-97    DOI: 10.11925/infotech.2096-3467.2023.0021
Topic Detecting on Multimodal News Data Based on Deep Learning
Ni Liang1, Wu Peng2, Zhou Xueqing3
1School of Cyber Science & Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
2School of Intelligent Manufacturing, Nanjing University of Science & Technology, Nanjing 210094, China
3School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract  

[Objective] This paper constructs a multimodal topic model that combines the text and images of news articles, based on multimodal learning methods, in order to uncover latent news topics automatically. [Methods] We built a model that uses word embeddings to learn topics from both text and images. Features are fused through multimodal joint representation learning and multimodal coordinate representation learning. We then visualized the discovered multimodal news topics and evaluated the model on the N15News dataset. [Results] Compared with Label-ETM, which uses only text features, the multimodal topic modeling approach achieves better topic interpretability and diversity, which suggests that the approach is feasible. [Limitations] We assume that the images and text in a news article are semantically and thematically related. The multimodal fusion methods need to be improved for cases where the two are only weakly related or unrelated. [Conclusions] Multimodal topic modeling can discover connections between data of different modalities and improve the diversity of the discovered topics.

Key words: Topic Model; Multi-Modal Joint Representation; Multi-Modal Coordinate Representation; Topic Detecting for News
Received: 08 January 2023      Published: 08 May 2023
ZTFLH: TP393; G250
Fund: National Natural Science Foundation of China (72274096); National Natural Science Foundation of China (71774084); Qing Lan Project in Jiangsu Universities ([2020]10)
Corresponding Author: Wu Peng, ORCID: 0000-0001-7066-5487, E-mail: wupeng@njust.edu.cn

Cite this article:

Ni Liang, Wu Peng, Zhou Xueqing. Topic Detecting on Multimodal News Data Based on Deep Learning. Data Analysis and Knowledge Discovery, 2024, 8(3): 85-97.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0021     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I3/85

Overall Structure of the ETM
Framework of Multi-modal Topic Modeling Based on Deep Learning
Multi-modal Topic Model Based on Joint Representation
Multi-modal Topic Model Based on Coordinate Representation
Example of SIFT Feature Points
Visual Dictionary Construction Process
Statistic | Number of SIFT feature points
Mean | 898.66
Maximum | 3,455.00
Minimum | 47.00
Distribution of the Number of SIFT Feature Points in the Image
Pretrained model | Feature vector dimension | Feature descriptors per image
VGG19 | 4,096 | 4 (2×2)
ResNet50 | 2,048 | 4 (2×2)
Image Feature Extraction Results Based on Deep Neural Network
Model | VAE-based | Randomly initialized word embeddings | Pretrained word embeddings | Topic embeddings mapped from word embeddings
LDA[11] × \ \ \
NVDM-GSM[35] × ×
ETM[10] × ×
Label-ETM × ×
Word2Vec-ETM ×
Comparative Design for Verifying the Effectiveness of Word Embeddings
Candidate text features | Candidate image features | Candidate topic models
Skip-Gram | SIFT | Label-ETM
Fast-Text | VGG19 | Word2Vec-ETM
 | ResNet50 | 
Comparative Experimental Scheme for Verifying the Optimal Feature Combination
Model | TC (K=50) | TC (K=200) | TD, top 25 words (K=50) | TD, top 25 words (K=200)
LDA | 0.115 | 0.127 | 0.453 | 0.315
NVDM-GSM | 0.180 | 0.132 | 0.414 | 0.119
ETM | 0.165 | 0.149 | 0.402 | 0.164
Label-ETM | 0.183 | 0.145 | 0.722 | 0.372
Word2Vec-ETM | 0.186 | 0.150 | 0.758 | 0.464
Results of Text Topic Modeling
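The TD column above reports topic diversity over the top 25 words of each topic. The following is a minimal sketch of that metric, assuming TD follows the definition in the ETM paper [10], namely the share of unique words among the top 25 words of all topics. The function name and the toy topics are illustrative and are not taken from the article.

```python
def topic_diversity(topics, top_n=25):
    """Share of unique words among the top_n words of all topics.

    `topics` is a list of word lists, each sorted by descending
    probability under its topic. A value near 1 means the topics
    rarely repeat words; a value near 0 means heavy redundancy.
    """
    top_words = [word for topic in topics for word in topic[:top_n]]
    if not top_words:
        raise ValueError("no topic words given")
    return len(set(top_words)) / len(top_words)


# Toy example (hypothetical topics, three words each).
toy_topics = [
    ["election", "vote", "senate"],
    ["vote", "court", "judge"],
]
diversity = topic_diversity(toy_topics, top_n=3)  # 5 unique words out of 6
```

Under this definition, a score such as 0.020 means that nearly every topic lists the same top words.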
Word embedding | Image feature | Baseline model | TC (text) | TC (image) | TC (average) | TD top-25 (text) | TD top-25 (image) | TD top-25 (average)
Skip-Gram | SIFT | Label-ETM | 0.096 | 0.064 | 0.078 | 0.214 | 0.678 | 0.479
Skip-Gram | VGG19 | Label-ETM | 0.164 | -0.995 | -0.424 | 0.486 | 0.142 | 0.311
Skip-Gram | ResNet50 | Label-ETM | 0.162 | -0.997 | -0.426 | 0.494 | 0.057 | 0.272
Skip-Gram | SIFT | Word2Vec-ETM | -0.021 | -0.012 | -0.016 | 0.120 | 0.118 | 0.119
Skip-Gram | VGG19 | Word2Vec-ETM | 0.111 | -1.000 | -0.452 | 0.020 | 0.020 | 0.020
Skip-Gram | ResNet50 | Word2Vec-ETM | 0.106 | -1.000 | -0.455 | 0.020 | 0.020 | 0.020
Fast-Text | SIFT | Label-ETM | 0.040 | 0.066 | 0.055 | 0.242 | 0.742 | 0.527
Fast-Text | VGG19 | Label-ETM | 0.154 | -0.991 | -0.427 | 0.518 | 0.278 | 0.396
Fast-Text | ResNet50 | Label-ETM | 0.154 | -0.998 | -0.431 | 0.520 | 0.033 | 0.273
Results of Multi-modal Topic Modeling Based on Joint Representation
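Several image-side TC values above sit at or near -1. The sketch below shows one way such values can arise, assuming TC is the NPMI-based topic coherence used in the ETM paper [10]; this page does not confirm the exact formula. The function name, the toy documents, and the handling of the degenerate case where a word pair occurs in every document are assumptions of this sketch.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents, top_n=10):
    """Average NPMI over all pairs of the top_n words of one topic.

    Probabilities are document-level co-occurrence frequencies.
    A pair that never co-occurs scores -1, the lower bound of NPMI.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        raise ValueError("no documents given")

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w_i, w_j in combinations(topic_words[:top_n], 2):
        p_ij = prob(w_i, w_j)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur
        elif p_ij == 1.0:
            scores.append(0.0)   # sketch convention: PMI is 0 here
        else:
            pmi = math.log(p_ij / (prob(w_i) * prob(w_j)))
            scores.append(pmi / -math.log(p_ij))
    if not scores:
        raise ValueError("need at least two topic words")
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): each document is a list of tokens.
toy_docs = [["a", "b"], ["a", "b"], ["c"], ["c", "a"]]
coherent = npmi_coherence(["a", "b"], toy_docs)  # words that co-occur
disjoint = npmi_coherence(["b", "c"], toy_docs)  # words that never co-occur
```

Under this assumption, a TC of -1.000 would mean that the top visual words of a topic never appear together in the same image.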
Word embedding | Image feature | Text topic model | Image-to-text R@1/% | Image-to-text R@5/% | Image-to-text R@10/% | Text-to-image R@1/% | Text-to-image R@5/% | Text-to-image R@10/% | Mean recall/%
Skip-Gram | VGG19 | Label-ETM | 3.8 | 18.9 | 30.2 | 3.0 | 13.2 | 25.3 | 15.73
Skip-Gram | ResNet50 | Label-ETM | 3.8 | 17.0 | 43.4 | 3.4 | 14.3 | 26.4 | 18.05
Skip-Gram | VGG19 | Word2Vec-ETM | 5.7 | 20.8 | 30.2 | 3.4 | 11.3 | 22.3 | 15.62
Skip-Gram | ResNet50 | Word2Vec-ETM | 11.3 | 17.0 | 35.8 | 4.2 | 12.1 | 21.1 | 16.92
Fast-Text | VGG19 | Label-ETM | 0.0 | 11.3 | 26.4 | 1.1 | 12.5 | 22.3 | 12.27
Fast-Text | ResNet50 | Label-ETM | 5.7 | 20.8 | 32.1 | 3.8 | 12.5 | 23.8 | 16.45
Fast-Text | VGG19 | Word2Vec-ETM | 3.8 | 13.2 | 22.6 | 1.5 | 12.8 | 23.8 | 12.95
Fast-Text | ResNet50 | Word2Vec-ETM | 1.9 | 11.3 | 30.2 | 2.6 | 14.3 | 27.9 | 14.70
Results of Multi-modal Topic Modeling Based on Coordinate Representation
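The retrieval results above use Recall@K (R@1, R@5, R@10) in both directions and a mean recall. The sketch below illustrates the standard Recall@K computation and checks that the mean recall of the first row equals the plain average of its six recall values. The function names, the toy rankings, and the assumption that each query has exactly one correct match are illustrative and not taken from the article.

```python
def recall_at_k(rankings, ground_truth, k):
    """Percentage of queries whose correct match is in the top k results.

    `rankings` maps a query id to its retrieved ids, best first.
    `ground_truth` maps a query id to its single correct id.
    """
    if not rankings:
        raise ValueError("no queries given")
    hits = sum(
        1 for query, ranked in rankings.items()
        if ground_truth[query] in ranked[:k]
    )
    return 100.0 * hits / len(rankings)


def mean_recall(recalls):
    """Plain average of a list of recall values."""
    return sum(recalls) / len(recalls)


# Toy image-to-text retrieval (hypothetical ids).
toy_rankings = {"img1": ["t3", "t1", "t2"], "img2": ["t2", "t1", "t3"]}
toy_truth = {"img1": "t1", "img2": "t2"}
r_at_1 = recall_at_k(toy_rankings, toy_truth, 1)  # only img2 is a hit
r_at_2 = recall_at_k(toy_rankings, toy_truth, 2)  # both are hits

# First row of the table: Skip-Gram, VGG19, Label-ETM.
first_row = [3.8, 18.9, 30.2, 3.0, 13.2, 25.3]
first_row_mean = round(mean_recall(first_row), 2)
```

The rounded average of the first row matches the 15.73 reported in the last column, which is consistent with the mean recall being an unweighted average over the six R@K values.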
Image Features Extracted by Convolutional Layers of Different Depths in VGG19
Image Features Extracted by Convolutional Layers of Different Depths in ResNet50
Multi-modal News Topic Visualization Results Based on Joint Representation
Multi-modal News Topic Visualization Results Based on Coordinate Representation
[1] Mehrotra R, Sanner S, Buntine W, et al. Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling[C]// Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2013: 889-892.
[2] Kim E H J, Jeong Y K, Kim Y, et al. Topic-Based Content and Sentiment Analysis of Ebola Virus on Twitter and in the News[J]. Journal of Information Science, 2016, 42(6): 763-781.
doi: 10.1177/0165551515608733
[3] Xue J, Chen J X, Hu R, et al. Twitter Discussions and Emotions about the COVID-19 Pandemic: Machine Learning Approach[J]. Journal of Medical Internet Research, 2020, 22(11): e20550.
doi: 10.2196/20550
[4] Fang Y X, Zhang H X, Ren Y W. Unsupervised Cross-Modal Retrieval via Multi-Modal Graph Regularized Smooth Matrix Factorization Hashing[J]. Knowledge-Based Systems, 2019, 171: 69-80.
doi: 10.1016/j.knosys.2019.02.004
[5] Hu P, Peng D Z, Wang X, et al. Multimodal Adversarial Network for Cross-Modal Retrieval[J]. Knowledge-Based Systems, 2019, 180: 38-50.
doi: 10.1016/j.knosys.2019.05.017
[6] Li Y Q, Zhang K, Wang J Y, et al. A Cognitive Brain Model for Multimodal Sentiment Analysis Based on Attention Neural Networks[J]. Neurocomputing, 2021, 430: 159-173.
doi: 10.1016/j.neucom.2020.10.021
[7] Hong D, Gao L, Yokoya N, et al. More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(5): 4340-4354.
doi: 10.1109/TGRS.2020.3016820
[8] Li C L, Duan Y, Wang H R, et al. Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings[J]. ACM Transactions on Information Systems, 2017, 36(2): Article No.11.
[9] Wang J, He K J, Yang M. Topic Discovery by Spectral Decomposition and Clustering with Coordinated Global and Local Contexts[J]. International Journal of Machine Learning and Cybernetics, 2020, 11(11): 2475-2487.
doi: 10.1007/s13042-020-01133-3
[10] Dieng A B, Ruiz F J R, Blei D M. Topic Modeling in Embedding Spaces[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 439-453.
doi: 10.1162/tacl_a_00325
[11] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[12] Yan X H, Guo J F, Lan Y Y, et al. A Biterm Topic Model for Short Texts[C]// Proceedings of the 22nd International Conference on World Wide Web. New York: ACM, 2013: 1445-1456.
[13] Blei D M, Lafferty J D. Correlated Topic Models[C]// Proceedings of the 18th International Conference on Neural Information Processing Systems. New York: ACM, 2005: 147-154.
[14] Blei D M, Lafferty J D. Dynamic Topic Models[C]// Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 113-120.
[15] Zhu P P, Zhang L P, Wang Y B, et al. Projection Learning with Local and Global Consistency Constraints for Scene Classification[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 144: 202-216.
doi: 10.1016/j.isprsjprs.2018.07.004
[16] Li Y, Nair P, Wen Z, et al. Global Surveillance of COVID-19 by Mining News Media Using a Multi-Source Dynamic Embedded Topic Model[C]// Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. New York: ACM, 2020: Article No.34.
[17] Harandizadeh B, Priniski J H, Morstatter F. Keyword Assisted Embedded Topic Model[C]// Proceedings of the 15th ACM International Conference on Web Search and Data Mining. New York: ACM, 2022: 372-380.
[18] Bhatia S, Lau J H, Baldwin T. Topic Intrusion for Automatic Topic Model Evaluation[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2018: 844-849.
[19] Baltrušaitis T, Ahuja C, Morency L P. Multimodal Machine Learning: A Survey and Taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.
doi: 10.1109/TPAMI.2018.2798607 pmid: 29994351
[20] Du Pengfei, Li Xiaoyong, Gao Yali. Survey on Multimodal Visual Language Representation Learning[J]. Journal of Software, 2021, 32(2): 327-348.
[21] Jian Songlei, Lu Kai. Survey on Representation Learning of Complex Heterogeneous Data[J]. Computer Science, 2020, 47(2): 1-9.
doi: 10.11896/jsjkx.190600180
[22] Bougiatiotis K, Giannakopoulos T. Enhanced Movie Content Similarity Based on Textual, Auditory and Visual Information[J]. Expert Systems with Applications, 2018, 96: 86-102.
doi: 10.1016/j.eswa.2017.11.050
[23] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1122-1131.
[24] Du Y P, Liu Y, Peng Z, et al. Gated Attention Fusion Network for Multimodal Sentiment Classification[J]. Knowledge-Based Systems, 2022, 240: 108107.
doi: 10.1016/j.knosys.2021.108107
[25] Karpathy A, Li F F. Deep Visual-Semantic Alignments for Generating Image Descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 664-676.
doi: 10.1109/TPAMI.2016.2598339 pmid: 27514036
[26] Song G L, Wang S H, Huang Q M, et al. Harmonized Multimodal Learning with Gaussian Process Latent Variable Models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 858-872.
doi: 10.1109/TPAMI.34
[27] Li Z Y, Lu H B, Fu H, et al. Image-Text Bidirectional Learning Network Based Cross-Modal Retrieval[J]. Neurocomputing, 2022, 483: 148-159.
doi: 10.1016/j.neucom.2022.02.007
[28] Qian S S, Zhang T Z, Xu C S, et al. Multi-Modal Event Topic Model for Social Event Analysis[J]. IEEE Transactions on Multimedia, 2016, 18(2): 233-246.
doi: 10.1109/TMM.2015.2510329
[29] Liu Z, Zhang C M, Chen C X. MMDF-LDA: An Improved Multi-Modal Latent Dirichlet Allocation Model for Social Image Annotation[J]. Expert Systems with Applications, 2018, 104: 168-184.
doi: 10.1016/j.eswa.2018.03.014
[30] Wang Z, Shan X, Zhang X X, et al. N24News: A New Dataset for Multimodal News Classification[C]// Proceedings of the 13th Language Resources and Evaluation Conference. 2022: 6768-6775.
[31] Chang J, Boyd-Graber J, Gerrish S, et al. Reading Tea Leaves: How Humans Interpret Topic Models[C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems. New York: ACM, 2009: 288-296.
[32] Lau J H, Newman D, Baldwin T. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality[C]// Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2014: 530-539.
[33] Li J N, Selvaraju R R, Gotmare A D, et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation[J]. Advances in Neural Information Processing Systems, 2021, 34: 9694-9705.
[34] Radford A, Kim J W, Hallacy C, et al. Learning Transferable Visual Models from Natural Language Supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: ACM, 2021: 8748-8763.
[35] Miao Y S, Grefenstette E, Blunsom P. Discovering Discrete Latent Topics with Neural Variational Inference[C]// Proceedings of the 34th International Conference on Machine Learning. New York: ACM, 2017: 2410-2419.