Topic Detecting on Multimodal News Data Based on Deep Learning
Ni Liang1, Wu Peng2, Zhou Xueqing3
1 School of Cyber Science & Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
2 School of Intelligent Manufacturing, Nanjing University of Science & Technology, Nanjing 210094, China
3 School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
Abstract [Objective] This paper constructs a multimodal topic model that combines news text and images using multimodal learning methods, aiming to uncover latent topics in news automatically. [Methods] We constructed a topic model that incorporates word embeddings and draws on both text and images, fusing their features through multimodal joint representation learning and coordinated representation learning. We then visualized the discovered multimodal news topics and evaluated the model on the N15News dataset. [Results] Compared with Label-ETM, which uses only text features, the multimodal topic model achieves better topic interpretability and diversity, which suggests that multimodal topic modeling is feasible. [Limitations] The model assumes that the images and text in a news item are semantically and thematically related; the fusion methods need improvement for domains where the two are weakly related or unrelated. [Conclusions] Multimodal topic modeling can discover connections between data of different modalities and improve the diversity of the discovered topics.
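The abstract reports gains in topic diversity but does not say how diversity is measured. As a minimal sketch, assuming the definition commonly used in the embedded topic model literature (the fraction of unique words among the top-k words of all topics), the metric could be computed as follows. The function name `topic_diversity`, the default `top_k=25`, and the toy topics are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 25) -> float:
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition (not confirmed by the abstract): 1.0 means no word is
    shared between topics; values near 0 mean the topics are redundant.
    Each topic is a list of words sorted from most to least probable.
    """
    if not topics:
        raise ValueError("need at least one topic")
    # Pool the top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics contain no words")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics, not taken from the paper.
toy_topics = [
    ["game", "team", "season", "coach"],
    ["film", "actor", "director", "season"],
]
print(topic_diversity(toy_topics, top_k=4))  # 7 unique words out of 8 -> 0.875
```

Under this definition, a model whose topics repeat the same high-frequency words scores low, so a higher score for the multimodal model would indicate that image features help separate topics that text alone merges.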
Received: 08 January 2023
Published: 08 May 2023
Fund: National Natural Science Foundation of China (72274096); National Natural Science Foundation of China (71774084); Qing Lan Project in Jiangsu Universities ([2020]10)
Corresponding Author: Wu Peng, ORCID: 0000-0001-7066-5487, E-mail: wupeng@njust.edu.cn