1 School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China 2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China 3 School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This study explores how to learn topic representations for hot events and investigates the performance of various topic representation models on tasks such as topic classification and topic relevance modeling. [Methods] Building on the LDA2Vec method, we propose W-LDA2Vec, a topic representation learning model. After jointly training the initial document and word vectors, the model predicts the context vectors of the central words. In this way, we obtain word representations that carry topic information and topic representations that carry context information. [Results] On the hot-event topic classification task, our model achieved the highest F1 score of 0.893, which is 0.314, 0.057, 0.022 and 0.013 higher than those of the four baseline models LDA, Word2Vec, TEWV and Doc2Vec, respectively. On the hot-event topic relevance modeling task, with the number of topics set to 10, our model achieved a correlation score of 0.4625, which is 0.0678 higher than that of the LDA model. [Limitations] The experimental corpus is limited to Chinese and English. [Conclusions] By embedding topic information into word and document representations, our model effectively improves the performance of topic classification and relevance modeling.
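The abstract describes predicting a central word's context from jointly trained document and word vectors, in the style of LDA2Vec. As a minimal illustrative sketch (not the paper's implementation), the vector used to predict a pivot word's context can be formed by adding the word vector to a document vector that is itself a softmax-weighted mixture of topic vectors; all names, dimensions, and the NumPy setup below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of an LDA2Vec-style context vector.
# Assumed (not from the paper): vocabulary/corpus sizes, embedding
# dimension, and random initialization of the three parameter matrices.
rng = np.random.default_rng(0)
n_words, n_docs, n_topics, dim = 1000, 50, 10, 128

word_vecs = rng.normal(size=(n_words, dim))             # word embeddings
topic_vecs = rng.normal(size=(n_topics, dim))           # topic embeddings
doc_topic_logits = rng.normal(size=(n_docs, n_topics))  # per-doc topic weights

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def context_vector(word_id, doc_id):
    """Combine a pivot word vector with its document's topic mixture."""
    doc_topic = softmax(doc_topic_logits[doc_id])  # topic proportions, sum to 1
    doc_vec = doc_topic @ topic_vecs               # document vector as topic mixture
    return word_vecs[word_id] + doc_vec            # vector used to predict context

ctx = context_vector(word_id=3, doc_id=7)
print(ctx.shape)  # (128,)
```

In training, this combined vector would be fed to a skip-gram-style objective (e.g., negative sampling) so that word vectors absorb topic information while topic vectors absorb context information, matching the intuition stated in the abstract.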
Yu Chuanming, Yuan Sai, Zhu Xingyu, Lin Hongjun, Zhang Puliang, An Lu. Research on Deep Learning Based Topic Representation of Hot Events[J]. Data Analysis and Knowledge Discovery, 2020, 4(4): 1-14.
[1] Bengio Y, Courville A, Vincent P. Representation Learning: A Review and New Perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828. doi: 10.1109/TPAMI.2013.50.
[2] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
Ma Xiufeng, Guo Shunli, Song Kai. Subject-Method Co-occurrence Analysis Based on LDA Topic Model: Taking the Information Science Field as an Example[J]. Information Science, 2018, 36(4): 69-74.
Liu Junwan, Long Zhixin, Wang Feifei. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 104-117.
Wang Yuefen, Fu Zhu, Chen Bikun. Topic Identification of Scientific Literature Based on LDA Topic Model: Comparative Analysis of Two Views of Global and Discipline[J]. Information Studies: Theory & Application, 2016, 39(7): 121-126, 101.
Li Hui, Hu Yunfeng. Analyzing Online Reviews with Dynamic Sentiment Topic Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(9): 74-82.
[11] Li L, Gan S, Yin X. Feedback Recurrent Neural Network-based Embedded Vector and Its Application in Topic Model[J]. EURASIP Journal on Embedded Systems, 2017, 2017(1): Article No. 5. doi: 10.1186/s13639-016-0038-6.
[12] Wei X, Croft W B. LDA-based Document Models for Ad-Hoc Retrieval[C]// Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2006: 178-185.
[13] Liu Y, Liu Z, Chua T S, et al. Topical Word Embeddings[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015.
[14] Jung N, Choi H I. Continuous Semantic Topic Embedding Model Using Variational Autoencoder[OL]. arXiv Preprint, arXiv:1711.08870.
[15] Moody C E. Mixing Dirichlet Topic Models and Word Embeddings to Make LDA2Vec[OL]. arXiv Preprint, arXiv:1605.02019.
[16] Li D, Li Y, Wang S. Topic Enhanced Word Vectors for Documents Representation[C]// Proceedings of the 6th National Conference on Social Media Processing. 2017: 166-177.
[17] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.
[18] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[19] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]// Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.
[20] Yao L, Zhang Y, Chen Q, et al. Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data[J]. Engineering Applications of Artificial Intelligence, 2017, 64: 432-439. doi: 10.1016/j.engappai.2017.06.024.
[21] Levy O, Goldberg Y. Linguistic Regularities in Sparse and Explicit Word Representations[C]// Proceedings of the 18th Conference on Computational Natural Language Learning. 2014: 171-180.