Research on Deep Learning-Based Topic Representation of Hot Events

Yu Chuanming1, Yuan Sai2, Zhu Xingyu1, Lin Hongjun1, Zhang Puliang1, An Lu3

1 School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
3 School of Information Management, Wuhan University, Wuhan 430072, China
Abstract [Objective] This study explores how to learn topic representations for hot events, and examines the performance of various topic representation models on tasks such as topic classification and topic relevance modeling. [Methods] Building on the LDA2Vec method, we proposed W-LDA2Vec, a topic representation learning model. After jointly training the initial document and word vectors, the model predicts the context vectors of the central words, yielding word representations enriched with topic information and topic representations enriched with context information. [Results] On the hot event topic classification task, our model achieved the highest F1 score of 0.893, which is 0.314, 0.057, 0.022 and 0.013 higher than those of the four baseline models LDA, Word2Vec, TEWV and Doc2Vec, respectively. On the hot event topic relevance modeling task, with the number of topics set to 10, our model achieved a correlation score of 0.4625, which is 0.0678 higher than that of the LDA model. [Limitations] The experimental corpora are limited to Chinese and English. [Conclusions] By embedding topic information into word and document representations, our model effectively improves the performance of topic classification and relevance modeling.
Received: 14 May 2019
Published: 01 June 2020
Corresponding Author: Yu Chuanming, E-mail: yucm@zuel.edu.cn
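As a reading aid for the method described in the abstract, the sketch below shows the core of an LDA2Vec-style model: each document vector is a topic-proportion-weighted mixture of topic vectors; it is added to the central (pivot) word vector to form a context vector; and that context vector is trained to predict surrounding words via negative sampling. This is a minimal illustrative sketch in PyTorch under our own assumptions, not the authors' W-LDA2Vec implementation; the class and parameter names (LDA2VecSketch, n_topics, and so on) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LDA2VecSketch(nn.Module):
    def __init__(self, vocab_size, n_docs, n_topics, dim):
        super().__init__()
        self.pivot_vecs = nn.Embedding(vocab_size, dim)    # input (pivot word) vectors
        self.output_vecs = nn.Embedding(vocab_size, dim)   # output (context word) vectors
        self.topic_vecs = nn.Parameter(torch.randn(n_topics, dim) * 0.01)
        self.doc_weights = nn.Embedding(n_docs, n_topics)  # unnormalized document-topic weights

    def forward(self, doc_ids, pivot_ids, target_ids, neg_ids):
        # Document vector: softmax the document-topic weights into proportions,
        # then mix the topic vectors accordingly.
        props = F.softmax(self.doc_weights(doc_ids), dim=-1)        # (batch, n_topics)
        doc_vec = props @ self.topic_vecs                           # (batch, dim)
        # LDA2Vec-style context vector: pivot word vector plus document vector.
        ctx = self.pivot_vecs(pivot_ids) + doc_vec                  # (batch, dim)
        # Skip-gram with negative sampling: score the true neighbor (target)
        # against sampled noise words (neg) using the combined context vector.
        pos_score = (ctx * self.output_vecs(target_ids)).sum(-1)               # (batch,)
        neg_score = torch.einsum('bd,bkd->bk', ctx, self.output_vecs(neg_ids)) # (batch, k)
        return -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(-1)).mean()

Training on (document, pivot word, neighbor word) triples pushes the topic vectors into the shared word-vector space, so each topic can be read off as its nearest-neighbor words; this shared space is what lets topic information flow into the word and document representations used in the classification and relevance modeling experiments.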
References

[1]
Bengio Y, Courville A, Vincent P. Representation Learning: A Review and New Perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828. doi: 10.1109/TPAMI.2013.50

[2]
Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.

[3]
Ma Xiufeng, Guo Shunli, Song Kai. Subject-Method Co-occurrence Analysis Based on LDA Topic Model: Taking the Information Science Field as an Example[J]. Information Science, 2018, 36(4): 69-74.

[4]
Liu Junwan, Long Zhixin, Wang Feifei. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 104-117.

[5]
Zhang Tao, Ma Haiqun. Clustering Policy Texts Based on LDA Topic Model[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 59-65.

[6]
Xiong Huixiang, Dou Yan. Research on Tag Hybrid Recommendation Based on LDA Topic Model[J]. Library and Information Service, 2018, 62(3): 104-113.

[7]
Xiong Huixiang, Ye Jiaxin. Microblog Tags Generation Based on LDA Theme Model[J]. Information Science, 2018, 36(10): 7-12.

[8]
Wang Yuefen, Fu Zhu, Chen Bikun. Topic Identification of Scientific Literature Based on LDA Topic Model: Comparative Analysis of Two Views of Global and Discipline[J]. Information Studies: Theory & Application, 2016, 39(7): 121-126, 101.

[9]
Wang Tingting, Wang Yu, Qin Linjie. Dividing Time Windows of Dynamic Topic Model[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 54-64.

[10]
Li Hui, Hu Yunfeng. Analyzing Online Reviews with Dynamic Sentiment Topic Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(9): 74-82.

[11]
Li L, Gan S, Yin X. Feedback Recurrent Neural Network-Based Embedded Vector and Its Application in Topic Model[J]. EURASIP Journal on Embedded Systems, 2017, 2017(1): Article No. 5. doi: 10.1186/s13639-016-0038-6

[12]
Wei X, Croft W B. LDA-Based Document Models for Ad-Hoc Retrieval[C]// Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2006: 178-185.
[13]
Liu Y, Liu Z, Chua T S, et al. Topical Word Embeddings[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015.
[14]
Jung N, Choi H I. Continuous Semantic Topic Embedding Model Using Variational Autoencoder[OL]. arXiv Preprint, arXiv:1711.08870.

[15]
Moody C E. Mixing Dirichlet Topic Models and Word Embeddings to Make LDA2Vec[OL]. arXiv Preprint, arXiv:1605.02019.

[16]
Li D, Li Y, Wang S. Topic Enhanced Word Vectors for Documents Representation[C]// Proceedings of the 6th National Conference on Social Media Processing. 2017: 166-177.

[17]
Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.

[18]
Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.

[19]
Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]// Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.

[20]
Yao L, Zhang Y, Chen Q, et al. Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data[J]. Engineering Applications of Artificial Intelligence, 2017, 64: 432-439. doi: 10.1016/j.engappai.2017.06.024

[21]
Levy O, Goldberg Y. Linguistic Regularities in Sparse and Explicit Word Representations[C]// Proceedings of the 18th Conference on Computational Natural Language Learning. 2014: 171-180.