[Objective] This paper proposes a topic evolution method for microblogs that tracks how topics develop over time, which is of practical value for early warning of public opinion. [Methods] First, a Skip-gram model is trained on the text set to obtain word vectors. The text of each time slice is then fed into the Biterm Topic Model (BTM) to obtain candidate topics, and a topic word vector is constructed for each BTM topic. Second, the k-means algorithm clusters the topic word vectors to obtain fused topics, from which the topic evolution of the text set across time slices is established. [Results] Experiments show that the F-measure of the method is 75%, about 10% higher than that of the topic model, which supports the feasibility of the proposed method. [Limitations] There is no established standard for measuring topic evolution, and the study does not compare different topic evolution methods. [Conclusions] The proposed method can effectively extract topics at each stage and offers an effective approach to analyzing online public opinion.
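The fusion step described in the Methods can be illustrated with a minimal sketch. It assumes that word vectors and per-slice candidate topics are already available (in the paper they come from Skip-gram and BTM; here they are toy values), and that a topic word vector is the probability-weighted mean of the vectors of the topic's top words. That weighting, the helper names `topic_vector` and `kmeans`, the farthest-point initialisation, and all data below are assumptions made for illustration, not details taken from the paper.

```python
import math

def topic_vector(top_words, word_vectors):
    """Probability-weighted mean of the word vectors of a topic's top words.

    ASSUMPTION: this weighting is illustrative; the abstract does not state
    how the topic word vector is built.
    """
    known = [(word_vectors[w], p) for w, p in top_words if w in word_vectors]
    total = sum(p for _, p in known)
    if total == 0:
        raise ValueError("no top word has a word vector")
    dim = len(known[0][0])
    return [sum(vec[i] * p for vec, p in known) / total for i in range(dim)]

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy 2-D word vectors standing in for Skip-gram output (hypothetical values).
word_vectors = {
    "flood": [1.0, 0.0], "rain": [0.9, 0.1],
    "match": [0.0, 1.0], "goal": [0.1, 0.9],
}

# Candidate topics per time slice, standing in for BTM output:
# (time slice, [(top word, probability), ...]).
candidates = [
    ("t1", [("flood", 0.6), ("rain", 0.4)]),
    ("t1", [("match", 0.7), ("goal", 0.3)]),
    ("t2", [("rain", 0.5), ("flood", 0.5)]),
    ("t2", [("goal", 0.8), ("match", 0.2)]),
]

vectors = [topic_vector(words, word_vectors) for _, words in candidates]
labels = kmeans(vectors, k=2)

# Group candidate topics by fused topic to read off the evolution over slices.
evolution = {}
for (time_slice, words), label in zip(candidates, labels):
    evolution.setdefault(label, []).append((time_slice, words[0][0]))

print(evolution)
```

Each key of `evolution` is one fused topic, and its list shows which candidate topic (identified here by its leading word) represents it in each time slice. In practice the number of clusters `k` and the number of top words per topic would need to be tuned on real data.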
Peiyao Zhang, Dongsu Liu. Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.