1 School of Information Science & Engineering, Yunnan University, Kunming 650500, China
2 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
3 Sichuan Key Laboratory of Software Automatic Generation and Intelligent Service, Chengdu University of Information Technology, Chengdu 610225, China
[Objective] This paper modifies a topic model to improve the quality of the news clues it extracts. [Methods] We constructed a News-IBTM model based on IBTM (Incremental Biterm Topic Model) with a dynamic sliding window, which narrows the range within which biterms (unordered word pairs) are extracted. We then used this model to extract topics and topic-word distributions from news texts and to infer document-topic distributions. Finally, we used the JS (Jensen-Shannon) divergence to measure the difference between document-topic distributions and to generate news clues. [Results] We evaluated News-IBTM on news from People’s Daily Online and Weibo. The proposed model outperformed existing models in perplexity, accuracy and efficiency. [Limitations] The accuracy of the News-IBTM algorithm needs further improvement. [Conclusions] The proposed method can effectively extract high-quality news topics and clues.
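The [Methods] pipeline can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' News-IBTM implementation. The topic proportions `theta` and topic-word distributions `phi` are assumed to come from an already trained model; the incremental Gibbs sampling of IBTM is not shown. The sliding window is a fixed size `window`, because the rule News-IBTM uses to adjust it dynamically is not given here. Document-topic distributions use the standard BTM inference rule p(z|d) = Σ_b p(z|b) p(b|d), with p(z|b) ∝ θ_z φ_z(w_i) φ_z(w_j) and p(b|d) given by biterm frequency in the document. All function names, and the `link_clues` rule with its `threshold`, are hypothetical.

```python
import math
from collections import Counter


def extract_biterms(tokens, window):
    """Return unordered word pairs (biterms) whose positions differ by less than `window`.

    Assumption: `window` is a fixed size here; News-IBTM adjusts it dynamically.
    """
    biterms = []
    for i in range(len(tokens)):
        # Only pair token i with the tokens that follow it inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = tokens[i], tokens[j]
            biterms.append((min(a, b), max(a, b)))  # sort so (x, y) == (y, x)
    return biterms


def doc_topic(tokens, theta, phi, window):
    """Infer p(z|d) = sum_b p(z|b) * p(b|d) for one document (standard BTM rule).

    theta: list of K topic proportions; phi: list of K dicts mapping word -> p(w|z).
    """
    k = len(theta)
    counts = Counter(extract_biterms(tokens, window))
    total = sum(counts.values())
    p_zd = [0.0] * k
    for (wi, wj), n in counts.items():
        # p(z|b) is proportional to theta_z * phi_z(wi) * phi_z(wj).
        weights = [theta[z] * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0) for z in range(k)]
        s = sum(weights)
        if s == 0.0:
            continue  # biterm contains a word that no topic assigns probability to
        for z in range(k):
            p_zd[z] += (weights[z] / s) * (n / total)  # p(b|d) = n / total
    norm = sum(p_zd)
    # Fall back to a uniform distribution when no biterm could be scored.
    return [p / norm for p in p_zd] if norm > 0 else [1.0 / k] * k


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 log), so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; otherwise b_i >= a_i / 2 > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def link_clues(doc_topics, threshold):
    """Hypothetical clue rule: link each document (in time order) to its most
    similar earlier document when their JS divergence is below `threshold`."""
    links = []
    for j in range(1, len(doc_topics)):
        best_i = min(range(j), key=lambda i: js_divergence(doc_topics[i], doc_topics[j]))
        d = js_divergence(doc_topics[best_i], doc_topics[j])
        if d < threshold:
            links.append((best_i, j, d))
    return links


if __name__ == "__main__":
    # Toy two-topic model (made-up numbers for illustration only).
    theta = [0.5, 0.5]
    phi = [
        {"flood": 0.4, "rain": 0.4, "rescue": 0.2},
        {"stock": 0.5, "market": 0.5},
    ]
    docs = [["rain", "flood", "rescue"], ["flood", "rain"], ["stock", "market", "stock"]]
    topics = [doc_topic(d, theta, phi, window=3) for d in docs]
    print(link_clues(topics, threshold=0.5))  # [(0, 1, 0.0)]
```

Linking each document only to its closest earlier neighbour keeps every clue a simple time-ordered chain. A real system might instead cluster documents or allow several links per document.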