[Objective] Topics identified by the LDA model include many irrelevant results, which reduce the accuracy of evolution analysis. This paper constructs topic evolution paths by filtering out noise topics and calculating the relevance between the remaining topics. [Methods] First, we filtered out irrelevant topics using their probability of appearing across all documents and the word propensity distribution of each topic. Then, we calculated the Jensen-Shannon divergence between topics to identify related ones. Finally, we constructed the topic evolution paths based on the correlation between topics. [Results] We tested the proposed method on scientific literature on “machine learning” and identified five kinds of evolution path: rebirth, extinction, succession, division, and merger. [Limitations] The threshold values were estimated and therefore involve some subjective judgment. [Conclusions] The proposed method avoids interference from noise topics and identifies related topics in adjacent time intervals, which helps reveal the evolution of discipline topics more accurately.
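The relevance step in [Methods] can be illustrated with a minimal sketch, assuming each topic is represented as a probability distribution over a shared vocabulary. The function names `js_divergence` and `link_topics` and the default threshold of 0.3 are hypothetical choices for illustration; they are not taken from the paper, which states only that thresholds were estimated.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that both inputs are valid probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms where a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def link_topics(prev_topics, next_topics, threshold=0.3):
    """Pair topics from two adjacent time intervals.

    Returns (i, j, divergence) for every pair whose divergence is below
    the threshold. The threshold is an assumed, illustrative value.
    """
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            d = js_divergence(p, q)
            if d < threshold:
                links.append((i, j, d))
    return links


if __name__ == "__main__":
    # Toy topic-word distributions over a three-word vocabulary.
    earlier = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    later = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
    for i, j, d in link_topics(earlier, later):
        print(f"topic {i} -> topic {j}: JSD = {d:.4f}")
```

Under this sketch, a topic with no link to the next interval would be a candidate for extinction, and a topic linked to several later topics a candidate for division. The exact rules the paper uses to assign path types are not given in the abstract, so this reading is an assumption.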