[Objective] To detect the birth, extinction, development, merge and split of topic evolution of the literatures in a certain field. [Methods] This paper divides time windows according to the publication data of the literatures, and LDA model is applied to extract topics from each time window automatically. The topic association filter rules are used to determine evolution relationships between topics in adjacent time windows. Form a topic evolution path in a continuous time period. [Results] Considering the continuity of the topics, different types of topic evolution could be detected with high accuracy. [Limitations] This method fixes the size of time windows without considering the diversity of topic evolution cycles. [Conclusions] This method can effectively reduce the interference of topics with smaller similarity in LDA, and enhance accuracy of evolution relation recognition.
秦晓慧, 乐小虬. 基于LDA主题关联过滤的领域主题演化研究[J]. 现代图书情报技术, 2015, 31(3): 18-25.
Qin Xiaohui, Le Xiaoqiu. Topic Evolution Research on a Certain Field Based on LDA Topic Association Filter. New Technology of Library and Information Service, 2015, 31(3): 18-25.
[1] 李勇, 安新颖. 基于LDA的主题演化研究[J]. 医学信息学杂志, 2013, 34(2): 57-61. (Li Yong, An Xinying. Research on Topic Evolution Based on LDA [J]. Journal of Medical Informatics, 2013, 34(2): 57-61.)
[2] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation [J]. The Journal of Machine Learning Research, 2003, 3: 993-1022.
[3] 楚克明, 李芳. 基于LDA模型的新闻话题的演化[J]. 计算机应用与软件, 2011, 28(4): 4-7, 26.( Chu Keming, Li Fang. LDA Model-based News Topic Evolution [J]. Computer Applications and Software, 2011, 28(4): 4-7, 26.)
[4] 楚克明. 基于LDA的新闻话题演化研究[D]. 上海: 上海交通大学, 2010.(Chu Keming. The Reaearch on Topic Evolution for News Based on LDA Model [D]. Shanghai: Shanghai Jiaotong University, 2010.)
[5] 李保利, 杨星. 基于LDA模型和话题过滤的研究主题演化分析[J]. 小型微型计算机系统, 2012, 33(12): 2738-2743. (Li Baoli, Yang Xing. Analyzing Research Topic Evolution with LDA and Topic Filtering [J]. Journal of Chinese Computer Systems, 2012, 33(12): 2738-2743.)
[6] 崔凯, 周斌, 贾焰, 等.一种基于LDA的在线主题演化挖掘模型[J]. 计算机科学, 2010, 37(11): 156-159, 193. (Cui Kai, Zhou Bin, Jia Yan, et al. LDA-based Model for Online Topic Evolution Mining [J]. Computer Science, 2010, 37(11): 156-159, 193.)
[7] 胡吉明, 陈果. 基于动态LDA主题模型的内容主题挖掘与演化[J]. 图书情报工作, 2014, 58(2): 138-142. (Hu Jiming, Chen Guo. Mining and Eolution of Content Topics Based on Dynamic LDA [J]. Library and Information Service, 2014, 58(2): 138-142.)
[8] Lv N, Luo J, Liu Y, et al. Analysis of Topic Evolution Based on Subtopic Similarity [C]. In: Proceedings of the 2009 International Conference on Computational Intelligence and Natural Computing, 2009, 2: 506-509.
[9] 胡艳丽, 白亮, 张维明. 一种话题演化建模与分析方法[J]. 自动化学报, 2012, 38(10): 1690-1697. (Hu Yanli, Bai Liang, Zhang Weiming. Modeling and Analyzing Topic Evolution [J]. Acta Automatic Sinica, 2012, 38(10): 1690-1697.)
[10] Blei D M, Lafferty J D. Dynamic Topic Models [C]. In: Proceedings of the 23rd International Conference on Machine Learning. 2006: 113-120.
[11] Alsumait L, Barbara D, Domeniconi C. On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking [C]. In: Proceeding of the 8th IEEE International Conference on Data Mining. IEEE, 2008: 3-12.
[12] Wang X, McCallum A. Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends [C]. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006: 424-433.
[13] 贺亮, 李芳.科技文献话题演化研究[J]. 现代图书情报技术, 2012(4): 61-67. (He Liang, Li Fang. Topic Evolution in Scientific Literature [J]. New Technology of Library and Information Service, 2012(4): 61-67.)
[14] 范云满, 马建霞. 利用LDA的领域新兴主题探测技术综述[J]. 现代图书情报技术, 2012(12): 58-65. (Fan Yunman, Ma Jianxia. Review on the LDA-based Techniques Detection for the Field Emerging Topic [J]. New Technology of Library and Information Service, 2012(12): 58-65.)
[15] 唐晓波, 王洪艳. 基于潜在狄利克雷分配模型的微博主题演化分析[J]. 情报学报, 2013, 32(3): 281-287. (Tang Xiaobo, Wang Hongyan. Analysis of Microblog Topic Evolution Based on Latent Dirichlet Allocation Model [J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(3): 281-287.)
[16] 史庆伟, 乔晓东, 徐硕, 等.作者主题演化模型及其在研究兴趣演化分析中的应用[J]. 情报学报, 2013, 32(9): 912-919. (Shi Qingwei, Qiao Xiaodong, Xu Shuo, et al. Author-topic Evolution Model and Its Application in Analysis of Research Interests Evolution [J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(9): 912-919.)
[17] Xu S, Shi Q, Qiao X, et al. Author-topic over Time (AToT): A Dynamic Users' Interest Model [A].// Mobile, Ubiquitous, and Intelligent Computing [M]. Springer Berlin Heidelberg, 2014: 239-245.
[18] 单斌, 李芳. 基于LDA话题演化研究方法综述[J]. 中文信息学报, 2010, 24(6): 43-49, 68. (Shan Bin, Li Fang. A Survey of Topic Evolution Based on LDA [J]. Journal of Chinese Information Processing, 2010, 24(6): 43-49, 68.)
[19] Wei X, Sun J, Wang X. Dynamic Mixture Models for Multiple Timeseries [C]. In: Proceedings of the 20th International Joint Conference on Artificial Intelligent, Hyderabad, India. 2007: 2909-2914.
[20] Griffiths T L, Steyvers M. Finding Scientific Topics [C]. In: Proceedings of the National Academy of Sciences of the United States of America. 2004: 5228-5235.
[21] Manning C D, Schütze H, Raghavan P. 信息检索导论[M]. 王斌译. 北京: 人民邮电出版社, 2011. (Manning C D, Schütze H, Raghavan P. Introduction to Information Retrieval [M]. Translated by Wang Bin. Beijing: Post & Telecom Press, 2011.)
[22] National Cancer Institute. NCI Thesaurus Hierarchy [EB/OL]. [2014-02-14]. http://ncim.nci.nih.gov/ncimbrowser/pages/source_ hierarchy.jsf?&sab=NCI.