Analyzing Topic Evolution with Topic Filtering and Relevance
Qu Jiabin1,2, Ou Shiyan1
1(School of Information Management, Nanjing University, Nanjing 210023, China)
2(Yantai University Library, Yantai 264005, China)

Abstract [Objective] The LDA model identifies many irrelevant topics, which reduces the accuracy of topic evolution analysis. This paper constructs topic evolution paths by filtering out noise topics and calculating the relevance between topics. [Methods] First, we filtered out irrelevant topics based on their probability of appearing across all documents and the word propensity distribution of each topic. Then, we calculated the Jensen-Shannon divergence between topics to identify related ones. Finally, we constructed topic evolution paths based on the correlation between topics. [Results] We examined the effectiveness of the proposed method on scientific literature on “machine learning” and identified five types of evolution path: rebirth, extinction, succession, division, and merger. [Limitations] The threshold values were estimated with some subjectivity. [Conclusions] The proposed method can avoid interference from noise topics and identify relevant topics in adjacent time intervals, which helps reveal the evolution of discipline topics more accurately.
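
The relevance step named in the abstract can be illustrated with a minimal Python sketch. It computes the Jensen-Shannon divergence between topic-word distributions from two adjacent time intervals and links topic pairs whose divergence falls below a threshold. The function names, the toy distributions, the base-2 logarithm, and the threshold of 0.1 are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def link_topics(prev_topics, next_topics, threshold):
    """Return (i, j, divergence) for topic pairs in adjacent intervals
    whose divergence is at or below the (hypothetical) threshold."""
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            d = js_divergence(p, q)
            if d <= threshold:
                links.append((i, j, d))
    return links


# Toy topic-word distributions over a shared 4-word vocabulary (made up).
prev_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
next_topics = [[0.6, 0.3, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]]

links = link_topics(prev_topics, next_topics, threshold=0.1)
for i, j, d in links:
    print(f"topic {i} (interval t) -> topic {j} (interval t+1), JSD = {d:.4f}")
```

With these toy inputs, only the pair (topic 0, topic 0) is linked. In a path-building step, a topic with no link forward would be a candidate for extinction, one with several links forward for division, and several topics linking to the same successor for merger; how the paper maps links to its five path types is not stated in the abstract.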
Received: 07 November 2017
Published: 05 February 2018