Extracting Topics of Computer Science Literature with LDA Model
Yang Haixia, Gao Baojun, Sun Hanlin
Economics and Management School, Wuhan University, Wuhan 430072, China
Abstract [Objective] This paper uses text mining to automatically identify research topics in a large body of scientific literature and to detect emerging trends. [Methods] First, we used the LDA model to estimate both the prevalence and the content of topics in articles published by the top ten computer science journals in China. Second, we traced the evolution of the major topics over time using the articles' publication dates. [Results] We extracted 18 topics from 29,621 computer science papers and identified 7 trending topics as well as 6 less popular ones. [Limitations] The study did not include papers published overseas by Chinese authors. [Conclusions] The proposed method can help researchers follow the evolution of computer science research and identify emerging trends.
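The second step of the method, relating topic prevalence to publication dates, can be illustrated with a minimal sketch. It assumes that a fitted LDA model has already produced a topic proportion vector for each article; the abstract does not state how trends were actually measured, so the yearly averaging, the least-squares slope, the `threshold` value, and all function names and toy numbers below are illustrative assumptions rather than the authors' procedure.

```python
from collections import defaultdict


def yearly_prevalence(doc_topics, years):
    """Mean topic proportion per publication year.

    doc_topics: per-document topic proportion lists (each sums to 1),
                e.g. the document-topic output of a fitted LDA model.
    years:      publication years, aligned with doc_topics.
    Returns {year: [mean share of topic 0, topic 1, ...]}, sorted by year.
    """
    grouped = defaultdict(list)
    for theta, year in zip(doc_topics, years):
        grouped[year].append(theta)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(grouped.items())
    }


def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (0.0 if xs are constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def label_topics(prevalence, threshold=0.01):
    """Label each topic 'rising', 'declining', or 'stable' by its slope.

    threshold is a hypothetical cut-off: change in mean proportion per year.
    """
    years = list(prevalence)
    n_topics = len(next(iter(prevalence.values())))
    labels = []
    for k in range(n_topics):
        slope = trend_slope(years, [prevalence[y][k] for y in years])
        if slope > threshold:
            labels.append("rising")
        elif slope < -threshold:
            labels.append("declining")
        else:
            labels.append("stable")
    return labels


# Toy document-topic proportions, made up for illustration (not from the paper).
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.4, 0.3, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
years = [2013, 2013, 2014, 2014, 2015, 2015]

prevalence = yearly_prevalence(doc_topics, years)
labels = label_topics(prevalence)
```

On this toy data, topic 0 is labelled declining, topic 1 stable, and topic 2 rising. A simple slope is only one way to summarize a trend; a model that includes publication date as a covariate of topic prevalence would be an alternative.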
Received: 02 June 2016
Published: 20 December 2016