New Technology of Library and Information Service, 2016, Vol. 32, Issue 11: 20-26. DOI: 10.11925/infotech.1003-3513.2016.11.03
Original Article
Extracting Topics of Computer Science Literature with LDA Model
Yang Haixia, Gao Baojun, Sun Hanlin
Economics and Management School, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper uses text mining to automatically identify research topics in a large body of scientific literature and to detect future trends. [Methods] First, we used the LDA model to estimate both the prevalence and the content of topics in articles published by the top ten computer science journals in China. Second, we used publication dates to describe the evolution of the major topics. [Results] We extracted 18 topics from 29,621 computer science papers and identified 7 trending topics as well as 6 less popular ones. [Limitations] Our study did not include papers published overseas by Chinese authors. [Conclusions] The proposed method could help researchers understand the evolution of computer science research and grasp emerging trends.
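The two steps in [Methods] can be illustrated with a minimal, toy-scale sketch. It is not the authors' pipeline, which this page does not describe; it assumes documents are already tokenized and uses collapsed Gibbs sampling, the LDA inference scheme of Griffiths and Steyvers (2004), in pure Python. Topic prevalence is read from θ (per-document topic proportions) and topic content from φ (per-topic word distributions). For the evolution step, the sketch averages θ by publication year and fits a least-squares slope per topic. The function names, the defaults (alpha=0.1, beta=0.01, 200 iterations) and the rule "positive slope means rising, negative means declining" with a `tol` threshold are all illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (Griffiths & Steyvers, 2004).

    docs is a list of token lists. Returns (theta, phi, vocab): theta[d][k] is
    the share of topic k in document d (topic prevalence) and phi[k][v] is the
    probability of vocab[v] under topic k (topic content).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]             # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                              # token counts per topic
    z = []                                           # topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + n_words * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + n_words * beta)
            for v in range(n_words)] for k in range(n_topics)]
    return theta, phi, vocab


def yearly_prevalence(theta, years):
    """Mean topic proportions per publication year, ordered by year."""
    by_year = defaultdict(list)
    for row, year in zip(theta, years):
        by_year[year].append(row)
    return {year: [sum(col) / len(rows) for col in zip(*rows)]
            for year, rows in sorted(by_year.items())}


def trend_slopes(prevalence):
    """Least-squares slope of each topic's yearly mean proportion."""
    years = list(prevalence)
    if len(years) < 2:
        raise ValueError("need at least two publication years")
    y_bar = sum(years) / len(years)
    sxx = sum((y - y_bar) ** 2 for y in years)
    slopes = []
    for k in range(len(prevalence[years[0]])):
        series = [prevalence[y][k] for y in years]
        s_bar = sum(series) / len(series)
        sxy = sum((y - y_bar) * (s - s_bar) for y, s in zip(years, series))
        slopes.append(sxy / sxx)
    return slopes


def classify_topics(slopes, tol=0.0):
    """Hypothetical rule: rising if slope > tol, declining if slope < -tol."""
    rising = [k for k, s in enumerate(slopes) if s > tol]
    declining = [k for k, s in enumerate(slopes) if s < -tol]
    return rising, declining
```

A pure-Python sampler like this is only practical for small examples; a corpus the size of the 29,621 papers studied here would call for an optimized library implementation. A slope-sign rule is also cruder than testing whether each trend is statistically significant.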

Key words: Computer science; LDA; Topic mining; Topic prevalence; Document cluster
Received: 02 June 2016      Published: 20 December 2016

Cite this article:

Yang Haixia, Gao Baojun, Sun Hanlin. Extracting Topics of Computer Science Literature with LDA Model. New Technology of Library and Information Service, 2016, 32(11): 20-26.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.11.03     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I11/20

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn