New Technology of Library and Information Service  2016, Vol. 32 Issue (11): 20-26    DOI: 10.11925/infotech.1003-3513.2016.11.03
Original Article
Extracting Topics of Computer Science Literature with LDA Model
Yang Haixia, Gao Baojun, Sun Hanlin
Economics and Management School, Wuhan University, Wuhan 430072, China

[Objective] This paper uses text mining to automatically identify research topics in a large body of scientific literature and to detect future trends. [Methods] First, we used the LDA model to estimate the topical prevalence and topical content of articles published in the top ten computer science journals in China. Second, we traced the evolution of the major topics using the articles' publication dates. [Results] We extracted 18 topics from 29,621 computer science papers and identified 7 trending topics as well as 6 less popular ones. [Limitations] Our study did not include papers published overseas by Chinese authors. [Conclusions] The proposed method helps track the evolution of computer science research and identify emerging trends.

Key words: Computer science; LDA; Topic mining; Topic prevalence; Document cluster
Received: 02 June 2016      Published: 20 December 2016
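
The two steps named in the abstract (fit LDA, then follow topic prevalence over publication dates) can be illustrated with a minimal, standard-library sketch. The abstract does not say which software, inference method, or hyperparameters the authors used, so everything here is an assumption for illustration: the collapsed Gibbs sampler, the function names `fit_lda` and `prevalence_by_year`, the values of `alpha` and `beta`, and the toy corpus are all hypothetical and are not the authors' implementation.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # count of word v in topic k
    topic_total = [0] * n_topics                            # tokens assigned to topic k
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Three most frequent words per topic, as a rough description of its content.
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -row[v])[:3]]
        for row in topic_word
    ]
    return theta, top_words


def prevalence_by_year(theta, years):
    """Average the document-topic proportions of the documents in each year."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(grouped.items())
    }


# Made-up toy corpus: tokenised abstracts with their publication years.
docs = [
    ["network", "routing", "protocol", "network"],
    ["routing", "protocol", "wireless", "network"],
    ["image", "segmentation", "feature", "image"],
    ["feature", "image", "recognition", "segmentation"],
]
years = [2014, 2014, 2015, 2015]

theta, top_words = fit_lda(docs, n_topics=2)
trend = prevalence_by_year(theta, years)
for year, shares in trend.items():
    print(year, [round(s, 3) for s in shares])
print(top_words)
```

A topic whose yearly share rises over time would count as trending in this simplified reading, and one whose share falls as less popular. A real study would use a tested library and far more than four documents.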

Cite this article:

Yang Haixia, Gao Baojun, Sun Hanlin. Extracting Topics of Computer Science Literature with LDA Model. New Technology of Library and Information Service, 2016, 32(11): 20-26.

