[Objective] This paper identifies the topics of scientific literature and tracks how they change over time. [Methods] We used the improved CSToT (Content Similarity - Topics over Time) model to analyze scholarly papers published from 2012 to 2016 in nine information science journals in China. [Results] The CSToT model effectively revealed the subject structure of the scientific literature and the evolution of its topics. We also found that the majority of current information science research covers information services, online public opinion, and data mining. The topics follow rising, falling, stable, and fluctuating evolution trends, which are particularly prominent in information services research. [Limitations] The training data set needs to be expanded. [Conclusions] The CSToT model can effectively identify the topics of scientific literature and their evolutionary trends, which provides new directions for future research.
He Weilin, Feng Guohe, Xie Hongling. Analyzing Scientific Literature with Content Similarity - Topics over Time Model[J]. Data Analysis and Knowledge Discovery, 2018, 2(11): 64-72.