Evolution Analysis of Hot Topics with Trend-Prediction
Yue Lixin1, Liu Ziqiang2,3, Hu Zhengyin2,3
1 School of Information Resource Management, Renmin University of China, Beijing 100872, China
2 Chengdu Library of Chinese Academy of Sciences, Chengdu 610041, China
3 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper constructs mathematical and content prediction models based on the external and internal characteristics of academic articles, in order to analyze the evolution of hot research topics. [Methods] First, we used the LDA model to identify research topics and constructed a time series for each topic. Then, we determined the hot topics using mean values and linear regression fitting. Finally, we predicted the development of the hot topics with the ARIMA model, based on topic intensity, and with the Word2Vec model, based on topic content. [Results] We evaluated the models in an empirical study of stem cell research in the United States, identifying hot topics and predicting their development trends. [Limitations] Because the Word2Vec model analyzes trends in topic content on the basis of single words, interpreting the results may involve some ambiguity. [Conclusions] The proposed method can provide better prediction results than methods based on manual interpretation.
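The [Methods] summary chains three quantitative steps: building a topic-intensity time series from LDA output, screening hot topics by mean intensity and a linear-regression slope, and forecasting intensity with ARIMA. The Python sketch below illustrates that chain under assumptions the abstract does not specify: topic intensity is taken as the mean document-topic probability per year, a topic counts as "hot" when its mean intensity exceeds the average over all topics and its fitted slope is positive, and the forecast uses a fixed ARIMA(1,1,0) model fitted by conditional least squares rather than a library routine with order selection. The function names (`topic_intensity`, `hot_topics`, `arima_110_forecast`) and the toy data are hypothetical, and the Word2Vec content analysis is not covered.

```python
"""Sketch of the quantitative steps: topic-intensity series, hot-topic screening,
and an ARIMA(1,1,0) trend forecast. All names and toy data are hypothetical."""
from collections import defaultdict
from statistics import mean


def topic_intensity(doc_topics, doc_years):
    """Assumed definition: intensity of topic k in year t is the mean theta[k]
    over the documents published in year t."""
    by_year = defaultdict(list)
    for theta, year in zip(doc_topics, doc_years):
        by_year[year].append(theta)
    years = sorted(by_year)
    n_topics = len(doc_topics[0])
    series = {
        k: [mean(theta[k] for theta in by_year[y]) for y in years]
        for k in range(n_topics)
    }
    return years, series


def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b * x."""
    if len(xs) < 2:
        raise ValueError("need at least two time points")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def hot_topics(years, series):
    """Assumed rule: mean intensity above the cross-topic average AND a rising trend.
    If every theta sums to 1, the cross-topic average equals 1 / n_topics."""
    means = {k: mean(s) for k, s in series.items()}
    threshold = mean(means.values())
    return [k for k, s in series.items()
            if means[k] > threshold and ols_slope(years, s) > 0]


def arima_110_forecast(y, steps):
    """ARIMA(1,1,0) without constant: difference once, fit the AR(1) coefficient
    phi by conditional least squares, then integrate the forecasts back."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    d = [b - a for a, b in zip(y, y[1:])]  # first differences
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    phi = num / den if den else 0.0
    forecasts, last_y, last_d = [], y[-1], d[-1]
    for _ in range(steps):
        last_d = phi * last_d      # predicted next difference
        last_y = last_y + last_d   # undo the differencing
        forecasts.append(last_y)
    return phi, forecasts


# Toy document-topic distributions (3 topics), two documents per year.
DOC_TOPICS = [
    [0.2, 0.5, 0.3], [0.4, 0.3, 0.3],  # 2015
    [0.3, 0.4, 0.3], [0.5, 0.2, 0.3],  # 2016
    [0.5, 0.2, 0.3], [0.5, 0.3, 0.2],  # 2017
    [0.6, 0.2, 0.2], [0.6, 0.1, 0.3],  # 2018
]
DOC_YEARS = [2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018]

if __name__ == "__main__":
    years, series = topic_intensity(DOC_TOPICS, DOC_YEARS)
    hot = hot_topics(years, series)
    print("hot topics:", hot)  # → hot topics: [0]
    for k in hot:
        phi, fc = arima_110_forecast(series[k], steps=2)
        print(f"topic {k}: phi={phi:.3f}, next two years={[round(v, 3) for v in fc]}")
```

In practice the document-topic matrix would come from a fitted LDA model, and the ARIMA order would normally be chosen per topic (for example by an information criterion) rather than fixed at (1,1,0).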
Yue Lixin, Liu Ziqiang, Hu Zhengyin. Evolution Analysis of Hot Topics with Trend-Prediction[J]. Data Analysis and Knowledge Discovery, 2020, 4(6): 22-34.