Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 35-45    DOI: 10.11925/infotech.2096-3467.2022.0075
Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators
Chen Wen1,2, Chen Wei1,2,3
1Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
[Objective] This paper identifies emerging topics from multi-source data and builds a multivariable LSTM model that uses bibliometric indicators to predict their popularity. [Methods] First, we extracted topics from funded projects, papers, and patents. Second, we identified the emerging topics based on their novelty, growth, and persistence. Finally, we predicted the popularity of these topics with the multivariable LSTM model, using funding amount, number of funded projects, average citations per paper, and number of patent IPC subclasses as input indicators. [Results] We tested the model on solid oxide fuel cell (SOFC) research, where it outperformed BP, KNN, SVM, and univariate LSTM, achieving the lowest MAE (16.534) and RMSE (23.494) and the highest R² (0.642). [Limitations] Patent citation counts were not included because they were difficult to obtain for each time window. [Conclusions] The proposed multivariable LSTM can effectively predict the popularity of emerging topics.
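The abstract names novelty, growth, and persistence as the criteria for emerging topics but does not give their formulas. The following is a minimal sketch of how such a filter could look, not the authors' method: the three indicator definitions, the thresholds, and the sample counts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopicSeries:
    """Document counts for one topic, one value per consecutive time window."""
    name: str
    counts: List[float]


def novelty(counts):
    """Assumed definition: share of total volume in the most recent half of windows."""
    total = sum(counts)
    if total == 0:
        return 0.0
    half = len(counts) // 2
    return sum(counts[half:]) / total


def growth(counts):
    """Assumed definition: mean window-over-window change, relative to the mean level."""
    if len(counts) < 2:
        return 0.0
    mean_level = sum(counts) / len(counts)
    if mean_level == 0:
        return 0.0
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    return (sum(diffs) / len(diffs)) / mean_level


def persistence(counts):
    """Assumed definition: fraction of windows with activity, from first appearance on."""
    active = [i for i, c in enumerate(counts) if c > 0]
    if not active:
        return 0.0
    span = counts[active[0]:]
    return sum(1 for c in span if c > 0) / len(span)


def is_emerging(topic, min_novelty=0.6, min_growth=0.1, min_persistence=0.8):
    """A topic counts as emerging only if it clears all three (hypothetical) thresholds."""
    c = topic.counts
    return (novelty(c) >= min_novelty
            and growth(c) >= min_growth
            and persistence(c) >= min_persistence)


# Hypothetical topics, not data from the paper.
topics = [
    TopicSeries("rising", [0, 1, 2, 4, 7, 11]),
    TopicSeries("flat", [5, 5, 5, 5, 5, 5]),
    TopicSeries("sporadic", [0, 0, 6, 0, 0, 7]),
]
emerging = [t.name for t in topics if is_emerging(t)]
```

Requiring all three criteria at once is one possible design choice: it rejects topics that are large but stable as well as topics that appear only in isolated bursts.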

Key words: Multivariable LSTM; Emerging Topic; Popularity Prediction; Bibliometric Indicator
Received: 25 January 2022      Published: 16 November 2022
ZTFLH:  G353 TM911  
Fund: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA21010103); Document and Information Capacity Building Project of the Chinese Academy of Sciences (E0290001); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017221)
Corresponding Author: Chen Wei, ORCID: 0000-0002-2334-1129

Chen Wen, Chen Wei. Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators. Data Analysis and Knowledge Discovery, 2022, 6(10): 35-45.

Framework of Popularity Prediction of Emerging Topics
Framework of LSTM Cell
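The LSTM cell figure itself is not reproduced here. For orientation, the standard LSTM cell update is written out below; the notation is the common textbook one and may differ from the paper's figure. Treating the input $x_t$ as the vector of indicators for time window $t$ is an assumption about how the multivariable setting maps onto the cell.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```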
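Variable | Meaning
Topic popularity | Research popularity of the topic in the time window
Funding amount | Total amount of valid funded projects related to the topic in the time window
Number of funded projects | Total number of valid funded projects related to the topic in the time window
Average citations per paper | Average citations per paper for topic-related papers in the time window
Number of patent IPC subclasses | Total number of IPC subclasses of patents published in the time window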
Input Variables of Popularity Prediction of Emerging Topics Based on LSTM
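Feeding these variables to a sequence model requires turning the per-window values into supervised samples. The sketch below shows one common way to do this with a sliding window; the function name `make_windows`, the lookback length, and all numeric values are hypothetical, and the paper's actual preprocessing may differ.

```python
def make_windows(series, lookback):
    """Turn a multivariate series into (input window, next-step target) pairs.

    series:   list of time steps, each a list of feature values whose first
              entry is the target variable (topic popularity).
    lookback: number of past time windows fed to the model per sample.
    """
    samples = []
    for end in range(lookback, len(series)):
        # Input: the `lookback` most recent steps, each with all features.
        x = [list(step) for step in series[end - lookback:end]]
        # Target: topic popularity in the window that follows the input.
        y = series[end][0]
        samples.append((x, y))
    return samples


# Hypothetical values, one row per time window:
# [popularity, funding amount, funded projects, avg citations per paper, IPC subclasses]
series = [
    [10.0, 120.0, 3, 1.5, 4],
    [14.0, 150.0, 4, 1.8, 5],
    [19.0, 90.0, 2, 2.4, 5],
    [25.0, 200.0, 6, 2.9, 7],
    [31.0, 210.0, 6, 3.1, 8],
]
samples = make_windows(series, lookback=3)
```

Each sample then has shape lookback × number of variables, which is the input layout most LSTM implementations expect. Dropping all columns except the first would give the univariate case used as a baseline.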
Curve of Perplexity
Strategic Coordinate Chart of Emerging Topics Identification
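Topic ID | Topic name | Topic ID | Topic name
topic5 | Proton-conducting electrolytes | topic30 | SOFC operating temperature
topic9 | Dynamic modeling of SOFC performance | topic32 | Battery energy storage
topic13 | Component microstructure | topic37 | Electrode electrolysis process
topic21 | Properties of doped materials | topic40 | Intermediate-temperature SOFC cathodes
topic22 | Power system efficiency | topic43 | Powder material preparation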
SOFC Emerging Topics
Curves of Epoch-MAE and Epoch-RMSE
Algorithm | MAE | RMSE | R²
BP | 44.852 | 59.906 | -1.454
SVM | 19.585 | 28.371 | 0.474
KNN | 18.471 | 26.325 | 0.548
LSTM | 16.833 | 23.564 | 0.639
Evaluation of Algorithm Performance
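The three metrics in this table follow their standard definitions, sketched below in plain Python. The sample values are made up for illustration. The second prediction set shows how R² becomes negative when a model fits worse than simply predicting the mean of the observed values, which is how a result such as BP's -1.454 can arise.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and two hypothetical prediction sets.
y_true = [10.0, 20.0, 30.0, 40.0]
good = [12.0, 18.0, 33.0, 39.0]   # close to the observations
bad = [40.0, 10.0, 45.0, 5.0]     # worse than predicting the mean (25.0)

good_scores = (mae(y_true, good), rmse(y_true, good), r2(y_true, good))
bad_r2 = r2(y_true, bad)          # negative, since SS_res exceeds SS_tot
```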
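Input variables | MAE | RMSE | R²
Topic popularity | 16.833 | 23.564 | 0.639
Topic popularity + funding amount | 16.725 | 23.583 | 0.639
Topic popularity + number of funded projects | 16.602 | 23.631 | 0.638
Topic popularity + average citations per paper | 16.796 | 23.510 | 0.641
Topic popularity + number of IPC subclasses | 16.602 | 23.536 | 0.641
All of the above variables | 16.534 | 23.494 | 0.642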
Evaluation of Algorithm Performance under Different Input Variables
Popularity Prediction of SOFC Emerging Topics
