Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 35-45    DOI: 10.11925/infotech.2096-3467.2022.0075
Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators
Chen Wen1,2, Chen Wei1,2,3
1Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China
Abstract  

[Objective] This paper identifies emerging topics from multi-source data and constructs a multivariable LSTM model with bibliometric indicators to predict their popularity. [Methods] First, we mined the topics of funded projects, papers, and patents. Second, we identified the emerging topics based on their novelty, growth, and persistence. Finally, we predicted the popularity of these topics with the multivariable LSTM model, using four indicators: funding amount, number of funded projects, average citations per paper, and number of patent IPC subclasses. [Results] We tested the model on solid oxide fuel cell (SOFC) research, where it outperformed BP, KNN, SVM, and univariate LSTM. It achieved the lowest MAE (16.534) and RMSE (23.494) and the highest R² (0.642). [Limitations] We did not include patent citation counts because they were difficult to obtain for each time window. [Conclusions] The modified LSTM can effectively predict the popularity of emerging topics.

Key words: Multivariable LSTM; Emerging Topic; Popularity Prediction; Bibliometric Indicator
Received: 25 January 2022      Published: 16 November 2022
ZTFLH: G353; TM911
Fund: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA21010103); Document and Information Capacity Building Project of the Chinese Academy of Sciences (E0290001); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017221)
Corresponding Author: Chen Wei, ORCID: 0000-0002-2334-1129, E-mail: chenw@whlib.ac.cn

Cite this article:

Chen Wen, Chen Wei. Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators. Data Analysis and Knowledge Discovery, 2022, 6(10): 35-45.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0075     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I10/35

Framework of Popularity Prediction of Emerging Topics
Framework of LSTM Cell
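For readers without access to the figure, the following is a minimal sketch of one time step of a standard LSTM cell with forget, input, and output gates. It is not taken from the paper: the function names, the `params` layout, and the layer sizes are illustrative assumptions, and the authors' actual implementation may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell.

    params (hypothetical layout) maps "f", "i", "o", "g" to a (W, U, b)
    triple: input weights, recurrent weights, and bias for each gate.
    """
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]    # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]    # input gate
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]  # candidate state
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

# Toy check with all-zero weights: every gate is 0.5 and the candidate is 0.
n_in, n_hidden = 5, 2  # 5 input variables; hidden size chosen arbitrarily
zero_gate = (
    [[0.0] * n_in for _ in range(n_hidden)],
    [[0.0] * n_hidden for _ in range(n_hidden)],
    [0.0] * n_hidden,
)
params = {name: zero_gate for name in "fiog"}
h, c = lstm_step([1.0] * n_in, [0.0] * n_hidden, [1.0, -1.0], params)
```

In the multivariable setting, `x` would hold all input variables for one time window, so the only change from the univariate case is the length of the input vector.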
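Variable | Definition
Topic popularity | Research popularity of the topic within the time window
Funding amount | Total amount of valid funded projects related to the topic within the time window
Number of funded projects | Total number of valid funded projects related to the topic within the time window
Average citations per paper | Average number of citations per paper related to the topic within the time window
Number of patent IPC subclasses | Total number of IPC subclasses of patents published within the time window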
Input Variables of Popularity Prediction of Emerging Topics Based on LSTM
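As a hedged illustration of how the five variables above could be arranged as input to a sequence model, the sketch below builds sliding windows over a per-time-window feature table. The `make_windows` helper, the `lookback` length, and all numbers are illustrative placeholders, not the paper's data or preprocessing.

```python
def make_windows(series, lookback):
    """Turn a multivariate series into (input window, next-step target) pairs.

    series: list of rows, one per time window, each row being
            [popularity, funding_amount, n_projects, avg_citations, n_ipc].
    The target is the topic popularity (column 0) of the following time window.
    """
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])  # `lookback` consecutive rows
        y.append(series[t][0])            # popularity at the next step
    return X, y

# Placeholder values for a single topic; NOT data from the paper.
series = [
    [12.0, 3.5, 2, 4.1, 1],
    [15.0, 4.0, 3, 4.8, 2],
    [21.0, 6.2, 4, 5.0, 2],
    [26.0, 6.0, 4, 5.6, 3],
    [34.0, 8.1, 6, 6.3, 4],
]
X, y = make_windows(series, lookback=3)
```

Each element of `X` has shape (lookback, 5); dropping all columns except the first would give the univariate baseline. In practice the columns would likely need rescaling because they are on different scales, which is omitted here.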
Curve of Perplexity
Strategic Coordinate Chart of Emerging Topics Identification
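Topic No. | Topic name | Topic No. | Topic name
topic5 | Proton electrolyte | topic30 | SOFC operating temperature
topic9 | Dynamic modeling of SOFC performance | topic32 | Battery energy storage
topic13 | Component microstructure | topic37 | Electrode electrolysis process
topic21 | Properties of doped materials | topic40 | Intermediate-temperature SOFC cathode
topic22 | Power system efficiency | topic43 | Powder material preparation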
SOFC Emerging Topics
Curves of Epoch-MAE and Epoch-RMSE
Model | MAE | RMSE | R²
BP | 44.852 | 59.906 | -1.454
SVM | 19.585 | 28.371 | 0.474
KNN | 18.471 | 26.325 | 0.548
LSTM | 16.833 | 23.564 | 0.639
Evaluation of Algorithm Performance
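The three metrics in the table above follow standard definitions, sketched below in plain Python. The sample values are made up for illustration. The second prediction shows how R² becomes negative when a model fits worse than always predicting the mean, as in the BP row.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only; not results from the paper.
y_true = [10.0, 20.0, 30.0, 40.0]
good_pred = [12.0, 18.0, 33.0, 37.0]
bad_pred = [40.0, 10.0, 45.0, 5.0]

print(mae(y_true, good_pred), rmse(y_true, good_pred), r2(y_true, good_pred))
print(r2(y_true, bad_pred))  # negative: worse than predicting the mean
```

Lower MAE and RMSE and higher R² indicate a better fit, which is the sense in which the models are ranked.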
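Input variables | MAE | RMSE | R²
Topic popularity | 16.833 | 23.564 | 0.639
Topic popularity + funding amount | 16.725 | 23.583 | 0.639
Topic popularity + number of funded projects | 16.602 | 23.631 | 0.638
Topic popularity + average citations per paper | 16.796 | 23.510 | 0.641
Topic popularity + number of IPC subclasses | 16.602 | 23.536 | 0.641
All of the above variables | 16.534 | 23.494 | 0.642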
Evaluation of Algorithm Performance under Different Input Variables
Popularity Prediction of SOFC Emerging Topics