Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 35-45     https://doi.org/10.11925/infotech.2096-3467.2022.0075
Research Paper
Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators
Chen Wen (1, 2), Chen Wei (1, 2, 3; corresponding author)
(1) Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071, China
(2) Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
(3) Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071, China

Abstract

[Objective] This paper identifies emerging topics in multi-source data and builds a multivariable LSTM model that incorporates bibliometric indicators to predict the popularity of these topics. [Methods] First, we extracted research topics from funded projects, journal papers, and patents. Second, we selected the emerging topics among them according to their novelty, growth, and persistence. Finally, we designed a topic popularity indicator and predicted the popularity of the emerging topics with a multivariable LSTM model that also takes four bibliometric indicators as input: funding amount, number of funded projects, average citations per paper, and number of patent IPC subclasses. [Results] In a case study of the solid oxide fuel cell (SOFC) field, the multivariable LSTM with bibliometric indicators outperformed the BP neural network, KNN, SVM, and univariate LSTM models, with the lowest MAE (16.534) and RMSE (23.494) and the highest R² (0.642). [Limitations] Indicators such as patent citation counts were not included because their values for each time window were difficult to obtain. [Conclusions] Incorporating multiple bibliometric indicators improves the performance of the emerging-topic popularity prediction model.
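The abstract names three screening criteria (novelty, growth, persistence) but this page gives no formulas for them. The following is a minimal sketch, assuming each topic is summarized as a series of yearly document counts; the score definitions and thresholds are hypothetical illustrations, not the authors' formulas.

```python
from dataclasses import dataclass


@dataclass
class TopicScores:
    novelty: float      # hypothetical: share of documents in the most recent `recent` years
    growth: float       # hypothetical: least-squares slope of yearly counts divided by the mean count
    persistence: float  # hypothetical: share of years with activity since the topic first appeared


def score_topic(counts, recent=3):
    """counts: yearly document counts for one topic, oldest year first."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return TopicScores(0.0, 0.0, 0.0)
    novelty = sum(counts[-recent:]) / total
    # Ordinary least-squares slope of count against year index.
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = total / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts))
    growth = (sxy / sxx) / y_mean if sxx else 0.0
    # Fraction of years with at least one document, counted from the first appearance.
    first = next(i for i, c in enumerate(counts) if c > 0)
    active = sum(1 for c in counts[first:] if c > 0)
    persistence = active / (n - first)
    return TopicScores(novelty, growth, persistence)


def is_emerging(s, min_novelty=0.5, min_growth=0.1, min_persistence=0.8):
    """A topic passes only if it clears all three (hypothetical) thresholds."""
    return (s.novelty >= min_novelty
            and s.growth >= min_growth
            and s.persistence >= min_persistence)
```

For example, a topic with counts [0, 0, 1, 2, 4, 7, 11] scores novelty 0.88, growth 0.5, and persistence 1.0 and passes, while a shrinking topic such as [10, 12, 11, 9, 8, 6, 5] fails on novelty and growth.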

Keywords: Multivariable LSTM; Emerging Topic; Popularity Prediction; Bibliometric Indicator
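Received: 2022-01-25      Published: 2022-11-16
Chinese Library Classification (CLC) number: G353 TM911
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences, Category A (XDA21010103); Special Project for Literature and Information Capability Building of the Chinese Academy of Sciences (E0290001); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017221)
Corresponding author: Chen Wei, ORCID: 0000-0002-2334-1129      E-mail: chenw@whlib.ac.cn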
Cite this article:
Chen Wen, Chen Wei. Predicting Popularity of Emerging Topics with Multivariable LSTM and Bibliometric Indicators. Data Analysis and Knowledge Discovery, 2022, 6(10): 35-45.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0075      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I10/35
Fig.1  Framework of the emerging-topic popularity prediction method
Fig.2  Structure of an LSTM unit
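Fig.2 shows the structure of an LSTM unit. As a reference point, here is a minimal NumPy sketch of one forward step of the widely used LSTM variant with a forget gate (the forget gate was added after Hochreiter and Schmidhuber's original 1997 design). The gate ordering and stacked weight layout are illustrative assumptions, not details taken from the paper; in this paper's setting, each input vector x would hold the Table 1 variables for one time window.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell with a forget gate.

    x: (n_in,); h_prev, c_prev: (n_hid,)
    W: (4*n_hid, n_in); U: (4*n_hid, n_hid); b: (4*n_hid,)
    Gate blocks are stacked as [input, forget, output, candidate] (assumed layout).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate cell content
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c


def lstm_forward(xs, W, U, b, n_hid):
    """Run the cell over a sequence of input vectors, starting from zero states."""
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    for x in xs:
        h, c = lstm_cell_step(x, h, c, W, U, b)
    return h, c
```

The final hidden state would typically be passed to a dense output layer that produces the predicted popularity; the paper's layer sizes and output head are not given on this page.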
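Variable | Definition
Topic popularity | Research popularity of the topic in a time window
Funding amount | Total amount of the valid funded projects related to the topic in a time window
Number of funded projects | Number of valid funded projects related to the topic in a time window
Average citations per paper | Average number of citations per topic-related paper in a time window
Number of patent IPC subclasses | Total number of IPC subclasses of the patents published in a time window
Table 1  Input variables of the LSTM emerging-topic popularity prediction model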
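Table 1 implies a multivariate time series with one row per time window and one column per variable. The following is a minimal sketch of how such a table could be turned into LSTM training samples, assuming topic popularity sits in column 0 as the prediction target and using a hypothetical look-back length; the paper's window length and scaling method are not stated on this page.

```python
import numpy as np


def minmax_scale(series):
    """Scale each column to [0, 1]; also return (min, span) so predictions can be un-scaled."""
    lo = series.min(axis=0)
    span = series.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # constant columns: avoid division by zero
    return (series - lo) / span, lo, span


def make_windows(series, lookback, target_col=0):
    """Sliding-window samples from a (T, n_features) array, one row per time window.

    Returns X with shape (T - lookback, lookback, n_features) and y with shape
    (T - lookback,), where y[k] is the target column in the window right after X[k].
    """
    T = series.shape[0]
    if T <= lookback:
        raise ValueError("need more time windows than the look-back length")
    X = np.stack([series[k:k + lookback] for k in range(T - lookback)])
    y = series[lookback:, target_col]
    return X, y
```

In practice the scaling parameters should be fitted on the training windows only, so that information from the evaluation period does not leak into the inputs.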
Fig.3  Perplexity curve
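A perplexity curve such as Fig.3 is commonly used to choose the number of topics in an LDA-style topic model. The topic model itself is not described on this page, so the following is a minimal sketch of the standard definition only, assuming a fitted model is available as document-topic proportions `theta` and topic-word distributions `phi`: perplexity = exp(-(sum over tokens of log p(w|d)) / N), with p(w|d) = sum over k of theta[d][k] * phi[k][w].

```python
import math


def perplexity(docs, theta, phi):
    """docs: list of documents, each a list of word ids.
    theta[d][k]: proportion of topic k in document d.
    phi[k][w]: probability of word w under topic k.
    Returns exp(-(sum of log p(w|d) over all tokens) / number of tokens).
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Mixture probability of word w in document d.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

Lower perplexity means the model assigns higher probability to the documents; a model whose topics are uniform over a 4-word vocabulary has perplexity 4. Computing this on held-out documents for each candidate topic number gives a curve like Fig.3; whether the authors picked the minimum or an elbow point is not stated here.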
Fig.4  Strategic diagram for emerging topic identification
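The axes used in Fig.4 are not stated on this page. For orientation, the following is a minimal sketch of the classic co-word strategic diagram, assuming centrality on the x-axis, density on the y-axis, and the origin placed at the mean of each axis; the quadrant labels are the conventional readings of that layout, not necessarily the paper's.

```python
from statistics import mean

# Conventional reading of the four quadrants: (x above origin, y above origin) -> label.
QUADRANTS = {
    (True, True): "I: motor themes (central and well developed)",
    (False, True): "II: highly developed but isolated themes",
    (False, False): "III: emerging or declining themes",
    (True, False): "IV: basic and transversal themes",
}


def strategic_diagram(topics):
    """topics: dict mapping topic name -> (x, y), e.g. (centrality, density).

    The origin is placed at the mean x and mean y (an assumed convention;
    medians are also common). Returns topic name -> quadrant label.
    """
    x0 = mean(x for x, _ in topics.values())
    y0 = mean(y for _, y in topics.values())
    return {name: QUADRANTS[(x >= x0, y >= y0)] for name, (x, y) in topics.items()}
```

Topics in quadrant III are candidates for either emergence or decline, which is why a diagram of this kind is usually combined with a time-based indicator such as growth before a topic is labeled emerging.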
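Topic ID | Topic name | Topic ID | Topic name
topic5 | Proton electrolytes | topic30 | SOFC operating temperature
topic9 | Dynamic modeling of SOFC performance | topic32 | Battery energy storage
topic13 | Component microstructure | topic37 | Electrode electrolysis processes
topic21 | Properties of doped materials | topic40 | Intermediate-temperature SOFC cathodes
topic22 | Power system efficiency | topic43 | Powder material preparation
Table 2  Emerging topics in the solid oxide fuel cell (SOFC) field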
Fig.5  Epoch-MAE and Epoch-RMSE curves
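Fig.5 tracks MAE and RMSE across training epochs. The following sketch only illustrates how such curves are recorded and how a stopping epoch could be read off them. To stay short, it swaps the LSTM for a linear model trained by full-batch gradient descent on mean squared error; the learning rate, epoch count, and "lowest validation RMSE" rule are assumptions, not the paper's settings.

```python
import numpy as np


def train_with_curves(X_train, y_train, X_val, y_val, epochs=200, lr=0.1):
    """Gradient descent on MSE for a linear stand-in model y ~ X @ w + b.

    Records validation MAE and RMSE after every epoch and returns the
    1-based epoch with the lowest validation RMSE.
    """
    w = np.zeros(X_train.shape[1])
    b = 0.0
    n = len(y_train)
    curves = {"mae": [], "rmse": []}
    for _ in range(epochs):
        err = X_train @ w + b - y_train
        # Gradient of mean squared error with respect to w and b.
        w -= lr * 2.0 / n * (X_train.T @ err)
        b -= lr * 2.0 / n * err.sum()
        val_err = X_val @ w + b - y_val
        curves["mae"].append(float(np.mean(np.abs(val_err))))
        curves["rmse"].append(float(np.sqrt(np.mean(val_err ** 2))))
    best_epoch = int(np.argmin(curves["rmse"])) + 1
    return w, b, curves, best_epoch
```

Plotting `curves["mae"]` and `curves["rmse"]` against the epoch index gives curves of the same kind as Fig.5; with a real LSTM the validation curve usually flattens or turns upward once the model starts to overfit.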
Model | MAE | RMSE | R²
BP | 44.852 | 59.906 | -1.454
SVM | 19.585 | 28.371 | 0.474
KNN | 18.471 | 26.325 | 0.548
LSTM | 16.833 | 23.564 | 0.639
Table 3  Overall evaluation results of the models
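Tables 3 and 4 report MAE, RMSE, and R². Their exact formulas are not reproduced on this page; the following sketch uses the usual definitions, with R² as the coefficient of determination, 1 - SS_res / SS_tot.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Under this definition, R² has no lower bound. A negative value, such as the -1.454 reported for BP, means the predictions fit worse than a constant guess equal to the mean of the observed values. MAE and RMSE are expressed in the units of the topic popularity indicator.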
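Input variables | MAE | RMSE | R²
Topic popularity | 16.833 | 23.564 | 0.639
Topic popularity + funding amount | 16.725 | 23.583 | 0.639
Topic popularity + number of funded projects | 16.602 | 23.631 | 0.638
Topic popularity + average citations per paper | 16.796 | 23.510 | 0.641
Topic popularity + number of IPC subclasses | 16.602 | 23.536 | 0.641
All of the above variables | 16.534 | 23.494 | 0.642
Table 4  Overall evaluation results of the model with different input variables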
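To show the size of the Table 4 differences, the relative changes between the univariate LSTM (first row) and the model with all input variables (last row) can be computed directly from the reported values:

```python
# Values copied from Table 4: univariate LSTM vs. LSTM with all input variables.
univariate = {"MAE": 16.833, "RMSE": 23.564, "R2": 0.639}
all_vars = {"MAE": 16.534, "RMSE": 23.494, "R2": 0.642}


def relative_reduction(before, after):
    """Percentage drop of an error metric (positive means the error got smaller)."""
    return 100.0 * (before - after) / before


mae_gain = relative_reduction(univariate["MAE"], all_vars["MAE"])     # about 1.78 %
rmse_gain = relative_reduction(univariate["RMSE"], all_vars["RMSE"])  # about 0.30 %
r2_gain = all_vars["R2"] - univariate["R2"]                           # about 0.003
```

On these figures, using all variables lowers MAE by roughly 1.8% and RMSE by roughly 0.3% relative to the univariate LSTM, and raises R² by 0.003.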
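Fig.6  Predicted research popularity of emerging topics in the SOFC field
(Note: The red dashed lines show the predicted topic popularity for 2021-2023; the blue dashed lines are the topic popularity trend lines.)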
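Fig.6 shows predictions for three future windows (2021-2023). This page does not describe how the model is extended beyond a single step. One common option is recursive forecasting, sketched below for a single series, where a hypothetical `predict_next` function stands in for the trained model. For the multivariable model, the other inputs (funding, citations, IPC subclasses) would also need future values or forecasts of their own, which this sketch does not cover.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       predict_next: Callable[[List[float]], float],
                       lookback: int,
                       steps: int) -> List[float]:
    """Multi-step forecasting by feeding each prediction back in as an input.

    predict_next (hypothetical) maps the last `lookback` values to the next one.
    history must contain at least `lookback` values.
    """
    window = list(history[-lookback:])
    preds: List[float] = []
    for _ in range(steps):
        nxt = predict_next(window)
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one step
    return preds
```

Because each step reuses earlier predictions, errors can accumulate over the horizon, so the later forecast years are generally less reliable than the first.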
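Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn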