Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (9): 1-13    DOI: 10.11925/infotech.2096-3467.2021.1451
Forecasting Developments of Core Topics in Science and Technology with Trend Analysis
Cui Ji, Zhang Jinpeng, Bao Zhou, Ding Shengchun
Nanjing University of Science & Technology, Nanjing 210094, China
Abstract  

[Objective] This study builds a forecasting model based on topic trends in the scientific and technological literature, aiming to predict how core topics will develop. [Methods] First, we analyzed the characteristics of research topics in scientific and technological literature. Then, we identified the core topics with strategic coordinate analysis. Finally, we used the ARIMA model and the exponential smoothing method to predict each topic’s trend degree. [Results] Averaged over the topics, the mean absolute error (MAE) and the root mean square error (RMSE) of the exponential smoothing method were both smaller than those of the ARIMA model. [Limitations] The choice of initial model parameters, the distribution of coefficients, and the number of published papers all affect prediction performance. [Conclusions] The two proposed models could yield better prediction results for growing and emerging topics.

Key words: Theme Discovery; VOS Clustering; Exponential Smoothing; ARIMA Model; Strategic Coordinates
Received: 25 December 2021      Published: 26 October 2022
ZTFLH:  G350  
Fund: Social Science Fund of Jiangsu Province (20TQB004)
Corresponding Authors: Zhang Jinpeng     E-mail: zjp_gem@163.com

Cite this article:

Cui Ji, Zhang Jinpeng, Bao Zhou, Ding Shengchun. Forecasting Developments of Core Topics in Science and Technology with Trend Analysis. Data Analysis and Knowledge Discovery, 2022, 6(9): 1-13.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1451     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I9/1

Framework of Topic Discovery and Trend Forecasting
Topic | Keywords | Topic summary | Number of papers
Topic 1 | engineering, computer science, adaptive algorithms, communication, algorithm | Communication algorithm analysis | 3,680
Topic 2 | ecology, freshwater biology, marine, environmental sciences, geology | Environmental science research | 5,119
Topic 3 | physics, materials science, fabrication, oil/water separation, wettability | Materials physics research | 2,546
Topic 4 | design, automation, control systems, autonomous underwater vehicles, tracking | Machinery and equipment automation | 1,361
Topic 5 | audiology, noise, underwater noise, sound, signals | Sound-source signal detection | 2,295
Topic 6 | mechanics, flow, bubble, fluid-structure interaction, deformation | Fluid-structure research | 2,962
Topic 7 | oceanography, model, water resources, waves, meteorology | Marine resources research | 1,265
Topic 8 | system, remote sensing, photographic technology, calibration, vision | Remote sensing system analysis | 1,568
Topic 9 | neurology, neurosciences, kinematics, stress, physiology | Human physiology | 1,442
Topic 10 | light, absorption, plant sciences, phytoplankton, underwater photosynthesis | Underwater phytoplankton research | 2,772
Topic 11 | microstructure, stability, mechanical-properties, temperature, strength | Effect of temperature on mechanical stability | 2,136
Topic 12 | localization, kalman filter, filter, underwater navigation, target tracking | Filters and underwater localization research | 876
Topic 13 | optimization, propagation, thermodynamics, range, finite element analysis | Ranging methods | 590
Topic 14 | propulsion, computational fluid dynamics, glider, propeller, hydrodynamic coefficients | Propeller dynamics research | 41
Topic 15 | energy, fuels, technology, nuclear science, real-time | Nuclear fuel equipment storage | 25
Topic 16 | Multiple Input Multiple Output, underwater sensor network, acoustic communications, doppler, underwater communications | Underwater sensor networks | 31
Topic 17 | carbon, coral, shape, continental-shelf, great-barrier-reef | Seabed vegetation survey | 33
Topic 18 | ultrafast separation, cellulose, discharges, elastomers, electrodes | Physical properties of equipment surface coatings | 67
Topic 19 | remotely operated vehicles, robust-control, attitude-control, backstepping control, heading control | Underwater vehicle control systems | 49
Subject-Keyword List
Distribution of Theme Innovation
Distribution of Topic Attention
Distribution of Topic Growth
Thematic Strategy Coordinate
Graph of Topic 1’s Novelty and Value of $TSI_{t}^{k}$
Year | Topic 1 trend degree | Topic 2 trend degree | Topic 3 trend degree | Topic 5 trend degree | Topic 6 trend degree | Topic 11 trend degree
2001 | -0.01758 | -0.02858 | 0.04810 | -0.01575 | 0.03733 | -0.00350
2002 | -0.01204 | -0.01644 | 0.06401 | 0.02494 | -0.00124 | -0.03925
2003 | 0.00150 | -0.01212 | 0.02079 | 0.00917 | 0.00465 | -0.00397
2004 | 0.00194 | -0.01656 | 0.01326 | -0.00027 | 0.02093 | 0.00072
2005 | 0.01016 | -0.01842 | 0.01483 | 0.00438 | 0.00434 | 0.00472
2006 | 0.01629 | -0.01157 | 0.00726 | -0.00471 | -0.00260 | 0.01532
2007 | 0.01064 | -0.00383 | 0.00897 | 0.00156 | -0.00467 | 0.00733
2008 | 0.01292 | 0.00092 | 0.00945 | 0.00984 | -0.01378 | 0.00063
2009 | 0.01332 | 0.00294 | 0.01028 | 0.01561 | -0.02177 | -0.00036
2010 | 0.01280 | 0.00596 | 0.01110 | 0.01580 | -0.02817 | 0.00250
2011 | 0.01099 | 0.00622 | 0.00962 | 0.01637 | -0.02378 | 0.00058
2012 | 0.00895 | 0.00585 | 0.00878 | 0.01765 | -0.02139 | 0.00017
2013 | 0.00451 | 0.00494 | 0.01035 | 0.01571 | -0.01432 | -0.00120
2014 | 0.00351 | 0.00468 | 0.01454 | 0.00894 | -0.01116 | -0.00052
2015 | 0.00298 | 0.00482 | 0.01455 | 0.01104 | -0.01233 | -0.00105
Experimental Training Set
Year | Topic 1 trend degree | Topic 2 trend degree | Topic 3 trend degree | Topic 5 trend degree | Topic 6 trend degree | Topic 11 trend degree
2016 | 0.00278 | 0.00422 | 0.01432 | 0.01237 | -0.01244 | -0.00127
2017 | 0.00231 | 0.00275 | 0.01470 | 0.01234 | -0.01035 | -0.00174
2018 | 0.00195 | 0.00222 | 0.01550 | 0.00929 | -0.00881 | -0.00013
2019 | 0.00901 | 0.00048 | 0.01447 | 0.00654 | -0.00891 | -0.00158
Experimental Test Set
Topic No. | Model | Formula
1 | ARIMA(1,2,1) | $x_t = 0.717022\,x_{t-1} + 0.665934\,\varepsilon_{t-1} + \varepsilon_t$
2 | ARIMA(1,2,0) | $x_t = 0.911074\,x_{t-1} + \varepsilon_t$
3 | ARIMA(1,2,1) | $x_t = 0.906811\,x_{t-1} + 0.350869\,\varepsilon_{t-1} + \varepsilon_t$
5 | ARIMA(0,2,0) | $C_{5,t} = 2C_{5,t-1} - C_{5,t-2} + \varepsilon_t$
6 | ARIMA(0,1,0) | $C_{6,t} = 2C_{6,t-1} - C_{6,t-2} + \varepsilon_t$
11 | MA(0) | $C_{11,t} = 2C_{11,t-1} - C_{11,t-2} + \varepsilon_t$
Fitting Model Results for Each Topic
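As a minimal sketch of how the Topic 5 formula in the table above turns into point forecasts, the Python snippet below applies the recursion C_t = 2C_{t-1} - C_{t-2} with the noise term set to its expected value of zero. The function name `forecast_arima_020` is hypothetical, and iterating the recursion on its own forecasts is an assumption: the page does not say whether the authors forecast several steps ahead or re-forecast one year at a time.

```python
def forecast_arima_020(history, steps):
    """Point forecasts for C_t = 2*C_{t-1} - C_{t-2} + eps_t, taking eps_t = 0.

    Assumption: forecasts are fed back into the recursion (multi-step ahead).
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    values = list(history[-2:])  # only the last two values enter the recursion
    forecasts = []
    for _ in range(steps):
        nxt = 2 * values[-1] - values[-2]
        forecasts.append(nxt)
        values.append(nxt)
    return forecasts


# Topic 5 trend degrees for 2014 and 2015, taken from the training set.
topic5_tail = [0.00894, 0.01104]
forecasts = forecast_arima_020(topic5_tail, steps=4)
for year, value in zip(range(2016, 2020), forecasts):
    print(year, round(value, 5))
```

With zero noise this recursion simply extends the straight line through the last two observations, so the forecast changes by the same amount every year.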
Fitting Effect of ARIMA Model
Topic | Topic 1 | Topic 2 | Topic 3 | Topic 5 | Topic 6 | Topic 11
Initial smoothing value | -0.009374 | -0.019048 | 0.006119 | 0.024299 | 0.013579 | -0.015574
Initial Smoothing Value of Topic Trend Degree
Year | Single smoothed value | Double smoothed value | Triple smoothed value | $a_t$ | $b_t$ | $c_t$
2001 | -0.00937 | -0.00937 | -0.00937 | -0.00937 | 0 | 0
2002 | -0.01071 | -0.01004 | -0.00971 | -0.01170 | -0.00150 | -0.00017
2003 | -0.00460 | -0.00732 | -0.00851 | -0.00036 | 0.00653 | 0.00076
2004 | -0.00133 | -0.00433 | -0.00642 | 0.00256 | 0.00525 | 0.00045
2005 | 0.00441 | 0.00004 | -0.00319 | 0.00992 | 0.00722 | 0.00057
2006 | 0.01035 | 0.00520 | 0.00100 | 0.01646 | 0.00755 | 0.00048
2007 | 0.01049 | 0.00784 | 0.00442 | 0.01237 | 0.00072 | -0.00039
2008 | 0.01171 | 0.00978 | 0.00710 | 0.01290 | 0.00007 | -0.00037
2009 | 0.01251 | 0.01115 | 0.00912 | 0.01323 | -0.00027 | -0.00033
2010 | 0.01265 | 0.01190 | 0.01051 | 0.01277 | -0.00083 | -0.00032
2011 | 0.01182 | 0.01186 | 0.01119 | 0.01107 | -0.00182 | -0.00036
2012 | 0.01038 | 0.01112 | 0.01115 | 0.00894 | -0.00250 | -0.00035
2013 | 0.00745 | 0.00929 | 0.01022 | 0.00471 | -0.00410 | -0.00045
2014 | 0.00548 | 0.00738 | 0.00880 | 0.00309 | -0.00311 | -0.00024
2015 | 0.00423 | 0.00581 | 0.00730 | 0.00257 | -0.00178 | -0.00004
Exponential Smoothing Value and Parameter Value of Topic 1’s Trend Degree
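The Topic 1 table above can be approximately reproduced with Brown's cubic (triple) exponential smoothing. The sketch below is written under two assumptions that the page itself does not state: the smoothing constant is taken as alpha = 0.5, which is inferred from the tabulated values, and the first row is taken to be the initial smoothing value itself, which for Topic 1 equals the mean of the first three observations. The function names are hypothetical.

```python
def cubic_exponential_smoothing(series, alpha, initial):
    """Return one (s1, s2, s3, a, b, c) tuple per observation (Brown's method)."""

    def coefficients(s1, s2, s3):
        k = alpha / (2 * (1 - alpha) ** 2)
        a = 3 * s1 - 3 * s2 + s3
        b = k * ((6 - 5 * alpha) * s1 - 2 * (5 - 4 * alpha) * s2
                 + (4 - 3 * alpha) * s3)
        c = k * alpha * (s1 - 2 * s2 + s3)
        return a, b, c

    # Assumption: the first row holds the initial value for all three levels.
    s1 = s2 = s3 = initial
    rows = [(s1, s2, s3, *coefficients(s1, s2, s3))]
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # single smoothing
        s2 = alpha * s1 + (1 - alpha) * s2   # double smoothing
        s3 = alpha * s2 + (1 - alpha) * s3   # triple smoothing
        rows.append((s1, s2, s3, *coefficients(s1, s2, s3)))
    return rows


def forecast(a, b, c, m):
    """Forecast m steps ahead of the row that produced a, b and c."""
    return a + b * m + c * m * m


# Topic 1 trend degrees, 2001-2015, from the training set.
topic1 = [-0.01758, -0.01204, 0.00150, 0.00194, 0.01016, 0.01629, 0.01064,
          0.01292, 0.01332, 0.01280, 0.01099, 0.00895, 0.00451, 0.00351,
          0.00298]
initial = sum(topic1[:3]) / 3  # about -0.009373, close to the listed -0.009374
rows = cubic_exponential_smoothing(topic1, alpha=0.5, initial=initial)
s1, s2, s3, a, b, c = rows[-1]  # 2015 row
print([round(v, 5) for v in (s1, s2, s3, a, b, c)])
```

Under these assumptions the 2015 row agrees with the table to within about 0.00001; the small gaps are consistent with the published training values being rounded to five decimal places.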
Fitting Effect of Exponential Smoothing Method
Predicted Value of ARIMA Model and Exponential Smoothing Method
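Topic type | Topic | MAE (ARIMA) | MAE (exponential smoothing)
Growing topics | Topic 1 | 0.00187 | 0.00170
Growing topics | Topic 2 | 0.00040 | 0.00060
Growing topics | Topic 3 | 0.00263 | 0.00105
Growing topics | Average prediction error | 0.00163 | 0.00112
Emerging topics | Topic 5 | 0.00613 | 0.00338
Emerging topics | Topic 6 | 0.00208 | 0.00323
Emerging topics | Topic 11 | 0.00052 | 0.00075
Emerging topics | Average prediction error | 0.00291 | 0.00245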
MAE of Predicted Values Between ARIMA Model and Exponential Smoothing Method
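Topic type | Topic | RMSE (ARIMA) | RMSE (exponential smoothing)
Growing topics | Topic 1 | 0.00233 | 0.00185
Growing topics | Topic 2 | 0.00052 | 0.00064
Growing topics | Topic 3 | 0.00773 | 0.00377
Growing topics | Average RMSE | 0.00353 | 0.00209
Emerging topics | Topic 5 | 0.00265 | 0.00122
Emerging topics | Topic 6 | 0.00270 | 0.00349
Emerging topics | Topic 11 | 0.00114 | 0.00176
Emerging topics | Average RMSE | 0.00217 | 0.00216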
RMSE of Predicted Values Between ARIMA Model and Exponential Smoothing Method
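For reference, the two error measures compared in the tables above can be computed as in the following sketch. The helper names `mae` and `rmse` are illustrative, not taken from the paper. The last lines recompute the growing-topic averages from the per-topic MAE values listed in the MAE table.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Per-topic MAE values for the growing topics (Topics 1, 2, 3), from the table.
mae_arima_growing = [0.00187, 0.00040, 0.00263]
mae_smoothing_growing = [0.00170, 0.00060, 0.00105]

avg_arima = sum(mae_arima_growing) / len(mae_arima_growing)
avg_smoothing = sum(mae_smoothing_growing) / len(mae_smoothing_growing)
print(round(avg_arima, 5), round(avg_smoothing, 5))
```

RMSE squares each error before averaging, so a single large miss raises it more than it raises MAE; reporting both measures shows whether a method's errors are uniformly small or dominated by a few years.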