Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 22-34    DOI: 10.11925/infotech.2096-3467.2019.1155
Evolution Analysis of Hot Topics with Trend-Prediction
Yue Lixin1, Liu Ziqiang2,3, Hu Zhengyin2,3
1School of Information Resource Management, Renmin University of China, Beijing 100872, China
2Chengdu Library of Chinese Academy of Sciences, Chengdu 610041, China
3Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China

[Objective] This paper constructs mathematical and content prediction models based on the external and internal characteristics of academic articles, aiming to analyze the evolution of trending research topics. [Methods] We used an LDA model to identify topics and constructed their time series. We then determined the popular topics using mean values and linear regression fitting. Finally, we predicted the trends of these topics with ARIMA and Word2Vec models, based on topic intensity and topic content respectively. [Results] We evaluated the models in an empirical study of stem cell research in the United States, identifying the popular topics and predicting their development trends. [Limitations] Because the Word2Vec model analyzes trends in topic content based on single words, interpreting the documents may be ambiguous. [Conclusions] The proposed method provides better prediction results than methods based on manual interpretation.
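
The abstract does not spell out how mean values and linear regression fitting are combined to pick the popular topics. The following is a minimal sketch under one plausible reading: a topic counts as hot when its mean intensity is above the average over all topics and its fitted trend line has a positive slope. The rule itself, the function names `fit_line` and `select_hot_topics`, and the yearly intensities are all assumptions made for illustration, not the authors' procedure or data.

```python
def fit_line(values):
    """Least-squares fit of values against 0, 1, 2, ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def select_hot_topics(series_by_topic):
    """Assumed rule: mean intensity above the all-topic average AND a rising trend."""
    means = {t: sum(v) / len(v) for t, v in series_by_topic.items()}
    overall = sum(means.values()) / len(means)
    hot = []
    for topic, values in series_by_topic.items():
        slope, _ = fit_line(values)
        if means[topic] > overall and slope > 0:
            hot.append(topic)
    return hot


# Hypothetical yearly topic intensities, invented for illustration only.
toy_series = {
    "topic_a": [0.10, 0.12, 0.15, 0.18, 0.21],  # strong and rising
    "topic_b": [0.20, 0.18, 0.15, 0.13, 0.10],  # strong but declining
    "topic_c": [0.02, 0.03, 0.03, 0.04, 0.05],  # rising but weak
}
hot_topics = select_hot_topics(toy_series)
print(hot_topics)
```

Requiring both conditions separates topics that are merely large from those that are also growing. A different threshold, such as a minimum slope, would fit the same description equally well.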

Key words: Trend Prediction; Hot Topics; ARIMA Model; Word2Vec; Topic Evolution
Received: 22 October 2019      Published: 07 July 2020
ZTFLH:  G350  
Corresponding Author: Liu Ziqiang

Cite this article:

Yue Lixin, Liu Ziqiang, Hu Zhengyin. Evolution Analysis of Hot Topics with Trend-Prediction. Data Analysis and Knowledge Discovery, 2020, 4(6): 22-34.


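Model        Autocorrelation function (ACF)   Partial autocorrelation function (PACF)
AR(p)        Tails off                        Cuts off after lag p
MA(q)        Cuts off after lag q             Tails off
ARMA(p, q)   Tails off after lag q            Tails off after lag p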
Determination of Model Parameters
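
The cut-off and tail-off patterns in the table above can be checked numerically. The sketch below computes partial autocorrelations from a given autocorrelation sequence with the Durbin-Levinson recursion and applies it to the theoretical ACF of an AR(1) and an MA(1) process. The function name `pacf_from_acf` and the coefficients 0.6 and 0.5 are illustrative assumptions. The inputs are theoretical autocorrelations rather than estimates from data, so the patterns come out exact instead of approximate.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    Returns the partial autocorrelations for lags 1 .. len(rho) - 1.
    """
    pacf = []
    prev = []  # prev[j] holds the order-(k-1) coefficient for lag j + 1
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-lag coefficients, then append the new one.
        cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# AR(1) with coefficient 0.6: the ACF is 0.6**k, which decays but never hits zero.
ar1_acf = [0.6 ** k for k in range(6)]
ar1_pacf = pacf_from_acf(ar1_acf)

# MA(1) with coefficient 0.5: the ACF is theta / (1 + theta**2) at lag 1, zero beyond.
theta = 0.5
ma1_acf = [1.0, theta / (1 + theta ** 2), 0.0, 0.0, 0.0, 0.0]
ma1_pacf = pacf_from_acf(ma1_acf)

print("AR(1) PACF:", [round(p, 4) for p in ar1_pacf])
print("MA(1) PACF:", [round(p, 4) for p in ma1_pacf])
```

For the AR(1) input the PACF is non-zero only at lag 1, while for the MA(1) input the ACF stops after lag 1 and the PACF keeps shrinking without reaching zero. With real data, sample estimates replace the theoretical values and the cut-offs are judged against confidence bands.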
Schematic Diagram of CBOW Model and Skip-Gram Model
Annual Distribution of Papers
Determination of Optimal Number of Topics
Topic No. Topic Words
Topic1 acute|intestinal|hematopoietic|kinase|term|epithelial|
Topic2 pathway|cancer|embryonic|hematopoietic|virus|cell|
Topic3 regulation|marrow|hematopoietic|embryonic|biology|
Topic4 resistance|cell|effect|hematopoietic|cancer|imaging|
Topic5 cell|cancer|breast|pancreatic|new|hematopoietic|
…… ……
Research Topics in the Field of Stem Cells in the United States (Partial)
Time Series of Theme in Stem Cell Field (2000-2018)
Model            BIC       Model            BIC
ARIMA(0, 0, 1)   -77.88    ARIMA(1, 2, 0)   -111.36
ARIMA(0, 0, 2)   -80.42    ARIMA(1, 2, 1)   -113.48
ARIMA(0, 1, 1)   -127.98   ARIMA(1, 2, 2)   -107.37
ARIMA(0, 1, 2)   -119.46   ARIMA(2, 0, 0)   -133.28
ARIMA(0, 2, 1)   -110.23   ARIMA(2, 0, 1)   -140.00
ARIMA(0, 2, 2)   -109.99   ARIMA(2, 0, 2)   -125.84
ARIMA(1, 0, 0)   -136.18   ARIMA(2, 1, 0)   -130.73
ARIMA(1, 0, 1)   -139.63   ARIMA(2, 1, 1)   -127.64
ARIMA(1, 0, 2)   -132.44   ARIMA(2, 1, 2)   -116.59
ARIMA(1, 1, 0)   -136.13   ARIMA(2, 2, 0)   -112.62
ARIMA(1, 1, 1)   -129.06   ARIMA(2, 2, 1)   -115.63
ARIMA(1, 1, 2)   -117.46   ARIMA(2, 2, 2)   -103.39
Determination of Model Parameters
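
A common way to read a grid like the one above is to prefer the order (p, d, q) with the lowest BIC. The sketch below applies that rule to the values listed in the table. This page does not say which criterion or which model the authors finally adopted, so the lowest-BIC rule is an assumption here, and the variable names are illustrative.

```python
# BIC values copied from the table above, keyed by ARIMA order (p, d, q).
bic_by_order = {
    (0, 0, 1): -77.88,  (0, 0, 2): -80.42,
    (0, 1, 1): -127.98, (0, 1, 2): -119.46,
    (0, 2, 1): -110.23, (0, 2, 2): -109.99,
    (1, 0, 0): -136.18, (1, 0, 1): -139.63, (1, 0, 2): -132.44,
    (1, 1, 0): -136.13, (1, 1, 1): -129.06, (1, 1, 2): -117.46,
    (1, 2, 0): -111.36, (1, 2, 1): -113.48, (1, 2, 2): -107.37,
    (2, 0, 0): -133.28, (2, 0, 1): -140.00, (2, 0, 2): -125.84,
    (2, 1, 0): -130.73, (2, 1, 1): -127.64, (2, 1, 2): -116.59,
    (2, 2, 0): -112.62, (2, 2, 1): -115.63, (2, 2, 2): -103.39,
}

# Assumed selection rule: lower BIC is better.
ranked = sorted(bic_by_order, key=bic_by_order.get)
best_order = ranked[0]

print("Lowest BIC:", best_order, bic_by_order[best_order])
print("Runners-up:", ranked[1:3])
```

Under this rule the candidates at the top of the ranking differ by less than one BIC unit, so residual diagnostics such as those summarized under "Model Test Results" would usually be weighed as well before settling on an order.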
Model Test Results
Prediction of Hot Topic Intensity Evolution Trend
Trend Forecast of Hot Topic Content
Research Hotspot of Stem Cell Based on VOSviewer