Evolution Analysis of Hot Topics with Trend-Prediction
Yue Lixin1, Liu Ziqiang2,3, Hu Zhengyin2,3
1 School of Information Resource Management, Renmin University of China, Beijing 100872, China
2 Chengdu Library of Chinese Academy of Sciences, Chengdu 610041, China
3 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper constructs mathematical and content prediction models based on the external and internal characteristics of academic articles, aiming to analyze the evolution of hot research topics. [Methods] We used an LDA model to identify topics and construct their intensity time series. We then determined the hot topics from the mean values and linear regression fits of these series. Finally, we predicted the trends of the hot topics with ARIMA and Word2Vec models, based on topic intensity and topic content respectively. [Results] An empirical study of stem cell research in the United States was used to evaluate the models; it identified the hot topics and predicted their development trends. [Limitations] Because the Word2Vec model analyzes trends in topic content based on single words, interpreting the documents may be ambiguous. [Conclusions] The proposed method can provide better prediction results than methods based on manual interpretation.
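The hot-topic selection step described in the Methods can be illustrated with a minimal sketch. The abstract only states that mean values and linear regression fitting are used; the specific rule below (mean intensity above the all-topic average and a positive fitted slope), the function names, and the sample intensities are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def slope(values):
    """Ordinary least-squares slope of values against the time index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def hot_topics(series):
    """Return topics with above-average mean intensity and a rising trend.

    Assumed rule (not taken from the paper): a topic is "hot" when its mean
    intensity exceeds the average of all topic means and its fitted slope
    is positive.
    """
    means = {name: mean(vals) for name, vals in series.items()}
    overall = sum(means.values()) / len(means)
    return sorted(
        name for name, vals in series.items()
        if means[name] > overall and slope(vals) > 0
    )


# Hypothetical yearly topic intensities (e.g. averaged LDA topic weights).
topic_series = {
    "topic_a": [0.10, 0.12, 0.15, 0.18, 0.22],  # strong and rising
    "topic_b": [0.20, 0.18, 0.15, 0.12, 0.10],  # strong but declining
    "topic_c": [0.02, 0.03, 0.03, 0.04, 0.05],  # rising but weak
}

print(hot_topics(topic_series))
```

In this sketch only `topic_a` satisfies both conditions. The intensity series of each selected topic would then be passed to a time series model such as ARIMA for forecasting.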
Received: 22 October 2019
Published: 07 July 2020
Corresponding Author: Liu Ziqiang, E-mail: 1224615932@qq.com