Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 18-32    DOI: 10.11925/infotech.2096-3467.2021.0908
Technology Evolution Analysis Framework Based on Two-Layer Topic Model and Application
Lv Lucheng1,2, Zhou Jian3, Wang Xuezhao1,2, Liu Xiwen1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100094, China
Abstract  

[Objective] This paper proposes a new analysis framework for technology evolution. It aims to avoid two problems of existing approaches: calculating topic similarity, and manually setting a threshold to judge whether topics in different time windows are related. [Methods] The framework is built on a two-layer topic model that identifies dynamic topics using LDA and NMF. We evaluated the identification of technical topics with two indicators, the inner consistency of topics and the outer difference between topics. Finally, we analyzed the evolution of technical topics in terms of topic growth and importance. [Results] We tested the method on data from the field of resources and environment. The NMF-based two-layer topic model identified dynamic topics more effectively than the LDA-based one, and the technology evolution results are consistent with the lists of breakthrough technologies released by MIT Technology Review. [Limitations] This paper only studies the development of a technology from emergence to extinction; it does not examine the division, derivation, or integration of technologies. [Conclusions] The proposed method can automatically identify dynamic topics from the literature and analyze their evolution tracks. It has application value in scientific and technological information analysis.

Keywords: Technology Evolution Analysis; Topic Model; S&T Literature Mining; NMF; Resource and Environment Field
Received: 25 August 2021      Published: 14 April 2022
ZTFLH:  G254  
Fund: Projects of Strategy Research from Bureau of Planning and Strategy, Chinese Academy of Sciences (GHJ-ZLZX-2020-31-3)
Corresponding Author: Liu Xiwen, ORCID: 0000-0003-0820-3622, E-mail: liuxw@mail.las.ac.cn

Cite this article:

Lv Lucheng, Zhou Jian, Wang Xuezhao, Liu Xiwen. Technology Evolution Analysis Framework Based on Two-Layer Topic Model and Application. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 18-32.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0908     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/18

The Overall Framework of the Method
The Framework of the Two-Layer Topic Model
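Symbol | Meaning
$T$ | Total number of time windows
$D_i$ | Document set of the $i$-th time window
$V$ | Set of terms in the document sets of all time windows
$d$ | A single document
$w$ | A single term
$k_i$ | Number of topics in the document set of the $i$-th time window
$A_i$ | Document-term matrix of the document set of the $i$-th time window, $A_i \in \mathbb{R}^{|D_i| \times |V_i|}$
$W_i$ | Document-topic matrix obtained by the topic model for the $i$-th time window, $W_i \in \mathbb{R}^{|D_i| \times k_i}$
$H_i$ | Topic-term matrix obtained by the topic model for the $i$-th time window, $H_i \in \mathbb{R}^{k_i \times |V_i|}$
$k'$ | Number of dynamic topics in the second-layer topic model
$A'$ | Matrix obtained by merging all $H_i$ ($i \in [1, T]$), $A' \in \mathbb{R}^{\left(\sum_{i=1}^{T} k_i\right) \times |V|}$
$W'$ | Document-topic matrix obtained with $A'$ as the topic model input, $W' \in \mathbb{R}^{\left(\sum_{i=1}^{T} k_i\right) \times k'}$
$H'$ | Topic-term matrix obtained with $A'$ as the topic model input, $H' \in \mathbb{R}^{k' \times |V|}$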
The Symbol Definition in Two-Layer Topic Model
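The symbol definitions above suggest the following data flow: factor each window matrix $A_i$ into $W_i$ and $H_i$, stack all $H_i$ row-wise into $A'$, then factor $A'$ into $W'$ and $H'$. The sketch below illustrates that flow for the NMF variant only. It is not the authors' implementation: the Lee-Seung multiplicative update rule, the random initialization, the fixed iteration count, the toy matrices, and the function names `nmf` and `two_layer_nmf` are all assumptions made for illustration. It also assumes every window matrix is built over the same shared vocabulary, so that the rows of the $H_i$ can be stacked.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n x m) into W (n x k) and H (k x m).

    Uses multiplicative updates (an assumed solver, chosen for brevity).
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def two_layer_nmf(window_matrices, window_ks, k_dynamic):
    """Layer 1: one NMF per time window. Layer 2: NMF on the stacked H_i."""
    window_H = [nmf(A_i, k_i)[1] for A_i, k_i in zip(window_matrices, window_ks)]
    # A' stacks every window topic (a row of some H_i) as one "document".
    A_prime = [row for H_i in window_H for row in H_i]
    W_prime, H_prime = nmf(A_prime, k_dynamic)
    return window_H, A_prime, W_prime, H_prime

# Toy input: two time windows, three documents each, shared vocabulary of 4 terms.
window_matrices = [
    [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2]],
    [[0, 0, 2, 2], [1, 2, 0, 0], [0, 1, 0, 3]],
]
window_ks = [2, 2]
window_H, A_prime, W_prime, H_prime = two_layer_nmf(window_matrices, window_ks, k_dynamic=2)
```

In this sketch each row of `W_prime` corresponds to one window topic and each column to one dynamic topic, so a window topic can be assigned to the dynamic topic with the largest weight in its row without any pairwise similarity threshold between windows.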
The Weight Calculation Method of Dynamic Theme in Each Time Window
Year | Number of papers
2010 | 959
2011 | 1078
2012 | 1103
2013 | 1173
2014 | 1260
2015 | 1268
2016 | 1391
2017 | 1504
2018 | 1703
2019 | 2166
2020 | 1580
The Distribution of WoS Papers by Year
Result of the Optimal Number of Window Topics by TL-NMF
Result of the Optimal Number of Window Topics by TL-LDA
The Average Topic Consistency of TL-NMF and TL-LDA at the Optimal Number of Window Topics
Result of the Optimal Number of Dynamic Topics by TL-NMF and TL-LDA
Model | Topic consistency (InnerSim) | Topic difference (OuterDiff_W2V) | Topic difference (OuterDiff_JCD)
TL-NMF | 0.3669 | 0.8069 | 0.9867
TL-LDA | 0.3513 | 0.7145 | 0.9928
Comparison of Model Effect
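This page does not give the formulas behind the indicators in the comparison table. As a rough illustration of a set-based difference indicator, the sketch below computes the mean pairwise Jaccard distance between the top-term sets of topics. Reading "JCD" as Jaccard distance is an assumption, as are the averaging scheme, the function names, and the toy term lists; the authors' exact definition may differ.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b| for two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def mean_pairwise_distance(topics):
    """Average Jaccard distance over all unordered topic pairs (needs >= 2 topics)."""
    pairs = list(combinations(topics, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical top-term lists for three topics.
topics = [
    ["climate", "temperature", "impacts"],
    ["lithium", "capacity", "storage"],
    ["climate", "storage", "biofuels"],
]
score = mean_pairwise_distance(topics)
```

Values close to 1 mean the topics share almost no top terms, which would be consistent with the high OuterDiff_JCD scores reported for both models.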
Top 5 | $t_0$ | $t_7$ | $t_{10}$ | $t_{27}$ | $t_{35}$ | $t_{49}$
1 | Climate | Lithium | GIS | Network | COVID | Microalgae
2 | Change | Ion Batteries | Support Vector Machine | Prediction | Coronavirus | Biodiesel
3 | Impacts | Li | Spatial Prediction | Artificial Neural | SARS | Algae
4 | Temperature | Capacity | Regression | Algorithm | COV | Biodiesel Production
5 | Climate Change | Storage | Logistic | ANN | Pandemic | Biofuels
Examples of Dynamic Topics Based on TL-NMF
Top 5 | $t_0$ | $t_5$ | $t_{17}$ | $t_{26}$ | $t_{32}$ | $t_{41}$
1 | Membrane Bioreactor | Trend Analysis | Model | COVID | Membrane Fouling | Hydraulic Fracture
2 | Temporal | Microgrids | Water | Holocene | Aerobic Granular Sludge | Neural Network
3 | Flower Pollination Algorithm | Monitor | Climate | Ecological Footprint | Biosynthesis | Electricity Market
4 | Nanoscale Zero | Water Management | Carbon | Mediterranean | Density Functional Theory | Transfer Learning
5 | Artificial Bee Colony | Statistics | Temperature | Soil Erosion | Hydrogen Generation | Surface Mass
Examples of Dynamic Topics Based on TL-LDA
The Evaluation Analysis of Top3 Dynamic Topics in Growth
The Evaluation Analysis of Top3 Dynamic Topics in Importance
The Growth and Importance Distribution of 50 Dynamic Topics