Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 9: 89-99     https://doi.org/10.11925/infotech.2096-3467.2022.1069
Research Article
Identification of Emerging Technology Based on Co-words and Node2Vec Representation Learning*
Cao Kun1,2, Wu Xinnian1,2, Jin Junbao1,2, Zheng Yurong1, Fu Shuang1
1 Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
2 Department of Information Resource Managements, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] This study aims to identify emerging technologies efficiently and accurately, helping governments, enterprises, and other market participants track the technology frontier in time and allocate resources appropriately. [Methods] Taking fine-grained technical terms as the research object, we build a model that draws on both the structural features and the semantic representation of the co-word network to select emerging terms and quantify their emerging scores. We then use the Node2Vec graph representation learning algorithm to encode the emerging terms as vectors and represent them semantically, which allows us to identify both emerging terms and emerging technology topics. [Results] An empirical study in the field of CNC machine tools identified 449 emerging terms and four emerging technology topics (robot automatic loading and unloading systems, clean and efficient cutting technology, high-speed and high-precision CNC machining centers, and additive-subtractive hybrid manufacturing technology), supporting the soundness of the proposed method. [Limitations] Only patent data were used; other multi-source heterogeneous literature data, and the network relationships they contain such as citation and semantic similarity, remain underused. [Conclusions] Combining co-words with Node2Vec graph representation learning captures the structural features and semantic representation of the co-word network among technical terms and enables fine-grained, quantitative identification of emerging technologies.

Keywords: Emerging Technology; Text Mining; Graph Representation Learning; Node2Vec
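Received: 2022-10-12      Published: 2023-10-24
CLC Number: TP393; G250
Funding: *National Social Science Fund of China (20BTQ094); Gansu Province Soft Science Special Project (21CX6ZA110)
Corresponding author: Wu Xinnian, ORCID: 0000-0002-7865-9548, E-mail: wuxn@lzb.ac.cn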
Cite this article:
Cao Kun, Wu Xinnian, Jin Junbao, Zheng Yurong, Fu Shuang. Identification of Emerging Technology Based on Co-words and Node2Vec Representation Learning. Data Analysis and Knowledge Discovery, 2023, 7(9): 89-99.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1069      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I9/89
Fig.1  Research framework
Fig.2  Node2Vec random walk strategy
Fig.3  Weight distribution of emerging technology features
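The co-word network and the Node2Vec random walk named in the abstract and in Fig.2 can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term lists, the function names `build_coword_graph` and `node2vec_walk`, and the parameter values are assumptions made for illustration. Only the general scheme (a second-order walk biased by a return parameter `p` and an in-out parameter `q`) follows the published node2vec algorithm.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_coword_graph(documents):
    """Return {term: {neighbor: co-occurrence count}} from lists of terms."""
    graph = defaultdict(lambda: defaultdict(int))
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased walk (return parameter p, in-out parameter q)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step: plain weighted choice over co-occurrence counts.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:               # stepping back to the previous node
                    w /= p
                elif n not in graph[prev]:  # moving away from the previous node
                    w /= q
                weights.append(w)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy input (hypothetical): each inner list stands for the terms of one patent.
docs = [
    ["fluid filter", "fluid recycle", "machine tool chip"],
    ["fluid filter", "magnetic attraction"],
    ["arc additive", "additive manufacture", "neural network"],
    ["additive manufacture", "neural network"],
]
graph = build_coword_graph(docs)
walk = node2vec_walk(graph, "fluid filter", 5, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

In a full pipeline, many such walks per node would be passed to a skip-gram model to obtain one vector per term, and those vectors would then be clustered into topics. Small `p` keeps the walk close to its starting neighborhood, while small `q` pushes it outward.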
No.  Emerging term             Novelty  Growth  Impact  Uncertainty  Emerging score
1    neural network            7.24     0.03    1.62    1.26         0.60
2    magnetic attraction       8.19     0.29    0.35    0.84         0.54
3    fluid filter              8.08     0.18    0.67    1.00         0.53
4    arc additive manufacture  7.50     0.01    1.25    1.54         0.52
5    fluid recycle             8.25     0.07    0.66    2.75         0.52
6    additive manufacture      6.80     0.03    1.54    0.01         0.50
7    arc additive              7.73     0.03    1.01    1.73         0.50
8    intelligent tool          7.44     0.02    1.18    1.30         0.50
9    machine tool chip         7.71     0.13    0.80    0.55         0.49
…    …                         …        …       …       …            …
449  photoelectric switch      5.57     0.01    0.28    0.36         0.07
Table 1  Feature values and emerging scores of emerging terms
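How four indicator values might be combined into one emerging score can be sketched with an objective weighting scheme. The example below uses the CRITIC method (Diakoulaki et al., 1995) as an assumption; this page does not state the exact formula behind Table 1. It also assumes that all four indicators count positively toward the score and that min-max normalization is applied. Because only the first five rows of Table 1 are used, the resulting scores are not expected to match the published column.

```python
from math import sqrt

# First five rows of Table 1: novelty, growth, impact, uncertainty.
rows = {
    "neural network":           [7.24, 0.03, 1.62, 1.26],
    "magnetic attraction":      [8.19, 0.29, 0.35, 0.84],
    "fluid filter":             [8.08, 0.18, 0.67, 1.00],
    "arc additive manufacture": [7.50, 0.01, 1.25, 1.54],
    "fluid recycle":            [8.25, 0.07, 0.66, 2.75],
}


def min_max(col):
    """Rescale one indicator column to [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def std(x):
    """Sample standard deviation."""
    m = sum(x) / len(x)
    return sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def critic_weights(columns):
    """CRITIC: weight is proportional to std_j * sum_k (1 - r_jk)."""
    info = []
    for cj in columns:
        conflict = sum(1 - pearson(cj, ck) for ck in columns)
        info.append(std(cj) * conflict)
    total = sum(info)
    return [c / total for c in info]


# Transpose the rows into indicator columns, normalize, weight, and score.
columns = [min_max(list(col)) for col in zip(*rows.values())]
weights = critic_weights(columns)
scores = {
    term: sum(w * col[i] for w, col in zip(weights, columns))
    for i, term in enumerate(rows)
}
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:26s} {score:.2f}")
```

CRITIC gives more weight to indicators that vary strongly across terms and correlate weakly with the other indicators, so no manual weights need to be chosen.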
Fig.4  Co-word network map of emerging terms
Fig.5  Evaluation of clustering performance
Fig.6  t-SNE dimensionality reduction
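Topic  Emerging technology topic                            Emerging terms (Top 5)
1      Robot automatic loading and unloading systems        convey unit; insert hole; truss manipulator; drive box; multi degree
2      Clean and efficient cutting technology               magnetic attraction; fluid filter; fluid recycle; machine tool chip; front bear assembly
3      High-speed and high-precision CNC machining centers  intelligent tool; center tool magazine; manual tool; machine center tool; center tool
4      Additive-subtractive hybrid manufacturing technology neural network; arc additive manufacture; additive manufacture; arc additive; vibration signal
Table 2  Emerging technology topics and their emerging terms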