Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model
Yi Huifang, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper addresses three problems in topic modeling of patents: lack of context, weak interpretability, and poor integration of IPC (International Patent Classification) information. [Methods] First, we proposed the concept of context enhancement. Second, we built a Context-LDA model that uses both the IPC codes and the extracted vocabulary as the training corpus. Third, we implemented the topic model in Python and compared its generalization and topic representation abilities with those of traditional LDA models. [Results] We evaluated the proposed model on 38,354 graphene patents. The new model had lower perplexity values (below 100) and generalized well across different scenarios. Its JS value was about 0.1 higher than that of the traditional LDA model. The combined IPC codes and topic words represented each other and improved topic readability. The average IPC position was 9.6/20, with little noise. [Limitations] The vocabulary representation in the new model needs to be extended from unigrams to n-grams. [Conclusions] Topic models play an important role in supporting the analysis of patent topics, and more effective and accurate models should be developed based on actual needs.
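The two evaluation measures named in the abstract, and the idea of training on IPC codes together with extracted words, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how IPC codes are merged into the corpus or how the JS value is aggregated, so the sketch assumes that IPC codes are simply appended as extra tokens and that the JS value refers to the Jensen-Shannon divergence between two topic-word distributions. The function names and the `IPC_` prefix are hypothetical.

```python
import math

def add_ipc_context(words, ipc_codes):
    """Build one training document from extracted words plus IPC codes.

    Assumption: each IPC code becomes an extra token with an 'IPC_' prefix.
    """
    return list(words) + [f"IPC_{code}" for code in ipc_codes]

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def perplexity(token_probs):
    """Perplexity: exp of the negative mean log-probability of held-out tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

if __name__ == "__main__":
    doc = add_ipc_context(["graphene", "oxide"], ["C01B32/184"])
    print(doc)
    print(js_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
    print(perplexity([0.02, 0.05, 0.01]))
```

Under these assumptions, a larger divergence between topic-word distributions means the topics are more distinct from one another, and a lower perplexity means the model predicts held-out tokens better.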
Received: 14 December 2020
Published: 17 May 2021
Corresponding Authors:
Liu Xiwen
E-mail: liuxw@mail.las.ac.cn