Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model
Yi Huifang, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper addresses issues facing topic modeling of patent texts, such as lack of context, weak interpretability, and poor integration of IPC information. [Methods] First, we proposed the concept of context enhancement. Second, we built a Context-LDA model that uses both the IPC codes and the extracted vocabulary as the training corpus. Third, we implemented the topic model in Python and compared its generalization and topic representation abilities with those of traditional LDA models. [Results] We evaluated the proposed model on 38,354 graphene patents. The new model had lower perplexity values (below 100) and showed strong generalization ability in different scenarios. Its JS value was about 0.1 higher than that of the traditional LDA model. The combined IPC codes and topic words represented each other and improved topic readability. The average IPC position was 9.6/20, with little noise. [Limitations] The vocabulary representation in the new model needs to be extended from unigrams to n-grams. [Conclusions] Topic models play an important role in supporting the analysis of patent topics, and more effective and accurate models should be developed based on actual needs.
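The abstract names three quantities without defining them: a training corpus that mixes IPC codes with extracted vocabulary, perplexity, and a JS value. The following Python sketch illustrates one plausible reading of each and is not the authors' implementation. It assumes that IPC codes are added to each patent's token list as prefixed pseudo-words, that the JS value is the Jensen-Shannon divergence between two topic-word distributions, and that perplexity is the exponential of the negative mean log-likelihood per token. The function names, the `IPC_` prefix, and the sample code `C01B 32/182` are hypothetical and chosen only for illustration.

```python
import math


def build_context_document(ipc_codes, words):
    """Merge IPC codes and extracted words into one token list.

    Assumption: IPC codes become prefixed pseudo-words so that they share
    a vocabulary with ordinary terms without colliding with them.
    """
    ipc_tokens = [f"IPC_{code.replace(' ', '')}" for code in ipc_codes]
    return ipc_tokens + list(words)


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1]; larger values mean
    the two topic-word distributions are more distinct.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of held-out text."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Illustrative inputs only; these are not data from the paper.
doc = build_context_document(["C01B 32/182"], ["graphene", "oxide"])
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.2, 0.7]
separation = js_divergence(topic_a, topic_b)
```

Under this reading, a higher average JS value across topic pairs indicates better-separated topics, and a lower perplexity on held-out patents indicates better generalization.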