Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model
Yi Huifang, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper addresses three problems in topic modeling of patents: lack of context, weak interpretability, and poor integration with the International Patent Classification (IPC). [Methods] First, we proposed the concept of context enhancement. Second, we built a Context-LDA model trained jointly on the IPC codes and the vocabulary extracted from the patents. Third, we implemented the model in Python and compared its generalization and topic representation abilities with those of the traditional LDA model. [Results] We evaluated the proposed model on 38,354 graphene patents. The new model had lower perplexity (below 100) and generalized well across different scenarios. Its JS value was about 0.1 higher than that of the traditional LDA model. The IPC codes and the topic words complemented each other and made the topics easier to read. The average IPC position was 9.6 out of 20, with little noise. [Limitations] The vocabulary representation in the new model is limited to unigrams and needs to be extended to n-grams. [Conclusions] Topic models play an important role in supporting the analysis of patent topics, and more effective and accurate models should be developed to meet practical needs.
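The joint training corpus and the topic-separation measure summarized above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: IPC codes are appended to each patent's extracted words as extra tokens (the `IPC_` prefix and all function names are hypothetical), and the "JS value" is taken to be the mean pairwise Jensen-Shannon divergence between topic-word distributions. It is not the authors' implementation.

```python
import math
from itertools import combinations


def add_ipc_context(tokens, ipc_codes):
    """Build one training document: extracted words plus IPC codes as tokens.

    Assumption: IPC codes are simply appended as extra tokens, marked with a
    hypothetical "IPC_" prefix so they cannot collide with ordinary words.
    """
    return list(tokens) + [f"IPC_{code}" for code in ipc_codes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q must have the same length and each sum to 1.
    The result lies in [0, 1]; 0 means the distributions are identical.
    """
    # Mixture distribution used as the common reference point.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_pairwise_js(topic_word_dists):
    """Average JS divergence over all pairs of topic-word distributions."""
    pairs = list(combinations(topic_word_dists, 2))
    if not pairs:
        raise ValueError("need at least two topics")
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy document and toy topic-word distributions, for illustration only.
    print(add_ipc_context(["graphene", "oxide", "film"], ["C01B"]))
    topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
    print(mean_pairwise_js(topics))
```

Documents returned by `add_ipc_context` could then be passed to any standard LDA implementation. Under the assumption above, a higher mean pairwise divergence would indicate topics that are more distinct from one another.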