Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (4): 25-36    DOI: 10.11925/infotech.2096-3467.2020.1255
Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model
Yi Huifang, Liu Xiwen
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper addresses problems facing topic modeling of patents, such as missing context, weak interpretability, and poor integration of IPC information. [Methods] First, we propose the concept of context enhancement. Second, we build a Context-LDA model that uses both the IPC codes and the extracted vocabulary as the training corpus. Third, we implement the topic model in Python and compare its generalization and topic representation abilities with those of the traditional LDA model. [Results] We evaluated the proposed model on 38,354 graphene patents. The new model had lower perplexity values (below 100) and generalized well across different scenarios. Its JS value was about 0.1 higher than that of the traditional LDA model. The IPC codes and the topic words complemented each other and made the topics more readable. The average IPC position was 9.6/20, with little noise. [Limitations] The vocabulary representation of the new model needs to be extended from unigrams to n-grams. [Conclusions] Topic models play an important role in supporting the analysis of patent topics, and more effective and accurate models should be developed based on actual needs.
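The context-enhancement step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how IPC codes and vocabulary are merged, so placing each patent's IPC subclass codes in front of its extracted words is an assumption, as are the field names `ipc` and `words` and the toy records. Truncating to four-character subclasses follows the codes that appear in the Context-LDA topic table (for example C01B and H01M).

```python
from collections import Counter

def build_context_documents(patents):
    """Return one token list per patent: IPC subclass codes followed by its words."""
    documents = []
    for patent in patents:
        ipc_tokens = []
        for symbol in patent["ipc"]:
            # Reduce a full IPC symbol to its 4-character subclass,
            # e.g. "H01M 4/36" -> "H01M" (assumed granularity).
            subclass = symbol.replace(" ", "")[:4].upper()
            if subclass not in ipc_tokens:
                ipc_tokens.append(subclass)
        # Assumption: IPC codes are treated as ordinary tokens of the document.
        documents.append(ipc_tokens + list(patent["words"]))
    return documents

# Hypothetical toy records; real input would come from the patent dataset.
patents = [
    {"ipc": ["H01M 4/36", "H01M 10/0525", "B82Y 30/00"],
     "words": ["graphene", "lithium-ion", "anode"]},
    {"ipc": ["C01B 32/184"],
     "words": ["graphene", "preparation"]},
]

docs = build_context_documents(patents)
vocabulary = Counter(token for doc in docs for token in doc)
```

The resulting token lists can be passed to any LDA implementation in place of the word-only documents, which is why the context-enhanced corpus is larger and its documents longer than the plain LDA corpus.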

Key words: Technology Topic Analysis; Topic Model; Context-Enhance; Context-LDA
Received: 14 December 2020      Published: 17 May 2021
CLC number (ZTFLH): G250
Yi Huifang, Liu Xiwen. Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model. Data Analysis and Knowledge Discovery, 2021, 5(4): 25-36.

Context-LDA Model
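| Statistic | Words (before preprocessing) | Words (after preprocessing) | Min | Max | Mean | Median | Std. dev. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LDA, size of the whole corpus | 357,186 | 242,279 | / | / | / | / | / |
| LDA, document length | / | / | 2 | 19 | 6.3 | 6 | 1.9 |
| Context-LDA, size of the whole corpus | 493,768 | 378,861 | / | / | / | / | / |
| Context-LDA, document length | / | / | 3 | 46 | 9.9 | 10 | 3.3 |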
Statistical Description of Corpus and Text Length
Curve of Perplexity with the Number of Iterations
Curve of Perplexity with the Number of Topics
Curve of Perplexity with the Number of Document Words
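For readers unfamiliar with the metric plotted in these curves, perplexity is commonly computed as the exponential of the negative average per-token log-likelihood. The sketch below uses that standard definition; it is not taken from the paper, and the names `theta` (document-topic proportions) and `phi` (topic-word probabilities) are assumptions chosen for illustration.

```python
import math

def perplexity(docs, theta, phi):
    """Perplexity of a topic model on docs.

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k] = probability of topic k in document d
    phi:   phi[k][w]   = probability of word w under topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Word probability is the topic mixture for this document.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)

# Sanity check: a model that is uniform over 4 words has perplexity 4.
uniform_phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
uniform_theta = [[0.5, 0.5]]
value = perplexity([[0, 1, 2, 3]], uniform_theta, uniform_phi)
```

Lower values mean the model assigns higher probability to the observed words, which is the sense in which the abstract reports perplexity below 100 as better generalization.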
JS Distance Curve of the Topic Model
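A JS value between topics is typically based on the Jensen-Shannon divergence of their word distributions, with larger values indicating topics that are easier to tell apart. The sketch below is one hedged way to compute it: the base-2 logarithm and the averaging over all topic pairs are assumptions, since the abstract does not state how the reported JS value was aggregated.

```python
import math
from itertools import combinations

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (0 to 1 with log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def mean_pairwise_js(topics):
    """Average JS divergence over all pairs of topic-word distributions."""
    pairs = list(combinations(topics, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)

# Toy topic-word distributions over a 2-word vocabulary (illustrative only).
toy_topics = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
average = mean_pairwise_js(toy_topics)
```

Taking the square root of `js_divergence` gives the Jensen-Shannon distance, which may be what the figure caption refers to.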
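| Topic | Topic content |
| --- | --- |
| Topic0 | 0.216 preparation + 0.200 conductive + 0.171 graphene + 0.120 compounding + 0.077 nanomaterial + 0.061 graphene nanoribbon + 0.022 rubber + 0.021 application + 0.020 slurry + 0.010 waterborne polyurethane |
| Topic11 | 0.180 preparation + 0.177 nano + 0.149 graphene + 0.094 nanocomposite material + 0.077 complex + 0.050 titanium dioxide + 0.032 application + 0.028 controllable + 0.027 particle + 0.025 compounding |
| Topic35 | 0.207 graphene + 0.159 reinforcement + 0.146 growth + 0.111 preparation + 0.057 equipment + 0.052 direct + 0.025 number of layers + 0.021 composite material + 0.021 graphene nanoplatelet + 0.012 in situ |
| Topic19 | 0.178 battery + 0.136 graphene + 0.125 lithium ion + 0.113 negative electrode + 0.107 preparation + 0.103 material + 0.063 fabrication + 0.044 compounding + 0.016 low cost + 0.009 emission |
| Topic37 | 0.151 preparation + 0.120 material + 0.118 graphene + 0.103 positive electrode + 0.080 coating + 0.058 compounding + 0.054 lithium-ion battery + 0.053 polyaniline + 0.042 catalysis + 0.031 activity |
| Topic44 | 0.283 material + 0.187 preparation + 0.150 graphene + 0.123 electrode + 0.051 compounding + 0.041 application + 0.032 adsorption + 0.014 photocatalyst + 0.011 supercapacitor + 0.008 amino |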
LDA Topic Content (Partial)
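| Topic | Topic content | IPC definition | IPC position |
| --- | --- | --- | --- |
| Topic3 | 0.374 C01B + 0.194 preparation + 0.188 graphene + 0.100 B82Y + 0.023 doping + 0.016 quantum dot + 0.013 material + 0.012 nanomaterial + 0.010 composite material + 0.007 thin film | C01B: non-metallic elements; compounds thereof | |
| Topic7 | 0.277 C30B + 0.145 graphene + 0.078 G02F + 0.074 multilayer + 0.044 growth + 0.041 preparation + 0.034 single crystal + 0.030 function + 0.019 selectivity + 0.015 structure | C30B: single-crystal growth; unidirectional solidification of eutectic material or unidirectional demixing of eutectoid material… | |
| Topic20 | 0.378 C08K + 0.294 C08L + 0.084 preparation + 0.053 graphene + 0.047 composite material + 0.013 graphene oxide + 0.013 compounding + 0.010 material + 0.006 application + 0.006 reinforcement | C08K: use of inorganic or non-macromolecular organic substances as compounding ingredients | |
| Topic31 | 0.457 H01G + 0.125 preparation + 0.083 graphene + 0.066 electrode + 0.044 material + 0.036 composite material + 0.036 application + 0.022 capacitor + 0.021 epitaxy + 0.008 doping | H01G: capacitors; capacitors, rectifiers, detectors, switching devices, light-sensitive or temperature-sensitive devices of the electrolytic type | 1/20 |
| Topic41 | 0.486 H01M + 0.121 preparation + 0.100 graphene + 0.066 composite material + 0.043 application + 0.042 B82Y + 0.017 lithium-ion battery + 0.014 electrode + 0.013 doping + 0.012 material | H01M: processes or means, e.g. batteries, for the direct conversion of chemical energy into electrical energy | 1/20, 6/20 |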
Context-LDA Topic Content (Partial)
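The IPC position column can be read as the rank of each IPC code among a topic's top words, ordered by weight. The sketch below illustrates that reading on the ten Topic41 words listed in the table, using English glosses as labels; the regular expression for IPC subclass codes and the function name `ipc_positions` are assumptions, and the paper's own procedure over the full top-20 list may differ.

```python
import re

# Assumed pattern for a 4-character IPC subclass code, e.g. "H01M" or "B82Y".
IPC_SUBCLASS = re.compile(r"^[A-H]\d{2}[A-Z]$")

def ipc_positions(topic_words):
    """Return the 1-based ranks of IPC codes among a topic's words, sorted by weight."""
    ranked = sorted(topic_words, key=lambda pair: pair[1], reverse=True)
    return [rank for rank, (word, _weight) in enumerate(ranked, start=1)
            if IPC_SUBCLASS.match(word)]

# The ten Topic41 entries from the table (words given as English glosses).
topic41 = [
    ("H01M", 0.486), ("preparation", 0.121), ("graphene", 0.100),
    ("composite material", 0.066), ("application", 0.043), ("B82Y", 0.042),
    ("lithium-ion battery", 0.017), ("electrode", 0.014),
    ("doping", 0.013), ("material", 0.012),
]

positions = ipc_positions(topic41)  # ranks of H01M and B82Y
```

For Topic41 this gives ranks 1 and 6, consistent with the "1/20, 6/20" entry in the table; averaging such ranks over all topics would yield a figure comparable to the 9.6/20 reported in the abstract.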
