Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 45-54    DOI: 10.11925/infotech.2096-3467.2021.1086
Clustering Technology Topics Based on Patent Multi-Attribute Fusion
Liu Xiaoling1, Tan Zongying1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] Dividing technology topics reasonably, effectively and accurately is of great significance. This article integrates multiple attributes of patents to improve that division. [Methods] First, we constructed a patent text vector, a patent citation vector and a patent classification vector from the text content, citation relationships and classification information of the patents. Then, we obtained a new patent vector through multi-attribute fusion of the three vectors. Finally, we identified technology topics through patent clustering analysis. [Results] Compared with patent vector representations based on one or two attributes, our method achieved higher patent classification precision, recall and F1 scores across different IPC classification levels and sample sizes. Its measurement of patent similarity was also more accurate. [Limitations] We evaluated the technology topic division indirectly, through automatic patent classification, rather than with direct methods. [Conclusions] The proposed method improves the accuracy of patent similarity measurement and technology topic division.
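
The fusion step in [Methods] can be illustrated with a minimal sketch. The abstract does not say how the three vectors are combined, so the approach below (L2-normalize each attribute vector, scale it by a weight, then concatenate) is an assumption rather than the authors' procedure. The function names, weights and toy vectors are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, cit_vec, ipc_vec, weights=(1.0, 1.0, 1.0)):
    """Assumed fusion: weighted concatenation of unit-length attribute vectors."""
    fused = []
    for weight, vec in zip(weights, (text_vec, cit_vec, ipc_vec)):
        fused.extend(weight * x for x in l2_normalize(vec))
    return fused


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


# Hypothetical toy vectors: text embedding, citation indicators, IPC indicators.
patent_a = fuse([0.2, 0.9, 0.1], [1, 0, 1, 0], [1, 1, 0])
patent_b = fuse([0.3, 0.8, 0.0], [0, 0, 1, 1], [1, 1, 0])
fused_similarity = cosine(patent_a, patent_b)
```

With equal weights and non-zero attribute vectors, the cosine similarity of two fused vectors equals the mean of the three per-attribute cosine similarities, so no single attribute dominates the result.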

Key words: Multi-Attribute Fusion; Technology Topics; Patent Similarity
Received: 24 September 2021      Published: 31 December 2021
ZTFLH:  G306  
Corresponding Author: Liu Xiaoling, ORCID: 0000-0001-7523-247X, E-mail: liuxiaoling@mail.las.ac.cn

Cite this article:

Liu Xiaoling, Tan Zongying. Clustering Technology Topics Based on Patent Multi-Attribute Fusion. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 45-54.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1086     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/45

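| Similarity computed from a single attribute | Actual patent similarity | Correction from other attributes |
|---|---|---|
| Patents A and B share few patent references or citing patents, so the citation-based similarity is small | A and B have similar text content and the same or similar classification codes; their actual technology topics are fairly similar | The similarity based on text content or classification is large and can correct the citation-based similarity |
| Patents A and B share many patent references or citing patents, so the citation-based similarity is fairly large | The citations of A and B are not their core related technologies; their actual technology topics are not very similar | The similarity based on text content or classification is small and can correct the citation-based similarity |
| The text similarity of patents A and B is fairly small | A and B differ in writing habits and wording but share many patent references or citing patents; their actual technology topics are fairly similar | The similarity based on citation relationships or classification is fairly large and can correct the text-based similarity |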
Theoretical Hypothesis of Patent Similarity Measurement Based on Multi-Attribute Fusion
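
The first case of the hypothesis can be made concrete with a small numeric sketch. It assumes that the citation-based similarity is a Jaccard index over shared references and that fusion is a plain average of the three similarities. Neither choice is stated in the source, and the reference identifiers and similarity values are made up for illustration.

```python
def jaccard(set_a, set_b):
    """Jaccard index: shared items divided by all distinct items."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical reference lists: A and B share only one cited patent.
refs_a = {"REF-1", "REF-2", "REF-3", "REF-4"}
refs_b = {"REF-4", "REF-5", "REF-6", "REF-7"}
citation_sim = jaccard(refs_a, refs_b)  # 1 shared out of 7 distinct

# Hypothetical similarities from the other two attributes.
text_sim = 0.90
ipc_sim = 1.00

# Assumed correction: a plain average pulls the low citation score upward.
corrected_sim = (citation_sim + text_sim + ipc_sim) / 3
```

Here the citation-based score alone (about 0.14) would understate how close the two patents are; the averaged score (about 0.68) reflects the agreement of the text and classification attributes.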
Process of Technology Topics Division Based on Patent Multi-Attribute Fusion
Schematic Diagram of Patent Similarity Calculation Method Based on Multi-Attribute Fusion
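| Patent vector representation | Attributes | IPC_4 (Top10) P | IPC_4 (Top10) R | IPC_4 (Top10) F1 | IPC_6 (Top10) P | IPC_6 (Top10) R | IPC_6 (Top10) F1 | IPC_6 (Top2-6) P | IPC_6 (Top2-6) R | IPC_6 (Top2-6) F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Doc2Vec_CIT_IPC | Multi-attribute | 0.853 | 0.864 | 0.847 | 0.708 | 0.725 | 0.662 | 0.694 | 0.700 | 0.692 |
| Doc2Vec | Single attribute | 0.777 | 0.816 | 0.760 | 0.644 | 0.680 | 0.590 | 0.609 | 0.617 | 0.609 |
| CIT | Single attribute | 0.791 | 0.812 | 0.757 | 0.684 | 0.683 | 0.592 | 0.540 | 0.547 | 0.539 |
| IPC | Single attribute | 0.820 | 0.836 | 0.801 | 0.623 | 0.700 | 0.610 | 0.586 | 0.542 | 0.512 |
| Doc2Vec_CIT | Two attributes | 0.814 | 0.829 | 0.779 | 0.674 | 0.693 | 0.610 | 0.644 | 0.650 | 0.644 |
| Doc2Vec_IPC | Two attributes | 0.848 | 0.857 | 0.837 | 0.712 | 0.719 | 0.649 | 0.653 | 0.664 | 0.652 |
| CIT_IPC | Two attributes | 0.844 | 0.855 | 0.838 | 0.696 | 0.724 | 0.659 | 0.639 | 0.652 | 0.641 |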
Patent Classification Results of Patent Vector Representation Based on Multi-Attribute Fusion Method and Other Methods
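
Several F1 values in the table are lower than both the precision and the recall in the same cell group (for example P = 0.853, R = 0.864, F1 = 0.847 for Doc2Vec_CIT_IPC at IPC_4). A single harmonic mean always lies between P and R, so this pattern would be consistent with scores averaged over classes. The source does not say which averaging was used, so the macro-averaging sketch below is an assumption, and the labels and predictions are hypothetical.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Hypothetical IPC subclass labels for three patents.
y_true = ["G06F", "G06F", "G10L"]
y_pred = ["G06F", "G10L", "G10L"]
macro_p, macro_r, macro_f1 = macro_prf(y_true, y_pred)
# macro_p = 0.75, macro_r = 0.75, macro_f1 = 2/3 (lower than both)
```

In this toy case each class has an F1 of 2/3, while the class-wise precisions (1.0 and 0.5) and recalls (0.5 and 1.0) average to 0.75, which shows how a macro F1 can fall below both macro P and macro R.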
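| Rank | Publication number | Patent title | Application year | Similarity |
|---|---|---|---|---|
| 1 | US10229108B2 | Systems and methods for adaptive spell checking | 2016 | 0.773 |
| 2 | US6732333B2 | System and method for managing statistical data regarding corrections to word processing documents | 2001 | 0.085 |
| 3 | US7647554B2 | System and method for improved spell checking | 2006 | 0.071 |
| 4 | US8543378B1 | System and method for identifying misspelled words | 2003 | 0.068 |
| 5 | US9489372B2 | Web-based spell checker | 2013 | 0.060 |
| 6 | US9069753B2 | Measuring proximity of misspelled input to intended input | 2010 | 0.057 |
| 7 | US4730269A | Method and apparatus for generating word skeletons using alpha sets | 1986 | 0.053 |
| 8 | US5765180A | Method and system for correcting misspelled words | 1996 | 0.046 |
| 9 | US5572423A | Method for correcting spelling using error frequencies | 1995 | 0.045 |
| 10 | US7669112B2 | Automatic spelling analysis | 2007 | 0.044 |
| … | | | | |
| 20 | US10310628B2 | Input error correction method | 2013 | 0.038 |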
Top 20 Patents with the Highest Citation Similarity to the Patent US9275036B2
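| Rank | Publication number | Patent title | Application year | Similarity |
|---|---|---|---|---|
| 1 | US10229108B2 | Systems and methods for adaptive spell checking | 2016 | 0.992 |
| 2 | US9779080B2 | Automatic text correction using N-grams | 2012 | 0.813 |
| 3 | US5765180A | Method and system for correcting misspelled words | 1996 | 0.782 |
| 4 | US5604897A | Method and system for correcting misspelled words | 1990 | 0.781 |
| 5 | US10468015B2 | Automated TTS self-correction system | 2017 | 0.775 |
| 6 | US4777596B1 | Text replacement typing aid for computer text editors | 1986 | 0.769 |
| 7 | US10318631B2 | Removable spell checker device | 2018 | 0.765 |
| 8 | US5270927A | Method for converting Chinese phonetics into Chinese characters | 1990 | 0.765 |
| 9 | US4783758A | Automatic word substitution using numerical rankings of structural differences between misspelled words and candidate substitution words | 1985 | 0.763 |
| 10 | US8543378B1 | System and method for identifying misspelled words | 2003 | 0.755 |
| … | | | | |
| 20 | US10467338B2 | Correcting user input | 2017 | 0.740 |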
Top 20 Patents with the Highest Text Similarity to the Patent US9275036B2
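| Rank | Publication number | Patent title | Application year | Similarity |
|---|---|---|---|---|
| 1 | US10229108B2 | Systems and methods for adaptive spell checking | 2016 | 0.996 |
| 2 | US9779080B2 | Automatic text correction using N-grams | 2012 | 0.926 |
| 3 | US5765180A | Method and system for correcting misspelled words | 1996 | 0.913 |
| 4 | US5604897A | Method and system for correcting misspelled words | 1990 | 0.913 |
| 5 | US4777596B1 | Text replacement typing aid for computer text editors | 1986 | 0.908 |
| 6 | US10318631B2 | Removable spell checker device | 2018 | 0.907 |
| 7 | US4783758A | Automatic word substitution using numerical rankings of structural differences between misspelled words and candidate substitution words | 1985 | 0.906 |
| 8 | US8543378B1 | System and method for identifying misspelled words | 2003 | 0.903 |
| 9 | US5276741A | Fuzzy string matcher | 1991 | 0.903 |
| 10 | US5761687A | Character-based correction method with correction propagation | 1995 | 0.902 |
| 11 | US7831911B2 | Spell checking system including a phonetic speller | 2006 | 0.901 |
| … | | | | |
| 20 | US8457946B2 | Recognition architecture for generating Asian characters | 2007 | 0.888 |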
Top 20 Patents with the Highest Similarity to the Patent US9275036B2 Calculated Based on the Multi-Attribute Fusion Method
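
One way to read the three rankings for US9275036B2 together is to count how many of the top-10 patents each single-attribute ranking shares with the fused ranking. The sketch below uses only the publication numbers listed in the tables (ranks 1 to 10, since ranks 11 to 19 are mostly elided). The overlap count is an illustrative comparison chosen here, not a metric reported by the authors.

```python
# Top-10 publication numbers as listed in the three ranking tables.
citation_top10 = [
    "US10229108B2", "US6732333B2", "US7647554B2", "US8543378B1", "US9489372B2",
    "US9069753B2", "US4730269A", "US5765180A", "US5572423A", "US7669112B2",
]
text_top10 = [
    "US10229108B2", "US9779080B2", "US5765180A", "US5604897A", "US10468015B2",
    "US4777596B1", "US10318631B2", "US5270927A", "US4783758A", "US8543378B1",
]
fused_top10 = [
    "US10229108B2", "US9779080B2", "US5765180A", "US5604897A", "US4777596B1",
    "US10318631B2", "US4783758A", "US8543378B1", "US5276741A", "US5761687A",
]


def overlap(ranking_a, ranking_b):
    """Number of publication numbers that appear in both rankings."""
    return len(set(ranking_a) & set(ranking_b))


citation_vs_fused = overlap(citation_top10, fused_top10)  # 3
text_vs_fused = overlap(text_top10, fused_top10)          # 8
```

For this one query patent, the fused top 10 stays close to the text-based ranking while keeping the three patents that the citation-based ranking also places in its top 10.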
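| Cluster No. | Topics | Number of patents |
|---|---|---|
| 1 | Document processing; entity extraction; information extraction; question answering systems; keyword extraction; speech recognition; word sense disambiguation; semantic text search; text clustering; text visualization systems; text similarity; document querying; structured text indexing; digital document generation; question preprocessing in question answering systems; specialized language recognition; SVO structure extraction; sentiment analysis; candidate answer generation; document classification; web page ranking | 569 |
| 2 | Dialogue systems; voice control systems; voice command processing; dynamic voicemail reception; Internet of Things dialogue; virtual reality interaction; digital assistants; voice call to text conversion; speech understanding; automatic translation of multi-user audio and video; concierge robot systems; dialogue dynamics analysis; intelligent automated assistants; conversation transcription; voice response; virtual assistant systems; dialect recognition; audio information extraction; speech recognition; speech translation; interactive voice systems | 325 |
| 3 | Character input error correction; touch keyboards; text editing; user input suggestions; emoji word sense disambiguation; character input; writing systems; string auto-suggestion; input method editors; target text selection methods; multilingual keyboard systems; user input prediction; keyboard input assistance; spell checking; virtual keyboards; Rubik's cube input system; predictive input methods | 293 |
| 4 | Information retrieval; search engines; semi-structured question answering systems; digital element search; extracting object data from unstructured documents; information querying; web search mapping; corpus querying; record search systems; concept recommendation based on multilingual user interaction; natural language query generation; automatic structured query generation; semantic search; search result ranking; search query intent | 288 |
| 5 | Document processing; document parsing; rule-based parsers; document conversion; document content summarization; methods and apparatus for displaying web pages; XML file parsing; document grouping; processing structured data files; pagination point identification; structured search queries; document layout; acronym generation; automatic file acquisition; document order management; portable format document generation; document preparation platforms; document content recognition | 279 |
| … | | |
| 10 | Question answering systems; relation extraction; deep question answering systems; context-based language analysis in dialogue; human-computer interaction systems; medical differential diagnosis and treatment using question answering systems; authentication using cognitive analysis; type evaluation in question answering systems; candidate answer generation; conversational query processors; complete question generation; intelligent question answering; providing answers to questions using chatbot systems | 212 |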
Number of Topics and Patents in Some NLP Clusters
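
The clustering step that produces such groups can be sketched as follows. The abstract says only that topics were identified "through patent clustering analysis" and does not name an algorithm, so k-means is used here purely as an assumed stand-in. The deterministic initialization, the function names and the two-dimensional toy vectors are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with evenly spaced input points as initial centroids."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical fused patent vectors forming two well-separated groups.
patent_vectors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
labels, centroids = kmeans(patent_vectors, k=2)
# labels == [0, 0, 0, 1, 1, 1]
```

In practice the input would be the fused patent vectors, and the topic labels for each cluster would still have to be derived separately, for example from the titles of the member patents.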