Computing Patent Similarity Based on Hierarchical Feature of Claims<sup>*</sup>

doi:10.11925/infotech.2096-3467.2022.1340

Data Analysis and Knowledge Discovery, 2024, Vol. 8, Issue (2): 33-43. https://doi.org/10.11925/infotech.2096-3467.2022.1340

Research Article



Xiang Shuxuan¹, Cao Yujie², Mao Jin³,⁴

¹Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023, China
²School of Information Management, Central China Normal University, Wuhan 430074, China
³School of Information Management, Wuhan University, Wuhan 430072, China
⁴Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China


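Abstract

[Objective] Existing patent similarity methods make insufficient use of the features unique to patent text and, to some extent, overlook the characteristics of patent content and structure. This paper proposes a new patent similarity computation method to address these problems. [Methods] Technical compound sentences are generated from the hierarchical features of claims and weighted by information core degree and information richness, so that the patent representation covers both the scope of the technical content and the focus of the technical information. Patent similarity is then computed on this representation. Comparative experiments using correlation indicators and patent classification demonstrate the soundness of the method. [Results] Compared with similar baseline methods, the proposed method expresses patent information more fully and is better suited to patent similarity computation. Reconstructing claims into technical compound sentences clearly improves model performance, and weighting by information core degree and information richness on top of that improves it further. [Limitations] Experiments were conducted only in the quantum computing field; whether the technical field affects the method's performance remains to be explored. [Conclusions] Organizing information as claim trees and technical compound sentences improves the efficiency with which patent text is used. Technical compound sentences based on the hierarchical features of claims, together with the corresponding information-feature weighting, improve patent representation and its performance in similarity tasks.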
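The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that a technical compound sentence is the concatenation of claim texts along one root-to-leaf path of the claim tree, that information core degree is `1 / path length`, that information richness is `log(1 + number of distinct tokens)`, and that a bag-of-words vector stands in for whatever text embedding the paper uses. All function names and the toy claims are hypothetical.

```python
import math
from collections import Counter


def root_to_leaf_paths(parents):
    """Enumerate claim-tree paths from each independent claim to each leaf.

    `parents` maps a claim number to the claim it depends on
    (None for an independent claim).
    """
    children = {claim: [] for claim in parents}
    for claim, parent in parents.items():
        if parent is not None:
            children[parent].append(claim)

    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if not children[node]:          # leaf: one complete path
            paths.append(path)
        for child in sorted(children[node]):
            walk(child, path)

    for root in sorted(c for c, p in parents.items() if p is None):
        walk(root, [])
    return paths


def patent_vector(claims, parents):
    """Weighted sum of bag-of-words vectors, one per compound sentence."""
    vector = Counter()
    for path in root_to_leaf_paths(parents):
        # Assumed form of a technical compound sentence: claims joined along the path.
        sentence = " ".join(claims[c] for c in path)
        tokens = Counter(sentence.lower().split())
        core = 1.0 / len(path)                  # assumed "information core degree"
        richness = math.log(1 + len(tokens))    # assumed "information richness"
        weight = core * richness
        for token, count in tokens.items():
            vector[token] += weight * count
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy patents (invented for illustration only).
claims_a = {
    1: "a quantum processor comprising superconducting qubits",
    2: "wherein the qubits are coupled by tunable resonators",
    3: "wherein the processor is cooled by a dilution refrigerator",
}
parents_a = {1: None, 2: 1, 3: 1}

claims_b = {
    1: "a quantum processor comprising trapped ion qubits",
    2: "wherein the qubits are addressed by laser pulses",
}
parents_b = {1: None, 2: 1}

similarity = cosine(patent_vector(claims_a, parents_a),
                    patent_vector(claims_b, parents_b))
print(round(similarity, 3))
```

Swapping the bag-of-words step for a sentence embedding model would change only `patent_vector`; the tree traversal and the weighting stay the same.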


Keywords: Patent Claims, Patent Similarity, Hierarchy of Claims

Abstract:

[Objective] This paper proposes a new model to compute patent similarity, which makes fuller use of the features unique to patent texts and of their content and structure. [Methods] First, we represented patents with technical compound sentences built from the hierarchy of claims and weighted by information core degree and information richness. Then, we calculated patent-to-patent similarity with the representation. Finally, we conducted comparative experiments with correlation scores and patent classification. [Results] The proposed method outperformed benchmark methods in computing patent similarities. The technical compound sentences clearly improved the model's performance, and the weighting of information core degree and richness improved it further. [Limitations] We only evaluated the model on patents in the quantum computing field. [Conclusions] Using a claim tree and technical compound sentences to organize patent information can improve the efficiency of patent text processing. The weighting of information core degree and richness based on hierarchical features of claims can improve patent representation and its performance in patent similarity computing tasks.
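The correlation-based comparison mentioned in the abstract could look something like the sketch below. The abstract does not say which correlation index or which reference signal the authors used, so this assumes Spearman's rank correlation between computed similarity scores and some reference relevance scores; the function names are illustrative only.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

A method whose similarity scores rank patent pairs in the same order as the reference scores gets a value close to 1, which gives one way to compare representations without fixing a similarity threshold.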

Key words: Patent Claims, Patent Similarity, Hierarchy of Claims

Received: 2022-12-19; Published: 2023-05-16

ZTFLH (Chinese Library Classification): TP393; G255

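Funding: *Innovative Research Group Project of the National Natural Science Foundation of China (71921002); Huxiang High-Level Talent Gathering Program (2021RC5029)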

Corresponding author: Xiang Shuxuan, ORCID: 0000-0002-3259-7169, E-mail: xsx@smail.nju.edu.cn.

Cite this article:

Xiang Shuxuan, Cao Yujie, Mao Jin. Computing Patent Similarity Based on Hierarchical Feature of Claims[J]. Data Analysis and Knowledge Discovery, 2024, 8(2): 33-43.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1340 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I2/33

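Fig.1 Illustration of patent claims

Fig.2 Patent claim tree

Fig.3 Construction of technical compound sentences

Fig.4 Patent representation weighted by the information core degree and information richness of technical compound sentences

Table 1 Baseline methods for comparison

Table 2 Evaluation results for technical compound sentences and feature weighting

Table 3 Significance test results (I)

Table 4 Significance test results (II)

Table 5 Significance test results (III)

Table 6 Comparison of patent similarity computation methods

Table 7 Significance test results (IV)

Table 8 Significance test results (V)

Table 9 Distribution of patents in the quantum computing field

Table 10 Comparison of model classification performance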

