Computing Patent Similarity Based on Hierarchical Feature of Claims
Xiang Shuxuan1, Cao Yujie2, Mao Jin3,4
1 Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023, China
2 School of Information Management, Central China Normal University, Wuhan 430074, China
3 School of Information Management, Wuhan University, Wuhan 430072, China
4 Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a new model for computing patent similarity that makes full use of the characteristics of patent texts, including their structural and contextual features. [Methods] First, we represented patents using technical compound sentences weighted by information core degree and information richness. Then, we calculated patent-to-patent similarity from this representation. Finally, we conducted comparative experiments on correlation scores and patent classification. [Results] The proposed method outperformed the benchmark methods in computing patent similarity. The technical compound sentences and the weighting by information core degree and richness further improved the model's performance. [Limitations] We only evaluated the model on the quantum computing field. [Conclusions] Organizing patent information with a claim tree and technical compound sentences can improve the efficiency of patent text processing. Weighting by information core degree and richness, both derived from the hierarchical features of patents, can improve patent representation and performance on patent similarity tasks.
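The general shape of the pipeline summarized above (a claim tree, per-claim weights, a weighted patent representation, then a similarity score) can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not define information core degree or information richness, so the formulas here are placeholder assumptions: core degree is taken as 1 / (1 + depth in the claim tree), and richness as the share of the patent's distinct tokens that a claim contains. A bag-of-words count stands in for a real sentence embedding, and the names `Claim`, `depth`, `patent_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Claim:
    number: int
    text: str
    parent: Optional[int] = None  # claim this one depends on; None if independent


def depth(claim: Claim, claims: Dict[int, Claim]) -> int:
    """Depth in the claim tree: 0 for an independent claim."""
    d = 0
    while claim.parent is not None:
        claim = claims[claim.parent]
        d += 1
    return d


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase token counts."""
    return Counter(text.lower().split())


def patent_vector(claims: Dict[int, Claim]) -> Dict[str, float]:
    """Weighted sum of claim vectors.

    Assumed weights (placeholders, not the paper's definitions):
      core degree = 1 / (1 + depth)
      richness    = distinct tokens in claim / distinct tokens in patent
    """
    vocab = set()
    for c in claims.values():
        vocab |= set(bag_of_words(c.text))
    vec: Dict[str, float] = defaultdict(float)
    for c in claims.values():
        bow = bag_of_words(c.text)
        core = 1.0 / (1.0 + depth(c, claims))
        richness = len(bow) / len(vocab)
        for token, count in bow.items():
            vec[token] += core * richness * count
    return dict(vec)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Made-up claims, for illustration only.
patent_a = {
    1: Claim(1, "a qubit device comprising a superconducting resonator"),
    2: Claim(2, "the device of claim 1 wherein the resonator is coupled "
                "to a readout line", parent=1),
}
patent_b = {1: Claim(1, "a qubit device comprising a trapped ion")}
patent_c = {1: Claim(1, "bicycle frame made from bamboo")}

sim_ab = cosine(patent_vector(patent_a), patent_vector(patent_b))
sim_ac = cosine(patent_vector(patent_a), patent_vector(patent_c))
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

In this sketch the dependent claim contributes less than the independent claim it refers to, which is one simple way a claim hierarchy could shape a patent representation. A real system would replace the token counts with learned sentence embeddings and use the weighting definitions given in the full paper.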
Xiang Shuxuan, Cao Yujie, Mao Jin. Computing Patent Similarity Based on Hierarchical Feature of Claims[J]. Data Analysis and Knowledge Discovery, 2024, 8(2): 33-43.