Computing Patent Similarity Based on Hierarchical Feature of Claims
Xiang Shuxuan¹, Cao Yujie², Mao Jin³,⁴
¹Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023, China
²School of Information Management, Central China Normal University, Wuhan 430074, China
³School of Information Management, Wuhan University, Wuhan 430072, China
⁴Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract [Objective] This paper proposes a new model for computing patent similarity that fully leverages the characteristics of patent texts, together with their structural and contextual features. [Methods] First, we represented patents with technical compound sentences, weighted by information core degree and information richness. Then, we computed patent-to-patent similarity from these representations. Finally, we conducted comparative experiments based on correlation scores and patent classification. [Results] The proposed method outperformed the benchmark methods in computing patent similarity. Both the technical compound sentences and the weighting by information core degree and richness further improved the model's performance. [Limitations] We examined the model only with patents from the quantum computing field. [Conclusions] Using a claim tree and technical compound sentences to organize patent information can improve the efficiency of patent text processing. Weighting by information core degree and richness, based on the hierarchical features of patents, can improve patent representations and, in turn, patent similarity computation.
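The abstract only outlines the pipeline: organize claims into a claim tree, weight text units by their position and content, pool them into a patent representation, and compare patents pairwise. The Python sketch below illustrates that general shape under clearly stated assumptions; it is not the authors' implementation. Each claim is treated as one text unit, so the paper's technical compound sentences are not reproduced. The claim tree is recovered by matching references such as "claim 2". The functions `core_weight`, `richness_weight` and the bag-of-words `embed` are hypothetical stand-ins, not the paper's definitions of information core degree, information richness, or its text encoder.

```python
import math
import re
from collections import Counter

# Hypothetical conventions for this sketch, not taken from the paper.
PARENT_RE = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)  # e.g. "of claim 2"
TOKEN_RE = re.compile(r"[a-z]+")
STOPWORDS = {"a", "an", "the", "of", "and", "is", "to", "wherein",
             "comprising", "claim", "claims"}


def claim_depths(claims: list[str]) -> list[int]:
    """Depth of each claim in the claim tree (independent claims = 0)."""
    depths: list[int] = []
    for text in claims:
        match = PARENT_RE.search(text)
        parent = int(match.group(1)) - 1 if match else -1  # claims numbered from 1
        # A dependent claim sits one level below the earlier claim it cites.
        depths.append(depths[parent] + 1 if 0 <= parent < len(depths) else 0)
    return depths


def tokens(text: str) -> list[str]:
    """Lower-cased alphabetic tokens with boilerplate claim words removed."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in encoder: an L2-normalised bag-of-words vector."""
    counts = Counter(tokens(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def core_weight(depth: int) -> float:
    """Hypothetical core-degree weight: claims nearer the root count more."""
    return 1.0 / (1.0 + depth)


def richness_weight(text: str) -> float:
    """Hypothetical richness weight: grows with the number of distinct terms."""
    return math.log1p(len(set(tokens(text))))


def patent_vector(claims: list[str]) -> dict[str, float]:
    """Weighted average of claim vectors, using core and richness weights."""
    vec: Counter = Counter()
    total = 0.0
    for text, depth in zip(claims, claim_depths(claims)):
        weight = core_weight(depth) * richness_weight(text)
        for term, value in embed(text).items():
            vec[term] += weight * value
        total += weight
    return {t: v / total for t, v in vec.items()} if total else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy claim sets, invented purely for illustration.
patent_a = [
    "A quantum processor comprising a qubit array and a control circuit.",
    "The quantum processor of claim 1, wherein the qubit array is superconducting.",
    "The quantum processor of claim 2, wherein the control circuit uses microwave pulses.",
]
patent_b = [
    "A quantum computer comprising a superconducting qubit array.",
    "The quantum computer of claim 1, wherein the qubit array is cooled.",
]
patent_c = [
    "A bicycle frame comprising a steel tube and a saddle.",
    "The bicycle frame of claim 1, wherein the steel tube is hollow.",
]

print(claim_depths(patent_a))  # [0, 1, 2]
print(cosine(patent_vector(patent_a), patent_vector(patent_b)))  # positive: shared terms
print(cosine(patent_vector(patent_a), patent_vector(patent_c)))  # 0.0
```

The design choice worth noting is that the hierarchy enters only through the per-claim weights, so the pooling and the cosine comparison stay unchanged. The paper's actual weighting schemes or a trained sentence encoder could be substituted without altering the rest of the sketch. The toy scores only show the mechanics and say nothing about the model's reported performance.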
Received: 19 December 2022
Published: 16 May 2023
Fund: National Natural Science Foundation of China (71921002); Fund of Hunan Province for High-level Talents (2021RC5029)
Corresponding Author: Xiang Shuxuan, ORCID: 0000-0002-3259-7169, E-mail: xsx@smail.nju.edu.cn