Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (3): 60-68    DOI: 10.11925/infotech.2096-3467.2020.1028
Review of Studies on Detecting Chinese Patent Infringements
Lv Xueqiang, Luo Yixiong, Li Jiaquan, You Xindong
Beijing Key Laboratory of Internet Culture & Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Abstract  

[Objective] This paper reviews research on detecting patent infringements, aiming to provide theoretical frameworks and identify development trends for future studies. [Coverage] We retrieved 53 representative publications from CNKI and Bing Scholar using the keywords “Patent Infringement” or “Patent Similarity”. [Methods] First, we summarized patent infringement detection methods based on clustering, the vector space model, SAO (Subject-Action-Object) structures, deep learning, and patent structure. Then, we compared the advantages and disadvantages of the popular methods. Finally, we explored possible ways to optimize the existing methods. [Results] Patent infringement detection aims to retrieve a small number of high-risk patents from a large collection of patent documents, which reduces the number of patents requiring manual judgment. These methods assess the risk of infringement by calculating patent similarity from statistical information at different granularities. [Limitations] Due to the lack of standard datasets, we could not quantitatively compare the detection methods. [Conclusions] Patent infringement detection could be improved by using pre-trained models, calculating the similarity of individual patent components, and constructing high-quality datasets.
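The screening step summarized in the Results can be illustrated with the vector space model, one of the reviewed method families. The sketch below is a minimal illustration, not the implementation of any surveyed study: it assumes the patents have already been word-segmented into token lists (Chinese text would need a segmenter first), uses a smoothed TF-IDF weighting chosen here for convenience, and ranks hypothetical candidate patents (`CN-A`, `CN-B`, `CN-C`) by cosine similarity to a target patent so that only the top-ranked ones are passed on for manual review.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> TF-IDF weight


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Build sparse TF-IDF vectors for pre-tokenized documents (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def screen_candidates(target: List[str], candidates: Dict[str, List[str]],
                      top_k: int) -> List[Tuple[str, float]]:
    """Rank candidate patents by similarity to the target and keep the top_k."""
    ids = list(candidates)
    vectors = tfidf_vectors([target] + [candidates[pid] for pid in ids])
    target_vec = vectors[0]
    scored = [(pid, cosine(target_vec, vec)) for pid, vec in zip(ids, vectors[1:])]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical, already-tokenized patents used only for illustration.
target = ["battery", "electrode", "lithium", "coating", "separator"]
candidates = {
    "CN-A": ["battery", "electrode", "lithium", "coating"],
    "CN-B": ["solar", "panel", "mount", "bracket"],
    "CN-C": ["lithium", "battery", "separator", "film"],
}

for pid, score in screen_candidates(target, candidates, top_k=2):
    print(pid, round(score, 3))  # CN-A ranks first, then CN-C
```

Storing the vectors as sparse dictionaries keeps memory manageable on large vocabularies, but the representation still treats terms as independent dimensions, so it cannot capture relations between them.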

Key words: Patent Similarity; Patent Infringement Detection; Deep Learning; Artificial Intelligence
Received: 01 October 2020      Published: 12 April 2021
CLC Number: TP391
Fund: National Natural Science Foundation of China (61671070)
Corresponding Author: You Xindong   E-mail: youxindong@bistu.edu.cn

Cite this article:

Lv Xueqiang,Luo Yixiong,Li Jiaquan,You Xindong. Review of Studies on Detecting Chinese Patent Infringements. Data Analysis and Knowledge Discovery, 2021, 5(3): 60-68.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1028     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I3/60

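Advantages and Disadvantages of Patent Infringement Detection Methods

Clustering-based methods (papers reviewed: 3)
Advantages: Unsupervised learning; no manual annotation is required.
Disadvantages: Only a limited number of clusters can be formed, and that number cannot be set in advance; on large patent collections these methods do not effectively reduce manual workload.

Vector space model (VSM)-based methods (papers reviewed: 4)
Advantages: The VSM is simple and easy to construct, and the granularity and weights of the dimension features are adjustable.
Disadvantages: The model cannot represent relations between dimension features or information at granularities coarser than those features. Vector dimensionality grows with corpus size, so a VSM built on a large corpus has high-dimensional, sparse vectors, which makes subsequent similarity calculation more complex.

SAO structure-based methods (papers reviewed: 6)
Advantages: Similarity can be detected at the semantic level, with a relatively low computational workload.
Disadvantages: The accuracy of SAO-based patent similarity depends on the accuracy of SAO extraction. Supervised learning helps select the SAO structures that meet the requirements, but it incurs manual annotation costs.

Deep learning-based methods (papers reviewed: 8)
Advantages: Deep learning models can learn far more from sample data than other models, giving relatively high accuracy.
Disadvantages: Poor interpretability; high-quality labeled data requires substantial human effort.

Patent structure-based methods (papers reviewed: 7)
Advantages: Relatively high accuracy and more targeted, since different methods can be applied according to the characteristics of each patent component.
Disadvantages: The similarity of each component carries a different weight for infringement; determining these weights requires a number of experiments, and the computational workload is large.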
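The patent structure-based approach can be made concrete by combining per-component similarities. The sketch below assumes SAO triples have already been extracted from each patent component (accuracy hinges on that extraction step, as noted for SAO-based methods), scores each component by exact-match Jaccard overlap of its triple sets, a deliberate simplification of semantic triple matching, and then takes a weighted average. The component names and the weights 0.6/0.2/0.2 are hypothetical placeholders; real weights would have to be determined experimentally.

```python
from typing import Dict, Set, Tuple

SAO = Tuple[str, str, str]          # (subject, action, object)
Patent = Dict[str, Set[SAO]]        # component name -> SAO triples extracted from it

# Hypothetical component weights; real weights would need to be tuned experimentally.
WEIGHTS = {"claims": 0.6, "abstract": 0.2, "description": 0.2}


def jaccard(a: Set[SAO], b: Set[SAO]) -> float:
    """Fraction of distinct triples shared by both sets (exact string match)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structural_similarity(p: Patent, q: Patent,
                          weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-component SAO overlap between two patents."""
    score, used = 0.0, 0.0
    for part, weight in weights.items():
        a, b = p.get(part, set()), q.get(part, set())
        if not (a or b):
            continue  # component absent from both patents: exclude it from the weighting
        score += weight * jaccard(a, b)
        used += weight
    return score / used if used else 0.0


# Hypothetical pre-extracted SAO triples for two patents.
patent_a = {
    "claims": {("electrode", "comprises", "lithium layer"),
               ("separator", "isolates", "electrode")},
    "abstract": {("battery", "improves", "capacity")},
    "description": {("coating", "protects", "electrode")},
}
patent_b = {
    "claims": {("electrode", "comprises", "lithium layer"),
               ("housing", "encloses", "cell")},
    "abstract": {("battery", "improves", "capacity")},
    "description": set(),
}

# claims 1/3, abstract 1.0, description 0.0 -> 0.6*(1/3) + 0.2*1.0 + 0.2*0.0 = 0.4
print(round(structural_similarity(patent_a, patent_b), 3))  # prints 0.4
```

Components that neither patent contains are left out of the weighting, so two identical patents score 1.0 even if, for example, both lack a description section.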
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn