Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 3: 60-68     https://doi.org/10.11925/infotech.2096-3467.2020.1028
Section: Reviews
Review of Studies on Detecting Chinese Patent Infringements*
Lv Xueqiang, Luo Yixiong, Li Jiaquan, You Xindong
Beijing Key Laboratory of Internet Culture & Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Abstract

[Objective] This paper analyzes and summarizes research on patent infringement detection to provide a theoretical basis and identify development trends for future work. [Coverage] We searched CNKI and Bing Scholar with the keywords "专利侵权" (Patent Infringement), "Patent Infringement", "专利相似度" (Patent Similarity) and "Patent Similarity", among others, then manually screened the results to obtain 53 representative publications. [Methods] We summarize patent infringement detection methods based on clustering, the vector space model, SAO (Subject-Action-Object) structure, deep learning, and patent structure. After analyzing the advantages and disadvantages of existing methods, we summarize directions for optimizing patent infringement detection. [Results] Patent infringement detection aims to retrieve, from a large body of patent documents, a small set of patents with high infringement risk, which reduces the number of patents that require manual infringement judgment. It assesses infringement risk by computing the similarity between patents, and that similarity is mainly computed from statistical information at different granularities. [Limitations] Because no standard dataset exists, we could not compare the detection methods quantitatively. [Conclusions] We recommend that follow-up research introduce pre-trained models, compute similarity by fusing the different components of a patent, and construct high-quality patent infringement detection datasets.

Keywords: Patent Similarity; Patent Infringement Detection; Deep Learning; Artificial Intelligence
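To make the retrieval idea in the abstract concrete, the following is a minimal sketch of similarity-based shortlisting using a vector space model with TF-IDF weights and cosine similarity. It is an illustration under stated assumptions, not a method taken from any of the surveyed papers: the function names (`tfidf_vectors`, `cosine`, `rank_candidates`) are hypothetical, the toy documents are invented English phrases split on whitespace, and real Chinese patent text would first need word segmentation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for pre-tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive even for terms found in every document.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(target, corpus, top_k=2):
    """Return (index, score) pairs for the top_k corpus patents most similar to target."""
    vectors = tfidf_vectors([target] + corpus)
    target_vec, corpus_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(target_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Invented toy data (assumption): whitespace-tokenized English phrases.
target = "rotary blade cutting device with safety guard".split()
corpus = [
    "cutting device with rotary blade and guard".split(),
    "method for brewing coffee with paper filter".split(),
    "safety guard for rotary saw blade".split(),
]
shortlist = rank_candidates(target, corpus, top_k=2)
print(shortlist)
```

Only the shortlisted patents would then go to manual infringement judgment, which is the workload reduction the abstract describes. The similarity score is a screening signal, not a legal determination.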
Received: 2020-10-01      Published: 2021-04-12
ZTFLH (Chinese Library Classification): TP391
Funding: *National Natural Science Foundation of China (61671070)
Corresponding author: You Xindong     E-mail: youxindong@bistu.edu.cn
Cite this article:
Lv Xueqiang, Luo Yixiong, Li Jiaquan, You Xindong. Review of Studies on Detecting Chinese Patent Infringements[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 60-68.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1028      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I3/60
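| Method | Advantages | Disadvantages | Number of publications reviewed |
| --- | --- | --- | --- |
| Clustering-based | Unsupervised learning; no manual annotation is needed. | The number of cluster categories is limited and cannot be set in advance; on large volumes of patent text it does not effectively reduce manual workload. | 3 |
| Vector space model (VSM)-based | The model is simple to build; the granularity and weights of the vector dimension features are adjustable. | Cannot represent relations between dimension features, or information coarser than the feature granularity. Vector dimensionality grows with corpus size, so a VSM built on a large corpus yields high-dimensional, sparse vectors that complicate the subsequent similarity computation. | 4 |
| SAO structure-based | Can detect similarity at the semantic level; computational workload is small. | Accuracy depends on the accuracy of SAO structure extraction. Supervised learning helps select SAO structures that meet requirements, but incurs manual annotation cost. | 6 |
| Deep learning-based | The capacity of deep learning models to learn from sample data far exceeds that of other models; accuracy is high. | Poor interpretability; high-quality annotated data requires substantial human effort. | 8 |
| Patent structure-based | High accuracy and more targeted; different methods can be applied to different patent components according to their characteristics. | The similarity of each component carries a different weight for infringement, so a certain number of experiments are needed to determine the weights; computational workload is large. | 7 |

Table 1  Comparison of the advantages and disadvantages of existing patent infringement detection methods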
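The patent structure-based row in Table 1 notes that each component's similarity carries a different weight. A minimal sketch of that fusion step follows, assuming Jaccard similarity per component and a weighted average across components. The component names, the weight values, the toy patents, and the function names (`jaccard`, `fused_similarity`) are all hypothetical placeholders; the table states that real weights have to be determined experimentally.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def fused_similarity(patent_a, patent_b, weights):
    """Weighted average of per-component similarities.

    Components missing from either patent are skipped and the
    remaining weights are renormalized.
    """
    total, weight_sum = 0.0, 0.0
    for part, w in weights.items():
        if part in patent_a and part in patent_b:
            total += w * jaccard(patent_a[part], patent_b[part])
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Hypothetical weights (assumption): not taken from the survey.
WEIGHTS = {"claims": 0.6, "abstract": 0.3, "title": 0.1}

# Invented toy patents (assumption), already tokenized per component.
patent_a = {
    "title": ["cutting", "device"],
    "abstract": ["rotary", "blade", "guard"],
    "claims": ["blade", "guard", "motor"],
}
patent_b = {
    "title": ["cutting", "device"],
    "abstract": ["rotary", "blade", "cover"],
    "claims": ["blade", "guard", "housing"],
}
score = fused_similarity(patent_a, patent_b, WEIGHTS)
print(round(score, 3))
```

Any per-component measure could replace Jaccard here, for example an SAO-based measure for claims and an embedding-based measure for abstracts, which is the flexibility the table attributes to this family of methods.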