Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 104-112     https://doi.org/10.11925/infotech.2096-3467.2019.1157
Research Paper
Computing Similarity of Patent Terms Based on Knowledge Graph*
Li Jiaquan1, Li Baoan2, You Xindong1, Lü Xueqiang1
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
2Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Abstract

[Objective] This study uses a patent knowledge graph to calculate the similarity between patent terms, and from that the similarity between patent texts, in order to judge whether a patent infringes. [Methods] Using a previously constructed knowledge graph of new energy vehicle patents, we compute term similarity by combining the concept hierarchy of the terms, the distance between terms in the knowledge graph, the semantic similarity of the terms, and the attributes of the terms. [Results] Precision and recall of patent term classification both exceeded 80%, a clear improvement over traditional methods. [Limitations] The concept hierarchy tree was built manually and the term categories were annotated manually, so some classification errors may remain. [Conclusions] Computing the similarity of patent terms from a patent knowledge graph is feasible. When the method is evaluated with classification metrics, precision exceeds 80%, which provides a useful reference for subsequent research on patent infringement detection.

Key words: Patent Knowledge Graph; Similarity of Patent Terms; Patent Infringement Detection
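Received: 2019-10-22      Published: 2020-07-28
Chinese Library Classification (ZTFLH):  TP393
Funding: *This work is one of the research outcomes supported by the National Natural Science Foundation of China project "Research on Chinese Patent Infringement Detection" (61671070); the Beijing Information Science & Technology University Project for Promoting the Connotative Development of Universities and Improving Research Level (2019KYNH226); and the Beijing Information Science & Technology University "Qinxin Talents" Cultivation Program (QXTCP B201908).
Corresponding author: Li Jiaquan     E-mail: 15600083132@163.com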
Cite this article:
Li Jiaquan, Li Baoan, You Xindong, Lü Xueqiang. Computing Similarity of Patent Terms Based on Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 104-112.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.1157      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/104
Fig.1  Concept hierarchy tree for the new energy vehicle domain
Fig.2  Frequency of each term type in the corpus
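| Term pair | Ref. [17] | Ref. [18] | Ref. [19] | Path weight |
|---|---|---|---|---|
| generator set - car engine | 0.750 | 0.634 | 0.333 | 0.486 |
| generator set - pulley | 0.500 | 0.374 | 0.200 | 0.291 |
| generator set - storage battery | 0.667 | 0.197 | 0.167 | 0.000 |
| storage battery - metal fuel | 0.667 | 0.558 | 0.333 | 0.488 |
| storage battery - metal element | 0.333 | 0.241 | 0.200 | 0.000 |
| electric energy - fuel saving | 0.667 | 0.558 | 0.333 | 0.487 |

Table 1  Concept similarity results for several term pairs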
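
The tree-based baselines compared in Table 1 can be illustrated on a toy hierarchy. The sketch below is not the paper's implementation. The parent map and node names are hypothetical, the measure of Ref. [17] is written in its usual Wu-Palmer form 2·depth(lcs)/(depth(a)+depth(b)) with the root at depth 1, and the path-based measure attributed to Ref. [19] is assumed to be 1/(1+d), where d is the number of edges between the two terms.

```python
# Toy concept hierarchy (hypothetical): child -> parent, root has parent None.
PARENT = {
    "vehicle": None,
    "component": "vehicle",
    "power_unit": "component",
    "generator_set": "power_unit",
    "car_engine": "power_unit",
    "pulley": "component",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def lowest_common_ancestor(a, b):
    """Deepest node that lies on both root paths."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))

def path_similarity(a, b):
    """Assumed path-based form: 1 / (1 + number of edges between a and b)."""
    lca = lowest_common_ancestor(a, b)
    edges = (depth(a) - depth(lca)) + (depth(b) - depth(lca))
    return 1 / (1 + edges)

print(wu_palmer("generator_set", "car_engine"))
print(path_similarity("generator_set", "car_engine"))
```

With these made-up depths, two sibling terms at depth 4 score 0.75 and about 0.333, the same pair of values that the first row of Table 1 lists for Refs. [17] and [19]. That agreement follows from the toy depths chosen here and says nothing definite about the actual tree in Fig.1.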
Fig.3  Number of instances of each relation
Fig.4  Number of term pairs of the same type for each relation
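| Term pair | Longest distance / shortest distance | Edges on the shortest path | Ref. [16] | wei_path |
|---|---|---|---|---|
| generator set - car engine | 10/3 | 1-2-2 | 0.824 | 0.733 |
| generator set - pulley | 7/3 | 1-3-6 | 0.670 | 0.683 |
| generator set - storage battery | 8/4 | 2-1-1-1 | 0.602 | 0.718 |
| storage battery - metal element | 0/0 | N/A | 0 | 0 |
| electric energy - fuel saving | 7/3 | 4-5-4 | 0.670 | 0.673 |

Table 2  Distance similarity results for different term pairs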
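
The shortest distance reported in Table 2 is the length of a shortest path between two terms in the knowledge graph. A minimal sketch of that step follows, assuming an unweighted, undirected graph stored as an adjacency dictionary. The graph and node names are hypothetical, and the edge weighting behind wei_path is not reproduced here.

```python
from collections import deque

# Hypothetical undirected graph: term -> set of neighbouring terms.
GRAPH = {
    "generator_set": {"rotor", "belt"},
    "rotor": {"generator_set", "car_engine"},
    "belt": {"generator_set", "pulley"},
    "car_engine": {"rotor"},
    "pulley": {"belt"},
    "metal_element": set(),  # isolated node: no path to any other term
}

def shortest_distance(graph, start, goal):
    """Number of edges on a shortest path, or None if the terms are disconnected."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

print(shortest_distance(GRAPH, "generator_set", "car_engine"))
print(shortest_distance(GRAPH, "generator_set", "metal_element"))
```

A disconnected pair returns None, which corresponds to the N/A row of Table 2, where both similarity values are 0.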
Fig.5  Structure of the BERT model
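| Parameter | Value | Description |
|---|---|---|
| size | 300 | Dimension of the word vectors |
| windows | 5 | Training window size |
| min_count | 2 | Minimum number of occurrences of a word |
| hs | 1 | 1 means the Softmax method is not used |

Table 3  Parameters used to train Word2Vec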
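| Term pair | Word2Vec | BERT |
|---|---|---|
| generator set - car engine | 0.340 | 0.842 |
| generator set - pulley | 0.330 | 0.769 |
| generator set - storage battery | 0.258 | 0.838 |
| metal fuel - metal element | 0.222 | 0.898 |

Table 4  Semantic similarity computed with Word2Vec and BERT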
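
Table 4 compares similarities derived from Word2Vec and BERT vectors. This page does not state how two vectors are compared. The sketch below assumes cosine similarity, a common choice, and uses made-up three-dimensional vectors in place of real embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, for illustration only.
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]   # same direction as vec_a
vec_c = [2.0, -1.0, 0.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Parallel vectors score 1 and orthogonal vectors score 0, so the value depends only on direction, not on vector length.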
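| Term pair | Doc2Vec | Doc2Vec+TextRank | BERT+TextRank | BERT |
|---|---|---|---|---|
| storage battery - generator | 0.465 | 0.518 | 0.912 | 0.810 |
| storage battery - brake | 0.511 | 0.473 | 0.419 | 0.465 |
| brake - clutch | 0.581 | 0.623 | 0.822 | 0.781 |
| generator - brake | 0.622 | 0.634 | 0.432 | 0.557 |
| storage battery - pulley | 0.588 | 0.512 | 0.327 | 0.475 |

Table 5  Attribute similarity results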
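| Term type | Vehicle type | Structural component | Material | Fuel | Performance | Other disciplines |
|---|---|---|---|---|---|---|
| Count | 58 | 241 | 102 | 98 | 60 | 41 |

Table 6  Number of terms of each type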
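| α1 | α2 | α3 | α4 | Recall (%) | Precision (%) | F1 (%) |
|---|---|---|---|---|---|---|
| 0.25 | 0.25 | 0.25 | 0.25 | 77.4 | 78.3 | 77.8 |
| 0.25 | 0.25 | 0.3 | 0.2 | 76.3 | 78.2 | 77.2 |
| 0.25 | 0.25 | 0.2 | 0.3 | 76.5 | 78.4 | 77.4 |
| 0.2 | 0.2 | 0.4 | 0.2 | 75.1 | 79.5 | 77.2 |
| 0.2 | 0.2 | 0.2 | 0.4 | 75.3 | 76.8 | 76.0 |
| 0.3 | 0.2 | 0.25 | 0.25 | 78.4 | 79.5 | 78.9 |
| 0.2 | 0.3 | 0.25 | 0.25 | 79.6 | 80.1 | 79.8 |
| 0.35 | 0.15 | 0.25 | 0.25 | 78.3 | 79.3 | 78.8 |
| 0.4 | 0.1 | 0.25 | 0.25 | 77.3 | 78.2 | 77.7 |
| 0.35 | 0.25 | 0.2 | 0.2 | 80.2 | 82.1 | 81.1 |
| 0.4 | 0.2 | 0.2 | 0.2 | 81.2 | 79.5 | 80.3 |
| 0.35 | 0.35 | 0.15 | 0.15 | 79.2 | 81.1 | 80.1 |

Table 7  Effect of the values of α1, α2, α3 and α4 on the experimental results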
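
Every row of Table 7 uses weights that sum to 1, which suggests that the four component similarities are combined linearly. The following sketch assumes exactly that. The function name, the order in which α1 to α4 map to the concept, distance, semantic and attribute similarities, and the input scores are all assumptions made for illustration.

```python
def combined_similarity(scores, weights):
    """Weighted sum of component similarities; weights are expected to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("need one weight per score")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))

# Weights of the row with the highest F1 in Table 7 (alpha1..alpha4).
weights = (0.35, 0.25, 0.2, 0.2)
# Hypothetical component scores, assumed order: concept, distance, semantic, attribute.
scores = (0.5, 0.7, 0.8, 0.6)

print(round(combined_similarity(scores, weights), 3))
```

Keeping the weights normalised means the combined score stays in the same range as its components, so a single decision threshold can be applied whichever weights are chosen.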
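| β | γ | Recall (%) | Precision (%) | F1 (%) |
|---|---|---|---|---|
| 0.2 | 0.5 | 60.3 | 64.1 | 62.1 |
| 0.3 | 0.5 | 63.1 | 68.2 | 65.6 |
| 0.4 | 0.5 | 59.1 | 63.3 | 61.1 |
| 0.3 | 0.6 | 74.1 | 78.1 | 76.0 |
| 0.3 | 0.7 | 80.2 | 82.1 | 81.1 |
| 0.3 | 0.8 | 75.1 | 79.2 | 77.1 |
| 0.4 | 0.7 | 74.7 | 78.3 | 76.5 |

Table 8  Effect of different β and γ values on the experimental results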
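| Method | Ref. [17] | Ref. [18] | Ref. [19] | Ref. [16] | Word2Vec | Sim-KG |
|---|---|---|---|---|---|---|
| Recall (%) | 78.1 | 70.9 | 72.1 | 68.9 | 56.8 | 80.2 |
| Precision (%) | 80.4 | 78.3 | 79.4 | 75.6 | 64.3 | 82.1 |
| F1 (%) | 79.2 | 74.4 | 75.6 | 72.1 | 60.3 | 81.1 |

Table 9  Evaluation metrics for each method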
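
The F1 values in Table 9 are consistent with the harmonic mean of the other two rows. A short check, assuming the standard definition F1 = 2PR/(P+R) and reading the second metric row as precision. The dictionary simply restates the table, and the key names are chosen here for readability.

```python
def f1_score(recall, precision):
    """Harmonic mean of precision and recall (any common scale, here percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# (recall, precision, reported F1) per method, restated from Table 9.
TABLE_9 = {
    "Ref. [17]": (78.1, 80.4, 79.2),
    "Ref. [18]": (70.9, 78.3, 74.4),
    "Ref. [19]": (72.1, 79.4, 75.6),
    "Ref. [16]": (68.9, 75.6, 72.1),
    "Word2Vec": (56.8, 64.3, 60.3),
    "Sim-KG": (80.2, 82.1, 81.1),
}

for method, (recall, precision, reported) in TABLE_9.items():
    computed = round(f1_score(recall, precision), 1)
    print(method, computed, reported)
```

For every method the recomputed value, rounded to one decimal place, matches the reported F1.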
[1] Huang Hengqi, Yu Juan, Liao Xiao, et al. Review on Knowledge Graphs[J]. Computer Systems & Applications, 2019,28(6):1-12.
[2] Jeh G, Widom J. SimRank: A Measure of Structural-context Similarity[C]//Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2002: 538-543.
[3] Li Shanping, Yin Qiwei, Hu Yujie, et al. Overview of Researches on Ontology[J]. Journal of Computer Research and Development, 2004,41(7):1041-1052.
[4] Zhang Zhongping, Zhao Hailiang, Zhang Zhihui. Concept Similarity Computation Based on Ontology[J]. Computer Engineering, 2009,35(7):17-19.
[5] Xu Yingzhuo, Jia Huan. Ontology Concept Similarity Calculation Based on Tree Structure[J]. Computer Systems & Applications, 2017,26(3):275-279.
[6] Sussna M. Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network[C]//Proceedings of the 2nd International Conference on Information and Knowledge Management, Washington, DC, US. 1993: 67-74.
[7] Bouras C, Tsogkas V. A Clustering Technique for News Articles Using WordNet[J]. Knowledge-Based Systems, 2012,36(6):115-128.
doi: 10.1016/j.knosys.2012.06.015
[8] Martinez S, Sanchez D, Valls A. Semantic Adaptive Microaggregation of Categorical Microdata[J]. Computers & Security, 2012,31(5):653-672.
[9] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.
[10] Devlin J, Chang M, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[11] Li Rong, Yang Dong, Liu Lei. Research of Ontology-Based Conceptual Similarity Computation[J]. Journal of Computer Research and Development, 2011,48(S3):312-317.
[12] Liu Jie. An Entity Similarity Measurement Based on Automatic Feature Weight[J]. Journal of Chongqing University of Science and Technology (Natural Sciences Edition), 2014,16(3):157-160.
[13] Yu X, Ren X, Gu Q, et al. Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks[C]//Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2013.
[14] Zhang J, Tang J, Ma C, et al. Panther: Fast Top-k Similarity Search on Large Networks[C]//Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 1445-1454.
[15] Institute of Scientific and Technical Information of China. Chinese Science & Technology Vocabulary System (New Energy Vehicles)[M]. Beijing: Scientific and Technical Documentation Press, 2012.
[16] Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification[A]//WordNet: An Electronic Lexical Database[M]. 1998: 265-283.
[17] Wu Z, Palmer M. Verbs Semantics and Lexical Selection[C]// Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics. 1994: 133-138.
[18] Li Y, Bandar Z, Mclean D. An Approach for Measuring Semantic Similarity Between Words Using Multiple Information Sources[J]. IEEE Transactions on Knowledge and Data Engineering, 2003,15(4):871-882.
doi: 10.1109/TKDE.2003.1209005
[19] Rada R, Mili H, Bicknell E, et al. Development and Application of a Metric on Semantic Nets[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1989,19(1):17-30.
doi: 10.1109/21.24528
[20] Lyu X, Lyu X, Sun F, et al. Patent Domain Terminology Extraction Based on Multi-Feature Fusion and BiLSTM-CRF Model[C]//Proceedings of the 4th International Conference on Fuzzy Systems and Data Mining (FSDM 2018). 2018: 495-500.
[21] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018: 2227-2237.
[22] Gers F A, Schmidhuber J, Cummins F. Learning to Forget: Continual Prediction with LSTM[C]//Proceedings of the 9th International Conference on Artificial Neural Networks, 1999.
[23] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st Annual Conference on Neural Information Processing Systems. 2017: 5998-6008.
[24] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]//Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.
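Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing  Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn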