Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 104-112    DOI: 10.11925/infotech.2096-3467.2019.1157
Computing Similarity of Patent Terms Based on Knowledge Graph
Li Jiaquan1, Li Baoan2, You Xindong1, Lü Xueqiang1
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
2Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Abstract  

[Objective] This study uses a patent knowledge graph to calculate the similarity between patent terms, with the aim of detecting infringement cases from patent texts. [Methods] We calculated term similarity on a knowledge graph of new energy vehicle patents, combining four factors: the concept hierarchy of the terms, the distance between terms in the knowledge graph, the semantic similarity of the terms, and the attributes of the terms. [Results] The precision and recall of patent term classification both exceeded 80%, significantly higher than those of the traditional methods. [Limitations] The concept hierarchy tree was built manually and the term classes were annotated manually, which might introduce errors. [Conclusions] Computing the similarity between patent terms based on a knowledge graph is feasible and provides a good reference for future research.

Keywords: Patent Knowledge Graph; Similarity of Patent Terms; Patent Infringement Detection
Received: 22 October 2019      Published: 28 July 2020
CLC Number: TP393
Corresponding Authors: Li Jiaquan     E-mail: 15600083132@163.com

Cite this article:

Li Jiaquan,Li Baoan,You Xindong,Lü Xueqiang. Computing Similarity of Patent Terms Based on Knowledge Graph. Data Analysis and Knowledge Discovery, 2020, 4(10): 104-112.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.1157     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/104

Concept Hierarchy Tree of New Energy Vehicle
The Frequency of Various Types of Terms in the Corpus
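Term pair                          Ref. [17]  Ref. [18]  Ref. [19]  Path weight
Generator set - automobile engine  0.750      0.634      0.333      0.486
Generator set - pulley             0.500      0.374      0.200      0.291
Generator set - storage battery    0.667      0.197      0.167      0.000
Storage battery - metal fuel       0.667      0.558      0.333      0.488
Storage battery - metal element    0.333      0.241      0.200      0.000
Electric energy - fuel saving      0.667      0.558      0.333      0.487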
Calculation Results of Concept Similarity
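The Ref. [17] column corresponds to the Wu and Palmer measure, which scores two concepts by the depth of their lowest common ancestor relative to their own depths. The sketch below is only an illustration on a made-up hierarchy: the node names, the `PARENT` map, and the convention that the root has depth 1 are all assumptions, so the numbers it produces are not meant to reproduce the table.

```python
# Toy concept hierarchy stored as child -> parent.
# All node names here are hypothetical, not taken from the paper's tree.
PARENT = {
    "vehicle part": None,
    "power system": "vehicle part",
    "generator set": "power system",
    "automobile engine": "power system",
    "transmission": "vehicle part",
    "pulley": "transmission",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the
    # lowest common subsumer (lcs) of the two concepts.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("generator set", "automobile engine"))  # siblings
print(wu_palmer("generator set", "pulley"))             # share only the root
```

Siblings under a deep parent score higher than terms that meet only at the root, which is the behaviour a hierarchy-based measure is expected to show.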
Number of Each Relationship
Number of Pairs of Each Relationship Term that Belong to the Same Type
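Term pair                          Longest/shortest distance  Edges on shortest path  Ref. [16]  wei_path
Generator set - automobile engine  10/3                       1-2-2                   0.824      0.733
Generator set - pulley             7/3                        1-3-6                   0.670      0.683
Generator set - storage battery    8/4                        2-1-1-1                 0.602      0.718
Storage battery - metal element    0/0                        N/A                     0          0
Electric energy - fuel saving      7/3                        4-5-4                   0.670      0.673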
Distance Similarity Calculation Results
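The table reports a shortest distance between two terms in the knowledge graph. This page does not say how that distance is computed, so the following is a minimal sketch under the assumption that it is the number of edges on a shortest path in an undirected graph, found by breadth-first search. The edge list and term names are hypothetical, and the step that turns a distance into a similarity score is deliberately left out because the source does not give it here.

```python
from collections import deque

# Hypothetical undirected edges between terms in a small knowledge graph.
EDGES = [
    ("generator set", "engine"),
    ("engine", "crankshaft"),
    ("crankshaft", "pulley"),
    ("generator set", "storage battery"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_distance(adj, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if start not in adj or goal not in adj:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


ADJ = build_adjacency(EDGES)
print(shortest_distance(ADJ, "generator set", "pulley"))
print(shortest_distance(ADJ, "storage battery", "metal element"))
```

A pair with no connecting path returns `None`, which is one way to represent a row such as the N/A entry in the table.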
Structure of BERT
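Parameter  Value  Description
size       300    Dimensionality of the word vectors
windows    5      Size of the training window
min_count  2      Minimum number of occurrences of a word
hs         1      1 means the Softmax method is not used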
Parameters of Word2Vec
Term pair                          Word2Vec  BERT
Generator set - automobile engine  0.340     0.842
Generator set - pulley             0.330     0.769
Generator set - storage battery    0.258     0.838
Metal fuel - metal element         0.222     0.898
Calculating Semantic Similarity with Word2Vec and BERT
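Embedding-based scores like those in the table are commonly obtained by comparing the two term vectors with cosine similarity. This page does not state which measure the authors used, so cosine is an assumption here, and the short vectors below are placeholders rather than real Word2Vec or BERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat the similarity as 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder 3-dimensional "embeddings" (hypothetical values).
term_a = [0.2, 0.7, 0.1]
term_b = [0.4, 1.4, 0.2]   # same direction as term_a
term_c = [0.7, -0.2, 0.0]  # orthogonal to term_a

print(cosine_similarity(term_a, term_b))
print(cosine_similarity(term_a, term_c))
```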
Term pair                    Doc2Vec  Doc2Vec+TextRank  BERT+TextRank  BERT
Storage battery - generator  0.465    0.518             0.912          0.810
Storage battery - brake      0.511    0.473             0.419          0.465
Brake - clutch               0.581    0.623             0.822          0.781
Generator - brake            0.622    0.634             0.432          0.557
Storage battery - pulley     0.588    0.512             0.327          0.475
Attribute Similarity Calculation Results
Term type  Vehicle type  Structural component  Material  Fuel  Performance  Other disciplines
Count      58            241                   102       98    60           41
Number of Term Types
α1    α2    α3    α4    Recall (%)  Precision (%)  F1 (%)
0.25  0.25  0.25  0.25  77.4        78.3           77.8
0.25  0.25  0.3   0.2   76.3        78.2           77.2
0.25  0.25  0.2   0.3   76.5        78.4           77.4
0.2   0.2   0.4   0.2   75.1        79.5           77.2
0.2   0.2   0.2   0.4   75.3        76.8           76.0
0.3   0.2   0.25  0.25  78.4        79.5           78.9
0.2   0.3   0.25  0.25  79.6        80.1           79.8
0.35  0.15  0.25  0.25  78.3        79.3           78.8
0.4   0.1   0.25  0.25  77.3        78.2           77.7
0.35  0.25  0.2   0.2   80.2        82.1           81.1
0.4   0.2   0.2   0.2   81.2        79.5           80.3
0.35  0.35  0.15  0.15  79.2        81.1           80.1
Influence of α1, α2, α3 and α4 on the Experimental Results
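Every row of weights in the table sums to 1, which is consistent with α1 to α4 acting as the weights of a convex combination of four component similarities. The sketch below assumes exactly that: a plain weighted sum. Which α belongs to which component is not stated on this page, so the ordering used here (concept hierarchy, graph distance, semantic, attribute, following the abstract) is an assumption, and the function name is hypothetical.

```python
def combine_similarities(components, weights):
    """Weighted sum of component similarities; weights must sum to 1."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, components))


# Weight row with the highest F1 in the table (alpha1..alpha4).
BEST_WEIGHTS = (0.35, 0.25, 0.2, 0.2)

# Hypothetical component scores for one term pair, in the assumed order:
# concept hierarchy, graph distance, semantic, attribute.
scores = (0.8, 0.6, 0.9, 0.5)
print(combine_similarities(scores, BEST_WEIGHTS))
```

Rejecting weights that do not sum to 1 keeps the combined score on the same 0 to 1 scale as its components.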
β    γ    Recall (%)  Precision (%)  F1 (%)
0.2  0.5  60.3        64.1           62.1
0.3  0.5  63.1        68.2           65.6
0.4  0.5  59.1        63.3           61.1
0.3  0.6  74.1        78.1           76.0
0.3  0.7  80.2        82.1           81.1
0.3  0.8  75.1        79.2           77.1
0.4  0.7  74.7        78.3           76.5
Influence of β and γ Values on Experimental Results
Method         Ref. [17]  Ref. [18]  Ref. [19]  Ref. [16]  Word2Vec  Sim-KG
Recall (%)     78.1       70.9       72.1       68.9       56.8      80.2
Precision (%)  80.4       78.3       79.4       75.6       64.3      82.1
F1 (%)         79.2       74.4       75.6       72.1       60.3      81.1
Evaluation Index of Each Method
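The F1 values in the table are the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick check, the sketch below recomputes F1 from the recall and precision rows; the function name is illustrative and the inputs are the percentages copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, recall %, precision %) as listed in the table.
ROWS = [
    ("Ref. [17]", 78.1, 80.4),
    ("Ref. [18]", 70.9, 78.3),
    ("Ref. [19]", 72.1, 79.4),
    ("Ref. [16]", 68.9, 75.6),
    ("Word2Vec", 56.8, 64.3),
    ("Sim-KG", 80.2, 82.1),
]

for method, recall, precision in ROWS:
    print(method, round(f1_score(precision, recall), 1))
```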
[1] Huang Hengqi, Yu Juan, Liao Xiao, et al. Review on Knowledge Graphs[J]. Computer Systems & Applications, 2019,28(6):1-12. (in Chinese)
[2] Jeh G, Widom J. SimRank: A Measure of Structural-context Similarity[C]//Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2002: 538-543.
[3] Li Shanping, Yin Qiwei, Hu Yujie, et al. Overview of Researches on Ontology[J]. Journal of Computer Research and Development, 2004,41(7):1041-1052. (in Chinese)
[4] Zhang Zhongping, Zhao Hailiang, Zhang Zhihui. Concept Similarity Computation Based on Ontology[J]. Computer Engineering, 2009,35(7):17-19. (in Chinese)
[5] Xu Yingzhuo, Jia Huan. Ontology Concept Similarity Calculation Based on Tree Structure[J]. Computer Systems & Applications, 2017,26(3):275-279. (in Chinese)
[6] Sussna M. Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network[C]//Proceedings of the 2nd International Conference on Information and Knowledge Management, Washington, DC, US. 1993: 67-74.
[7] Bouras C, Tsogkas V. A Clustering Technique for News Articles Using WordNet[J]. Knowledge-Based Systems, 2012,36(6):115-128. doi: 10.1016/j.knosys.2012.06.015
[8] Martinez S, Sanchez D, Valls A. Semantic Adaptive Microaggregation of Categorical Microdata[J]. Computers & Security, 2012,31(5):653-672.
[9] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.
[10] Devlin J, Chang M, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[11] Li Rong, Yang Dong, Liu Lei. Research of Ontology-Based Conceptual Similarity Computation[J]. Journal of Computer Research and Development, 2011,48(S3):312-317. (in Chinese)
[12] Liu Jie. An Entity Similarity Measurement Based on Automatic Feature Weight[J]. Journal of Chongqing University of Science and Technology (Natural Sciences Edition), 2014,16(3):157-160. (in Chinese)
[13] Yu X, Ren X, Gu Q, et al. Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks[C]//Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2013.
[14] Zhang J, Tang J, Ma C, et al. Panther: Fast Top-k Similarity Search on Large Networks[C]//Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015: 1445-1454.
[15] Institute of Scientific and Technical Information of China. Chinese Science & Technology Vocabulary System (New Energy Vehicles)[M]. Beijing: Scientific and Technical Documentation Press, 2012. (in Chinese)
[16] Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification[A]//WordNet: An Electronic Lexical Database[M]. 1998: 265-283.
[17] Wu Z, Palmer M. Verbs Semantics and Lexical Selection[C]// Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics. 1994: 133-138.
[18] Li Y, Bandar Z, Mclean D. An Approach for Measuring Semantic Similarity Between Words Using Multiple Information Sources[J]. IEEE Transactions on Knowledge and Data Engineering, 2003,15(4):871-882. doi: 10.1109/TKDE.2003.1209005
[19] Rada R, Mili H, Bicknell E, et al. Development and Application of a Metric on Semantic Nets[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1989,19(1):17-30. doi: 10.1109/21.24528
[20] Lyu X, Lyu X, Sun F, et al. Patent Domain Terminology Extraction Based on Multi-Feature Fusion and BiLSTM-CRF Model[C]//Proceedings of the 4th International Conference on Fuzzy Systems and Data Mining (FSDM 2018). 2018: 495-500.
[21] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018: 2227-2237.
[22] Gers F A, Schmidhuber J, Cummins F. Learning to Forget: Continual Prediction with LSTM[C]//Proceedings of the 9th International Conference on Artificial Neural Networks, 1999.
[23] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st Annual Conference on Neural Information Processing Systems. 2017: 5998-6008.
[24] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]//Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.