Computing Similarity of Patent Terms Based on Knowledge Graph
Li Jiaquan1, Li Baoan2, You Xindong1, Lü Xueqiang1
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
2Computer School, Beijing Information Science & Technology University, Beijing 100101, China
[Objective] This study uses a patent knowledge graph to calculate the similarity between patent terms, with the aim of detecting infringement cases from patent texts. [Methods] We calculated term similarity on a knowledge graph of new energy vehicle patents, combining four factors: the concept hierarchy of the terms, the distance between the terms in the knowledge graph, the semantic similarity of the terms, and the attributes of the terms. [Results] The accuracy and recall of patent term classification both exceeded 80%, significantly higher than those of traditional methods. [Limitations] The concept hierarchy tree was constructed manually and the term classifications were annotated manually, which may introduce errors. [Conclusions] Computing similarity between patent terms based on a knowledge graph is feasible and provides a useful reference for future research.
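As a rough illustration of how the four factors named in the abstract might be combined, the following Python sketch scores two terms in a toy concept hierarchy. It is not the authors' method, and every specific in it is an assumption made for illustration: the toy hierarchy, the attribute sets, the placeholder embedding vectors, the use of the Wu-Palmer depth measure for the hierarchy factor, the 1/(1 + d) score for graph distance, Jaccard overlap for attributes, and the equal weights.

```python
import math

# Hypothetical toy concept hierarchy (child -> parent); illustrative only.
PARENT = {
    "vehicle": None,
    "new energy vehicle": "vehicle",
    "electric vehicle": "new energy vehicle",
    "hybrid vehicle": "new energy vehicle",
    "battery electric vehicle": "electric vehicle",
}


def ancestors(term):
    """Return the chain from `term` up to the root, starting with `term`."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def depth(term):
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(ancestors(term))


def lowest_common_ancestor(a, b):
    """Deepest concept that is an ancestor of (or equal to) both terms."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("terms do not share a root")


def hierarchy_similarity(a, b):
    """Wu-Palmer style score: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def distance_similarity(a, b):
    """Assumed distance score: 1 / (1 + number of edges between the terms)."""
    lca = lowest_common_ancestor(a, b)
    edges = depth(a) + depth(b) - 2 * depth(lca)
    return 1 / (1 + edges)


def cosine_similarity(u, v):
    """Cosine of two equal-length vectors (stand-in for semantic similarity)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def attribute_similarity(attrs_a, attrs_b):
    """Jaccard overlap of two attribute sets."""
    union = attrs_a | attrs_b
    return len(attrs_a & attrs_b) / len(union) if union else 0.0


def combined_similarity(a, b, vec_a, vec_b, attrs_a, attrs_b,
                        weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four factor scores; the weights are an assumption."""
    scores = (
        hierarchy_similarity(a, b),
        distance_similarity(a, b),
        cosine_similarity(vec_a, vec_b),
        attribute_similarity(attrs_a, attrs_b),
    )
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Placeholder vectors and attribute sets; real inputs would come from
    # a trained embedding model and the knowledge graph itself.
    score = combined_similarity(
        "electric vehicle", "hybrid vehicle",
        [0.9, 0.1, 0.3], [0.8, 0.2, 0.4],
        {"battery", "motor"}, {"battery", "motor", "engine"},
    )
    print(f"combined similarity: {score:.3f}")
```

In practice the weights would have to be tuned on annotated term pairs, and each factor could be swapped for a different measure without changing the overall structure of the weighted sum.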
Li Jiaquan, Li Baoan, You Xindong, Lü Xueqiang. Computing Similarity of Patent Terms Based on Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 104-112.