Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 57-67    DOI: 10.11925/infotech.2096-3467.2021.1458
Extracting Patent Keywords by Integrating Restriction Relationship
Yu Yan, Zhu Shengchen
Institute of the Information Management and Technology, Nanjing Tech University, Nanjing 210009, China
Abstract  

[Objective] This paper aims to improve the accuracy of patent keyword extraction by exploiting the characteristics of patent claims. [Methods] We examined the restriction relationships between the technical features of patent claims and integrated these relationships into a graph-based patent keyword extraction method. [Results] We evaluated the method on the USPTO and Baiten patent data sets. Its MRR was 31.79% (USPTO) and 33.81% (Baiten) higher than that of the traditional TextRank method. [Limitations] The experimental data need to be expanded. [Conclusions] The proposed method could significantly improve the accuracy of patent keyword extraction.

Key words: Patent Extraction; Restriction Relationship; Claim; TextRank
Received: 27 December 2021      Published: 16 November 2022
ZTFLH:  TP393 G250  
Fund: National Social Science Fund of China (17BTQ059)
Corresponding Author: Yu Yan, ORCID: 0000-0002-9654-8614      E-mail: yuyanyuyan2004@126.com

Cite this article:

Yu Yan, Zhu Shengchen. Extracting Patent Keywords by Integrating Restriction Relationship. Data Analysis and Knowledge Discovery, 2022, 6(10): 57-67.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1458     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I10/57

Example of Restricted Relationships Diagram
Restricted Relationships in Patent Claims
Example of Graph Building
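The graph-building figure is not reproduced on this page, so the following is only a minimal sketch of how restriction relationships might be blended into a phrase graph before ranking. Everything in it is an assumption rather than the authors' formulation: the function name `rank_phrases`, the rule that a weight `alpha` goes to restriction edges and `1 - alpha` to co-occurrence edges, the damping factor 0.85, and the toy edge lists (the phrases are borrowed from the example tables, but the edges between them are invented for illustration).

```python
from collections import defaultdict


def rank_phrases(cooc_edges, restriction_edges, alpha=0.8,
                 directed=True, damping=0.85, iterations=50):
    """Weighted PageRank over candidate phrases (illustrative only).

    cooc_edges: undirected (u, v) co-occurrence pairs.
    restriction_edges: (u, v) pairs meaning "u restricts v" (assumed orientation).
    alpha: assumed share of edge weight given to restriction edges.
    """
    weights = defaultdict(float)
    nodes = set()
    for u, v in cooc_edges:
        nodes.update((u, v))
        weights[(u, v)] += 1.0 - alpha
        weights[(v, u)] += 1.0 - alpha
    for u, v in restriction_edges:
        nodes.update((u, v))
        weights[(u, v)] += alpha
        if not directed:
            weights[(v, u)] += alpha

    # Total outgoing weight per vertex, used to normalise transitions.
    out_sum = defaultdict(float)
    for (u, _), w in weights.items():
        out_sum[u] += w

    nodes = sorted(nodes)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass held by vertices with no outgoing weight is spread evenly.
        dangling = sum(score[node] for node in nodes if out_sum[node] == 0)
        new = {node: (1.0 - damping) / n + damping * dangling / n
               for node in nodes}
        for (u, v), w in weights.items():
            if w > 0:
                new[v] += damping * score[u] * w / out_sum[u]
        score = new
    return score


# Hypothetical edges between phrases taken from the example tables.
cooc = [
    ("electrolyte additive composition", "lithium secondary battery"),
    ("electrolyte additive composition", "non-lithiated additive"),
    ("non-lithiated additive", "borate-based lithium compound"),
]
restr = [
    ("borate-based lithium compound", "electrolyte additive composition"),
    ("non-lithiated additive", "electrolyte additive composition"),
]

scores = rank_phrases(cooc, restr, alpha=0.8, directed=True)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this blending rule, `alpha = 0` ignores restriction edges and reduces to ranking on the co-occurrence phrase graph alone. That is at least consistent with the example tables, where the ClaimRank+Undirected (α = 0) row lists the same keywords and MRR as PageRank+Phrase.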
Data set   Number of patents   Average number of claims   Average number of keywords
USPTO      200                 17.28                      7.83
Baiten     500                 9.93                       10
Data Set Information
Method            MRR (USPTO)   MRR (Baiten)
PageRank+Word     1.48          1.36
PageRank+Phrase   1.61          1.49
Evaluation Results of Candidate Keywords as Vertex
Method            Rank   Top 10 extracted keywords                      MRR
PageRank+Word     1      electrolyte additive composition               1.87
                  2      novel borate-based lithium compound
                  3      borate-based lithium compound
                  4      additive composition
                  5      electrolyte additive
                  6      borate-based compound
                  7      phosphate-based compound tributyl phosphate
                  8      vinyl silane-based compound
                  9      non-lithiated additive
                  10     fluorocarbonate-based compound
PageRank+Phrase   1      electrolyte additive composition               2.32
                  2      lithium secondary battery
                  3      non-lithiated additive
                  4      borate-based lithium compound
                  5      tetravinyl silane
                  6      sultone-based compound
                  7      alkyltrivinyl_silane
                  8      sulfite-based compound
                  9      vinyl silane-based compound
                  10     linear carbonate solvent
Examples of Candidate Keywords as Vertex
Evaluation Results of Restricted Relationship
Method                           Rank   Top 10 extracted keywords            MRR
ClaimRank+Undirected (α = 0)     1      electrolyte_additive_composition     2.32
                                 2      lithium secondary_battery
                                 3      non-lithiated_additive
                                 4      borate-based_lithium_compound
                                 5      tetravinyl_silane
                                 6      sultone-based_compound
                                 7      alkyltrivinyl_silane
                                 8      sulfite-based_compound
                                 9      vinyl_silane-based_compound
                                 10     linear_carbonate_solvent
ClaimRank+Undirected (α = 0.8)   1      borate-based_lithium_compound        2.59
                                 2      non-lithiated_additive
                                 3      non-aqueous_organic_solvent
                                 4      vinyl_silane-based_compound
                                 5      sulfate-based_compound
                                 6      fluorocarbonate-based_compound
                                 7      electrolyte_additive_composition
                                 8      tetravinyl_silane
                                 9      trialkylvinyl_silane
                                 10     alkyltrivinyl_silane
ClaimRank+Directed (α = 0.8)     1      non-aqueous_organic_solvent          2.72
                                 2      borate-based lithium compound
                                 3      non-lithiated additive
                                 4      sulfate-based compound
                                 5      electrolyte additive composition
                                 6      vinyl silane-based compound
                                 7      fluorocarbonate-based compound
                                 8      lithium secondary_battery
                                 9      cyclic_carbonate_solvent
                                 10     linear_carbonate_solvent
Example of Restricted Relationship Effect Evaluation
Examples of Directed Restricted Relationship
Method         MRR (USPTO)   MRR (Baiten)
TextRank       1.51          1.39
SingleRank     1.54          1.35
PositionRank   1.72          1.47
ClaimRank      1.99          1.86
Comparison Results of Related Methods
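The relative improvements quoted in the abstract (31.79% on USPTO and 33.81% on Baiten) can be recomputed from the TextRank and ClaimRank rows of the comparison table. The short check below assumes the percentages are relative gains, that is, (new − baseline) / baseline; the helper name `relative_gain` is introduced here only for illustration.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# MRR values copied from the comparison table.
mrr = {
    "USPTO": {"TextRank": 1.51, "ClaimRank": 1.99},
    "Baiten": {"TextRank": 1.39, "ClaimRank": 1.86},
}

gains = {
    dataset: round(relative_gain(values["ClaimRank"], values["TextRank"]), 2)
    for dataset, values in mrr.items()
}
print(gains)  # {'USPTO': 31.79, 'Baiten': 33.81}
```

Both figures match the abstract, which suggests the reported percentages are relative rather than absolute differences in MRR.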
Example of Keyword Position in Patent