Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 10: 57-67     https://doi.org/10.11925/infotech.2096-3467.2021.1458
Research Paper
Extracting Patent Keywords by Integrating Restriction Relationship
Yu Yan, Zhu Shengchen
Institute of Information Management and Technology, Nanjing Tech University, Nanjing 210009, China
Abstract

[Objective] This paper aims to improve the accuracy of patent keyword extraction by exploiting the characteristics of patent claims. [Methods] We mine the restriction relationships between technical features in patent claims and integrate them into a graph-based patent keyword extraction method. [Results] In experiments on the USPTO and Baiten patent datasets, the MRR of the proposed method was higher than that of the traditional TextRank method by 31.79% (USPTO) and 33.81% (Baiten) in relative terms. [Limitations] The experimental data need to be expanded further. [Conclusions] Integrating the restriction relationships found in patent claims can significantly improve the accuracy of patent keyword extraction.

Keywords: Patent; Extraction; Restriction Relationship; Claim; TextRank
Received: 2021-12-27      Published: 2022-11-16
Chinese Library Classification (ZTFLH):  TP393 G250
Funding: National Social Science Fund of China (17BTQ059)
Corresponding author: Yu Yan, ORCID: 0000-0002-9654-8614      E-mail: yuyanyuyan2004@126.com
Cite this article:
Yu Yan, Zhu Shengchen. Extracting Patent Keywords by Integrating Restriction Relationship[J]. Data Analysis and Knowledge Discovery, 2022, 6(10): 57-67.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1458      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I10/57
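Fig.1  Illustration of restriction relationships
Fig.2  Restriction relationships in the claims of an example patent
(Note: green marks cue phrases, red marks parent technical features, and blue marks child technical features.)
Fig.3  Example of graph construction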
Dataset   Patents   Avg. claims per patent   Avg. keywords per patent
USPTO     200       17.28                    7.83
Baiten    500       9.93                     10
Table 1  Dataset information
Method            MRR (USPTO)   MRR (Baiten)
PageRank+Word     1.48          1.36
PageRank+Phrase   1.61          1.49
Table 2  Evaluation results of using candidate keywords as vertices
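The tables on this page report MRR without defining it. Because the reported values exceed 1, they cannot be the textbook mean reciprocal rank of the first correct hit, which is at most 1. One reading that would produce values in this range is to sum the reciprocal ranks of every extracted keyword that matches a gold keyword, then average over patents. The Python sketch below illustrates that reading only: the function names and the toy data are hypothetical, and the paper's own definition may differ.

```python
from typing import Iterable, Sequence, Set, Tuple


def summed_reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Sum 1/rank over every ranked keyword that is also a gold keyword."""
    return sum(1.0 / rank for rank, kw in enumerate(ranked, start=1) if kw in gold)


def corpus_score(patents: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the per-patent score over all patents (0.0 if there are none)."""
    scores = [summed_reciprocal_rank(ranked, gold) for ranked, gold in patents]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data, not taken from either dataset.
ranked_a = ["keyword one", "keyword two", "keyword three"]
gold_a = {"keyword one", "keyword three"}   # hits at ranks 1 and 3
ranked_b = ["keyword four", "keyword five"]
gold_b = {"keyword five"}                   # hit at rank 2

score_a = summed_reciprocal_rank(ranked_a, gold_a)   # 1/1 + 1/3
score_b = summed_reciprocal_rank(ranked_b, gold_b)   # 1/2
average = corpus_score([(ranked_a, gold_a), (ranked_b, gold_b)])
print(f"{score_a:.2f} {score_b:.2f} {average:.2f}")
```

Under this reading, a list of 10 keywords that are all correct would score at most 1 + 1/2 + ... + 1/10, roughly 2.93, which is at least compatible with the per-patent values shown in the example tables.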
Rank  PageRank+Word                                PageRank+Phrase
1     electrolyte additive composition             electrolyte additive composition
2     novel borate-based lithium compound          lithium secondary battery
3     borate-based lithium compound                non-lithiated additive
4     additive composition                         borate-based lithium compound
5     electrolyte additive                         tetravinyl silane
6     borate-based compound                        sultone-based compound
7     phosphate-based compound tributyl phosphate  alkyltrivinyl silane
8     vinyl silane-based compound                  sulfite-based compound
9     non-lithiated additive                       vinyl silane-based compound
10    fluorocarbonate-based compound               linear carbonate solvent
MRR   1.87                                         2.32
Table 3  Example of the evaluation of candidate keywords as vertices (top 10 extracted keywords per method)
Fig.4  Evaluation results of restriction edges
Rank  ClaimRank+Undirected (α = 0)      ClaimRank+Undirected (α = 0.8)    ClaimRank+Directed (α = 0.8)
1     electrolyte additive composition  borate-based lithium compound     non-aqueous organic solvent
2     lithium secondary battery         non-lithiated additive            borate-based lithium compound
3     non-lithiated additive            non-aqueous organic solvent       non-lithiated additive
4     borate-based lithium compound     vinyl silane-based compound       sulfate-based compound
5     tetravinyl silane                 sulfate-based compound            electrolyte additive composition
6     sultone-based compound            fluorocarbonate-based compound    vinyl silane-based compound
7     alkyltrivinyl silane              electrolyte additive composition  fluorocarbonate-based compound
8     sulfite-based compound            tetravinyl silane                 lithium secondary battery
9     vinyl silane-based compound       trialkylvinyl silane              cyclic carbonate solvent
10    linear carbonate solvent          alkyltrivinyl silane              linear carbonate solvent
MRR   2.32                              2.59                              2.72
Table 4  Example of the evaluation of restriction edges (top 10 extracted keywords per method)
Fig.5  Example of a graph with directed restriction edges
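The restriction-edge example compares undirected and directed restriction edges at different values of α, but this page does not state how α enters the ranking. The Python sketch below shows one way such a graph could be ranked, under loudly stated assumptions: co-occurrence edges are symmetric with weight 1 − α, restriction edges carry weight α, a directed restriction edge points from the child technical feature to its parent, and scores come from a standard weighted PageRank with a damping factor of 0.85. All names, the edge direction, the mixing rule, and the toy edges are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_weights(cooccur: Iterable[Edge], restrict: Iterable[Edge],
                  alpha: float, directed: bool = True) -> Dict[Edge, float]:
    """Mix co-occurrence and restriction edges into one weighted digraph."""
    w: Dict[Edge, float] = defaultdict(float)
    for u, v in cooccur:               # co-occurrence is symmetric
        w[(u, v)] += 1.0 - alpha
        w[(v, u)] += 1.0 - alpha
    for child, parent in restrict:     # assumed direction: child -> parent
        w[(child, parent)] += alpha
        if not directed:
            w[(parent, child)] += alpha
    return {edge: x for edge, x in w.items() if x > 0.0}


def weighted_pagerank(weights: Dict[Edge, float], damping: float = 0.85,
                      iters: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank with weighted edges."""
    nodes = sorted({n for edge in weights for n in edge})
    if not nodes:
        return {}
    out_sum: Dict[str, float] = defaultdict(float)
    for (u, _), x in weights.items():
        out_sum[u] += x
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Mass on nodes without outgoing edges is spread uniformly.
        dangling = sum(score[node] for node in nodes if out_sum[node] == 0.0)
        nxt = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for (u, v), x in weights.items():
            nxt[v] += damping * score[u] * x / out_sum[u]
        score = nxt
    return score


# Hypothetical toy graph: three candidate phrases, one restriction relation.
cooccur = [("feature a", "feature b"), ("feature b", "feature c")]
restrict = [("feature c", "feature a")]   # "feature c" restricts "feature a"

scores = weighted_pagerank(build_weights(cooccur, restrict, alpha=0.8))
for phrase, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {value:.3f}")
```

With α = 0 the restriction edges receive zero weight and drop out, so the sketch falls back to ranking on co-occurrence alone. That matches the example tables, where ClaimRank+Undirected at α = 0 returns the same list and MRR (2.32) as PageRank+Phrase.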
Method         MRR (USPTO)   MRR (Baiten)
TextRank       1.51          1.39
SingleRank     1.54          1.35
PositionRank   1.72          1.47
ClaimRank      1.99          1.86
Table 5  Comparison with related methods
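The improvement figures quoted in the abstract can be recomputed from the TextRank and ClaimRank rows of the comparison table as (ClaimRank − TextRank) / TextRank. A short Python check follows; the function and variable names are illustrative only.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# MRR values copied from the TextRank and ClaimRank rows of the comparison table.
mrr = {
    "TextRank": {"USPTO": 1.51, "Baiten": 1.39},
    "ClaimRank": {"USPTO": 1.99, "Baiten": 1.86},
}

gains = {
    dataset: round(relative_improvement(mrr["ClaimRank"][dataset],
                                        mrr["TextRank"][dataset]), 2)
    for dataset in ("USPTO", "Baiten")
}
print(gains)  # {'USPTO': 31.79, 'Baiten': 33.81}
```

Both values match the abstract, which confirms that the reported gains are relative to the TextRank baseline rather than absolute differences in MRR.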
Fig.6  Positions of keywords in the example patent