Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (1): 146-156     https://doi.org/10.11925/infotech.2096-3467.2022.1142
Research Paper
Constructing Patent Knowledge Graph with SpERT-Aggcn Model*
He Yu, Zhang Xiaodong, Zheng Xin
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
Abstract

[Objective] To address nested entity recognition and improve relation extraction accuracy in knowledge graph construction, this paper proposes an information extraction model, SpERT-Aggcn, and uses it to construct a knowledge graph of green cooperation patents. [Methods] The SpERT-Aggcn model extracts nested entities and relations from patent abstracts. An ontology is built with Protégé, and the extracted triples are mapped onto it. [Results] In relation extraction, the F1 score of SpERT-Aggcn is 2.61 percentage points higher than that of SpERT; for long-distance relations, it is 4.42 percentage points higher. The constructed knowledge graph of green cooperation patents contains 699,517 entities and 3,241,805 relations. [Limitations] The F1 score of SpERT-Aggcn on short-distance relations is lower than that of SpERT, indicating that the proposed model is weaker at identifying short-distance relations. [Conclusions] Combining a span-based entity recognition model with a relation extraction model that incorporates dependency grammar information yields a more complete knowledge graph.

Keywords: Green Cooperation Patent; Knowledge Graph; Graph Convolution Network; Information Extraction
Received: 2022-10-31      Published: 2023-04-28
CLC number (ZTFLH): TP393; G250
Funding: *National Natural Science Foundation of China (71871018)
Corresponding author: Zhang Xiaodong, ORCID: 0000-0002-8203-9763, E-mail: xdzhang@manage.ustb.edu.cn
Cite this article:
He Yu, Zhang Xiaodong, Zheng Xin. Constructing Patent Knowledge Graph with SpERT-Aggcn Model*[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 146-156.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1142      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I1/146
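Fig.1  Differences between sequence labeling models and span-based models
Fig.2  Construction workflow of the green cooperation patent knowledge graph
Fig.3  Structure of the SpERT-Aggcn model
Fig.4  Entity recognition module of SpERT-Aggcn
Fig.5  Aggcn module
Fig.6  Relation extraction module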
Model          Entity recognition                     Relation extraction
               Precision(%)  Recall(%)  F1(%)         Precision(%)  Recall(%)  F1(%)
SpERT          79.25         83.39      81.27         48.82         68.21      56.91
SpERT-Aggcn    82.01         81.01      81.51         52.92         68.00      59.52
Table 1  Results on the test set
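The F1 values in Table 1 can be checked against the reported precision and recall. The following is a minimal sketch, assuming F1 is the usual harmonic mean of precision and recall; the function name `f1_score` and the dictionary layout are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, copied from Table 1.
table1 = {
    ("SpERT", "entity"): (79.25, 83.39),
    ("SpERT-Aggcn", "entity"): (82.01, 81.01),
    ("SpERT", "relation"): (48.82, 68.21),
    ("SpERT-Aggcn", "relation"): (52.92, 68.00),
}

# Recompute F1 for every model/task pair, rounded as in the table.
f1 = {key: round(f1_score(p, r), 2) for key, (p, r) in table1.items()}

# Gain of SpERT-Aggcn over SpERT on relation extraction, in percentage points.
gain = round(f1[("SpERT-Aggcn", "relation")] - f1[("SpERT", "relation")], 2)

for key, value in f1.items():
    print(key, value)
print("relation extraction gain (percentage points):", gain)
```

The recomputed gain is a difference between two percentages, which is why it is stated in percentage points rather than as a relative percentage.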
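Relation type (interval length)                 SpERT    SpERT-Aggcn
Patent-Advantage (45 characters)                64.46    68.88
Patent-Equipment (42 characters)                68.49    70.17
Spatial relation (12 characters)                52.87    54.72
Equipment-Raw material (5 characters)           41.07    40.61
Table 2  F1 values of relation classification (%)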
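ID          Problem type                    Example
Example 1   Missed annotation corrected     [The utility model] has [a simple structure], [treats sludge effectively], and [makes full use of energy]
Example 2   Semantic error                  [The invention] discloses a method for preparing a [[calcium-iron] dual oxygen carrier] from [[converter] [steel slag]] as raw material, belonging to the technical field of [solid waste] utilization
Example 3   Missed entity                   [The invention] relates to an [[catering [waste]] integrated automatic crushing and pressing device]
Example 4   Missed relation                 [The invention]Patent-Advantage can [improve the reliability and stability of the high-voltage interlock loop system]Patent-Advantage
Table 3  Negative examples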
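The bracket notation in Table 3 can be turned into character-offset spans, which makes the nesting explicit. The sketch below assumes that each pair of square brackets marks one entity span; the function name `extract_spans` is hypothetical and this is not the authors' annotation tooling. It uses the original Chinese text of Example 3 so that the offsets refer to the source characters.

```python
def extract_spans(annotated: str):
    """Strip bracket markers and return (plain_text, spans).

    Each span is (start, end, text) with `end` exclusive, measured in
    characters of the plain text. Nested brackets yield nested spans.
    """
    plain = []   # characters of the text without brackets
    stack = []   # start offsets of brackets that are still open
    spans = []
    for ch in annotated:
        if ch == "[":
            stack.append(len(plain))
        elif ch == "]":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            start = stack.pop()
            spans.append((start, len(plain), "".join(plain[start:])))
        else:
            plain.append(ch)
    if stack:
        raise ValueError("unbalanced opening bracket")
    return "".join(plain), sorted(spans)


# Original text of Example 3 in Table 3 (three entities nested in each other).
example3 = "[本发明]涉及一种[[餐饮[垃圾]]自动粉碎压榨一体化设备]"
plain, spans = extract_spans(example3)

print(plain)
for start, end, text in spans:
    print(start, end, text)
```

A sequence labeling scheme assigns one tag per character, so it cannot represent the overlapping spans at offsets 7 to 11 and 9 to 11 at once, whereas a span-based representation lists each of them separately.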
Fig.7  Ontology of the green cooperation patent knowledge graph
Fig.8  Knowledge graph visualization based on Neo4j