Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (6): 61-72     https://doi.org/10.11925/infotech.2096-3467.2022.0530
Research Paper
Generating Patent Text Abstracts Based on Improved Multi-head Attention Mechanism
Shi Guoliang1, Zhou Shu1,2, Wang Yunfeng2, Shi Chunjiang2, Liu Liang2
1Business School, Hohai University, Nanjing 211100, China
2Bank of Jiangsu, Nanjing 210006, China
Abstract

[Objective] This paper addresses the single-bias problem in patent abstract generation caused by the single input structure of patent texts, along with repeated generation, insufficient conciseness and fluency, and loss of original information, in order to improve the quality of generated patent abstracts. [Methods] We designed IMHAM, a patent abstract generation model based on an improved multi-head attention mechanism. First, to address the single-structure issue, we designed two cosine-similarity-based algorithms over the logical structure of the patent text to select the most important patent document. Second, we built a sequence-to-sequence model with a multi-head attention mechanism to better learn feature representations of patent text. Meanwhile, we added self-attention layers at the encoder and decoder and modified the attention function to address repetitive generation. Finally, we added an improved pointer network structure to prevent the loss of original information. [Results] On a publicly available patent text dataset, the Rouge-1, Rouge-2, and Rouge-L scores of the proposed model were 3.3%, 2.4%, and 5.5% higher, respectively, than those of the MedWriter baseline. [Limitations] The proposed model is better suited to documents with multiple structures, such as patents; on single-structure documents the most-important-document selection algorithm has no effect. [Conclusions] For texts with a similar multi-document structure, the proposed model generalizes well in improving the quality of abstract generation, and the generated abstracts are reasonably fluent.
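As a rough illustration of the document-selection step described above (not the paper's actual algorithms), the sketch below scores each candidate patent document, such as the description text and the claims, by the cosine similarity of its TF-IDF vector to a reference text such as the patent title; the helper name and the TF-IDF representation are assumptions made here.

```python
# Illustrative sketch of cosine-similarity document selection; the paper's
# two selection algorithms are not published as code, so the helper name
# and the TF-IDF representation are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_most_important(candidates, reference):
    """Return the candidate document most similar to the reference text."""
    vectorizer = TfidfVectorizer()
    # Fit on all texts so candidates and reference share one vocabulary.
    matrix = vectorizer.fit_transform(candidates + [reference])
    scores = cosine_similarity(matrix[:-1], matrix[-1]).ravel()
    best = int(scores.argmax())
    return candidates[best], scores[best]

# Example: choose between the description text and the claims.
doc, score = select_most_important(
    ["description text of the patent ...", "claims of the patent ..."],
    "a radar mount assembly for an educational robot")
```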

Key words: Patent Text; Abstract Generation; Multi-head Attention; Pointer Network
Received: 2022-05-25      Published online: 2023-08-09
CLC Number: TP391; G350
Funding: *Supported by the Fundamental Research Funds for the Central Universities (Grant No. B200207036)
Corresponding author: Shi Guoliang, ORCID: 0000-0001-8672-9342, E-mail: shigl@hhu.edu.cn.
Cite this article:
Shi Guoliang, Zhou Shu, Wang Yunfeng, Shi Chunjiang, Liu Liang. Generating Patent Text Abstracts Based on Improved Multi-head Attention Mechanism. Data Analysis and Knowledge Discovery, 2023, 7(6): 61-72.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0530      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I6/61
Fig.1  Overall architecture of the model
Fig.2  Details of the encoder and decoder
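Figure 2 details the encoder and decoder, both of which gain an added self-attention layer built on multi-head attention. For orientation only, here is a minimal PyTorch sketch of the standard scaled dot-product multi-head attention of Vaswani et al.[27] that the model builds on; the paper's improved attention function is not reproduced here.

```python
# Minimal sketch of standard scaled dot-product multi-head attention
# (Vaswani et al.[27]); the paper's improved attention function and the
# exact placement of the added self-attention layers are not reproduced.
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k, self.n_heads = d_model // n_heads, n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        b = query.size(0)
        # Project, then split into heads: (batch, heads, seq_len, d_k).
        q = self.w_q(query).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)  # attention logits
        out = torch.softmax(scores, dim=-1) @ v                 # weighted values
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_k)
        return self.w_o(out)

# Self-attention: query, key, and value are the same sequence.
x = torch.randn(2, 10, 256)                       # (batch, seq_len, d_model)
print(MultiHeadAttention(256, 4)(x, x, x).shape)  # torch.Size([2, 10, 256])
```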
Regular expression    Purpose
<script[^>]*?>[\\s\\S]*?<\\/script>    Strip web-page tags and similar markup
<style[^>]*?>[\\s\\S]*?<\\/style>    Strip web-page tags and similar markup
<(?!div|/div|p|/p|br)[^>]*>    Strip web-page tags and similar markup
<tr>(.*?)</tr>    Strip web-page tags and similar markup
<th>(.*?)</th>    Strip web-page tags and similar markup
<td>(.*?)</td>    Strip web-page tags and similar markup
(?<=<title>).*?(?=</title>)    Process the title
<a.*?href=.*?<\/a>    Strip hyperlinks such as image references
\\s*|\t|\r|\n    Strip extra spaces, line breaks, and other whitespace
Table 1  Regular-expression processing types and their patterns
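The Table 1 patterns are escaped string literals taken from cleaning code. A minimal sketch of how such a cleaning pass might be applied in Python follows, assuming a hypothetical clean_patent_html helper; the application order is a choice made here (links are removed before generic tags so the anchor text disappears with them), and the whitespace pattern is tightened because \s* on its own also matches the empty string.

```python
# Illustrative HTML-cleaning pass using the Table 1 patterns; the helper
# name and the application order are assumptions made here.
import re

PATTERNS = [
    r"<script[^>]*?>[\s\S]*?<\/script>",  # scripts
    r"<style[^>]*?>[\s\S]*?<\/style>",    # stylesheets
    r"<a.*?href=.*?<\/a>",                # hyperlinks such as image references
    r"<(?!div|/div|p|/p|br)[^>]*>",       # all remaining tags except div/p/br
    r"\t|\r|\n",                          # tabs and line breaks
]

def clean_patent_html(html: str) -> str:  # hypothetical helper
    for pattern in PATTERNS:
        html = re.sub(pattern, "", html)
    return re.sub(r"\s{2,}", " ", html).strip()  # collapse runs of whitespace

print(clean_patent_html("<div><script>x()</script>Claim <b>1</b>: a radar mount.</div>"))
# -> "<div>Claim 1: a radar mount.</div>"
```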
Attribute    Content
Title    A radar mount assembly for an artificial-intelligence educational robot
Publication number    CN211333277U
Abstract    This utility model discloses a radar mount assembly for an artificial-intelligence educational robot, belonging to the technical field of robotics, through…
Description text    The purpose of this utility model is to solve at least one of the technical problems in the prior art by providing a radar mount assembly in which a single infrared detection radar is operated so as to expand the working range of the infrared monitoring radar, reduce blind zones, and scan and assess obstacles ahead over a wider range…
Claims    1. A radar mount assembly for an artificial-intelligence educational robot, comprising a vehicle body (1), characterized in that a radar mount (2) is formed at the front end of the vehicle body (1), the radar mount (2) comprising a first mounting panel (3) and a second mounting panel…
Table 2  Presentation format of the patent data
Environment    Configuration
OS    CentOS Linux release 7.4.1708
GPU    Tesla T4 (16GB) ×2
CPU    Intel(R) Xeon(R) Gold 6130 CPU@3.5GHz
CPU cores    32
CPU threads    64
Memory    256GB
CUDA    10.0
Python    3.6.5
PyTorch    1.1.0
Table 3  Experimental environment parameters and configuration
Model    Description
TextRank[32]    Graph-based text processing model with two novel unsupervised keyword and sentence extraction methods
SummaRuNNer[33]    Recurrent-neural-network-based sequence model for extractive summarization
Baseline-Multi-Encoder    Based on the above pipeline, applies the improved multi-head attention only in the encoder
Baseline-Multi-Decoder    Based on the above pipeline, applies the improved multi-head attention only in the decoder
Baseline-Multi-Encoder-NoChoosen    Same as Baseline-Multi-Encoder, but without the most-important-document semantic-similarity selection
Baseline-Multi-Decoder-NoChoosen    Same as Baseline-Multi-Decoder, but without the most-important-document semantic-similarity selection
MedWriter[34]    Knowledge-aware text generation model capable of learning graph-level representations
IMHAM (ours)    Applies the improved multi-head attention in both encoder and decoder, with most-important-document semantic-similarity selection and pointer-network optimization
Table 4  Models and their descriptions
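IMHAM's pointer-network component (Table 4) follows the pointer-generator family of See et al.[3]. The sketch below shows that standard mixture, in which a generation probability p_gen interpolates between the vocabulary distribution and the copy (attention) distribution; the paper's improvements to this structure are not reproduced, and tensor names are illustrative.

```python
# Standard pointer-generator mixture (See et al.[3]); the paper's improved
# pointer structure is not reproduced, and names/shapes are illustrative.
import torch
import torch.nn as nn

class PointerGenerator(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # p_gen is computed from [context; decoder state; decoder input].
        self.p_gen_layer = nn.Linear(hidden_size * 3, 1)
        self.out = nn.Linear(hidden_size * 2, vocab_size)

    def forward(self, context, state, dec_input, attn, src_ids, extended_size):
        # context/state/dec_input: (batch, hidden); attn: (batch, src_len);
        # src_ids: (batch, src_len) token ids of the source document.
        p_gen = torch.sigmoid(self.p_gen_layer(
            torch.cat([context, state, dec_input], dim=-1)))
        vocab_dist = torch.softmax(
            self.out(torch.cat([context, state], dim=-1)), dim=-1)
        # Mix: generate from the vocabulary with probability p_gen, otherwise
        # copy a source token according to the attention distribution.
        final = torch.zeros(attn.size(0), extended_size, device=attn.device)
        final[:, : vocab_dist.size(1)] = p_gen * vocab_dist
        final.scatter_add_(1, src_ids, (1 - p_gen) * attn)
        return final  # (batch, extended_size), sums to 1 per row
```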
Data category    Algorithm A /%    Algorithm B /%
Water conservancy    45.3    54.7
Artificial intelligence    44.9    55.1
Optical fiber    37.2    62.8
Agriculture    39.9    60.1
Finance    41.8    58.2
Table 5  Selection shares of the two patent document selection algorithms
Data category    Description text /%    Claims /%
Water conservancy    83.3    16.7
Artificial intelligence    79.3    20.7
Optical fiber    86.2    13.8
Agriculture    76.9    23.1
Finance    72.1    27.9
Table 6  Shares of the description text and the claims chosen as the most important document
Fig.3  Loss versus number of iterations for the main models
Model    Rouge-1    Rouge-2    Rouge-L
TextRank 0.432 0.235 0.367
SummaRuNNer 0.482 0.293 0.393
Baseline-Multi-Encoder-NoChoosen 0.488 0.303 0.401
Baseline-Multi-Encoder 0.491 0.312 0.407
Baseline-Multi-Decoder-NoChoosen 0.509 0.329 0.411
Baseline-Multi-Decoder 0.515 0.335 0.416
MedWriter 0.518 0.339 0.419
IMHAM 0.535 0.347 0.442
Table 7  Rouge performance of the models on the patent text dataset
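The Rouge metrics in Table 7 (Lin[35]) measure n-gram and longest-common-subsequence overlap with the reference abstract. Below is a minimal sketch of Rouge-1 and Rouge-L F1 over whitespace-tokenized text; the paper's exact evaluation setup (tokenizer, implementation) is not stated here, so treat this as illustrative.

```python
# Minimal Rouge-1 / Rouge-L F1 sketch (Lin[35]); whitespace tokenization
# is an assumption, not the paper's stated setup.
from collections import Counter

def rouge_1(candidate: str, reference: str) -> float:
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def rouge_l(candidate: str, reference: str) -> float:
    x, y = candidate.split(), reference.split()
    # Longest common subsequence via dynamic programming.
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            dp[i + 1][j + 1] = dp[i][j] + 1 if xi == yj else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(x), lcs / len(y)
    return 2 * prec * rec / (prec + rec)

print(rouge_1("the radar mount assembly", "a radar mount assembly"))  # 0.75
```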
Model    Generated abstract
Original text (most important document selected: "description text")    Water conservancy projects undertake the tasks of retaining, storing, and discharging water, so hydraulic structures have special requirements for stability, pressure bearing, seepage resistance, scour and abrasion resistance, frost resistance, and crack resistance; construction must follow the technical specifications for water conservancy projects and adopt dedicated methods and measures to ensure engineering quality. In actual use, however, after long battering by tides the dam body easily loosens, which is quite dangerous, and a great deal of large floating garbage gathers on the water surface at the dam, which is very inconvenient for workers to remove. Content of the utility model: (1) Technical problem to be solved: in view of the deficiencies of the prior art, this utility model discloses an anti-scouring dam protection device for water conservancy projects to solve the problems raised in the background art above. (2) Technical solution: to achieve the above purpose, this utility model is realized through the following technical solution: an anti-scouring dam protection device for water conservancy projects…
Human-written abstract    This utility model discloses an anti-scouring dam protection device for water conservancy projects, relating to the technical field of water conservancy engineering. The device comprises a protective plate, two support rods, two second connecting plates, and two first connecting plates; the two support rods are fixedly mounted at the two ends of the top of the protective plate. By providing a protection-and-cleaning mechanism in which the reinforcing plate slides in the chute, the position of the reinforcing plate can be adjusted to suit different water levels; the winding reel rotates so that the position of the collecting plate can be adjusted, making it convenient for workers to collect garbage from the water surface. The overall design is simple and the structure compact; the device protects the dam while also facilitating the cleaning of floating garbage, and has good practicality.
Baseline-Multi-Encoder    This utility model relates to the technical field of water conservancy engineering, in particular to a dam protection device for water conservancy projects, comprising a protective plate, two support rods, two second connecting plates; the two support rods are fixedly mounted at the two ends of the top of the protective plate. The two second connecting plates are fixedly mounted on the top of the protective plate, and the two first connecting plates are fixedly mounted on one side of the tops of the support rods; two chutes are formed in the surface of the protective plate, and a protection-and-cleaning mechanism is provided on the surface of the protective plate the aforesaid. The protection-and-cleaning mechanism comprises a reinforcing plate; the support plate is arranged on the surface of the protective plate and can adjust adjust adjust adjust adjust adjust adjust adjust adjust
Baseline-Multi-Decoder    This utility model relates to the technical field of water conservancy engineering, in particular to a dam protection device for water conservancy projects; the moving assembly comprises a cart board and four moving wheels, comprising a protective plate, two support rods, two second connecting plates; the two support rods are fixedly mounted at the two ends of the top of the protective plate. The device has a cleaning part that facilitates adjusting the reinforcing plate; the washing member is vertically arranged on the cart board, and the spray-washing member is arranged at the bottom rear of the cart board. This device is ingeniously designed, can protect the dam, and has usefulness.
IMHAM    This utility model relates to the technical field of water conservancy engineering, in particular to a dam protection device for water conservancy projects, comprising a protective plate, two support rods, two second connecting plates; the two support rods are fixedly mounted at the two ends of the top of the protective plate. The device has a cleaning part that facilitates adjusting the reinforcing plate, adapting to the water level; the reel rotates to adjust its position and collect the garbage. This device is ingeniously designed, can protect the dam, and is practical.
Table 8  Example abstracts generated by different models
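The repetition in the Baseline-Multi-Encoder output above ("adjust adjust adjust…") is the failure mode that the modified attention function targets. One widely used remedy, shown here only for orientation, is the coverage mechanism of See et al.[3], which feeds the running sum of past attention weights back into the attention score; the paper's own modification may differ.

```python
# Additive attention with a coverage term (See et al.[3]), a standard remedy
# for repeated generation; illustrative only, since the paper modifies the
# attention function in its own way.
import torch
import torch.nn as nn

class CoverageAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_cov = nn.Linear(1, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, enc_states, dec_state, coverage):
        # enc_states: (batch, src_len, hidden); dec_state: (batch, hidden);
        # coverage: (batch, src_len), sum of attention at all previous steps.
        feats = (self.w_enc(enc_states)
                 + self.w_dec(dec_state).unsqueeze(1)
                 + self.w_cov(coverage.unsqueeze(-1)))  # coverage penalizes revisits
        attn = torch.softmax(self.v(torch.tanh(feats)).squeeze(-1), dim=-1)
        return attn, coverage + attn  # pass the updated coverage to the next step
```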
Model    Encoder_Hidden_Layer=128    Encoder_Hidden_Layer=256    Encoder_Hidden_Layer=512
Baseline-Multi-Encoder-NoChoosen 0.483 0.485 0.481
Baseline-Multi-Encoder 0.484 0.488 0.485
Baseline-Multi-Decoder-NoChoosen 0.503 0.507 0.506
Baseline-Multi-Decoder 0.509 0.513 0.508
IMHAM 0.518 0.529 0.522
Table 9  Effect of encoder hidden-layer size on Rouge-1
Model    Decoder_Hidden_Layer=128    Decoder_Hidden_Layer=256    Decoder_Hidden_Layer=512
Baseline-Multi-Encoder-NoChoosen 0.484 0.487 0.483
Baseline-Multi-Encoder 0.485 0.490 0.486
Baseline-Multi-Decoder-NoChoosen 0.505 0.508 0.506
Baseline-Multi-Decoder 0.510 0.514 0.510
IMHAM 0.522 0.531 0.526
Table 10  Effect of decoder hidden-layer size on Rouge-1
Model    Multihead_Count=2    Multihead_Count=3    Multihead_Count=4    Multihead_Count=5
Baseline-Multi-Encoder-NoChoosen 0.483 0.486 0.488 0.487
Baseline-Multi-Encoder 0.485 0.489 0.491 0.490
Baseline-Multi-Decoder-NoChoosen 0.503 0.506 0.509 0.508
Baseline-Multi-Decoder 0.509 0.511 0.515 0.513
IMHAM 0.521 0.532 0.535 0.533
Table 11  Effect of the number of attention heads on Rouge-1
[1] Tan J W, Wan X J, Xiao J G. Abstractive Document Summarization with A Graph-based Attentional Neural Model[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). 2017: 1171-1181.
[2] Rush A M, Chopra S, Weston J. A Neural Attention Model for Abstractive Sentence Summarization[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 379-389.
[3] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-Generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers). 2017: 1073-1083.
[4] Zhu Yongqing, Zhao Peng, Zhao Feifei, et al. Survey on Abstractive Text Summarization Technologies Based on Deep Learning[J]. Computer Engineering, 2021, 47(11): 11-21, 28. doi: 10.19678/j.issn.1000-3428.0061174.
[5] Landauer T K, Dumais S T. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge[J]. Psychological Review, 1997, 104(2): 211-240. doi: 10.1037/0033-295X.104.2.211.
[6] Landauer T K, Foltz P W, Laham D. An Introduction to Latent Semantic Analysis[J]. Discourse Processes, 1998, 25(2-3): 259-284. doi: 10.1080/01638539809545028.
[7] Mohamed A H T, Mohamed B A, Abdelmajid B H. Computing Semantic Relatedness Using Wikipedia Features[J]. Knowledge-Based Systems, 2013, 50: 260-278. doi: 10.1016/j.knosys.2013.06.015.
[8] Sinoara R A, Camacho-Collados J, Rossi R G, et al. Knowledge-Enhanced Document Embeddings for Text Classification[J]. Knowledge-Based Systems, 2019, 163: 955-971. doi: 10.1016/j.knosys.2018.10.026.
[9] Sutskever I, Vinyals O, Le Q V. Sequence to Sequence Learning with Neural Networks[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. 2014: 3104-3112.
[10] R J, Anami B S, Poornima B K. Text Document Summarization Using POS Tagging for Kannada Text Documents[C]// Proceedings of the 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 2021: 423-426.
[11] Pan M, Wang J M, Huang X J, et al. A Probabilistic Framework for Integrating Sentence-Level Semantics via BERT into Pseudo-Relevance Feedback[J]. Information Processing and Management, 2022, 59(1): 102734. doi: 10.1016/j.ipm.2021.102734.
[12] Wang X P, Liu X X, Guo J, et al. A Deep Person re-Identification Model with Multi Visual-Semantic Information Embedding[J]. Multimedia Tools and Applications, 2021, 80(5): 6853-6870. doi: 10.1007/s11042-020-09957-5.
[13] McMillan-Major A, Osei S, Rodriguez J D, et al. Reusable Templates and Guides for Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards[C]// Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021). 2021: 121-135.
[14] Alami N, El Mallahi M, Amakdouf H, et al. Hybrid Method for Text Summarization Based on Statistical and Semantic Treatment[J]. Multimedia Tools and Applications, 2021, 80(13): 19567-19600. doi: 10.1007/s11042-021-10613-9.
[15] Shao Y N, Lin J C W, Srivastava G, et al. Self-Attention-Based Conditional Random Fields Latent Variables Model for Sequence Labeling[J]. Pattern Recognition Letters, 2021, 145(C): 157-164.
[16] Badanidiyuru A, Karbasi A, Kazemi E, et al. Submodular Maximization Through Barrier Functions[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 524-534.
[17] Faris M R, Ibrahim H M, Abdulrahman K Z, et al. Fuzzy Logic Model for Optimal Operation of Darbandikhan Reservoir, Iraq[J]. International Journal of Design & Nature and Ecodynamics, 2021, 16(4): 335-343.
[18] Ghumade T G, Deshmukh R A. A Document Classification Using NLP and Recurrent Neural Network[J]. International Journal of Engineering and Advanced Technology, 2019, 8(6): 633-636.
[19] Shi T, Keneshloo Y, Ramakrishnan N, et al. Neural Abstractive Text Summarization with Sequence-to-Sequence Models[J]. ACM/IMS Transactions on Data Science, 2021, 2(1): 1-37.
[20] Sultana M, Chakraborty P, Choudhury T. Bengali Abstractive News Summarization Using Seq2Seq Learning with Attention[C]// Proceedings of Cyber Intelligence and Information Retrieval. 2021: 279-289.
[21] Yang M, Qu Q, Shen Y, et al. Cross-Domain Aspect/Sentiment-Aware Abstractive Review Summarization by Combining Topic Modeling and Deep Reinforcement Learning[J]. Neural Computing and Applications, 2020, 32(11): 6421-6433. doi: 10.1007/s00521-018-3825-2.
[22] Chen Y B, Ma Y, Mao X D, et al. Multi-Task Learning for Abstractive and Extractive Summarization[J]. Data Science and Engineering, 2019, 4(1): 14-23. doi: 10.1007/s41019-019-0087-7.
[23] Choi H, Cho K, Bengio Y. Context-Dependent Word Representation for Neural Machine Translation[J]. Computer Speech & Language, 2017, 45: 149-160.
[24] Yun H, Hwang Y, Jung K. Improving Context-Aware Neural Machine Translation Using Self-Attentive Sentence Embedding[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34: 9498-9506.
[25] Munir K, Zhao H, Li Z C. Adaptive Convolution for Semantic Role Labeling[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2021, 29: 782-791.
[26] Hou K K, Hou T T, Cai L L. Public Attention about COVID-19 on Social Media: An Investigation Based on Data Mining and Text Analysis[J]. Personality and Individual Differences, 2021, 175: 110701. doi: 10.1016/j.paid.2021.110701.
[27] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[28] Tao C Y, Gao S, Shang M Y, et al. Get the Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018: 4418-4424.
[29] Jean S, Cho K, Memisevic R, et al. On Using Very Large Target Vocabulary for Neural Machine Translation[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). 2015: 1-10.
[30] Kumar A, Seth S, Gupta S, et al. Sentic Computing for Aspect-Based Opinion Summarization Using Multi-Head Attention with Feature Pooled Pointer Generator Network[J]. Cognitive Computation, 2022, 14(1): 130-148. doi: 10.1007/s12559-021-09835-8.
[31] Gill H S, Khehra B S, Singh A, et al. Teaching-Learning-Based Optimization Algorithm to Minimize Cross Entropy for Selecting Multilevel Threshold Values[J]. Egyptian Informatics Journal, 2018, 20(1): 11-25. doi: 10.1016/j.eij.2018.03.006.
[32] Mihalcea R, Tarau P. TextRank: Bringing Order into Texts[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004: 404-411.
[33] Nallapati R, Zhai F F, Zhou B W. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017: 3075-3081.
[34] Pan Y C, Chen Q C, Peng W H, et al. MedWriter: Knowledge-Aware Medical Text Generation[C]// Proceedings of the 28th International Conference on Computational Linguistics. 2020: 2363-2368.
[35] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries[C]// Proceedings of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004. 2004: 74-81.