[Objective] Current patent abstract generation tends to produce biased abstracts because models take only a single structural part of the patent text as input. The generated abstracts also suffer from repetition, a lack of conciseness and fluency, and loss of information from the original text.
[Methods] First, to address the single-structure problem, we design two cosine-similarity-based algorithms that use the logical structure of the patent text to select the most important patent documents. Second, we design a new sequence-to-sequence model with an improved multi-head attention mechanism (IMHAM) to better learn feature representations of the patent text, and we add self-attention layers to the encoder and decoder to address repeated generation. Finally, we add an improved pointer network structure to address the loss of original information.
[Results] On the publicly available patent text dataset, our model scores 3.2%, 2.3%, and 5.4% higher than the MedWriter model on ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
[Limitations] The model is best suited to document systems with multiple structural parts, such as patents. For text that has only a single structure, the important-document selection step may offer little room for improvement.
[Conclusions] The proposed model generalizes well and can improve summary quality in other text summarization domains that have similar multi-document structures.
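
[Illustration] The document selection step described under Methods can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's two algorithms, whose details are not given here: it assumes plain bag-of-words term-frequency vectors and ranks each part of a patent by its mean cosine similarity to the other parts. The names `term_vector`, `cosine_similarity`, `select_documents`, and the sample `sections` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_documents(documents, k=2):
    """Return the k names with the highest mean similarity to the other parts.

    Assumption: "importance" is approximated by how similar a part is,
    on average, to every other part of the same patent.
    """
    vectors = {name: term_vector(text) for name, text in documents.items()}
    scores = {}
    for name, vec in vectors.items():
        others = [v for n, v in vectors.items() if n != name]
        if others:
            scores[name] = sum(cosine_similarity(vec, o) for o in others) / len(others)
        else:
            scores[name] = 0.0
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical patent parts, for illustration only.
sections = {
    "claims": "a battery pack comprising a cooling plate and a sensor",
    "description": "the battery pack includes a cooling plate and a temperature sensor",
    "background": "electric vehicles are increasingly popular worldwide",
}
selected = select_documents(sections, k=2)
```

In this toy input the background part shares no terms with the other two, so it receives a score of zero and is dropped. A real system would likely replace the term-frequency vectors with TF-IDF or learned embeddings before feeding the selected parts to the sequence-to-sequence model.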