STNLTP: Generating Chinese Patent Abstracts Based on Integrated Strategy
Zhang Le, Du Yifan, Lü Xueqiang, Dong Zhian
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Abstract [Objective] This paper proposes an abstracting model for Chinese patents based on an integrated strategy (STNLTP), aiming to reduce the repetition and long-document dependency problems of existing automatic summarization techniques. [Methods] First, we introduced a patent term dictionary and represented traditional Chinese medicine patents with sememe vectors based on the SAT model. Then, following the integrated strategy, we used the TextRank, Lead4 and NMF models to extract candidate key sentences from each patent. Third, we selected the optimal key sentences through clustering and redundancy removal. Finally, we fed these optimal key sentences into a pointer-generator network based on Transformer character vectors to generate the abstracts. [Results] The model effectively combines extractive and abstractive methods. Compared with the existing RLCPAR model, it improved the ROUGE-1, ROUGE-2 and ROUGE-L scores by 2.00%, 9.73% and 2.35%, respectively. [Limitations] The generated abstracts still contain some errors. [Conclusions] The proposed STNLTP model can effectively generate Chinese patent abstracts.
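
The abstract describes an extract-then-generate pipeline: three extractive scorers (TextRank, Lead4 and NMF) propose candidate key sentences, clustering and redundancy removal keep the best ones, and a Transformer-based pointer-generator rewrites them. The sketch below illustrates only the extractive and redundancy-removal stages. It is a minimal illustration, not the authors' implementation: the sentence splitter, the character-level TF-IDF features (standing in for the paper's SAT sememe vectors), the score-summing integration rule, the use of agglomerative clustering (cf. reference [25]), and all function names are assumptions, and the final pointer-generator rewriting step is omitted.

```python
# Minimal sketch of the extractive stage of an STNLTP-style pipeline (assumptions noted above).
# Requires: numpy, networkx, scikit-learn >= 1.2 (for the `metric` argument of AgglomerativeClustering).
import re

import networkx as nx
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_sentences(text: str) -> list[str]:
    """Naive Chinese sentence splitter on full stops, exclamation and question marks."""
    parts = re.split(r"(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


def extract_key_sentences(text: str, top_k: int = 4, n_topics: int = 2) -> list[str]:
    sentences = split_sentences(text)
    if len(sentences) <= top_k:
        return sentences

    # Character-level TF-IDF stands in for the sememe vectors used in the paper.
    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(1, 2)).fit_transform(sentences)

    # 1) TextRank: PageRank over the cosine-similarity graph of sentences.
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)
    textrank = np.array(list(nx.pagerank(nx.from_numpy_array(sim)).values()))

    # 2) Lead: earlier sentences score higher (patents tend to front-load key claims).
    lead = np.linspace(1.0, 0.0, num=len(sentences))

    # 3) NMF: a sentence's weight on its dominant latent topic.
    w = NMF(n_components=min(n_topics, len(sentences)), init="nndsvda",
            max_iter=500).fit_transform(tfidf)
    nmf_score = w.max(axis=1)

    # Integration strategy, assumed here to be a sum of min-max normalized scores.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    combined = norm(textrank) + norm(lead) + norm(nmf_score)

    # Redundancy removal: cluster the top candidates and keep the best sentence per cluster.
    candidates = np.argsort(combined)[::-1][: top_k * 2]
    labels = AgglomerativeClustering(
        n_clusters=top_k, metric="cosine", linkage="average"
    ).fit_predict(tfidf[candidates].toarray())
    best = {}
    for idx, label in zip(candidates, labels):
        if label not in best or combined[idx] > combined[best[label]]:
            best[label] = idx
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(best.values())]
```

In the full model, the sentences returned by this stage would then be passed to the Transformer-based pointer-generator network for rewriting into the final abstract.
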
Received: 17 November 2021
Published: 01 March 2022
Fund: National Natural Science Foundation of China (62171043)
Corresponding Author: Lü Xueqiang, ORCID: 0000-0002-1422-0560, E-mail: lxq@bistu.edu.cn
References
[1] Wan Xiaoli, Zhu Xuezhong. The Indicator System and Fuzzy Comprehensive Evaluation of Patent Value[J]. Science Research Management, 2008, 29(2): 185-191.
[2] Zhang Le, Leng Jidong, Lv Xueqiang, et al. RLCPAR: A Rewriting Model for Chinese Patent Abstracts Based on Reinforcement Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(7): 59-69.
[3] Berg-Kirkpatrick T, Gillick D, Klein D. Jointly Learning to Extract and Compress[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011: 481-490.
[4] Nallapati R, Zhai F, Zhou B. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017: 3075-3081.
[5] Liu Y, Lapata M. Text Summarization with Pretrained Encoders[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2019: 3730-3740.
[6] Mihalcea R, Tarau P. TextRank: Bringing Order into Text[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2004: 404-411.
[7] Foltz P W. Latent Semantic Analysis for Text-Based Research[J]. Behavior Research Methods, Instruments and Computers, 1996, 28(2): 197-202. doi: 10.3758/BF03204765
[8] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[9] Lee D D, Seung H S. Learning the Parts of Objects by Non-Negative Matrix Factorization[J]. Nature, 1999, 401(6755): 788-791. doi: 10.1038/44565
[10] Gong Y H, Liu X. Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis[C]// Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2001: 19-25.
[11] Kar M, Nunes S, Ribeiro C. Summarization of Changes in Dynamic Text Collections Using Latent Dirichlet Allocation Model[J]. Information Processing & Management, 2015, 51(6): 809-833. doi: 10.1016/j.ipm.2015.06.002
[12] Zhang Chengzhi, Tong Tiantian, Zhou Qingqing. Automatic Summarization of Book Reviews Based on Fine-Grained Review Mining[J]. Journal of the China Society for Scientific and Technical Information, 2021, 40(2): 163-172.
[13] Kuang Li, Shi Ruyi, Zhao Leihao, et al. Automatic Generation of Large-Granularity Pull Request Description[J]. Journal of Software, 2021, 32(6): 1597-1611.
[14] Zhu Yongqing, Zhao Peng, Zhao Feifei, et al. Survey on Abstractive Text Summarization Technologies Based on Deep Learning[J]. Computer Engineering, 2021, 47(11): 11-21.
[15] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-Generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2017: 1073-1083.
[16] Chung T L, Xu B, Liu Y, et al. Main Point Generator: Summarizing with a Focus[C]// Proceedings of the 23rd International Conference on Database Systems for Advanced Applications. Springer, 2018: 924-932.
[17] Cohan A, Dernoncourt F, Kim D S, et al. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2018: 615-621.
[18] Wang Shuai, Zhao Xiang, Li Bo, et al. TP-AS: A Two-Phase Approach to Long Text Automatic Summarization[J]. Journal of Chinese Information Processing, 2018, 32(6): 71-79.
[19] Tan Jinyuan, Diao Yufeng, Yang Liang, et al. Extractive-Abstractive Text Automatic Summary Based on BERT-SUMOPN Model[J]. Journal of Shandong University (Natural Science), 2021, 56(7): 82-90.
[20] Shu Yunfeng, Wang Zhongqing. Research on Chinese Patent Summarization Based on Patented Structure[J]. Computer Science, 2020, 47(S1): 45-48.
[21] Zhang X X, Lapata M, Wei F R, et al. Neural Latent Extractive Document Summarization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018: 779-784.
[22] Dong Z D, Dong Q. HowNet—A Hybrid Language and Knowledge Resource[C]// Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering. 2003: 820-824.
[23] Niu Y L, Xie R B, Liu Z Y, et al. Improved Word Representation Learning with Sememes[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2017: 2049-2058.
[24] Salton G, Buckley C. Term-Weighting Approaches in Automatic Text Retrieval[J]. Information Processing & Management, 1988, 24(5): 513-523. doi: 10.1016/0306-4573(88)90021-0
[25] Day W H E, Edelsbrunner H. Efficient Algorithms for Agglomerative Hierarchical Clustering Methods[J]. Journal of Classification, 1984, 1(1): 7-24. doi: 10.1007/BF01890115
[26] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries[C]// Proceedings of the 2004 Workshop on Text Summarization Branches Out. Association for Computational Linguistics, 2004: 74-81.
[27] Chen Y C, Bansal M. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2018: 675-686.