Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (7): 59-69    DOI: 10.11925/infotech.2096-3467.2021.0089
RLCPAR: A Rewriting Model for Chinese Patent Abstracts Based on Reinforcement Learning
Zhang Le1, Leng Jidong1, Lv Xueqiang1, Cui Zhuo2, Wang Lei1, You Xindong1
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
2School of Information & Communication Engineering, Beijing Information Science & Technology University, Beijing 100101, China
Abstract  

[Objective] This paper proposes RLCPAR, a reinforcement-learning-based rewriting model for Chinese patent abstracts, to address sentence redundancy and low accuracy when rewriting multi-sentence abstracts. [Methods] First, RLCPAR extracts key sentences from patent descriptions with the help of a patent term dictionary and reinforcement learning. Then, it generates candidate abstracts with a Transformer deep neural network. Finally, it merges the candidate abstracts with the original patent abstracts and obtains the rewritten abstracts after semantic de-duplication and sorting. [Results] The proposed model completed end-to-end rewriting of patent abstracts. RLCPAR scored 56.95%, 37.21% and 51.24% on ROUGE-1, ROUGE-2 and ROUGE-L, respectively. [Limitations] The experimental data, which mainly cover Chinese medicine materials, need to be expanded to other fields. [Conclusions] The RLCPAR model outperforms the other sequence generation methods compared and improves the rewriting quality of Chinese patent abstracts.
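
The final merging step can be pictured with a minimal sketch. It is not the authors' implementation: the abstract does not say how semantic similarity is measured or how sentences are sorted, so character-bigram Jaccard overlap stands in for the semantic comparison, the 0.6 threshold is a hypothetical value, and the output order (original sentences first, then surviving candidates) is an assumption.

```python
from __future__ import annotations


def char_bigrams(sentence: str) -> set[str]:
    """Character bigrams of a sentence with whitespace removed.

    A crude stand-in for a semantic representation (assumption).
    """
    text = "".join(sentence.split())
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_dedup(original: list[str], candidates: list[str],
                threshold: float = 0.6) -> list[str]:
    """Merge two sentence lists, dropping near-duplicates.

    Sentences are visited in order (original first, then candidates);
    a sentence is kept only if its overlap with every sentence already
    kept is below the hypothetical threshold.
    """
    merged: list[str] = []
    for sentence in original + candidates:
        grams = char_bigrams(sentence)
        if all(jaccard(grams, char_bigrams(kept)) < threshold
               for kept in merged):
            merged.append(sentence)
    return merged


original = ["A jam and its preparation method.",
            "The jam is rich in vitamins."]
candidates = ["A jam and its preparation method",
              "The herbs aid digestion."]
rewritten = merge_dedup(original, candidates)
```

Here the first candidate nearly repeats an original sentence and is dropped, while the second adds new information and is kept. A learned sentence embedding with cosine similarity would be a closer match to "semantic" de-duplication than surface bigrams.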

Key words: Patent Abstract; Automatic Rewriting; Reinforcement Learning; Neural Network; Text Generation
Received: 27 January 2021      Published: 11 August 2021
ZTFLH:  TP391  
Fund: National Natural Science Foundation of China (61671070); Open Project Fund of the Tibetan Information Processing and Machine Translation Key Laboratory/the Key Laboratory of Tibetan Information Processing, Ministry of Education (2019Z002)
Corresponding Authors: You Xindong, ORCID: 0000-0002-3351-4599     E-mail: ybyq920@126.com

Cite this article:

Zhang Le, Leng Jidong, Lv Xueqiang, Cui Zhuo, Wang Lei, You Xindong. RLCPAR: A Rewriting Model for Chinese Patent Abstracts Based on Reinforcement Learning. Data Analysis and Knowledge Discovery, 2021, 5(7): 59-69.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0089     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I7/59

Rewriting Framework of Chinese Patent Abstract
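Original sentence    Word segmentation and part-of-speech tagging
临床研究显示,枸杞多糖可明显改善脂肪肝患者的临床症状。 (Clinical studies show that Lycium barbarum polysaccharide can significantly improve the clinical symptoms of patients with fatty liver.)    临床/JJ 研究/NN 显示/VV ,/PU 枸杞多糖/NN 可/VV 明显/AD 改善/VV 脂肪肝/NN 患者/NN 的/DEG 临床/JJ 症状/NN 。/PU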
Examples of Sentence Preprocessing
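
The preprocessed sentence above pairs each word with a part-of-speech tag using a slash. A minimal sketch of reading that format back into (word, tag) pairs follows; the function names are hypothetical, and it assumes tokens are separated by whitespace, which is not guaranteed by the example as printed.

```python
from __future__ import annotations


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split whitespace-separated word/TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last slash so a word that contains "/" keeps its text.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"not a word/TAG token: {token!r}")
        pairs.append((word, tag))
    return pairs


def words_with_tag(pairs: list[tuple[str, str]], tag: str) -> list[str]:
    """Return the words carrying the given tag, in sentence order."""
    return [word for word, t in pairs if t == tag]


# Tail of the example sentence, with tokens separated by spaces.
sample = "脂肪肝/NN 患者/NN 的/DEG 临床/JJ 症状/NN 。/PU"
pairs = parse_tagged(sample)
nouns = words_with_tag(pairs, "NN")
```

Filtering by tag in this way is one simple route to candidate terms, for example the nouns of a sentence.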
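Field    Example
Patent application number    CN201410640231
Manual abstract    A blueberry and kiwifruit jam and its preparation method. The raw materials include strawberry jam, blueberry, kiwifruit, cherry, mangosteen, hazelnut kernel, hawthorn, Shandougen, Caoheche, Baitouweng, Chuanxinlian, citric acid, honey and xylitol. The preparation method is as follows: peel the kiwifruit and mangosteen, and wash the blueberries and cherries; boil the hazelnut kernels with the kiwifruit, mangosteen, blueberries and cherries in water, then crush all the materials to obtain a mixed fruit pulp; decoct the hawthorn, Shandougen, Caoheche, Baitouweng and Chuanxinlian in water, filter, and spray-dry the filtrate to obtain a Chinese herbal powder; add the citric acid, xylitol, honey and the herbal powder to the mixed fruit pulp and mix well to obtain a mixture; bring it to a boil over low heat, stirring constantly to evaporate the water until it thickens; then mix it with the strawberry jam, and fill, seal and sterilize to obtain the product. The jam is simple to prepare, is rich in trace elements such as calcium, potassium, selenium, zinc and germanium and in the 17 amino acids the human body needs, and is also rich in vitamins, grape acid (葡萄酸), fructose, citric acid, malic acid, fat and so on. It has a fine, smooth taste. With added Chinese herbs such as hawthorn and Baitouweng, it offers beauty and skin-care benefits while also aiding digestion, clearing heat and detoxifying to some extent.
Abstract    The invention discloses a blueberry and kiwifruit jam and a preparation method thereof. The blueberry and kiwifruit jam consists of the following raw materials by weight: strawberry jam, blueberry, kiwifruit, cherry, mangosteen, hazelnut kernel, hawthorn, Shandougen, Caoheche, Baitouweng, Chuanxinlian, citric acid, honey and xylitol. The blueberry and kiwifruit jam prepared by the invention is simple to prepare, is rich in trace elements such as calcium, potassium, selenium, zinc and germanium and in the 17 amino acids the human body needs, is also rich in vitamins, grape acid (葡萄酸), fructose, citric acid, malic acid, fat and so on, and has a fine, smooth taste. Chinese herbs such as hawthorn and Baitouweng are added, so that the invention offers beauty and skin-care benefits while also aiding digestion, clearing heat and detoxifying to some extent.
Description (excerpt)    A blueberry and kiwifruit jam and a preparation method thereof. Technical field: The invention belongs to the technical field of food processing, and specifically relates to a blueberry and kiwifruit jam and a preparation method thereof. Background: Food and beverages are the fastest-consumed consumer goods and bear on people's health. There are more and more non-staple foods on the market, but most of them only meet people's basic needs. As living standards keep rising, the nutritional requirements for non-staple foods are also increasing. Jam is a gel made by mixing and boiling fruit, sugar and an acidity regulator, and is mainly spread on bread or toast. Small fruits such as strawberries, blueberries, grapes and roses, or large fruits such as plums, oranges, apples and peaches cut into small pieces, can all be made into jam; however, usually only one kind of fruit is used, and the resulting jam has no health-care function. Summary of the invention: The purpose of the invention is to provide a blueberry and kiwifruit jam with health-care effects and a preparation method thereof. To achieve this purpose, the invention provides the following technical solution: a blueberry and kiwifruit jam. The blueberry and kiwifruit jam consists of the following raw materials by weight: strawberry jam, blueberry, kiwifruit, cherry, mangosteen, hazelnut kernel, hawthorn, Shandougen, Caoheche, Baitouweng, Chuanxinlian, citric acid, honey and xylitol. Further, the blueberry and kiwifruit jam consists of the following raw materials by weight: strawberry jam, blueberry, kiwifruit, cherry, mangosteen, hazelnut kernel, hawthorn, Shandougen, Caoheche, Baitouweng, Chuanxinlian, citric acid, honey and xylitol. The invention also provides a preparation method of the blueberry and kiwifruit jam, comprising the following steps: peel the kiwifruit and mangosteen, wash the blueberries and cherries, and set them aside; boil the hazelnut kernels with the kiwifruit, mangosteen, blueberries and cherries obtained in an earlier step in an appropriate amount of water, then transfer all the materials to a pulping machine and crush them to obtain a Chinese herbal powder; add the citric acid, xylitol and honey, together with the herbal powder obtained in an earlier step, to the mixed jam obtained in an earlier step and mix evenly to obtain a mixture; bring the mixture obtained in an earlier step to a boil over low heat, stirring constantly…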
Examples of Patent Data
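Parameter                                 Value
Maximum sentence length                   100
Word embedding dimension                  128
Hidden layer size                         256
Batch size                                64
Learning rate                             0.0001
Early stopping                            5
Learning rate decay                       0.5
Reinforcement learning discount factor    0.95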
Parameter Setting
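
To make the role of the discount factor concrete, the sketch below collects the listed settings in one place and computes discounted returns for a short reward sequence. The class and field names are hypothetical, reading "early stopping 5" as a patience of five evaluations is an assumption, and the reward values are made up for illustration; none of this is taken from the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Settings from the parameter table; field names are hypothetical."""
    max_sentence_length: int = 100
    embedding_dim: int = 128
    hidden_size: int = 256
    batch_size: int = 64
    learning_rate: float = 1e-4
    early_stopping_patience: int = 5  # assumed meaning of "early stopping 5"
    lr_decay: float = 0.5
    discount_factor: float = 0.95


def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns: list[float] = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


config = TrainingConfig()
# Illustrative per-step rewards, e.g. one per extracted sentence.
returns = discounted_returns([1.0, 1.0, 1.0], config.discount_factor)
```

With a factor of 0.95, a reward three steps ahead still counts for about 86% of its value (0.95 cubed), so later extraction decisions are only mildly discounted.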
Model           ROUGE-1/%   ROUGE-2/%   ROUGE-L/%
Baseline        53.42       32.25       48.27
TextRank        37.60       18.21       31.21
PGN+RL          40.35       23.45       32.67
Top6+Seq2Seq    41.00       24.19       36.27
FASRS           51.68       32.45       44.88
RLCPAR          52.76       33.64       45.58
RLCPAR+Text     56.63       36.38       50.87
Experimental Results (Patent Description Text)
Comparison Test Results
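
For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated abstract and a reference abstract. The sketch below computes precision, recall and F1 over characters, which is one common choice for Chinese text. Whether the paper scores characters or segmented words, and which of the three values it reports, is not stated on this page, so both are assumptions here.

```python
from __future__ import annotations

from collections import Counter


def ngram_counts(tokens, n: int) -> Counter:
    """Count the n-grams of a token sequence (a string counts characters)."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n: int = 1) -> dict[str, float]:
    """Clipped n-gram overlap between a candidate and one reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


unigram = rouge_n("abcd", "abce", n=1)
bigram = rouge_n("abcd", "abce", n=2)
```

In this toy case three of four characters match, so every ROUGE-1 value is 0.75, while only two of three bigrams match, giving 2/3 for ROUGE-2. ROUGE-L, which uses the longest common subsequence, is not covered by this sketch.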
Model           ROUGE-1/%   ROUGE-2/%   ROUGE-L/%
Baseline        53.42       32.25       48.27
TextRank        41.59       22.52       34.18
PGN+RL          44.85       26.00       36.15
Top6+Seq2Seq    45.09       27.33       39.33
FASRS           54.84       36.48       48.21
RLCPAR          55.89       36.96       49.73
RLCPAR+Text     56.95       37.21       51.24
Experimental Results (Original Abstract + Patent Description Text)
Comparison Test Results (Original Abstract + Patent Description Text)
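
The two result tables can be compared row by row to see how much each model gains when the original abstract is added to the input. The short sketch below does that subtraction; the scores are copied from the tables, while the variable and function names are hypothetical.

```python
from __future__ import annotations

# (ROUGE-1, ROUGE-2, ROUGE-L) in percent, patent description text only.
DESCRIPTION_ONLY = {
    "TextRank": (37.60, 18.21, 31.21),
    "PGN+RL": (40.35, 23.45, 32.67),
    "Top6+Seq2Seq": (41.00, 24.19, 36.27),
    "FASRS": (51.68, 32.45, 44.88),
    "RLCPAR": (52.76, 33.64, 45.58),
    "RLCPAR+Text": (56.63, 36.38, 50.87),
}

# Same models with the original abstract added to the input.
WITH_ORIGINAL_ABSTRACT = {
    "TextRank": (41.59, 22.52, 34.18),
    "PGN+RL": (44.85, 26.00, 36.15),
    "Top6+Seq2Seq": (45.09, 27.33, 39.33),
    "FASRS": (54.84, 36.48, 48.21),
    "RLCPAR": (55.89, 36.96, 49.73),
    "RLCPAR+Text": (56.95, 37.21, 51.24),
}


def rouge_gains(before: dict, after: dict) -> dict:
    """Per-model score differences in percentage points, two decimals."""
    return {
        model: tuple(round(a - b, 2)
                     for a, b in zip(after[model], before[model]))
        for model in before
    }


gains = rouge_gains(DESCRIPTION_ONLY, WITH_ORIGINAL_ABSTRACT)
for model, (r1, r2, rl) in gains.items():
    print(f"{model:<14}{r1:>6.2f}{r2:>6.2f}{rl:>6.2f}")
```

Every model improves on all three metrics when the original abstract is included. The gain is smallest for RLCPAR+Text (0.32 points on ROUGE-1) and roughly 3 to 4.5 points for most other rows.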
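Item    Content
Original abstract    A decoction medicine for treating recurrent oral ulcers and its preparation method, relating to a Chinese herbal formula for treating recurrent oral ulcers. The medicine is made from the following raw materials by weight: 8-10 g each of Fangfeng, Zhizi, Huoxiang, Paojiang, Maidong and Lianqiao; 15-18 g of Shigao; and 4-5 g each of stir-fried Cangzhu and Heye. The invention is characterized by readily available materials, convenient preparation, low cost and fast effect.
Rewritten abstract    A decoction medicine containing Chinese medicinal ingredients such as Fangfeng, Zhizi, Huoxiang and Paojiang, and its preparation method. The preparation method is: first place the proportioned amounts of Fangfeng, Zhizi, Huoxiang, Paojiang, Maidong, Lianqiao, Shigao, stir-fried Cangzhu, Heye and other Chinese herbs in a decocting vessel, add clean water, and soak the herbs for half an hour before decocting so that they are thoroughly moistened and the medicinal juice can be fully extracted. The medicine clears heat, purges fire, reduces swelling and expels pus, and can be used to treat recurrent oral ulcers.
Manual abstract    A medicine for treating recurrent oral ulcers. Fangfeng, Zhizi, Huoxiang, Paojiang, Maidong, Lianqiao, Shigao, stir-fried Cangzhu and Heye are soaked in water and then decocted, and the dregs are removed to obtain the decoction. The medicine clears heat, purges fire, reduces swelling and expels pus, and is used to treat recurrent oral ulcers.
Analysis    The rewritten abstract adds the preparation method and the efficacy to the original abstract.
Original abstract    A Chinese medicine for treating lumbar disc herniation, characterized by the following ingredients and weights: Zhicaowu 9 g, Zhichuanwu 9 g, Mahuang 9 g, Baizhi 9 g, Cangzhu 9 g, Tougucao 9 g, Dilong 9 g, Tuyuan 9 g, Danggui 10 g, Baishao 10 g, Huangqi 10 g, Baizhu 10 g, Dangshen 10 g, Yuanzhi 10 g, Hehuanhua 10 g, Xixin 4 g, Mugua 10 g, Duhuo 9 g. Patients with lumbar disc herniation who are treated with this medicine need no surgery or hospitalization; it quickly cures the condition and relieves the patient's pain. The treatment is inexpensive and fast-acting, is especially suitable for rural areas short of doctors and medicine and for wage earners, is a good medicine that benefits mankind, and has great value for wider application.
Rewritten abstract    A Chinese medicine for treating lumbar disc herniation. A Chinese medicine composition whose raw materials are: Baizhu, Dangshen, Yuanzhi, Hehuanhua, Xixin, Mugua, Duhuo 5.45%; the precision for each of the above herbs is 1%. The medicine benefits the liver and kidney, replenishes qi, activates blood circulation and relieves pain, and can be used to treat qi deficiency and blood damage, external invasion by wind, cold and dampness, cold congealing in the sinews and vessels, and qi stagnation and blood stasis with dampness congealing into phlegm.
Manual abstract    A Chinese medicine composition for treating lumbar disc herniation. The composition consists of Zhicaowu, Zhichuanwu, Mahuang, Baizhi, Cangzhu, Tougucao, Dilong, Tuyuan, Danggui, Baishao, Huangqi, Baizhu, Dangshen, Yuanzhi, Hehuanhua, Xixin, Mugua, Duhuo and other Chinese herbs. The composition is soaked in 50-degree sorghum liquor for one month to obtain the finished product. The Chinese medicine extract can treat patients with lumbar disc herniation without surgery or hospitalization, at low cost and with fast recovery.
Analysis    The rewritten abstract describes the efficacy in more detail and more comprehensively than the original abstract.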
Examples of Patent Rewriting Results