A Book Promotion Abstractive Summarization Method Based on Prompt Learning and T5 PEGASUS

doi:10.11925/infotech.2096-3467.2022-0350

Data Analysis and Knowledge Discovery

Li Daifeng, Lin Kaixin, Li Xuting

(School of Information Management, Sun Yat-sen University, Guangzhou, Guangdong 510006, China)


Abstract

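[Objective] To quickly generate up-to-date promotional blurbs for books from book information, reducing the labor and resources that purely manual writing consumes.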

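[Application Background] Little research addresses the automatic generation of promotional summaries for books. Libraries and online bookstores mostly write promotional text by hand or reuse fixed blurbs, which adds to the workload and fails to promote the books effectively.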

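[Methods] Following the idea of prompt learning, crawled book information is assembled into a dataset. Data augmentation and keyword extraction enrich the input, which is then fed to T5 PEGASUS to produce a base promotional blurb. Once the number of reviews for a book reaches a threshold, a summary of the reviews is added.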
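The flow described in [Methods] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template, the field names, the sample book, and the value of `REVIEW_THRESHOLD` are all hypothetical, since the abstract does not give them. The keyword step is a plain TextRank (PageRank over a word co-occurrence graph). Data augmentation and the T5 PEGASUS generation call are omitted.

```python
from collections import defaultdict

# Hypothetical value: the abstract mentions a threshold but does not state it.
REVIEW_THRESHOLD = 20


def textrank_keywords(tokens, top_k=3, window=2, damping=0.85, iters=50):
    """Rank tokens by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        # Link each token to the next `window` tokens that differ from it.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                neighbors[tok].add(other)
                neighbors[other].add(tok)
    scores = {tok: 1.0 for tok in neighbors}
    for _ in range(iters):
        # Synchronous update: every new score is computed from the old scores.
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[tok])
            for tok in neighbors
        }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def build_prompt(title, author, keywords):
    """Fill a hypothetical prompt template that would be passed to the model."""
    return (
        f"Title: {title}. Author: {author}. "
        f"Keywords: {', '.join(keywords)}. Promotional blurb:"
    )


def assemble_blurb(base_blurb, review_summary, review_count):
    """Add the review summary only once enough reviews have accumulated."""
    if review_count >= REVIEW_THRESHOLD:
        return f"{base_blurb} Readers say: {review_summary}"
    return base_blurb


# Made-up example input, already tokenized.
tokens = ["rome", "empire", "rome", "army", "rome", "trade"]
keywords = textrank_keywords(tokens, top_k=2, window=1)
prompt = build_prompt("A History of Rome", "Jane Doe", keywords)
```

In a real pipeline the `prompt` string would be tokenized and passed to the fine-tuned model, and its output would take the place of `base_blurb` in `assemble_blurb`.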

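[Results] On the dataset, the proposed model improves Rouge-1, Rouge-2, and Rouge-L by 28.9%, 37.6%, and 31.9%, respectively, over the best baseline model. Adding the review summary also reflects what interests readers.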
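The abstract reports Rouge-1, Rouge-2, and Rouge-L without saying how they were computed. The sketch below shows one common formulation: n-gram overlap for Rouge-N and longest common subsequence for Rouge-L, both reported as a balanced F1. The balanced F1 and the tokenization (one token per list element) are assumptions and may differ from the paper's evaluation setup.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Balanced F1 from an overlap count and the two totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """Rouge-N: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Rouge-L: F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy sequences, one character per token.
candidate = list("abcd")
reference = list("abed")
scores = {
    "rouge-1": rouge_n(candidate, reference, 1),
    "rouge-2": rouge_n(candidate, reference, 2),
    "rouge-l": rouge_l(candidate, reference),
}
```

For Chinese text, splitting into characters (as in the toy example) or into segmented words gives different scores, so the tokenization should be fixed before comparing models.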

[Conclusions] The pipeline, designed around the characteristics of the book corpus, generates promotional blurbs of practical value.

Keywords: text summarization, prompt learning, data augmentation, TextRank, T5 PEGASUS


Publication date: 2022-07-29

ZTFLH (Chinese Library Classification): TP393, G250

Cite this article:

Li Daifeng, Lin Kaixin, Li Xuting. A Book Promotion Abstractive Summarization Method Based on Prompt Learning and T5 PEGASUS [J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 121-130. doi:10.11925/infotech.2096-3467.2022-0350.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022-0350 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1

