Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (2): 17-32     https://doi.org/10.11925/infotech.2096-3467.2022.1245
An Overview of Research on Multi-Document Summarization
Bao Ritong1,Sun Haichun1,2()
1School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
2Key Laboratory of Security Technology & Risk Assessment, People's Public Security University of China, Beijing 100026, China
Abstract

[Objective] This paper reviews the literature on multi-document summarization, examining its research frameworks and mainstream models. [Coverage] We searched the AI Open Index, Papers with Code, and CNKI databases with the queries "multi-document summarization" and "多文档摘要", retrieving 76 representative articles. [Methods] We summarize the mainstream research frameworks of multi-document summarization, categorize the latest models and algorithms by their key techniques, and present prospects for future studies. [Results] We compare the strengths and weaknesses of the latest multi-document summarization models against traditional methods, and summarize high-quality multi-document summarization datasets and current evaluation metrics. [Limitations] We only discuss the evaluation results of some widely used models on datasets such as Multi-News; a comparison of all models on the same dataset is lacking. [Conclusions] Many challenges remain in multi-document summarization, including the low factual accuracy of generated summaries and the poor generality of summarization models.

Key words: Multi-Document Summarization    Text Summarization    Content Selection    Transformer Model    Pre-Training Model
Received: 2022-11-22      Published: 2023-04-11
CLC Number: TP311; G350
Funding: Technology Research Program of the Ministry of Public Security (2020JSYJC22); Beijing Natural Science Foundation (4184099)
Corresponding author: Sun Haichun, E-mail: sunhaichun@ppsuc.edu.cn.
Cite this article:
Bao Ritong, Sun Haichun. An Overview of Research on Multi-Document Summarization. Data Analysis and Knowledge Discovery, 2024, 8(2): 17-32.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1245      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I2/17
| | Common Datasets | Since | Application Areas | Technical Challenges |
|---|---|---|---|---|
| Single-document summarization | Gigaword, LCSTS[14], CNN/DM, Newsroom[15], Xsum[16], PubMed, Rotten Tomatoes | 1958 | Automatic abstract generation for domain papers; automatic document indexing; news headline generation | (1) Models are mostly general-purpose, but summaries lack detail in specialized scenarios; (2) generated summaries may contain factual errors |
| Multi-document summarization | DUC/TAC, WikiSum, ScisummNet[18], Multi-News | 1980s | Summarizing news, product reviews, and other kinds of document collections; intelligent news robots[17]; predicting the effectiveness of clinical trial interventions[19] | (1) Modeling relations between sentences across documents; (2) generated summaries are highly redundant, miss detailed information, and may contain factual errors |

Table 1  Relations and differences between single-document and multi-document summarization
Fig.1  General framework of multi-document summarization
Fig.2  Pipeline of traditional graph-based multi-document summarization methods
Fig.3  Multi-document summarization based on graph neural networks
| Model | Nodes | Edges | Edge Weights | Method |
|---|---|---|---|---|
| SemSentSum | Sentences | Sentence-sentence | Cosine similarity (with edge pruning) | Graph convolutional network |
| SummPip | Sentences | Sentence-sentence | Approximate discourse graph | Unsupervised sentence compression |
| HeterSumGraph | Words, sentences, documents | Word-sentence; word-document | TF-IDF | Graph attention network |
| EMSum | Paragraphs, entities | Paragraph-entity | TF-IDF | Graph attention network |
| GraphSum | Paragraphs | Paragraph-paragraph | TF-IDF similarity graph; LDA topic graph; approximate discourse graph | Hierarchical graph attention network |
| SgSum | Sentences | Sentence-sentence | TF-IDF similarity graph; LDA topic graph; approximate discourse graph | Hierarchical graph attention network |

Table 2  Graph-based multi-document summarization models
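As a minimal illustration of the graph construction these models share — not any single model's actual method — the sketch below builds a TF-IDF sentence-similarity graph over the pooled sentences of a document cluster and extracts the most central sentences. The threshold, tokenization, and centrality measure are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Bag-of-words TF-IDF vectors, treating each sentence as a document."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per word
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def graph_extract(sentences, k=2, threshold=0.1):
    """Rank sentences by degree centrality in the similarity graph
    (edges where cosine similarity exceeds the threshold) and return
    the top-k, in original order, as the extractive summary."""
    vecs = tfidf_vectors(sentences)
    degree = [
        sum(1 for j, v in enumerate(vecs) if i != j and cosine(u, v) > threshold)
        for i, u in enumerate(vecs)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: -degree[i])
    return [sentences[i] for i in sorted(ranked[:k])]
```

Neural variants in Table 2 replace the fixed TF-IDF/cosine weights with learned edge weights and the centrality ranking with message passing, but the node/edge construction step is analogous.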
Fig.4  Illustration of different attention mechanisms[38]
| Model | Transformer Structure | Attention Mechanism | Text Unit |
|---|---|---|---|
| T-DMCA | Decoder-only Transformer | Self-attention; memory-compressed attention | Words |
| Hierarchical Transformer | Hierarchical Transformer | Inter-paragraph attention; global attention | Words, paragraphs |
| CopyTransformer+DPPs | Vanilla Transformer | Attention weights assigned via determinantal point processes | Words, sentences |
| Highlight-Transformer | Vanilla Transformer | Single-head and multi-head highlight attention | Key phrases |
| ParsingSum-Transformer | Hierarchical and vanilla Transformer | Dependency-guided multi-head attention | Words, sentences |

Table 3  Transformer-based multi-document summarization models
Fig.5  Architecture of the pointer-generator network[58]
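The pointer-generator network[58] shown in Fig.5 mixes the decoder's vocabulary distribution with a copy distribution over source tokens: P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{i: w_i = w} a_i. A toy numeric sketch of just this mixing step (all probabilities below are made-up inputs, not real model outputs):

```python
def pointer_generator_dist(p_gen, vocab_dist, attention, source_tokens):
    """Final distribution of a pointer-generator network:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum(a_i for i where w_i == w).
    Copying lets the decoder emit out-of-vocabulary source words."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for a_i, tok in zip(attention, source_tokens):
        # Attention mass on every occurrence of tok is routed to P(tok).
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a_i
    return final

# "transformer" is out of vocabulary, yet still receives probability mass
# because attention on its source position is copied into the output.
dist = pointer_generator_dist(
    p_gen=0.8,
    vocab_dist={"the": 0.5, "summary": 0.5},  # toy decoder softmax
    attention=[0.7, 0.3],                     # toy attention over the source
    source_tokens=["transformer", "summary"],
)
```

Because both input distributions sum to 1 and p_gen convexly combines them, the output is again a valid probability distribution.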
| Method | Characteristics | Applicable Scenario | Test Set | R-1 (%) | R-2 (%) | R-L (%) | Backbone |
|---|---|---|---|---|---|---|---|
| ProCluster | Groups sub-sentence propositions for more precise information alignment | Extractive multi-document summarization | TAC 2011 | 40.98 | 12.40 | 15.77 | OpenIE extractor; SuperPAL classifier; BART |
| | | | DUC 2004 | 38.73 | 9.64 | 13.89 | |
| CopyTransformer[75] | Uses an efficient content selector to determine which source phrases belong in the summary | Overly long inputs; inefficient content selection | CNN-DM | 41.22 | 18.68 | 38.34 | Bottom-up attention |
| HDSG | Connects multiple documents via a heterogeneous graph with nodes of different granularities, but no edges between same-granularity nodes | Error propagation from graphs built with external rules | Multi-News | 46.05 | 16.35 | 42.08 | Graph attention network |
| SgSum | Casts summarization as sub-graph selection, modeling and choosing the best sub-graph as the summary | Low coherence and conciseness of summaries | Multi-News | 47.53 | 18.75 | 43.31 | Hierarchical graph attention network |
| | | | DUC 2004 | 39.41 | 10.42 | 35.41 | |
| ParsingSum | Uses dependency parsing to capture cross-position dependencies and syntactic structure | Sentences carry linguistic knowledge that strongly affects summary quality | Multi-News | 44.32 | 15.35 | 20.72 | Multi-head attention |
| Highlight | Highlight mechanism assigns larger weights to tokens in key phrases | Generating summaries with salient details from multiple input documents | Multi-News | 44.62 | 15.57 | 18.06 | Single- and multi-head highlight attention |
| PRIMERA | Masks salient sentences and trains the model to reconstruct them from context | Low-resource domains | Multi-News | 42.00 | 13.60 | 20.80 | Local and global attention |
| | | | DUC 2004 | 35.10 | 7.20 | 17.90 | |
| | | | WikiSum | 28.00 | 8.00 | 18.00 | |
| BART-Long-Graph | Integrates the Longformer attention mechanism into BART, exploring local attention windows of various sizes | Overly long input sequences | Multi-News | 49.15 | 19.50 | 24.47 | Contextual local and global attention |
| | | | DUC-2004 | 34.72 | 7.97 | 11.04 | |
| PoBRL (λ = λ_advt) | Solves sub-problems separately via reinforcement learning, then blends the learned policies into a concise, complete summary | Optimizing generated summaries from multiple angles | Multi-News | 46.51 | 17.33 | 42.42 | Reinforcement learning |
| | | | DUC-2004 | 38.67 | 10.23 | 13.19 | |
| BART-LED+ROUGE-L+Coverage(β=1.0)+RELAX | Uses a reward that balances the summary's ROUGE score against document coverage | Maximum-likelihood objectives fail to meet summary-quality requirements | Multi-News | 47.23 | 18.86 | 25.03 | Transformer; reinforcement learning |

Table 4  Experimental comparison of mainstream multi-document summarization methods
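The R-1/R-2/R-L columns above are ROUGE scores[70]: ROUGE-N measures clipped n-gram overlap with the reference summary, while ROUGE-L uses the longest common subsequence instead. A minimal recall-only sketch of ROUGE-N (whitespace tokenization is a simplifying assumption; published scores typically also report precision and F1):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap between candidate and
    reference, divided by the number of n-grams in the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    # "Clipped": each candidate n-gram counts at most as often as in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return overlap / sum(ref.values())
```

Because it rewards surface n-gram overlap rather than meaning, ROUGE cannot detect the factual errors discussed above, which motivates the newer evaluation metrics surveyed in this paper.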
[1] Luhn H P. The Automatic Creation of Literature Abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
doi: 10.1147/rd.22.0159
[2] Nallapati R, Zhou B W, dos Santos C, et al. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond[C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg, PA, USA: Association for Computational Linguistics, 2016: 280-290.
[3] Zopf M. Auto-HMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus[C]// Proceedings of the 11th International Conference on Language Resources and Evaluation. European Language Resources Association, 2018.
[4] Liu P J, Saleh M, Pot E, et al. Generating Wikipedia by Summarizing Long Sequences[OL]. arXiv Preprint, arXiv: 1801.10198.
[5] Fabbri A, Li I, She T W, et al. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 1074-1084.
[6] Gu J T, Lu Z D, Li H, et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 1631-1640.
[7] Tu Z P, Lu Z D, Liu Y, et al. Modeling Coverage for Neural Machine Translation[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 76-85.
[8] Yasunaga M, Zhang R, Meelu K, et al. Graph-Based Neural Multi-Document Summarization[C]// Proceedings of the 21st Conference on Computational Natural Language Learning. 2017: 452-462.
[9] Paulus R, Xiong C M, Socher R. A Deep Reinforced Model for Abstractive Summarization[OL]. arXiv Preprint, arXiv: 1705.04304.
[10] Cho S, Lebanoff L, Foroosh H, et al. Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 1027-1038.
[11] Huang Wenbin, Ni Shaokang. Study of the Development of Multi-Document Automatic Summarization[J]. Information Science, 2017, 35(4): 160-165. (in Chinese)
[12] Li Jinpeng, Zhang Chuang, Chen Xiaojun, et al. Survey on Automatic Text Summarization[J]. Journal of Computer Research and Development, 2021, 58(1): 1-21. (in Chinese)
[13] Jalil Z, Nasir J A, Nasir M. Extractive Multi-Document Summarization: A Review of Progress in the Last Decade[J]. IEEE Access, 2021, 9: 130928-130946.
doi: 10.1109/ACCESS.2021.3112496
[14] Hu B T, Chen Q C, Zhu F Z. LCSTS: A Large Scale Chinese Short Text Summarization Dataset[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2015: 1967-1972.
[15] Grusky M, Naaman M, Artzi Y. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2018: 708-719.
[16] Narayan S, Cohen S B, Lapata M. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018: 1797-1807.
[17] Xu R X, Cao J, Wang M X, et al. Xiaomingbot: A Multilingual Robot News Reporter[OL]. arXiv Preprint, arXiv: 2007.08005.
[18] Yasunaga M, Kasai J, Zhang R, et al. ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 7386-7393.
doi: 10.1609/aaai.v33i01.33017386
[19] Katsimpras G, Paliouras G. Predicting Intervention Approval in Clinical Trials Through Multi-Document Summarization[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 1947-1957.
[20] Zeng Zhaolin, Yan Xin, Xu Guangyi, et al. Query-Oriented News Multi-Document Extractive Summarization Method Based on Hierarchical BiGRU+Attention[J]. Journal of Chinese Computer Systems, 2023, 44(1): 185-192. (in Chinese)
[21] Zhao C, Huang T H, Chowdhury S B R, et al. Read Top News First: A Document Reordering Approach for Multi-Document News Summarization[OL]. arXiv Preprint, arXiv: 2203.10254.
[22] Jin H Q, Wang T M, Wan X J. Multi-Granularity Interaction Network for Extractive and Abstractive Multi-Document Summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 6244-6254.
[23] Antognini D, Faltings B. Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization[OL]. arXiv Preprint, arXiv: 1909.12231.
[24] Zhao J M, Liu M, Gao L X, et al. SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2020: 1949-1952.
[25] Wang D Q, Liu P F, Zheng Y N, et al. Heterogeneous Graph Neural Networks for Extractive Document Summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 6209-6219.
[26] Zhou H, Ren W D, Liu G S, et al. Entity-Aware Abstractive Multi-Document Summarization[C]// Proceedings of the 2021 International Joint Conference on Natural Language Processing. 2021: 351-362.
[27] Cui P, Hu L. Topic-Guided Abstractive Multi-Document Summarization[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021: 1463-1472.
[28] Li W, Xiao X Y, Liu J C, et al. Leveraging Graph to Improve Abstractive Multi-Document Summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 6232-6243.
[29] Hickmann M L, Wurzberger F, Hoxhalli M, et al. Analysis of GraphSum's Attention Weights to Improve the Explainability of Multi-Document Summarization[C]// Proceedings of the 23rd International Conference on Information Integration and Web Intelligence. ACM, 2021: 359-366.
[30] Chen M Y, Li W, Liu J C, et al. SgSum: Transforming Multi-Document Summarization into Sub-Graph Selection[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021: 4063-4074.
[31] Nayeem M T, Fuad T A, Chali Y. Abstractive Unsupervised Multi-Document Summarization Using Paraphrastic Sentence Fusion[C]// Proceedings of the 27th International Conference on Computational Linguistics. 2018: 1191-1204.
[32] Alambo A, Lohstroh C, Madaus E, et al. Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles[C]// Proceedings of the 2020 IEEE International Conference on Big Data. IEEE, 2020: 591-596.
[33] Saeed M Y, Awais M, Talib R, et al. Unstructured Text Documents Summarization with Multi-Stage Clustering[J]. IEEE Access, 2020, 8: 212838-212854.
doi: 10.1109/Access.6287639
[34] Ernst O, Caciularu A, Shapira O, et al. A Proposition-Level Clustering Approach for Multi-Document Summarization[C]// Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2021: 1765-1779.
[35] Alqaisi R, Ghanem W, Qaroush A. Extractive Multi-Document Arabic Text Summarization Using Evolutionary Multi-Objective Optimization with K-Medoid Clustering[J]. IEEE Access, 2020, 8: 228206-228224.
doi: 10.1109/ACCESS.2020.3046494
[36] Coavoux M, Elsahar H, Gallé M. Unsupervised Aspect-Based Multi-Document Abstractive Summarization[C]// Proceedings of the 2nd Workshop on New Frontiers in Summarization. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019: 42-47.
[37] Brazinskas A, Lapata M, Titov I. Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2019: 5151-5169.
[38] Pasunuru R, Liu M W, Bansal M, et al. Efficiently Summarizing Text and Graph Encodings of Multi-Document Clusters[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2021: 4768-4779.
[39] Liu Y, Lapata M. Hierarchical Transformers for Multi-Document Summarization[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 5050-5081.
[40] Perez-Beltrachini L, Lapata M. Multi-Document Summarization with Determinantal Point Process Attention[J]. Journal of Artificial Intelligence Research, 2021, 71: 371-399.
doi: 10.1613/jair.1.12522
[41] Liu S Q, Cao J N, Yang R S, et al. Highlight-Transformer: Leveraging Key Phrase Aware Attention to Improve Abstractive Multi-Document Summarization[C]// Proceedings of the 2021 International Joint Conference on Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021: 5021-5027.
[42] Ma C B, Zhang W E, Wang H, et al. Incorporating Linguistic Knowledge for Abstractive Multi-Document Summarization[C]// Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation. 2022: 147-156.
[43] Kim S. Using Pre-Trained Transformer for Better Lay Summarization[C]// Proceedings of the 1st Workshop on Scholarly Document Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020: 328-335.
[44] Zou Y Y, Zhang X X, Lu W, et al. Pre-Training for Abstractive Document Summarization by Reinstating Source Text[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020: 3646-3660.
[45] Aghajanyan A, Okhonko D, Lewis M, et al. HTLM: Hyper-Text Pre-Training and Prompting of Language Model[OL]. arXiv Preprint, arXiv: 2107.06955.
[46] Goodwin T, Savery M, Demner-Fushman D. Flight of the PEGASUS? Comparing Transformers on Few-Shot and Zero-Shot Multi-Document Abstractive Summarization[C]// Proceedings of the 28th International Conference on Computational Linguistics. 2020: 5640-5646.
[47] Lewis M, Liu Y H, Goyal N, et al. BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 7871-7880.
[48] Raffel C, Shazeer N M, Roberts A, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer[J]. Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
[49] Zhang J Q, Zhao Y, Saleh M, et al. PEGASUS: Pre-Training with Extracted Gap-Sentences for Abstractive Summarization[C]// Proceedings of the 37th International Conference on Machine Learning. 2020: 11328-11339.
[50] Beltagy I, Peters M E, Cohan A. Longformer: The Long-Document Transformer[OL]. arXiv Preprint, arXiv: 2004.05150.
[51] Moro G, Ragazzi L, Valgimigli L, et al. Discriminative Marginalized Probabilistic Neural Method for Multi-Document Summarization of Medical Literature[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 180-189.
[52] Zaheer M, Guruganesh G, Dubey A, et al. Big Bird: Transformers for Longer Sequences[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. ACM, 2020: 17283-17297.
[53] Guo M, Ainslie J, Uthus D, et al. LongT5: Efficient Text-to-Text Transformer for Long Sequences[C]// Proceedings of the 2022 North American Chapter of the Association for Computational Linguistics. 2022: 724-736.
[54] Xiao W, Beltagy I, Carenini G, et al. PRIMERA: Pyramid-Based Masked Sentence Pre-Training for Multi-Document Summarization[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 5245-5263.
[55] Puduppully R, Steedman M. Multi-Document Summarization with Centroid-Based Pretraining[OL]. arXiv Preprint, arXiv: 2208.01006.
[56] Goldstein J, Carbonell J. Summarization: (1) Using MMR for Diversity - Based Reranking and (2) Evaluating Summaries[C]// TIPSTER'98:Proceedings of a Workshop on Held at Baltimore, Maryland. 1998.
[57] Akhtar N, Beg M M S, Hussain M M, et al. Extractive Multi-Document Summarization Using Relative Redundancy and Coherence Scores[J]. Journal of Intelligent & Fuzzy Systems, 2020, 38(5): 6201-6210.
[58] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-Generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1073-1083.
[59] Lebanoff L, Song K Q, Liu F. Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018: 4131-4141.
[60] Chen Z J, Xu J, Liao M, et al. Two-Phase Multi-Document Event Summarization on Core Event Graphs[J]. Journal of Artificial Intelligence Research, 2022, 74: 1037-1057.
doi: 10.1613/jair.1.13267
[61] Su A, Su D F, Mulvey J M, et al. PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies[OL]. arXiv Preprint, arXiv: 2105.08244.
[62] Song Y Z, Chen Y S, Shuai H H. Improving Multi-Document Summarization Through Referenced Flexible Extraction with Credit-Awareness[C]// Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. 2022:1667-1681.
[63] Parnell J, Unanue I J, Piccardi M. A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 5112-5128.
[64] Lu Y, Dong Y, Charlin L. Multi-XScience: A Large-Scale Dataset for Extreme Multi-Document Summarization of Scientific Articles[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020: 8068-8074.
[65] Ghalandari D G, Hokamp C, Pham N T, et al. A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1302-1308.
[66] Boni O, Feigenblat G, Lev G, et al. HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles[OL]. arXiv Preprint, arXiv: 2110.03179.
[67] Xu Y M, Lapata M. Coarse-to-Fine Query Focused Multi-Document Summarization[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020: 3632-3645.
[68] Xu Y M, Lapata M. Query Focused Multi-Document Summarization with Distant Supervision[OL]. arXiv Preprint, arXiv: 2004.03027.
[69] Nenkova A, Passonneau R, McKeown K. The Pyramid Method: Incorporating Human Content Selection Variation in Summarization Evaluation[J]. ACM Transactions on Speech and Language Processing, 2007, 4(2): Article No.4.
[70] Lin C Y, Hovy E. Automatic Evaluation of Summaries Using N-Gram Co-Occurrence Statistics[C]// Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. ACM, 2003: 71-78.
[71] Lin C Y, Hovy E. The Automated Acquisition of Topic Signatures for Text Summarization[C]// Proceedings of the 18th Conference on Computational Linguistics. ACM, 2000: 495-501.
[72] Papineni K, Roukos S, Ward T, et al. BLEU: A Method for Automatic Evaluation of Machine Translation[C]// Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. ACM, 2002: 311-318.
[73] Gao Y, Zhao W, Eger S. SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1347-1354.
[74] Wolhandler R, Cattan A, Ernst O, et al. How “Multi” is Multi-Document Summarization?[C]// Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022: 5761-5769.
[75] Gehrmann S, Deng Y T, Rush A. Bottom-up Abstractive Summarization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018: 4098-4109.
[76] Liang Mengying, Li Deyu, Wang Suge, et al. Senti-PG-MMR: Research on Generation Method of Sentimental Summary of Multi-Document Travel Notes[J]. Journal of Chinese Information Processing, 2022, 36(3): 128-135. (in Chinese)