Generating Chinese Abstracts with Content and Image Features*

doi:10.11925/infotech.2096-3467.2022.1303

Data Analysis and Knowledge Discovery

2024, Vol. 8, Issue (3): 110-119 https://doi.org/10.11925/infotech.2096-3467.2022.1303

Research Paper


Quan Ankun¹,², Li Honglian¹, Zhang Le², Lyu Xueqiang²

¹School of Information & Communication Engineering, Beijing Information Science and Technology University, Beijing 100101, China
²Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China


Abstract

[Objective] Chinese abstracts generated from text features alone are often of limited quality. This paper proposes a Chinese abstract generation method that fuses content and image features. [Methods] We use BERT to extract text features and ResNet to extract image features, which complement and validate the text features. An attention mechanism fuses the features of the two modalities, and the fused features are fed into a pointer-generator network to generate higher-quality Chinese abstracts. [Results] Compared with generating Chinese abstracts from the text modality alone, the proposed method improves ROUGE-1, ROUGE-2, and ROUGE-L by 1.9, 1.3, and 1.4 percentage points, respectively. [Limitations] The experimental data come mainly from the news domain, and the method's effectiveness in other domains remains to be verified. [Conclusions] Adding image information lets the fused features retain more important information and helps the model locate key content, making the generated abstracts better summaries of the source and more readable.

Keywords: Feature Fusion, BERT, ResNet, Attention Mechanism, Abstract Generation
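The abstract says that an attention mechanism fuses the text and image features, but this page does not give the exact formulation. The following is a minimal sketch under stated assumptions: scaled dot-product cross-attention in which each text vector attends over the image vectors, followed by a residual sum. The function names, the toy vectors, and the choice of residual fusion are all hypothetical. The plain lists stand in for BERT token features and ResNet region features already projected to a common dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_text_with_image(text_feats, image_feats):
    """Cross-attention fusion (assumed form, not the paper's confirmed one).

    text_feats:  list of T vectors (stand-in for BERT token features)
    image_feats: list of R vectors (stand-in for ResNet region features)
    Both must share the same dimension d.
    Returns the T fused vectors and the T x R attention weights.
    """
    d = len(text_feats[0])
    fused, weights = [], []
    for q in text_feats:
        # Scaled dot-product score between this token and every image region.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_feats]
        alpha = softmax(scores)
        # Attention-weighted image context for this token.
        context = [sum(a * k[j] for a, k in zip(alpha, image_feats))
                   for j in range(d)]
        # Residual fusion: keep the text feature and add the visual context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(alpha)
    return fused, weights


# Toy inputs: 2 text tokens, 3 image regions, dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, attn = fuse_text_with_image(text, image)
```

Each row of `attn` sums to 1, and the first token puts the most weight on the image region it is most similar to. In a real system, the fused vectors would then be passed to the decoder.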

Received: 2022-12-07 Published: 2023-05-16

ZTFLH (Chinese Library Classification): TP391

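Funding: * National Natural Science Foundation of China (62171043); Beijing Information Science and Technology University "Qinxin Talents" Cultivation Program (QXTCP B201908)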

Corresponding author: Li Honglian, ORCID: 0000-0002-0531-3650, E-mail: lihonglian@bistu.edu.cn.

Cite this article:

Quan Ankun, Li Honglian, Zhang Le, Lyu Xueqiang. Generating Chinese Abstracts with Content and Image Features[J]. Data Analysis and Knowledge Discovery, 2024, 8(3): 110-119.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1303 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I3/110

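Fig.1 Framework for Chinese abstract generation fusing content and image features

Table 1 ResNet module parameter settings

Fig.2 Attention mechanism model

Fig.3 Pointer-generator network

Table 2 Parameter settings

Table 3 Experimental results

Table 4 Comparison of example abstracts

Fig.4 Attention visualization of an image and its corresponding text

Table 5 Ablation study results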
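The pointer-generator network listed as Fig.3 is the final stage of the method, but its equations are not reproduced on this page. The sketch below assumes the standard formulation of See et al. (2017), in which the final word distribution mixes the decoder's vocabulary distribution with a copy distribution given by the attention weights over the source tokens: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention on source positions holding w). The function name and the toy values are hypothetical, and how the paper computes `p_gen` from the fused features is not shown here.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy distributions (See et al., 2017 form).

    p_gen:         probability of generating from the vocabulary, in [0, 1]
    vocab_dist:    dict mapping token -> probability from the decoder softmax
    attention:     attention weights over source positions (sums to 1)
    source_tokens: source tokens aligned with the attention weights
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention for every source position,
    # accumulating when the same token appears more than once.
    for tok, a in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final


# Toy example: "Beijing" is out of vocabulary but appears twice in the source.
vocab_dist = {"the": 0.5, "news": 0.3, "summary": 0.2}
source_tokens = ["Beijing", "news", "Beijing"]
attention = [0.5, 0.2, 0.3]
dist = final_distribution(0.6, vocab_dist, attention, source_tokens)
```

In this toy case the out-of-vocabulary token receives probability only through the copy term, which is how a pointer-generator model can reproduce names and rare words from the source text.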

