Identifying Structural Elements of Scientific Literature Abstracts Based on ERNIE and DPCNN*

doi:10.11925/infotech.2096-3467.2022.1359

Data Analysis and Knowledge Discovery

2024, Vol. 8, Issue (1): 125-144 https://doi.org/10.11925/infotech.2096-3467.2022.1359

Research Paper



Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN

Hu Zhongyi¹,², Shui Diancheng¹, Wu Jiang¹,²

¹School of Information Management, Wuhan University, Wuhan 430072, China
²The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China


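Abstract

[Objective] To build an efficient model for identifying the structural elements of scientific literature abstracts, enabling structural element identification in single-paragraph abstracts. [Methods] The abstract text of scientific literature is represented with the knowledge-enhanced semantic representation model ERNIE, text features are extracted with a Deep Pyramid Convolutional Neural Network (DPCNN), and a structural element identification model for scientific literature abstracts is built on this basis. [Results] On a library and information science dataset, the macro-averaged precision, recall, and $F_1$ of the model in identifying abstract structural elements all exceed 0.95, outperforming the baseline models. [Limitations] The corpus used leans toward a particular domain, and the model's generality across domains remains to be verified. [Conclusions] The model extracts text features more effectively and improves the identification of structural elements in scientific literature abstracts.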
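The method pairs ERNIE representations with DPCNN feature extraction. As a rough illustration of why a deep pyramid helps, the sketch below tracks how sequence length and receptive field change through DPCNN-style blocks. It assumes the layout commonly described for DPCNN (two width-3 convolutions per block, with a width-3, stride-2 pooling step before every block except the first). The function name `dpcnn_pyramid`, the input length, and the block count are hypothetical and are not taken from the paper's configuration.

```python
def dpcnn_pyramid(seq_len, num_blocks, kernel=3):
    """Trace (sequence_length, receptive_field) after each DPCNN-style block.

    Assumed layout (illustrative, not the paper's exact configuration):
    every block has two stride-1 convolutions of width `kernel` with
    length-preserving padding; every block after the first begins with a
    width-3, stride-2 max pooling that halves the sequence length.
    """
    length = seq_len
    rf = 1      # receptive field, measured in input positions
    jump = 1    # spacing, in input positions, between adjacent outputs
    trace = []
    for block in range(num_blocks):
        if block > 0 and length > 1:
            # Pooling with window 3 and stride 2: widens the field and
            # doubles the spacing between neighbouring outputs.
            rf += (3 - 1) * jump
            jump *= 2
            length = max(1, length // 2)
        for _ in range(2):
            # A stride-1 convolution keeps the length and widens the field.
            rf += (kernel - 1) * jump
        trace.append((length, rf))
    return trace
```

Under these assumptions, calling `dpcnn_pyramid(128, 4)` shows the length halving at each pooled block while the receptive field grows faster with every block, so a few blocks are enough to cover a whole sentence at shrinking computational cost per block.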



Abstract:

[Objective] This paper proposes an effective model for automatically identifying the structural elements of unstructured abstracts in academic literature. [Methods] First, we used the ERNIE model to represent the abstract text. Then, we used DPCNN to extract text features. Finally, we built the identification model on these features. [Results] We evaluated the proposed model on a library and information science dataset. Its precision, recall, and F1-score were all above 0.95, outperforming the benchmark models. [Limitations] Because the corpus comes from a specific domain, further research is needed to assess the model’s performance in other fields. [Conclusions] The proposed model extracts text features more effectively, improving the identification of structural elements in unstructured abstracts.
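The results are stated as precision, recall, and F1 over the structural-element classes. A minimal sketch of how such scores could be computed is given below, assuming macro averaging (each class weighted equally, with macro F1 taken as the mean of per-class F1). The function name `macro_prf` and the toy labels are hypothetical and do not come from the paper's data.

```python
def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) for two label lists.

    Each class contributes equally regardless of how many sentences it has.
    Macro F1 is the unweighted mean of per-class F1 scores (an assumed
    convention; other definitions exist).
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0, 0.0
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with hypothetical sentence-level labels.
gold = ["objective", "methods", "methods", "results", "results", "conclusions"]
pred = ["objective", "methods", "results", "results", "results", "conclusions"]
precision, recall, f1 = macro_prf(gold, pred)
```

Because every class counts equally, a rare class that is often misclassified pulls the macro scores down more than it would pull down overall accuracy.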

Key words: Structural Element Identification of Abstracts, Text Representation, ERNIE, DPCNN

Received: 2022-12-29 Published: 2023-05-16

ZTFLH: TP391; G350

Funding: *Major Project of Philosophy and Social Sciences Research, Ministry of Education of China (20JZD024)

Corresponding author: Hu Zhongyi, ORCID: 0000-0002-1113-0199, E-mail: zhongyi.hu@whu.edu.cn.

Cite this article:

Hu Zhongyi, Shui Diancheng, Wu Jiang. Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 125-144.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1359 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I1/125

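Fig.1 Structure of the ERNIE-DPCNN model

Fig.2 Enlarging the receptive field by deepening the network

Table 1 Source journals of the dataset and their time ranges

Table 2 Structural categories and their corresponding marker words

Fig.3 Statistics of sentence lengths in abstracts

Table 3 Comparison of the effectiveness of language-model word embeddings

Table 4 Performance comparison of different classifiers

Table 5 Confusion matrix of the ERNIE-DPCNN model

Table 6 Examples of misclassified samples

Fig.4 Loss of the model over the first 30 training epochs

Fig.5 Accuracy of the model over the first 30 training epochs

Table 7 Performance comparison of the model before and after fine-tuning

Fig.6 Performance comparison of the model before and after fine-tuning

Table 8 Model performance comparison on datasets of different sizes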

