Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (1): 125-144    DOI: 10.11925/infotech.2096-3467.2022.1359
Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN
Hu Zhongyi1,2, Shui Diancheng1, Wu Jiang1,2
1School of Information Management, Wuhan University, Wuhan 430072, China
2The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper proposes an effective model for automatically identifying the structural elements of unstructured abstracts in academic literature. [Methods] First, we used the ERNIE model to represent the abstracts. Then, we used a DPCNN to extract semantic features. Finally, we built the identification model on these features. [Results] We evaluated the proposed model on a library and information science dataset. Its macro-averaged precision, recall, and F1-score were all above 0.95, and its F1-score exceeded those of the benchmark models. [Limitations] Since the corpus used in this study comes from a single domain, more research is needed to assess the model's performance in other fields. [Conclusions] The proposed model represents abstracts more comprehensively and improves the identification of structural elements in unstructured abstracts.

Key words: Structural Element Identification of Abstracts; Text Representation; ERNIE; DPCNN
Received: 29 December 2022      Published: 16 May 2023
CLC Number: TP391; G350
Fund: Major Project of Philosophy and Social Science Research of the Ministry of Education (20JZD024)
Corresponding Author: Hu Zhongyi, ORCID: 0000-0002-1113-0199, E-mail: zhongyi.hu@whu.edu.cn

Cite this article:

Hu Zhongyi, Shui Diancheng, Wu Jiang. Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN. Data Analysis and Knowledge Discovery, 2024, 8(1): 125-144.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1359     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/125

ERNIE-DPCNN Model
Deepening the Network to Expand Receptive Field
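The caption above names DPCNN's central idea: each block downsamples the sequence with stride-2 pooling, so the convolutions that follow cover roughly twice as much text. The sketch below tracks sequence length and receptive field (in tokens) through such a pyramid. It assumes the building blocks Johnson and Zhang describe for DPCNN (kernel-3 convolutions with same padding, max-pooling with size 3 and stride 2); the kernel-3 region embedding, the unpadded pooling, and the function name `dpcnn_pyramid` are illustrative assumptions, not the configuration reported in this paper.

```python
from typing import List, Tuple


def dpcnn_pyramid(seq_len: int) -> List[Tuple[int, int]]:
    """Return (sequence length, receptive field in tokens) after each stage.

    Assumed layout (illustrative, not this paper's exact configuration):
    stage 0 = region embedding (kernel 3) + two kernel-3 convolutions;
    every later stage = max-pooling (size 3, stride 2) + two kernel-3
    convolutions. Convolutions use 'same' padding; pooling uses none.
    """
    rf, jump = 1, 1  # receptive field; input-token distance between adjacent outputs
    for _ in range(3):  # region embedding + two convolutions, all stride 1
        rf += (3 - 1) * jump
    length = seq_len
    stages = [(length, rf)]
    while length >= 3:  # a size-3 pooling window needs at least 3 positions
        rf += (3 - 1) * jump  # the pooling window widens the field at the current stride
        jump *= 2  # stride 2: adjacent outputs are now twice as far apart
        length = (length - 3) // 2 + 1  # unpadded pooling output length
        for _ in range(2):  # two kernel-3 convolutions after pooling
            rf += (3 - 1) * jump
        stages.append((length, rf))
    return stages


for length, rf in dpcnn_pyramid(128):
    print(f"length={length:4d}  receptive field={rf:4d} tokens")
```

Under these assumptions a 128-token input shrinks to 63, 31, 15, ... positions while the receptive field grows 7, 17, 37, 77, ... tokens (each stage gives 2r + 3). With a fixed number of feature maps, each block therefore costs about half as much computation as the one before, which is what makes deepening the network affordable.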
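Journal | Time frame
Data Analysis and Knowledge Discovery (数据分析与知识发现), including New Technology of Library and Information Service (现代图书情报技术) | January 2014 to April 2022
Library and Information Service (图书情报工作) | January 2015 to April 2022
Information Science (情报科学) | January 2017 to April 2022
Documentation, Information & Knowledge (图书情报知识) | January 2019 to April 2022
Journal of Modern Information (现代情报) | January 2018 to April 2022
Information Studies: Theory & Application (情报理论与实践) | January 2017 to April 2022
Journal of Library and Information Science in Agriculture (农业图书情报) | January 2019 to April 2022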
Source of the Dataset and Its Time Frame
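Structural category | Marker words | Samples
Objective (目的) | 目的 (Objective), 目的/意义 (Objective/Significance), 研究目的 (Research objective), 意义/目的 (Significance/Objective) | 10,627
Methods (方法) | 方法/过程 (Methods/Process), 方法/内容 (Methods/Content), 方法过程 (Methods and process), 过程 (Process), 过程/方法 (Process/Methods), 研究方法 (Research methods), 研究设计/方法 (Research design/Methods) | 10,599
Conclusions (结论) | 成果/结论 (Findings/Conclusions), 价值/意义 (Value/Significance), 结果/结论 (Results/Conclusions), 结果 (Results), 结果/意义 (Results/Significance), 结果/总结 (Results/Summary), 结论 (Conclusions), 研究结论 (Research conclusions) | 11,478
Limitations (局限) | 局限 (Limitations), 创新/局限 (Innovation/Limitations), 局限/创新 (Limitations/Innovation), 局限/不足 (Limitations/Shortcomings) | 1,340
Innovation (创新) | 创新/价值 (Innovation/Value) | 312
Literature scope (文献范围) | 文献背景 (Literature background) | 56
Application background (应用背景) | 应用背景 (Application background) | 23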
Structural Categories and Corresponding Marked Words
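Because structured abstracts in these journals typically open each section with a bracketed marker such as [目的/意义], sentence labels can be derived from the marker words in the table. The sketch below shows one way to do this. It assumes either [] or 【】 brackets, splits sentences at 。！？, keeps only the four categories that appear in the result tables, and uses the hypothetical names `CATEGORY_MARKERS` and `label_sentences`; the authors' actual preprocessing may differ.

```python
import re
from typing import Dict, List, Tuple

# Marker words from the table, grouped by the four categories reported in the
# result tables. Rare categories (Innovation, Literature scope, Application
# background) are deliberately left out of this sketch.
CATEGORY_MARKERS: Dict[str, List[str]] = {
    "Objective": ["目的", "目的/意义", "研究目的", "意义/目的"],
    "Methods": ["方法/过程", "方法/内容", "方法过程", "过程", "过程/方法",
                "研究方法", "研究设计/方法"],
    "Conclusions": ["成果/结论", "价值/意义", "结果/结论", "结果", "结果/意义",
                    "结果/总结", "结论", "研究结论"],
    "Limitations": ["局限", "创新/局限", "局限/创新", "局限/不足"],
}
MARKERS = {word: label for label, words in CATEGORY_MARKERS.items() for word in words}

MARKER_RE = re.compile(r"[\[【]([^\]】]+)[\]】]")  # assumed bracket styles
SENTENCE_END_RE = re.compile(r"(?<=[。！？])")  # split after sentence-final marks


def label_sentences(abstract: str) -> List[Tuple[str, str]]:
    """Split a structured abstract into (sentence, category) pairs."""
    # With one capture group, re.split yields [prefix, marker, body, marker, body, ...];
    # any text before the first marker (parts[0]) is ignored.
    parts = MARKER_RE.split(abstract)
    pairs: List[Tuple[str, str]] = []
    for marker, body in zip(parts[1::2], parts[2::2]):
        label = MARKERS.get(marker.strip())
        if label is None:
            continue  # unknown or rare marker: skip this section
        for sentence in SENTENCE_END_RE.split(body):
            sentence = sentence.strip()
            if sentence:
                pairs.append((sentence, label))
    return pairs
```

Once the markers are stripped, each sentence becomes an unlabeled input, which mirrors the task of classifying sentences in unstructured abstracts.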
Statistics of the Length per Sentence
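Sentence-length statistics like these are commonly used to choose the encoder's maximum input length. The sketch below is a hypothetical helper, not the authors' procedure: it returns the smallest length that covers a chosen share of sentences. For BERT-style encoders such as ERNIE, two more positions are usually reserved for the [CLS] and [SEP] tokens.

```python
import math
from typing import Sequence


def length_covering(lengths: Sequence[int], coverage: float = 0.95) -> int:
    """Smallest L such that at least `coverage` of the sentences have length <= L."""
    if not lengths:
        raise ValueError("need at least one sentence length")
    ordered = sorted(lengths)
    k = math.ceil(coverage * len(ordered))  # number of sentences that must fit
    return ordered[max(k, 1) - 1]


# Hypothetical sentence lengths (characters), not the paper's statistics.
lengths = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
max_len = length_covering(lengths, 0.5) + 2  # +2 for [CLS] and [SEP]
print(max_len)
```

Sentences longer than the chosen length would then be truncated, so the coverage level trades padding cost against lost text.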
Model | Metric | Objective | Methods | Conclusions | Limitations | Macro average
Word2Vec-DPCNN | P | 0.81058 | 0.85471 | 0.80914 | 0.77941 | 0.81346
Word2Vec-DPCNN | R | 0.79081 | 0.89991 | 0.78659 | 0.79104 | 0.81709
Word2Vec-DPCNN | F1 | 0.80057 | 0.87672 | 0.79770 | 0.78519 | 0.81505
BERT-DPCNN | P | 0.83750 | 0.91754 | 0.87809 | 0.86525 | 0.87459
BERT-DPCNN | R | 0.87992 | 0.91407 | 0.83449 | 0.91045 | 0.88473
BERT-DPCNN | F1 | 0.85819 | 0.91580 | 0.85574 | 0.88727 | 0.87925
ERNIE-DPCNN | P | 0.91902 | 0.94331 | 0.96178 | 0.98496 | 0.95227
ERNIE-DPCNN | R | 0.94747 | 0.95845 | 0.92073 | 0.97761 | 0.95107
ERNIE-DPCNN | F1 | 0.93303 | 0.95082 | 0.94081 | 0.98127 | 0.95148
Comparison of the Effectiveness of Different Word Embeddings
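The macro-average column can be checked from the per-class values. The sketch below recomputes per-class F1 as the harmonic mean of P and R and then takes unweighted means over the four classes. For the ERNIE-DPCNN rows, the reported macro F1 matches the mean of the per-class F1 values rather than the F1 of macro P and macro R; the function names are illustrative.

```python
from statistics import mean
from typing import Dict, Tuple


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_scores(per_class: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Macro P, R and F1 as unweighted means over classes.

    Macro F1 here is the mean of per-class F1 values, not f1(macro P, macro R).
    """
    ps = [p for p, _ in per_class.values()]
    rs = [r for _, r in per_class.values()]
    f1s = [f1(p, r) for p, r in per_class.values()]
    return mean(ps), mean(rs), mean(f1s)


# Per-class (P, R) for ERNIE-DPCNN, copied from the table above.
ernie_dpcnn = {
    "Objective": (0.91902, 0.94747),
    "Methods": (0.94331, 0.95845),
    "Conclusions": (0.96178, 0.92073),
    "Limitations": (0.98496, 0.97761),
}
macro_p, macro_r, macro_f1 = macro_scores(ernie_dpcnn)
print(f"macro P={macro_p:.5f}  macro R={macro_r:.5f}  macro F1={macro_f1:.5f}")
```

The printed values should agree with the table's macro-average column to within rounding of the last digit. Because every class counts equally, the small Limitations class weighs as much as the three large classes in these averages.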
Model | Metric | Objective | Methods | Conclusions | Limitations | Macro average
ERNIE-FC | P | 0.91982 | 0.94254 | 0.94737 | 1.00000 | 0.95243
ERNIE-FC | R | 0.93621 | 0.96034 | 0.92509 | 0.91045 | 0.93302
ERNIE-FC | F1 | 0.92794 | 0.95136 | 0.93610 | 0.95312 | 0.94213
ERNIE-TextCNN | P | 0.90248 | 0.94539 | 0.96593 | 0.98473 | 0.94936
ERNIE-TextCNN | R | 0.95497 | 0.94806 | 0.91376 | 0.96269 | 0.94487
ERNIE-TextCNN | F1 | 0.92799 | 0.94672 | 0.93912 | 0.97358 | 0.94685
ERNIE-BiLSTM | P | 0.92237 | 0.94628 | 0.94924 | 0.99219 | 0.95252
ERNIE-BiLSTM | R | 0.94747 | 0.94806 | 0.92857 | 0.94776 | 0.94297
ERNIE-BiLSTM | F1 | 0.93475 | 0.94717 | 0.93879 | 0.96947 | 0.94755
ERNIE-DPCNN | P | 0.91902 | 0.94331 | 0.96178 | 0.98496 | 0.95227
ERNIE-DPCNN | R | 0.94747 | 0.95845 | 0.92073 | 0.97761 | 0.95107
ERNIE-DPCNN | F1 | 0.93303 | 0.95082 | 0.94081 | 0.98127 | 0.95148
Performance Comparison of Different Classifiers
Confusion Matrix of ERNIE-DPCNN
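Per-class precision and recall in the tables can be read off a confusion matrix: precision divides the diagonal count by the column total (everything predicted as that class), and recall divides it by the row total (everything truly in that class). The sketch below assumes rows are true classes and columns are predicted classes; check the figure's axis labels before reusing it. The counts are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def precision_recall(cm: List[List[int]]) -> List[Tuple[float, float]]:
    """Per-class (precision, recall) from a square confusion matrix.

    Assumed orientation: cm[true][predicted].
    """
    n = len(cm)
    out: List[Tuple[float, float]] = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])  # row total
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        out.append((p, r))
    return out


# Hypothetical 2-class counts for illustration only.
toy = [[8, 2],
       [1, 9]]
print(precision_recall(toy))
```

Off-diagonal cells also show which pairs of classes are confused, which is how misclassified samples like those below can be located.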
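No. | Sample | Predicted category | True category
1 | Space syntax theory introduces a new perspective and theoretical support for research on the architectural space of university libraries. From the perspective of spatial configuration, it provides an objective, quantitative, and graphical means of evaluating the accessibility performance of a library's interior space, and the related spatial optimization rules will also guide the future reconstruction and innovative design of university library spaces. | Objective | Conclusions
2 | Drawing on a review of domestic and international literature and on preliminary research, and based on college students' data literacy performance in the mobile Internet environment, this study constructs an evaluation index system for college students' data literacy competence and conducts an empirical analysis, thereby providing a comprehensive evaluation standard and tool for data literacy competence. | Methods | Objective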
Examples of Misclassified Samples
Loss Values for the First 30 Iterations
Accuracy for the First 30 Iterations
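Model | Metric | Objective | Methods | Conclusions | Limitations | Macro average
ERNIE-DPCNN without fine-tuning | P | 0.90454 | 0.92991 | 0.89215 | 0.97500 | 0.92540
ERNIE-DPCNN without fine-tuning | R | 0.89775 | 0.93957 | 0.90070 | 0.87313 | 0.90279
ERNIE-DPCNN without fine-tuning | F1 | 0.90113 | 0.93471 | 0.89640 | 0.92126 | 0.91338
ERNIE-DPCNN | P | 0.91902 | 0.94331 | 0.96178 | 0.98496 | 0.95227
ERNIE-DPCNN | R | 0.94747 | 0.95845 | 0.92073 | 0.97761 | 0.95107
ERNIE-DPCNN | F1 | 0.93303 | 0.95082 | 0.94081 | 0.98127 | 0.95148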
Performance of Models before and after Fine-Tuning
Performance before and after Fine-Tuning
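The per-class effect of fine-tuning is easier to see as differences. The short computation below subtracts the F1 values of the setting without fine-tuning from those of the fine-tuned model, using only numbers from the table above; the variable names are illustrative.

```python
# Per-class F1 without and with fine-tuning, copied from the table above.
f1_frozen = {"Objective": 0.90113, "Methods": 0.93471,
             "Conclusions": 0.89640, "Limitations": 0.92126}
f1_tuned = {"Objective": 0.93303, "Methods": 0.95082,
            "Conclusions": 0.94081, "Limitations": 0.98127}

# Absolute F1 gain per class, rounded to the table's five decimals.
gains = {c: round(f1_tuned[c] - f1_frozen[c], 5) for c in f1_tuned}
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)
# Gain in macro F1 (mean of per-class F1 values).
macro_gain = round(sum(f1_tuned.values()) / 4 - sum(f1_frozen.values()) / 4, 5)
print(gains, largest, smallest, macro_gain)
```

The largest gain falls on Limitations, the smallest of the four classes in the marker-word table, and the smallest on Methods; the macro F1 gain is about 0.038.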
Model | Metric | 5,438 samples (20%) | 13,596 (50%) | 21,738 (80%) | 27,235 (100%)
Word2Vec-DPCNN | P | 0.70812 | 0.78120 | 0.84122 | 0.81346
Word2Vec-DPCNN | R | 0.69341 | 0.75202 | 0.78427 | 0.81709
Word2Vec-DPCNN | F1 | 0.69912 | 0.76499 | 0.80725 | 0.81505
BERT-DPCNN | P | 0.80925 | 0.86382 | 0.87031 | 0.87459
BERT-DPCNN | R | 0.82276 | 0.82249 | 0.87639 | 0.88473
BERT-DPCNN | F1 | 0.81495 | 0.83970 | 0.87326 | 0.87925
ERNIE-DPCNN | P | 0.93994 | 0.94970 | 0.95204 | 0.95227
ERNIE-DPCNN | R | 0.92587 | 0.93666 | 0.94585 | 0.95107
ERNIE-DPCNN | F1 | 0.92716 | 0.94291 | 0.94872 | 0.95148
Model Performance Related to the Size of the Dataset
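A dataset-size experiment like this needs training subsets of increasing size. The sketch below is one possible setup, not the authors' documented procedure: it takes prefixes of a single seeded shuffle, so each smaller subset is contained in the larger ones and differences between columns reflect added data rather than a different draw. The function name `nested_subsets`, the seed, and the toy samples are assumptions, and sampling is not stratified by class.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def nested_subsets(data: Sequence[T], fractions: Sequence[float],
                   seed: int = 42) -> List[List[T]]:
    """Return one random subset per fraction, each contained in the next.

    All subsets are prefixes of one seeded shuffle (NOT stratified by class).
    """
    order = list(data)
    random.Random(seed).shuffle(order)  # fixed seed keeps the subsets reproducible
    return [order[:round(f * len(order))] for f in sorted(fractions)]


# Hypothetical toy samples standing in for labeled sentences.
samples = [f"sentence-{i}" for i in range(100)]
subsets = nested_subsets(samples, [0.2, 0.5, 0.8, 1.0])
print([len(s) for s in subsets])
```

If the minority classes are very small, as Limitations is here, a stratified variant that shuffles each class separately would keep class proportions stable across the subsets.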