Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN
Hu Zhongyi¹,², Shui Diancheng¹, Wu Jiang¹,²
¹ School of Information Management, Wuhan University, Wuhan 430072, China
² The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a model that automatically identifies the structural elements of unstructured abstracts in academic literature. [Methods] First, we used the ERNIE (Enhanced Representation through Knowledge Integration) pre-trained language model to represent the abstracts. Then, we used a deep pyramid convolutional neural network (DPCNN) to extract semantic features from these representations. Finally, we built the structural element identification model on top of these features. [Results] We evaluated the proposed model on a dataset from library and information science. Its precision, recall, and F1 score all exceeded 0.95, outperforming the benchmark models. [Limitations] Because the corpus comes from a single domain, more research is needed to assess the model’s performance in other fields. [Conclusions] The proposed model represents abstracts more comprehensively and thereby improves the identification of structural elements in unstructured abstracts.
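The [Results] figures are precision, recall, and F1 scores. The abstract does not say how these were averaged across structural elements. The sketch below therefore makes some assumptions, for illustration only: each sentence carries exactly one label drawn from the five headings used in this abstract, and scores are averaged without weighting (macro averaging). `LABELS`, `per_label_scores`, and `macro_average` are hypothetical names, not the authors' evaluation code.

```python
from collections import Counter

# Hypothetical label set: the five headings of this paper's own structured abstract.
# The actual label set used in the study's dataset is an assumption here.
LABELS = ["Objective", "Methods", "Results", "Limitations", "Conclusions"]


def per_label_scores(gold, pred, labels):
    """Return {label: (precision, recall, f1)} for one-label-per-sentence predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label p was assigned wrongly
            fn[g] += 1  # the gold label g was missed
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # Convention (an assumption): a label with an empty denominator scores 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-label precision, recall, and F1."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Toy example: one "Methods" sentence is mislabeled as "Results".
    gold = ["Objective", "Methods", "Methods", "Results", "Limitations", "Conclusions"]
    pred = ["Objective", "Methods", "Results", "Results", "Limitations", "Conclusions"]
    scores = per_label_scores(gold, pred, LABELS)
    for label, (p, r, f) in scores.items():
        print(f"{label:<12} P={p:.2f} R={r:.2f} F1={f:.2f}")
    print("macro P/R/F1:", [round(x, 4) for x in macro_average(scores)])
```

A single error lowers recall for the gold element (Methods) and precision for the predicted one (Results). This is why the three metrics can differ even on a small sample. A weighted (by class support) or micro average would give different totals. The choice matters when some elements, such as Limitations, are rare.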
Hu Zhongyi, Shui Diancheng, Wu Jiang. Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN. Data Analysis and Knowledge Discovery, 2024, 8(1): 125-144.