Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN
Hu Zhongyi1,2, Shui Diancheng1, Wu Jiang1,2
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China

Abstract [Objective] This paper proposes a model that automatically extracts key structural elements from unstructured abstracts of academic literature. [Methods] First, we used the ERNIE model to represent the abstracts. Then, we applied a DPCNN to extract semantic features. Finally, we built the identification model. [Results] We evaluated the proposed model on a library and information science dataset. Precision, recall, and F1-score all exceeded 0.95, outperforming the benchmark models. [Limitations] Because the corpus comes from a single domain, more research is needed to assess the model’s performance in other fields. [Conclusions] The proposed model represents abstracts more comprehensively, which improves the identification of structural elements in unstructured abstracts.
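
The precision, recall, and F1-score reported above can be illustrated with a minimal sketch for sentence-level element labels. This is an assumption-laden illustration, not the authors' evaluation code: the label names are borrowed from the bracketed tags of this abstract, the toy `gold` and `pred` lists are hypothetical, and macro averaging is assumed because the abstract does not state which averaging was used.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for paired label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over all labels (assumed averaging)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy data: one "methods" sentence is mislabeled as "results".
gold = ["objective", "methods", "methods", "results", "conclusions", "limitations"]
pred = ["objective", "methods", "results", "results", "conclusions", "limitations"]

scores = per_label_scores(gold, pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"macro P={macro_p:.3f} R={macro_r:.3f} F1={macro_f1:.3f}")
```

In this toy case the single error lowers recall for "methods" and precision for "results", which shows why the three metrics are reported together rather than accuracy alone.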

Received: 29 December 2022
Published: 16 May 2023

Fund: Major Project of Philosophy and Social Science Research of the Ministry of Education (20JZD024)
Corresponding Author: Hu Zhongyi, ORCID: 0000-0002-1113-0199, E-mail: zhongyi.hu@whu.edu.cn