Identifying Moves in Full-Text Chinese Academic Papers
Du Xinyu, Li Ning
Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Abstract [Objective] This paper investigates move recognition in full-text academic papers, laying a foundation for automatically understanding paper content. Existing research on move recognition in academic papers handles only a small number of coarse-grained moves, and few open datasets exist for move classification. [Methods] We constructed a move classification dataset of academic papers using a BERT model with multi-stage fine-tuning. We then proposed a move recognition model that incorporates section titles to recognize moves at a fine-grained level. [Results] For the 22-class classification, the overall accuracy of the RoBERTa-wwm-ext model increased by 0.031 to 0.909, and the Micro-F1 improved by 0.022 to 0.837. [Limitations] The constructed corpus contains a small amount of unbalanced data, and the quality of the input paper affects the proposed model's performance. [Conclusions] The proposed model benefits the automatic understanding of academic papers, research quality evaluation, and semantic content retrieval, all of which play important roles in the use of scientific and technological literature.
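The abstract does not say how section titles are fed to the model. One plausible reading, sketched below purely as an assumption, is the standard BERT two-segment layout, with the section title as the first segment and the sentence to classify as the second. The function name `encode_pair`, the `max_len` default, and the character-level split (a stand-in for a real Chinese BERT tokenizer) are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def encode_pair(section_title: str, sentence: str,
                max_len: int = 64) -> Tuple[List[str], List[int]]:
    """Lay out a (section title, sentence) pair in BERT's two-segment format.

    Assumption: the title is segment 0 and the sentence is segment 1.
    Characters stand in for tokens; a real tokenizer would replace list().
    """
    # Three positions are reserved for [CLS] and the two [SEP] markers.
    title_tokens = list(section_title.strip())[:max_len - 3]
    budget = max_len - 3 - len(title_tokens)
    # Truncate the sentence, not the title, when the pair is too long.
    sent_tokens = list(sentence.strip())[:budget]

    tokens = ["[CLS]"] + title_tokens + ["[SEP]"] + sent_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the title, and the first [SEP];
    # segment 1 covers the sentence and the final [SEP].
    segments = [0] * (len(title_tokens) + 2) + [1] * (len(sent_tokens) + 1)
    return tokens, segments


# Hypothetical example: section title "方法" (Methods) and the sentence
# "我们提出模型" (We propose a model).
tokens, segments = encode_pair("方法", "我们提出模型")
print(tokens)
print(segments)
```

Under this layout the classifier sees the section title as context for every sentence, which is one way a title could help separate moves that look alike at the sentence level.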
Received: 04 December 2022
Published: 28 March 2023
Fund: National Natural Science Foundation of China (61672105)
Corresponding Author: Du Xinyu, ORCID: 0000-0001-5289-8199, E-mail: duxinyu_0@163.com