Automatically Extracting Structural Elements of Sci-Tech Literature Abstracts Based on Deep Learning
Zhao Danning1, Mu Dongmei1,2, Bai Sen2
1School of Public Health, Jilin University, Changchun 130021, China
2Division of Clinical Research, The First Hospital of Jilin University, Changchun 130021, China
[Objective] This paper proposes a deep learning-based method for automatically extracting key structural elements from unstructured abstracts of sci-tech literature. [Methods] We used structured abstracts as the training corpus and applied deep learning methods (e.g., LSTM and the attention mechanism) to extract the “objective”, “method”, and “results” elements from sci-tech literature, then generated new structured abstracts from them. [Results] The method achieved F-scores of 0.951, 0.916, and 0.960 for the “objective”, “method”, and “results” elements, respectively. [Limitations] The deep learning model used in this paper offers limited interpretability. [Conclusions] The proposed method can effectively extract structural elements from unstructured abstracts.
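The per-element F-scores reported in the abstract can be illustrated with a small sketch. It assumes the F-score is the usual F1 (harmonic mean of precision and recall) computed per label over sentence-level predictions; the abstract does not state the exact variant or evaluation unit. The function name `per_label_f1` and the toy labels below are hypothetical, are not taken from the paper, and do not reproduce its results.

```python
from collections import Counter

# Structural elements named in the abstract.
LABELS = ("objective", "method", "results")


def per_label_f1(gold, predicted, labels=LABELS):
    """Return {label: F1} for parallel lists of gold and predicted labels.

    Assumption: F1 = 2*TP / (2*TP + FP + FN), computed separately per label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Define F1 as 0.0 when a label never occurs and is never predicted.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy example with made-up sentence labels (illustrative data only).
gold = ["objective", "method", "method", "results", "results", "results"]
pred = ["objective", "method", "results", "results", "results", "method"]
scores = per_label_f1(gold, pred)
for label in LABELS:
    print(f"{label}: {scores[label]:.3f}")
```

Scoring each element separately, as the abstract does, shows which element is hardest to extract, which a single averaged score would hide.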
Zhao Danning, Mu Dongmei, Bai Sen. Automatically Extracting Structural Elements of Sci-Tech Literature Abstracts Based on Deep Learning. Data Analysis and Knowledge Discovery, 2021, 5(7): 70-80.