Automatically Extracting Structural Elements of Sci-Tech Literature Abstracts Based on Deep Learning
Zhao Danning1, Mu Dongmei1,2, Bai Sen2
1 School of Public Health, Jilin University, Changchun 130021, China
2 Division of Clinical Research, The First Hospital of Jilin University, Changchun 130021, China
Abstract [Objective] This paper proposes a deep learning-based method for automatically extracting key structural elements from unstructured abstracts of sci-tech literature. [Methods] We used structured abstracts as the training corpus and applied deep learning techniques (e.g., LSTM and the attention mechanism) to extract the “objective”, “method”, and “results” elements from sci-tech literature, then generated new structured abstracts from them. [Results] The method achieved F-scores of 0.951, 0.916, and 0.960 for the “objective”, “method”, and “results” elements, respectively. [Limitations] The deep learning model used in this paper offers limited interpretability. [Conclusions] The proposed method can effectively extract structural elements from unstructured abstracts.
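As a rough illustration of how per-element F-scores like those reported above can be computed, the sketch below scores sentence-level label predictions against gold labels taken from structured abstracts. It assumes the balanced F1 measure, since the abstract does not say which F-score variant was used. The function name `per_label_f1` and the toy label sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The three structural elements named in the abstract.
LABELS = ("objective", "method", "results")


def per_label_f1(gold, predicted, labels=LABELS):
    """Return {label: F1} for two equal-length label sequences.

    Assumption: balanced F1, i.e. 2*TP / (2*TP + FP + FN).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy data for illustration only (made up, not the paper's corpus).
gold = ["objective", "method", "method", "results", "results", "results"]
pred = ["objective", "method", "results", "results", "results", "method"]

scores = per_label_f1(gold, pred)
for label in LABELS:
    print(f"{label}: {scores[label]:.3f}")
```

Scoring each element separately, rather than reporting a single accuracy figure, shows which structural element the classifier confuses most often.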
Received: 18 November 2020
Published: 11 August 2021
Fund: National Natural Science Foundation of China (71974074); Scientific and Technological Developing Scheme of Jilin Province (20200301004RQ)
Corresponding Author: Mu Dongmei, E-mail: moudm@jlu.edu.cn