Abstracting Biomedical Documents with Knowledge Enhancement
Deng Lu, Hu Po, Li Xuanhong
School of Computer Science, Central China Normal University, Wuhan 430079, China
Abstract [Objective] This study proposes a new text summarization model for biomedical literature, aiming to improve the quality of the generated summaries. [Methods] First, we extracted the important content of biomedical texts with an extractive summarization technique. Second, we combined this content with a related knowledge base to extract key terms and their corresponding concepts. Third, we integrated the content and concepts into the attention mechanism of a neural abstractive summarization model as background knowledge. With the help of domain knowledge, the proposed model can not only focus on the important information in the texts but also reduce the noise introduced by external information. [Results] We evaluated the proposed model on three biomedical datasets. The average ROUGE score of the proposed PG-meta model reached 31.06, which was 1.51 higher than that of the original PG model. [Limitations] We did not investigate how different knowledge acquisition methods affect the effectiveness of the model. [Conclusions] The proposed model can better capture the in-depth meaning of biomedical documents and improve the quality of their summaries.
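The results above are reported in ROUGE, which scores a generated summary by its n-gram overlap with a reference summary. The following is a minimal sketch of ROUGE-N, not the evaluation code used in the study: the function names `ngram_counts` and `rouge_n`, the lowercasing, and the whitespace tokenization are all assumptions made for illustration, and the abstract does not state which ROUGE variants were averaged.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap between two strings.

    Assumes lowercased, whitespace-split tokens; real toolkits usually add
    stemming and more careful tokenization.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n(
        "the drug reduces tumor growth",
        "the drug inhibits tumor growth in mice",
    )
    print(scores)
```

In this toy pair, four of the five candidate unigrams also appear among the seven reference unigrams, so precision is 4/5 and recall is 4/7. Reported ROUGE values such as 31.06 are typically these fractions scaled by 100 and averaged over a test set.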
Received: 12 January 2022
Published: 13 January 2023
Fund: Research Project of the State Language Commission (YB135-149); Fundamental Research Funds for the Central Universities (CCNU20ZT012)
Corresponding Author: Hu Po, E-mail: phu@mail.ccnu.edu.cn