Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (11): 1-12    DOI: 10.11925/infotech.2096-3467.2022.0034
Abstracting Biomedical Documents with Knowledge Enhancement
Deng Lu, Hu Po, Li Xuanhong
School of Computer Science, Central China Normal University, Wuhan 430079, China

[Objective] This study proposes a new text summarization model for biomedical research papers, aiming to improve the quality of their abstracts. [Methods] First, we obtained the important content of the biomedical texts with extractive summarization techniques. Then, we combined this content with a related knowledge base to extract key terms and their corresponding concepts. Third, we integrated the content and concepts into the neural abstractive summarization model as background knowledge for the attention mechanism. With the help of domain knowledge, the proposed model can focus on the important information in the texts while reducing the noise introduced by external information. [Results] We evaluated the proposed model on three biomedical datasets. The average ROUGE score of the proposed PG-meta model reached 31.06, which was 1.51 higher than that of the original PG model. [Limitations] We did not investigate the impact of different knowledge acquisition methods on the effectiveness of our model. [Conclusions] The proposed model can better learn the in-depth meaning of biomedical documents and improve the quality of their abstracts.
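The pipeline described above injects extracted key content and knowledge-base concepts into the attention mechanism as background knowledge. A minimal sketch of one way such knowledge-aware attention could be wired up, assuming an additive (Bahdanau-style) scorer extended with a pooled knowledge vector; all names, shapes, and the scoring form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def knowledge_aware_attention(enc_states, dec_state, know_vec, Wh, Ws, Wk, v):
    # enc_states: (T, d) encoder hidden states
    # dec_state:  (d,)   current decoder state
    # know_vec:   (d,)   pooled background-knowledge vector (terms + concepts)
    # Additive attention scores, extended with a knowledge term that biases
    # the model toward knowledge-relevant source positions.
    scores = np.tanh(enc_states @ Wh + dec_state @ Ws + know_vec @ Wk) @ v  # (T,)
    attn = softmax(scores)
    context = attn @ enc_states  # (d,) knowledge-biased context vector
    return attn, context

rng = np.random.default_rng(0)
T, d = 5, 8
attn, ctx = knowledge_aware_attention(
    rng.normal(size=(T, d)), rng.normal(size=d), rng.normal(size=d),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
    rng.normal(size=d),
)
print(attn.sum())  # attention weights sum to ~1
```

In a pointer-generator setting, the same attention distribution would also drive the copy mechanism, so the knowledge bias affects both generation and copying.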

Key words: Biomedical Text Mining; Generative Abstract; Domain Knowledge; Knowledge Enhancement
Received: 12 January 2022      Published: 13 January 2023
ZTFLH:  TP393  
Fund: Research Project of the State Language Commission (YB135-149); Fundamental Research Funds for the Central Universities (CCNU20ZT012)
Corresponding Authors: Hu Po     E-mail:

Cite this article:

Deng Lu, Hu Po, Li Xuanhong. Abstracting Biomedical Documents with Knowledge Enhancement. Data Analysis and Knowledge Discovery, 2022, 6(11): 1-12.


Structure of the PG-meta Model
Examples of Knowledge Acquisition
Category | Content
Source text | Glaucoma is a leading cause of blindness within the United States and the leading cause of blindness among African-Americans. Measurement of intraocular pressure only is no longer considered adequate for screening. Recognition of risk factors and examination of the optic nerve are key strategies to identify individuals at risk. Medical and surgical treatment of glaucoma have ······
Extracted important content | control of iop regulation resides within the aqueous outflow system of the eye (grant, 1958) and iop regulation becomes abnormal in glaucoma. iop is the only treatable risk factor. the intrinsic outflow system abnormality in glaucoma is unknown but is described as poag ······
Term-concept knowledge | Glaucoma: Eye disease; IOP: Intraocular pressure; POAG: Glaucoma, Primary Open Angle
Examples of the Relationship Between Knowledge Acquisition and Corresponding Abstracts
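The example above pairs key terms found in the text (e.g. IOP, POAG) with their knowledge-base concepts. A minimal sketch of such term-to-concept linking, using a hand-built dictionary in place of a real resource such as UMLS; the entries simply mirror the example table, and the naive substring matching is an illustrative simplification:

```python
# Toy term-to-concept mapping that mirrors the example table;
# a real system would query a knowledge base such as UMLS.
CONCEPTS = {
    "glaucoma": "Eye disease",
    "iop": "Intraocular pressure",
    "poag": "Glaucoma, Primary Open Angle",
}

def link_terms(text):
    # Return (term, concept) pairs for every known term found in the text.
    # Naive lowercase substring matching; real entity linkers tokenize
    # and disambiguate.
    found = []
    for term, concept in CONCEPTS.items():
        if term in text.lower():
            found.append((term, concept))
    return found

pairs = link_terms("IOP is the only treatable risk factor in POAG.")
print(pairs)  # [('iop', 'Intraocular pressure'), ('poag', 'Glaucoma, Primary Open Angle')]
```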
Dataset | Split | Samples | …
Full-Abs | Training | 3 200 | 25 825 | 1 072 | # | 10
Full-Abs | Validation | 400 | 24 348 | 903 | # | 9
Full-Abs | Test | 400 | 24 484 | 936 | # | 9
Abs-Ti | Training | 3 514 | 1 477 | 111 | 9 | 3
Abs-Ti | Validation | 439 | 1 439 | 109 | 9 | 3
Abs-Ti | Test | 439 | 1 466 | 113 | 9 | 3
BioAbsTi | Training | 24 631 | 1 574 | 118 | 109 | 4
BioAbsTi | Validation | 8 210 | 1 584 | 118 | 109 | 4
BioAbsTi | Test | 8 210 | 1 562 | 117 | 10 | 4
Statistical Results of the Experimental Datasets
Experimental Results Under Different Values of d
Model | Full-Abs (R-1 / R-2 / R-L) | Abs-Ti (R-1 / R-2 / R-L) | BioAbsTi (R-1 / R-2 / R-L) | AVG
Lead | 27.93 / 15.38 / 24.58 | 23.36 / 13.79 / 24.58 | 31.79 / 16.55 / 28.67 | 22.96
TextRank | 27.82 / 14.64 / 24.83 | 24.83 / 13.56 / 20.93 | 33.52 / 17.29 / 30.23 | 23.07
PG | 35.95 / 20.42 / 29.87 | 33.25 / 18.36 / 31.45 | 36.58 / 25.56 / 34.52 | 29.55
Keywords-PG | 36.23 / 20.89 / 30.52 | 33.75 / 19.06 / 31.83 | 36.93 / 26.22 / 34.85 | 30.03
PG-meta(All) | 36.15 / 20.85 / 30.28 | 33.54 / 18.75 / 31.79 | 36.88 / 25.96 / 34.58 | 29.86
BERT + Clustering | 32.85 / 18.87 / 28.85 | 27.53 / 14.22 / 23.93 | 35.25 / 21.52 / 32.85 | 26.20
BERTSum | 33.93 / 19.86 / 29.18 | 28.70 / 14.56 / 24.34 | 37.60 / 22.81 / 33.65 | 27.18
PG-meta | 37.05 / 21.96 / 33.58 | 34.82 / 20.21 / 32.97 | 37.58 / 26.26 / 35.19 | 31.06
Experimental Results of the Baseline Model and the PG-meta Model Involved in the Comparison on the Three Datasets
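All models above are compared by ROUGE-1, ROUGE-2, and ROUGE-L scores. A minimal sketch of the recall-oriented n-gram overlap underlying ROUGE-N, with naive whitespace tokenization; published scores come from the standard ROUGE toolkit, which adds stemming and the ROUGE-L longest-common-subsequence variant:

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    # Recall-oriented overlap: matched reference n-grams / total reference n-grams.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / sum(ref.values())

r1 = rouge_n("the cat sat on the mat", "the cat is on the mat", 1)
print(round(r1, 3))  # 0.833 (5 of 6 reference unigrams matched)
```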
Category | Text content
Source text | ······ A recent study reported that cardiac lymphatic endothelial cells (LECs) stem from venous and non-venous origins in mice. Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs. Genetic lineage tracing with Isl1-Cre reporter mice suggested a possible contribution from the Isl1-expressing pharyngeal mesoderm constituting the second heart field to lymphatic vessels around the cardiac outflow tract as well as to those in the facial skin and the lymph sac. Isl1(+) lineage-specific deletion of Prox1 resulted in disrupted LYVE1(+) vessel structures, indicating a Prox1-dependent mechanism in this contribution. ······
Reference abstract | Isl1-expressing non-venous cell lineage contributes to cardiac lymphatic vessel development.
… | Here, we identified Isl1-expressing progenitors as a potential non-venous origin of cardiac LECs.
PG model summary | The non-venous cell lineage can help the development of cardiac lymphatic vessels.
… | The non-venous cell lineage of Isl1-expressing promotes the development of cardiac lymphatic vessels.
Summary Results Automatically Generated by the Three Models
Dataset | Metric | PG | PG-meta | PG-meta(TR) | PG-meta(BS)
Full-Abs | R-1 | 35.95 | 36.89 | 36.97 | 37.05
Full-Abs | R-2 | 20.42 | 21.88 | 21.97 | 21.96
Full-Abs | R-L | 29.87 | 32.95 | 33.56 | 33.58
Abs-Ti | R-1 | 33.25 | 34.67 | 34.85 | 34.82
Abs-Ti | R-2 | 18.36 | 19.66 | 19.35 | 20.21
Abs-Ti | R-L | 31.45 | 32.58 | 32.73 | 32.97
BioAbsTi | R-1 | 36.58 | 37.42 | 37.55 | 37.58
BioAbsTi | R-2 | 25.56 | 25.93 | 26.13 | 26.26
BioAbsTi | R-L | 34.52 | 34.97 | 35.16 | 35.19
AVG | | 29.55 | 30.77 | 30.91 | 31.06
Experimental Results Under Different Important Content Extraction Methods
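PG-meta(TR) and PG-meta(BS) in the table above differ in how important content is extracted (TextRank vs. BERTSum). A minimal sketch of TextRank-style sentence ranking, using word-overlap similarity and damped power iteration; the similarity function and parameter values are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def textrank_sentences(sentences, d=0.85, iters=50):
    # Pairwise similarity: shared words normalized by sentence lengths,
    # following the original TextRank formulation (Mihalcea & Tarau, 2004).
    toks = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and toks[i] and toks[j]:
                denom = np.log(len(toks[i])) + np.log(len(toks[j]))
                sim[i, j] = len(toks[i] & toks[j]) / denom if denom > 0 else 0.0
    # Row-normalize into a transition matrix, then run damped power iteration.
    row = sim.sum(axis=1, keepdims=True)
    trans = np.divide(sim, row, out=np.zeros_like(sim), where=row > 0)
    score = np.ones(n) / n
    for _ in range(iters):
        score = (1 - d) / n + d * (trans.T @ score)
    # Sentence indices, highest-scoring first.
    return sorted(range(n), key=lambda i: -score[i])

docs = [
    "glaucoma is a leading cause of blindness",
    "measurement of intraocular pressure is no longer adequate",
    "glaucoma blindness risk factors include intraocular pressure",
]
print(textrank_sentences(docs))  # ranked sentence indices, best first
```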
Dataset | Metric | PG-meta(term) | PG-meta(con) | PG-meta(t-c)
Full-Abs | R-1 | 36.53 | 36.75 | 37.05
Full-Abs | R-2 | 21.85 | 21.72 | 21.96
Full-Abs | R-L | 33.24 | 33.35 | 33.58
Abs-Ti | R-1 | 34.63 | 34.79 | 34.82
Abs-Ti | R-2 | 20.19 | 20.25 | 20.21
Abs-Ti | R-L | 32.73 | 32.79 | 32.97
BioAbsTi | R-1 | 37.46 | 37.59 | 37.58
BioAbsTi | R-2 | 26.07 | 26.12 | 26.26
BioAbsTi | R-L | 35.07 | 35.03 | 35.19
AVG | | 30.86 | 30.93 | 31.06
Experimental Results Under Different Knowledge Correlation Granularities
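The granularity ablation above compares term-level (term), concept-level (con), and combined (t-c) knowledge. A minimal sketch of how the three settings could differ when pooling a single background-knowledge vector; mean pooling and all names here are illustrative assumptions, not the paper's exact fusion method:

```python
import numpy as np

def knowledge_vector(term_vecs, concept_vecs, mode="t-c"):
    # term_vecs / concept_vecs: lists of (d,) embeddings for extracted
    # key terms and their linked knowledge-base concepts.
    if mode == "term":        # term-level knowledge only
        pool = term_vecs
    elif mode == "con":       # concept-level knowledge only
        pool = concept_vecs
    elif mode == "t-c":       # both granularities combined
        pool = term_vecs + concept_vecs
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.mean(pool, axis=0)

terms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cons = [np.array([1.0, 1.0])]
print(knowledge_vector(terms, cons, "t-c"))  # mean of all three vectors: [2/3, 2/3]
```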
Experimental Results of Different Knowledge Fusion Methods