Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 105-114    DOI: 10.11925/infotech.2096-3467.2021.1238
Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision
Jing Shenqi1,2,3, Zhao Youlin1
1School of Information Management, Nanjing University, Nanjing 210023, China
2School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
3Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210096, China
Abstract  

[Objective] This paper proposes a distantly supervised model that uses medical domain-specific knowledge to extract medical entity relationships, aiming to reduce the cost of data labeling and the potential errors of existing models. [Methods] First, we used a multi-instance strategy to reduce noise in the distantly supervised labels. Second, we used a pre-trained language model (MedicalBERT) to encode the labeled texts. Third, we used the entity descriptions in a medical knowledge base as additional supervision signals for relation extraction, which improved the accuracy of the semantic encoding. [Results] Compared with existing models, the new model improved Precision by up to 5.4 percentage points, Recall by up to 2.5 points, and F1 by up to 4.1 points. Its F1-score on the complication relation category reached 93.8%. [Limitations] The method still needs to be evaluated on a larger set of sentences. [Conclusions] The new model extracts medical entity relationships effectively and can support related research.
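
The distant-supervision and multi-instance steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, sentences, relation names, and the `build_bags` helper are all hypothetical, and the entity matching is plain substring search. It shows only the general idea that sentences mentioning both entities of a knowledge-base triple are grouped into a bag, and that the relation label is attached to the bag rather than to each sentence.

```python
from collections import defaultdict

# Hypothetical knowledge-base triples: (head entity, relation, tail entity).
KB_TRIPLES = [
    ("diabetes", "medication", "metformin"),
    ("pneumonia", "causes_symptom", "cough"),
]

# Hypothetical unlabeled sentences.
SENTENCES = [
    "Metformin is commonly prescribed for diabetes.",
    "The patient with diabetes refused metformin.",
    "Pneumonia often presents with cough and fever.",
    "Cough is unrelated to diet.",
]


def build_bags(triples, sentences):
    """Group sentences into bags keyed by (head, tail, relation)."""
    bags = defaultdict(list)
    for head, relation, tail in triples:
        for sentence in sentences:
            text = sentence.lower()
            # Distant-supervision assumption: a sentence that mentions both
            # entities of a triple is taken as a candidate for its relation.
            if head in text and tail in text:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


bags = build_bags(KB_TRIPLES, SENTENCES)
for key, members in bags.items():
    print(key, len(members))
```

The second sentence lands in the "medication" bag although it does not express that relation. This is the kind of labeling noise that a bag-level (multi-instance) objective is meant to tolerate, because the model only needs some sentences in the bag to support the label.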

Key words: Medical Relation Extraction; Distant Supervision; Medical Domain-Specific Knowledge; Pre-Trained Language Model
Received: 28 October 2021      Published: 28 July 2022
ZTFLH: G302; R-02
Fund: National Key R&D Program of China (2018YFC1314900); Key R&D Program of Jiangsu (BE2020721)
Corresponding Authors: Zhao Youlin     E-mail: sobzyl@hhu.edu.cn

Cite this article:

Jing Shenqi, Zhao Youlin. Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision. Data Analysis and Knowledge Discovery, 2022, 6(6): 105-114.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1238     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I6/105

Illustration of Partition Convolution
The Method of Memory Networks Coding
Medical Knowledge-Base Triples
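Predefined relation type   Head and tail entity types   Count
Causes symptom             (Disease, Symptom)           2,341
Complication               (Symptom, Symptom)           1,206
Treatment                  (Disease, Treatment)         532
Examination                (Disease, Examination)       792
Medication                 (Disease, Drug)              1,638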
Relationship Category
Distribution of Length of Medical Sentence
Model    P (%)   R (%)   F1 (%)
MIML     81.5    89.8    85.4
PCNN     83.4    90.6    86.9
SeG      84.3    91.2    87.6
BPCMA    86.9    92.3    89.5
Comparison of Performance
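
As a quick consistency check on the comparison table, the sketch below recomputes F1 as the harmonic mean of Precision and Recall and derives BPCMA's gains over the weakest baseline, MIML. The numbers are copied from the table. The `RESULTS` dictionary and the helper names are illustrative and do not come from the paper.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) per model, copied from the comparison table.
RESULTS = {
    "MIML": (81.5, 89.8, 85.4),
    "PCNN": (83.4, 90.6, 86.9),
    "SeG": (84.3, 91.2, 87.6),
    "BPCMA": (86.9, 92.3, 89.5),
}

for name, (p, r, reported) in RESULTS.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.1f}, reported = {reported}")

# Gains of BPCMA over MIML, in percentage points.
gains = tuple(
    round(ours - base, 1)
    for ours, base in zip(RESULTS["BPCMA"], RESULTS["MIML"])
)
print("BPCMA vs. MIML (P, R, F1):", gains)
```

The recomputed F1 values agree with the reported ones to one decimal place. The gains over MIML come out as 5.4, 2.5, and 4.1, which are the maximum improvements quoted in the abstract, so those figures are differences in percentage points rather than relative changes.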
Relation category   P (%)   R (%)   F1 (%)
Causes symptom      87.2    91.8    89.4
Complication        93.4    94.2    93.8
Treatment           79.3    83.5    81.3
Examination         77.9    80.6    79.2
Medication          92.6    93.7    93.1
The Effect of BPCMA in Each Category
Model                     P (%)   R (%)   F1 (%)
No-MedicalBERT            82.4    88.2    85.2
No-Entity Description     85.3    90.5    87.8
No-Memory Network         84.2    90.3    87.1
No-Attention Mechanism    83.6    89.4    86.4
No-Partition Strategy     83.3    87.9    85.5
All                       86.9    92.3    89.5
The Results of Ablation Experiments
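
One way to read the ablation table is to rank the components by how much F1 falls when each is removed. The sketch below does that arithmetic on the reported F1 values, with the variant names rendered in English. The dictionary and variable names are illustrative. The drops are derived here and are not figures reported by the authors.

```python
# Reported F1 (%) of the full model and of each ablated variant.
FULL_F1 = 89.5
ABLATION_F1 = {
    "No-MedicalBERT": 85.2,
    "No-Entity Description": 87.8,
    "No-Memory Network": 87.1,
    "No-Attention Mechanism": 86.4,
    "No-Partition Strategy": 85.5,
}

# F1 drop in percentage points when a component is removed.
drops = {name: round(FULL_F1 - score, 1) for name, score in ABLATION_F1.items()}

# Largest drop first: the components the full model depends on most.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: -{drop} F1 points")
```

On these numbers, removing MedicalBERT costs the most (4.3 points), followed by the partition strategy (4.0), the attention mechanism (3.1), the memory network (2.4), and the entity descriptions (1.7).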