Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision
Jing Shenqi1,2,3, Zhao Youlin1
1School of Information Management, Nanjing University, Nanjing 210023, China
2School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
3Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210096, China
[Objective] This paper proposes a distantly supervised model that extracts medical entity relationships using medical domain-specific knowledge, aiming to reduce the cost of data labeling and the potential errors of existing models. [Methods] First, we used a multi-instance strategy to reduce the noise in distantly supervised labeled data. Second, we used a pre-trained language model (MedicalBERT) to encode the labeled texts. Third, we used the entity descriptions in the medical knowledge base as supervision signals for medical relationship extraction, which improved the accuracy of the semantic encoding. [Results] Compared with existing models, the new algorithm improved Precision by up to 5.4 percentage points, Recall by up to 2.5 percentage points, and F1 by up to 4.1 percentage points. In addition, the F1-score for the complication relation reached 93.8%. [Limitations] The proposed method needs to be evaluated on more sentences. [Conclusions] The new model can effectively extract medical entity relationships and benefit related research.
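The first step of the method, distant supervision with a multi-instance strategy, can be illustrated with a minimal sketch. This is not the authors' implementation: the triples, the sentences, the relation names, and the `build_bags` helper are all hypothetical, and entity matching is reduced to a lowercase substring test. The point is only that a relation label is attached to a bag of sentences that mention both entities, rather than to each sentence individually, so a single noisy sentence does not have to express the relation.

```python
from collections import defaultdict

# Hypothetical knowledge-base triples: (head entity, relation, tail entity).
KB_TRIPLES = [
    ("diabetes", "medication", "metformin"),
    ("pneumonia", "causes_symptom", "cough"),
]

# Hypothetical unlabeled sentences.
SENTENCES = [
    "Metformin is commonly prescribed for diabetes.",
    "The patient with diabetes refused metformin.",  # noisy: relation not expressed
    "Pneumonia often presents with cough and fever.",
    "Cough is a common complaint in winter.",  # mentions only one entity
]


def build_bags(triples, sentences):
    """Group the sentences that mention both entities of a triple into one bag."""
    bags = defaultdict(list)
    for head, relation, tail in triples:
        for sentence in sentences:
            text = sentence.lower()
            # Simplifying assumption: substring match stands in for entity linking.
            if head in text and tail in text:
                bags[(head, relation, tail)].append(sentence)
    return dict(bags)


bags = build_bags(KB_TRIPLES, SENTENCES)
for triple, bag in bags.items():
    print(triple, len(bag))
```

In a full model, each bag would then be encoded (for example with a pre-trained language model) and the sentences inside it weighted, so that uninformative sentences contribute less to the bag-level prediction.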

Key words: Medical Relation Extraction; Distant Supervision; Medical Domain-Specific Knowledge; Pre-Trained Language Model
Received: 28 October 2021      Published: 28 July 2022
ZTFLH:  G302  
Fund: National Key R&D Program of China (2018YFC1314900); Key R&D Program of Jiangsu (BE2020721)
Corresponding Author: Zhao Youlin

Jing Shenqi, Zhao Youlin. Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision. Data Analysis and Knowledge Discovery, 2022, 6(6): 105-114.

Illustration of Partition Convolution
The Method of Memory Networks Coding
Medical Knowledge-Base Triples
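Predefined relation type   Head and tail entity types   Count
Causes symptom             (Disease, Symptom)           2,341
Complication               (Symptom, Symptom)           1,206
Treatment method           (Disease, Treatment)         532
Examination method         (Disease, Examination)       792
Medication                 (Disease, Drug)              1,638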
Relationship Category
Distribution of Length of Medical Sentence
Model    P (%)   R (%)   F1 (%)
MIML     81.5    89.8    85.4
PCNN     83.4    90.6    86.9
SeG      84.3    91.2    87.6
BPCMA    86.9    92.3    89.5
Comparison of Performance
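As a quick consistency check on the comparison table, the sketch below recomputes F1 as the harmonic mean of precision and recall and derives the gains of BPCMA over the weakest baseline (MIML). The numbers are copied from the table; the names `REPORTED`, `f1_score`, and `gain` are introduced here for illustration only.

```python
# (precision %, recall %, reported F1 %) copied from the comparison table.
REPORTED = {
    "MIML": (81.5, 89.8, 85.4),
    "PCNN": (83.4, 90.6, 86.9),
    "SeG": (84.3, 91.2, 87.6),
    "BPCMA": (86.9, 92.3, 89.5),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (p, r, f1) in REPORTED.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.1f}, reported = {f1}")


def gain(metric_index, baseline="MIML", proposed="BPCMA"):
    """Difference in percentage points for one metric (0 = P, 1 = R, 2 = F1)."""
    return round(REPORTED[proposed][metric_index] - REPORTED[baseline][metric_index], 1)


gains_over_miml = [gain(i) for i in range(3)]
print(gains_over_miml)
```

The recomputed F1 values agree with the reported ones to one decimal place, and the gains over MIML match the 5.4, 2.5, and 4.1 point improvements quoted in the abstract.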
Relation category    P (%)   R (%)   F1 (%)
Causes symptom       87.2    91.8    89.4
Complication         93.4    94.2    93.8
Treatment method     79.3    83.5    81.3
Examination method   77.9    80.6    79.2
Medication           92.6    93.7    93.1
The Effect of BPCMA in Each Category
Model                    P (%)   R (%)   F1 (%)
No-MedicalBERT           82.4    88.2    85.2
No-entity description    85.3    90.5    87.8
No-memory network        84.2    90.3    87.1
No-attention mechanism   83.6    89.4    86.4
No-partition strategy    83.3    87.9    85.5
All                      86.9    92.3    89.5
The Results of Ablation Experiments
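One way to read the ablation table is to rank the components by how much F1 is lost when each one is removed. The sketch below does this with the reported F1 values; the variant names are English glosses of the table rows, and `ABLATION_F1`, `drops`, and `ranked` are illustrative names rather than anything from the paper.

```python
# Reported F1 (%) of each ablated variant, copied from the ablation table.
ABLATION_F1 = {
    "No-MedicalBERT": 85.2,
    "No-entity description": 87.8,
    "No-memory network": 87.1,
    "No-attention mechanism": 86.4,
    "No-partition strategy": 85.5,
}
FULL_MODEL_F1 = 89.5  # the "All" row

# F1 lost (in percentage points) when a component is removed.
drops = {name: round(FULL_MODEL_F1 - f1, 1) for name, f1 in ABLATION_F1.items()}

# Largest drop first: the component whose removal hurts the most.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: -{drop} F1 points")
```

On these figures, removing MedicalBERT costs the most F1 and removing the entity descriptions costs the least, although every component contributes.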
