Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 105-114     https://doi.org/10.11925/infotech.2096-3467.2021.1238
Research Article
Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision
Jing Shenqi1,2,3, Zhao Youlin1
1School of Information Management, Nanjing University, Nanjing 210023, China
2School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
3Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210096, China

Abstract

[Objective] This paper proposes a distantly supervised model that incorporates medical domain knowledge to extract medical entity relationships, addressing the high annotation cost and error-prone labels of existing methods. [Methods] The model adopts a multi-instance strategy to reduce the impact of noise in distantly supervised labels, encodes the labeled texts with a pre-trained language model (MedicalBERT), and uses the descriptions of entities in a medical knowledge base as background knowledge that provides supervision signals for relation extraction, improving the accuracy of entity semantic encoding. [Results] Compared with existing models, the proposed model improved Precision by up to 5.4 percentage points, Recall by up to 2.5 percentage points, and F1 by up to 4.1 percentage points. In addition, the F1-score for the complication relation reached 93.8%. [Limitations] The model mainly targets sentence-level relation extraction, and its performance on multi-sentence input has not yet been examined. [Conclusions] The proposed model extracts medical entity relationships effectively and can serve as a reference for research on medical relation extraction.
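
As a rough illustration of the distant-supervision and multi-instance steps named in the abstract, the sketch below aligns knowledge-base triples with raw sentences and groups the matches into bags keyed by triple. The sample triples, the sentences, and the helper name `build_bags` are hypothetical, and plain substring matching is an assumption standing in for whatever entity linking the authors actually used.

```python
from collections import defaultdict

# Hypothetical knowledge-base triples: (head entity, relation, tail entity).
KB_TRIPLES = [
    ("diabetes", "medication", "metformin"),
    ("pneumonia", "causes_symptom", "fever"),
]

# Hypothetical unlabeled sentences.
SENTENCES = [
    "Metformin is commonly prescribed for diabetes.",
    "A trial compared metformin with placebo in adults without diabetes.",
    "Pneumonia often presents with fever and cough.",
    "The ward was quiet today.",
]


def build_bags(triples, sentences):
    """Group sentences that mention both entities of a triple into one bag.

    Distant supervision assumes such sentences express the triple's relation.
    That assumption can fail for individual sentences, so the bag (not each
    sentence) is treated as the labeled unit: the multi-instance idea.
    """
    bags = defaultdict(list)
    for head, relation, tail in triples:
        for sentence in sentences:
            text = sentence.lower()
            if head in text and tail in text:
                bags[(head, relation, tail)].append(sentence)
    return dict(bags)


bags = build_bags(KB_TRIPLES, SENTENCES)
for key, bag in bags.items():
    print(key, len(bag))
```

The second sentence mentions both "diabetes" and "metformin" without asserting a medication relation, which is the kind of noisy label that bag-level training is meant to tolerate.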

Keywords: Medical Relation Extraction; Distant Supervision; Medical Domain-Specific Knowledge; Pre-Trained Language Model
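Received: 2021-10-28      Published: 2022-07-28
Chinese Library Classification (ZTFLH): G302; R-02
Funding: National Key Research and Development Program of China (2018YFC1314900); Key Research and Development Program of Jiangsu Province (BE2020721)
Corresponding author: Zhao Youlin, ORCID: 0000-0002-3028-437X     E-mail: sobzyl@hhu.edu.cn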
Cite this article:
Jing Shenqi, Zhao Youlin. Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision. Data Analysis and Knowledge Discovery, 2022, 6(6): 105-114.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1238      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I6/105
Fig.1  Illustration of piecewise convolution
Fig.2  Memory network encoding scheme
Fig.3  Triples in the medical knowledge base
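Predefined relation type    Head and tail entity types    Count
Causes symptom              (Disease, Symptom)            2,341
Complication                (Symptom, Symptom)            1,206
Treatment                   (Disease, Treatment)          532
Examination                 (Disease, Examination)        792
Medication                  (Disease, Drug)               1,638
Table 1  Relation categories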
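
One way to read Table 1 is as a typed relation schema. The sketch below encodes it as a Python dictionary and uses it to check whether a candidate triple has the expected head and tail entity types. The English identifiers and the helper name `is_valid_triple` are illustrative assumptions, not names from the paper.

```python
# Relation schema transcribed from Table 1:
# relation -> (head entity type, tail entity type, count).
# The English identifiers are illustrative translations.
RELATION_SCHEMA = {
    "causes_symptom": ("disease", "symptom", 2341),
    "complication": ("symptom", "symptom", 1206),
    "treatment": ("disease", "treatment", 532),
    "examination": ("disease", "examination", 792),
    "medication": ("disease", "drug", 1638),
}


def is_valid_triple(head_type, relation, tail_type):
    """Return True if the entity types match the schema for the relation."""
    if relation not in RELATION_SCHEMA:
        return False
    expected_head, expected_tail, _ = RELATION_SCHEMA[relation]
    return (head_type, tail_type) == (expected_head, expected_tail)


# Total number of instances and each relation's share of the total, in percent.
total = sum(count for _, _, count in RELATION_SCHEMA.values())
shares = {
    name: round(100 * count / total, 1)
    for name, (_, _, count) in RELATION_SCHEMA.items()
}
print(total)
print(shares)
```

A type check of this kind can discard distantly labeled candidates whose entity types cannot express the relation at all, before any model is trained.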
Fig.4  Distribution of medical sentence lengths
Model    P(%)    R(%)    F1(%)
MIML     81.5    89.8    85.4
PCNN     83.4    90.6    86.9
SeG      84.3    91.2    87.6
BPCMA    86.9    92.3    89.5
Table 2  Comparison of model performance
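
The maximum gains quoted in the abstract can be recovered from Table 2 as absolute differences between BPCMA and the weakest baseline on each metric. The sketch below is only an arithmetic check on the published figures; the function name `max_gains` is hypothetical.

```python
# (precision, recall, F1) in percent, transcribed from Table 2.
RESULTS = {
    "MIML": (81.5, 89.8, 85.4),
    "PCNN": (83.4, 90.6, 86.9),
    "SeG": (84.3, 91.2, 87.6),
    "BPCMA": (86.9, 92.3, 89.5),
}


def max_gains(results, proposed="BPCMA"):
    """Largest absolute gain of the proposed model over any baseline, per metric."""
    ours = results[proposed]
    baselines = [scores for name, scores in results.items() if name != proposed]
    return tuple(
        round(max(ours[i] - base[i] for base in baselines), 1)
        for i in range(3)
    )


print(max_gains(RESULTS))
```

All three maxima come from the comparison with MIML, and they are differences in percentage points rather than relative improvements.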
Relation category    P(%)    R(%)    F1(%)
Causes symptom       87.2    91.8    89.4
Complication         93.4    94.2    93.8
Treatment            79.3    83.5    81.3
Examination          77.9    80.6    79.2
Medication           92.6    93.7    93.1
Table 3  Performance of BPCMA on each relation category
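
The F1 column in Table 3 is consistent with the usual definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported P and R values; the English relation identifiers are illustrative translations.

```python
# (precision, recall, reported F1) in percent, transcribed from Table 3.
PER_RELATION = {
    "causes_symptom": (87.2, 91.8, 89.4),
    "complication": (93.4, 94.2, 93.8),
    "treatment": (79.3, 83.5, 81.3),
    "examination": (77.9, 80.6, 79.2),
    "medication": (92.6, 93.7, 93.1),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Print the recomputed F1 next to the reported value for each relation.
for name, (p, r, reported) in PER_RELATION.items():
    print(name, round(f1_score(p, r), 1), reported)
```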
Model                     P(%)    R(%)    F1(%)
No-MedicalBERT            82.4    88.2    85.2
No-entity description     85.3    90.5    87.8
No-memory network         84.2    90.3    87.1
No-attention mechanism    83.6    89.4    86.4
No-piecewise strategy     83.3    87.9    85.5
All                       86.9    92.3    89.5
Table 4  Ablation results
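
To make the ablation results easier to compare, the sketch below ranks the removed components by how much F1 falls relative to the full model. It only restates the Table 4 figures; the variable names are hypothetical.

```python
# F1 in percent for each ablated variant, transcribed from Table 4.
ABLATION_F1 = {
    "No-MedicalBERT": 85.2,
    "No-entity description": 87.8,
    "No-memory network": 87.1,
    "No-attention mechanism": 86.4,
    "No-piecewise strategy": 85.5,
}
FULL_MODEL_F1 = 89.5  # the "All" row

# F1 drop (percentage points) caused by removing each component.
drops = {name: round(FULL_MODEL_F1 - f1, 1) for name, f1 in ABLATION_F1.items()}

# Sort from the largest drop to the smallest.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name}: -{drop} F1 points")
```

On these figures, removing MedicalBERT costs the most F1 and removing the entity descriptions costs the least.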