Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (6): 148-160     https://doi.org/10.11925/infotech.2096-3467.2022.0535
Research Paper
Deep Learning Model of Drug Recommendation Based on Patient Similarity Analysis
Wu Jialun1, Zhang Ruonan2, Kang Wulin3, Yuan Puwei3
1School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
2Library of Xi’an Jiaotong University, Xi’an 710049, China
3Affiliated Hospital of Shaanxi University of Chinese Medicine, Xianyang 712046, China
Abstract

[Objective] This paper develops a deep learning model that accurately predicts drug combinations by parsing structured time-series healthcare data and analyzing patient similarity. [Methods] The model learns comprehensive patient representations by parsing structured time-series data through two attention mechanisms. It then computes patient similarity to enrich these representations and casts drug recommendation as a multi-label learning problem. [Results] We evaluated the model on the MIMIC-III dataset. Compared with the best existing drug recommendation model, the proposed model reduced the DDI rate by 1.09 percentage points and improved Jaccard similarity, PRAUC, and F1-score by 2.38, 1.40, and 1.08 percentage points, respectively. [Limitations] The model does not yet incorporate domain-specific prior knowledge from fields such as biomedicine, and we did not examine in depth the noise in the data or the problems it may cause in clinical application. [Conclusions] The proposed model learns comprehensive patient representations accurately and improves both the safety and the accuracy of drug recommendation.

Keywords: Drug Recommendation; Electronic Health Records; Patient Representation Learning; Deep Learning
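The abstract states that drug recommendation is cast as a multi-label learning problem but gives no formulation on this page. The following is a minimal sketch of one common formulation, assuming an independent sigmoid output per drug, a 0.5 decision threshold, and a binary cross-entropy loss; none of these choices is confirmed by the source, and the logits are invented for illustration. The ATC codes are taken from Table 6 only to make the example concrete.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_drug_set(logits, vocabulary, threshold=0.5):
    """Keep every drug whose predicted probability reaches the threshold.

    The threshold value is an assumption of this sketch.
    """
    return {drug for drug, z in zip(vocabulary, logits) if sigmoid(z) >= threshold}

def bce_loss(logits, targets):
    """Mean binary cross-entropy over all drug labels (assumed loss)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)

# Toy example: four candidate drugs with invented scores and labels.
vocabulary = ["A01AD", "A02BA", "B05CX", "C01CA"]
logits = [2.0, -1.0, 0.3, -3.0]
targets = [1, 0, 1, 0]

predicted = predict_drug_set(logits, vocabulary)
loss = bce_loss(logits, targets)
print(sorted(predicted), round(loss, 4))
```

Treating each drug as its own binary label lets the model return a set of any size, which matches the variable number of drugs per visit reported in Table 2.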
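Received: 2022-05-25      Published: 2022-11-09
CLC Number: TP391; G35
Funding: * 2021 Key R&D Program, University Joint Key Project of the Shaanxi Provincial Department of Science and Technology (2021GXLH-Z-095); 2021 Scientific Research Program "Special Project for Serving Local Needs" of the Shaanxi Provincial Department of Education (21JC010); Shaanxi Provincial Department of Education project "Shaanxi University Engineering Research Center for Translational Medicine in the Integrated Traditional Chinese and Western Medicine Prevention and Treatment of Degenerative Bone Diseases" (Shaan Jiao Ji Ban [2021] No. 10)
Corresponding author: Yuan Puwei, ORCID: 0000-0001-7916-8823, E-mail: spine_surgeon@163.com.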
Cite this article:
Wu Jialun, Zhang Ruonan, Kang Wulin, Yuan Puwei. Deep Learning Model of Drug Recommendation Based on Patient Similarity Analysis[J]. Data Analysis and Knowledge Discovery, 2023, 7(6): 148-160.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0535      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I6/148
Fig. 1  Definition of the drug recommendation task
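| Symbol | Definition |
|---|---|
| $x_i^t$ | Medical record of the $i$-th patient's $t$-th admission |
| $d_i^t$ | Diagnosis record of the $i$-th patient's $t$-th admission |
| $p_i^t$ | Procedure record of the $i$-th patient's $t$-th admission |
| $m_i^t$ | Medication record of the $i$-th patient's $t$-th admission |
| $X_i$ | Complete medical history of the $i$-th patient |
| $X_i^{1:t}$ | Medical history of the $i$-th patient from the first to the $t$-th admission |
| $de_i^t, pe_i^t, me_i^t$ | Embedding vectors of the medical records (diagnoses, procedures, medications) |
| $\alpha_i^{(t)}$ | Code-level attention for the $i$-th patient's $t$-th admission |
| $\beta_i^{(t)}$ | Visit-level attention for the $i$-th patient's $t$-th admission |
| $q_i^t$ | Integrated representation of the $i$-th patient's entire medical history |
| $s_{i,j}^{t,k}$ | Similarity between the patient representations $q_i^t$ and $q_j^k$ |
| $S_n$ | Set of similar medical history sequences |
| $\hat{m}_i^t$ | Predicted drug combination for the $i$-th patient |
| $m_i^t$ | True drug combination for the $i$-th patient |

Table 1  Notation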
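The notation defines a similarity between the patient representations q_i^t and q_j^k and a set S_n of similar medical history sequences, but this page does not say how either is computed. The sketch below shows one plausible reading, assuming cosine similarity as the measure and a simple top-n selection for S_n. Both choices, the function names, and the toy vectors are assumptions made for illustration, not the authors' method.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def top_n_similar(query, candidates, n):
    """Return the keys of the n candidates most similar to `query`.

    `candidates` maps a hypothetical (patient, visit) key to its
    representation vector.
    """
    scored = [(cosine_similarity(query, q), key) for key, q in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:n]]

# Toy representations, invented for illustration only.
q_it = [1.0, 0.0, 1.0]
history = {
    ("j1", 1): [1.0, 0.0, 0.9],
    ("j2", 3): [0.0, 1.0, 0.0],
    ("j3", 2): [0.5, 0.5, 0.5],
}

S_n = top_n_similar(q_it, history, n=2)
print(S_n)
```

Here n plays the role of the candidate-set size that Table 5 varies from 6 to 14.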
Fig. 2  Architecture of the MedSim model
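| Item | Count |
|---|---|
| Patients | 6,350 |
| Clinical events | 15,031 |
| Diagnoses | 1,958 |
| Procedures | 1,430 |
| Drugs (ATC 3) | 132 |
| Drugs (ATC 4) | 266 |
| Visits (mean/total) | 2.37/29 |
| Diagnoses per visit (mean/total) | 10.51/128 |
| Procedures per visit (mean/total) | 3.84/50 |
| Drugs per visit, ATC 3 (mean/total) | 11.18/64 |
| Drugs per visit, ATC 4 (mean/total) | 11.87/81 |
| Number of DDI types | 50 |

Table 2  Dataset statistics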
| Code | Model | DDI rate | Jaccard similarity | F1 | PRAUC | Avg. number of drugs |
|---|---|---|---|---|---|---|
| ATC3 | LR | 0.0881±0.0008 | 0.4966±0.0028 | 0.6335±0.0024 | 0.7300±0.0032 | 15.9904±0.0990 |
| ATC3 | RETAIN | 0.0933±0.0015 | 0.4978±0.0028 | 0.6372±0.0025 | 0.7483±0.0030 | 18.7424±0.0738 |
| ATC3 | LEAP | 0.0945±0.0004 | 0.4804±0.0026 | 0.6218±0.0026 | 0.7463±0.0030 | 18.7424±0.0738 |
| ATC3 | DMNC | 0.0881±0.0004 | 0.4954±0.0020 | 0.6373±0.0027 | 0.7359±0.0038 | 20.0000±0.0000 |
| ATC3 | GAMENet | 0.0879±0.0007 | 0.4995±0.0023 | 0.6438±0.0023 | 0.7491±0.0026 | 20.7939±0.0628 |
| ATC3 | MedSim | 0.0761±0.0009 | 0.5118±0.0023 | 0.6579±0.0023 | 0.7563±0.0030 | 17.4560±0.0686 |
| ATC4 | LR | 0.0895±0.0007 | 0.3826±0.0024 | 0.5313±0.0023 | 0.6271±0.0017 | 14.3325±0.1128 |
| ATC4 | RETAIN | 0.0992±0.0017 | 0.3920±0.0037 | 0.5535±0.0041 | 0.6227±0.0047 | 16.6319±0.1632 |
| ATC4 | LEAP | 0.1090±0.0010 | 0.3821±0.0029 | 0.5318±0.0034 | 0.5943±0.0052 | 18.9981±0.0604 |
| ATC4 | DMNC | 0.1036±0.0004 | 0.3943±0.0021 | 0.5434±0.0023 | 0.6194±0.0030 | 20.0000±0.0000 |
| ATC4 | GAMENet | 0.0794±0.0008 | 0.3917±0.0036 | 0.5525±0.0038 | 0.6183±0.0041 | 19.3491±0.1041 |
| ATC4 | MedSim | 0.0685±0.0003 | 0.4155±0.0024 | 0.5665±0.0024 | 0.6291±0.0028 | 20.0764±0.0833 |

Table 3  Model performance comparison
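Table 3 reports the DDI rate, Jaccard similarity, and F1, but this page does not give their formulas. The sketch below uses the set-based definitions that are common in the medication recommendation literature, computed for a single visit: Jaccard as intersection over union, F1 as the harmonic mean of precision and recall, and DDI rate as the share of recommended drug pairs that appear in a known-interaction list. These definitions are assumed rather than confirmed by the source, and the drug labels and the interaction pair are invented.

```python
from itertools import combinations

def jaccard(true_set, pred_set):
    """Size of the intersection divided by the size of the union."""
    union = true_set | pred_set
    return len(true_set & pred_set) / len(union) if union else 0.0

def f1_score(true_set, pred_set):
    """Harmonic mean of set precision and set recall."""
    hits = len(true_set & pred_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(true_set)
    return 2 * precision * recall / (precision + recall)

def ddi_rate(pred_set, ddi_pairs):
    """Fraction of recommended drug pairs that are known interactions.

    `ddi_pairs` is a set of frozensets, one per interacting pair.
    """
    pairs = list(combinations(sorted(pred_set), 2))
    if not pairs:
        return 0.0
    bad = sum(1 for pair in pairs if frozenset(pair) in ddi_pairs)
    return bad / len(pairs)

# Toy visit: labels and the interaction pair are invented.
truth = {"A", "B", "C", "D"}
pred = {"B", "C", "D", "E"}
known_ddi = {frozenset(("B", "E"))}

visit_jaccard = jaccard(truth, pred)
visit_f1 = f1_score(truth, pred)
visit_ddi = ddi_rate(pred, known_ddi)
print(visit_jaccard, visit_f1, visit_ddi)
```

In a full evaluation these per-visit values would presumably be averaged over all test visits, which this sketch leaves out.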
| γ | DDI rate | Jaccard similarity | F1 | PRAUC | Avg. number of drugs |
|---|---|---|---|---|---|
| 0.00 | 0.0108±0.0003 | 0.4097±0.0021 | 0.5602±0.0023 | 0.6120±0.0029 | 19.0599±0.0579 |
| 0.01 | 0.0225±0.0004 | 0.4099±0.0026 | 0.5611±0.0026 | 0.6109±0.0025 | 19.1403±0.0797 |
| 0.02 | 0.0243±0.0003 | 0.4110±0.0022 | 0.5617±0.0022 | 0.6179±0.0025 | 19.1146±0.0931 |
| 0.03 | 0.0370±0.0002 | 0.4131±0.0022 | 0.5642±0.0023 | 0.6221±0.0025 | 19.3520±0.1137 |
| 0.04 | 0.0407±0.0003 | 0.4136±0.0018 | 0.5650±0.0018 | 0.6279±0.0022 | 19.8272±0.1058 |
| 0.05 | 0.0544±0.0002 | 0.4145±0.0019 | 0.5661±0.0019 | 0.6280±0.0028 | 20.1658±0.0905 |
| 0.06 | 0.0685±0.0003 | 0.4155±0.0024 | 0.5665±0.0024 | 0.6291±0.0028 | 20.0764±0.0833 |
| 0.07 | 0.0744±0.0004 | 0.4180±0.0022 | 0.5687±0.0021 | 0.6333±0.0023 | 20.5557±0.0982 |
| 0.08 | 0.0815±0.0004 | 0.4193±0.0028 | 0.5903±0.0028 | 0.6364±0.0026 | 20.8908±0.1207 |

Table 4  Model performance under different DDI rates γ
| n | DDI rate | Jaccard similarity | F1 | PRAUC | Avg. number of drugs |
|---|---|---|---|---|---|
| 6 | 0.0662±0.0003 | 0.4081±0.0021 | 0.5596±0.0022 | 0.6240±0.0023 | 20.5853±0.1050 |
| 8 | 0.0670±0.0003 | 0.4153±0.0023 | 0.5661±0.0024 | 0.6287±0.0029 | 19.2146±0.0636 |
| 10 | 0.0685±0.0003 | 0.4155±0.0024 | 0.5665±0.0024 | 0.6291±0.0028 | 20.0764±0.0833 |
| 12 | 0.0677±0.0004 | 0.4142±0.0021 | 0.5652±0.0021 | 0.6281±0.0023 | 19.3518±0.1138 |
| 14 | 0.0679±0.0003 | 0.4140±0.0026 | 0.5645±0.0026 | 0.6273±0.0030 | 18.0451±0.0588 |

Table 5  Model performance with candidate sets of different sizes
| Case | Model | DDI rate | Recommended drug combination |
|---|---|---|---|
| Case 1 | Prescription | 0.1263 | A01AD, A02BA, B05CX, C01CA, M01AB, N01AX, C07AB, N02BE, C03CA, N07AA, A02BC, C10AA, A06AD, A12BA, J01DB, C01DA, A01AB, A03FA, A02AA, N06A |
| Case 1 | LEAP | 0.1052 | 15 correct: A01AD, A02BA, B05CX, C01CA, M01AB, N01AX, C07AB, N02BE, C03CA, N07AA, A02BC, C10AA, A06AD, A12BA, J01DB. 6 incorrect: A07AA, A02BX, N02AA, B01AB, C01BD(2), J01DH |
| Case 1 | GAMENet | 0.0908 | 17 correct: A01AD, A02BA, B05CX, C01CA, M01AB, N01AX, C07AB, N02BE, C03CA, N07AA, A02BC, C10AA, A06AD, A12BA, J01DB, C01DA, A03FA. 6 incorrect: A07AA, A02BX, N02AA, B01AB, H03AA, J01DH |
| Case 1 | MedSim | 0.0871 | 19 correct: A01AD, A02BA, B05CX, C01CA, M01AB, N01AX, C07AB, N02BE, C03CA, N07AA, A02BC, C10AA, A06AD, A12BA, J01DB, A01AB, A03FA, A02AA, N06AX. 1 incorrect: A02B |
| Case 2 | Prescription | 0.0977 | A07AA, N01AX, N02BE, B05CX, N03AX, A06AD, A12BA, A01AB, B03BB, N07BA, A04AA, C01EB, D04AA, J01DD, N05CF, A01A |
| Case 2 | LEAP | 0.0857 | 8 correct: A07AA, B05CX, N03AX, A06AD, A12BA, A01AB, B03BB, N07BA. 7 incorrect: A01AD(1), A02BA, N02AA, B01AB, C02DB, A02BC, N03AB(2) |
| Case 2 | GAMENet | 0.0792 | 8 correct: A07AA, B05CX, N03AX, A06AD, A12BA, A01AB, B03BB, N07BA. 7 incorrect: A01AD(1), A02BA, N02AA, B01AB, C02DB, R06AX(1), J01XX |
| Case 2 | MedSim | 0.0761 | 8 correct: A07AA, N02BE, B05CX, N03AX, A06AD, A12BA, B03BB, A04AA. 6 incorrect: A02BA, N02AA, N06AX, B01AB, N05BA(1), A02BC |

Table 6  Case study
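The correct and incorrect counts in Table 6 can be turned into per-case set-overlap scores. The sketch below does this for Case 1, assuming that the prescription contains the 20 codes listed, that every "correct" drug is in the prescription, and that every "incorrect" drug is not. The helper name and the choice of Jaccard and F1 as summary scores are assumptions of this sketch; the counts come from the table.

```python
def scores_from_counts(n_prescribed, n_correct, n_incorrect):
    """Derive set-overlap scores from the counts shown in a case-study row."""
    n_predicted = n_correct + n_incorrect
    # |truth ∪ prediction| = |truth| + |prediction| - |truth ∩ prediction|
    union = n_prescribed + n_predicted - n_correct
    precision = n_correct / n_predicted
    recall = n_correct / n_prescribed
    return {
        "jaccard": n_correct / union,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Case 1: (correct, incorrect) counts per model, read from Table 6.
case1_counts = {"LEAP": (15, 6), "GAMENet": (17, 6), "MedSim": (19, 1)}
N_PRESCRIBED_CASE1 = 20  # number of codes listed in the prescription row

case1_scores = {
    model: scores_from_counts(N_PRESCRIBED_CASE1, ok, bad)
    for model, (ok, bad) in case1_counts.items()
}
for model, s in case1_scores.items():
    print(model, round(s["jaccard"], 3), round(s["f1"], 3))
```

Under these assumptions MedSim's Case 1 recommendation overlaps the prescription in 19 of 21 distinct drugs, against 15 of 26 for LEAP and 17 of 26 for GAMENet.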