Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (8): 76-85     https://doi.org/10.11925/infotech.2096-3467.2021.0233
Research Paper
Predicting Drug ADMET Properties Based on Graph Attention Network*
Gu Yaowen1, Zhang Bowen2, Zheng Si1, Yang Fengchun1, Li Jiao1
1Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China
2XtalPi AI Research Center, Beijing 100089, China
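Abstract

[Objective] This study models the metabolism and toxicity properties among drugs' absorption, distribution, metabolism, excretion and toxicity (ADMET) properties, for use in evaluating drug properties during virtual screening. [Methods] We propose a graph attention network (GAT) for building drug ADMET prediction models. Using drug ADMET data from open databases and the scientific literature, we construct molecular graphs that serve as molecular structural features. We then compare the proposed model with three machine learning models and two conventional graph neural network models. [Results] We collected and integrated 9 ADMET datasets with a total of 149 457 records. The GAT-based ADMET prediction model achieved an average accuracy of 0.825 and an average F1-Score of 0.672 across the 9 datasets. Compared with the machine learning and graph neural network baselines, the proposed method improved average accuracy and average F1-Score by up to 6.4% and 26.0%, respectively. [Limitations] The data cleaning steps could be refined further, and prediction performance could be improved with a better pre-training strategy. [Conclusions] The proposed graph attention network model performs well in classifying drug ADMET properties. It can be applied in virtual drug screening workflows and provides a reference for computer-aided drug design and drug discovery.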

Key words: Graph Neural Network; Graph Attention Network; Multi-source Heterogeneous Data; ADMET; Virtual Screening
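Received: 2021-03-08      Published: 2021-09-15
Chinese Library Classification (CLC) number: R961
Funding: *National Natural Science Foundation of China (81601573); National Key Research and Development Program of China (2016YFC0901901)
Corresponding author: Li Jiao  ORCID: 0000-0001-6391-8343  E-mail: li.jiao@imicams.ac.cn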
Cite this article:
Gu Yaowen, Zhang Bowen, Zheng Si, Yang Fengchun, Li Jiao. Predicting Drug ADMET Properties Based on Graph Attention Network. Data Analysis and Knowledge Discovery, 2021, 5(8): 76-85.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0233      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I8/76
Fig.1  Workflow for building the ADMET prediction model based on the graph attention network
NAME SMILES VALUE
CHEMBL274189 [Cl-].[H]c1c(OC([H])([H])[H])c(OC([H])([H])[H])c([H])c2c1-c1c([H])c3c([H])c([H])c(OC([H])([H])[H])c(OC([H])([H])[H])c3c([H])[n+]1C([H])([H])C2([H])[H] 0
CHEMBL12089 [Cl-].[H]c1c2c(c([H])c3c1-c1c([H])c4c([H])c([H])c(OC([H])([H])[H])c(OC([H])([H])[H])c4c([H])[n+]1C([H])([H])C3([H])[H])OC([H])([H])O2 0
CHEMBL3353920 [H]/C(=C(C(/[H])=C(\[H])C(=O)[O-])\C([H])([H])[H])[C@]([H])(C(=O)c1c([H])c([H])c(N(C([H])([H])[H])C([H])([H])[H])c([H])c1[H])C([H])([H])[H] 0
CHEMBL67 [H]/C(=C(\[H])c1c([H])c(OC([H])([H])[H])c(OC([H])([H])[H])c(OC([H])([H])[H])c1[H])c1c([H])c([H])c(OC([H])([H])[H])c([O-])c1[H] 1
Table 1  Example entries from an ADMET dataset (LO2 toxicity)
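Table 1 stores each compound as a SMILES string, and the abstract states that molecular graphs serve as the structural features. Real pipelines usually do this conversion with a cheminformatics toolkit such as RDKit (`Chem.MolFromSmiles`).

The stdlib-only sketch below is a hypothetical toy that shows the idea on the bracket-free "organic subset" of SMILES. It does not handle bracket atoms, charges, explicit hydrogens, stereochemistry or dot-separated salts, so it would reject the bracketed SMILES in Table 1. Aromatic bonds are recorded with order 1 as a simplification. The helper names `smiles_to_graph` and `to_neighbors` are illustrative.

```python
import re

# Bracket-free "organic subset" tokens; two-letter halogens are matched first.
TOKEN_RE = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]|[()=#]|%\d{2}|\d")


def smiles_to_graph(smiles):
    """Toy parser: return (atom symbols, [(i, j, bond_order), ...])."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax: " + smiles)
    atoms, edges = [], []
    branch_stack = []      # atoms to return to when ')' closes a branch
    open_rings = {}        # ring-closure label -> (atom index, bond order)
    prev, bond = None, 1   # previous atom and order of the next bond
    for tok in tokens:
        if tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            if not branch_stack:
                raise ValueError("unbalanced ')' in " + smiles)
            prev = branch_stack.pop()
        elif tok == "=":
            bond = 2
        elif tok == "#":
            bond = 3
        elif tok.isdigit() or tok.startswith("%"):
            if prev is None:
                raise ValueError("ring label before any atom in " + smiles)
            if tok in open_rings:
                j, order = open_rings.pop(tok)
                edges.append((j, prev, max(order, bond)))
            else:
                open_rings[tok] = (prev, bond)
            bond = 1
        else:  # an atom symbol
            atoms.append(tok)
            if prev is not None:
                edges.append((prev, len(atoms) - 1, bond))
            prev, bond = len(atoms) - 1, 1
    if branch_stack or open_rings:
        raise ValueError("unclosed branch or ring in " + smiles)
    return atoms, edges


def to_neighbors(n_atoms, edges):
    """Undirected adjacency lists, the form a graph neural network consumes."""
    nbrs = {i: [] for i in range(n_atoms)}
    for i, j, _ in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


atoms, edges = smiles_to_graph("CC(=O)O")  # acetic acid
print(atoms, edges)
print(to_neighbors(len(atoms), edges))
```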
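Data type | ADMET property | Description | Samples | Positive samples | Negative samples
Metabolism | CYP450 1A2 inhibitor | Inhibition of the cytochrome P450 1A2 isoform | 21 566 | 10 376 | 11 190
Metabolism | CYP450 2C9 inhibitor | Inhibition of the cytochrome P450 2C9 isoform | 21 763 | 5 422 | 16 341
Metabolism | CYP450 2C19 inhibitor | Inhibition of the cytochrome P450 2C19 isoform | 22 255 | 7 809 | 14 446
Metabolism | CYP450 2D6 inhibitor | Inhibition of the cytochrome P450 2D6 isoform | 22 470 | 4 542 | 17 928
Metabolism | CYP450 3A4 inhibitor | Inhibition of the cytochrome P450 3A4 isoform | 24 066 | 8 782 | 15 284
Toxicity | hERG | hERG potassium channel inhibition (cardiotoxicity) | 6 596 | 4 570 | 2 026
Toxicity | Ames | Mutagenicity | 12 970 | 7 242 | 5 728
Toxicity | LO2 | LO2 cytotoxicity | 501 | 94 | 407
Toxicity | HEK293 | HEK293 cytotoxicity | 17 270 | 2 445 | 14 825
Table 2  Description of the ADMET datasets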
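Table 2's counts can be checked mechanically. The sketch below uses counts typed in from the table; the names `DATASETS` and `summarize` are illustrative. It does three things:

- Confirms that positives plus negatives equal each dataset's total.
- Confirms that the nine totals add up to the 149 457 records reported in the abstract.
- Computes each dataset's positive rate.

The rates range from about 0.14 (HEK293) to about 0.69 (hERG). This imbalance is why accuracy and F1-Score can diverge sharply in Tables 3 and 4.

```python
# (total, positive, negative) sample counts copied from Table 2.
DATASETS = {
    "CYP450 1A2 inhibitor": (21566, 10376, 11190),
    "CYP450 2C9 inhibitor": (21763, 5422, 16341),
    "CYP450 2C19 inhibitor": (22255, 7809, 14446),
    "CYP450 2D6 inhibitor": (22470, 4542, 17928),
    "CYP450 3A4 inhibitor": (24066, 8782, 15284),
    "hERG": (6596, 4570, 2026),
    "Ames": (12970, 7242, 5728),
    "LO2": (501, 94, 407),
    "HEK293": (17270, 2445, 14825),
}


def summarize(datasets):
    """Check each row's class counts; return (grand total, positive rate per dataset)."""
    rates = {}
    for name, (total, pos, neg) in datasets.items():
        if pos + neg != total:
            raise ValueError(name + ": positive + negative counts do not match the total")
        rates[name] = pos / total
    grand_total = sum(total for total, _, _ in datasets.values())
    return grand_total, rates


grand_total, positive_rate = summarize(DATASETS)
print("total records:", grand_total)
for name, rate in sorted(positive_rate.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} positive rate {rate:.3f}")
```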
Fig.2  Chemical space distribution (based on t-SNE)
Model (F1-Score / Accuracy) | P450 1A2 | P450 2C9 | P450 2C19 | P450 2D6 | P450 3A4
RF | 0.771 / 0.792 | 0.535 / 0.829 | 0.685 / 0.779 | 0.437 / 0.853 | 0.676 / 0.826
KNN | 0.686 / 0.737 | 0.451 / 0.790 | 0.570 / 0.749 | 0.441 / 0.842 | 0.567 / 0.782
LR | 0.729 / 0.756 | 0.602 / 0.824 | 0.669 / 0.772 | 0.514 / 0.833 | 0.689 / 0.811
GCN | 0.754 / 0.773 | 0.658 / 0.820 | 0.723 / 0.776 | 0.580 / 0.806 | 0.741 / 0.845
MPNN | 0.755 / 0.781 | 0.648 / 0.832 | 0.712 / 0.794 | 0.584 / 0.853 | 0.726 / 0.822
Proposed model (GAT) | 0.778 / 0.799 | 0.670 / 0.840 | 0.725 / 0.787 | 0.585 / 0.855 | 0.748 / 0.844
Table 3  Performance of the drug metabolism prediction models
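Tables 3 and 4 report F1-Score and Accuracy without defining them. Assuming the usual binary definitions, with the positive class (e.g. "inhibitor") as the target, both follow from a confusion matrix. The function name `binary_metrics` is illustrative.

Applied to the CYP450 2D6 counts in Table 2, a trivial classifier that labels every compound negative already reaches an accuracy of about 0.798, yet its F1-Score is 0. That puts the 2D6 accuracies of 0.806 to 0.855 in perspective.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN);
    # it is 0 when the classifier finds no true positives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if tp else 0.0
    return accuracy, f1


# A toy confusion matrix.
print(binary_metrics(tp=8, fp=2, fn=4, tn=86))

# All-negative predictions on CYP450 2D6 (4 542 positives, 17 928 negatives in Table 2).
acc_2d6, f1_2d6 = binary_metrics(tp=0, fp=0, fn=4542, tn=17928)
print(f"all-negative baseline on 2D6: accuracy {acc_2d6:.3f}, F1 {f1_2d6:.3f}")
```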
Model (F1-Score / Accuracy) | hERG | Ames | LO2 | HEK293
RF | 0.868 / 0.803 | 0.555 / 0.687 | 0.545 / 0.851 | 0.277 / 0.908
KNN | 0.843 / 0.773 | 0.342 / 0.603 | 0.556 / 0.842 | 0.348 / 0.908
LR | 0.838 / 0.773 | 0.599 / 0.639 | 0.579 / 0.842 | 0.258 / 0.888
GCN | 0.864 / 0.808 | 0.674 / 0.689 | 0.585 / 0.832 | 0.262 / 0.902
MPNN | 0.841 / 0.766 | 0.726 / 0.752 | 0.370 / 0.495 | 0.301 / 0.884
Proposed model (GAT) | 0.872 / 0.829 | 0.676 / 0.709 | 0.588 / 0.861 | 0.409 / 0.901
Table 4  Performance of the drug toxicity prediction models
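The abstract's summary figures can be reproduced from Tables 3 and 4 under two assumptions:

- The averages are unweighted means over the nine datasets.
- The "up to" improvements are relative (not absolute) gains over the weakest baseline on each metric.

Under these assumptions, the sketch below gives GAT an average accuracy of 0.825 and an average F1-Score of 0.672. The largest gains come out at 6.4% in accuracy (over MPNN) and 26.0% in F1-Score (over KNN). GAT does not lead every column (MPNN is ahead on Ames, for example), so these are gains on the averages. The variable names are illustrative.

```python
# (F1-Score, Accuracy) per dataset, copied from Tables 3 and 4, in the order
# P450 1A2, 2C9, 2C19, 2D6, 3A4, hERG, Ames, LO2, HEK293.
RESULTS = {
    "RF": [(0.771, 0.792), (0.535, 0.829), (0.685, 0.779), (0.437, 0.853), (0.676, 0.826),
           (0.868, 0.803), (0.555, 0.687), (0.545, 0.851), (0.277, 0.908)],
    "KNN": [(0.686, 0.737), (0.451, 0.790), (0.570, 0.749), (0.441, 0.842), (0.567, 0.782),
            (0.843, 0.773), (0.342, 0.603), (0.556, 0.842), (0.348, 0.908)],
    "LR": [(0.729, 0.756), (0.602, 0.824), (0.669, 0.772), (0.514, 0.833), (0.689, 0.811),
           (0.838, 0.773), (0.599, 0.639), (0.579, 0.842), (0.258, 0.888)],
    "GCN": [(0.754, 0.773), (0.658, 0.820), (0.723, 0.776), (0.580, 0.806), (0.741, 0.845),
            (0.864, 0.808), (0.674, 0.689), (0.585, 0.832), (0.262, 0.902)],
    "MPNN": [(0.755, 0.781), (0.648, 0.832), (0.712, 0.794), (0.584, 0.853), (0.726, 0.822),
             (0.841, 0.766), (0.726, 0.752), (0.370, 0.495), (0.301, 0.884)],
    "GAT": [(0.778, 0.799), (0.670, 0.840), (0.725, 0.787), (0.585, 0.855), (0.748, 0.844),
            (0.872, 0.829), (0.676, 0.709), (0.588, 0.861), (0.409, 0.901)],
}


def mean_scores(pairs):
    """Unweighted mean (F1, accuracy) over the datasets."""
    n = len(pairs)
    return sum(f1 for f1, _ in pairs) / n, sum(acc for _, acc in pairs) / n


means = {model: mean_scores(pairs) for model, pairs in RESULTS.items()}
gat_f1, gat_acc = means["GAT"]
baselines = {m: v for m, v in means.items() if m != "GAT"}

# The largest relative gain is the gain over the weakest baseline on each metric.
weakest_acc = min(baselines, key=lambda m: baselines[m][1])
weakest_f1 = min(baselines, key=lambda m: baselines[m][0])
gain_acc = gat_acc / baselines[weakest_acc][1] - 1
gain_f1 = gat_f1 / baselines[weakest_f1][0] - 1

print(f"GAT mean accuracy {gat_acc:.3f}, mean F1 {gat_f1:.3f}")
print(f"max accuracy gain {gain_acc:.1%} over {weakest_acc}")
print(f"max F1 gain {gain_f1:.1%} over {weakest_f1}")
```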