Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (11): 93-102     https://doi.org/10.11925/infotech.2096-3467.2022.0196
Research Paper
GNN-MTB: An Anti-Mycobacterium Drug Virtual Screening Model Based on Graph Neural Network*
Gu Yaowen, Zheng Si, Yang Fengchun, Li Jiao
Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China
Abstract

[Objective] This study constructs and compares virtual screening models for anti-Mycobacterium tuberculosis drugs to support anti-tuberculosis drug development. [Methods] We propose GNN-MTB, a graph neural network model optimized with curriculum learning, for the virtual screening of Mycobacterium tuberculosis inhibitors. We then compiled a benchmark dataset for anti-tuberculosis drug screening from open-access databases and compared GNN-MTB with four conventional machine learning models and two graph neural network models on this benchmark. [Results] On 10,789 records, GNN-MTB achieved an AUC of 0.912 and an AUPR of 0.679, outperforming the conventional machine learning and graph neural network baselines (mean AUC 0.878 to 0.900; mean AUPR 0.600 to 0.673). The maximum relative improvements in mean AUC and AUPR were 3.872% and 13.167%, respectively. GNN-MTB is open source at https://github.com/gu-yaowen/GNN-MTB, and a prediction tool built on it is provided for anti-tuberculosis drug researchers. [Limitations] The study does not include analyses of drug sensitivity or strain drug resistance. [Conclusions] GNN-MTB performs well and could be explored for use in anti-tuberculosis drug development. The research framework may also serve as a reference for virtual screening of drugs for other diseases.
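As a rough illustration of the curriculum learning idea mentioned in the abstract, the sketch below feeds training samples from easy to hard using a square-root competence schedule of the kind used in competence-based curriculum learning. It is not the GNN-MTB training procedure: the function names, the starting competence `c0`, and the use of SMILES length as a difficulty score are all assumptions made for this example.

```python
import math
import random


def competence(step, total_steps, c0=0.1):
    """Fraction of the easiest samples made available at `step` (square-root schedule)."""
    return min(1.0, math.sqrt(step * (1.0 - c0 ** 2) / total_steps + c0 ** 2))


def curriculum_batches(samples, difficulty, total_steps, batch_size, seed=0):
    """Yield one batch per step, drawn from a pool that grows from easy to hard."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)  # easiest first
    for step in range(1, total_steps + 1):
        # Never let the pool shrink below one batch worth of samples.
        cutoff = max(batch_size, int(competence(step, total_steps) * len(ordered)))
        pool = ordered[:cutoff]
        yield rng.sample(pool, min(batch_size, len(pool)))


# Hypothetical usage: SMILES length stands in for a real difficulty measure.
smiles = [
    "CC(C)c1csc(C(=O)NN)n1",
    "COc1cc2ccc(=O)oc2cc1O",
    "NNC(=O)c1ccncc1",
    "Cc1sc(N)nc1C(=O)O",
    "NC(=O)c1cnccn1",
]
batches = list(curriculum_batches(smiles, difficulty=len, total_steps=5, batch_size=2))
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

Early batches are restricted to the shortest strings, and later batches sample from a progressively larger share of the dataset. A real molecular difficulty score would replace `len`.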

Key words: Graph Neural Network; Curriculum Learning; Mycobacterium Tuberculosis; Virtual Screening
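Received: 2022-03-09      Published: 2023-01-13
Chinese Library Classification (ZTFLH):  R961; P315
Funding: * Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2021-I2M-1-056); Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2018-I2M-AI-016); National Key Research and Development Program of China (2016YFC0901901)
Corresponding author: Li Jiao     E-mail: li.jiao@imicams.ac.cn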
Cite this article:
Gu Yaowen, Zheng Si, Yang Fengchun, Li Jiao. GNN-MTB: An Anti-Mycobacterium Drug Virtual Screening Model Based on Graph Neural Network. Data Analysis and Knowledge Discovery, 2022, 6(11): 93-102.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0196      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I11/93
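Fig.1  Workflow for constructing the anti-Mycobacterium tuberculosis drug virtual screening model
Fig.2  Proportion of each measurement endpoint in the anti-Mycobacterium tuberculosis dataset
Fig.3  Kernel density estimate of anti-Mycobacterium tuberculosis MIC values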
Compound (SMILES)         Label
CC(C)c1csc(C(=O)NN)n1     0
COc1cc2ccc(=O)oc2cc1O     0
NNC(=O)c1ccncc1           1
Cc1sc(N)nc1C(=O)O         1
NC(=O)c1cnccn1            0
Table 1  Example records from the anti-Mycobacterium tuberculosis benchmark dataset
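The record layout in Table 1 (a SMILES string followed by a binary label) can be read with a few lines of standard-library Python. This is a minimal sketch: the `parse_records` helper is hypothetical, and reading label 1 as "active" is an assumption that the table itself does not state.

```python
from collections import Counter

# Rows copied from Table 1: SMILES string, then a binary label.
TABLE_1 = """\
CC(C)c1csc(C(=O)NN)n1 0
COc1cc2ccc(=O)oc2cc1O 0
NNC(=O)c1ccncc1 1
Cc1sc(N)nc1C(=O)O 1
NC(=O)c1cnccn1 0
"""


def parse_records(text):
    """Split each non-empty line into (smiles, label); the label is the last field."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        smiles, label = line.rsplit(maxsplit=1)
        records.append((smiles, int(label)))
    return records


records = parse_records(TABLE_1)
label_counts = Counter(label for _, label in records)
print(label_counts)  # class balance of the five example rows
```

Splitting from the right keeps the parser safe even if a future column is added before the label, since SMILES strings contain no whitespace.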
Fig.4  Interface of the GNN-MTB-based anti-Mycobacterium tuberculosis drug virtual screening tool
Type                    Model      AUC             AUPR           F1 score
Machine learning        RF         0.897±0.011     0.634±0.012    0.620±0.015
Machine learning        SVM        0.894±0.008     0.647±0.022    0.624±0.010
Machine learning        MLP        0.896±0.011     0.649±0.013    0.614±0.015
Machine learning        GBDT       0.897±0.008     0.673±0.017    0.631±0.019
Graph neural network    GAT        0.900±0.013*    0.656±0.027    0.609±0.048
Graph neural network    MPNN       0.878±0.014     0.600±0.033    0.595±0.039
Graph neural network    GNN-MTB    0.912±0.010*    0.679±0.017    0.643±0.032
Table 2  Comparison of model performance
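The maximum improvements quoted in the abstract (3.872% for AUC, 13.167% for AUPR) can be reproduced from the mean values in Table 2 if they are read as relative gains over the weakest baseline. The sketch below makes that assumption explicit; the dictionary and helper names are introduced here for illustration only.

```python
# Mean AUC and AUPR per baseline, copied from Table 2.
BASELINES = {
    "RF": (0.897, 0.634),
    "SVM": (0.894, 0.647),
    "MLP": (0.896, 0.649),
    "GBDT": (0.897, 0.673),
    "GAT": (0.900, 0.656),
    "MPNN": (0.878, 0.600),
}
GNN_MTB = (0.912, 0.679)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


auc_gains = {name: relative_gain(GNN_MTB[0], auc) for name, (auc, _) in BASELINES.items()}
aupr_gains = {name: relative_gain(GNN_MTB[1], aupr) for name, (_, aupr) in BASELINES.items()}

worst_auc_model = max(auc_gains, key=auc_gains.get)
worst_aupr_model = max(aupr_gains, key=aupr_gains.get)
max_auc_gain = auc_gains[worst_auc_model]
max_aupr_gain = aupr_gains[worst_aupr_model]

print(f"max AUC gain:  {max_auc_gain:.3f}% over {worst_auc_model}")
print(f"max AUPR gain: {max_aupr_gain:.3f}% over {worst_aupr_model}")
```

Under this reading, both maxima come from the comparison with MPNN, the baseline with the lowest mean AUC and AUPR in the table.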
Fig.5  Comparison of ROC curves across models
Fig.6  Comparison of PR curves across models
Fig.7  Comparison of precision on the Top 300 predictions across models
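Type                    Model      AUC      AUPR     F1 score
Machine learning        RF         0.541    0.306    0.248
Machine learning        SVM        0.569    0.335    0.222
Machine learning        MLP        0.641    0.370    0.354
Machine learning        GBDT       0.545    0.305    0.122
Graph neural network    GAT        0.691    0.463    0.400
Graph neural network    MPNN       0.552    0.318    0.178
Graph neural network    GNN-MTB    0.683    0.462    0.526
Table 3  Comparison of model performance (external validation set)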
SMILES   True activity value   Label   Predicted probability
Cc1ccn2nc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2n1 57.340 0 0.912
Cc1cn2nc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2s1 0.060 1 0.866
Cc1nn2ccsc2c1C(=O)NCc1ccc(N2CCC(c3ccc(OC(F)(F)F)cc3)CC2)cc1 0.580 1 0.821
Cc1ccc2[nH]c(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2c1 19.190 0 0.771
Cc1nn2c(C)csc2c1C(=O)NCc1ccc(N2CCC(c3ccc(OC(F)(F)F)cc3)CC2)cc1 0.190 1 0.699
COc1nc2ccc(Br)cc2cc1[C@@H](c1ccccc1)[C@@](O)(CCN(C)C)c1cccc2ccccc12 0.450 1 0.682
Cc1c(-c2ccc(N3CCC(C(F)(F)F)CC3)cc2)[nH]c2cc(F)cc(F)c2c1=O 10.000 0 0.667
COc1ccc(CN2CC3(CCN(c4nc(=O)c5cc(C(F)(F)F)cc([N+](=O)[O-])c5s4)CC3)C2)cc1 0.110 1 0.650
COc1ccc(CN2CCC3(CCN(c4nc(=O)c5cc(C(F)(F)F)cc([N+](=O)[O-])c5s4)CC3)C2)cc1 0.110 1 0.648
O=c1nc(N2CCN(CC3CCCCC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.040 1 0.631
COc1ccc(COc2nc3ccc(Br)cc3cc2-c2ccc(CN(C)C)cc2)cc1 10.900 0 0.625
O=c1nc(N2CCC3(CCN(Cc4ccc(C(F)(F)F)cc4)CC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.040 1 0.622
COc1ccc(CN2CCC3(CC2)CCN(c2nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s2)CC3)cc1 1.010 0 0.620
COc1cccc(COc2nc3ccc(Br)cc3cc2-c2ccc(CN(C)C)cc2)c1 1.200 0 0.619
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2ccncc2)cc1 0.950 1 0.617
O=c1nc(N2CCC3(CCN(Cc4ccc(Br)cc4)CC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.060 1 0.613
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2ccc(Cl)cc2)cc1 1.100 0 0.600
Cc1ccc2oc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2c1 57.450 0 0.587
CCOC(=O)CN1CC2(CCN(c3nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s3)CC2)C1 0.440 1 0.586
COc1cc(CN2CCC3(CC2)CCN(c2nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s2)CC3)cc(OC)c1 0.820 1 0.575
O=c1nc(N2CCC3(CC2)CN(CC2CCCCC2)C3)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.470 1 0.574
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2c(F)cccc2F)cc1 3.100 0 0.570
O=c1nc(N2CCC3(CCN(Cc4ccc(F)cc4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 1.680 0 0.566
CN(C)c1cc2[nH]c(C3CCC(F)(F)CC3)nc2cc1NC(=O)c1ccc(OC(F)(F)F)cc1 2.590 0 0.564
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2cccc(F)c2)cc1 1.500 0 0.564
CCOC(=O)CN1CCC2(CCN(c3nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s3)CC2)C1 0.060 1 0.564
CN1CCC2(CC1)CCN(c1nc(=O)c3cc(C(F)(F)F)cc([N+](=O)[O-])c3s1)CC2 33.540 0 0.560
O=c1nc(N2CCC3(CCN(Cc4ccccc4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.060 1 0.559
O=c1nc(N2CCC3(CCN(CC4CCCCC4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.220 1 0.558
COc1nc2ccc(Br)cc2cc1-c1ccc(CN(C)C)cc1 15.900 0 0.546
Table 4  Top 30 predictions of the GNN-MTB model on the external validation set
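Since Table 4 lists compounds in descending order of predicted probability, its label column is enough to compute precision at several cutoffs, the same kind of Top-k measure shown for the Top 300 in Fig.7. The sketch below copies the 30 labels from the table; the `precision_at_k` helper is hypothetical, and reading label 1 as "active" is an assumption.

```python
# Labels from Table 4, in the table's order (highest predicted probability first).
TOP30_LABELS = [
    0, 1, 1, 0, 1, 1, 0, 1, 1, 1,
    0, 1, 0, 0, 1, 1, 0, 0, 1, 1,
    1, 0, 0, 0, 0, 1, 0, 1, 1, 0,
]


def precision_at_k(ranked_labels, k):
    """Share of positives among the first k entries of a ranked label list."""
    if not 0 < k <= len(ranked_labels):
        raise ValueError("k must be between 1 and the number of ranked items")
    return sum(ranked_labels[:k]) / k


for k in (10, 20, 30):
    print(f"precision@{k} = {precision_at_k(TOP30_LABELS, k):.3f}")
```

On these rows, 7 of the first 10, 12 of the first 20, and 16 of all 30 compounds carry label 1, so precision declines as the cutoff is relaxed.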
Fig.8  True activity values in the external validation set