Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 11: 93-102. DOI: 10.11925/infotech.2096-3467.2022.0196
GNN-MTB: An Anti-Mycobacterium Drug Virtual Screening Model Based on Graph Neural Network
Gu Yaowen, Zheng Si, Yang Fengchun, Li Jiao
Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China
Abstract  

[Objective] This study constructs a virtual screening model for anti-tuberculosis drugs to support the research and development of new medicines. [Methods] We proposed GNN-MTB, a graph neural network model optimized with curriculum learning for the virtual screening of anti-tuberculosis inhibitors. We then built a benchmark dataset of anti-tuberculosis compounds from open-access databases. Finally, we compared GNN-MTB with four classic machine learning models and two graph neural network models on this benchmark dataset of 10,789 records. [Results] GNN-MTB achieved an AUC of 0.912 and an AUPR of 0.679, higher than those of all six baseline models. Its largest relative improvements, over MPNN, were 3.872% in AUC and 13.167% in AUPR. GNN-MTB is open source and available at https://github.com/gu-yaowen/GNN-MTB. [Limitations] The model does not yet incorporate data on drug sensitivity or bacterial resistance. [Conclusions] GNN-MTB can support anti-tuberculosis drug screening, and the same approach could be used to build virtual screening models for other diseases.
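Curriculum learning presents easier training samples first and gradually admits harder ones. This page does not detail GNN-MTB's curriculum, so the following is only a generic sketch of a competence-based scheduler in the style of Platanios et al. (2019). It assumes every training molecule already carries a scalar difficulty score (how that score is computed is left open), and the names `competence`, `curriculum_batch`, `c0` and `total_steps` are hypothetical.

```python
import math
import random

def competence(step, total_steps, c0=0.1):
    """Square-root competence schedule: the fraction of the easiest-first
    training set that may be sampled at `step`, rising from c0 to 1.0."""
    if step >= total_steps:
        return 1.0
    return min(1.0, math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))

def curriculum_batch(samples, difficulties, step, total_steps, batch_size, rng=random):
    """Draw a batch only from the currently unlocked (easiest) part of the data."""
    # Rank sample indices from easiest to hardest by their precomputed difficulty.
    order = sorted(range(len(samples)), key=lambda i: difficulties[i])
    # Number of samples unlocked at this step; at least one is always available.
    n_available = max(1, int(competence(step, total_steps) * len(samples)))
    pool = [samples[i] for i in order[:n_available]]
    # Sample with replacement from the unlocked pool.
    return [rng.choice(pool) for _ in range(batch_size)]
```

Early in training only the easiest molecules are drawn; once `step` reaches `total_steps` the whole training set is unlocked and sampling becomes uniform.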

Key words: Graph Neural Network; Curriculum Learning; Mycobacterium Tuberculosis; Virtual Screening
Received: 09 March 2022      Published: 13 January 2023
Chinese Library Classification (ZTFLH): R961, P315
Fund: CAMS Innovation Fund for Medical Sciences (CIFMS) (2021-I2M-1-056); CAMS Innovation Fund for Medical Sciences (CIFMS) (2018-I2M-AI-016); National Key R&D Program of China (2016YFC0901901)
Corresponding Author: Li Jiao    E-mail: li.jiao@imicams.ac.cn

Cite this article:

Gu Yaowen, Zheng Si, Yang Fengchun, Li Jiao. GNN-MTB: An Anti-Mycobacterium Drug Virtual Screening Model Based on Graph Neural Network. Data Analysis and Knowledge Discovery, 2022, 6(11): 93-102.

URL: https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0196 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I11/93

Process of Building Anti-Tuberculosis Drug Virtual Screening Model
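At the centre of this pipeline is a graph neural network that treats each molecule as a graph of atoms (nodes) and bonds (edges). As an illustration only, and not GNN-MTB's actual architecture, which this page does not describe, the pure-Python sketch below runs one round of mean-aggregation message passing followed by a sum readout on a toy graph; the atom features and the function names are hypothetical.

```python
def message_passing_step(features, edges):
    """One parameter-free message-passing round: every atom's new feature vector
    is the mean of its own vector and those of its bonded neighbours."""
    neighbours = {i: [i] for i in range(len(features))}  # include a self-loop
    for u, v in edges:  # bonds are undirected, so register both directions
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i]) for d in range(dim)]
        for i in range(len(features))
    ]

def sum_readout(features):
    """Sum-pool atom vectors into a single molecule-level vector."""
    return [sum(column) for column in zip(*features)]

# Toy 3-atom chain 0-1-2 with made-up 2-dimensional atom features.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
bonds = [(0, 1), (1, 2)]
hidden = message_passing_step(atoms, bonds)
molecule_vector = sum_readout(hidden)
```

A trained GNN would add learnable weight matrices and nonlinearities, stack several such rounds, and feed the molecule vector to a classifier; GAT and MPNN, the two graph baselines in the comparison, differ chiefly in how neighbour messages are computed and weighted.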
Proportion of Value Types in Anti-Tuberculosis Dataset
Kernel Density Estimation Distribution of Anti-Tuberculosis MIC Values
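The MIC distribution is visualised with a kernel density estimate. A minimal Gaussian KDE in pure Python follows as an illustration. The bandwidth rule (Silverman's rule of thumb) is an assumption, and because MIC values span several orders of magnitude the usage applies a log10 transform, which is likewise an assumption rather than a detail stated on this page.

```python
import math
import statistics

def silverman_bandwidth(values):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(values, bandwidth):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i) / h), phi = standard normal pdf."""
    scale = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in values)
    return density

# A few true activity values from the external validation table, log10-transformed.
mic_values = [57.34, 0.06, 0.58, 19.19, 0.19, 0.45, 10.0, 0.11]
log_mic = [math.log10(v) for v in mic_values]
density = gaussian_kde(log_mic, silverman_bandwidth(log_mic))
grid = [x / 2 for x in range(-6, 7)]  # log10(MIC) from -3 to 3 in steps of 0.5
curve = [(x, round(density(x), 4)) for x in grid]
```

Plotting `curve` gives a smoothed histogram; a smaller bandwidth shows more local structure at the cost of more noise.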
Compound (SMILES)        Label (1 = active, 0 = inactive)
CC(C)c1csc(C(=O)NN)n1    0
COc1cc2ccc(=O)oc2cc1O    0
NNC(=O)c1ccncc1          1
Cc1sc(N)nc1C(=O)O        1
NC(=O)c1cnccn1           0
Illustrative Examples from the Anti-Tuberculosis Dataset
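Each record pairs a SMILES string with a binary activity label. The page does not state the cut-off used to binarise activity values. In the external-validation table, every compound with a true activity value of at most 0.95 is labeled 1 and every compound at 1.01 or above is labeled 0, so the sketch below assumes a hypothetical threshold of 1.0 (in the source data's units), treats the activity values as MICs as the density figure suggests, and uses placeholder compound IDs.

```python
ACTIVE_THRESHOLD = 1.0  # hypothetical cut-off, consistent with the external validation labels

def mic_to_label(mic, threshold=ACTIVE_THRESHOLD):
    """Return 1 (active) if the MIC is at or below the threshold, else 0 (inactive).
    A lower MIC means less compound is needed to inhibit growth, i.e. higher potency."""
    return 1 if mic <= threshold else 0

def label_records(records, threshold=ACTIVE_THRESHOLD):
    """Convert (compound_id, mic) pairs into (compound_id, label) pairs."""
    return [(cid, mic_to_label(mic, threshold)) for cid, mic in records]

# True activity values taken from the external validation table; IDs are placeholders.
examples = [("cpd_1", 57.340), ("cpd_2", 0.060), ("cpd_3", 1.010), ("cpd_4", 0.950)]
print(label_records(examples))  # [('cpd_1', 0), ('cpd_2', 1), ('cpd_3', 0), ('cpd_4', 1)]
```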
Interface of the GNN-MTB-Based Anti-Tuberculosis Drug Virtual Screening Tool
Type                   Model     AUC            AUPR           F1 score
Machine learning       RF        0.897±0.011    0.634±0.012    0.620±0.015
                       SVM       0.894±0.008    0.647±0.022    0.624±0.010
                       MLP       0.896±0.011    0.649±0.013    0.614±0.015
                       GBDT      0.897±0.008    0.673±0.017    0.631±0.019
Graph neural network   GAT       0.900±0.013*   0.656±0.027    0.609±0.048
                       MPNN      0.878±0.014    0.600±0.033    0.595±0.039
                       GNN-MTB   0.912±0.010*   0.679±0.017    0.643±0.032
Comparison of Model Performance Results
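The abstract's maximum-improvement figures can be reproduced from this table as relative gains of GNN-MTB's mean scores over the weakest baseline for each metric. The short check below does that arithmetic; the dictionary layout is just one convenient encoding of the table, and standard deviations are omitted.

```python
# Mean AUC and AUPR of each baseline, copied from the model-comparison table.
BASELINES = {
    "RF": {"AUC": 0.897, "AUPR": 0.634},
    "SVM": {"AUC": 0.894, "AUPR": 0.647},
    "MLP": {"AUC": 0.896, "AUPR": 0.649},
    "GBDT": {"AUC": 0.897, "AUPR": 0.673},
    "GAT": {"AUC": 0.900, "AUPR": 0.656},
    "MPNN": {"AUC": 0.878, "AUPR": 0.600},
}
GNN_MTB = {"AUC": 0.912, "AUPR": 0.679}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100

def max_gain(metric):
    """The largest relative gain comes from the weakest baseline for that metric."""
    weakest = min(BASELINES, key=lambda name: BASELINES[name][metric])
    return weakest, round(relative_gain(GNN_MTB[metric], BASELINES[weakest][metric]), 3)

print(max_gain("AUC"))   # ('MPNN', 3.872)
print(max_gain("AUPR"))  # ('MPNN', 13.167)
```

Because GNN-MTB's score is fixed, the relative gain shrinks as the baseline score grows, which is why the smallest baseline value yields the maximum.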
The ROC Curve of Different Models
The PR Curve of Different Models
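Both curves reduce to single-number summaries. A dependency-free sketch follows: ROC AUC via the Mann-Whitney pairwise formulation (ties count one half), and average precision, which is one common estimator of the area under the PR curve. Whether the authors computed AUPR as average precision or by trapezoidal integration is not stated here, so treat that choice as an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (ties not treated specially)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive")
    return sum(precisions) / len(precisions)
```

The pairwise AUC loop is quadratic and only meant to show the definition; for a dataset of about 10,000 records a rank-based formula or a library routine is the practical choice.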
Precision on the Top-300 Predictions of Different Models
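Precision on the top-ranked predictions reflects how screening is used in practice: of the compounds one would actually send for testing, how many are active. A minimal sketch of precision at k follows; k = 300 matches the figure, while the function name is hypothetical.

```python
def precision_at_k(labels, scores, k=300):
    """Fraction of true actives (label 1) among the k highest-scoring compounds.
    If fewer than k compounds are available, all of them are used."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    if not top:
        raise ValueError("no predictions to rank")
    return sum(label for _, label in top) / len(top)
```

As a worked check against the Top-30 external-validation table, 16 of its 30 compounds carry label 1, so the precision at k = 30 is 16/30 ≈ 0.533.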
Type                   Model     AUC     AUPR    F1 score
Machine learning       RF        0.541   0.306   0.248
                       SVM       0.569   0.335   0.222
                       MLP       0.641   0.370   0.354
                       GBDT      0.545   0.305   0.122
Graph neural network   GAT       0.691   0.463   0.400
                       MPNN      0.552   0.318   0.178
                       GNN-MTB   0.683   0.462   0.526
Model Performance Results (External Validation Set)
SMILES   True activity value   Label   Predicted probability
Cc1ccn2nc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2n1 57.340 0 0.912
Cc1cn2nc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2s1 0.060 1 0.866
Cc1nn2ccsc2c1C(=O)NCc1ccc(N2CCC(c3ccc(OC(F)(F)F)cc3)CC2)cc1 0.580 1 0.821
Cc1ccc2[nH]c(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2c1 19.190 0 0.771
Cc1nn2c(C)csc2c1C(=O)NCc1ccc(N2CCC(c3ccc(OC(F)(F)F)cc3)CC2)cc1 0.190 1 0.699
COc1nc2ccc(Br)cc2cc1[C@@H](c1ccccc1)[C@@](O)(CCN(C)C)c1cccc2ccccc12 0.450 1 0.682
Cc1c(-c2ccc(N3CCC(C(F)(F)F)CC3)cc2)[nH]c2cc(F)cc(F)c2c1=O 10.000 0 0.667
COc1ccc(CN2CC3(CCN(c4nc(=O)c5cc(C(F)(F)F)cc([N+](=O)[O-])c5s4)CC3)C2)cc1 0.110 1 0.650
COc1ccc(CN2CCC3(CCN(c4nc(=O)c5cc(C(F)(F)F)cc([N+](=O)[O-])c5s4)CC3)C2)cc1 0.110 1 0.648
O=c1nc(N2CCN(CC3CCCCC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.040 1 0.631
COc1ccc(COc2nc3ccc(Br)cc3cc2-c2ccc(CN(C)C)cc2)cc1 10.900 0 0.625
O=c1nc(N2CCC3(CCN(Cc4ccc(C(F)(F)F)cc4)CC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.040 1 0.622
COc1ccc(CN2CCC3(CC2)CCN(c2nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s2)CC3)cc1 1.010 0 0.620
COc1cccc(COc2nc3ccc(Br)cc3cc2-c2ccc(CN(C)C)cc2)c1 1.200 0 0.619
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2ccncc2)cc1 0.950 1 0.617
O=c1nc(N2CCC3(CCN(Cc4ccc(Br)cc4)CC3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.060 1 0.613
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2ccc(Cl)cc2)cc1 1.100 0 0.600
Cc1ccc2oc(C)c(C(=O)NCc3ccc(N4CCC(c5ccc(OC(F)(F)F)cc5)CC4)cc3)c2c1 57.450 0 0.587
CCOC(=O)CN1CC2(CCN(c3nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s3)CC2)C1 0.440 1 0.586
COc1cc(CN2CCC3(CC2)CCN(c2nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s2)CC3)cc(OC)c1 0.820 1 0.575
O=c1nc(N2CCC3(CC2)CN(CC2CCCCC2)C3)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.470 1 0.574
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2c(F)cccc2F)cc1 3.100 0 0.570
O=c1nc(N2CCC3(CCN(Cc4ccc(F)cc4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 1.680 0 0.566
CN(C)c1cc2[nH]c(C3CCC(F)(F)CC3)nc2cc1NC(=O)c1ccc(OC(F)(F)F)cc1 2.590 0 0.564
CN(C)Cc1ccc(-c2cc3cc(Br)ccc3nc2OCc2cccc(F)c2)cc1 1.500 0 0.564
CCOC(=O)CN1CCC2(CCN(c3nc(=O)c4cc(C(F)(F)F)cc([N+](=O)[O-])c4s3)CC2)C1 0.060 1 0.564
CN1CCC2(CC1)CCN(c1nc(=O)c3cc(C(F)(F)F)cc([N+](=O)[O-])c3s1)CC2 33.540 0 0.560
O=c1nc(N2CCC3(CCN(Cc4ccccc4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.060 1 0.559
O=c1nc(N2CCC3(CCN(CC4CCCCC4)C3)CC2)sc2c([N+](=O)[O-])cc(C(F)(F)F)cc12 0.220 1 0.558
COc1nc2ccc(Br)cc2cc1-c1ccc(CN(C)C)cc1 15.900 0 0.546
Top-30 Predictions of the GNN-MTB Model on the External Validation Set
True Activity Values of the External Validation Set