Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (12): 98-108     https://doi.org/10.11925/infotech.2096-3467.2018.0545
Application Paper
Predicting Antineoplastic Drug Targets Based on Network Properties*
Fan Xinyue, Cui Lei
School of Medical Informatics, China Medical University, Shenyang 110122, China
Abstract

[Objective] This study aims to identify potential antineoplastic drug targets as a reference for future clinical work and experimental validation. [Methods] We retrieved antineoplastic drug targets from the DrugBank database and combined them with protein interaction information from the HPRD database. We built a protein-protein interaction (PPI) network of the drug targets in Cytoscape and calculated the topological properties of its nodes. We selected topological attributes using univariate analysis in SPSS and information gain in Weka, handled the imbalanced data set with the SMOTE algorithm, and built a prediction model for antineoplastic drug targets with a decision tree. We then compared its performance with models built with three other common machine learning classification algorithms. [Results] The decision tree model reached a prediction accuracy of 73.18%. Validation in cBioPortal showed that the 16 predicted targets with scores of at least 0.9 are mutated and amplified in multiple tumor types; NR5A1 is analyzed in detail as an example. [Limitations] The model uses only the PPI network properties of antineoplastic drug targets and does not include features such as target function or sequence attributes. [Conclusions] Predicting potential antineoplastic drug targets with machine learning on the topological properties of a PPI network is effective and can provide a reference for antineoplastic drug development and clinical work.

Key words: PPI Network    Machine Learning    Decision Tree    Antineoplastic Drug Targets Prediction
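The abstract names SMOTE as the step that handles the imbalanced data set. The following is a minimal pure-Python sketch of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the Weka implementation the authors used. The function name `smote`, its parameters, and the sample points are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    For each synthetic point: pick a minority sample, pick one of its k
    nearest minority neighbours, and take a random point on the segment
    joining the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (illustrative values, not the paper's data).
targets = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(targets, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating samples exactly.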
Received: 2018-05-15      Published: 2019-01-16
CLC number (ZTFLH):  TP391 G353
Funding: *This work is one of the research outcomes of the CERNET Next Generation Internet Technology Innovation Project "Medical Imaging Teaching Platform for Colleges and Universities" (Grant No. NGII20150503).
Cite this article:
Fan Xinyue, Cui Lei. Predicting Antineoplastic Drug Targets Based on Network Properties[J]. Data Analysis and Knowledge Discovery, 2018, 2(12): 98-108.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0545      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I12/98
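| Target | Generic name | Trade name | Disease | Marketed in China |
| --- | --- | --- | --- | --- |
| EGFR, HER2 | Necitumumab; Osimertinib | Portrazza; Tagrisso | Lung cancer | |
| ALK | Ceritinib | Zykadia | Lung cancer | |
| | Alectinib | Alecensa | Lung cancer | |
| | Brigatinib | Alunbrig | Lung cancer | |
| VEGFR2 | Ramucirumab | Cyramza | Lung cancer, gastric cancer, colorectal cancer | |
| BRAF | Dabrafenib + Trametinib | Tafinlar + Mekinist | Lung cancer | |
| PD-1 | Nivolumab | Opdivo | Lung cancer, colorectal cancer, liver cancer | |
| | Pembrolizumab | Keytruda | Lung cancer, colorectal cancer | |
| PD-L1 | Atezolizumab | Tecentriq | Lung cancer, gastric cancer | |
| KIT, PDGFR, RAF, RET, VEGFR1/2/3 | Regorafenib | Stivarga | Colorectal cancer, liver cancer | |
| VEGFA/B, PlGF | Ziv-aflibercept | Zaltrap | Colorectal cancer | |
| EGFR, KRAS | Panitumumab | Vectibix | Colorectal cancer | |
| - | Trifluridine | Tipiracil | Colorectal cancer | |
| RTK, VEGF | Lenvatinib | Lenvima | Liver cancer | |
| HER2 | Ado-trastuzumab emtansine (T-DM1) | Kadcyla | Breast cancer | |
| | Pertuzumab | Perjeta | Breast cancer | |
| | Neratinib | Nerlynx | Breast cancer | |
| CDK4 | Palbociclib | Ibrance | Breast cancer | |
| CDK6 | Ribociclib | Kisqali | Breast cancer | |
| | Abemaciclib | Verzenio | Breast cancer | |

  Commonly used targeted drugs for lung, gastric, colorectal, liver, and breast cancer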
  Technical roadmap
  Data collection workflow
| Network property (predictive feature) | Importance ranking |
| --- | --- |
| Average Shortest Path Length | ANR |
| Betweenness Centrality | Average Shortest Path Length |
| Closeness Centrality | Degree |
| Clustering Coefficient | Number Of Directed Edges |
| Degree | Stress |
| Eccentricity | Closeness Centrality |
| Number Of Directed Edges | Eccentricity |
| Number Of Undirected Edges | Clustering Coefficient |
| Partner Of MultiEdgedNodePairs | SelfLoops |
| Radiality | Topological Coefficient |
| SelfLoops | Betweenness Centrality |
| Stress | Radiality |
| Topological Coefficient | |
| ANR | |

  Network properties of protein targets and their importance ranking
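Several of the properties listed in the table can be derived from shortest-path distances alone. The sketch below computes four of them (degree, average shortest path length, closeness centrality, and eccentricity) with a breadth-first search on a small undirected graph. It is an illustration only: the authors computed these values in Cytoscape, the node names and edges are hypothetical, and closeness centrality is taken here as the reciprocal of the average shortest path length, which is an assumption about the definition in use.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def node_properties(adj):
    """Per-node topological properties of an undirected, unweighted graph."""
    props = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        avg_spl = sum(others) / len(others) if others else 0.0
        props[node] = {
            "degree": len(adj[node]),
            "average_shortest_path_length": avg_spl,
            # Assumed definition: reciprocal of the average shortest path length.
            "closeness_centrality": 1.0 / avg_spl if avg_spl else 0.0,
            "eccentricity": max(others, default=0),
        }
    return props


# Hypothetical interaction edges between placeholder proteins A-D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

props = node_properties(adj)
```

In this toy graph, node B touches every other node directly, so it has the highest degree and closeness and the lowest eccentricity, which is the kind of contrast a classifier trained on these features could exploit.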
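| Algorithm | Precision | Recall | F-measure | AUC | AUPR |
| --- | --- | --- | --- | --- | --- |
| C4.5 decision tree | 0.773 | 0.732 | 0.747 | 0.754 | 0.797 |
| Artificial neural network | 0.784 | 0.745 | 0.759 | 0.753 | 0.796 |
| Bayesian network | 0.758 | 0.780 | 0.764 | 0.752 | 0.795 |
| Support vector machine | 0.784 | 0.743 | 0.757 | 0.701 | 0.748 |

  Comparison of prediction results of the models built with the 4 classification algorithms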
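The page does not say how the precision, recall, and F-measure values in the table were averaged. A plausible reading, stated here as an assumption, is that they are class-weighted averages of per-class values, in which case the weighted recall equals overall accuracy; that would be consistent with the 0.732 recall for C4.5 and the 73.18% accuracy in the abstract. The sketch below shows that computation on a hypothetical confusion matrix; the function name `weighted_metrics` and the counts are illustrative, not the paper's data.

```python
def weighted_metrics(confusion):
    """Class-weighted precision, recall, F-measure, plus overall accuracy.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = recall = f_measure = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])                   # true members of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as class c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total                      # weight by class frequency
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts: row 0 = true targets, row 1 = true non-targets.
confusion = [[30, 10], [20, 40]]
m = weighted_metrics(confusion)
```

Under this averaging the F-measure is not the harmonic mean of the reported precision and recall, which may explain why the F-measure column cannot be reproduced from the other two columns directly.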
  Summary of targets with Score ≥ 0.9
| Gene | Protein | Mutation | Amplification |
| --- | --- | --- | --- |
| NR5A1 | Steroidogenic factor 1 | Cutaneous Melanoma (3.14%) | Prostate Cancer, NOS (16.92%) |
| CSF3R | Granulocyte colony-stimulating factor receptor | Penile Cancer (14.29%) | Ovarian Cancer (5.71%) |
| NFKB2 | Nuclear factor NF-kappa-B p100 subunit | Cholangiocarcinoma (100%) | Prostate Cancer, NOS (7.69%) |
| TNK2 | Activated CDC42 kinase 1 | Myelodysplasia (5.56%) | Prostate Cancer, NOS (21.54%) |
| UBC | Polyubiquitin-C | Endometrial Cancer (2%) | Prostate Cancer, NOS (12.31%) |
| PIK3R2 | Phosphatidylinositol 3-kinase regulatory subunit beta | Small Bowel Cancer (5.56%) | Prostate Cancer, NOS (13.85%) |
| IDE | Insulin-degrading enzyme | Endometrial Cancer (3.78%) | Prostate Cancer, NOS (7.69%) |
| PSMB3 | Proteasome subunit beta type-3 | Adrenocortical Carcinoma (0.99%) | Breast Cancer, NOS (18.75%) |
| GRM7 | Metabotropic glutamate receptor 7 | Ovarian/Fallopian Tube Cancer, NOS (14.29%) | Prostate Cancer, NOS (23.08%) |
| THRA | Thyroid hormone receptor alpha | Colorectal Adenocarcinoma (2.91%) | Breast Cancer, NOS (18.75%) |
| MED1 | Mediator of RNA polymerase II transcription subunit 1 | Cervical Cancer (4.6%) | Breast Cancer, NOS (31.25%) |
| THRB | Thyroid hormone receptor beta | Cutaneous Melanoma (5.23%) | Prostate Cancer, NOS (21.54%) |
| NCS1 | Neuronal calcium sensor 1 | Endometrial Cancer (0.59%) | Prostate Cancer, NOS (13.85%) |
| NR3C2 | Mineralocorticoid receptor | Ovarian/Fallopian Tube Cancer, NOS (14.29%) | Prostate Cancer, NOS (15.38%) |
| TUB | Tubby protein homolog | Endometrial Cancer (4.08%) | Prostate Cancer, NOS (9.23%) |
| IL2 | Interleukin-2 | Cutaneous Melanoma (1.05%) | Prostate Cancer, NOS (7.69%) |

  Mutation and amplification of drug targets with Score ≥ 0.9 in cancer tissues
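  First-order neighbor subnetwork of NR5A1
  Expression of NR5A1 in different cancer types
  Expression of NFKB1 in different cancer types
  Expression of NCOA1 in different cancer types
  Expression of MAPK1 in different cancer types
  Expression of JUN in different cancer types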
| Gene \ Cancer type | Melanoma | Adrenocortical Carcinoma | Endometrial Cancer | Esophagogastric Cancer | Colorectal Adenocarcinoma | Cancer of Unknown Primary |
| --- | --- | --- | --- | --- | --- | --- |
| AR | 2.09% (2.79%a, 18b) | 1.97% (1.97%, 20) | 6.08% (6.68%, 4) | 4.09% (4.75%, 11) | 4.52% (4.52%, 9) | 5.14% (5.24%, 7) |
| NCOA1 | 2.79% (3.83%, 6) | 1.97% (2.46%, 9) | 5.34% (7.42%, 2) | 2.47% (3.20%, 7) | 3.55% (3.55%, 4) | 4.40% (5.99%, 3) |
| JUN | 0.35% (1.05%, 15) | 0 (0.99, -) | 0.59% (0.96%, 8) | 0.76% (1.15%, 7) | 1.94% (1.94%, 2) | 0.47% (3.27%, 11) |
| MAPK1 | 1.39% (3.48%, 4) | 0.99% (2.96%, 8) | 1.19% (2.23%, 6) | 0.49% (1.55%, 16) | 0.65% (0.96%, 14) | 0.84% (6.08%, 9) |
| NFKB1 | 2.09% (2.79%, 4) | 0.99% (0.99%, 8) | 3.86% (4.15%, 2) | 0.73% (0.89%, 13) | 2.91% (2.91%, 3) | 6.74% (8.23%, 1) |

  Mutation frequency of 5 antineoplastic drug targets in different cancer tissues
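| Gene | Cases | Amplification cases | Proportion |
| --- | --- | --- | --- |
| AR | 65 | 38 | 58.46% |
| NR5A1 | 65 | 11 | 16.92% |
| NCOA1 | 65 | 8 | 12.31% |
| JUN | 65 | 6 | 9.23% |
| MAPK1 | 65 | 5 | 7.69% |
| NFKB1 | 65 | 4 | 6.15% |

  Amplification rates of 6 genes in prostate cancer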
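The proportion column is the number of amplified cases divided by the 65 prostate cancer cases. A short check of that arithmetic from the counts in the table, as a sketch (the variable names are illustrative):

```python
# Counts taken from the table: 65 prostate cancer cases per gene.
cases = 65
amplified = {"AR": 38, "NR5A1": 11, "NCOA1": 8, "JUN": 6, "MAPK1": 5, "NFKB1": 4}

# Amplification rate as a percentage, rounded to two decimals.
rates = {gene: round(100 * n / cases, 2) for gene, n in amplified.items()}

for gene, rate in rates.items():
    print(f"{gene}: {amplified[gene]}/{cases} = {rate}%")
```

For MAPK1 this gives 5/65, which rounds to 7.69%.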