Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (10): 94-104     https://doi.org/10.11925/infotech.2096-3467.2017.0641
Application Paper
Extracting Disease-Gene-Drug Correlations Based on Data Cube*
Wei Xing1,2, Hu Dehua1, Yi Minhan1, Zhu Qizhen1, Zhu Wenjie2
1Institute of Information Security and Big Data, Central South University, Changsha 410083, China
2School of Basic Courses, Bengbu Medical College, Bengbu 233003, China
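Abstract (translated from the Chinese version)

[Objective] To mine and predict new associations among biomedical entities from the large body of published literature, and to construct association networks. [Methods] We propose a new data cube-based method for mining disease-gene-drug associations. Taking diabetes as an example, we construct association networks and use association rules to quantify the strength of association between entities. [Results] From diabetes-related diseases (14), genes (23), and drugs (24), we constructed three 1-D cuboids, three 2-D cuboids with their association networks, and one 3-D cuboid association network. These contain 411 associations in total, and 8 association subnetworks were obtained. [Limitations] Data preprocessing involves subjective decisions, which may affect the mining results. [Conclusions] The algorithm outperforms other algorithms of its kind and can offer new research directions for precision medicine in diabetes.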

Keywords: Disease; Gene; Drug; Data Cube; Association Rules; Association Network
Abstract

[Objective] This study aims to construct a disease-gene-drug correlation network for diabetes mellitus (DM). [Methods] First, we proposed a new data cube-based approach to construct a disease-gene-drug correlation network for DM. Then, we measured the strength of the associations among the biological entities. [Results] We retrieved the needed data from the PubMed database and constructed three 1-D cuboids, three 2-D cuboids, and one 3-D disease-gene-drug network, which revealed 411 associations among 14 subclasses of DM, 23 genes, and 24 drugs. We also constructed 8 optimal disease-gene-drug subnetworks of DM. [Limitations] Some steps of the data analysis were subjective. Changes in user behavior may also influence the results. [Conclusions] The proposed algorithm performs better than comparable existing algorithms and suggests new directions for research on precision medicine for DM.

Key words: Disease; Gene; Drug; Data Cube; Association Rules; Correlations Network
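The cuboid counts in the abstract (three 1-D cuboids, three 2-D cuboids, one 3-D cuboid) follow from aggregating over every non-empty subset of the three dimensions. The sketch below illustrates that idea only; it is not the authors' implementation. The record layout (one tuple per co-mention), the function name `build_cuboids`, and the placeholder entity labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Dimension order is an assumption; the abstract names disease, gene and drug.
DIMENSIONS = ("disease", "gene", "drug")


def build_cuboids(records):
    """Aggregate (disease, gene, drug) tuples into every non-empty cuboid.

    Returns a dict mapping a tuple of dimension names to a Counter of
    cell -> number of records falling into that cell.
    """
    cuboids = {}
    for k in range(1, len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), k):
            names = tuple(DIMENSIONS[i] for i in dims)
            # Project each record onto the chosen dimensions and count.
            cuboids[names] = Counter(tuple(rec[i] for i in dims) for rec in records)
    return cuboids


# Hypothetical toy records with placeholder labels (not data from the paper).
records = [
    ("disease_A", "gene_X", "drug_P"),
    ("disease_A", "gene_X", "drug_Q"),
    ("disease_B", "gene_X", "drug_P"),
    ("disease_A", "gene_Y", "drug_P"),
]
cuboids = build_cuboids(records)
one_d = [names for names in cuboids if len(names) == 1]
two_d = [names for names in cuboids if len(names) == 2]
three_d = [names for names in cuboids if len(names) == 3]
```

With three dimensions this yields 3 + 3 + 1 = 7 cuboids, matching the counts reported in the abstract. Each 2-D or 3-D cell with a non-zero count can be read as an edge in the corresponding association network.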
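Received: 2017-07-03      Published: 2017-11-08
CLC number (ZTFLH):  TP391, G202
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Exploring the Mechanism of the piRNA Pathway in Sex Determination Using the Rice Field Eel Sex Reversal Model" (Grant No. 31500999) and the Anhui Provincial Higher Education Quality Engineering project "A New Practical Teaching Model Integrating Medicine and Engineering for Building the Internet of Things Engineering Major in Medical Colleges" (Grant No. 2016jyxm0673).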
Cite this article:
Wei Xing, Hu Dehua, Yi Minhan, Zhu Qizhen, Zhu Wenjie. Extracting Disease-Gene-Drug Correlations Based on Data Cube[J]. Data Analysis and Knowledge Discovery, 2017, 1(10): 94-104.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.0641      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I10/94
  Figure: Data cube
Classified under nutritional and metabolic diseases
English name | Chinese name
Diabetes Mellitus, Experimental | 实验性糖尿病
Diabetes Mellitus, Type 1 | 1型糖尿病
Diabetes Mellitus, Type 2 | 2型糖尿病
Diabetes, Gestational | 妊娠糖尿病
Diabetic Ketoacidosis | 糖尿病酮症酸中毒
Donohue Syndrome | 多诺霍综合症
Prediabetic State | 糖尿病前期

Classified under endocrine system diseases
English name | Chinese name
Diabetes Complications | 糖尿病并发症
Diabetes, Gestational | 妊娠糖尿病
Diabetes Mellitus, Experimental | 实验性糖尿病
Diabetes Mellitus, Type 1 | 1型糖尿病
Diabetes Mellitus, Type 2 | 2型糖尿病
Donohue Syndrome | 多诺霍综合症
Prediabetic State | 糖尿病前期

Classified under diabetes complications
English name | Chinese name
Diabetic Angiopathies | 糖尿病性血管病
Diabetic Cardiomyopathies | 糖尿病性心肌病
Diabetic Coma | 糖尿病性昏迷
Diabetic Ketoacidosis | 糖尿病性酮症酸中毒
Diabetic Nephropathies | 糖尿病性肾病
Diabetic Neuropathies | 糖尿病性神经病
Fetal Macrosomia | 巨大胎儿(症)

  Table: Classification of diabetes mellitus in the MeSH thesaurus
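  Figure: Association network of the (disease, gene) 2-D cuboid
  Figure: Association network of the (disease, drug) 2-D cuboid
  Figure: Association network of the (gene, drug) 2-D cuboid
  Figure: Association network of the (disease, gene, drug) 3-D base cuboid
  Figure: Association subnetworks for the 8 diseases
(Note: a: type 2 diabetes mellitus; b: experimental diabetes mellitus; c: diabetic angiopathies; d: diabetic neuropathies; e: diabetic cardiomyopathies; f: diabetic nephropathies; g: type 1 diabetes mellitus; h: gestational diabetes)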
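Relation | Entity 1 | Description 1 | Entity 2 | Description 2
Disease-Gene | Diabetic Neuropathies | diabetic neuropathy | IPF1 | transcription factor 1
Disease-Gene | Diabetic Neuropathies | diabetic neuropathy | SUMO4 | small ubiquitin-like modifier 4
Disease-Gene | Diabetic Nephropathies | diabetic nephropathy | IPF1 | transcription factor 1
Disease-Gene | Diabetic Nephropathies | diabetic nephropathy | SUMO4 | small ubiquitin-like modifier 4
Disease-Drug | Iron Dextran | iron dextran | Diabetic Angiopathies | diabetic angiopathy
Disease-Drug | GFT505 | potential novel drug candidate for treating dyslipidemia and abnormal blood glucose associated with metabolic syndrome (MS) | T2DM | type 2 diabetes mellitus
Disease-Drug | Telmisartan | telmisartan | Diabetic Neuropathies | diabetic neuropathy
Disease-Drug | Aleglitazar | aleglitazar | Diabetic Nephropathies | diabetic nephropathy
Gene-Drug | IRS2 | insulin receptor substrate 2 | Icosapent | eicosapentaenoic acid
Gene-Drug | PPARG | peroxisome proliferator-activated receptor gamma | Icosapent | eicosapentaenoic acid
Gene-Drug | IRS2 | insulin receptor substrate 2 | Levosimendan | levosimendan
Gene-Drug | GCK | glucokinase (hexokinase 4) | Levosimendan | levosimendan
Gene-Drug | ENPP1 | ectonucleotide pyrophosphatase/phosphodiesterase 1 | Myristic Acid | myristic acid
  Table: Selected predicted new associations between biological entities that have a high degree of association but have not yet been confirmed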
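The abstract states that association rules are used to quantify the degree of association, but this page does not say which measures were used. As a hedged illustration only, the sketch below computes the three standard association-rule measures (support, confidence, lift) for a rule A -> B from document counts. The function name `rule_measures` and the example counts are hypothetical and are not taken from the paper.

```python
def rule_measures(n_total, n_a, n_b, n_ab):
    """Support, confidence and lift of the rule A -> B from co-occurrence counts.

    n_total: number of documents considered
    n_a, n_b: documents mentioning entity A, entity B
    n_ab: documents mentioning both A and B
    """
    if min(n_total, n_a, n_b) <= 0:
        raise ValueError("n_total, n_a and n_b must be positive")
    if n_ab < 0 or n_ab > min(n_a, n_b):
        raise ValueError("n_ab must lie between 0 and min(n_a, n_b)")
    support = n_ab / n_total             # P(A and B)
    confidence = n_ab / n_a              # P(B | A)
    lift = confidence / (n_b / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical counts chosen only to exercise the function.
support, confidence, lift = rule_measures(n_total=100, n_a=20, n_b=10, n_ab=5)
```

A lift above 1 means the two entities co-occur more often than independence would predict. Ranking unconfirmed entity pairs by such a score is one plausible way to arrive at a shortlist like the table above.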
  Figure: Performance evaluation with ROC curves
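The ROC figure itself is not reproduced on this page. For readers unfamiliar with the evaluation, the sketch below shows one common way to compute the area under a ROC curve: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration and say nothing about the paper's actual results.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a true (confirmed) association, 0 otherwise.
    Runs in O(P * N) time, which is fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A positive scoring above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical association scores and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so comparing AUC values is a standard way to compare algorithms on the same labeled pairs.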