Predicting Breast Cancer Survival Length with Multi-Omics Data Fusion
Huiying Qi1, Yuhe Jiang2
1School of Health Humanities, Peking University, Beijing 100191, China
2Health Science Center, Peking University, Beijing 100191, China
Abstract [Objective] This paper proposes a machine learning model that fuses multiple types of omics data to better predict the survival length of breast cancer patients. [Methods] The prediction model was built with the random forest algorithm. It merged four types of omics data (gene expression, copy number variation, DNA methylation, and protein expression) for breast cancer cases from the TCGA database. [Results] On the test data set, the model's precision reached 97.22% and its recall 98.13%. Compared with existing models, the proposed model achieved the highest AUC (0.8393). [Limitations] The sample size needs to be expanded. [Conclusions] The proposed method is an effective way to predict the survival length of breast cancer patients.
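To make the terms in the abstract concrete, here is a minimal sketch in plain Python of two of the steps it mentions: merging several omics blocks into one feature vector per patient, and computing precision, recall, and AUC. The abstract does not say how the four data types were merged, so the simple feature concatenation in `fuse_omics` is an assumption, not the authors' method. All function names, patient IDs, and numbers are hypothetical toy values, and the random forest itself is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_omics(blocks: Sequence[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Concatenate per-patient feature vectors across omics blocks.

    Assumption: simple feature-level concatenation. Only patients that
    appear in every block are kept.
    """
    shared = set(blocks[0])
    for block in blocks[1:]:
        shared &= set(block)
    return {pid: [v for block in blocks for v in block[pid]] for pid in sorted(shared)}


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy blocks keyed by patient ID (not TCGA data).
expression = {"P1": [2.1, 0.3], "P2": [1.7, 0.9], "P3": [0.4, 1.2]}
copy_number = {"P1": [0.0], "P2": [1.0], "P3": [-1.0]}
methylation = {"P1": [0.8], "P2": [0.2]}  # P3 has no methylation profile
protein = {"P1": [0.5], "P2": [0.1], "P3": [0.7]}

fused = fuse_omics([expression, copy_number, methylation, protein])

# Hypothetical labels (1 = positive class), hard predictions, and scores.
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.3, 0.8, 0.6]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy run, patient P3 is dropped because one block is missing, which shows one way that requiring complete multi-omics profiles can shrink the usable sample.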
Received: 07 January 2019
Published: 29 September 2019
Corresponding Authors:
Huiying Qi
E-mail: qhy@bjmu.edu.cn