Predicting Survival Rates for Gastric Cancer Based on Ensemble Learning
Xu Liangchen, Guo Chonghui
Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
Abstract [Objective] This paper builds a model on the SEER database to predict the 5-year survival of gastric cancer patients, with the aim of supporting prognosis and analyzing the factors that affect 5-year survival. [Methods] Following the idea of EasyEnsemble, we addressed the class imbalance at both the data level and the model level. We then combined multiple GradientBoosting classifiers through Bagging to build a prediction model on the imbalanced gastric cancer survival data. Finally, we used SHAP values to identify the factors affecting 5-year survival. [Results] The model reached a prediction accuracy of 0.808 and an AUC of 0.883; the prediction accuracy for the subcategory of surviving patients was 0.835. It predicted better than the traditional models used for comparison. Regional nodes positive, summary stage/grade, and age had relatively high SHAP values. [Limitations] The SEER database provides only a limited set of prognostic factors, which constrained the model's performance. [Conclusions] The model can effectively predict survival for gastric cancer and identify the factors that influence patients' 5-year survival probability.
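The data-level and model-level combination described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of SEER records, and the helper names `fit_balanced_ensemble` and `predict_proba_positive`, the subset count, and the tree count are all hypothetical choices made for illustration. Each base learner is trained on a balanced subset obtained by undersampling the majority class, and the predicted probabilities are averaged.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def fit_balanced_ensemble(X, y, n_subsets=5, seed=0):
    """Train one gradient-boosting model per balanced subset (EasyEnsemble-style).

    Assumes binary labels where class 1 is the minority class.
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_subsets):
        # Data level: undersample the majority class to the minority size.
        sampled = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, sampled])
        # Model level: one boosted classifier per balanced subset.
        model = GradientBoostingClassifier(
            n_estimators=50, random_state=int(rng.integers(1 << 31))
        )
        models.append(model.fit(X[idx], y[idx]))
    return models


def predict_proba_positive(models, X):
    """Average the positive-class probability over all base models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic, imbalanced stand-in data (roughly 80% class 0, 20% class 1).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

models = fit_balanced_ensemble(X_train, y_train)
scores = predict_proba_positive(models, X_test)
auc = roc_auc_score(y_test, scores)
print(f"AUC on synthetic data: {auc:.3f}")
```

The printed AUC describes the synthetic data only and is unrelated to the results reported in the abstract.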
Received: 15 January 2021
Published: 14 April 2021
Fund: National Natural Science Foundation of China (71771034); Fundamental Research Funds for the Central Universities (DUT21YG108)
Corresponding Author: Guo Chonghui, ORCID: 0000-0002-5155-1297, E-mail: dlutguo@dlut.edu.cn