[Objective] This paper compares the performance of prostate cancer prediction models based on ensemble and non-ensemble learning algorithms, aiming to identify the best-performing algorithm and the key risk factors for the disease. [Methods] First, we constructed prediction models with K-Nearest Neighbor, Decision Tree, Support Vector Machine, and BP neural network. Then, we built prediction models based on AdaBoost, GradientBoost, and XGBoost. Finally, we identified risk factors for prostate cancer with the two groups of models. [Results] Among the models based on non-ensemble algorithms, the Decision Tree model performed best, with an accuracy of 0.9333, an F1 score of 0.9301, and an AUC of 0.9145. Among the models based on ensemble algorithms, the XGBoost model performed best, with an accuracy of 0.9573, an F1 score of 0.9624, and an AUC of 0.9513. We found nine important risk factors for prostate cancer, including total PSA and free PSA. [Limitations] The experimental data set and the range of model-building algorithms need to be expanded. [Conclusions] Ensemble learning algorithms outperform non-ensemble ones in predicting prostate cancer and identifying its risk factors.
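The three evaluation metrics reported in the results (accuracy, F1 score, and AUC) can be illustrated with a minimal Python sketch. The helper names and the toy labels and scores below are hypothetical and are not taken from the study's data; the sketch only shows how such metrics are typically computed for a binary classifier, assuming a decision threshold of 0.5.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer, 0 = no cancer (not from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and so summarizes ranking quality across all thresholds, which is one reason the three are usually reported together.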