Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 107-114    DOI: 10.11925/infotech.2096-3467.2020.1185
Comparing Prediction Models for Prostate Cancer
Che Hongxin, Wang Tong, Wang Wei
School of Public Health, Jilin University, Changchun 130021, China
[Objective] This paper compares the performance of prostate cancer prediction models built with ensemble and non-ensemble learning algorithms, aiming to identify the best-performing algorithm and the key risk factors for the disease. [Methods] First, we constructed prediction models with the K-Nearest Neighbor, Decision Tree, Support Vector Machine, and BP neural network algorithms. Then, we built prediction models based on AdaBoost, GradientBoost, and XGBoost. Finally, we identified risk factors for prostate cancer using the two groups of models. [Results] Among the non-ensemble models, the Decision Tree model performed best, with an accuracy of 0.9333, an F1 score of 0.9301, and an AUC of 0.9145. Among the ensemble models, the XGBoost model performed best, with an accuracy of 0.9573, an F1 score of 0.9624, and an AUC of 0.9513. We identified nine important risk factors for prostate cancer, including total PSA and free PSA. [Limitations] The experimental data set and the range of modeling algorithms need to be expanded. [Conclusions] Ensemble learning algorithms outperform non-ensemble ones in predicting prostate cancer and identifying its risk factors.

Key words: Ensemble Learning; Machine Learning; Prostate Cancer; Prediction Model
Received: 29 November 2020      Published: 15 October 2021
CLC Number (ZTFLH): TP391
Fund: Interdisciplinary Research Funding Program for Doctoral Students of Jilin University (101832020DJX081)
Che Hongxin, Wang Tong, Wang Wei. Comparing Prediction Models for Prostate Cancer. Data Analysis and Knowledge Discovery, 2021, 5(9): 107-114.
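
The contrast the abstract draws between a single (non-ensemble) learner and a boosted ensemble can be illustrated with a minimal sketch. The code below is not the authors' implementation and does not use their data: the ten-point data set, the decision-stump weak learner, and all function names are assumptions chosen for illustration. It implements discrete AdaBoost over one-dimensional threshold stumps and compares the training accuracy of the best single stump with that of a three-round ensemble.

```python
import math

# Toy 1-D data (hypothetical, not from the paper): no single threshold separates it.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively pick the (error, threshold, polarity) with the lowest weighted error."""
    best = None
    thresholds = [x - 0.5 for x in X] + [max(X) + 0.5]
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, t, w in zip(X, y, weights)
                      if stump_predict(x, threshold, polarity) != t)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(X, y, weights)
        err = max(err, 1e-12)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples are multiplied by exp(alpha), correct ones by exp(-alpha).
        weights = [w * math.exp(-alpha * t * stump_predict(x, threshold, polarity))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(x, th, p) for a, th, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)


_, th, pol = best_stump(X, y, [1.0 / len(X)] * len(X))
single_acc = accuracy([stump_predict(x, th, pol) for x in X], y)
model = adaboost(X, y, rounds=3)
boosted_acc = accuracy([ensemble_predict(model, x) for x in X], y)
print(single_acc)   # 0.7
print(boosted_acc)  # 1.0
```

These are training accuracies on a contrived data set, so the gap says nothing about the size of the improvement reported in the paper. It only shows the mechanism by which a weighted vote of weak learners can fit a pattern that no single weak learner can.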

ROC Curve of Prostate Cancer Prediction Models
Model Category   Model Name               Accuracy   Precision   Recall   F1 Score
Non-ensemble     KNN                      0.9360     0.9200      0.9400   0.9219
Non-ensemble     BP Neural Network        0.9333     0.8700      0.9300   0.9031
Non-ensemble     Decision Tree            0.9360     0.9300      0.9300   0.9301
Non-ensemble     Support Vector Machine   0.9360     0.9400      0.9400   0.9144
Ensemble         AdaBoost                 0.9507     0.9500      0.9500   0.9520
Ensemble         GradientBoost            0.9560     0.9500      0.9600   0.9531
Ensemble         XGBoost                  0.9573     0.9600      0.9600   0.9624
Evaluation Index Results of Prostate Cancer Prediction Models
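
For readers unfamiliar with the four evaluation indices in the table, the following sketch shows their standard binary-classification definitions in terms of confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper. This page also does not state how the authors aggregated the indices (for example, per class or averaged across classes), so the sketch should not be expected to reproduce the table values.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not the paper's data).
metrics = classification_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, the F1 score of a single class always lies between that class's precision and recall.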
Feature Importance Ranking of Machine Learning Models
Feature Frequency of Machine Learning Models
