Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 107-114    DOI: 10.11925/infotech.2096-3467.2020.1185
Comparing Prediction Models for Prostate Cancer
Che Hongxin, Wang Tong, Wang Wei
School of Public Health, Jilin University, Changchun 130021, China
Abstract  

[Objective] This paper compares the performance of prostate cancer prediction models built with ensemble and non-ensemble learning algorithms, aiming to identify the best-performing algorithm and the key risk factors for the disease. [Methods] First, we constructed prediction models with K-Nearest Neighbor, Decision Tree, Support Vector Machine, and BP neural network. Then, we built prediction models based on AdaBoost, GradientBoost, and XGBoost. Finally, we identified risk factors for prostate cancer using the two groups of models. [Results] Among the non-ensemble models, the Decision Tree model performed best, with an accuracy of 0.9333, an F1 score of 0.9301, and an AUC of 0.9145. Among the ensemble models, the XGBoost model performed best, with an accuracy of 0.9573, an F1 score of 0.9624, and an AUC of 0.9513. We found nine important risk factors for prostate cancer, including total PSA and free PSA. [Limitations] The experimental data set and the range of model-building algorithms need to be expanded. [Conclusions] Ensemble learning algorithms outperform non-ensemble ones in predicting prostate cancer and identifying its risk factors.

Key words: Ensemble Learning; Machine Learning; Prostate Cancer; Prediction Model
Received: 29 November 2020      Published: 15 October 2021
Chinese Library Classification (CLC) number: TP391
Fund: Interdisciplinary Research Funding Program for Doctoral Students of Jilin University (101832020DJX081)
Corresponding Authors: Wang Wei     E-mail: w_w@jlu.edu.cn

Cite this article:

Che Hongxin, Wang Tong, Wang Wei. Comparing Prediction Models for Prostate Cancer. Data Analysis and Knowledge Discovery, 2021, 5(9): 107-114.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1185     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I9/107

ROC Curve of Prostate Cancer Prediction Models
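The AUC values quoted in the abstract summarize ROC curves like the one captioned above. As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the paper's data or code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    y_true: iterable of 0/1 labels (1 = positive class).
    scores: iterable of predicted scores, higher means more likely positive.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would be the usual choice for larger data sets.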
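| Model category | Model | Accuracy | Precision | Recall | F1 score |
| --- | --- | --- | --- | --- | --- |
| Non-ensemble learning algorithms | KNN | 0.9360 | 0.9200 | 0.9400 | 0.9219 |
| Non-ensemble learning algorithms | BP neural network | 0.9333 | 0.8700 | 0.9300 | 0.9031 |
| Non-ensemble learning algorithms | Decision Tree | 0.9360 | 0.9300 | 0.9300 | 0.9301 |
| Non-ensemble learning algorithms | Support Vector Machine | 0.9360 | 0.9400 | 0.9400 | 0.9144 |
| Ensemble learning algorithms | AdaBoost | 0.9507 | 0.9500 | 0.9500 | 0.9520 |
| Ensemble learning algorithms | GradientBoost | 0.9560 | 0.9500 | 0.9600 | 0.9531 |
| Ensemble learning algorithms | XGBoost | 0.9573 | 0.9600 | 0.9600 | 0.9624 |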
Evaluation Index Results of Prostate Cancer Prediction Models
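For readers unfamiliar with the four evaluation indices in this table, the following is a minimal sketch of how accuracy, precision, recall, and F1 score are derived from binary predictions. The function name `classification_metrics` and the example labels are hypothetical. This page does not state how the authors averaged these metrics (for example, per class or weighted), so the plain binary definitions below are an assumption and need not reproduce the reported figures.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only (1 = cancer, 0 = no cancer).
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_metrics = classification_metrics(example_true, example_pred)
```

In this toy example there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.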
Feature Importance Ranking of Machine Learning Models
Feature Frequency of Machine Learning Models