Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (9): 107-114     https://doi.org/10.11925/infotech.2096-3467.2020.1185
Research Article
Comparing Prediction Models for Prostate Cancer*
Che Hongxin, Wang Tong, Wang Wei
School of Public Health, Jilin University, Changchun 130021, China
Abstract

[Objective] This paper compares the performance of prostate cancer prediction models built with ensemble and non-ensemble learning algorithms, aiming to identify the best-performing algorithm and the key risk factors. [Methods] We built prediction models with four non-ensemble algorithms (K-Nearest Neighbor, Decision Tree, Support Vector Machine, and BP neural network) and three ensemble algorithms (AdaBoost, GradientBoost, and XGBoost), validated their performance, and used them to identify risk factors for prostate cancer. [Results] Among the non-ensemble models, the Decision Tree performed best, with an accuracy of 0.9333, an F1 score of 0.9301, and an AUC of 0.9145. Among the ensemble models, XGBoost performed best, with an accuracy of 0.9573, an F1 score of 0.9624, and an AUC of 0.9513. Nine important risk factors for prostate cancer were identified, including total PSA and free PSA. [Limitations] The experimental dataset needs to be expanded, and more algorithms should be included in model building. [Conclusions] Overall, ensemble learning algorithms outperform non-ensemble algorithms in both prediction performance and risk factor identification for prostate cancer.

Key words: Ensemble Learning; Machine Learning; Prostate Cancer; Prediction Model
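Received: 2020-11-29      Published: 2021-10-15
CLC number: TP391
Funding: *This work is one of the research outcomes of the Jilin University Interdisciplinary Research Funding Program for Doctoral Candidates (101832020DJX081)
Corresponding author: Wang Wei     E-mail: w_w@jlu.edu.cn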
Cite this article:
Che Hongxin, Wang Tong, Wang Wei. Comparing Prediction Models for Prostate Cancer[J]. Data Analysis and Knowledge Discovery, 2021, 5(9): 107-114.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1185      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I9/107
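The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data here are synthetic (generated with scikit-learn's `make_classification`), all hyperparameters are library defaults, and the split ratio and random seeds are assumptions. XGBoost is left out so that the sketch depends only on scikit-learn; if the separate `xgboost` package is installed, its `XGBClassifier` could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the clinical dataset used in the study.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Non-ensemble models first, then ensemble models; distance- and gradient-based
# models are wrapped in a scaling pipeline.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "BP neural network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GradientBoost": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)                  # hard class labels
    score = model.predict_proba(X_test)[:, 1]     # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, score),
    }

for name, m in results.items():
    print(f"{name:18s} acc={m['accuracy']:.4f} f1={m['f1']:.4f} auc={m['auc']:.4f}")
```

Numbers printed by this sketch depend on the synthetic data and say nothing about the results reported in the paper.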
Fig.1  ROC curves comparing the performance of the prostate cancer prediction models
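For readers unfamiliar with how an ROC curve and its AUC are obtained, the following is a small self-contained sketch. The labels and scores are made-up toy values, not data from the study, and the function names `roc_points` and `auc` are illustrative.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy labels (1 = positive) and predicted scores: illustrative values only.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pts = roc_points(y_true, scores)
print(pts)
print(auc(pts))
```

The AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score; in this toy example 12 of the 16 pairs are ordered correctly.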
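| Model category | Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| Non-ensemble learning algorithms | KNN | 0.9360 | 0.9200 | 0.9400 | 0.9219 |
| Non-ensemble learning algorithms | BP neural network | 0.9333 | 0.8700 | 0.9300 | 0.9031 |
| Non-ensemble learning algorithms | Decision Tree | 0.9360 | 0.9300 | 0.9300 | 0.9301 |
| Non-ensemble learning algorithms | Support Vector Machine | 0.9360 | 0.9400 | 0.9400 | 0.9144 |
| Ensemble learning algorithms | AdaBoost | 0.9507 | 0.9500 | 0.9500 | 0.9520 |
| Ensemble learning algorithms | GradientBoost | 0.9560 | 0.9500 | 0.9600 | 0.9531 |
| Ensemble learning algorithms | XGBoost | 0.9573 | 0.9600 | 0.9600 | 0.9624 |

Table 1  Performance evaluation metrics of the prostate cancer prediction models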
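The four metrics in Table 1 follow the standard binary-classification definitions. The sketch below computes them from confusion-matrix counts; the counts are hypothetical and chosen only to make the arithmetic easy to follow, and how the paper averaged its metrics is not stated on this page.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are truly positive
    recall = tp / (tp + fn)      # share of true positives that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption): 90 true positives, 10 false positives,
# 5 false negatives, 95 true negatives.
m = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```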
Fig.2  Feature importance ranking of the machine learning models
Fig.3  Frequency of feature occurrence across the machine learning models
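One plausible way to go from per-model importance rankings (Fig.2) to occurrence counts (Fig.3) is to take each model's top-k features and count how often each feature appears. This page does not describe the authors' exact procedure, so the sketch below is an assumption; the importance values are invented, and `feature_c` and `feature_d` are placeholder names (only total PSA and free PSA are named in the abstract).

```python
from collections import Counter

# Hypothetical per-model importance scores: illustrative values only.
importances = {
    "AdaBoost": {"total_psa": 0.40, "free_psa": 0.30, "feature_c": 0.20, "feature_d": 0.10},
    "GradientBoost": {"total_psa": 0.35, "feature_c": 0.30, "free_psa": 0.25, "feature_d": 0.10},
    "XGBoost": {"free_psa": 0.45, "total_psa": 0.30, "feature_d": 0.15, "feature_c": 0.10},
}


def top_k(scores, k):
    """Names of the k features with the highest importance, most important first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Count how many models place each feature among their top 2.
frequency = Counter(
    feature for scores in importances.values() for feature in top_k(scores, 2)
)

for feature, count in frequency.most_common():
    print(feature, count)
```

Features that appear in the top-k list of several models are more robust candidates than features ranked highly by a single model.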