Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (2): 116-128    DOI: 10.11925/infotech.2096-3467.2020.0353
Predicting Diabetic Complications with Unbalanced Data
Qiu Yunfei, Guo Lei
School of Software, Liaoning Technical University, Huludao 125105, China
Abstract  

[Objective] This paper addresses the classification problems caused by unbalanced sample data, aiming to improve the prediction of diabetic complications. [Methods] At the data level, we used an improved SMOTE oversampling algorithm (F_SMOTE) to change the class distribution of the unbalanced data. At the algorithm level, we adopted balanced accuracy and the areas under the ROC and PR curves (AUC) as evaluation criteria. Finally, we compared the performance of four single-classifier learning models and four ensemble learning models. [Results] Compared with the traditional oversampling algorithm, our F_SMOTE algorithm improved prediction accuracy, ROC AUC and PR AUC by 1.49%, 3.43% and 8.05%, respectively. Compared with the single-classifier learning models, our method improved accuracy, ROC AUC and PR AUC by 9.73%, 14.07% and 46.79%, respectively. The combination of the F_SMOTE algorithm and the Random Forest model reached 97.64% accuracy, 98.91% ROC AUC and 96.64% PR AUC on unbalanced data. [Limitations] The coverage and efficiency of our model training need to be further improved. [Conclusions] This method provides a predictive analysis framework for researchers, which could also help doctors in disease diagnosis and prevention.
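
The abstract describes F_SMOTE only as an improved SMOTE, and its modifications are not given on this page. As a point of reference, the following is a minimal sketch of classic SMOTE interpolation only, not of F_SMOTE. The function name `smote`, its parameters and the toy points are illustrative assumptions, and picking the base sample at random is a simplification of the original procedure.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by classic SMOTE interpolation.

    Illustrative sketch only: this is NOT the F_SMOTE variant from the paper.
    `minority` is a list of equal-length numeric tuples (at least two).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment between the base and its neighbour
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical two-feature minority samples
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class.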

Key words: Unbalanced Data; F_SMOTE Algorithm; Ensemble Learning; Diabetic Complications
Received: 08 June 2020      Published: 11 November 2020
ZTFLH:  G350  
Corresponding Authors: Guo Lei ORCID:0000-0003-3441-5063     E-mail: 752714018@qq.com

Cite this article:

Qiu Yunfei, Guo Lei. Predicting Diabetic Complications with Unbalanced Data. Data Analysis and Knowledge Discovery, 2021, 5(2): 116-128.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0353     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I2/116

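Sample distribution        Coal-mine high-energy seismic wave dataset   Lung cancer patient dataset   Heart failure patient dataset
Positive samples (cases)   170                                          70                            96
Negative samples (cases)   2,414                                        400                           203
Positive:negative ratio    1:14                                         1:6                           1:2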
Statistics of Unbalanced Distribution of Samples
Result of Evaluation Index
Investigation Results of Each Field of Biochemical Table
Complication type       Complication                          Label in this paper
Chronic complications   Diabetic Nephropathy                  D1
Chronic complications   Diabetic Retinopathy                  D2
Chronic complications   Diabetic Neuropathy                   D3
Chronic complications   Diabetic Cardiovascular Disease       D4
Chronic complications   Diabetic Cerebrovascular Disease      D5
Chronic complications   Diabetic Lower Extremity Angiopathy   D6
Chronic complications   Diabetic Foot                         D7
Acute complications     Diabetic Ketoacidosis                 D8
Acute complications     Diabetic Hyperosmotic Hyperglycemia   D9
Acute complications     Diabetic Lactic Acidosis              D10
Acute complications     Diabetic Severe Hypoglycemia          D11
11 Types of High-Incidence Complications and Nomenclature
Distribution              D1      D2      D3      D4      D5      D6      D7      D8      D9      D10     D11
Positive samples          2,684   421     316     5,999   2,350   542     76      553     39      173     167
Negative samples          8,448   10,711  10,816  5,133   8,782   10,590  11,056  10,579  11,093  10,959  10,965
Positive/negative ratio   0.32    0.04    0.03    1.17    0.27    0.05    0.01    0.05    0.01    0.02    0.02
Number of attributes      27      27      27      27      27      27      27      27      27      27      27
Detailed Description of Each Complication Data
Distribution              D1      D2      D3      D4      D5      D6      D7      D8      D9      D10     D11
Total samples             3,341   3,341   3,340   3,340   3,340   3,340   3,340   3,340   3,340   3,340   3,341
Positive samples          806     127     95      1,540   705     163     23      166     12      52      51
Negative samples          2,535   3,214   3,245   1,800   2,635   3,177   3,317   3,174   3,328   3,288   3,290
Proportion of positives   0.24    0.04    0.03    0.46    0.21    0.05    0.01    0.05    0.01    0.02    0.02
Distribution of Positive and Negative Samples of Complications in the Test Set
Distribution              D1      D2      D3      D4      D5      D6      D7      D8      D9      D10     D11
Positive samples          5,913   7,497   7,571   4,199   6,147   7,413   7,739   7,405   7,765   7,671   7,675
Negative samples          5,913   7,497   7,571   4,199   6,147   7,413   7,739   7,405   7,765   7,671   7,675
Total samples             11,826  14,994  15,142  8,398   12,294  14,826  15,478  14,810  15,530  15,342  15,350
The Distribution of Sample Number after Over-Sampling Methods
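
The three distribution tables above can be cross-checked with simple arithmetic. The sketch below assumes, without confirmation from this page, that the training set is the full data minus the test set and that oversampling adds synthetic positives until both classes are the same size. Only D1 and D7 are shown; D4, where positives outnumber negatives, is left out because its reported figures do not follow this pattern directly.

```python
# Counts copied from the tables above: (positive, negative) per complication.
full_data = {"D1": (2684, 8448), "D7": (76, 11056)}
test_set = {"D1": (806, 2535), "D7": (23, 3317)}
per_class_after_oversampling = {"D1": 5913, "D7": 7739}

synthetic_needed = {}
for name, (pos, neg) in full_data.items():
    test_pos, test_neg = test_set[name]
    # Assumption: training set = full data minus test set.
    train_pos = pos - test_pos
    train_neg = neg - test_neg
    # The inferred majority count matches the reported per-class size.
    assert train_neg == per_class_after_oversampling[name]
    # Synthetic positives required to reach a 1:1 class ratio.
    synthetic_needed[name] = train_neg - train_pos
    print(name, train_pos, train_neg, synthetic_needed[name])
# prints:
# D1 1878 5913 4035
# D7 53 7739 7686
```

Under these assumptions, D7 would need roughly 145 synthetic samples for every real positive training sample, which shows how heavily the rarest complications depend on the oversampling step.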
Sampling     Classifier          D1    D2    D3    D4    D5    D6    D7    D8    D9    D10   D11
Resampling   LR                  0.79  0.74  0.59  0.65  0.62  0.61  0.77  0.73  0.90  0.85  0.87
Resampling   SVM                 0.77  0.74  0.60  0.63  0.64  0.64  0.78  0.76  0.91  0.88  0.88
Resampling   kNN                 0.86  0.95  0.96  0.72  0.79  0.94  0.99  0.96  1.00  0.99  0.99
Resampling   DT                  0.94  0.98  0.99  0.80  0.91  0.98  1.00  0.99  1.00  1.00  1.00
Resampling   RF                  0.95  0.99  1.00  0.86  0.95  1.00  1.00  0.99  1.00  1.00  1.00
Resampling   GBDT                0.88  0.93  0.94  0.77  0.78  0.90  1.00  0.93  1.00  0.99  0.99
Resampling   XGBoost             0.88  0.92  0.94  0.77  0.77  0.88  0.99  0.93  1.00  0.99  1.00
Resampling   Parallel ensemble   0.88  0.93  0.98  0.78  0.83  0.94  1.00  0.95  1.00  0.99  1.00
SMOTE        LR                  0.79  0.75  0.62  0.65  0.63  0.61  0.77  0.73  0.92  0.87  0.88
SMOTE        SVM                 0.76  0.76  0.62  0.64  0.65  0.65  0.81  0.76  0.92  0.88  0.88
SMOTE        kNN                 0.88  0.93  0.94  0.70  0.82  0.92  0.98  0.94  0.99  0.98  0.98
SMOTE        DT                  0.89  0.96  0.97  0.70  0.84  0.95  0.99  0.96  1.00  0.99  0.99
SMOTE        RF                  0.93  0.98  0.99  0.79  0.91  0.98  1.00  0.99  1.00  1.00  1.00
SMOTE        GBDT                0.90  0.94  0.96  0.76  0.83  0.92  0.99  0.95  1.00  0.99  1.00
SMOTE        XGBoost             0.90  0.94  0.95  0.76  0.82  0.92  0.99  0.94  1.00  0.99  1.00
SMOTE        Parallel ensemble   0.89  0.94  0.97  0.76  0.83  0.95  0.99  0.95  1.00  0.98  1.00
F_SMOTE      LR                  0.80  0.78  0.71  0.67  0.64  0.64  0.80  0.76  0.95  0.87  0.90
F_SMOTE      SVM                 0.80  0.80  0.68  0.60  0.63  0.67  0.85  0.76  0.94  0.90  0.89
F_SMOTE      kNN                 0.90  0.96  0.94  0.72  0.82  0.92  0.99  0.97  0.99  0.99  0.99
F_SMOTE      DT                  0.92  0.98  0.98  0.73  0.85  0.95  0.99  0.98  1.00  1.00  1.00
F_SMOTE      RF                  0.95  0.98  1.00  0.87  0.97  0.98  1.00  0.99  1.00  1.00  1.00
F_SMOTE      GBDT                0.92  0.95  0.96  0.80  0.84  0.92  1.00  0.97  1.00  0.98  0.99
F_SMOTE      XGBoost             0.91  0.94  0.96  0.79  0.83  0.91  0.99  0.97  1.00  0.98  0.99
F_SMOTE      Parallel ensemble   0.94  0.98  0.99  0.81  0.90  0.98  1.00  0.98  1.00  0.99  1.00
Classification Accuracy of Three Over-Sampling Algorithms
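
The abstract names balanced accuracy as one evaluation criterion; this page does not state whether the values in the table above are plain or balanced accuracy. The sketch below, with hypothetical labels and predictions, shows why the distinction matters when positives are about 1% of the samples, as for D7 and D9: a classifier that never predicts the positive class still scores 0.99 on plain accuracy but only 0.5 on balanced accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical data: 1 positive among 100 samples, classifier predicts all negative.
y_true = [1] + [0] * 99
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.99
print(balanced_accuracy(y_true, y_pred))  # 0.5
```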
ROC and PR Curves of Diabetic Nephropathy Treated with F_SMOTE + RF and SMOTE + RF
Sampling     Classifier          AUC   D1    D2    D3    D4    D5    D6    D7    D8    D9    D10    D11
Resampling   LR                  ROC   0.83  0.76  0.55  0.71  0.65  0.59  0.70  0.37  0.28  0.69   0.88
Resampling   LR                  PR    0.65  0.11  0.05  0.69  0.37  0.06  0.02  0.04  0.01  0.05   0.38
Resampling   SVM                 ROC   0.83  0.68  0.45  0.70  0.62  0.54  0.65  0.45  0.61  0.54   0.85
Resampling   SVM                 PR    0.65  0.08  0.03  0.68  0.29  0.05  0.04  0.05  0.01  0.02   0.49
Resampling   kNN                 ROC   0.93  0.90  0.91  0.80  0.84  0.91  0.97  0.95  0.99  0.96   0.98
Resampling   kNN                 PR    0.82  0.23  0.20  0.83  0.59  0.22  0.15  0.39  0.13  0.15   0.54
Resampling   DT                  ROC   0.93  0.96  0.97  0.77  0.90  0.95  0.99  0.96  1.00  0.99   0.99
Resampling   DT                  PR    0.73  0.30  0.32  0.86  0.64  0.30  0.33  0.37  0.26  0.37   0.48
Resampling   RF                  ROC   0.98  0.99  1.00  0.97  0.99  1.00  1.00  0.99  1.00  1.00   1.00
Resampling   RF                  PR    0.94  0.80  0.81  0.97  0.95  0.88  0.75  0.79  0.90  0.76   0.92
Resampling   GBDT                ROC   0.95  0.95  0.94  0.86  0.87  0.90  1.00  0.96  1.00  0.98   1.00
Resampling   GBDT                PR    0.87  0.44  0.55  0.86  0.65  0.40  0.90  0.61  0.92  0.73   0.94
Resampling   XGBoost             ROC   0.95  0.94  0.95  0.86  0.86  0.90  1.00  0.96  1.00  0.99   1.00
Resampling   XGBoost             PR    0.87  0.38  0.53  0.86  0.63  0.36  0.84  0.60  0.90  0.71   0.94
Resampling   Parallel ensemble   ROC   0.97  0.98  0.99  0.93  0.98  0.98  1.00  0.98  1.00  0.99   1.00
Resampling   Parallel ensemble   PR    0.91  0.63  0.67  0.95  0.89  0.71  0.89  0.68  0.90  0.79   0.94
SMOTE        LR                  ROC   0.84  0.75  0.55  0.71  0.67  0.62  0.75  0.68  0.78  0.78   0.90
SMOTE        LR                  PR    0.67  0.15  0.04  0.69  0.39  0.07  0.04  0.14  0.02  0.02   0.44
SMOTE        SVM                 ROC   0.84  0.73  0.53  0.70  0.66  0.62  0.57  0.62  0.75  0.75   0.90
SMOTE        SVM                 PR    0.64  0.14  0.06  0.68  0.33  0.08  0.01  0.10  0.01  0.01   0.62
SMOTE        kNN                 ROC   0.92  0.88  0.85  0.80  0.83  0.88  0.95  0.91  0.97  0.97   0.98
SMOTE        kNN                 PR    0.80  0.14  0.09  0.83  0.52  0.17  0.06  0.24  0.07  0.07   0.57
SMOTE        DT                  ROC   0.95  0.95  0.96  0.77  0.89  0.93  0.99  0.95  1.00  1.00   1.00
SMOTE        DT                  PR    0.80  0.28  0.24  0.87  0.58  0.26  0.21  0.32  0.24  0.24   0.61
SMOTE        RF                  ROC   0.98  0.99  1.00  0.97  0.99  1.00  1.00  0.99  1.00  1.00   1.00
SMOTE        RF                  PR    0.94  0.89  0.97  0.97  0.95  0.97  0.92  0.87  0.86  0.86   0.94
SMOTE        GBDT                ROC   0.95  0.96  0.94  0.86  0.86  0.90  0.99  0.96  1.00  1.00   1.00
SMOTE        GBDT                PR    0.88  0.55  0.59  0.86  0.67  0.42  0.88  0.69  0.78  0.78   0.98
SMOTE        XGBoost             ROC   0.95  0.95  0.93  0.86  0.86  0.90  0.98  0.96  1.00  1.00   1.00
SMOTE        XGBoost             PR    0.87  0.46  0.50  0.86  0.65  0.37  0.83  0.68  0.77  0.77   0.96
SMOTE        Parallel ensemble   ROC   0.97  0.98  0.98  0.93  0.97  0.98  1.00  0.98  1.00  1.00   1.00
SMOTE        Parallel ensemble   PR    0.91  0.66  0.73  0.95  0.89  0.80  0.89  0.76  0.84  0.84   0.95
F_SMOTE      LR                  ROC   0.87  0.80  0.66  0.71  0.69  0.66  0.88  0.82  0.98  0.89   0.96
F_SMOTE      LR                  PR    0.70  0.15  0.05  0.69  0.40  0.07  0.04  0.28  0.12  0.12   0.47
F_SMOTE      SVM                 ROC   0.87  0.82  0.68  0.70  0.71  0.70  0.89  0.82  0.99  0.89   0.97
F_SMOTE      SVM                 PR    0.70  0.15  0.06  0.68  0.42  0.09  0.05  0.38  0.17  0.17   0.66
F_SMOTE      kNN                 ROC   0.92  0.96  0.97  0.76  0.84  0.95  0.99  0.97  1.00  0.99   1.00
F_SMOTE      kNN                 PR    0.78  0.32  0.31  0.77  0.54  0.30  0.31  0.44  0.53  0.363  0.72
F_SMOTE      DT                  ROC   0.95  0.96  0.98  0.75  0.90  0.96  1.00  0.97  1.00  0.99   1.00
F_SMOTE      DT                  PR    0.81  0.32  0.35  0.79  0.63  0.35  0.57  0.43  0.68  0.38   0.77
F_SMOTE      RF                  ROC   0.98  1.00  1.00  0.92  0.98  1.00  1.00  1.00  1.00  1.00   1.00
F_SMOTE      RF                  PR    0.96  0.99  0.99  0.91  0.96  0.99  0.96  0.99  0.92  0.98   0.98
F_SMOTE      GBDT                ROC   0.95  0.97  0.98  0.86  0.87  0.95  1.00  0.98  1.00  1.00   1.00
F_SMOTE      GBDT                PR    0.88  0.65  0.77  0.86  0.69  0.58  0.94  0.72  0.82  0.85   0.98
F_SMOTE      XGBoost             ROC   0.95  0.97  0.97  0.86  0.86  0.94  1.00  0.97  1.00  1.00   1.00
F_SMOTE      XGBoost             PR    0.88  0.62  0.74  0.86  0.67  0.55  0.94  0.70  0.88  0.83   0.98
F_SMOTE      Parallel ensemble   ROC   0.98  1.00  1.00  0.90  0.96  1.00  1.00  1.00  1.00  1.00   1.00
F_SMOTE      Parallel ensemble   PR    0.93  0.92  0.96  0.89  0.89  0.96  0.96  0.97  0.92  0.96   0.98
Area Under the ROC and PR Curves (AUC) of Three Over-Sampling Algorithms
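
Many rows in the table above pair a high ROC value with a much lower PR value on the rarest complications. The sketch below illustrates how that can happen on hypothetical scores with 2 positives among 20 samples. It assumes ROC AUC is computed as the pairwise ranking probability and PR AUC as average precision, which is one common estimator; this page does not say which estimator the authors used. The function names are illustrative.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical ranking: 20 distinct scores, positives at ranks 3 and 6.
scores = [1.0 - 0.05 * i for i in range(20)]
labels = [0] * 20
labels[2] = 1
labels[5] = 1

print(round(roc_auc(labels, scores), 2))            # 0.83
print(round(average_precision(labels, scores), 2))  # 0.33
```

A handful of negatives ranked above the positives barely lowers ROC AUC because there are many negatives overall, but it cuts precision sharply, so the PR value is the more sensitive of the two on heavily unbalanced data.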
Sampling     Classifier          D1     D2      D3      D4     D5      D6      D7      D8     D9     D10    D11
Resampling   LR                  0.21   0.21    0.38    0.20   0.22    0.22    0.23    0.23   0.27   0.28   0.21
Resampling   SVM                 49.67  100.63  122.14  51.49  74.74   111.16  104.32  92.97  73.50  81.23  43.34
Resampling   kNN                 0.66   0.94    0.99    0.76   0.63    0.82    1.00    1.01   1.08   0.80   0.87
Resampling   DT                  0.35   0.35    0.26    0.34   0.30    0.33    0.22    0.26   0.07   0.18   0.14
Resampling   RF                  15.32  13.09   14.64   13.31  15.38   15.28   10.06   12.76  5.98   9.55   7.61
Resampling   GBDT                37.99  44.64   39.83   30.54  36.92   41.66   38.21   40.00  34.34  36.31  35.42
Resampling   XGBoost             8.70   9.65    9.69    6.86   8.19    9.14    9.69    9.55   9.74   8.99   9.09
Resampling   Parallel ensemble   69.88  41.70   42.83   67.99  85.26   44.09   39.02   42.05  40.55  39.64  71.51
SMOTE        LR                  0.16   0.18    0.21    0.19   0.27    0.21    0.34    0.31   0.25   0.23   0.24
SMOTE        SVM                 37.40  72.18   113.49  47.79  75.50   117.72  104.35  96.86  79.36  76.72  50.39
SMOTE        kNN                 0.51   0.98    0.89    0.60   0.80    0.83    1.16    0.87   1.12   0.79   1.00
SMOTE        DT                  0.40   0.42    0.62    0.35   0.50    0.79    0.88    0.42   0.23   0.59   0.43
SMOTE        RF                  12.53  17.24   21.74   14.98  17.52   24.86   29.42   19.14  15.32  20.36  18.10
SMOTE        GBDT                34.86  58.66   65.00   33.56  46.38   67.64   67.65   63.14  57.93  66.29  67.93
SMOTE        XGBoost             7.73   12.05   13.63   8.00   10.23   12.79   13.79   12.90  14.31  15.05  14.06
SMOTE        Parallel ensemble   71.13  61.52   62.53   66.25  104.77  67.80   75.04   68.60  62.39  74.85  109.42
F_SMOTE      LR                  0.23   0.30    0.29    0.17   0.19    0.23    0.25    0.26   0.25   0.18   0.17
F_SMOTE      SVM                 49.33  68.38   82.09   56.17  76.01   86.54   51.93   53.80  26.37  29.36  9.53
F_SMOTE      kNN                 0.87   1.08    0.76    0.88   0.73    0.83    0.82    0.71   0.58   0.57   0.75
F_SMOTE      DT                  0.34   0.52    0.43    0.37   0.39    0.33    0.21    0.38   0.15   0.24   0.16
F_SMOTE      RF                  13.09  12.80   13.69   14.38  14.01   12.44   10.20   11.51  6.48   9.04   7.05
F_SMOTE      GBDT                40.00  44.47   48.09   34.15  42.19   44.42   43.51   34.21  32.28  31.96  30.66
F_SMOTE      XGBoost             10.38  10.71   10.19   8.19   8.85    10.40   10.92   8.82   7.61   8.10   7.31
F_SMOTE      Parallel ensemble   74.83  56.42   49.23   77.90  38.24   37.66   44.36   48.10  39.83  39.55  46.18
The Time (Seconds) Spent by Each Algorithm on Three Over-Sampling Methods