[Objective] Imbalanced samples in diabetic complication data weaken a classifier's descriptive ability and shift its decision boundary. This study explores a suitable classifier model to improve the prediction of diabetic complications. [Methods] At the data level, an improved SMOTE oversampling algorithm (F_SMOTE) is used to change the class distribution of the imbalanced data. At the algorithm level, four single-classifier learning models and four ensemble learning models are compared, using balanced accuracy, the area under the ROC curve, and the area under the PR curve as evaluation metrics. [Results] Compared with the traditional oversampling algorithm, F_SMOTE improved the prediction values by 1.48% (accuracy), 4.14% (ROC), and 9.21% (PR), respectively. Compared with the single-classifier learning models, the ensemble learning models improved the prediction values by 9.78% (accuracy), 8.82% (ROC), and 45.9% (PR), respectively. The combination of F_SMOTE and the random forest (RF) model reached 97.63% (accuracy), 98.97% (ROC), and 96.54% (PR) on the imbalanced data. [Limitations] The time efficiency of model training needs further improvement. [Conclusions] The method provides data mining practitioners with a multi-angle analysis and prediction framework and can assist doctors in disease diagnosis and prevention.
Qiu Yunfei, Guo Lei. Prediction of diabetic complications based on unbalanced data [J]. Data Analysis and Knowledge Discovery, 0, (): 1-. DOI: 10.11925/infotech.2096-3467.2020.0353.
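
The data-level step and one of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It implements plain SMOTE interpolation, not the paper's F_SMOTE variant, whose modifications are not described in this record. The function names `smote` and `balanced_accuracy`, the parameter `k`, and the toy data are all assumptions made for illustration only.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, not F_SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data, invented for illustration: 3 minority and 9 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]
majority = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(9)]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2)
```

Balanced accuracy is used here because plain accuracy can look high on imbalanced data when a model simply predicts the majority class. In practice, a maintained library implementation of SMOTE and of the metrics would likely be preferable to this hand-written version.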