Data Analysis and Knowledge Discovery
Prediction of diabetic complications based on unbalanced data
Qiu Yunfei,Guo Lei
(School of Software, Liaoning Technical University, Huludao 125105, China)

[Objective] Unbalanced samples in diabetic-complication data leave classifiers with insufficient descriptive ability and shift the decision boundary. This study explores a suitable classifier model to improve the prediction of diabetic complications. [Methods] At the data level, an improved SMOTE oversampling algorithm (F_SMOTE) changes the class distribution of the unbalanced data. At the algorithm level, four single-classifier learning models and four ensemble learning models are compared, using balanced accuracy and the areas under the ROC and PR curves as evaluation indexes. [Results] Compared with the traditional oversampling algorithm, F_SMOTE improved the prediction values by 1.48% (accuracy), 4.14% (ROC), and 9.21% (PR). Compared with the single-classifier learning models, the ensemble learning models improved the prediction values by 9.78% (accuracy), 8.82% (ROC), and 45.9% (PR). Combining F_SMOTE with the random forest (RF) model reached 97.63% (accuracy), 98.97% (ROC), and 96.54% (PR) on the unbalanced data. [Limitations] The time efficiency of model training needs further improvement. [Conclusions] The method gives data mining practitioners a multi-angle framework for analysis and prediction, and it can also assist doctors in disease diagnosis and prevention.
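
The abstract does not specify how F_SMOTE modifies SMOTE, so the following is only a minimal sketch of the plain SMOTE interpolation step it builds on, together with balanced accuracy as one of the named evaluation indexes. The function names `smote` and `balanced_accuracy`, the parameter `k`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by plain SMOTE interpolation.

    This is standard SMOTE, not the paper's F_SMOTE variant.
    `minority` is a list of equal-length numeric tuples (at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is not dominated by the majority class."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: four minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Each synthetic point lies on a segment between two existing minority samples, which is how oversampling changes the class distribution without duplicating records. Balanced accuracy shows why plain accuracy misleads on skewed data: a classifier that always predicts the majority class on labels `[0, 0, 0, 0, 1, 1]` gets 4 of 6 correct, yet its balanced accuracy is only 0.5.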

Key words: Unbalanced data; F_SMOTE algorithm; ensemble learning; diabetic complications
Published: 11 November 2020

Cite this article:

Qiu Yunfei, Guo Lei. Prediction of diabetic complications based on unbalanced data. Data Analysis and Knowledge Discovery, 0, (): 1-.

