Data Analysis and Knowledge Discovery
Prediction of diabetic complications based on unbalanced data
Qiu Yunfei, Guo Lei
(School of Software, Liaoning Technical University, Huludao 125105, China)
Abstract

[Objective] Imbalanced samples in diabetic complication data leave classifiers with insufficient descriptive power and shift the decision boundary. This study explores suitable classifier models to improve the prediction of diabetic complications. [Methods] At the data level, an improved SMOTE oversampling algorithm (F_SMOTE) is used to change the class distribution of the imbalanced data. At the algorithm level, balanced accuracy and the areas under the ROC and PR curves (AUC) serve jointly as evaluation metrics for comparing four single-classifier learning models and four ensemble learning models. [Results] Compared with the traditional oversampling algorithm, F_SMOTE improved the prediction values by 1.48% (accuracy), 4.14% (ROC) and 9.21% (PR). Compared with the single-classifier learning models, the ensemble learning models improved the prediction values by 9.78% (accuracy), 8.82% (ROC) and 45.9% (PR). Combining F_SMOTE with the random forest (RF) model reached 97.63% (accuracy), 98.97% (ROC) and 96.54% (PR) on imbalanced data. [Limitations] The study does not cover all diabetic complications, and the time efficiency of model training needs further improvement. [Conclusions] The method gives data mining practitioners a multi-angle analysis and prediction framework and can also assist doctors in disease diagnosis and prevention.
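
The abstract does not describe the internals of F_SMOTE, so the following is only a minimal sketch of the classic SMOTE interpolation step that it builds on, together with balanced accuracy, one of the three evaluation metrics named above. The function names, parameters and toy data are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Classic SMOTE (not the paper's F_SMOTE): create n_new synthetic
    points by interpolating between a minority sample and one of its
    k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (made up for illustration): 3 minority vs. 8 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[4.0 + 0.1 * i, 4.0] for i in range(8)]
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

The metric matters on imbalanced data because plain accuracy rewards ignoring the minority class: with four negative cases and one positive case, a model that always predicts negative scores 0.8 on plain accuracy but only 0.5 on balanced accuracy.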

Key words: unbalanced data; F_SMOTE algorithm; ensemble learning; diabetic complications
Published: 2020-11-11
Cite this article:
Qiu Yunfei, Guo Lei. Prediction of diabetic complications based on unbalanced data [J]. Data Analysis and Knowledge Discovery, 10.11925/infotech.2096-3467.2020.0353.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0353 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1
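Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn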