Data Analysis and Knowledge Discovery
A novel borderline over-sampling method based on KNN and Deep Gaussian Mixture Model for Imbalanced Data
ZHANG Haibin,XIAO Han,YI Cancan,YUAN Rui
(Key Laboratory of Metallurgical Equipment and Control Technology, Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China) (Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China) (Precision Manufacturing Institute, Wuhan University of Science and Technology, Wuhan 430081, China)
Abstract  

[Objective] This paper proposes a borderline over-sampling method based on k-nearest neighbors (KNN) and the Deep Gaussian Mixture Model (DGMM) to address the classifier bias caused by imbalanced data. [Methods] First, the KNN algorithm is used to identify the borderline minority samples in the training set. Second, a DGMM is constructed for the minority samples in the borderline region, and new samples that conform to the distribution of the borderline minority samples are generated by applying the DGMM in reverse. Finally, noise points among the generated samples are removed using the three-sigma rule, and this step is repeated until no noise points remain. [Results] The proposed method increased AUC and G-mean by up to 5.64% and 7.95% respectively, with average increases of 2.75% and 3.78% respectively. [Limitations] The parameter optimization method for the DGMM needs further improvement. [Conclusions] The proposed method addresses the data imbalance problem more effectively.
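The three steps in [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the borderline rule is borrowed from Borderline-SMOTE (at least half, but not all, of the k nearest neighbors belong to the majority class), a single diagonal Gaussian stands in for the DGMM, and the three-sigma step is read as recomputing the mean and standard deviation of the generated samples on each pass until a pass removes nothing. The function names `borderline_minority`, `three_sigma_clean`, and `oversample` are hypothetical.

```python
import math
import random
import statistics


def borderline_minority(X, y, minority=1, k=5):
    """Return minority samples whose k nearest neighbours are at least half,
    but not all, majority class (Borderline-SMOTE style rule, an assumption)."""
    border = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # Rank every other sample by Euclidean distance to xi.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        n_major = sum(1 for j in others[:k] if y[j] != minority)
        if k / 2 <= n_major < k:
            border.append(xi)
    return border


def three_sigma_clean(samples):
    """Drop samples farther than 3 standard deviations from the mean on any
    feature, recomputing the statistics each pass, until nothing is removed."""
    kept = list(samples)
    while len(kept) > 1:
        dims = list(zip(*kept))
        mean = [statistics.fmean(d) for d in dims]
        std = [statistics.pstdev(d) for d in dims]
        # The small tolerance guards against rounding when std is zero.
        nxt = [
            s for s in kept
            if all(abs(v - m) <= 3 * sd + 1e-12 for v, m, sd in zip(s, mean, std))
        ]
        if len(nxt) == len(kept):
            break
        kept = nxt
    return kept


def oversample(border, n_new, seed=0):
    """Fit a diagonal Gaussian to the borderline samples (a simple stand-in
    for the DGMM), draw n_new candidates, and clean them with three-sigma."""
    if len(border) < 2:
        raise ValueError("need at least two borderline samples")
    rng = random.Random(seed)
    dims = list(zip(*border))
    mean = [statistics.fmean(d) for d in dims]
    std = [statistics.pstdev(d) for d in dims]
    draws = [
        tuple(rng.gauss(m, sd) for m, sd in zip(mean, std))
        for _ in range(n_new)
    ]
    return three_sigma_clean(draws)
```

Because noisy candidates are discarded rather than redrawn, `oversample` may return fewer than `n_new` samples. A real DGMM would replace the single Gaussian with a multi-layer mixture fitted to the borderline region.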

Key words: imbalanced data; over-sampling; Deep Gaussian Mixture Model
Published: 10 November 2022
ZTFLH:  TP181,TP311.13  

Cite this article:

ZHANG Haibin, XIAO Han, YI Cancan, YUAN Rui. A novel borderline over-sampling method based on KNN and Deep Gaussian Mixture Model for Imbalanced Data. Data Analysis and Knowledge Discovery, 0, (): 1-.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022-0609     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1

[1] Zhang Haibin, Xiao Han, Yi Cancan, Yuan Rui. A Novel Borderline Over-Sampling Method Based on KNN and Deep Gaussian Mixture Model for Imbalanced Data[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 116-122.
[2] Xu Liangchen, Guo Chonghui. Predicting Survival Rates for Gastric Cancer Based on Ensemble Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 86-99.
[3] Su Qiang, Hou Xiaoli, Zou Ni. Predicting Surgical Infections Based on Machine Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 65-75.