A Novel Borderline Over-sampling Method Based on KNN and Deep Gaussian Mixture Model for Imbalanced Data
ZHANG Haibin, XIAO Han, YI Cancan, YUAN Rui
(Key Laboratory of Metallurgical Equipment and Control Technology, Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China)
(Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China)
(Precision Manufacturing Institute, Wuhan University of Science and Technology, Wuhan 430081, China)
Abstract
[Objective] A borderline over-sampling method based on k-nearest neighbors (KNN) and the Deep Gaussian Mixture Model (DGMM) is proposed to address classifier bias caused by imbalanced data. [Methods] First, the KNN algorithm is used to identify the borderline minority samples in the training set. Second, DGMMs of these borderline minority samples are constructed, and synthetic samples that follow the distribution of the borderline minority samples are generated by applying the DGMMs in reverse. Finally, noise points among the generated samples are removed using the three-sigma rule, and this step is repeated until no noise remains. [Results] The proposed method improves AUC and G-mean by up to 5.64% and 7.95%, respectively, and by 2.75% and 3.78% on average. [Limitations] The parameter optimization method for the DGMM needs further improvement. [Conclusions] The proposed method can better address the problem of data imbalance.
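The three steps in [Methods] can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The borderline rule (a minority sample is borderline when at least half, but not all, of its k nearest neighbors are majority samples) is borrowed from Borderline-SMOTE. scikit-learn's single-layer `GaussianMixture` stands in for the DGMM, whose deep architecture is not described here. The three-sigma step is assumed to be iterative per-feature clipping of the generated samples. The function names `find_borderline_minority`, `sigma_clip` and `borderline_oversample` and the parameters `k`, `n_components` and `max_rounds` are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors


def find_borderline_minority(X, y, minority_label, k=5):
    """Return minority samples whose k nearest neighbors are mostly, but not
    entirely, majority samples (Borderline-SMOTE 'danger' rule, assumed here)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    X_min = X[y == minority_label]
    # The first neighbor of each query is the point itself (assuming no
    # duplicate points), so it is dropped before counting labels.
    _, idx = nn.kneighbors(X_min)
    n_majority = np.sum(y[idx[:, 1:]] != minority_label, axis=1)
    danger = (n_majority >= k / 2) & (n_majority < k)
    return X_min[danger]


def sigma_clip(X, n_sigma=3.0):
    """Repeatedly drop rows with any feature outside mean +/- n_sigma * std,
    recomputing the statistics each pass, until no such rows remain."""
    while len(X) > 0:
        mu, sigma = X.mean(axis=0), X.std(axis=0)
        keep = np.all(np.abs(X - mu) <= n_sigma * sigma, axis=1)
        if keep.all():
            break
        X = X[keep]
    return X


def borderline_oversample(X, y, minority_label, n_new, k=5,
                          n_components=3, random_state=0, max_rounds=20):
    """Generate up to n_new synthetic borderline minority samples."""
    border = find_borderline_minority(X, y, minority_label, k)
    if len(border) == 0:
        raise ValueError("no borderline minority samples found")
    # Stand-in for the paper's DGMM: an ordinary Gaussian mixture.
    # A RandomState instance (not an int) makes each sample() call draw
    # fresh points instead of repeating the same seed.
    rng = np.random.RandomState(random_state)
    gmm = GaussianMixture(n_components=min(n_components, len(border)),
                          random_state=rng).fit(border)
    kept = np.empty((0, X.shape[1]))
    for _ in range(max_rounds):
        if len(kept) >= n_new:
            break
        new, _ = gmm.sample(n_new)
        # Re-run three-sigma clipping on the whole pool after each top-up.
        kept = sigma_clip(np.vstack([kept, new]))
    return kept[:n_new]
```

In use, `borderline_oversample(X, y, minority_label=1, n_new=100)` returns up to `n_new` synthetic minority samples to append to the training set before the classifier is fit. Sampling again after each clipping pass keeps the count near `n_new`, while every returned sample lies within three standard deviations of the retained set on each feature.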
Published: 10 November 2022