Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 28-36    DOI: 10.11925/infotech.2096-3467.2021.0096
Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines
Feng Hao, Li Shuqing
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
[Objective] This paper proposes a multi-layer cascade classifier built from multiple support vector machines (SVMs) to address credit scoring for financial institutions. [Methods] The hybrid model combines genetic algorithms, machine learning, and ensemble learning. The framework comprises SVM classifiers, data normalization, feature extraction, parameter optimization, and 10-fold cross-validation. We experimentally tested the layer-deepening strategy, the attribute reuse method, and the diversification of fitness functions. [Results] We evaluated the genetic-algorithm-optimized SVMs on the Australian Credit Approval dataset. Prediction accuracy improved as the number of layers increased, and the overall framework reached a prediction accuracy of 93.33%. [Limitations] The proposed method uses only SVMs as base classifiers and should be extended to other classifier types. The framework contains many classifiers, so training and optimization take a long time. [Conclusions] The proposed classifier can effectively improve credit scoring services and can be applied to similar binary classification tasks.

Key words: Support Vector Machine; Classification; Multi-layer Cascade; Credit Scoring
Received: 29 January 2021      Published: 23 November 2021
Chinese Library Classification (ZTFLH): TP399
Fund: Major Natural Science Research Project of Colleges and Universities in Jiangsu Province (19KJA510011)
Corresponding Author: Li Shuqing, ORCID: 0000-0001-9814-5766

Feng Hao, Li Shuqing. Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines. Data Analysis and Knowledge Discovery, 2021, 5(10): 28-36.

The Framework of Research
Classifier Structure in Related Work
Classifier Structure
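Parameter            Value
Population size      500
Chromosome length    18 in layer 1 (4 classifier parameters + 14 borrower attributes)
Crossover method     Discrete recombination for the real-valued part
Mutation method      Uniform mutation for the real-valued part
Selection method     Tournament selection
Fitness function     Total number of misclassifications on the test sets, as in Eq. (1)

$ERR_T = \frac{err_L}{1000} + err_T$    (1)
$ERR_\% = (err_L\% + err_T\%) / 2$    (2)
$ERR_{Sum} = err_L + err_T + \frac{F_a}{F}$    (3)

$err_L$: total number of misclassifications over the 10 training sets
$err_T$: total number of misclassifications over the 10 test sets
$err_L\%$: proportion of misclassified samples over the 10 training sets
$err_T\%$: proportion of misclassified samples over the 10 test sets
$F_a$: number of features finally retained by feature selection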
Parameters of Genetic Algorithm
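The error measures and operators named in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Eq. (1) is read as the training error divided by 1000 plus the test error, the last term of Eq. (3) is read as the selected feature count divided by a total feature count (`n_total`, a hypothetical name), and the tournament size, mutation rate, and gene bounds are illustrative parameters that the source does not specify. Only the real-valued part of the chromosome is handled, since that is the part the table describes.

```python
import random


def err_t(err_train, err_test):
    """Eq. (1) as read here: test errors dominate, training errors break ties."""
    return err_train / 1000 + err_test


def err_pct(err_train_pct, err_test_pct):
    """Eq. (2): mean of the training and test error proportions."""
    return (err_train_pct + err_test_pct) / 2


def err_sum(err_train, err_test, n_selected, n_total):
    """Eq. (3) as read here: error counts plus the share of features kept (assumption)."""
    return err_train + err_test + n_selected / n_total


def tournament_select(population, scores, k, rng):
    """Draw k distinct individuals at random and return the one with the lowest score."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: scores[i])
    return population[winner]


def discrete_recombination(parent_a, parent_b, rng):
    """Build a child by copying each gene from one of the two parents at random."""
    return [rng.choice((a, b)) for a, b in zip(parent_a, parent_b)]


def uniform_mutation(genes, bounds, rate, rng):
    """Replace each gene, with probability `rate`, by a uniform draw from its bounds."""
    return [
        rng.uniform(low, high) if rng.random() < rate else gene
        for gene, (low, high) in zip(genes, bounds)
    ]
```

Because all three measures are errors, lower scores are better, which is why the tournament keeps the minimum. Passing an explicit `random.Random` instance as `rng` keeps runs reproducible.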
Feature Selection Process
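The chromosome layout given in the parameter table (4 classifier parameters followed by 14 borrower-attribute genes in layer 1) suggests how a feature subset might be read off an individual. The sketch below is a hypothetical decoding, not the procedure shown in the figure: it assumes the attribute genes lie in [0, 1] and that a gene at or above a threshold of 0.5 keeps the corresponding attribute. The function names, the gene range, and the threshold are all assumptions.

```python
def decode_chromosome(chromosome, n_params=4, threshold=0.5):
    """Split a chromosome into classifier parameters and a boolean feature mask.

    The first n_params genes are treated as classifier parameters; each
    remaining gene keeps its attribute when it is >= threshold (assumption).
    """
    params = list(chromosome[:n_params])
    mask = [gene >= threshold for gene in chromosome[n_params:]]
    return params, mask


def select_features(row, mask):
    """Keep only the attribute values whose mask entry is True."""
    return [value for value, keep in zip(row, mask) if keep]


# Illustrative individual: 4 parameter genes followed by 14 attribute genes.
example = [0.1, 0.2, 0.3, 0.4] + [1.0, 0.0] * 7
example_params, example_mask = decode_chromosome(example)
example_row = select_features(list(range(14)), example_mask)
```

With this encoding, the number of `True` entries in the mask corresponds to the count of selected features, $F_a$.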
Credit scoring subject      Classifier 1    Classifier 2    Classifier 3
Credit scoring subject 1    1               1               0
Credit scoring subject 2    1               1               1
Credit scoring subject 3    0               0               1
Result of Judging Method
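The table above lists three binary classifier outputs per credit scoring subject, but this excerpt does not state how those outputs are combined into a final judgment. As one possible reading, the sketch below applies a simple majority vote and also flags the subjects on which the classifiers disagree. Majority voting is an assumption made for illustration and may differ from the judging rule used in the paper.

```python
def majority_vote(votes):
    """Return 1 when more than half of the binary votes are 1, otherwise 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


# Classifier outputs copied from the table (classifier 1, 2, 3).
outputs = {
    "subject 1": [1, 1, 0],
    "subject 2": [1, 1, 1],
    "subject 3": [0, 0, 1],
}

# Combined decision per subject under the assumed majority rule.
decisions = {name: majority_vote(votes) for name, votes in outputs.items()}

# Subjects on which the classifiers do not all agree.
disagreements = [name for name, votes in outputs.items() if len(set(votes)) > 1]
```

Under this assumed rule, subjects 1 and 2 are assigned class 1 and subject 3 is assigned class 0, with subjects 1 and 3 flagged as non-unanimous.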
Classifier    Kernel function    Normalization method    Feature extraction method    Error measure    Accuracy
nu-SVC        RBF                Z-Score                 None                         $ERR_T$          88.16%
C-SVC         Poly               Max-Min                 None                         $ERR_\%$         87.97%
nu-SVC        Sigmoid            Max-Min                 PCA                          $ERR_T$          87.97%
nu-SVC        Sigmoid            Max-Min                 None                         $ERR_T$          87.83%
nu-SVC        Poly               Z-Score                 PCA                          $ERR_T$          87.83%
The Top 5 Classifiers in the First Layer
Layer    Classifier    Kernel function    Error measure    Accuracy
1        nu-SVC        RBF                $ERR_T$          88.16%
2        nu-SVC        RBF                $ERR_T$          90.00%
3        nu-SVC        RBF                $ERR_\%$         91.28%
4        nu-SVC        RBF                $ERR_\%$         92.43%
5        nu-SVC        Sigmoid            $ERR_{Sum}$      93.33%
The Classifier of the Highest Accuracy in Each Layer
Confusion Matrix of Classifier Prediction
Method        Accuracy/%
SVM           82.43
XGBoost       85.25
MLP           83.54
DBN           87.50
This paper    93.33
Accuracy of Each Method
