Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 28-36    DOI: 10.11925/infotech.2096-3467.2021.0096
Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines
Feng Hao, Li Shuqing
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract  

[Objective] This paper proposes a new multi-layer cascade classifier built from multiple support vector machines (SVMs) to address the credit scoring problem faced by financial institutions. [Methods] The proposed hybrid model combines ideas from genetic algorithms, machine learning, and ensemble learning. The framework includes SVM classifiers, data normalization, feature extraction, parameter optimization, and 10-fold cross-validation, among other techniques. We tested the layer-deepening strategy, the attribute reuse method, and the diversification of fitness functions experimentally. [Results] We evaluated SVMs optimized by a genetic algorithm on the Australian Credit Approval dataset. Prediction accuracy improved as the number of layers increased, and the overall framework reached a prediction accuracy of 93.33%. [Limitations] The proposed method uses only SVMs as base classifiers and should be extended to other classifier types. The framework contains many classifiers, which take a long time to train and optimize. [Conclusions] The proposed classifier can effectively improve credit scoring services and can be applied to similar binary classification tasks.
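
The 10-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It only shows how sample indices might be partitioned into ten train/test splits; the function name `k_fold_indices` is hypothetical, no shuffling or stratification is applied, and the figure of 690 samples assumes the commonly distributed UCI version of the Australian Credit Approval dataset. It is not the authors' implementation.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) pairs.

    Fold sizes differ by at most one sample. Shuffling is deliberately
    omitted to keep the sketch deterministic.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Assumed dataset size: 690 samples, giving ten test folds of 69 samples each.
folds = k_fold_indices(690, k=10)
```

Each sample appears in exactly one test fold, so the misclassification counts summed over the ten test sets cover the whole dataset once.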

Key words: Support Vector Machine; Classification; Multi-layer Cascade; Credit Scoring
Received: 29 January 2021      Published: 23 November 2021
CLC Number: TP399
Fund: Major Natural Science Research Project of Colleges and Universities in Jiangsu Province (19KJA510011)
Corresponding Author: Li Shuqing, ORCID: 0000-0001-9814-5766, E-mail: leeshuqing@163.com

Cite this article:

Feng Hao, Li Shuqing. Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines. Data Analysis and Knowledge Discovery, 2021, 5(10): 28-36.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0096     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I10/28

The Framework of Research
Classifier Structure in Related Work
Classifier Structure
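Parameter | Value
Population size | 500
Chromosome length | Layer 1: 18 (4 classifier parameters + 14 borrower attributes)
 | Layer 2: 36 (prediction results of 36 first-layer classifiers)
 | Layer 3: 45 (prediction results of 36 first-layer classifiers + prediction results of 9 second-layer classifiers)
 | Layer 4: 90 (prediction results of 72 first-layer classifiers + prediction results of 18 third-layer classifiers)
 | Layer 5: 65 (56 borrower attributes + prediction results of 9 fourth-layer classifiers)
Crossover method | Real-valued part: discrete recombination
 | Binary part: intermediate recombination
 | Crossover probability: 0.7
Mutation method | Real-valued part: uniform mutation
 | Binary part: binary mutation
 | Mutation probability: 0.1
Selection method | Tournament selection
Fitness function | Total number of misclassifications on the test sets, as in Formula (1):
 | $ERR_T = \frac{err_L}{1000} + err_T$ (1)
 | Combined error rate of the test and training sets, as in Formula (2):
 | $ERR_{\%} = (err_L\% + err_T\%) / 2$ (2)
 | Sum of the misclassification counts on the test and training sets and the accepted-feature coefficient, as in Formula (3):
 | $ERR_{Sum} = err_L + err_T + \frac{F_a}{F}$ (3)
 | where:
 | $err_L$ is the total number of misclassifications over the 10 training sets
 | $err_T$ is the total number of misclassifications over the 10 test sets
 | $err_L\%$ is the proportion of errors over the 10 training sets
 | $err_T\%$ is the proportion of errors over the 10 test sets
 | $F_a$ is the number of features finally chosen by feature selection
 | $F$ is the total number of features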
Parameters of Genetic Algorithm
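
As an illustration of the three fitness functions in the table above, the following sketch computes Formulas (1) to (3) from already-counted errors. The function and argument names are hypothetical, and the reading of Formula (1) as err_L / 1000 + err_T and of the last term of Formula (3) as F_a / F is an assumption based on the formula layout; lower values are taken to be better.

```python
def err_t(err_train: int, err_test: int) -> float:
    """Formula (1), as read here: test errors dominate, while training
    errors scaled by 1/1000 act as a tie-breaker."""
    return err_train / 1000 + err_test


def err_pct(err_train_pct: float, err_test_pct: float) -> float:
    """Formula (2): mean of the training and test error rates (in %)."""
    return (err_train_pct + err_test_pct) / 2


def err_sum(err_train: int, err_test: int, n_selected: int, n_total: int) -> float:
    """Formula (3), as read here: error counts plus the fraction of
    features kept, so smaller feature subsets are slightly preferred."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return err_train + err_test + n_selected / n_total


# Hypothetical counts, for illustration only (not results from the paper).
example = err_sum(err_train=120, err_test=85, n_selected=7, n_total=14)
```

Under this reading, Formula (1) ranks candidates almost entirely by test-set errors, whereas Formula (3) weights training and test errors equally and adds a small penalty for using more features.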
Feature Selection Process
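
A first-layer chromosome of length 18 (4 classifier parameters + 14 borrower attributes) could be decoded along the following lines. This is a sketch under stated assumptions: the gene order (parameters first, attribute mask second), the 0.5 threshold for treating a gene as "selected", and the names `decode_chromosome` and `select_features` are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

N_PARAMS = 4   # classifier parameter genes (layer 1)
N_ATTRS = 14   # borrower attribute genes (layer 1)


def decode_chromosome(chrom: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Split a chromosome into classifier parameters and selected attribute indices.

    Assumed layout: the first N_PARAMS genes are real-valued parameters,
    the remaining N_ATTRS genes form a 0/1 feature mask.
    """
    if len(chrom) != N_PARAMS + N_ATTRS:
        raise ValueError(f"expected {N_PARAMS + N_ATTRS} genes, got {len(chrom)}")
    params = list(chrom[:N_PARAMS])
    selected = [i for i, gene in enumerate(chrom[N_PARAMS:]) if gene >= 0.5]
    return params, selected


def select_features(row: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Keep only the attribute values whose mask gene is switched on."""
    return [row[i] for i in selected]


# Hypothetical chromosome: four parameter genes followed by a 14-bit mask.
chromosome = [1.0, 0.5, 0.1, 3.0] + [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
params, selected = decode_chromosome(chromosome)
```

The number of selected indices corresponds to F_a in Formula (3), and N_ATTRS to F.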
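Credit scoring subject | Classifier 1 | Classifier 2 | Classifier 3
Credit scoring subject 1 | 1 | 1 | 0
Credit scoring subject 2 | 1 | 1 | 1
Credit scoring subject 3 | 0 | 0 | 1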
Result of Judging Method
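
The table above lists one prediction per classifier for each credit scoring subject. A minimal sketch of how such outputs might be rearranged into one feature row per subject, so that a deeper layer can train on them, is given below. The function name `stack_predictions` is hypothetical, and using the rows directly as next-layer inputs is an assumption suggested by the chromosome lengths of layers 2 to 4, not a confirmed detail of the method.

```python
from typing import List, Sequence


def stack_predictions(per_classifier: Sequence[Sequence[int]]) -> List[List[int]]:
    """Turn per-classifier prediction lists into per-subject feature rows.

    per_classifier[j][i] is the 0/1 prediction of classifier j for subject i;
    the result[i][j] holds the same value, one row per subject.
    """
    n_subjects = len(per_classifier[0])
    if any(len(preds) != n_subjects for preds in per_classifier):
        raise ValueError("all classifiers must predict the same subjects")
    return [list(row) for row in zip(*per_classifier)]


# Predictions copied from the table above, listed column by column.
classifier_1 = [1, 1, 0]
classifier_2 = [1, 1, 0]
classifier_3 = [0, 1, 1]
next_layer_input = stack_predictions([classifier_1, classifier_2, classifier_3])
```

Each row of `next_layer_input` matches one row of the table, so three first-layer classifiers yield a three-feature input for the following layer.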
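Classifier | Kernel function | Normalization method | Feature extraction method | Error measure | Accuracy
nu-SVC | RBF | Z-Score | None | $ERR_T$ | 88.16%
C-SVC | Poly | Max-Min | None | $ERR_{\%}$ | 87.97%
nu-SVC | Sigmoid | Max-Min | PCA | $ERR_T$ | 87.97%
nu-SVC | Sigmoid | Max-Min | None | $ERR_T$ | 87.83%
nu-SVC | Poly | Z-Score | PCA | $ERR_T$ | 87.83%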
The Top 5 Classifiers in the First Layer
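
The two normalization methods named in the table, Z-Score and Max-Min, can be sketched for a single attribute column as follows. This uses the textbook definitions with the population standard deviation and a [0, 1] target range; whether the paper uses exactly these variants is an assumption, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score(column: Sequence[float]) -> List[float]:
    """Z-Score: subtract the mean and divide by the (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def max_min(column: Sequence[float]) -> List[float]:
    """Max-Min: rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical attribute values, for illustration only.
scaled = max_min([2, 4, 6])
standardized = z_score([1, 2, 3])
```

In a cross-validated setting, the statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training fold and reused for the test fold.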
Layer | Classifier | Kernel function | Error measure | Accuracy
1 | nu-SVC | RBF | $ERR_T$ | 88.16%
2 | nu-SVC | RBF | $ERR_T$ | 90.00%
3 | nu-SVC | RBF | $ERR_{\%}$ | 91.28%
4 | nu-SVC | RBF | $ERR_{\%}$ | 92.43%
5 | nu-SVC | Sigmoid | $ERR_{Sum}$ | 93.33%
The Classifier of the Highest Accuracy in Each Layer
Confusion Matrix of Classifier Predictions
Method | Accuracy (%)
SVM | 82.43
XGBoost | 85.25
MLP | 83.54
DBN | 87.50
This paper | 93.33
Accuracy of Each Method