Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 28-36     https://doi.org/10.11925/infotech.2096-3467.2021.0096
Research Paper
Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines*
Feng Hao, Li Shuqing
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract

[Objective] This paper addresses the widely studied credit scoring problem of financial institutions and investigates a multi-layer cascade classifier built from multiple support vector machines. [Methods] The proposed classifier is a hybrid model that combines genetic algorithms, machine learning, and ensemble learning. The framework comprises support vector machine classifiers, normalization methods, feature extraction, parameter optimization, and 10-fold cross-validation. The study concentrates on the layer-deepening strategy, the attribute reuse method, and the diversification of fitness functions, each of which is examined experimentally. [Results] On the Australian Credit Approval dataset, the prediction accuracy of the support vector machines optimized by the genetic algorithm increased with the number of layers, and the overall framework reached a prediction accuracy of 93.33%. [Limitations] The method uses only support vector machines; other classifiers remain to be tried. Because the framework contains many classifiers arranged in multiple layers, training and optimization take a long time. [Conclusions] The proposed classifier can be applied effectively to credit scoring services in finance and to other similar binary classification problems.

Keywords: Support Vector Machine; Classification; Multi-layer Cascade; Credit Scoring
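Received: 2021-01-29      Published: 2021-11-23
CLC number:  TP399
Funding: *Major Project of Natural Science Research of Jiangsu Higher Education Institutions (19KJA510011)
Corresponding author: Li Shuqing, ORCID: 0000-0001-9814-5766     E-mail: leeshuqing@163.com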
Cite this article:
Feng Hao, Li Shuqing. Multi-layer Cascade Classifier for Credit Scoring with Multiple-Support Vector Machines. Data Analysis and Knowledge Discovery, 2021, 5(10): 28-36.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0096      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I10/28
Fig.1  Research framework of this paper
Fig.2  Classifier structure in related work
Fig.3  Classifier structure
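Parameter    Value
Population size    500
Chromosome length    Layer 1: 18 (4 classifier parameters + 14 borrower attributes)
    Layer 2: 36 (36 classifier predictions from layer 1)
    Layer 3: 45 (36 classifier predictions from layer 1 + 9 classifier predictions from layer 2)
    Layer 4: 90 (72 classifier predictions from layer 1 + 18 classifier predictions from layer 3)
    Layer 5: 65 (56 borrower attributes + 9 classifier predictions from layer 4)
Crossover method    Real-valued part: discrete recombination
    Binary part: intermediate recombination
    Crossover probability: 0.7
Mutation method    Real-valued part: uniform mutation
    Binary part: binary mutation
    Mutation probability: 0.1
Selection method    Tournament selection
Fitness function    Total number of test-set misclassifications, as in Eq. (1):
    $\mathrm{ERR}_T = \frac{\mathrm{err}_L}{1000} + \mathrm{err}_T \quad (1)$
    Mean of the training-set and test-set error rates, as in Eq. (2):
    $\mathrm{ERR}_{\%} = (\mathrm{err}_{L\%} + \mathrm{err}_{T\%}) / 2 \quad (2)$
    Sum of the training-set and test-set misclassification counts and the accepted-feature coefficient, as in Eq. (3):
    $\mathrm{ERR}_{Sum} = \mathrm{err}_L + \mathrm{err}_T + \frac{F_a}{F} \quad (3)$
    where:
    $\mathrm{err}_L$ is the total number of misclassifications over the 10 training sets
    $\mathrm{err}_T$ is the total number of misclassifications over the 10 test sets
    $\mathrm{err}_{L\%}$ is the proportion of misclassifications in the 10 training sets
    $\mathrm{err}_{T\%}$ is the proportion of misclassifications in the 10 test sets
    $F_a$ is the number of features finally retained by feature selection
    $F$ is the total number of features
Table 1  Genetic algorithm parameters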
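The three fitness functions listed in Table 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, Eq. (1) is read as err_L / 1000 + err_T, and Eq. (3) is read as err_L + err_T + F_a / F. Lower values indicate a fitter individual.

```python
def err_t(err_l: int, err_test: int) -> float:
    """Eq. (1): test-set errors, plus training errors down-weighted by 1000."""
    return err_l / 1000 + err_test


def err_percent(err_l_rate: float, err_t_rate: float) -> float:
    """Eq. (2): mean of the training-set and test-set error rates."""
    return (err_l_rate + err_t_rate) / 2


def err_sum(err_l: int, err_test: int, n_selected: int, n_total: int) -> float:
    """Eq. (3): error counts plus the fraction of features retained (F_a / F)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return err_l + err_test + n_selected / n_total


# Hypothetical counts, chosen only to exercise the functions.
example_eq1 = err_t(err_l=40, err_test=8)
example_eq2 = err_percent(err_l_rate=0.10, err_t_rate=0.12)
example_eq3 = err_sum(err_l=40, err_test=8, n_selected=7, n_total=14)
```

Under this reading, Eq. (1) ranks candidates almost entirely by test-set errors, while Eq. (3) adds a small penalty for retaining more features, so that between two candidates with equal error counts the one using fewer features scores better.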
Fig.4  Feature selection workflow
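Credit scoring subject    Classifier 1    Classifier 2    Classifier 3
Credit scoring subject 1    1    1    0
Credit scoring subject 2    1    1    1
Credit scoring subject 3    0    0    1
Table 2  Classifier prediction results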
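Table 2 represents each scored applicant by the 0/1 outputs of the previous layer's classifiers. A minimal Python sketch of how such outputs could be arranged into input rows for the next layer is given below. The helper names are hypothetical, and the binary mask stands in for the feature-selection genes of a chromosome; the exact encoding used by the authors is an assumption here.

```python
from typing import List, Sequence


def stack_predictions(per_classifier: Sequence[Sequence[int]]) -> List[List[int]]:
    """Turn one prediction vector per classifier into one feature row per applicant."""
    if not per_classifier:
        raise ValueError("at least one classifier is required")
    n_samples = len(per_classifier[0])
    if any(len(p) != n_samples for p in per_classifier):
        raise ValueError("all classifiers must score the same applicants")
    return [list(row) for row in zip(*per_classifier)]


def select_columns(rows: Sequence[Sequence[int]], mask: Sequence[int]) -> List[List[int]]:
    """Keep the columns whose mask bit is 1 (hypothetical GA selection genes)."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("mask length must match the number of columns")
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


# Predictions from Table 2, listed per classifier over subjects 1 to 3.
classifier_1 = [1, 1, 0]
classifier_2 = [1, 1, 0]
classifier_3 = [0, 1, 1]

features = stack_predictions([classifier_1, classifier_2, classifier_3])
reduced = select_columns(features, mask=[1, 0, 1])  # illustrative mask only
```

Here `features` reproduces the rows of Table 2, and `reduced` keeps only the outputs of classifiers 1 and 3. Attribute reuse, as in the layer 5 chromosome of Table 1, would amount to appending original borrower attributes to each row before selection.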
Classifier    Kernel function    Normalization method    Feature extraction method    Error measure    Accuracy
nu-SVC    RBF    Z-Score    None    ERR_T    88.16%
C-SVC    Poly    Max-Min    None    ERR_%    87.97%
nu-SVC    Sigmoid    Max-Min    PCA    ERR_T    87.97%
nu-SVC    Sigmoid    Max-Min    None    ERR_T    87.83%
nu-SVC    Poly    Z-Score    PCA    ERR_T    87.83%
Table 3  Top five classifiers by accuracy in layer 1
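The two normalization methods named in Table 3 can be sketched for a single attribute column as follows. This is an illustration using only the Python standard library; the function names are hypothetical, and the use of the population standard deviation in the Z-Score variant is an assumption, since the page does not state which estimator was used.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Z-Score: subtract the mean and divide by the (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def max_min(values: Sequence[float]) -> List[float]:
    """Max-Min: rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


column = [2.0, 4.0, 6.0]  # hypothetical attribute values
standardized = z_score(column)
rescaled = max_min(column)
```

In a cross-validated setting, the statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training fold only and then applied to the test fold.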
Layer    Classifier    Kernel function    Error measure    Accuracy
1    nu-SVC    RBF    ERR_T    88.16%
2    nu-SVC    RBF    ERR_T    90.00%
3    nu-SVC    RBF    ERR_%    91.28%
4    nu-SVC    RBF    ERR_%    92.43%
5    nu-SVC    Sigmoid    ERR_Sum    93.33%
Table 4  Most accurate classifier in each layer
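The layer-by-layer improvement implied by Table 4 can be checked with a few lines of arithmetic. The sketch below uses only the accuracies reported in the table; the variable and function names are hypothetical.

```python
from typing import Dict

# Best accuracy per layer, in percent, as reported in Table 4.
best_accuracy: Dict[int, float] = {1: 88.16, 2: 90.00, 3: 91.28, 4: 92.43, 5: 93.33}


def layer_gains(accuracy: Dict[int, float]) -> Dict[int, float]:
    """Gain in percentage points of each layer over the previous one."""
    layers = sorted(accuracy)
    return {
        layer: round(accuracy[layer] - accuracy[prev], 2)
        for prev, layer in zip(layers, layers[1:])
    }


gains = layer_gains(best_accuracy)
total_gain = round(best_accuracy[5] - best_accuracy[1], 2)
```

The gains are 1.84, 1.28, 1.15, and 0.90 percentage points for layers 2 to 5, for a total of 5.17 points: accuracy rises with every added layer, but each additional layer contributes less than the one before.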
Fig.5  Confusion matrix of the classifier's prediction results
Method    Accuracy/%
SVM    82.43
XGBoost    85.25
MLP    83.54
DBN    87.50
This paper    93.33
Table 5  Accuracy of each method