Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 80-90    DOI: 10.11925/infotech.2096-3467.2019.1285
Evaluation Model for Customer Credits Based on Convolutional Neural Network
Liu Weijiang 1,2, Wei Hai 2, Yun Tianhe 2
1 Center for Quantitative Economics, Jilin University, Changchun 130012, China
2 Business School, Jilin University, Changchun 130012, China
[Objective] This paper analyzes customer loan information and extracts its characteristics in order to predict defaults on online loans more effectively. [Methods] First, we collected customer credit data from Lending Club. Then, we integrated characteristic variables covering four aspects of customer information and converted them into a grayscale image. Finally, we built a customer credit evaluation model based on a convolutional neural network. [Results] The proposed model achieved a specificity of 99.4%, a sensitivity of 68.7%, a G-mean of 82.7%, an F1 score of 81.4% and an AUC of 99.5%, outperforming the credit models built on feature processing on every one of these measures. [Limitations] We compared only a small number of models, and the impact of unbalanced data needs further study. [Conclusions] The proposed model effectively predicts the probability of customer default.

Key words: Convolutional Neural Networks; Indicator Imaging; Credit Evaluation; Information Value; PCA
Received: 29 November 2019      Published: 07 July 2020
ZTFLH:  TP393 G250  
Corresponding Authors: Wei Hai

Liu Weijiang, Wei Hai, Yun Tianhe. Evaluation Model for Customer Credits Based on Convolutional Neural Network. Data Analysis and Knowledge Discovery, 2020, 4(6): 80-90.

Research Process of Customer Credit Evaluation Combined with Convolutional Neural Network
CNN Structure Designed in This Paper
IV value        Predictive power
[0, 0.02)       None
[0.02, 0.10)    Low
[0.10, 0.30)    Medium
[0.30, +∞)      High
IV Value Corresponding to Prediction Ability Interval
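As a rough illustration of how an IV score maps onto these bands, the sketch below uses the standard information value formula, summing (share of good - share of bad) x ln(share of good / share of bad) over the bins of a variable. The function names and the bin counts are made up for illustration and are not taken from the paper; the sketch also assumes every bin contains at least one good and one bad customer.

```python
import math

def information_value(good_counts, bad_counts):
    """Standard IV over pre-binned counts (hypothetical helper, not the paper's code).

    Assumes every bin has at least one good and one bad customer,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        share_good = good / total_good
        share_bad = bad / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def predictive_power(iv):
    """Map an IV score onto the four bands listed in the table."""
    if iv < 0.02:
        return "none"
    if iv < 0.10:
        return "low"
    if iv < 0.30:
        return "medium"
    return "high"

# Illustrative bin counts only (three bins of one variable).
example_iv = information_value([400, 300, 300], [50, 100, 350])
print(predictive_power(example_iv))
```

A variable whose good and bad customers are distributed identically across bins scores an IV of zero and falls in the lowest band.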
Graphical Sample Data
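Category              Variable                 IV      Meaning
Loan information      loan_amnt                0.561   Loan amount applied for by the borrower
                      int_rate                 0.724   Interest rate on the loan
Solvency              annual_inc               0.560   Self-reported annual income provided by the borrower during registration
                      dti                      0.333   Ratio of the borrower's total monthly debt payments (excluding mortgage and the requested LC loan) to the borrower's self-reported monthly income
                      tot_cur_bal              0.555   Total current balance of all accounts
                      il_util                  0.685   Ratio of total current balance to credit limit on all installment accounts
                      max_bal_bc               0.710   Maximum current balance on all revolving accounts
                      acc_open_past_24mths     0.488   Number of trades opened in the past 24 months
                      bc_open_to_buy           0.355   Funds available to spend on bankcards
                      mort_acc                 0.314   Number of mortgage accounts
                      num_actv_rev_tl          0.560   Number of currently active revolving trades
                      num_bc_tl                0.333   Number of bankcard accounts
                      installment_feat         0.306   Ratio of the customer's monthly repayment to monthly income
Credit history        open_acc                 0.489   Number of open credit lines in the borrower's credit file
                      all_util                 0.534   Balance-to-credit-limit ratio on all trades
                      total_bal_il             0.394   Total current balance of all installment accounts
                      revol_bal                0.780   Total revolving credit balance
                      revol_util               0.564   Revolving utilization rate, i.e. the credit the borrower is using relative to all available revolving credit
                      pct_tl_nvr_dlq           0.489   Percentage of trades never delinquent
Application history   mo_sin_old_il_acct       0.577   Months since the oldest bank installment account was opened
                      mo_sin_old_rev_tl_op     0.441   Months since the oldest revolving account was opened
                      mo_sin_rcnt_rev_tl_op    0.349   Months since the most recent revolving account was opened
                      mo_sin_rcnt_tl           0.461   Months since the most recent account was opened
                      mths_since_recent_bc     0.561   Months since the most recent bankcard account was opened
                      mths_since_recent_inq    0.724   Months since the most recent inquiry
                      mths_since_rcnt_il       0.604   Months since the most recent installment account was opened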
Index Variable System
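The abstract describes turning these indicator variables into a grayscale map before feeding them to the CNN, but this page does not show how the pixels are laid out. The sketch below is therefore only one plausible reading: it min-max scales each variable to the 0-255 range and fills a square grid row by row, padding unused cells with zeros. The function name, the row-major layout, the zero padding and the 6 x 6 grid size are all assumptions made for illustration.

```python
def to_grayscale_grid(values, minimums, maximums, side):
    """Scale each feature to 0..255 and place it in a side x side grid.

    Hypothetical layout: row-major order, zero padding for unused cells.
    The layout actually used in the paper is not given on this page.
    """
    if len(values) > side * side:
        raise ValueError("grid is too small for the number of features")
    pixels = []
    for value, low, high in zip(values, minimums, maximums):
        # Constant features carry no information and map to 0.
        scaled = 0.0 if high == low else (value - low) / (high - low)
        # Clip values outside the range seen when fitting the scaler.
        scaled = min(max(scaled, 0.0), 1.0)
        pixels.append(round(scaled * 255))
    pixels += [0] * (side * side - len(pixels))
    return [pixels[row * side:(row + 1) * side] for row in range(side)]

# 26 variables, as in the index system above, fit in a 6 x 6 grid.
sample = to_grayscale_grid([0.5] * 26, [0.0] * 26, [1.0] * 26, side=6)
print(len(sample), len(sample[0]))
```

In practice the minimums and maximums would be estimated on the training set only, so that test customers are scaled with the same ranges.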
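Model                     Parameter or structure settings
LeNet-5                   See the structure in Figure 1
BP neural network         Traditional three-layer BP neural network; parameters set as in reference [19]
Decision tree             max_depth: 7
Support vector machine    kernel: rbf, C: 100, gamma: 0.01
Random forest             max_depth: 8, min_samples_leaf: 4
Logistic regression       C: 0.1, penalty: l1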
Machine Learning Models’ Parameter Settings
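The parameter names in this table match scikit-learn's estimator arguments, so the four classical baselines could be configured roughly as follows. This is a sketch under that assumption: the page does not name the library the authors used, the `solver="liblinear"` choice is added here only because scikit-learn's L1-penalised logistic regression needs a solver that supports it, and all unlisted parameters are left at library defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Baseline classifiers with the settings listed in the table.
# Everything not listed there is a scikit-learn default (an assumption).
baselines = {
    "decision_tree": DecisionTreeClassifier(max_depth=7),
    "svm": SVC(kernel="rbf", C=100, gamma=0.01),
    "random_forest": RandomForestClassifier(max_depth=8, min_samples_leaf=4),
    # solver is an assumption: liblinear supports the l1 penalty.
    "logistic_regression": LogisticRegression(C=0.1, penalty="l1", solver="liblinear"),
}

for name, model in baselines.items():
    print(name, type(model).__name__)
```

Each estimator exposes the usual `fit(X, y)` and `predict(X)` methods, so the same training loop can be reused across all four baselines.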
Actual class    Predicted positive      Predicted negative
Positive        True Positive (TP)      False Negative (FN)
Negative        False Positive (FP)     True Negative (TN)
Confusion Matrix
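The sensitivity, specificity and precision reported for the models follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are illustrative only and do not come from the paper.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard rates derived from confusion-matrix counts."""
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp),
        # Share of predicted positives that were actually positive.
        "precision": tp / (tp + fp),
    }

# Illustrative counts only.
metrics = confusion_metrics(tp=80, fn=20, fp=10, tn=90)
print(metrics)
```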
Feature processing        Model                     Sensitivity    Specificity    Precision
-                         LeNet-5                   0.687 (1)      0.994 (1)      0.998 (1)
Information value (IV)    BP neural network         0.616          0.723          0.695
                          Decision tree             0.625          0.706          0.679
                          Support vector machine    0.657 (3)      0.751          0.724
                          Random forest             0.663 (2)      0.717          0.744
                          Logistic regression       0.617          0.649          0.635
PCA                       BP neural network         0.648          0.889 (3)      0.946 (3)
                          Decision tree             0.636          0.839          0.911
                          Support vector machine    0.657 (3)      0.911 (2)      0.956 (2)
                          Random forest             0.641          0.872          0.930
                          Logistic regression       0.648          0.889 (3)      0.656
Models’ Recognition Performance of Positive and Negative Samples
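Feature processing        Model                     G-mean       F1           AUC
-                         LeNet-5                   0.827 (1)    0.814 (1)    0.995 (1)
Information value (IV)    BP neural network         0.668        0.653        0.715
                          Decision tree             0.665        0.651        0.724
                          Support vector machine    0.703        0.689        0.668
                          Random forest             0.705        0.701        0.778
                          Logistic regression       0.633        0.626        0.747
PCA                       BP neural network         0.759 (3)    0.769 (3)    0.816
                          Decision tree             0.730        0.749        0.837
                          Support vector machine    0.773 (2)    0.778 (2)    0.904 (2)
                          Random forest             0.748        0.759        0.865
                          Logistic regression       0.759 (3)    0.652        0.880 (3)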
Models’ Performance Evaluation
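G-mean and F1 are both derived from the per-class rates: G-mean is the geometric mean of sensitivity and specificity, and F1 is the harmonic mean of precision and sensitivity (recall). As a quick consistency check, the sketch below recomputes both for LeNet-5 from its reported sensitivity (0.687), specificity (0.994) and precision (0.998); the results agree with the reported 0.827 and 0.814 to within the rounding of the inputs. The function names are illustrative.

```python
import math

def g_mean(sensitivity, specificity):
    """Geometric mean of the two per-class recall rates."""
    return math.sqrt(sensitivity * specificity)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

# Reported LeNet-5 values from the tables above.
lenet_g_mean = g_mean(0.687, 0.994)
lenet_f1 = f1_score(0.998, 0.687)
print(lenet_g_mean, lenet_f1)
```

Because G-mean multiplies the two rates, a model that scores well on only one class is penalised, which is why it is commonly reported for unbalanced data such as loan defaults.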
