
## Evaluation Model for Customer Credits Based on Convolutional Neural Network*

Liu Weijiang1,2, Wei Hai2, Yun Tianhe2

1Center for Quantitative Economics, Jilin University, Changchun 130012, China

2Business School, Jilin University, Changchun 130012, China

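Funding: *This work is one of the research outcomes of the Key Research Base Project of Humanities and Social Sciences of the Ministry of Education, "Research on Factor Allocation and Industrial Upgrading Policies for Promoting Stable Economic Growth under the New Normal" (Grant No. 16JJD790015); the National Natural Science Foundation of China project "Research on Turning-Point Identification, Phase Transition, and Early Warning of China's Business Cycle Fluctuations" (Grant No. 71573105); and the 2020 Jilin University Special Research Project on Northeast Revitalization and Development, "Dynamic Monitoring of Inbound Investment in Jilin Province in the Big Data Context and Policy Recommendations after COVID-19" (Grant No. 20ZXZ01).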

Abstract

[Objective] This paper analyzes customer loan information, extracts the characteristics of loan customers and converts them into images, and uses a convolutional neural network to build a customer credit model, aiming to improve the accuracy of default prediction for online loans. [Methods] Based on customer credit data from Lending Club, we connected the characteristic variables reflecting four aspects of customer information into a grayscale image and established a customer credit evaluation model based on a convolutional neural network. [Results] In the credit evaluation experiments, the proposed model achieved a specificity of 99.4%, a sensitivity of 68.7%, a G-mean of 82.7%, an F1 score of 81.4%, and an AUC of 99.5%, all significantly higher than those of traditional credit evaluation models based on feature processing. [Limitations] We compared only a limited number of credit evaluation models and did not further study the impact of imbalanced data. [Conclusions] The proposed model performs well both in extracting default-related feature information and in predicting the probability of customer default.

Keywords: Convolutional Neural Networks; Indicator Imaging; Credit Evaluation; Information Value; PCA

Liu Weijiang. Evaluation Model for Customer Credits Based on Convolutional Neural Network. Data Analysis and Knowledge Discovery[J], 2020, 4(6): 80-90 doi:10.11925/infotech.2096-3467.2019.1285

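## 1 Introduction

(1) Compared with shallow structures, deep network structures with multiple hidden layers are better able to learn and characterize the essential features of complex data, which greatly helps tasks such as visualization and classification;

(2) An unsupervised layer-wise initialization strategy effectively overcomes the difficulty of training deep neural networks.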

## 2 Theoretical Basis of Model Construction

### Figure 1

Fig.1   Research Process of Customer Credit Evaluation Combined with Convolutional Neural Network

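### 2.1 Construction of the Convolutional Neural Network Model

As a typical convolutional neural network, the LeNet-5 model was first applied to handwritten digit recognition, where it achieved great success[16]. It consists of seven layers: an input layer, a convolutional layer, a pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer, and an output layer. To suit the characteristics of the customer credit evaluation problem, this paper makes some modifications to the traditional LeNet-5 model.

(1) Because the numbers of positive and negative samples differ, a Dropout layer is added to the fully connected layer F6 to prevent overfitting, with the dropout rate (threshold) set to 0.5.

(2) The goal of customer credit evaluation is to distinguish defaulting customers from normal customers, which is a typical binary classification problem. The output layer of the traditional LeNet-5 model is therefore changed from 10 neurons to 2 neurons.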
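The two modifications can be illustrated by tracing tensor shapes through the layer sequence. The following is a minimal sketch, not the authors' implementation: the 32×32 single-channel input, the 5×5 kernels, the 6 and 16 feature maps, and the 84 units in F6 are assumptions borrowed from the classic LeNet-5, and the names `LAYERS` and `trace_shapes` are hypothetical. Only the layer order, the Dropout rate of 0.5 at F6, and the 2-neuron output come from the text.

```python
# Layer list for the modified LeNet-5 described in the text.
# Kernel sizes, channel counts, and F6 width are ASSUMED from the classic
# LeNet-5; the text only fixes the layer order, Dropout(0.5) at F6,
# and the 2-neuron output layer.
LAYERS = [
    ("C1", "conv", {"out_channels": 6, "kernel": 5}),
    ("S2", "pool", {"kernel": 2}),
    ("C3", "conv", {"out_channels": 16, "kernel": 5}),
    ("S4", "pool", {"kernel": 2}),
    ("F6", "fc", {"units": 84}),
    ("Dropout", "dropout", {"rate": 0.5}),   # modification (1)
    ("Output", "fc", {"units": 2}),          # modification (2): 10 -> 2
]


def trace_shapes(channels, height, width, layers=LAYERS):
    """Return [(layer_name, output_shape)] for one input image."""
    shape = (channels, height, width)
    trace = []
    for name, kind, params in layers:
        if kind == "conv":
            # Valid convolution with stride 1 shrinks each side by kernel - 1.
            _, h, w = shape
            k = params["kernel"]
            shape = (params["out_channels"], h - k + 1, w - k + 1)
        elif kind == "pool":
            # Non-overlapping pooling divides each side by the kernel size.
            c, h, w = shape
            k = params["kernel"]
            shape = (c, h // k, w // k)
        elif kind == "fc":
            # A fully connected layer flattens its input and projects it.
            shape = (params["units"],)
        elif kind == "dropout":
            # Dropout zeroes activations during training; shape is unchanged.
            pass
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        trace.append((name, shape))
    return trace


if __name__ == "__main__":
    for layer_name, out_shape in trace_shapes(1, 32, 32):
        print(f"{layer_name:8s} -> {out_shape}")
```

Under these assumed sizes the final shape is `(2,)`, one score per class (default versus normal), in place of the ten digit classes of the original network.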

### Figure 2

Fig.2   CNN Structure Designed in This Paper

### 2.2 Methods for Processing Model Input Data

(1) Indicator feature processing based on feature information value

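① WOE value

$WOE_i=\ln\frac{p_{y_i}}{p_{n_i}}=\ln\frac{\#y_i/\#y_t}{\#n_i/\#n_t}$

② IV value

$IV_i=(p_{y_i}-p_{n_i})\cdot WOE_i=\left(\frac{\#y_i}{\#y_t}-\frac{\#n_i}{\#n_t}\right)\cdot\ln\frac{\#y_i/\#y_t}{\#n_i/\#n_t}$

$IV=\sum_{i=1}^{n}IV_i$

③ Indicator selection principle

$WOE_i=\alpha+\beta i+\varepsilon$

$i=1,2,\ldots,M$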

Table 1  IV Value Corresponding to Prediction Ability Interval

| IV value | Predictive power |
| --- | --- |
| [0, 0.02) | None |
| [0.02, 0.10) | Low |
| [0.10, 0.30) | Medium |
| [0.30, +∞) | High |
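The WOE and IV definitions and the intervals of Table 1 can be sketched as follows. This is an illustrative example rather than the authors' code: the function names `woe_iv` and `predictive_power` and the bin counts are hypothetical, and the sketch assumes every bin contains at least one sample of each class, since the logarithm is undefined otherwise.

```python
import math


def woe_iv(bins):
    """Compute WOE_i, IV_i and the total IV of one indicator.

    bins: list of (y_count, n_count) pairs, one pair per bin, i.e. the
    #y_i and #n_i of the formulas. ASSUMES every count is positive.
    """
    y_total = sum(y for y, _ in bins)   # #y_t
    n_total = sum(n for _, n in bins)   # #n_t
    woe, iv = [], []
    for y, n in bins:
        p_y = y / y_total
        p_n = n / n_total
        w = math.log(p_y / p_n)         # WOE_i
        woe.append(w)
        iv.append((p_y - p_n) * w)      # IV_i, always non-negative
    return woe, iv, sum(iv)


def predictive_power(iv):
    """Map a total IV onto the intervals of Table 1."""
    if iv < 0.02:
        return "none"
    if iv < 0.10:
        return "low"
    if iv < 0.30:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Hypothetical counts for an indicator split into three bins.
    example_bins = [(10, 90), (20, 80), (40, 60)]
    woe_values, iv_values, total_iv = woe_iv(example_bins)
    print([round(w, 3) for w in woe_values])
    print(round(total_iv, 3), predictive_power(total_iv))
```

Each $IV_i$ is non-negative because $p_{y_i}-p_{n_i}$ and $WOE_i$ always share the same sign, so the total IV grows as the two class distributions across bins diverge.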

(2) Indicator feature processing based on principal component analysis

## 3 Empirical Study

### Figure 3

Fig.3   Graphical Sample Data
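One way to turn a customer's indicator vector into a grayscale image is sketched below. The exact imaging layout used in the paper is not given in this text, so everything here is an assumption for illustration: the min-max scaling to 0..255, the row-by-row square layout with zero padding, and the function name `to_grayscale_image` are all hypothetical.

```python
import math


def to_grayscale_image(values, mins, maxs):
    """Scale one customer's indicators to 0..255 and arrange them in a square.

    ASSUMED layout (not taken from the paper): min-max scaling per indicator,
    row-major placement in the smallest square grid that fits, zero padding.
    """
    pixels = []
    for value, low, high in zip(values, mins, maxs):
        # Constant indicators carry no information; map them to 0.
        scaled = 0.0 if high == low else (value - low) / (high - low)
        scaled = min(1.0, max(0.0, scaled))   # clip out-of-range values
        pixels.append(round(scaled * 255))
    # Smallest side length whose square holds all pixels (ceil of sqrt).
    side = math.isqrt(len(pixels) - 1) + 1 if pixels else 0
    pixels += [0] * (side * side - len(pixels))
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


if __name__ == "__main__":
    # Three hypothetical indicators, each observed in the range 0..10.
    image = to_grayscale_image([0, 10, 2], [0, 0, 0], [10, 10, 10])
    for row in image:
        print(row)
```

The per-indicator minimum and maximum would normally be computed on the training set only and reused for the test set, so that both sets share one pixel scale.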

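(1) Feature selection based on information value

Table 2  Index Variable System

| Variable | IV value | Description |
| --- | --- | --- |
| int_rate | 0.724 | Interest rate on the loan |
| | | |
| dti | 0.333 | Ratio of the borrower's total debt payments (excluding mortgage and the requested LC loan) to the borrower's self-reported monthly income |
| tot_cur_bal | 0.555 | Total current balance of all accounts |
| il_util | 0.685 | Ratio of total current balance to credit limit on all installment accounts |
| max_bal_bc | 0.710 | Maximum current balance owed on all revolving accounts |
| acc_open_past_24mths | 0.488 | Number of trades opened in the past 24 months |
| mort_acc | 0.314 | Number of mortgage accounts |
| num_actv_rev_tl | 0.560 | Number of currently active revolving trades |
| num_bc_tl | 0.333 | Number of bankcard accounts |
| installment_feat | 0.306 | Ratio of the customer's monthly repayment to monthly income |
| | | |
| all_util | 0.534 | Balance to credit limit on all trades |
| total_bal_il | 0.394 | Total current balance of all installment accounts |
| revol_bal | 0.780 | Total credit revolving balance |
| revol_util | 0.564 | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit |
| pct_tl_nvr_dlq | 0.489 | Percentage of trades never delinquent |
| | | |
| mo_sin_old_rev_tl_op | 0.441 | Months since the oldest revolving account was opened |
| mo_sin_rcnt_rev_tl_op | 0.349 | Months since the most recent revolving account was opened |
| mo_sin_rcnt_tl | 0.461 | Months since the most recent account was opened |
| mths_since_recent_bc | 0.561 | Months since the most recent bankcard account was opened |
| mths_since_recent_inq | 0.724 | Months since the most recent inquiry |
| mths_since_rcnt_il | 0.604 | Months since the most recent installment account was opened |

(Note: To protect borrowers' personal information, the original data do not include basic personal details such as gender and age.)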

(2) Feature selection based on principal component analysis

$F_1=0.311V_1+0.169V_2+0.064V_3+\ldots+0.530V_{52}$

$F_2=0.369V_1+0.165V_2+0.051V_3+\ldots+0.313V_{52}$

$F_{18}=-0.058V_1+0.075V_2-0.023V_3+\ldots+0.042V_{52}$
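Each principal component score is a linear combination of the 52 variables. The sketch below shows only the arithmetic; the helper name `component_score` and the input values are hypothetical, the variables $V_j$ are assumed to be standardized, and the example uses just the three leading coefficients of $F_1$ that appear above, so its result is a partial sum and not a real score.

```python
def component_score(coefficients, standardized_values):
    """Return sum_j coefficient_j * V_j for one principal component."""
    if len(coefficients) != len(standardized_values):
        raise ValueError("coefficients and values must have equal length")
    return sum(c * v for c, v in zip(coefficients, standardized_values))


if __name__ == "__main__":
    # Only the first three coefficients of F1 are visible in the text;
    # the remaining 49 are elided there and are NOT filled in here.
    f1_leading_coefficients = [0.311, 0.169, 0.064]
    # Hypothetical standardized values of V1, V2, V3 for one customer.
    customer_values = [1.0, -0.5, 2.0]
    partial = component_score(f1_leading_coefficients, customer_values)
    print(round(partial, 4))
```

Applying the same function with each of the 18 coefficient rows would reduce a customer's 52 variables to 18 component scores, which then serve as inputs to the comparison models.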

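### 3.2 Comparison Models and Evaluation Metrics

Table 3  Machine Learning Models' Parameter Settings

| Model | Parameter settings |
| --- | --- |
| LeNet-5 | See the structure in Fig. 2 |
| BP neural network | Traditional three-layer BP neural network; parameters set following reference [19] |
| Logistic regression | C: 0.1, penalty: l1 |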

Table 4  Confusion Matrix

| Actual class | Predicted positive | Predicted negative |
| --- | --- | --- |
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |

$Sensitivity=Recall=\frac{TP}{TP+FN}$

$Specificity=\frac{TN}{TN+FP}$

$Precision=\frac{TP}{TP+FP}$

$G\text{-}mean=\sqrt{Recall\times Specificity}$

$F1=\frac{2\times Precision\times Recall}{Precision+Recall}$
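The metrics can be computed directly from the four confusion-matrix counts. The following is a minimal sketch with hypothetical counts; the function name `classification_metrics` is an assumption, and G-mean is taken as the geometric mean (square root of the product) of recall and specificity.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation metrics from confusion-matrix counts.

    ASSUMES tp + fn, tn + fp and tp + fp are all non-zero.
    """
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 100 positives, 900 negatives.
    metrics = classification_metrics(tp=70, fn=30, fp=10, tn=890)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

On imbalanced data, G-mean and F1 are more informative than plain accuracy, because a model that labels every sample as the majority class scores zero on both.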

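### 3.3 Empirical Results and Analysis

(1) Evaluation of the models' ability to distinguish positive and negative samples

Table 5 shows how the six models perform in recognizing positive and negative samples. (For the evaluation model based on the convolutional neural network, feature extraction is performed automatically by the model.)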

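Table 5  Models' Recognition Performance of Positive and Negative Samples

| Model | Sensitivity | Specificity | Precision |
| --- | --- | --- | --- |
| LeNet-5 | 0.687 (1) | 0.994 (1) | 0.998 (1) |
| Logistic regression | 0.617 | 0.649 | 0.635 |
| Logistic regression | 0.648 | 0.889 (3) | 0.656 |

(Note: The number in parentheses after a score is the model's rank among the top three.)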

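(2) Evaluation of the models' overall performance

Table 6  Models' Performance Evaluation

| Model | G-mean | F1 | AUC |
| --- | --- | --- | --- |
| LeNet-5 | 0.827 (1) | 0.814 (1) | 0.995 (1) |
| BP neural network | 0.668 | 0.653 | 0.715 |
| Logistic regression | 0.633 | 0.626 | 0.747 |
| BP neural network | 0.759 (3) | 0.769 (3) | 0.816 |
| Logistic regression | 0.759 (3) | 0.652 | 0.880 (3) |

(Note: The number in parentheses after a score is the model's rank among the top three.)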

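## Supporting Data:

[1] Wei Hai. Lending Club.xlsx. Preliminarily cleaned dataset.

[2] Yun Tianhe. data_IV.xlsx. Selected variables and their IV values.

[3] Yun Tianhe. facter.xlsx. Factor score coefficient table.

[4] Yun Tianhe. loan3.csv. Dataset after screening.

[5] Wei Hai. X_test.csv. Test set for the traditional methods.

[6] Wei Hai. X_train.csv. Training set for the traditional methods.

[7] Wei Hai. new_train.csv. Grayscale image index of the training set for the new method.

[8] Wei Hai. new_test.csv. Grayscale image index of the test set for the new method.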

## References

[1] Jiang Hui, Ma Chaoqun, Xu Xuqing, et al. An EM-similar Imputation Algorithm for Multivariable Data Missing and Its Application in Credit Scoring[J]. Chinese Journal of Management Science, 2019,27(3):11-19. (in Chinese)

[2] Xiao Jin, Liu Dunhu, Gu Xin, et al. Dynamic Classifier Ensemble Selection Model for Bank Customer's Credit Scoring[J]. Journal of Management Sciences in China, 2015,18(3):114-126. (in Chinese)

[3] Altman E I. Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy[J]. The Journal of Finance, 1968,23(4):589-609.

[4] Wiginton J C. A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior[J]. Journal of Financial and Quantitative Analysis, 1980,15(3):757-770.

[5] Wu Chong, Xia Han. Study of Customer Credit Evaluation Under E-commerce Based on Support Vector Machine Ensemble[J]. Chinese Journal of Management Science, 2008,16(S1):362-367. (in Chinese)

[6] Blanco A, Pino-Mejías R, Lara J, et al. Credit Scoring Models for the Microfinance Industry Using Neural Networks: Evidence from Peru[J]. Expert Systems with Applications, 2013,40(1):356-364.

[7] Chen F L, Li F C. Combination of Feature Selection Approaches with SVM in Credit Scoring[J]. Expert Systems with Applications, 2010,37(7):4902-4909.

[8] Xiong Zhibing, Wu Weiye. Credit Evaluation Research Based on Deep Belief Networks[J]. E-science Technology & Application, 2019,10(3):28-36. (in Chinese)

[9] Wu Xingze. Problems on Research of Predicting Financial Distress and Framework Reconstructure[J]. Accounting Research, 2011(2):59-65, 97. (in Chinese)

[10] He K M, Zhang X Y, Ren S Q, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 1026-1034.

[11] Sun S N, Zhang B B, Xie L, et al. An Unsupervised Deep Domain Adaptation Approach for Robust Speech Recognition[J]. Neurocomputing, 2017,257:79-87.

[12] Williamson D S, Wang D L. Time-frequency Masking in the Complex Domain for Speech Dereverberation and Denoising[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017,25(7):1492-1501.

[13] Zhang Y, Marshall I, Wallace B C. Rationale-augmented Convolutional Neural Networks for Text Classification[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 795-804.

[14] Li Hui, Chai Yaqing. Fine-Grained Sentiment Analysis Based on Convolutional Neural Network[J]. Data Analysis and Knowledge Discovery, 2019,3(1):95-103. (in Chinese)

[15] Hosaka T. Bankruptcy Prediction Using Imaged Financial Ratios and Convolutional Neural Networks[J]. Expert Systems with Applications, 2019,117:287-299.

[16] LeCun Y, Bottou L, Bengio Y, et al. Gradient-Based Learning Applied to Document Recognition[J]. Proceedings of the IEEE, 1998,86(11):2278-2324.

[17] Kan Shixing. Performance Comparison of Several Methods for Selecting Indices of Commercial Bank Credit Ranking[D]. Ji'nan: Shandong University, 2010. (in Chinese)

[18] Liu Dan, Li Zhanjiang, Zheng Xixi. Selection Model of Credit Index Combination Based on WOE-Probit Stepwise Regression and Its Application[J]. Mathematics in Practice and Theory, 2018,48(2):76-87. (in Chinese)

[19] Yang Shu'e, Huang Li. Financial Crisis Warning Model Based on BP Neural Network[J]. Systems Engineering-Theory & Practice, 2005,25(1):12-18, 26. (in Chinese)
