Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (10): 65-76    DOI: 10.11925/infotech.2096-3467.2018.0026
Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods
Wei Cao, Can Li, Tingting He, Weidong Zhu
School of Economics, Hefei University of Technology, Hefei 230601, China

[Objective] This paper evaluates several popular ensemble learning methods on real-world data to identify the most suitable approach for monitoring credit risk in China's P2P lending market. [Methods] We extracted borrower features from five aspects and selected the most important ones with the Random Forest method. We then compared prediction models built from four ensemble learning methods and five base classifiers. [Results] The Rotation Forest method achieved the highest accuracy rate (99.32%) and the lowest error rate (1.71%). Feature selection based on Random Forest significantly improved the performance of all related models. [Limitations] The sample dataset needs to be expanded. [Conclusions] The proposed method identifies credit risks more effectively.
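
As a rough illustration of the ensemble idea the abstract refers to, the sketch below combines the hard predictions of three base classifiers by majority vote and compares accuracies. The abstract does not state which combination rule each of the four ensemble methods uses, so majority voting is an assumption here, and the labels, the three "models", and the function names are made-up toy values rather than anything from the paper's dataset.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine hard labels from several base classifiers, one vote each."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Count the labels that the base classifiers assign to sample i.
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels: 1 = default, 0 = repaid (not from the paper).
y_true = [1, 0, 0, 1, 0, 1]

# Hypothetical outputs of three base classifiers, each wrong on different loans.
model_a = [1, 0, 1, 1, 0, 0]
model_b = [1, 1, 0, 1, 0, 1]
model_c = [0, 0, 0, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("vote", ensemble)]:
    print(name, round(accuracy(y_true, preds), 3))
```

In this toy case the vote corrects every individual mistake because the three classifiers err on different samples. Using an odd number of voters on a binary label avoids ties.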

Keywords: Ensemble Learning; Feature Selection; P2P Net Loan; Credit Risks
Received: 09 January 2018      Published: 12 November 2018

Cite this article:

Wei Cao, Can Li, Tingting He, Weidong Zhu. Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods. Data Analysis and Knowledge Discovery, 2018, 2(10): 65-76.

