[Objective] This paper compares several popular ensemble-learning methods on real-world data to find the most suitable approach for monitoring credit risk in China's P2P lending market. [Methods] We extracted borrower features from five aspects and selected the most important ones with the Random Forest method. We then compared prediction models built from four ensemble-learning methods and five base classifiers. [Results] The Rotation Forest method achieved the highest accuracy rate (99.32%) and the lowest error rate (1.71%) among the compared models. Feature selection based on Random Forest significantly improved the performance of all related models. [Limitations] The sample dataset needs to be expanded. [Conclusions] The proposed method could identify credit risks more effectively.
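The accuracy and error-rate figures reported above need not sum to 100% if the error rate is measured on one class only, for example the share of defaulting loans that a model labels as safe. The abstract does not state which definition is used, so the following is only a minimal sketch under that assumption. The function name `evaluate`, the metric names, and the toy labels are all hypothetical and are not taken from the paper.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy and class-wise error rates for binary labels.

    Assumed label convention (not from the paper): 1 = default, 0 = repaid.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good loans accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good loans flagged
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "overall_error": (fp + fn) / total,
        # Share of actual defaults predicted as safe.
        "missed_default_rate": fn / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual good loans predicted as defaults.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy data: 10 defaults (one missed) and 90 good loans (one falsely flagged).
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 9 + [0] * 1 + [0] * 89 + [1] * 1

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced credit data, the class-wise rates can differ sharply from the overall error, which is why studies in this area often report more than one of them.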
Cao Wei, Li Can, He Tingting, Zhu Weidong. Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 65-76.