Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 10: 65-76    DOI: 10.11925/infotech.2096-3467.2018.0026
Research Paper
Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods*
Wei Cao, Can Li, Tingting He, Weidong Zhu
School of Economics, Hefei University of Technology, Hefei 230601, China
Abstract

[Objective] Using real-world Chinese online-lending data, this paper compares several popular ensemble-learning methods to find the one best suited to monitoring the credit risk of P2P lending in China, and thereby to improve the efficiency of credit-risk monitoring on Chinese P2P platforms. [Methods] Using transaction data from Renrendai, we extracted borrower features from five aspects (borrower characteristics, financial information, borrowing history, loan characteristics, and platform verification) and selected the most important ones with the Random Forest algorithm. On this basis, we built credit-risk early-warning models from four ensemble-learning methods and five base classifiers and compared them. [Results] Rotation Forest achieved the highest average accuracy (99.32%) and the lowest average Type-I error (1.71%), both with the selected features. Feature selection based on Random Forest also improved the performance of the related models. [Limitations] The sample dataset needs to be expanded. [Conclusions] Combining the Rotation Forest ensemble model with the key risk factors identified by feature selection can significantly improve the efficiency of credit-risk prediction.

Keywords: Ensemble Learning; Feature Selection; P2P Online Lending; Credit Risk
Received: 2018-01-09
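Funding: *This paper is one of the outcomes of the National Natural Science Foundation of China project "Information Fusion for Science Fund Project Approval Evaluation Based on Multidimensional Evidence Theory" (Grant No. 71774047).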
Cite this article:
Wei Cao, Can Li, Tingting He, Weidong Zhu. Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 65-76. DOI: 10.11925/infotech.2096-3467.2018.0026.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0026
Figure 1  Research design flowchart
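Variable type | Variable | Meaning | Encoding
Dependent variable | label | Whether the loan defaulted | Default=1, No default=0
Borrower characteristics | F1 | Age | 20-25=0, 26-31=1, 32-37=2, 38-43=3, 44-49=4, 50 and above=5
Borrower characteristics | F2 | Education | High school or below=0, Junior college=1, Bachelor's degree=2, Postgraduate=3
Borrower characteristics | F3 | Marital status | Single (including unmarried, divorced, and widowed)=0, Married=1
Borrower characteristics | F4 | Length of employment | Missing=0, 1 year or less=2, 1-3 years (inclusive)=4, 3-5 years (inclusive)=6, More than 5 years=8
Borrower characteristics | F5 | City of employment | Eastern region=0, Central region=1, Western region=2
Borrower characteristics | F6 | Employer industry | Industry of the borrower's employer*
Borrower characteristics | F7 | Company size | Missing=0, Fewer than 10 employees=1, 10-100=2, 100-500=3, More than 500=4
Borrower financial information | F8 | Income | Below 1,000 yuan=0, 1,000-2,000 yuan=1, 2,000-5,000 yuan=2, 5,000-10,000 yuan=3, 10,000-20,000 yuan=4, 20,000-50,000 yuan=5, Above 50,000 yuan=6
Borrower financial information | F9 | Credit rating | HR=0, E=1, D=2, C=3, B=4, A=5, AA=6
Borrower financial information | F10 | Credit limit | Min-Max normalized
Borrower financial information | F11 | Property | Owns property=1, No property=0
Borrower financial information | F12 | Car | Owns a car=1, No car=0
Borrower financial information | F13 | Mortgage | No mortgage=1, Has a mortgage=0
Borrower financial information | F14 | Car loan | No car loan=1, Has a car loan=0
Borrower history | F15 | Successful loans | Number of loans the borrower has successfully obtained
Borrower history | F16 | Loan applications | Number of loan applications the borrower has submitted
Borrower history | F17 | Overdue count | Number of past overdue repayments
Borrower history | F18 | Serious delinquency | Has a serious delinquency record=1, Otherwise=0
Loan characteristics | F19 | Loan amount | Amount requested by the borrower, Min-Max normalized
Loan characteristics | F20 | Purpose | Stated purpose of the loan**
Loan characteristics | F21 | Interest rate | Annual interest rate
Loan characteristics | F22 | Repayment term | Loan term in months (minimum 3, maximum 36)
Loan characteristics | F23 | Listing type | Institution-guaranteed=0, Credit-verified=1, On-site-verified=2
Platform verification information | F24 | Credit report verification | Borrower submits a personal credit report issued by the People's Bank of China; Verified=1, Otherwise=0
Platform verification information | F25 | Identity verification | Borrower submits a copy of the ID card; Verified=1, Otherwise=0
Platform verification information | F26 | Employment verification | Borrower submits a copy of the employee ID card or labor contract; Verified=1, Otherwise=0
Platform verification information | F27 | Income verification | Borrower submits proof of income or salary-account bank statements; Verified=1, Otherwise=0
Table 1  Variable descriptions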
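Table 1 mixes ordinal bins (for example F1 and F8) with Min-Max scaling (F10 and F19). The minimal Python sketch below implements three of these encodings; the function names are hypothetical, and because the table does not say which side of each income boundary is inclusive, the sketch assumes lower-inclusive edges (an income of exactly 2,000 yuan gets code 2).

```python
from bisect import bisect_right

# Lower edges (yuan) of the F8 income bins. bisect_right counts how many edges
# are <= the income, which equals the bin code.
# Assumption: edges are lower-inclusive; Table 1 does not state this.
INCOME_EDGES = [1000, 2000, 5000, 10000, 20000, 50000]


def encode_income(yuan: float) -> int:
    """F8: below 1,000 yuan -> 0, 1,000-2,000 -> 1, ..., above 50,000 -> 6."""
    return bisect_right(INCOME_EDGES, yuan)


def encode_age(age: int) -> int:
    """F1: 20-25 -> 0, 26-31 -> 1, ..., 44-49 -> 4, 50 and above -> 5 (six-year bands)."""
    if age < 20:
        raise ValueError("Table 1 defines no code for ages below 20")
    return min((age - 20) // 6, 5)


def min_max(values):
    """Min-Max scaling to [0, 1], as listed for F10 (credit limit) and F19 (loan amount)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print([encode_income(x) for x in (800, 1500, 7500, 60000)])  # [0, 1, 3, 6]
    print([encode_age(a) for a in (22, 31, 50)])                 # [0, 1, 5]
    print(min_max([2.0, 4.0, 6.0]))                              # [0.0, 0.5, 1.0]
```

In practice the Min-Max bounds would be taken from the training split only and reused on the test split, so that no test information leaks into the scaling.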
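Variable type | Variable | Meaning
Borrower characteristics | F1 | Age
Borrower characteristics | F3 | Marital status
Borrower financial information | F9 | Credit rating
Borrower financial information | F10 | Credit limit
Borrower history | F17 | Overdue count
Borrower history | F18 | Serious delinquency
Loan characteristics | F19 | Loan amount
Loan characteristics | F22 | Repayment term
Platform verification information | F26 | Employment verification
Platform verification information | F27 | Income verification
Table 2  Feature selection results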
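Actual class | Predicted default | Predicted non-default
Actual default | TP | FN
Actual non-default | FP | TN
Table 3  Confusion matrix for the binary early-warning problem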
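The accuracy and Type-I error figures in Tables 4-8 are derived from the four cells of Table 3. The excerpt does not define Type-I error, so the Python sketch below assumes the common credit-scoring convention with default as the positive class: Type-I error is the share of actual defaulters predicted as non-defaulters, FN/(TP+FN). The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actual default, predicted default
    fn: int  # actual default, predicted non-default (a missed default)
    fp: int  # actual non-default, predicted default
    tn: int  # actual non-default, predicted non-default

    @classmethod
    def from_labels(cls, y_true, y_pred):
        """Count the four cells from 0/1 labels (1 = default, as in Table 1)."""
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(t == 1 and p == 1 for t, p in pairs),
            fn=sum(t == 1 and p == 0 for t, p in pairs),
            fp=sum(t == 0 and p == 1 for t, p in pairs),
            tn=sum(t == 0 and p == 0 for t, p in pairs),
        )

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def type_i_error(self) -> float:
        # Assumed convention: missed defaults as a share of all actual defaults.
        return self.fn / (self.tp + self.fn)

    def type_ii_error(self) -> float:
        # Correspondingly: good borrowers wrongly flagged, as a share of all good borrowers.
        return self.fp / (self.fp + self.tn)


if __name__ == "__main__":
    c = Confusion(tp=90, fn=10, fp=5, tn=395)  # hypothetical counts
    print(c.accuracy(), c.type_i_error(), c.type_ii_error())
```

Under this convention a lower Type-I error means fewer defaulting borrowers slip through as "safe", which is usually the costlier mistake for a lending platform.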
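Ensemble | Base classifier | Accuracy A (%) | Accuracy FS (%) | Type-I error A (%) | Type-I error FS (%) | AUC A | AUC FS
Bagging | LR | 97.17 | 98.79 | 7.47 | 6.71 | 0.987 | 0.989
Bagging | CART | 98.08 | 98.58 | 6.32 | 2.87 | 0.936 | 0.999
Bagging | C4.5 | 97.67 | 98.28 | 6.90 | 2.87 | 0.949 | 0.998
Bagging | MLP | 96.06 | 98.08 | 10.92 | 8.05 | 0.988 | 0.998
Bagging | SVM | 97.47 | 97.97 | 9.20 | 6.32 | 0.973 | 0.995
Boosting | LR | 97.98 | 98.48 | 5.74 | 3.45 | 0.984 | 0.898
Boosting | CART | 97.17 | 98.88 | 6.32 | 5.17 | 0.983 | 0.994
Boosting | C4.5 | 97.17 | 98.07 | 5.17 | 4.02 | 0.995 | 0.996
Boosting | MLP | 97.27 | 98.88 | 8.05 | 2.30 | 0.995 | 0.999
Boosting | SVM | 97.97 | 98.28 | 8.62 | 4.59 | 0.954 | 0.971
Random Subspace | LR | 95.55 | 97.27 | 15.52 | 8.05 | 0.984 | 0.996
Random Subspace | CART | 96.46 | 97.17 | 4.59 | 2.87 | 0.940 | 0.981
Random Subspace | C4.5 | 95.05 | 98.07 | 6.32 | 4.02 | 0.961 | 0.994
Random Subspace | MLP | 95.45 | 97.67 | 8.62 | 7.47 | 0.965 | 0.997
Random Subspace | SVM | 96.87 | 97.37 | 9.77 | 7.47 | 0.955 | 0.963
Rotation Forest | LR | 98.68 | 99.69 | 3.45 | 0.57 | 0.998 | 1.000
Rotation Forest | CART | 98.48 | 99.19 | 3.45 | 1.15 | 0.992 | 0.998
Rotation Forest | C4.5 | 97.97 | 98.99 | 6.90 | 5.17 | 0.954 | 0.996
Rotation Forest | MLP | 98.07 | 99.29 | 6.32 | 1.15 | 0.997 | 1.000
Rotation Forest | SVM | 98.78 | 99.79 | 4.59 | 0.00 | 0.975 | 0.999
Table 4  Evaluation of the ensemble models on three metrics (train:test ratio 60:40)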
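Tables 4-6 cross four ensemble schemes with five base classifiers. The schemes differ mainly in how each base learner's training view is built. Bagging trains on bootstrap resamples of the rows. Random Subspace trains on random subsets of the feature columns. Boosting trains learners sequentially, up-weighting examples earlier learners misclassified. Rotation Forest (Rodriguez et al., 2006) splits the features into subsets, applies PCA to each subset, and trains each tree on the rotated features. The standard-library-only Python sketch below illustrates just Bagging and Random Subspace, using a one-feature threshold rule ("stump") as a stand-in base learner. It is not the paper's implementation, which used LR, CART, C4.5, MLP, and SVM as base classifiers; all names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Stand-in base learner: the single-feature threshold rule with fewest training errors."""
    best = None  # (errors, feature index, threshold, direction)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum((1 if sign * (row[j] - thr) >= 0 else 0) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: 1 if sign * (row[j] - thr) >= 0 else 0


def fit_ensemble(X, y, mode="bagging", n_estimators=15, subspace_frac=0.5, seed=0):
    """Train n_estimators stumps on resampled views of (X, y); predict by majority vote.

    mode="bagging":  bootstrap sample of the rows, all features.
    mode="subspace": all rows, a random subset of round(subspace_frac * d) features.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_estimators):
        if mode == "bagging":
            rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
            cols = list(range(d))
        elif mode == "subspace":
            rows = list(range(n))
            cols = sorted(rng.sample(range(d), max(1, round(subspace_frac * d))))
        else:
            raise ValueError(f"unknown mode: {mode}")
        stump = fit_stump([[X[i][c] for c in cols] for i in rows], [y[i] for i in rows])
        members.append((cols, stump))

    def predict(row):
        # Each member sees only its own feature columns; an odd n_estimators avoids ties.
        votes = Counter(stump([row[c] for c in cols]) for cols, stump in members)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: features 0 and 2 separate the classes, feature 1 is noise.
    X = [[i, (i * 7) % 5, 3 * i] for i in range(20)]
    y = [1 if i >= 10 else 0 for i in range(20)]
    for mode in ("bagging", "subspace"):
        model = fit_ensemble(X, y, mode=mode)
        print(mode, [model(row) for row in ([0, 0, 0], [19, 3, 57])])
```

Both schemes inject diversity by varying what each member sees; majority voting then averages out the members' individual errors, which is the common idea behind all four methods in the tables.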
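Ensemble | Base classifier | Accuracy A (%) | Accuracy FS (%) | Type-I error A (%) | Type-I error FS (%) | AUC A | AUC FS
Bagging | LR | 98.65 | 99.46 | 6.03 | 2.59 | 0.990 | 0.998
Bagging | CART | 97.71 | 98.65 | 6.03 | 5.17 | 0.977 | 0.996
Bagging | C4.5 | 97.71 | 98.65 | 12.93 | 5.17 | 0.992 | 0.998
Bagging | MLP | 96.36 | 98.38 | 10.34 | 5.17 | 0.992 | 0.998
Bagging | SVM | 98.52 | 98.92 | 5.17 | 1.72 | 0.974 | 0.994
Boosting | LR | 97.98 | 98.48 | 5.74 | 3.45 | 0.984 | 0.898
Boosting | CART | 97.71 | 99.59 | 5.17 | 1.72 | 0.984 | 0.999
Boosting | C4.5 | 97.30 | 98.38 | 6.03 | 3.45 | 0.981 | 0.997
Boosting | MLP | 97.30 | 99.32 | 7.76 | 3.44 | 0.996 | 0.999
Boosting | SVM | 98.52 | 98.65 | 4.31 | 2.59 | 0.974 | 0.981
Random Subspace | LR | 97.71 | 98.79 | 7.41 | 4.31 | 0.988 | 0.994
Random Subspace | CART | 96.23 | 97.04 | 12.93 | 5.17 | 0.975 | 0.976
Random Subspace | C4.5 | 96.36 | 97.57 | 12.07 | 3.45 | 0.957 | 0.997
Random Subspace | MLP | 96.77 | 97.57 | 12.93 | 4.31 | 0.993 | 0.995
Random Subspace | SVM | 97.98 | 98.25 | 7.76 | 4.31 | 0.956 | 0.994
Rotation Forest | LR | 98.65 | 99.59 | 4.31 | 0.86 | 0.998 | 1.000
Rotation Forest | CART | 98.11 | 98.65 | 3.44 | 1.72 | 0.993 | 0.995
Rotation Forest | C4.5 | 98.65 | 99.05 | 6.03 | 3.45 | 0.985 | 0.999
Rotation Forest | MLP | 97.84 | 99.46 | 7.76 | 2.59 | 0.995 | 0.999
Rotation Forest | SVM | 98.92 | 99.73 | 4.31 | 1.72 | 0.976 | 1.000
Table 5  Evaluation of the ensemble models on three metrics (train:test ratio 70:30)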
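Ensemble | Base classifier | Accuracy A (%) | Accuracy FS (%) | Type-I error A (%) | Type-I error FS (%) | AUC A | AUC FS
Bagging | LR | 97.17 | 98.99 | 7.32 | 4.90 | 0.962 | 0.990
Bagging | CART | 97.17 | 98.78 | 8.53 | 6.09 | 0.968 | 0.994
Bagging | C4.5 | 97.57 | 98.58 | 9.76 | 4.90 | 0.995 | 0.997
Bagging | MLP | 96.56 | 98.79 | 8.53 | 6.09 | 0.989 | 0.985
Bagging | SVM | 96.37 | 98.38 | 9.76 | 3.66 | 0.976 | 0.986
Boosting | LR | 96.56 | 97.57 | 13.41 | 6.09 | 0.966 | 0.995
Boosting | CART | 97.37 | 97.77 | 8.53 | 3.66 | 0.995 | 0.997
Boosting | C4.5 | 95.14 | 97.36 | 8.53 | 4.90 | 0.972 | 0.980
Boosting | MLP | 97.37 | 98.58 | 9.76 | 6.09 | 0.993 | 0.996
Boosting | SVM | 97.37 | 98.18 | 7.32 | 3.66 | 0.982 | 0.990
Random Subspace | LR | 94.13 | 97.36 | 20.73 | 8.53 | 0.966 | 0.994
Random Subspace | CART | 96.56 | 96.96 | 14.63 | 9.76 | 0.940 | 0.997
Random Subspace | C4.5 | 96.56 | 97.16 | 12.20 | 10.98 | 0.964 | 0.975
Random Subspace | MLP | 95.95 | 97.97 | 8.53 | 7.32 | 0.968 | 0.991
Random Subspace | SVM | 96.56 | 97.77 | 14.63 | 9.76 | 0.967 | 0.983
Rotation Forest | LR | 98.38 | 99.39 | 6.09 | 1.20 | 0.994 | 1.000
Rotation Forest | CART | 97.57 | 99.19 | 6.09 | 1.20 | 0.992 | 0.996
Rotation Forest | C4.5 | 97.57 | 99.19 | 7.32 | 1.20 | 0.981 | 0.998
Rotation Forest | MLP | 98.38 | 99.19 | 7.32 | 2.44 | 0.992 | 1.000
Rotation Forest | SVM | 98.58 | 99.39 | 6.09 | 1.20 | 0.981 | 1.000
Table 6  Evaluation of the ensemble models on three metrics (train:test ratio 80:20)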
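Train:test ratio | Bagging A | Bagging FS | Boosting A | Boosting FS | Random Subspace A | Random Subspace FS | Rotation Forest A | Rotation Forest FS
60:40 | 97.29 | 98.34 | 97.51 | 98.52 | 95.88 | 97.51 | 98.40 | 99.39
70:30 | 97.79 | 98.81 | 97.71 | 98.92 | 97.01 | 97.84 | 98.43 | 99.30
80:20 | 96.97 | 98.70 | 96.76 | 97.89 | 95.95 | 97.44 | 98.10 | 99.27
Mean | 97.35 | 98.62 | 97.32 | 98.44 | 96.27 | 97.60 | 98.31 | 99.32
Table 7  Average accuracy (%)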
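The 60:40 row of Table 7 can be checked against Table 4: each cell matches the plain mean of the five base-classifier accuracies (LR, CART, C4.5, MLP, SVM) for that ensemble and feature set, rounded to two decimals. The short Python sketch below recomputes that row; the dictionary layout and function name are illustrative. Applying the same procedure to the Type-I error columns of Table 4 reproduces the 60:40 row of Table 8.

```python
from statistics import mean

# Accuracy (%) copied from Table 4 (train:test 60:40).
# Values in each list follow the base-classifier order LR, CART, C4.5, MLP, SVM.
TABLE4_ACCURACY = {
    "Bagging":         {"A": [97.17, 98.08, 97.67, 96.06, 97.47],
                        "FS": [98.79, 98.58, 98.28, 98.08, 97.97]},
    "Boosting":        {"A": [97.98, 97.17, 97.17, 97.27, 97.97],
                        "FS": [98.48, 98.88, 98.07, 98.88, 98.28]},
    "Random Subspace": {"A": [95.55, 96.46, 95.05, 95.45, 96.87],
                        "FS": [97.27, 97.17, 98.07, 97.67, 97.37]},
    "Rotation Forest": {"A": [98.68, 98.48, 97.97, 98.07, 98.78],
                        "FS": [99.69, 99.19, 98.99, 99.29, 99.79]},
}


def ensemble_means(table, ndigits=2):
    """Average each (ensemble, feature set) cell over its base classifiers."""
    return {
        (ensemble, feature_set): round(mean(values), ndigits)
        for ensemble, by_set in table.items()
        for feature_set, values in by_set.items()
    }


if __name__ == "__main__":
    for (ensemble, feature_set), value in ensemble_means(TABLE4_ACCURACY).items():
        print(f"{ensemble:16s} {feature_set:2s} {value:.2f}")
```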
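Train:test ratio | Bagging A | Bagging FS | Boosting A | Boosting FS | Random Subspace A | Random Subspace FS | Rotation Forest A | Rotation Forest FS
60:40 | 8.16 | 5.36 | 6.78 | 3.91 | 8.96 | 5.98 | 4.94 | 1.61
70:30 | 8.10 | 4.00 | 7.24 | 3.27 | 10.62 | 4.31 | 5.17 | 2.07
80:20 | 8.78 | 5.13 | 9.51 | 4.88 | 14.14 | 9.27 | 6.58 | 1.45
Mean | 8.35 | 4.82 | 7.84 | 4.02 | 11.24 | 6.52 | 5.56 | 1.71
Table 8  Average Type-I error (%)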
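Criterion | Bagging A | Bagging FS | Boosting A | Boosting FS | Random Subspace A | Random Subspace FS | Rotation Forest A | Rotation Forest FS
Accuracy | 6.00 | 2.67 | 6.33 | 2.50 | 8.00 | 5.67 | 3.83 | 1.00
Type-I error | 6.33 | 3.33 | 6.33 | 2.00 | 8.00 | 5.00 | 4.00 | 1.00
Table 9  Model rankings (average ranks; lower is better)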
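Table 9 and Figure 2 report average ranks (rank 1 is best, so the average ranks of eight models sum to 36), in the style of the Friedman test described by Demšar (2006). The Python sketch below computes average ranks with tied models sharing the mean rank, the Friedman chi-square statistic, and the Nemenyi critical difference. As illustrative input it uses the three train:test rows of Table 7. This does not reproduce Table 9 exactly (Bagging A averages 6.33 here against 6.00 in the table), so the paper presumably ranked a different set of results, which this excerpt does not specify. The q value of 3.031 is the Nemenyi constant for eight models at α = 0.05 from Demšar's table and applies only if a Nemenyi post-hoc test was used.

```python
from math import sqrt

MODELS = ["Bagging A", "Bagging FS", "Boosting A", "Boosting FS",
          "Random Subspace A", "Random Subspace FS", "Rotation Forest A", "Rotation Forest FS"]

# Table 7 rows (60:40, 70:30, 80:20), columns in MODELS order; example input only.
TABLE7_ACCURACY = [
    [97.29, 98.34, 97.51, 98.52, 95.88, 97.51, 98.40, 99.39],
    [97.79, 98.81, 97.71, 98.92, 97.01, 97.84, 98.43, 99.30],
    [96.97, 98.70, 96.76, 97.89, 95.95, 97.44, 98.10, 99.27],
]


def average_ranks(rows, higher_is_better=True):
    """Average rank of each model across datasets (rank 1 = best; ties share the mean rank).

    Use higher_is_better=False for error metrics such as Type-I error.
    """
    k = len(rows[0])
    totals = [0.0] * k
    for scores in rows:
        order = sorted(range(k), key=lambda m: scores[m], reverse=higher_is_better)
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied scores.
            while j + 1 < k and scores[order[j + 1]] == scores[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
            for p in range(i, j + 1):
                totals[order[p]] += shared
            i = j + 1
    return [t / len(rows) for t in totals]


def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic: 12N / (k(k+1)) * (sum of R_j^2 - k(k+1)^2 / 4)."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k, n_datasets, q_alpha):
    """Two models differ significantly if their average ranks differ by more than this."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n_datasets))


if __name__ == "__main__":
    ranks = average_ranks(TABLE7_ACCURACY)
    for name, r in zip(MODELS, ranks):
        print(f"{name:20s} {r:.2f}")
    print("chi2_F =", round(friedman_chi2(ranks, len(TABLE7_ACCURACY)), 2))
    print("CD =", nemenyi_cd(len(MODELS), len(TABLE7_ACCURACY), 3.031))
```

With only three datasets the critical difference is large (about 6.06 ranks), which is why a rank comparison like this needs many configurations before pairwise differences can be called significant.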
Figure 2  Friedman average ranks and post-hoc test results for the ensemble models
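Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn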