Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 1: 85-94. DOI: 10.11925/infotech.2096-3467.2018.0323
Research Paper
Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search*
Jing Li, Xiao Liu, Xiaoli Wang
School of Economics and Management, Tongji University, Shanghai 200092, China
Abstract

[Objective] This paper proposes a knowledge acquisition model that combines neighborhood rough sets with grid-search-tuned ensemble classifiers, aiming to improve the efficiency and effectiveness with which financial institutions acquire knowledge for wealth-management decisions. [Methods] We first built a framework for a financial decision knowledge acquisition system. Within it, the neighborhood rough set method reduces the decision system by removing redundant attributes, SMOTE oversampling corrects the class imbalance in the data, and grid search selects the optimal parameters of the ensemble classifiers. We then trained and tested the model to evaluate the candidate reducts and select the best one. Finally, we extracted the decision rules from the optimal reduct and stored them in the organization's knowledge base. [Results] In an empirical analysis of 4,521 real financial records, the model reached a sensitivity (accuracy on the purchase class) of 83.55%, a specificity (accuracy on the non-purchase class) of 80.74%, and an AUC of 0.8214 on the test set. [Limitations] The model was not validated on other types of marketing data, such as insurance or consumer loans. [Conclusions] The proposed model improves the overall classification performance of the financial decision system, identifies and acquires knowledge about key customers, and can raise the effectiveness and efficiency of financial institutions' decisions on wealth-management products.

Keywords: Knowledge Acquisition; Neighborhood Rough Set; Ensemble Classifiers; Grid Search
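
The attribute-reduction step named in the abstract can be illustrated with a minimal sketch. It follows the usual formulation in the neighborhood rough set literature: a sample belongs to the positive region when every sample within distance delta of it shares its class, and the dependency of the decision on an attribute subset is the fraction of such samples. The abstract does not state the paper's distance measure, neighborhood radius, or stopping rule, so the Euclidean distance, `delta=0.2`, the `eps` threshold, the function names, and the toy data below are all assumptions made for illustration only.

```python
from math import sqrt

def neighborhood(data, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the attribute subset attrs."""
    xi = data[i]
    return [
        j for j, xj in enumerate(data)
        if sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta
    ]

def dependency(data, labels, attrs, delta):
    """Fraction of samples whose whole neighborhood shares their label,
    i.e. the size of the positive region divided by the sample count."""
    consistent = sum(
        all(labels[j] == labels[i] for j in neighborhood(data, i, attrs, delta))
        for i in range(len(data))
    )
    return consistent / len(data)

def greedy_reduct(data, labels, delta=0.2, eps=1e-9):
    """Forward greedy search: repeatedly add the attribute that raises the
    dependency most; stop when no attribute improves it by more than eps."""
    remaining = list(range(len(data[0])))
    reduct, best = [], 0.0
    while remaining:
        gains = [(dependency(data, labels, reduct + [a], delta), a) for a in remaining]
        # Highest dependency wins; ties go to the lowest attribute index.
        score, attr = max(gains, key=lambda g: (g[0], -g[1]))
        if score - best <= eps:
            break
        reduct.append(attr)
        remaining.remove(attr)
        best = score
    return reduct, best

# Made-up data scaled to [0, 1]: attribute 0 separates the two classes,
# attribute 1 is noise. These are not records from the paper's data set.
X = [[0.05, 0.9], [0.10, 0.1], [0.15, 0.5], [0.85, 0.2], [0.90, 0.8], [0.95, 0.4]]
y = [0, 0, 0, 1, 1, 1]
reduct, gamma = greedy_reduct(X, y)
print(reduct, gamma)
```

On this toy data the search keeps only attribute 0, because adding the noisy attribute 1 does not raise the dependency. In a pipeline like the one the abstract describes, the retained attributes would then feed the oversampling and classifier-tuning stages, which this sketch does not cover.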
Received: 2018-03-25
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Full-Lifecycle-Oriented Healthcare Quality and Safety Management and Optimal Resource Allocation" (Grant No. 71432007).
Cite this article:
Jing Li, Xiao Liu, Xiaoli Wang. Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search*[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 85-94. DOI: 10.11925/infotech.2096-3467.2018.0323.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0323