Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 85-94    DOI: 10.11925/infotech.2096-3467.2018.0323
Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search
Jing Li, Xiao Liu, Xiaoli Wang
School of Economics and Management, Tongji University, Shanghai 200092, China
Abstract  

[Objective] This paper aims to improve the effectiveness and efficiency of acquiring decision-making knowledge from financial institutions. [Methods] First, we built a framework for a financial decision-making knowledge acquisition system, which uses a neighborhood rough set to remove redundant attributes. Second, we applied the SMOTE method to balance the data and used grid search to optimize the parameters of the ensemble classifiers. Third, we trained the resulting model and used it to identify the optimal reduction group. Finally, we acquired the required knowledge from the optimal reduction and stored it in the database. [Results] We tested the proposed method on 4,521 financial records; it achieved a sensitivity of 83.55%, a specificity of 80.74%, and an AUC of 0.8214. [Limitations] We did not test the proposed model on insurance or consumer loan data. [Conclusions] The proposed method could improve the classification performance of financial decision-making systems and effectively identify and acquire knowledge about key customers.
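
The attribute-reduction step in the abstract can be illustrated with a minimal sketch. It assumes the common neighborhood rough set formulation: a sample belongs to the positive region when every sample within distance delta of it (on the chosen attributes) has the same label, and the dependency of an attribute subset is the fraction of samples in the positive region. The greedy forward selection, the function names, the delta value, and the toy data are all assumptions made for illustration; the abstract does not state the exact reduction algorithm the authors used.

```python
from math import dist


def neighborhood_dependency(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood on `attrs` is label-pure."""
    if not attrs:
        return 0.0
    consistent = 0
    for i, xi in enumerate(X):
        pi = [xi[a] for a in attrs]
        # The sample is in the positive region if all of its neighbors
        # (including itself) share its decision label.
        pure = all(
            y[j] == y[i]
            for j, xj in enumerate(X)
            if dist(pi, [xj[a] for a in attrs]) <= delta
        )
        if pure:
            consistent += 1
    return consistent / len(X)


def greedy_reduct(X, y, delta, eps=1e-9):
    """Forward selection: repeatedly add the attribute with the largest dependency gain."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        candidates = [
            (neighborhood_dependency(X, y, selected + [a], delta), a)
            for a in remaining
        ]
        # Ties are broken in favor of the lowest attribute index.
        score, attr = max(candidates, key=lambda t: (t[0], -t[1]))
        if score - best <= eps:
            break  # no attribute improves the dependency any further
        selected.append(attr)
        remaining.remove(attr)
        best = score
    return selected, best


# Toy data (made up, attributes already scaled to [0, 1]):
# attribute 0 separates the classes, attribute 1 is noise, attribute 2 is constant.
X = [
    [0.10, 0.50, 0.3],
    [0.15, 0.90, 0.3],
    [0.20, 0.10, 0.3],
    [0.80, 0.55, 0.3],
    [0.85, 0.95, 0.3],
    [0.90, 0.15, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

reduct, dependency = greedy_reduct(X, y, delta=0.2)
print(reduct, dependency)
```

On this toy data the sketch keeps only the attribute that separates the classes and discards the other two as redundant. Different delta values generally yield different reducts, which is one plausible reading of how several candidate reductions could arise before the best one is chosen.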

Key words: Knowledge Acquisition; Neighborhood Rough Set; Ensemble Classifiers; Grid Search
Received: 25 March 2018      Published: 04 March 2019
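
The three measures reported in the abstract (sensitivity, specificity, and AUC) have standard definitions, sketched below. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration only; they are not the paper's data and do not reproduce its reported figures.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) from true and predicted labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


def auc_by_pairs(y_true, scores, positive=1):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made up): 3 positive and 4 negative records.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_by_pairs(y_true, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three are usually reported together on imbalanced data.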

Cite this article:

Jing Li, Xiao Liu, Xiaoli Wang. Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search. Data Analysis and Knowledge Discovery, 2019, 3(1): 85-94.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0323     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I1/85
