[Objective] This paper aims to improve the effectiveness and efficiency of acquiring decision-making knowledge from financial institutions. [Methods] First, we built a framework for a financial decision-making knowledge acquisition system, which uses a neighborhood rough set to remove redundant attributes. Second, we adopted the SMOTE method to balance the data and applied grid search to optimize the parameters of the ensemble classifiers. Third, we trained the resulting model and used it to identify the optimal reduction group. Finally, we acquired the needed knowledge from the optimal reduction and stored it in the database. [Results] We evaluated the proposed method on 4,521 financial records; it achieved a sensitivity of 83.55%, a specificity of 80.74%, and an AUC of 0.8214. [Limitations] We did not test the proposed model on insurance or consumer loan data. [Conclusions] The proposed method could improve the classification performance of financial decision-making systems and help identify and acquire knowledge about key customers effectively.
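The attribute-reduction step named in the Methods can be illustrated with a minimal sketch. It assumes the forward greedy search over neighborhood dependency that is common in the neighborhood rough set literature; the abstract does not specify the paper's exact algorithm, so this is not the authors' implementation. The function names, the radius `delta`, the stopping threshold `eps`, and the toy data are all hypothetical, and attributes are assumed to be scaled to [0, 1].

```python
from math import sqrt


def neighborhood_dependency(X, y, attrs, delta):
    """Share of samples whose delta-neighborhood (measured on `attrs`)
    contains only samples with the same decision label."""
    if not attrs:
        return 0.0
    consistent = 0
    for i, xi in enumerate(X):
        pure = True
        for j, xj in enumerate(X):
            dist = sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs))
            if dist <= delta and y[j] != y[i]:
                pure = False  # a neighbor from another class was found
                break
        consistent += pure
    return consistent / len(X)


def greedy_reduct(X, y, delta=0.15, eps=1e-9):
    """Forward greedy search: repeatedly add the attribute that raises the
    dependency most, and stop when no attribute improves it by more than eps."""
    remaining = list(range(len(X[0])))
    reduct, best = [], 0.0
    while remaining:
        scored = [(neighborhood_dependency(X, y, reduct + [a], delta), a)
                  for a in remaining]
        score, attr = max(scored, key=lambda t: t[0])
        if score - best <= eps:
            break
        reduct.append(attr)
        remaining.remove(attr)
        best = score
    return reduct, best


# Hypothetical toy data: attribute 0 separates the two classes,
# attributes 1 and 2 take identical values in both classes.
X = [
    [0.10, 0.50, 0.9],
    [0.15, 0.20, 0.1],
    [0.20, 0.80, 0.5],
    [0.80, 0.50, 0.9],
    [0.85, 0.20, 0.1],
    [0.90, 0.80, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

reduct, gamma = greedy_reduct(X, y, delta=0.15)
print(reduct, gamma)  # -> [0] 1.0
```

In this toy case only attribute 0 is kept, because adding either of the other two attributes does not raise the dependency any further. The pairwise distance loop is quadratic in the number of samples, which is acceptable for a sketch but would need indexing or vectorization for a data set of several thousand records.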
Jing Li, Xiao Liu, Xiaoli Wang. Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 85-94.