Financial Fraud Detection for Growth Enterprise Market Listed Companies Based on Data Fusion |
Li Aihua, Wang Diwen, Xu Weijia, Li Zimo, Yao Sihan |
School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China |
|
|
Abstract [Objective] This paper builds ensemble models to detect financial fraud among companies listed on the Growth Enterprise Market (GEM). [Methods] We constructed an anomaly detection framework for financial fraud based on data fusion. In the data layer, we fused structured data, text data, and multi-source heterogeneous data to construct financial and non-financial features. In the information layer, we combined different sampling methods with ensemble classification models. In the knowledge layer, we fused current domain information to construct the model evaluation indicators. [Results] After class-imbalance processing, the model's evaluation indicators were better than those obtained without it. The optimized SMOTE+ENN+LightGBM model achieved an Fβ of 0.7738. In addition, detection using multiple types of features outperformed detection using a single class of features. [Limitations] The proposed method mainly identifies companies suspected of financial fraud; it cannot distinguish or determine the specific type of fraud. [Conclusions] Class-imbalance processing improves the model's ability to find abnormal samples, and fusing multi-source heterogeneous data positively affects the identification of financial fraud in listed companies.
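The Fβ indicator reported above weights recall against precision for the fraud class. The abstract does not state the value of β or the underlying confusion-matrix counts, so the following is only a minimal sketch of the standard Fβ definition; the function name `f_beta`, the default β = 2, and the example counts are assumptions chosen for illustration, not values from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Standard F-beta score for the positive (fraud) class.

    tp, fp, fn are true positives, false positives and false negatives.
    beta > 1 weights recall more heavily than precision; beta = 2 is an
    assumed default, since the abstract does not give the value used.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    # F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
    denom = (1 + b2) * tp + b2 * fn + fp
    # With no predicted or actual positives the score is defined here as 0.
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fp, fn = 30, 20, 10
    print("F1:", round(f_beta(tp, fp, fn, beta=1.0), 4))
    print("F2:", round(f_beta(tp, fp, fn, beta=2.0), 4))
```

Choosing β above 1 reflects the usual preference in fraud screening for missing fewer fraudulent companies, at the cost of more false alarms.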
|
Received: 07 June 2022
Published: 09 November 2022
|
|
Fund: National Natural Science Foundation of China (71932008); Fundamental Research Funds for the Central Universities (20170065) |
Corresponding Author: Li Aihua, ORCID: 0000-0003-4425-1955, E-mail: aihuali@cufe.edu.cn
|