[Objective] This paper integrates tax-related data from multiple sources and uses machine learning to identify corporate tax evasion. [Methods] First, we used web scraping, text mining, and other methods to collect corporate financial data, executive information, and media coverage. Second, we used random forests for feature selection and established indicators for candidate companies. Third, we built a discriminant model using multi-task sparse structure learning with an improved focal loss function. Finally, we trained the model on different types of tax audits to identify audit candidates. [Results] On real-world datasets, the model performed well across different application scenarios. Its mean recall reached 0.8309, which was 0.1351 higher than logistic regression and 0.1033 higher than traditional multi-task sparse structure learning. [Limitations] The model still needs to be tested on data from companies other than listed companies. [Conclusions] The new model can identify enterprises engaged in various types of tax evasion. This study offers new directions for intelligent tax auditing by government agencies.
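The abstract does not specify how the focal loss was improved, so the sketch below shows only the standard binary focal loss of Lin et al., FL(p_t) = -α_t (1 - p_t)^γ log(p_t), on which such a variant would presumably build. The function names and the default values γ = 2 and α = 0.25 are assumptions made for illustration; they are not taken from this paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard focal loss for one example (not the paper's improved variant).

    p: predicted probability of the positive class (e.g. "evader")
    y: true label, 0 or 1
    """
    # Clamp the probability so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha and the negative by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# A confidently correct prediction contributes far less than a hard one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With γ = 0 and α = 0.5 the expression reduces to half the usual cross-entropy, which is a convenient sanity check. Larger γ shifts the training signal toward hard, often minority-class, examples, which is the usual motivation for focal loss when positive cases such as audited evaders are rare.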