Identifying Tax Audit Cases with Multi-task Learning
Li Guofeng1, Li Zuojuan1, Wang Zheji1, Wu Meng2
1 School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
2 School of Economics, Shandong University of Finance and Economics, Jinan 250014, China
[Objective] This paper integrates tax-related data from multiple sources and uses machine learning methods to identify corporate tax evasion. [Methods] First, we used web scraping, text mining, and related methods to collect corporate financial data, executive information, and media coverage of the companies. Second, we applied the random forest method for feature selection and established the indicators used to assess candidate companies. Third, we built a discriminant model based on multi-task sparse structure learning with an improved focal loss function. Finally, we trained the model on different types of tax audit cases to identify candidate enterprises for audit. [Results] We evaluated the model on real-world datasets, and it performed well across various application scenarios. Its mean recall reached 0.8309, which was 0.1351 and 0.1033 higher than those of logistic regression and traditional multi-task sparse structure learning, respectively. [Limitations] The model still needs to be validated on data from non-listed companies. [Conclusions] The proposed model can identify target enterprises engaged in various types of tax evasion. This study offers new directions for intelligent tax auditing by government agencies.
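The abstract builds its discriminant model on an improved focal loss inside multi-task sparse structure learning and reports a mean recall. Because the paper's improvement to the loss is not described here, the sketch below assumes the standard binary focal loss of Lin et al., FL(p_t) = -α_t(1 - p_t)^γ log(p_t), with the common defaults α = 0.25 and γ = 2. It also assumes a simplified multi-task objective (per-task focal loss plus an L1 penalty, omitting the task-relationship term of multi-task sparse structure learning) and takes "mean recall" to be the unweighted average of per-task recall over audit types. All function names and the `Task` type are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def focal_loss(y_true: Sequence[int], p_pred: Sequence[float],
               alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    Standard form from Lin et al., not the paper's improved variant.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)           # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical: one task = (feature rows X_t, labels y_t), e.g. one audit type.
Task = Tuple[List[List[float]], List[int]]


def multitask_objective(W: List[List[float]], tasks: List[Task],
                        lam: float = 0.01, alpha: float = 0.25,
                        gamma: float = 2.0) -> float:
    """Sum of per-task focal losses plus an L1 sparsity penalty on all weights.

    Simplification (assumption): the task-relationship regularizer of
    multi-task sparse structure learning is omitted; W[t] is the linear
    weight vector of task t.
    """
    loss = 0.0
    for w, (X, y) in zip(W, tasks):
        p = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in X]
        loss += focal_loss(y, p, alpha, gamma)
    return loss + lam * sum(abs(wi) for w in W for wi in w)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives (evading firms) that the model flags."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    pos = sum(1 for y in y_true if y == 1)
    return tp / pos if pos else 0.0


def mean_recall(results: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted average of per-task recall (assumed definition)."""
    return sum(recall(y, yh) for y, yh in results) / len(results)
```

With `gamma=0.0` and `alpha=0.5` the loss reduces to half the ordinary binary cross-entropy, which is a convenient sanity check. Raising `gamma` shrinks the contribution of well-classified examples, the property that makes focal loss attractive when audit-worthy firms are a small minority of the sample.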