Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 128-140    DOI: 10.11925/infotech.2096-3467.2021.1116
Identifying Tax Audit Cases with Multi-task Learning
Li Guofeng1, Li Zuojuan1, Wang Zheji1, Wu Meng2
1School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
2School of Economics, Shandong University of Finance and Economics, Jinan 250014, China
[Objective] This paper integrates tax-related data from multiple sources and uses machine learning methods to identify corporate tax evasion. [Methods] First, we used web scraping, text mining, and other methods to collect corporate financial data, executive information, and media coverage. Second, we used the random forest method for feature selection and established indicators for selecting candidate companies. Third, we built a discriminant model with multi-task sparse structure learning based on an improved focal loss function. Finally, we trained the model on different types of tax audits to identify audit candidates. [Results] We tested the model on real-world datasets and found that it performed well across applications. Its mean recall reached 0.8309, which was 0.1351 and 0.1033 higher than the logistic regression and the traditional multi-task sparse structure learning models, respectively. [Limitations] The model still needs to be tested on datasets of non-listed companies. [Conclusions] The new model can identify enterprises engaged in various types of tax evasion. This study provides new directions for smart tax audits by the government.
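The abstract names an improved focal loss but does not give its form. As a point of reference only, the sketch below implements the standard binary focal loss, not the authors' improved variant. The function names and the default values of `gamma` and `alpha` are assumptions chosen for illustration.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for a single example.

    p: predicted probability of the positive class (here: a flagged firm).
    y: true label, 1 for positive and 0 for negative.
    gamma, alpha: illustrative defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)            # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so training concentrates on hard, often minority-class, cases.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Hypothetical batch: two positives (one easy, one hard) and one easy negative.
batch_loss = mean_focal_loss([0.9, 0.2, 0.1], [1, 1, 0])
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which is a convenient sanity check for any variant built on top of it.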

Keywords: Multi-source Data Fusion; Smart Tax Audit; Multi-task Sparse Structure Learning; Focal Loss
Received: 30 September 2021      Published: 25 January 2022
ZTFLH:  F812  
Fund: National Social Science Fund of China (19BTJ023)
Corresponding author: Li Zuojuan

Li Guofeng, Li Zuojuan, Wang Zheji, Wu Meng. Identifying Tax Audit Cases with Multi-task Learning. Data Analysis and Knowledge Discovery, 2022, 6(6): 128-140.

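Data source | Indicator name | Indicator description
Corporate statements | Financial indicators | Profitability indicators: return on assets, gross profit margin on sales, administrative expense ratio, selling expense ratio, etc.
 | Equity indicators | Ownership type: state-owned enterprise, non-state-owned enterprise, etc.
Executive information | Age | Mean age of executive team members
 | Gender | Proportion of women in the executive team
 | Education | Proportion of executives with a bachelor's degree or above
 | Compensation incentive | Mean executive compensation (log-transformed)
 | Legal background | Proportion of executives with a legal background among all executives
 | Financial background | Proportion of executives with a financial background among all executives
Media attention | Media atten…
Tax authorities | Whether the enterprise…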
Data Sources and Description of Relevant Indicators
Calculation Process of Negative Media Sentiment Score
Ranking of Feature Importance Scores for VAT Dataset Based on Random Forest
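Indicator dimension | Indicator name
Profitability | Cost of sales ratio, selling expense ratio, administrative expense ratio, total operating revenue per share, net profit, total operating cost / total operating revenue, earnings per share growth rate, return on invested capital, net non-operating income and expenses / total profit
Operating capacity | Inventory turnover, current asset turnover, fixed asset turnover, inventory turnover days, operating cycle, accounts payable turnover days, accounts receivable turnover, accounts receivable turnover days, shareholders' equity turnover
Solvency | Current ratio, debt-to-asset ratio, shareholders' equity / total liabilities, capital reserve per share
Growth capacity | Return on equity, 3-year compound growth rate of operating revenue
Cash flow | Free cash flow to the firm per share, cash received from sales of goods and services
Stock market | Price-to-earnings ratio, price-to-book ratio, price-to-sales ratio, number of shareholders, average shares held per shareholder
Executive information | Gender, age, legal background, financial background
Media attention | Negative media sentiment score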
Judgment Index System of Case Selection for Tax Inspection by Tax Type
 | Predicted positive | Predicted negative
Actual positive | TP (True Positives) | FN (False Negatives)
Actual negative | FP (False Positives) | TN (True Negatives)
Confusion Matrix
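The evaluation metrics reported for the models (accuracy, recall, F1-Score, G-mean) can all be derived from the four cells of the confusion matrix. The Python sketch below is illustrative only: the function name and the example counts are hypothetical, and the G-mean definition used here (geometric mean of recall and specificity) is a common convention assumed for this example rather than one confirmed by the source.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Derive common classification metrics from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # share of actual positives that were found
    specificity = ratio(tn, tn + fp)   # share of actual negatives that were cleared
    precision = ratio(tp, tp + fp)     # share of flagged cases that were correct
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),  # assumed definition
    }


# Hypothetical counts, not taken from the paper.
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=130)
```

In audit case selection a missed evader (FN) is usually costlier than a wasted inspection (FP), which is why recall is highlighted alongside accuracy.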
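Evaluation metric | Tax type | Pool-
Accuracy | VAT | | 0.8478 | 0.8783 | 0.8261
 | Corporate income tax | | 0.7521 | 0.8167 | 0.8167
 | Individual income tax | | 0.7982 | 0.8532 | 0.8349
 | Other taxes | | 0.8034 | 0.8419 | 0.8063
 | Mean | 0.7950 | 0.8004 | 0.8475 | 0.8210
AUC | VAT | | 0.8224 | 0.8191 | 0.8373
 | Corporate income tax | | 0.7857 | 0.8170 | 0.8192
 | Individual income tax | | 0.7229 | 0.7808 | 0.8166
 | Other taxes | | 0.7491 | 0.7500 | 0.8148
 | Mean | 0.7351 | 0.7700 | 0.7917 | 0.8220
F1-Score | VAT | | 0.7154 | 0.7407 | 0.7101
 | Corporate income tax | | 0.6027 | 0.8070 | 0.8136
 | Individual income tax | | 0.6071 | 0.7037 | 0.7273
 | Other taxes | | 0.6457 | 0.6386 | 0.6936
 | Mean | 0.6049 | 0.6427 | 0.7225 | 0.7361
G-mean | VAT | | 0.7835 | 0.8425 | 0.8373
 | Corporate income tax | | 0.6643 | 0.8156 | 0.8192
 | Individual income tax | | 0.7528 | 0.8431 | 0.8166
 | Other taxes | | 0.8121 | 0.7950 | 0.8148
 | Mean | 0.7351 | 0.7532 | 0.8241 | 0.8220
Recall | VAT | | 0.6863 | 0.7018 | 0.8596
 | Corporate income tax | | 0.8312 | 0.8214 | 0.8571
 | Individual income tax | | 0.6451 | 0.6896 | 0.7742
 | Other taxes | | 0.6206 | 0.6977 | 0.8328
 | Mean | 0.6196 | 0.6958 | 0.7276 | 0.8309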
The Effectiveness of Logistic Regression and Multi-task Sparse Structure Learning
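The recall gains quoted in the abstract can be recomputed from the per-tax recall rows of the table above. The short Python check below assumes that the three per-tax value columns correspond, in order, to logistic regression, traditional multi-task sparse structure learning, and the focal-loss variant. That attribution is inferred from the means matching the abstract, since the column headers did not survive extraction, and the dictionary keys are labels chosen here for illustration.

```python
# Recall by tax type (VAT, corporate income tax, individual income tax, other taxes).
# Column-to-model mapping is an assumption; see the note above.
recall_by_tax = {
    "logistic": [0.6863, 0.8312, 0.6451, 0.6206],
    "mssl": [0.7018, 0.8214, 0.6896, 0.6977],
    "fl_mssl": [0.8596, 0.8571, 0.7742, 0.8328],
}

# Mean recall across the four tax types, rounded as in the table.
mean_recall = {
    model: round(sum(values) / len(values), 4)
    for model, values in recall_by_tax.items()
}

# Gain of the focal-loss model over each baseline.
gains = {
    model: round(mean_recall["fl_mssl"] - mean_recall[model], 4)
    for model in ("logistic", "mssl")
}
```

Under that mapping the recomputed means and gains agree with the figures in the abstract (0.8309, 0.1351, and 0.1033).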
Ablation Experiments with Different Characteristic Indicators
Recall Rate of Model with Different Sampling Ratios
FL-MSSL Evaluation Indexes under Different Number of Tasks
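Firm No. | Listed company code | Predicted dishonest…
1 | C6034** | 13.75% | 0 | 0
2 | C3000** | 38.81% | 0 | 0
3 | C0021** | 80.00% | VAT | VAT
4 | C0007** | 76.96% | Other taxes | Other taxes
5 | C6039** | 3.22% | 0 | 0
6 | C0027** | 68.86% | VAT | VAT
7 | C6006** | 88.35% | Individual income tax | Individual income tax
8 | C3007** | 22.12% | 0 | 0
9 | C3005** | 64.74% | Corporate income tax | Corporate income tax
10 | C3005** | 40.93% | 0 | 0
11 | C6006** | 65.41% | Other taxes | Other taxes
12 | C6038** | 4.09% | 0 | 0
13 | C0027** | 86.33% | Individual income tax | Individual income tax
14 | C0027** | 32.37% | 0 | 0
15 | C6003** | 43.43% | 0 | 0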
Out-of-Sample Firm Predictions Based on the FL-MSSL
