Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 72-84     https://doi.org/10.11925/infotech.2096-3467.2018.0506
Research Paper
Identifying Risks of HS Codes by China Customs*
Zixuan Zhang1,2, Hao Wang1,2, Liping Zhu1,2,3, Sanhong Deng1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3Nanjing Customs District, P.R. China, Nanjing 210001, China
Abstract

[Objective] This study exploits the regularities contained in HS code data to provide effective knowledge services for customs tax risk analysis. [Methods] We propose two machine-learning-based automatic classification schemes for judging HS code risk: the first uses the HS code itself as the prediction target, and the second uses the correctness of the declared HS code as the prediction target. For each scheme we build a corresponding SVM prediction model, taking into account the structure of the code target, the nature of the features, and the length of the text, and run the corresponding experiments. [Results] Comparing the two schemes, we find that the second places lower demands on the training data, predicts faster, and identifies risk more effectively. [Limitations] Only four months of data were available, so the sample may not be fully representative. [Conclusions] Testing yielded classifiers with a high risk prediction rate, which provides a sound knowledge base for building practical classification models and risk identification systems.

Key words: Risk Identification    HS Code Prediction    SVM    Text Classification    Machine Learning
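The two schemes named in the abstract differ mainly in what the classifier is asked to predict. The following is a minimal sketch of that difference only, not the authors' implementation. The record fields (`description`, `declared_code`, `verified_code`), the function names, and the sample records with placeholder codes are all hypothetical, and the rule that scheme 1 flags a declaration when the predicted code differs from the declared one is an assumption about how a predicted code would be turned into a risk signal.

```python
from typing import List, NamedTuple, Tuple


class Declaration(NamedTuple):
    description: str    # goods description text from the declaration
    declared_code: str  # HS code entered by the declarant
    verified_code: str  # HS code confirmed after review (hypothetical field)


def code_targets(records: List[Declaration]) -> List[Tuple[str, str]]:
    """Scheme 1: the class label is the HS code itself (multi-class)."""
    return [(r.description, r.verified_code) for r in records]


def correctness_targets(records: List[Declaration]) -> List[Tuple[str, bool]]:
    """Scheme 2: the class label is whether the declared code is correct (binary).

    The declared code is appended to the text so a classifier can see it.
    """
    return [
        (f"{r.description} {r.declared_code}", r.declared_code == r.verified_code)
        for r in records
    ]


def is_risky_under_scheme_1(predicted_code: str, declared_code: str) -> bool:
    """Assumed decision rule: a mismatch between prediction and declaration is a risk."""
    return predicted_code != declared_code


# Invented records with placeholder codes; these are not real tariff lines.
SAMPLE = [
    Declaration("roasted coffee beans", "AAAA000001", "AAAA000001"),
    Declaration("stainless steel bolts", "BBBB000002", "CCCC000003"),
]

if __name__ == "__main__":
    print(code_targets(SAMPLE))
    print(correctness_targets(SAMPLE))
```

Either list of (text, label) pairs could then be vectorized and passed to an SVM trainer. Scheme 1 needs examples for every HS code it must distinguish, whereas scheme 2 has only two classes, which is consistent with the abstract's finding that the second scheme places lower demands on the training data.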
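Received: 2018-05-07      Published: 2019-03-04
Funding: *This work is one of the research outcomes of the Postgraduate Research & Practice Innovation Program of Jiangsu Province project "Research on Risk Analysis and Avoidance of Customs Commodity Classification in the Big Data Environment" (Grant No. SJCX18_0009) and of the "Nanjing Customs Tax Big Data Analysis Consulting Project".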
Cite this article:
Zixuan Zhang, Hao Wang, Liping Zhu, Sanhong Deng. Identifying Risks of HS Codes by China Customs*[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 72-84.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0506      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I1/72
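Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn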