%A Xiaofeng Li, Jing Ma, Chi Li, Hengmin Zhu
%T Identifying Commodity Names Based on XGBoost Model
%0 Journal Article
%D 2019
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.2096-3467.2018.1048
%P 34-41
%V 3
%N 7
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4682.shtml}
%8 2019-07-25
%X

[Objective] This paper aims to automatically identify commodity names in product descriptions in order to classify items sold on Taobao. [Methods] First, we retrieved a large number of transaction records from Taobao. Second, we built an e-commerce commodity description dataset and labeled it manually. Third, we created a supervised machine learning algorithm based on the XGBoost model to extract names from product descriptions. [Results] The precision and recall of the algorithm were 85% and 87%, respectively, for 816 different items from 20,059 records. [Limitations] The categories of commodities in the test corpus need to be expanded. [Conclusions] The machine learning algorithm is an effective way to identify product names.