%A Li Xiangdong
%A Ba Zhichao
%A Gao Fan
%T Review of Digital Documents Automatic Classification Research
%0 Journal Article
%D 2016
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.1003-3513.2016.09.02
%P 17-26
%V 32
%N 9
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4263.shtml}
%8 2016-09-25
%X

[Objective] This paper discusses existing issues in, and possible solutions for, the automatic classification of digital documents (i.e., library bibliographies, news pages, and social media posts). [Coverage] We reviewed the literature on feature semantic conversion, feature expansion, and weighting strategies in machine-learning-based automatic classification. [Methods] We analyzed the leading studies, key technologies, current achievements, and future directions reported in the published articles. [Results] We identified limitations in previous studies regarding the semantic representation of texts and the use of knowledge bases. [Limitations] We did not discuss classification algorithms. [Conclusions] To improve the effectiveness of automatic classification of digital documents, future research could combine the vector space model with probabilistic topic models, use knowledge bases to improve concept similarity computation, and construct composite weighting strategies.