New Technology of Library and Information Service  2016, Vol. 32 Issue (9): 17-26    DOI: 10.11925/infotech.1003-3513.2016.09.02
Review of Digital Documents Automatic Classification Research
Li Xiangdong1,2, Ba Zhichao1,3, Gao Fan1
1School of Information Management, Wuhan University, Wuhan 430072, China
2Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
3Information Research Institute of Shandong Academy of Sciences, Ji’nan 250014, China
[Objective] This paper discusses existing issues in, and possible solutions for, the automatic classification of digital documents (i.e., library bibliographies, news pages and social media posts). [Coverage] We reviewed the literature on feature semantic conversion, feature expansion and weighting strategies in the field of machine-learning-based automatic classification. [Methods] We analyzed the leading studies, key technologies, current achievements and future directions reported in the published articles. [Results] We identified limitations of previous studies in the semantic representation of texts and in the utilization of knowledge bases. [Limitations] We did not discuss the classification algorithms. [Conclusions] To improve the effectiveness of automatic classification of digital documents, future research could combine the Vector Space Model with the Probabilistic Topic Model, use knowledge bases to improve concept similarity computation, and construct composite weighting strategies.

Key words: Automatic classification      Feature semantic association      Feature semantic conversion      Feature expansion      Weighting strategy
Received: 22 January 2016      Published: 19 October 2016

Li Xiangdong, Ba Zhichao, Gao Fan. Review of Digital Documents Automatic Classification Research. New Technology of Library and Information Service, 2016, 32(9): 17-26.

