This paper presents a multi-level, multi-class text categorization system built on the Vector Space Model (VSM) and Naive Bayes (NB). It describes four modules in detail: word segmentation and frequency statistics; computing the degree of match between each category and the document; improving the accuracy of the parent-category assignment by correcting it with the subcategory result; and judging whether a document belongs to multiple categories and should carry multiple labels. With VSM-based text representation, the system reaches a MicroF1 of 89.7% for parent categories and 77.8% for subcategories; with Naive Bayes, it reaches 67.6% for parent categories and 66.5% for subcategories.
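The VSM side of such a pipeline can be illustrated with a minimal Python sketch. It is not the paper's C# implementation: the function names, the cosine-similarity scoring, the `ratio` threshold for assigning several labels, and the rule that the best subcategory overrides the parent category are all assumptions made for illustration, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(doc_tokens, category_vectors, ratio=0.8):
    """Score a tokenized document against every category vector.

    Returns (labels, scores). A category becomes a label when its score
    is at least `ratio` times the best score; this multi-label rule is
    an assumption, not the rule used in the paper.
    """
    doc = Counter(doc_tokens)  # term-frequency statistics
    scores = {name: cosine(doc, vec) for name, vec in category_vectors.items()}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return [], scores
    labels = [n for n, s in scores.items() if s >= ratio * best]
    labels.sort(key=lambda n: -scores[n])
    return labels, scores


def correct_parent(parent_labels, sub_labels, parent_of):
    """Hypothetical correction step: if the top subcategory belongs to a
    different parent than the top parent label, trust the subcategory."""
    if not sub_labels:
        return parent_labels
    implied = parent_of[sub_labels[0]]
    if not parent_labels or parent_labels[0] != implied:
        return [implied] + [p for p in parent_labels if p != implied]
    return parent_labels


# Toy category vectors (made-up term frequencies).
categories = {
    "sports": Counter({"match": 3, "team": 2, "goal": 2}),
    "finance": Counter({"stock": 3, "market": 2, "bank": 1}),
}
labels, scores = classify(["team", "goal", "match", "match"], categories)
```

In this toy run the document shares terms only with the `sports` vector, so it receives a single label; a document whose terms match two categories about equally would receive both.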
Liu Hua (刘华). A Text Categorization System with C# (文本分类C#实现) [J]. New Technology of Library and Information Service (现代图书情报技术), 2007, 2(3): 43-45.