%A Tan Xueqing, Zhou Tong, Luo Lin %T A Text Classification Algorithm Based on the Average Category Similarity %0 Journal Article %D 2014 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.1003-3513.2014.09.09 %P 66-73 %V 30 %N 9 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_3947.shtml} %8 2014-09-25 %X

[Objective] To improve both the classification performance and the classification speed of the KNN algorithm. [Methods] This paper proposes a classification algorithm based on the average category similarity. For each category in the training set, the method averages the similarities between the test text and all training texts of that category, and it determines the category of the test text from these mean values. [Results] Experiments were run on the Fudan corpus and on balanced and unbalanced versions of the Sogou public corpus. Compared with the KNN classification algorithm, the proposed method increases Macro_F1 by 3.5%, 3.2% and 3.3% on the three corpora, respectively. Its classification time is 1/22, 1/6 and 1/5 of that of KNN, respectively. [Limitations] Because the KNN algorithm has low time efficiency, the experimental datasets contain relatively few texts. [Conclusions] Compared with KNN, the proposed method is a practical algorithm for large-scale text classification.
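The decision rule described under [Methods] can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that texts are already represented as sparse term-weight vectors (for example TF-IDF weights stored in dicts) and that cosine similarity serves as the text-similarity measure. The abstract specifies neither choice, and the function names `cosine` and `classify_avg_similarity` are hypothetical. The sketch assigns the test text to the category with the highest mean similarity.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts.

    Assumption: cosine is used as the text-similarity measure; the abstract
    does not name the measure.
    """
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (nu * nv)


def classify_avg_similarity(test_vec, training):
    """Assign test_vec to the category whose training texts have the
    highest mean similarity to it.

    training: list of (category, vector) pairs (hypothetical input format).
    Returns (predicted_category, {category: mean_similarity}).
    """
    sims = defaultdict(list)
    # Compare the test text with every training text, grouped by category.
    for cat, vec in training:
        sims[cat].append(cosine(test_vec, vec))
    # Average the similarities within each category.
    means = {cat: sum(s) / len(s) for cat, s in sims.items()}
    # Pick the category with the largest average similarity.
    return max(means, key=means.get), means


if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    training = [
        ("sports", {"ball": 1.0, "team": 1.0}),
        ("sports", {"goal": 1.0, "team": 1.0}),
        ("finance", {"stock": 1.0, "market": 1.0}),
    ]
    label, means = classify_avg_similarity({"team": 1.0, "ball": 1.0}, training)
    print(label)  # prints: sports
```

Unlike KNN, this rule has no neighbour count k to tune, and it does not need to sort the similarities. It also uses every training text of a category rather than only the nearest ones. When cosine similarity is used, the mean of the similarities to a category's texts equals the dot product of the normalized test vector with the average of that category's normalized training vectors. Those per-category averages can therefore be computed once in advance. The abstract does not say whether the reported speed-up relies on this.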