Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (2): 72-78    DOI: 10.11925/infotech.2096-3467.2018.0509
Study on a Method of Feature Classification Selection Based on χ2 Statistics
Zhanglu Tan, Zhaogang Wang, Han Hu
School of Management, China University of Mining and Technology, Beijing 100083, China
[Objective] This paper improves χ2-statistic feature selection to obtain better results in application. Traditional χ2 statistics cannot guarantee a balance of information between categories, which affects classification performance. [Methods] By analyzing the feature selection process of traditional χ2 statistics and its limitations, we propose a feature classification selection method based on χ2 statistics, in which feature words are selected for each class according to their degree of correlation with that class. [Results] The traditional and improved methods were compared on text classification, with SVM as the classification model. The feature classification selection method based on χ2 statistics significantly improved accuracy, average classification accuracy, lowest classification accuracy, stability, and system running time. [Limitations] When only a small number of feature words was selected, the difference before and after the improvement was not obvious. [Conclusions] The feature classification selection method based on χ2 statistics can effectively improve the stability and generalization performance of the classification model, reduce fluctuation in classification accuracy, and improve the efficiency of the classification process.
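The idea of selecting feature words class by class can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formula, so the sketch assumes the standard χ2 statistic over a 2x2 term/class contingency table, treats each document as a set of words, and takes the top k words per class before merging them. The names chi2_score and select_features_per_class, and the parameter k, are hypothetical.

```python
def chi2_score(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table (assumed form).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no information
    return n * (a * d - b * c) ** 2 / denom


def select_features_per_class(docs, labels, k):
    """Return {class: top-k terms}, ranking terms separately for each class.

    docs is a list of token lists; labels is the parallel list of class names.
    Ties are broken alphabetically so the result is deterministic.
    """
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    selected = {}
    for cls in sorted(set(labels)):
        in_cls = [s for s, y in zip(doc_sets, labels) if y == cls]
        out_cls = [s for s, y in zip(doc_sets, labels) if y != cls]
        scores = {}
        for term in vocab:
            a = sum(term in s for s in in_cls)
            b = sum(term in s for s in out_cls)
            scores[term] = chi2_score(a, b, len(in_cls) - a, len(out_cls) - b)
        ranked = sorted(vocab, key=lambda t: (-scores[t], t))
        selected[cls] = ranked[:k]
    return selected


# Toy corpus (invented for illustration only).
docs = [
    ["coal", "mine"], ["coal", "safety"],
    ["stock", "market"], ["stock", "price"],
    ["goal", "match"], ["goal", "team"],
]
labels = ["mining", "mining", "finance", "finance", "sport", "sport"]

per_class = select_features_per_class(docs, labels, k=1)
# Merge the per-class selections into one feature set for the classifier.
features = sorted({t for terms in per_class.values() for t in terms})
```

Because every class contributes its own k words, no class is left without representative features, which is the balance between categories that the abstract says a single global ranking cannot guarantee.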

Key words: χ2 Statistics; Feature Selection; Text Categorization; Stability
Received: 07 May 2018      Published: 27 March 2019

