%A Lu Yonghe, Liang Minghui %T Improvement of Text Feature Extraction with Genetic Algorithm %0 Journal Article %D 2014 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.1003-3513.2014.04.08 %P 48-57 %V 30 %N 4 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_3885.shtml} %8 2014-04-25 %X

[Objective] To comprehensively analyze many feature extraction methods and improve traditional feature extraction process. [Methods] Firstly, the paper uses feature pool to pre-extract features, then extract best feature set by genetic algorithm and group coding. [Results] When the fitness function uses KNN classification algorithm, the method using in this paper shows the best performance. Besides, the effect is more obvious with less feature dimensions. Simultaneously, the proposed method has better stability in text classification for different feature dimensions and corpuses. [Limitations] The corpus is not abundant enough. Only IG and CHI are used to extract features for feature pool construction. It ignores semantic relationships among words for group coding. The population size and the number of iteration in genetic algorithm are restricted by experimental conditions. [Conclusions] The stability of text classification is improved by adding a feature pool to pre-extract features. The result of text classification is more accurate by adding genetic algorithm in the text feature extraction. To use proposed method reduces overfitting of features and improves efficiency by utilizing group coding in the genetic algorithm.