New Technology of Library and Information Service  2014, Vol. 30 Issue (4): 48-57    DOI: 10.11925/infotech.1003-3513.2014.04.08
Improvement of Text Feature Extraction with Genetic Algorithm
Lu Yonghe, Liang Minghui
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
[Objective] To analyze a range of feature extraction methods and improve the traditional feature extraction process. [Methods] The paper first uses a feature pool to pre-extract features, then extracts the best feature set with a genetic algorithm that uses group coding. [Results] When the fitness function uses the KNN classification algorithm, the method proposed in this paper shows the best performance, and the improvement is more pronounced at lower feature dimensions. The proposed method is also more stable in text classification across different feature dimensions and corpora. [Limitations] The corpus is not large enough. Only information gain (IG) and the chi-square statistic (CHI) are used to extract features for feature pool construction. Group coding ignores semantic relationships among words. The population size and the number of iterations in the genetic algorithm are restricted by experimental conditions. [Conclusions] Adding a feature pool to pre-extract features improves the stability of text classification. Adding a genetic algorithm to text feature extraction makes the classification results more accurate. Using group coding in the genetic algorithm reduces overfitting of features and improves efficiency.
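
The pipeline outlined under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how groups are formed, which genetic operators are used, or how KNN is evaluated. The sketch assumes the feature pool is already built (for example from the top IG and CHI features), that groups are consecutive slices of the pool, that the chromosome holds one bit per group, and that fitness is the leave-one-out accuracy of a 1-nearest-neighbour classifier standing in for KNN. The names make_groups, decode, loo_1nn_accuracy and evolve are hypothetical, as are tournament selection, single-point crossover and elitism.

```python
import random

def make_groups(pool, group_size):
    """Split the feature pool (a list of feature indices) into consecutive groups."""
    return [pool[i:i + group_size] for i in range(0, len(pool), group_size)]

def decode(chromosome, groups):
    """Group coding: one bit per group; a 1 selects every feature in that group."""
    return [f for bit, g in zip(chromosome, groups) if bit for f in g]

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN on the chosen features (assumed fitness)."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def evolve(X, y, groups, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a good group subset; operators here are illustrative assumptions."""
    rng = random.Random(seed)
    n = len(groups)
    fitness = lambda c: loo_1nn_accuracy(X, y, decode(c, groups))
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best chromosome forward
        while len(new_pop) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Single-point crossover, then bit-flip mutation
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best, fitness(best)

# Toy data: feature 0 separates the classes, feature 1 is misleading.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]]
y = [0, 0, 1, 1]
groups = make_groups([0, 1], group_size=1)
best, score = evolve(X, y, groups)
```

Because each bit switches a whole group on or off, the chromosome length is the number of groups rather than the number of features, which shrinks the search space; this is one plausible reading of the efficiency claim in [Conclusions].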

Key words: Text categorization; Feature extraction; Genetic algorithms; Feature pool
Received: 25 December 2013      Published: 19 May 2014
Cite this article:

Lu Yonghe, Liang Minghui. Improvement of Text Feature Extraction with Genetic Algorithm. New Technology of Library and Information Service, 2014, 30(4): 48-57.

