Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 91-101    DOI: 10.11925/infotech.2096-3467.2017.01.11
Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm
Yonghe Lu, Jinghuang Chen
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
[Objective] This paper introduces the shuffled frog leaping algorithm (SFLA) to remove irrelevant terms from texts and optimizes the feature selection method to improve the accuracy of text classification. [Methods] First, we used the CHI (chi-square) and IG (information gain) techniques to pre-select feature terms at different dimensions, and then adopted a modified SFLA to refine the resulting feature list. Second, each frog represented a feature selection rule, and classification precision served as the fitness function. Finally, SVM and KNN classifiers were adopted to calculate the classification precision. [Results] The modified SFLA achieved higher classification precision than CHI and IG, with a maximum improvement rate of 12%. [Limitations] Feature overfitting occurred in a small portion of the feature space dimensions. [Conclusions] Feature preselection combined with the modified SFLA can effectively exclude irrelevant or invalid terms and thereby improve the precision of feature selection.
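The pipeline described under [Methods] lends itself to a short sketch. The Python below is illustrative only and does not reproduce the authors' modified SFLA: it scores a term with the standard CHI statistic for preselection, then runs a basic binary SFLA in which each frog is a 0/1 mask over the preselected terms. The leap rule (copy each differing bit from the leader with probability `p`), the function names, and the default parameters are assumptions, and `fitness` is a placeholder for the SVM/KNN classification precision used in the paper.

```python
import random
from typing import Callable, List, Sequence, Set

Mask = List[int]  # one bit per candidate term: 1 = keep, 0 = drop


def chi_square(docs: Sequence[Set[str]], labels: Sequence[int], term: str, cls: int) -> float:
    """CHI statistic of `term` for class `cls` from the 2x2 term/class contingency table."""
    n = len(docs)
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cls)      # present, in class
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cls)      # present, other classes
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == cls)  # absent, in class
    d = n - a - b - c                                                           # absent, other classes
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def leap(worst: Mask, leader: Mask, rng: random.Random, p: float = 0.5) -> Mask:
    """Binary leap (an assumed rule): copy each bit where the leader differs, with probability p."""
    return [lead if bit != lead and rng.random() < p else bit for bit, lead in zip(worst, leader)]


def sfla_select(n_features: int, fitness: Callable[[Mask], float],
                n_frogs: int = 20, n_memeplexes: int = 4,
                local_steps: int = 5, shuffles: int = 10, seed: int = 0) -> Mask:
    """Basic binary SFLA over feature masks; returns the fittest mask found (hypothetical API)."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        return [rng.randint(0, 1) for _ in range(n_features)]

    frogs = [random_mask() for _ in range(n_frogs)]
    for _ in range(shuffles):
        frogs.sort(key=fitness, reverse=True)                 # rank frogs, best first
        global_best = frogs[0]
        # Deal ranked frogs round-robin into memeplexes: rank i goes to memeplex i mod m.
        memeplexes = [frogs[i::n_memeplexes] for i in range(n_memeplexes)]
        for plex in memeplexes:
            for _ in range(local_steps):
                plex.sort(key=fitness, reverse=True)
                worst_fit = fitness(plex[-1])
                candidate = leap(plex[-1], plex[0], rng)          # leap toward memeplex best
                if fitness(candidate) <= worst_fit:
                    candidate = leap(plex[-1], global_best, rng)  # then toward global best
                if fitness(candidate) <= worst_fit:
                    candidate = random_mask()                     # otherwise replace with a random frog
                plex[-1] = candidate
        frogs = [frog for plex in memeplexes for frog in plex]    # shuffle: merge the memeplexes
    return max(frogs, key=fitness)


if __name__ == "__main__":
    # Toy fitness (hypothetical): the first 5 terms are "relevant", the other 7 are noise.
    # In the paper, fitness is SVM/KNN classification precision on the kept terms.
    toy_fitness = lambda m: sum(m[:5]) - sum(m[5:])
    print(sfla_select(12, toy_fitness))
```

The memeplex loop follows the usual SFLA order: the worst frog first leaps toward its memeplex's best, then toward the global best, and is replaced by a random frog if neither move improves it; merging the memeplexes after each round is the shuffling step. In a full pipeline, `chi_square` (or an IG score) would first rank the vocabulary, and the SFLA would then search masks over only the top-ranked terms.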

Key words: Feature Selection; Text Classification; Shuffled Frog Leaping Algorithm
Received: 30 September 2016      Published: 22 February 2017

Yonghe Lu, Jinghuang Chen. Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(1): 91-101.

