|
|
Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm |
Lu Yonghe(), Chen Jinghuang |
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China |
|
|
Abstract [Objective]This paper introduces the shuffled frog leaping algorithm (SFLA) to remove the irrelevant terms from the texts, and optimizes the feature selection method to improve the accuracy of text classification. [Methods] First, we used CHI and IG techniques to pre-select different dimensions of feature terms, and then adopted the modified SFLA to refine the text features’ list. Second, we used a frog to represent a feature selection rule, and applied the classification precision as the fitness function. Finally, the SVM and KNN classifier were adopted to calculate the classification precision. [Results] The modified SFLA had better performance in classification precision than CHI and IG, and the highest increasing rate was 12%. [Limitations] The feature over fitting occured in small portion of space dimensions. [Conclusions] Using feature preselection and the modified SFLA could effectively exclude irrelevant or invalid terms, and then improve the precision of feature selection.
|
Received: 30 September 2016
Published: 22 February 2017
|
|
[1] |
庞观松, 蒋盛益. 文本自动分类技术研究综述[J]. 情报理论与实践, 2012, 35(2): 123-128.
|
[1] |
(Pang Guansong, Jiang Shengyi.Text Automatic Classification Technology Research[J]. Information Studies: Theory & Application, 2012, 35(2): 123-128.)
|
[2] |
吴科. 基于机器学习的文本分类研究[D]. 上海:上海交通大学, 2008.
|
[2] |
(Wu Ke.A Study on Text Categorization Based on Machine Learning [D]. Shanghai: Shanghai Jiaotong University, 2008.)
|
[3] |
伍建军, 康耀红. 文本分类中特征选择方法的比较和改进[J]. 郑州大学学报: 理学版, 2007,39(2): 110-113.
doi: 10.3969/j.issn.1671-6841.2007.02.026
|
[3] |
(Wu Jianjun, Kang Yaohong.Comparison and Improvement of Feature Selection for Text Categorization[J]. Journal of Zhengzhou University: Natural Science Edition, 2007,39(2): 110-113.)
doi: 10.3969/j.issn.1671-6841.2007.02.026
|
[4] |
Yang Y, Pedersen J O.A Comparative Study on Feature Selection in Text Categorization[C]//Proceedings of the 14th International Conference on Machine Learning.San Francisco: Morgan Kaufmann Publishers Inc., 1997: 412-420.
|
[5] |
符发. 中文文本分类中特征选择方法的比较[J]. 现代计算机: 专业版, 2008(6): 43-45.
|
[5] |
(Fu Fa.Comparison of Feature Selection in Chinese Text Categorization[J]. Modern Computer, 2008(6): 43-45.)
|
[6] |
Tabakhi S, Moradi P, Akhlaghian F.An Unsupervised Feature Selection Algorithm Based on Ant Colony Optimization[J]. Engineering Applications of Artificial Intelligence, 2014, 32: 112-123.
doi: 10.1016/j.engappai.2014.03.007
|
[7] |
刘亚南. KNN文本分类中基于遗传算法的特征提取技术研究[D]. 北京: 中国石油大学, 2011.
|
[7] |
(Liu Ya’nan.Research of Feature Extraction Technology in KNN Text Classification Based on the Genetic Algorithm [D]. Beijing: China University of Petroleum, 2011.)
|
[8] |
刘逵. 基于野草算法的文本特征选择研究[D]. 重庆: 西南大学, 2013.
|
[8] |
(Liu Kui.An Invasive Weed Optimization Algorithm for Text Feature Selection [D]. Chongqing: Southwest University, 2013.)
|
[9] |
Uguz H.A Two-stage Feature Selection Method for Text Categorization by Using Information Gain, Principal Component Analysis and Genetic Algorithm[J]. Knowledge-Based Systems, 2011, 24(7): 1024-1032.
doi: 10.1016/j.knosys.2011.04.014
|
[10] |
Javed K, Maruf S, Babri H A.A Two-stage Markov Blanket Based Feature Selection Algorithm for Text Classification[J]. Neurocomputing, 2015, 157: 91-104.
doi: 10.1016/j.neucom.2015.01.031
|
[11] |
Lu Y, Liang M, Ye Z, et al.Improved Particle Swarm Optimization Algorithm and Its Application in Text Feature Selection[J]. Applied Soft Computing, 2015, 35(C): 629-636.
doi: 10.1016/j.asoc.2015.07.005
|
[12] |
Eusuff M M, Lansey K E.Optimization of Water Distribution Network Design Using the Shuffled Frog Leaping Algorithm[J]. Journal of Water Resources Planning and Management, 2003, 129(3): 210-225.
|
[13] |
崔文华, 刘晓冰, 王伟, 等. 混合蛙跳算法研究综述[J]. 控制与决策, 2012, 27(4): 481-486, 493.
|
[13] |
(Cui Wenhua, Liu Xiaobing, Wang Wei, et al.Survey on Shuffled Frog Leaping Algorithm[J]. Control and Decision, 2012, 27(4): 481-486, 493.)
|
[14] |
Elbehairy H, Elbeltagi E, Hegazy T, et al.Comparison of Two Evolutionary Algorithms for Optimization of Bridge Deck Repairs[J]. Computer-Aided Civil and Infrastructure Engineering, 2006, 21(8): 561-572.
doi: 10.1111/j.1467-8667.2006.00458.x
|
[15] |
陈功贵, 李智欢, 陈金富, 等. 含风电场电力系统动态优化潮流的混合蛙跳算法[J]. 电力系统自动化, 2009, 33(4): 25-30.
|
[15] |
(Chen Gonggui, Li Zhihuan, Chen Jinfu, et al.SFL Algorithm Based Dynamic Optimal Power Flow in Wind Power Integrated System[J]. Automation of Electric Power Systems, 2009, 33(4): 25-30.)
|
[16] |
张沈习, 陈楷, 龙禹, 等. 基于混合蛙跳算法的分布式风电源规划[J]. 电力系统自动化, 2013,37(13): 76-82.
doi: 10.7500/AEPS201207219
|
[16] |
(Zhang Shenxi, Chen Kai, Long Yu, et al.Distributed Wind Generator Planning Based Shuffled Frog Leaping Algorithm[J]. Automation of Electric Power Systems, 2013, 37(13): 76-82.)
doi: 10.7500/AEPS201207219
|
[17] |
余华, 黄程韦, 金赟, 等. 基于改进的蛙跳算法的神经网络在语音情感识别中的研究[J]. 信号处理, 2010, 26(9): 1294-1299.
doi: 10.3969/j.issn.1003-0530.2010.09.003
|
[17] |
(Yu Hua, Huang Chengwei, Jin Yun, et al.Speech Emotion Recognition Based on Modified Shuffled Frog Leaping Algorithm Neural Network[J]. Signal Processing, 2010, 26(9): 1294-1299.)
doi: 10.3969/j.issn.1003-0530.2010.09.003
|
[18] |
许方. 基于混合蛙跳算法的Web文本聚类研究[D]. 无锡:江南大学, 2013.
|
[18] |
(Xu Fang.Research on Web Text Cluster Algorithm Based on Shuffled Frog-leaping Algorithm [D]. Wuxi: Jiangnan University, 2013.)
|
[19] |
尉建兴, 崔冬华, 宁晓青. 蛙跳算法在Web文本聚类技术中的应用[J]. 电脑开发与应用, 2011, 24(5): 35-37.
doi: 10.3969/j.issn.1003-5850.2011.05.013
|
[19] |
(Yu Jianxing, Cui Donghua, Ning Xiaoqing.Applicatin of Shuffled Frog-leaping Algorithm to Web’s Text Cluster Technology[J]. Computer Development & Applications, 2011, 24(5): 35-37.)
doi: 10.3969/j.issn.1003-5850.2011.05.013
|
[20] |
Sun X, Wang Z.An Efficient Document Categorization Algorithm Based on LDA and SFL[C]//Proceedings of the 2008 International Seminar on Business and Information Management. IEEE, 2008: 113-115.
|
[21] |
NLPIR 汉语分词系统 [EB/OL]. [2016-03-17]. .
|
[21] |
(NLPIR Chinese Word Segmentation System [EB/OL]. [2016-03-17].
|
[22] |
路永和, 彭燕虹. 融合实用性与科学性的互联网信息分类体系构建[J]. 图书与情报, 2015(3): 118-124.
doi: 10.11968/tsygb.1003-6938.2015072
|
[22] |
(Lu Yonghe, Peng Yanhong.The Classification System Construction for Internet Information both Practical and Scientific[J]. Library and Information, 2015(3): 118-124.)
doi: 10.11968/tsygb.1003-6938.2015072
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|