Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 91-101    DOI: 10.11925/infotech.2096-3467.2017.01.11
Original Article
Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm
Lu Yonghe(), Chen Jinghuang
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This paper introduces the shuffled frog leaping algorithm (SFLA) to remove irrelevant terms from texts and to optimize the feature selection method, improving the accuracy of text classification. [Methods] First, we used CHI and IG to pre-select feature terms at different dimensionalities, and then adopted the modified SFLA to refine the list of text features. Second, we represented each feature selection rule as a frog and used classification precision as the fitness function. Finally, SVM and KNN classifiers were adopted to calculate the classification precision. [Results] The modified SFLA achieved higher classification precision than CHI and IG, with a maximum improvement of 12%. [Limitations] Feature overfitting occurred for a small portion of the feature dimensionalities. [Conclusions] Combining feature pre-selection with the modified SFLA effectively excludes irrelevant or invalid terms and thus improves the precision of feature selection.
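The scheme described in [Methods], where each frog encodes a feature selection rule and classification precision is the fitness function, can be illustrated with a minimal sketch. This is not the authors' implementation: the binary frog encoding, the discrete leap operator, the population and iteration sizes, and the use of a held-out validation split are illustrative assumptions, and a KNN classifier could be substituted for the SVM in the fitness function.

# Minimal sketch (assumptions noted above), Python with NumPy and scikit-learn.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score

def fitness(mask, X_tr, y_tr, X_val, y_val):
    """Fitness of one frog: macro-averaged precision of an SVM trained on the selected terms."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    clf = LinearSVC().fit(X_tr[:, idx], y_tr)
    return precision_score(y_val, clf.predict(X_val[:, idx]),
                           average="macro", zero_division=0)

def sfla_select(X_tr, y_tr, X_val, y_val, pop_size=30, n_memeplexes=5,
                n_shuffles=20, n_local=5, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]
    frogs = rng.integers(0, 2, size=(pop_size, n_feat))      # random binary feature masks
    fit = np.array([fitness(f, X_tr, y_tr, X_val, y_val) for f in frogs])

    for _ in range(n_shuffles):
        order = np.argsort(-fit)                              # shuffle: rank all frogs by fitness
        frogs, fit = frogs[order], fit[order]
        global_best = frogs[0].copy()
        for m in range(n_memeplexes):                         # deal frogs into memeplexes
            members = np.arange(m, pop_size, n_memeplexes)
            for _ in range(n_local):
                worst = members[np.argmin(fit[members])]
                local_best = members[np.argmax(fit[members])]
                # discrete "leap": copy a random subset of bits from the local best frog
                copy_bits = rng.random(n_feat) < 0.5
                cand = np.where(copy_bits, frogs[local_best], frogs[worst])
                cand_fit = fitness(cand, X_tr, y_tr, X_val, y_val)
                if cand_fit <= fit[worst]:                    # no gain: leap toward the global best
                    cand = np.where(copy_bits, global_best, frogs[worst])
                    cand_fit = fitness(cand, X_tr, y_tr, X_val, y_val)
                if cand_fit <= fit[worst]:                    # still no gain: replace with a random frog
                    cand = rng.integers(0, 2, size=n_feat)
                    cand_fit = fitness(cand, X_tr, y_tr, X_val, y_val)
                frogs[worst], fit[worst] = cand, cand_fit
    best = np.argmax(fit)
    return frogs[best], fit[best]                             # best feature mask and its precision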

Key words: Feature Selection; Text Classification; Shuffled Frog Leaping Algorithm
Received: 30 September 2016      Published: 22 February 2017
CLC number: TP391

Cite this article:

Lu Yonghe, Chen Jinghuang. Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(1): 91-101.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.01.11     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I1/91

Category        acq     crude   earn    grain   interest  money-fx  ship   trade   Total
Training set    1 596   253     2 840   41      190       206       108    251     5 485
Test set        696     121     1 083   10      81        87        36     75      2 189
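The result tables below compare CHI and IG pre-selection with their SFLA-refined counterparts (CHI_SFLA, IG_SFLA) at dimensionalities from 100 to 1 200. A minimal sketch of such a pre-selection step is given here; it assumes scikit-learn's chi2 scorer and uses mutual_info_classif as a stand-in for information gain, so names and parameters are illustrative rather than the authors' implementation.

# Minimal sketch of CHI / IG-style term pre-selection (assumptions noted above).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

def preselect_terms(docs, labels, k=1200, method="chi"):
    """Keep the k highest-scoring terms; docs are assumed to be pre-segmented, space-joined texts."""
    vec = CountVectorizer()
    X = vec.fit_transform(docs)
    scorer = chi2 if method == "chi" else mutual_info_classif   # "chi" -> CHI, else IG-like scorer
    selector = SelectKBest(scorer, k=k).fit(X, labels)
    kept_terms = vec.get_feature_names_out()[selector.get_support()]
    return selector.transform(X), kept_terms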
Dimensions  CHI (%)   CHI_SFLA (%)   IG (%)    IG_SFLA (%)
100         93.102    92.143         90.132    90.772
200         93.878    92.965         91.366    92.873
300         92.554    92.005         89.082    92.736
400         91.000    94.381         86.249    92.873
500         90.726    94.153         85.381    92.325
600         87.848    92.599         84.651    92.645
700         85.975    93.878         83.919    92.462
800         85.244    93.970         83.645    92.234
900         84.513    93.878         83.326    91.594
1 000       84.011    93.559         82.914    91.640
1 100       83.646    94.107         82.686    93.376
1 200       83.189    94.290         82.412    92.828

Dimensions  CHI (%)   CHI_SFLA (%)   IG (%)    IG_SFLA (%)
100         90.361    91.914         87.391    90.955
200         88.305    90.909         89.356    90.452
300         87.483    91.275         89.082    90.361
400         86.752    89.630         89.676    89.676
500         87.300    91.366         88.305    88.716
600         87.163    91.594         87.483    89.402
700         86.661    91.138         87.117    89.630
800         85.564    88.671         86.341    89.950
900         84.742    88.031         86.067    89.676
1 000       83.920    88.077         85.062    89.127
1 100       81.361    87.803         84.376    89.493
1 200       81.635    87.163         83.919    89.721

Dimensions  CHI (%)   CHI_SFLA (%)   IG (%)    IG_SFLA (%)
100         77.042    77.417         55.667    56.958
200         83.292    85.792         68.667    76.333
300         80.833    86.083         73.833    83.083
400         77.458    84.625         77.083    79.000
500         78.875    85.708         78.708    80.292
600         80.583    86.167         80.083    83.458
700         80.417    86.208         81.167    84.625
800         80.375    85.333         81.833    86.250
900         80.667    85.958         81.417    84.708
1 000       80.750    87.292         81.167    86.667
1 100       80.583    84.667         80.500    82.125
1 200       80.208    86.042         80.250    83.250

Dimensions  CHI (%)   CHI_SFLA (%)   IG (%)    IG_SFLA (%)
100         72.125    72.750         52.958    55.583
200         66.750    78.583         65.875    75.125
300         69.250    77.083         65.458    72.917
400         68.458    76.333         67.667    71.917
500         69.083    79.000         67.167    70.917
600         68.167    76.708         65.917    72.292
700         68.083    75.500         64.542    69.917
800         68.750    77.292         60.458    70.458
900         68.167    76.167         57.208    68.833
1 000       70.625    74.708         57.167    69.917
1 100       71.417    77.208         58.667    71.458
1 200       69.958    78.792         60.792    68.750

Paired-samples t-test (paired differences, P_old - P_new):

Pair                    Mean       Std. dev.  Std. error of mean  95% CI lower  95% CI upper  t        df  Sig. (two-tailed)
Pair 1  P_old - P_new   -5.39820   3.29716    0.33651             -6.06626      -4.73013      -16.042  95  .000
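The paired test above compares the baseline precision (P_old: CHI or IG) with the SFLA-refined precision (P_new: CHI_SFLA or IG_SFLA); with 95 degrees of freedom it presumably pairs all 96 precision values (4 result tables, 12 dimensionalities, 2 baseline methods). The sketch below, which is not the authors' script and assumes SciPy, shows how such a test is computed; only the first four CHI vs. CHI_SFLA rows of the first table are used for illustration, so its output will not equal the reported t = -16.042.

# Minimal paired-samples t-test sketch (illustrative data only).
import numpy as np
from scipy import stats

p_old = np.array([93.102, 93.878, 92.554, 91.000])   # CHI precision (%), first four rows
p_new = np.array([92.143, 92.965, 92.005, 94.381])   # corresponding CHI_SFLA precision (%)

t_stat, p_value = stats.ttest_rel(p_old, p_new)      # two-tailed paired t-test, df = n - 1
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")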