Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 5: 59-69. DOI: 10.11925/infotech.2096-3467.2017.1119
Original Article
Extracting Text Features with Improved Fruit Fly Optimization Algorithm
Wen Tingxin (1), Li Yangzi (1), Sun Jingshuang (2)
(1) Institute of Systems Engineering, Liaoning Technical University, Huludao 125105, China
(2) College of Business Administration, Liaoning Technical University, Huludao 125105, China
Abstract  

[Objective] This paper aims to reduce the dimensionality of the text feature vector space and thereby improve the accuracy of text classification. [Methods] We propose IFOATFSO, a text feature selection model based on an improved fruit fly optimization algorithm. The model uses the variance of classification accuracy to monitor its degree of convergence. It also combines a crossover operator, roulette-wheel selection based on a simulated annealing mechanism, and the genetic algorithm to strengthen global search and improve population diversity. [Results] When used to optimize features selected by the CHI method, the IFOATFSO model not only reduced the feature dimension but also improved text classification accuracy by up to 10.5%. [Limitations] The performance of the IFOATFSO model in extracting features from English text needs to be improved. [Conclusions] The IFOATFSO model improves text classification performance.
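A minimal sketch of how the ingredients named under [Methods] could fit together in a binary fruit-fly search over feature subsets. Everything here is an assumption, not the authors' IFOATFSO: subsets are encoded as 0/1 masks, the FOA "smell" step is a random bit flip around the swarm location, a population variance of accuracy below a hypothetical threshold `var_tol` is treated as convergence, and diversification uses Boltzmann-weighted (simulated-annealing style) roulette selection followed by one-point crossover. `toy_accuracy` is a hypothetical stand-in for a real classifier's accuracy, and all function names are illustrative.

```python
import math
import random
from statistics import pvariance


def toy_accuracy(mask, useful):
    """Hypothetical stand-in for classifier accuracy on the selected features."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in useful)
    return (hits - 0.5 * noise) / len(useful)


def smell_search(center, rng, flip_prob):
    """Binary FOA 'smell' step: each bit of the swarm location flips with flip_prob."""
    return [bit ^ (rng.random() < flip_prob) for bit in center]


def sa_roulette(pop, fits, temp, rng):
    """Roulette-wheel selection with Boltzmann weights exp((f - f_max) / T)."""
    f_max = max(fits)
    weights = [math.exp((f - f_max) / temp) for f in fits]
    return rng.choices(pop, weights=weights, k=len(pop))


def crossover(a, b, rng):
    """One-point crossover, as in a genetic algorithm."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def ifoa_select(n_features, fitness, pop_size=20, iters=60, var_tol=1e-3,
                flip_prob=0.1, temp=1.0, cooling=0.95, seed=0):
    """Return (best_mask, best_fitness) found by the sketched search."""
    rng = random.Random(seed)
    center = [rng.randint(0, 1) for _ in range(n_features)]
    best, best_fit = center[:], fitness(center)
    for _ in range(iters):
        flies = [smell_search(center, rng, flip_prob) for _ in range(pop_size)]
        fits = [fitness(f) for f in flies]
        # Assumed convergence test: accuracy barely varies across the swarm.
        if pvariance(fits) < var_tol:
            parents = sa_roulette(flies, fits, temp, rng)
            flies = [crossover(parents[i], parents[-1 - i], rng)
                     for i in range(pop_size)]
            fits = [fitness(f) for f in flies]
        i_best = max(range(pop_size), key=fits.__getitem__)
        if fits[i_best] > best_fit:
            best, best_fit = flies[i_best][:], fits[i_best]
        center = best[:]  # FOA 'vision' step: swarm moves to the best location so far
        temp *= cooling   # annealing schedule sharpens selection pressure over time
    return best, best_fit


if __name__ == "__main__":
    useful = {0, 2, 4}
    mask, fit = ifoa_select(10, lambda m: toy_accuracy(m, useful), seed=1)
    print(mask, fit)
```

In a real setting the fitness callback would train and score a classifier on the masked CHI features, which dominates the run time; the variance check is cheap because it reuses fitness values already computed for the current swarm.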

Key words: Text Feature Selection; Fruit Fly Optimization Algorithm; Classification Accuracy Variance
Received: 08 November 2017; Published: 20 June 2018
CLC number: TP391

Cite this article:

Wen Tingxin, Li Yangzi, Sun Jingshuang. Extracting Text Features with Improved Fruit Fly Optimization Algorithm. Data Analysis and Knowledge Discovery, 2018, 2(5): 59-69.

URL: https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1119 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I5/59

Dimension | CHI   | CHI-IFOATFSO
----------|-------|-------------
300       | 0.716 | 0.741
600       | 0.802 | 0.815
900       | 0.812 | 0.818
1200      | 0.820 | 0.841
1500      | 0.820 | 0.834
1800      | 0.831 | 0.838
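CHI in these tables refers to chi-square feature selection, the baseline whose output IFOATFSO further optimizes; the dimension column is presumably the number of top-ranked terms CHI keeps. A minimal sketch of the textbook chi-square term score, assuming binary document-frequency counts and taking each term's maximum score over classes (a common aggregation, not necessarily the paper's exact CHI variant). `chi_square` and `top_k_terms` are illustrative names.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square score of a term for one class from a 2x2 document-count table.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class without the term
    n00: docs outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n01 * n10) ** 2 / denom if denom else 0.0


def top_k_terms(docs, labels, k):
    """Rank terms by their maximum chi-square score over classes; keep the top k."""
    n = len(docs)
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            n11 = sum(1 for doc, y in zip(docs, labels) if y == cls and term in doc)
            n10 = sum(1 for doc, y in zip(docs, labels) if y != cls and term in doc)
            n01 = labels.count(cls) - n11
            n00 = n - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(top_k_terms(docs, labels, 2))  # ['goal', 'vote']
```

Terms that occur evenly across classes (here `team`) score zero, which is why a CHI cut at 300 to 1800 dimensions discards them first.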

Dimension | CHI   | CHI-IFOATFSO
----------|-------|-------------
300       | 0.759 | 0.783
600       | 0.852 | 0.859
900       | 0.845 | 0.889
1200      | 0.850 | 0.880
1500      | 0.846 | 0.858
1800      | 0.817 | 0.867

CHI  | CHI-IFOATFSO
-----|-------------
300  | 160
600  | 291
900  | 417
1200 | 577
1500 | 691
1800 | 870

CHI  | CHI-IFOATFSO
-----|-------------
300  | 151
600  | 307
900  | 452
1200 | 565
1500 | 683
1800 | 844

Dimension | CHI   | CHI-IFOATFSO
----------|-------|-------------
300       | 0.540 | 0.578
600       | 0.658 | 0.690
900       | 0.689 | 0.712
1200      | 0.712 | 0.717
1500      | 0.705 | 0.721
1800      | 0.713 | 0.736

Dimension | CHI   | CHI-IFOATFSO
----------|-------|-------------
300       | 0.628 | 0.638
600       | 0.604 | 0.709
900       | 0.625 | 0.652
1200      | 0.623 | 0.695
1500      | 0.673 | 0.768
1800      | 0.671 | 0.773
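The four CHI vs CHI-IFOATFSO comparison tables can be checked with a few lines of arithmetic. This sketch transcribes their values (labelled "table 1" to "table 4" by order of appearance, since the captions were not recovered; the labels are hypothetical) and computes the absolute per-dimension difference. CHI-IFOATFSO scores higher in all 24 comparisons, and the largest gap, 0.105 at dimension 600 in the fourth table, matches the abstract's "up to 10.5%" when that figure is read as an absolute difference.

```python
DIMS = [300, 600, 900, 1200, 1500, 1800]

# (CHI, CHI-IFOATFSO) rows transcribed from the four comparison tables,
# labelled by order of appearance because their captions were not recovered.
TABLES = {
    "table 1": ([0.716, 0.802, 0.812, 0.820, 0.820, 0.831],
                [0.741, 0.815, 0.818, 0.841, 0.834, 0.838]),
    "table 2": ([0.759, 0.852, 0.845, 0.850, 0.846, 0.817],
                [0.783, 0.859, 0.889, 0.880, 0.858, 0.867]),
    "table 3": ([0.540, 0.658, 0.689, 0.712, 0.705, 0.713],
                [0.578, 0.690, 0.712, 0.717, 0.721, 0.736]),
    "table 4": ([0.628, 0.604, 0.625, 0.623, 0.673, 0.671],
                [0.638, 0.709, 0.652, 0.695, 0.768, 0.773]),
}


def gains(chi, ifoa):
    """Absolute per-dimension difference, rounded to the tables' 3 decimals."""
    return [round(b - a, 3) for a, b in zip(chi, ifoa)]


def largest_gain(tables, dims):
    """Return (gain, table label, dimension) for the largest difference."""
    return max(
        (g, name, dim)
        for name, (chi, ifoa) in tables.items()
        for dim, g in zip(dims, gains(chi, ifoa))
    )


if __name__ == "__main__":
    print(largest_gain(TABLES, DIMS))  # (0.105, 'table 4', 600)
```

Rounding to three decimals matters: raw float subtraction such as 0.709 - 0.604 is not exactly 0.105 in binary floating point.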

CHI  | CHI-IFOATFSO
-----|-------------
300  | 148
600  | 307
900  | 452
1200 | 579
1500 | 732
1800 | 907

CHI  | CHI-IFOATFSO
-----|-------------
300  | 155
600  | 293
900  | 448
1200 | 614
1500 | 736
1800 | 906
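The four two-column dimension tables can be read as the number of features kept by CHI (left) and the number remaining after IFOATFSO (right), as the abstract's claim that IFOATFSO "reduced the feature dimension" suggests. Under that reading, this sketch computes the retained fraction per row; across all four tables (labelled by order of appearance, labels hypothetical) IFOATFSO keeps between about 45.5% and 53.3% of the CHI features.

```python
CHI_DIMS = [300, 600, 900, 1200, 1500, 1800]

# CHI-IFOATFSO column of each dimension table, in order of appearance.
REDUCED = {
    "table 1": [160, 291, 417, 577, 691, 870],
    "table 2": [151, 307, 452, 565, 683, 844],
    "table 3": [148, 307, 452, 579, 732, 907],
    "table 4": [155, 293, 448, 614, 736, 906],
}


def retained_fractions(before, after):
    """Fraction of CHI-selected features kept after IFOATFSO, to 3 decimals."""
    return [round(a / b, 3) for b, a in zip(before, after)]


if __name__ == "__main__":
    for name, after in REDUCED.items():
        print(name, retained_fractions(CHI_DIMS, after))
```

The fractions stay close to one half at every starting dimension, so under this reading the reduction ratio does not depend strongly on how many CHI features are fed in.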
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn