New Technology of Library and Information Service, 2011, Vol. 27, Issue (7/8): 76-81. DOI: 10.11925/infotech.1003-3513.2011.07-08.13
Text Feature Selection Method Based on Particle Swarm Optimization
Lu Yonghe, Cao Lichao
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract: Taking into account the overall impact of text features on text categorization results, this paper proposes a text feature selection method based on particle swarm optimization (PSOTFS), which mines text feature selection rules with the PSO algorithm. PSOTFS first uses CHI (the χ² statistic) to preselect text features, then applies the PSO algorithm to select the final features precisely from the preselected set. Each particle represents one feature selection rule, so a set of feature selection rules corresponds to a particle swarm. Classification precision serves as the fitness function, and grouping is used to reduce the dimensionality of the particles. Experimental results show that PSOTFS achieves better text categorization performance than CHI, information gain, document frequency and mutual information.
Key words: Text categorization; Feature selection; Text feature; Particle swarm optimization; CHI
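The two-stage procedure summarized in the abstract (CHI preselection, then PSO selection over the preselected features) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. CHI is computed with the standard 2×2 contingency-table form of the χ² statistic, and a term's score is assumed to be its maximum over classes. The PSO stage assumes a binary PSO variant with a sigmoid position rule, an inertia weight and a velocity clamp; all parameter values are arbitrary. The abstract does not detail the grouping step, so it is assumed here to mean running PSO separately on fixed-size blocks of preselected features. The fitness is any caller-supplied function standing in for the classification precision the paper uses. The names `chi_square`, `chi_preselect`, `binary_pso` and `grouped_select` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Set


def chi_square(docs: Sequence[Set[str]], labels: Sequence[str], term: str, cls: str) -> float:
    """CHI statistic of (term, class) from the 2x2 document contingency table."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term, in_cls = term in words, label == cls
        if has_term and in_cls:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_cls:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def chi_preselect(docs: Sequence[Set[str]], labels: Sequence[str], k: int) -> List[str]:
    """Stage 1: keep the k terms with the highest CHI score (assumed: max over classes)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: -score[t])[:k]  # ties keep alphabetical order


def binary_pso(dim: int, fitness: Callable[[List[int]], float], n_particles: int = 20,
               iters: int = 50, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               vmax: float = 4.0, seed: int = 0) -> List[int]:
    """Stage 2 (assumed binary PSO): each particle is a 0/1 mask, i.e. one selection rule."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))  # clamp velocity
                # Sigmoid turns the velocity into the probability that bit j is 1.
                pos[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest


def grouped_select(features: List[str], group_size: int,
                   fitness: Callable[[List[str]], float], **pso_kw) -> List[str]:
    """Hypothetical grouping: run PSO per block so each particle has <= group_size dims.

    `fitness` scores a feature subset; it stands in for classification precision.
    """
    selected: List[str] = []
    for start in range(0, len(features), group_size):
        group = features[start:start + group_size]
        mask = binary_pso(len(group),
                          lambda bits: fitness([f for f, b in zip(group, bits) if b]),
                          **pso_kw)
        selected += [f for f, b in zip(group, mask) if b]
    return selected
```

In use, one would first call `chi_preselect(docs, labels, k)` to obtain candidate terms, then pass them to `grouped_select` with a fitness function that trains a classifier on the chosen terms and returns its precision on held-out documents (that classifier is outside this sketch). Because `gbest` is replaced only by a strictly better mask, the returned rule is always the best one evaluated during the search.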
Received: 04 May 2011; Published: 09 October 2011

CLC number: TP391

Cite this article:

Lu Yonghe, Cao Lichao. Text Feature Selection Method Based on Particle Swarm Optimization. New Technology of Library and Information Service, 2011, 27(7/8): 76-81.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.07-08.13 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I7/8/76
