A Feature Selection Method for Text Classification Based on Statistical Frequency
Zhang Junli, Zhao Naixuan, Feng Jun
(Library of Nanjing University of Technology, Nanjing 210009, China)
Abstract This paper analyzes the Chi-square algorithm (CHI), which is unreliable for terms with low document frequency and cannot show the pertinence between a term and a category. A new Statistical Frequency algorithm (SF) is proposed to address these chief shortcomings. The SF algorithm is validated by comparative experiments, and the results show that the improved algorithm performs better.
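The two CHI shortcomings named in the abstract can be illustrated with a minimal sketch. It assumes the standard chi-square statistic for a term and a category computed from a 2x2 document-count table; this formula, the function name `chi_square`, and the counts below are assumptions for illustration and are not taken from the paper. The SF algorithm is not defined in the abstract, so it is not sketched here.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category (assumed standard form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty; the score is undefined,
        # so this sketch returns 0.0 by convention.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical corpus: 1000 documents, 100 of them in the category.
rare_term = chi_square(2, 0, 98, 900)        # term occurs in only 2 documents
positive_term = chi_square(60, 300, 40, 600)  # term over-represented in the category
negative_term = chi_square(0, 200, 100, 700)  # term never occurs in the category

print(rare_term, positive_term, negative_term)
```

With these made-up counts, a term seen in only two documents still receives a sizeable score, and the squared difference gives the positively and the negatively associated term the same score, so the statistic alone does not say whether a term indicates the category or its complement.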
Received: 13 August 2008
Published: 25 November 2008
Corresponding Authors:
Zhang Junli
E-mail: elili62@126.com
About author: Zhang Junli, Zhao Naixuan, Feng Jun