Application of Improved KNN Algorithm in Spam Email Filtering
Zhang Junli, Zhang Fan
(Department of Information Management, Huazhong Normal University, Wuhan 430079, China)
Abstract: This paper proposes an improved K-Nearest Neighbor (KNN) algorithm and applies it to spam email filtering. The improved algorithm is shown to be less sensitive to the parameter K and to the distribution of the training set, to help reduce misclassification, and to perform well in experiments.
Received: 05 March 2007
Published: 25 April 2007
Corresponding Authors:
Zhang Junli
E-mail: elili62@126.com
About the authors: Zhang Junli, Zhang Fan