New Technology of Library and Information Service  2007, Vol. 2 Issue (4): 75-78    DOI: 10.11925/infotech.1003-3513.2007.04.18
Application of Improved KNN Algorithm in Spam Email Filtering
Zhang Junli   Zhang Fan
(Department of Information Management, Huazhong Normal University, Wuhan 430079, China)
Abstract  

This paper proposes an improved K-Nearest Neighbor (KNN) algorithm and applies it to spam email filtering. The improved algorithm is shown to be less sensitive to the parameter K and to the distribution of the training set, helps reduce misclassification, and performs well in experiments.
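
For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of plain KNN text classification with cosine similarity over term-frequency vectors, not the authors' improved variant, whose details are not given on this page. The function names, the whitespace tokenizer, and the toy training messages are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a message into a bag-of-words term-frequency vector (assumed tokenizer: whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(message, training_set, k=3):
    """Label a message by majority vote among its k most similar training messages."""
    query = vectorize(message)
    scored = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up training data for illustration only (not from the paper).
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday with the team", "ham"),
]

print(knn_classify("claim your free money", TRAINING, k=3))
print(knn_classify("monday project meeting", TRAINING, k=3))
```

In this plain form the vote depends directly on the choice of k and on how many training messages each class contributes, which is the kind of sensitivity the abstract says the improved algorithm reduces.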

Key words: KNN; Anti-spam email; Text classification
Received: 05 March 2007      Published: 25 April 2007
CLC Number: TP391

 
Corresponding Authors: Zhang Junli     E-mail: elili62@126.com
About authors: Zhang Junli, Zhang Fan

Cite this article:

Zhang Junli, Zhang Fan. Application of Improved KNN Algorithm in Spam Email Filtering. New Technology of Library and Information Service, 2007, 2(4): 75-78.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.04.18     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I4/75

