Identifying Phishing Websites with Multiple Online Data Sources
Hu Zhongyi, Wang Chaoqun, Wu Jiang
School of Information Management, Wuhan University, Wuhan 430072, China
The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China
[Objective] This study aims to identify phishing websites more effectively by using online evaluation data together with abnormal URL features. [Methods] First, we applied eight machine learning techniques to compare how well various online evaluation data and abnormal URL features identify phishing websites. Then, we proposed a new method to improve identification accuracy. [Results] The online evaluation data outperformed the abnormal URL features, and combining the two data sources could improve identification performance. [Limitations] We did not account for the imbalance between the numbers of phishing and legitimate websites. [Conclusions] Online evaluation data and abnormal URL features can help identify phishing websites effectively, which suggests a direction for future studies.
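To make the idea of combining the two data sources concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the specific features, so the abnormal URL indicators (`host_is_ip`, `has_at_symbol`, and so on) and the evaluation score name (`reputation`) are assumptions chosen purely for illustration. It shows how a few URL-based indicators might be extracted and then concatenated with online evaluation scores into a single feature record that any classifier could consume.

```python
import re
from urllib.parse import urlparse

# Matches a host written as a dotted IPv4 address, e.g. "192.168.10.5".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_abnormal_features(url):
    """Return a few illustrative abnormal-URL indicators.

    These indicators are assumptions for this sketch; they are not
    the feature list used in the study.
    """
    host = urlparse(url).hostname or ""
    return {
        "host_is_ip": int(bool(IPV4_HOST.match(host))),
        "has_at_symbol": int("@" in url),
        "num_dots_in_host": host.count("."),
        "has_hyphen_in_host": int("-" in host),
        "url_length": len(url),
    }


def combine_features(url, evaluation_scores):
    """Concatenate URL indicators with online evaluation scores.

    `evaluation_scores` is a hypothetical dict of scores obtained
    from online evaluation services (names are placeholders).
    """
    combined = dict(url_abnormal_features(url))
    for name, value in evaluation_scores.items():
        combined["eval_" + name] = value
    return combined


if __name__ == "__main__":
    record = combine_features(
        "http://192.168.10.5/paypal-login/index.php",
        {"reputation": 0.1},  # hypothetical evaluation score
    )
    print(record)
```

Keeping the two feature groups under distinct key prefixes makes it straightforward to train on either group alone or on the combined record, which mirrors the comparison described in the abstract.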