Identifying Phishing Websites with Multiple Online Data Sources
Hu Zhongyi, Wang Chaoqun, Wu Jiang
School of Information Management, Wuhan University, Wuhan 430072, China
The Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China
Abstract [Objective] This study aims to identify phishing websites more effectively by using online evaluation data together with abnormal URL features. [Methods] First, we used eight machine learning techniques to compare how well various online evaluation data and abnormal URL features identify phishing websites. Then, we proposed a new method to improve identification accuracy. [Results] The online evaluation data performed better than the abnormal URL features, and combining the two data sets could improve identification performance. [Limitations] We did not consider the imbalance between the numbers of phishing websites and legitimate ones. [Conclusions] Online evaluation data and abnormal URL features help identify phishing websites effectively, which points to a direction for future studies.
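The idea of combining the two data sources can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not list the abnormal URL features, the online evaluation sources, or the eight classifiers, so the four lexical checks, the function names `url_abnormal_features` and `combine_features`, and the `reputation` score used below are all hypothetical placeholders chosen only to show how two feature sets might be merged into one record before training a classifier.

```python
import ipaddress
from urllib.parse import urlparse


def url_abnormal_features(url: str) -> dict:
    """Compute a few illustrative lexical URL checks.

    Assumption: these four checks are examples only; the paper's actual
    abnormal-feature list is not given in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        # A host that parses as an IP address (no domain name) is treated
        # here as one possible abnormality signal.
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "host_is_ip": host_is_ip,
        "has_at_symbol": int("@" in url),
        "url_length": len(url),
        "dots_in_host": host.count("."),
    }


def combine_features(url: str, evaluation: dict) -> dict:
    """Merge URL features with online evaluation data into one record.

    Assumption: `evaluation` holds numeric scores already fetched from
    some online evaluation service; its keys are hypothetical.
    """
    combined = {f"url_{k}": v for k, v in url_abnormal_features(url).items()}
    combined.update({f"eval_{k}": v for k, v in evaluation.items()})
    return combined


if __name__ == "__main__":
    # Hypothetical evaluation score for illustration only.
    print(combine_features("http://192.168.0.1/login@bank", {"reputation": 0.1}))
```

Records built this way could then be passed to any standard classifier, which allows the URL-only, evaluation-only, and combined feature sets to be compared on the same data.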
Received: 10 April 2017
Published: 25 August 2017