[Objective] This study assesses and identifies malicious websites using evaluation metrics drawn from multiple sources. [Methods] We applied principal component analysis (PCA) to multi-source website metrics to obtain a multi-dimensional assessment of each site. We then built a random forest model on that assessment to identify malicious sites. [Results] PCA extracted five assessment dimensions: authority, references, website traffic, ranking, and links. The identification model was accurate and efficient. [Limitations] Most samples in this study were non-Chinese websites, so the extracted dimensions may differ from those that apply to websites in China. In addition, we did not examine the ratio of malicious to normal websites. [Conclusions] The proposed model can extract dimensions for website assessment and then identify malicious websites.
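The two-stage method described above (PCA for dimension extraction, then a random forest for identification) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the number of metrics (12), the label encoding, and all hyperparameters are hypothetical, and feeding the component scores directly into the classifier is one plausible reading of "based on the assessment". It uses scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): 400 sites x 12 hypothetical metrics.
rng = np.random.default_rng(0)
n_sites, n_metrics = 400, 12
labels = rng.integers(0, 2, size=n_sites)      # assumed encoding: 1 = malicious, 0 = normal
metrics = rng.normal(size=(n_sites, n_metrics))
metrics[:, :4] += labels[:, None] * 1.5        # make the first four metrics informative

X_train, X_test, y_train, y_test = train_test_split(
    metrics, labels, test_size=0.25, random_state=0, stratify=labels
)

# Standardize the metrics, project onto five principal components
# (mirroring the five reported dimensions), then classify with a random forest.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# Share of variance captured by each of the five components.
explained = model.named_steps["pca"].explained_variance_ratio_
# Predicted labels and held-out accuracy on the synthetic test split.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print("explained variance ratio:", np.round(explained, 3))
print("held-out accuracy:", round(accuracy, 3))
```

Wrapping the steps in one pipeline keeps the scaler and PCA fitted on the training split only, so the held-out score is not influenced by test data. Interpreting each component as a named dimension (authority, traffic, and so on) would additionally require inspecting the component loadings, which this sketch does not attempt.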