A Brusher Detection Method Based on Principle Component Analysis and Random Forest
Zhang Liyi, Zhang Jiao
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract [Objective] This paper proposes a model based on Principal Component Analysis and Random Forest to detect Taobao brushers, reduce the dimensionality of the indicators, and improve the recognition rate. [Methods] Principal Component Analysis reduces the dimensionality of the indicators, and Random Forest classifies the users. To assess the model, detection models based on KNN and SVM are trained on the same data, and the detection accuracy and efficiency of all models are compared. [Results] The detection model based on Principal Component Analysis and Random Forest reaches 88.0% accuracy within 3 minutes. [Limitations] Most of the data comes from third-party platforms and cannot fully reflect all types of Singlebrush. [Conclusions] The detection model based on Principal Component Analysis and Random Forest achieves higher detection accuracy and efficiency than the KNN and SVM models.
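The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic (generated with `make_classification` as a stand-in for user indicators), and the 85% explained-variance threshold, the number of trees, the KNN and SVM settings, and the choice to feed the same PCA-reduced features to all three classifiers are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for user indicators (NOT the paper's data):
# 20 correlated features, binary label (brusher / normal user).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_redundant=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build(classifier):
    """Standardise, reduce with PCA, then classify.

    The 0.85 explained-variance threshold is an assumed value.
    """
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=0.85, svd_solver="full"),
        classifier,
    )


# Hyperparameters below are illustrative assumptions, not the paper's settings.
models = {
    "Random Forest": build(RandomForestClassifier(n_estimators=100, random_state=0)),
    "KNN": build(KNeighborsClassifier(n_neighbors=5)),
    "SVM": build(SVC(kernel="rbf")),
}

# Train every model on the same split and record test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    n_kept = model.named_steps["pca"].n_components_
    print(f"{name}: {n_kept} components kept, accuracy = {scores[name]:.3f}")
```

Accuracies from this sketch depend entirely on the synthetic data and say nothing about the 88.0% figure reported above; the point is only the shape of the comparison, with one shared train/test split and one shared dimensionality-reduction step.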
Received: 07 April 2015
Published: 06 April 2016