New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 65-71    DOI: 10.11925/infotech.1003-3513.2015.10.09
A Brusher Detection Method Based on Principle Component Analysis and Random Forest
Zhang Liyi, Zhang Jiao
School of Information Management, Wuhan University, Wuhan 430072, China

[Objective] This paper proposes a model based on Principal Component Analysis and Random Forest to detect Taobao brushers, reduce the dimensionality of the indicators, and improve the recognition rate. [Methods] Principal Component Analysis reduces the dimensionality of the indicators, and Random Forest classifies the users. To assess the model, detection models based on KNN and SVM are trained on the same data, and the detection accuracy and efficiency of the models are compared. [Results] In the experiments, the detection model based on Principal Component Analysis and Random Forest reaches 88.0% accuracy within 3 minutes. [Limitations] Most of the data comes from third-party platforms and cannot fully reflect all types of brushing. [Conclusions] The detection model based on Principal Component Analysis and Random Forest achieves higher detection accuracy and efficiency than the KNN- and SVM-based models it was compared against.
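
The two-stage pipeline described in the abstract (dimension reduction followed by ensemble classification) can be sketched roughly as follows. This is not the authors' implementation: the abstract does not list the indicators or the model settings, so the synthetic data, the six features, the two retained components, the tree depth and the number of trees are all assumptions chosen for illustration. The helper names (fit_pca, transform, fit_forest and so on) are hypothetical, and both stages are small pure-Python stand-ins rather than a production library.

```python
import math
import random
from collections import Counter

def fit_pca(rows, k, iters=100):
    """Standardize columns, then approximate the top-k principal components
    by power iteration with deflation on the correlation matrix."""
    n, d = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    z = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    rng, comps = random.Random(0), []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the found direction before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, stds, comps

def transform(rows, model):
    """Standardize rows with the fitted statistics and project onto the components."""
    means, stds, comps = model
    z = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in z]

def gini(labels):
    return 1.0 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, depth=0, max_depth=4):
    """Grow one tree; each split looks only at a random subset of features."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    d, best = len(X[0]), None
    for f in rng.sample(range(d), max(1, int(math.sqrt(d)))):
        for t in sorted(set(r[f] for r in X)):
            left = [i for i, r in enumerate(X) if r[f] <= t]
            right = [i for i, r in enumerate(X) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    kids = [build_tree([X[i] for i in idx], [y[i] for i in idx], rng, depth + 1, max_depth)
            for idx in (left, right)]
    return (f, t, kids[0], kids[1])

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=15, seed=0):
    """Train each tree on a bootstrap resample of the training set."""
    rng, forest = random.Random(seed), []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, x):
    # Majority vote over the trees.
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

def make_data(n_per_class=100, n_features=6, seed=1):
    """Synthetic stand-in data: label 1 ("brusher") has shifted feature means."""
    rng = random.Random(seed)
    rows = [([rng.gauss(shift, 1.0) for _ in range(n_features)], label)
            for label, shift in ((0, 0.0), (1, 3.0)) for _ in range(n_per_class)]
    rng.shuffle(rows)
    return [r for r, _ in rows], [lab for _, lab in rows]

X, y = make_data()
X_train, y_train, X_test, y_test = X[:140], y[:140], X[140:], y[140:]
pca = fit_pca(X_train, k=2)                      # fit on training data only
forest = fit_forest(transform(X_train, pca), y_train)
preds = [predict_forest(forest, x) for x in transform(X_test, pca)]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The projection is fitted on the training split only and then reused for the held-out split, so the reported accuracy is not inflated by information from the test rows. The accuracy printed here reflects the easy synthetic data and says nothing about the 88.0% figure reported in the abstract.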

Received: 07 April 2015      Published: 06 April 2016
:  G202  

Cite this article:

Zhang Liyi, Zhang Jiao. A Brusher Detection Method Based on Principle Component Analysis and Random Forest. New Technology of Library and Information Service, 2015, 31(10): 65-71.



[1] Song Meiqing. Research on Multi-granularity Users' Preference Mining Based on Collaborative Filtering Personalized Recommendation[J]. New Technology of Library and Information Service, 2015, 31(12): 28-33.
[2] Wang Zhongqun, Le Yuan, Xiu Yu, Huang Subin, Wang Qiansong. Collusive Sales Fraud Detection Based on Users' Information Search Behavior Template and Statistical Analysis[J]. New Technology of Library and Information Service, 2015, 31(11): 41-50.
[3] He Yue, Song Lingxi, Qi Liyun. Spillover Effect of Internet Word of Mouth in Negative Events: Taking the “Deadly Yuantong Express” Event as an Example[J]. New Technology of Library and Information Service, 2015, 31(10): 58-64.
[4] Wang Zhongqun, Huang Subin, Xiu Yu, Zhang Yi. Research on Metrics-Model for Online Product Review Depth Based on Domain Expert and Feature Concept Tree of Products[J]. New Technology of Library and Information Service, 2015, 31(9): 17-25.
[5] Ying Yan, Cao Yan, Mu Xiangwei. A Hybrid Collaborative Filtering Recommender Based on Item Rating Prediction[J]. New Technology of Library and Information Service, 2015, 31(6): 27-32.
[6] Zhao Jingxian. Detect of Internet Fake Public Opinion Based on Decision Tree[J]. New Technology of Library and Information Service, 2015, 31(6): 78-84.
[7] Wu Jiehua, Zhu Anqing. Mixture Topological Factors for Collaboration Prediction in Academic Network[J]. New Technology of Library and Information Service, 2015, 31(4): 65-71.
[8] Li Sheng, Wang Yemao. An Ontology-based and Location-aware Book Recommendation Model in Library[J]. New Technology of Library and Information Service, 2015, 31(3): 58-66.
[9] Chen Tao, Zhang Yongjuan, Chen Heng. Implementation of the Framework for Converting Web-data to RDF (W2R)[J]. New Technology of Library and Information Service, 2015, 31(2): 1-6.
[10] Wang Weijun, Song Meiqing. A Collaborative Filtering Personalized Recommendation Algorithm Through Directionally Mining Users’ Preferences[J]. New Technology of Library and Information Service, 2014, 30(6): 25-32.
[11] Wu Shanyan, Xu Xin. Cooking Recipe Recommendation System Based on CBR[J]. New Technology of Library and Information Service, 2013, (12): 34-41.
[12] Liu Kan, Zhu Huaiping, Liu Xiuqin. Detection of Internet Deceptive Opinion Based on SVM[J]. New Technology of Library and Information Service, 2013, 29(11): 75-80.
[13] Xiong Tao, He Yue. The Identification and Analysis of Micro-blogging Opinion Leaders in the Network of Retweet Relationship[J]. New Technology of Library and Information Service, 2013, (6): 55-62.
[14] Li Shuqing, Wang Jianqiang. A Visualization and Recognition Method of Readers’ Interests with the Analysis of the Characteristics of Borrowing Time[J]. New Technology of Library and Information Service, 2013, (5): 46-53.
[15] Kou Jihong, Dai Yishu, Liu Fang, Wu Jun, Xu Chenghuan, Cao Qian. The Analysis on Functional Mechanism of TheBrain[J]. New Technology of Library and Information Service, 2012, (12): 45-51.