New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 65-71    DOI: 10.11925/infotech.1003-3513.2015.10.09
A Brusher Detection Method Based on Principal Component Analysis and Random Forest
Zhang Liyi, Zhang Jiao
School of Information Management, Wuhan University, Wuhan 430072, China

[Objective] This paper proposes a model based on Principal Component Analysis (PCA) and Random Forest to detect Taobao brushers (accounts that place fake orders), reduce the dimensionality of the indicators, and improve the recognition rate. [Methods] PCA reduces the dimensionality of the indicators, and Random Forest classifies the users. To assess the model, detection models based on KNN and SVM are trained on the same data, and the detection accuracy and efficiency of the three models are compared. [Results] The PCA and Random Forest model reaches 88.0% accuracy within 3 minutes. [Limitations] Most of the data comes from third-party platforms and cannot fully represent all types of order brushing ("Singlebrush"). [Conclusions] The PCA and Random Forest model achieves higher detection accuracy and efficiency than the KNN and SVM models.
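
The workflow described in the abstract (PCA for dimension reduction, then Random Forest for classification, with KNN and SVM trained on the same data for comparison) can be sketched as follows. This is only an illustrative sketch using scikit-learn, not the authors' implementation: the synthetic data set, the 20 indicators, the 85% retained-variance threshold, and all hyperparameters are assumptions, since the abstract specifies none of them.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the user indicators (label 1 = brusher).
# The real Taobao data and indicator set are not given in the abstract.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_redundant=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build_pipeline(classifier):
    """Standardize the indicators, reduce them with PCA, then classify."""
    # ASSUMPTION: keep enough components to explain 85% of the variance.
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=0.85, svd_solver="full"),
        classifier,
    )


# ASSUMPTION: hyperparameters are illustrative defaults, not the paper's.
classifiers = {
    "PCA + Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "PCA + KNN": KNeighborsClassifier(n_neighbors=5),
    "PCA + SVM": SVC(kernel="rbf"),
}

results = {}
for name, classifier in classifiers.items():
    pipeline = build_pipeline(classifier)
    start = time.perf_counter()
    pipeline.fit(X_train, y_train)                 # same training data for every model
    accuracy = pipeline.score(X_test, y_test)      # share of correctly classified users
    elapsed = time.perf_counter() - start          # fit plus scoring time, in seconds
    n_components = pipeline.named_steps["pca"].n_components_
    results[name] = (accuracy, elapsed, n_components)
    print(f"{name}: accuracy={accuracy:.3f}, time={elapsed:.2f}s, "
          f"components={n_components}")
```

Using the same preprocessing pipeline for every classifier keeps the comparison limited to the classification step. Accuracy and timing on synthetic data say nothing about the 88.0% figure reported for the real data.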

Received: 07 April 2015      Published: 06 April 2016
CLC number: G202

Cite this article:

Zhang Liyi, Zhang Jiao. A Brusher Detection Method Based on Principal Component Analysis and Random Forest. New Technology of Library and Information Service, 2015, 31(10): 65-71.



  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938