New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 65-71    DOI: 10.11925/infotech.1003-3513.2015.10.09
Research Paper
A Brusher Detection Method Based on Principal Component Analysis and Random Forest
Zhang Liyi, Zhang Jiao
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] Detecting Taobao brushers relies on high-dimensional user indicators, which lowers both detection accuracy and efficiency. This paper proposes a new detection model to improve both. [Methods] Principal Component Analysis (PCA) is used to reduce the dimensionality of the user indicators, and Random Forest is used to identify brushers. To assess the advantage of this model, detection models based on K-Nearest Neighbors (KNN) and Support Vector Machines (SVM) are also built and trained on the same data, and the detection accuracy and efficiency of the models are compared. [Results] The experimental results show that the detection model based on PCA and Random Forest reaches an accuracy of 88.0% with a detection time of 3 minutes. [Limitations] The brusher data come mainly from third-party brushing platforms and cannot fully represent all types of brushers. [Conclusions] The detection model based on PCA and Random Forest achieves relatively high accuracy and good efficiency in brusher detection, and can serve as a reference for e-commerce platforms that need to identify brushed transactions.
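The abstract describes the method only at a high level, so the following is a minimal sketch of that kind of pipeline and not the authors' implementation. It assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the user-indicator table, and picks illustrative settings that do not come from the paper: an 85% explained-variance threshold for PCA, 200 trees, `n_neighbors=5`, and an RBF kernel. Feeding the KNN and SVM baselines the same PCA-reduced features is also an assumption.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the user-indicator table (NOT the paper's data):
# 1000 users, 20 indicators, label 1 = brusher, label 0 = normal user.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    n_redundant=10,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build(classifier):
    """Standardize indicators, keep components covering 85% of variance, classify."""
    # The 0.85 threshold is an illustrative assumption, not a value from the paper.
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=0.85, svd_solver="full"),
        classifier,
    )


models = {
    "PCA + Random Forest": build(
        RandomForestClassifier(n_estimators=200, random_state=0)
    ),
    "PCA + KNN": build(KNeighborsClassifier(n_neighbors=5)),
    "PCA + SVM": build(SVC(kernel="rbf")),
}

# Train each model on the same split and record test accuracy and wall-clock time.
results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    elapsed = time.perf_counter() - start
    results[name] = (accuracy, elapsed)
    print(f"{name}: accuracy={accuracy:.3f}, time={elapsed:.2f}s")
```

The numbers printed for synthetic data say nothing about the 88.0% accuracy or 3-minute detection time reported above; the sketch only shows how the three models can be compared on identical training and test splits.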

Received: 2015-04-07
CLC number: G202
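Corresponding author: Zhang Jiao, ORCID: 0000-0002-9541-5764, E-mail: 1120277437@qq.com
Author contributions: Zhang Liyi proposed the research idea, designed the research plan, and revised the final version of the paper; Zhang Jiao designed the experimental procedure, collected, preprocessed, and analyzed the experimental data, and drafted the paper.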
Cite this article:
Zhang Liyi, Zhang Jiao. A Brusher Detection Method Based on Principal Component Analysis and Random Forest [J]. New Technology of Library and Information Service, 2015, 31(10): 65-71. DOI: 10.11925/infotech.1003-3513.2015.10.09.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.10.09


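Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn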