New Technology of Library and Information Service  2015, Vol. 31 Issue (10): 65-71     https://doi.org/10.11925/infotech.1003-3513.2015.10.09
  Research Paper
A Brusher Detection Method Based on Principle Component Analysis and Random Forest
Zhang Liyi, Zhang Jiao
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To address the high dimensionality of brusher-detection indicators and the resulting low detection accuracy and efficiency, this paper proposes a new model for detecting Taobao brushers (users who place fake orders), with the aim of improving both. [Methods] Principal Component Analysis reduces the dimensionality of the user indicators, and the Random Forest algorithm then classifies users as brushers or normal users. To show the model's advantage, detection models based on K-nearest neighbors (KNN) and support vector machines (SVM) are trained on the same data, and the detection accuracy and efficiency of the models are compared. [Results] The model based on Principal Component Analysis and Random Forest reaches a detection accuracy of 88.0% with a detection time of 3 minutes. [Limitations] The brusher data comes mainly from third-party brushing platforms and cannot fully represent all brusher types. [Conclusions] The model based on Principal Component Analysis and Random Forest achieves relatively high detection accuracy and good efficiency, and can serve as a reference for e-commerce platforms that need to identify brushed transactions.
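The two-stage pipeline described in the abstract (dimensionality reduction with Principal Component Analysis, then classification with Random Forest) can be sketched as follows. This is only an illustration under stated assumptions: the paper's actual user indicators, data, and implementation are not given on this page, so the synthetic data generator `make_users`, the six indicator weights, the choice of two components, and the forest settings (15 trees, depth 4) are all hypothetical. The sketch uses only the Python standard library, with PCA computed by power iteration and a small bagged forest of Gini-split trees.

```python
import random
from collections import Counter

def pca_fit(X, k, iters=200):
    """Return (mean, components) for the top-k principal components (power iteration)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)] for a in range(d)]
    comps = []
    for i in range(k):
        v = [1.0 / (j + i + 1) for j in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        # Deflate: remove the component just found from the covariance matrix.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        comps.append(v)
    return mean, comps

def pca_transform(X, mean, comps):
    """Project centered rows onto the principal components."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean))) for c in comps]
            for row in X]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, depth=0, max_depth=4):
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    n_feat = len(X[0])
    feats = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))  # random feature subset
    best = None
    for f in feats:
        for t in sorted(set(row[f] for row in X))[1:]:  # both sides stay non-empty
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split on the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(X, y, n_trees, rng):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def make_users(n, rng):
    """Hypothetical synthetic indicators: label 1 = brusher, 0 = normal user."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        base = rng.gauss(4.0 * label, 1.0)  # latent behaviour score (assumed)
        # Six correlated indicators derived from the same latent score plus noise.
        X.append([base * w + rng.gauss(0, 0.3) for w in (1.0, 0.8, 1.2, 0.5, 0.9, 1.1)])
        y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_users(300, rng)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
mean, comps = pca_fit(X_train, k=2)  # fit PCA on training data only
forest = fit_forest(pca_transform(X_train, mean, comps), y_train, n_trees=15, rng=rng)
preds = [predict_forest(forest, r) for r in pca_transform(X_test, mean, comps)]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 88.0% reported in the paper. To mirror the comparison described in the abstract, KNN and SVM classifiers would be trained on the same projected training set, with each model's accuracy and running time measured on the same test set.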

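Received: 2015-04-07      Published: 2016-04-06
Chinese Library Classification: G202
Corresponding author: Zhang Jiao, ORCID: 0000-0002-9541-5764, E-mail: 1120277437@qq.com
Author contributions: Zhang Liyi proposed the research idea, designed the research plan, and revised the final version of the paper; Zhang Jiao designed the experimental procedure, collected, preprocessed, and analyzed the experimental data, and drafted the paper.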
Cite this article:
Zhang Liyi, Zhang Jiao. A Brusher Detection Method Based on Principle Component Analysis and Random Forest [J]. New Technology of Library and Information Service, 2015, 31(10): 65-71.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.10.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I10/65

