Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (7): 46-54    DOI: 10.11925/infotech.2096-3467.2017.1193
Identifying Crowd Participants with Modified Random Forests Algorithm
Cheng Zhou, Hongqin Wei
Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China

[Objective] This study addresses classic problems in identifying crowd participants. [Methods] We propose a recursive heuristic method for attribute reduction and use it to establish a new, ability-based crowd participant identification system. We then build an identification model that combines the random forests algorithm with the proposed system. [Results] The new method reduced the data dimension from 18 to 8 and yielded better recognition rates. [Limitations] The proposed model is simple and needs to be expanded. The data were retrieved from crowdsourcing contest websites and might have integrity issues. [Conclusions] The modified machine learning method can effectively identify crowdsourcing participants.
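The abstract does not spell out the recursive heuristic, so the following is only a minimal sketch of one common form of recursive attribute reduction, not the authors' procedure. It assumes backward elimination driven by permutation importance: score every remaining attribute, drop the weakest, re-score, and repeat until 8 of 18 attributes remain. To stay dependency-free it uses a nearest-centroid classifier as a stand-in for random forests, and the data are synthetic; all function names are hypothetical.

```python
import random
from statistics import mean

def fit_centroids(rows, labels, cols):
    """Per-class mean of each kept column (stand-in for a trained forest)."""
    return {
        c: [mean(r[j] for r, y in zip(rows, labels) if y == c) for j in cols]
        for c in set(labels)
    }

def accuracy(rows, labels, cols, centroids):
    """Share of rows whose nearest centroid carries the correct label."""
    hits = 0
    for r, y in zip(rows, labels):
        pred = min(
            centroids,
            key=lambda c: sum((r[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        hits += pred == y
    return hits / len(rows)

def permutation_importance(rows, labels, cols, rng):
    """Accuracy lost when one column is shuffled, for every kept column."""
    centroids = fit_centroids(rows, labels, cols)
    base = accuracy(rows, labels, cols, centroids)
    scores = {}
    for j in cols:
        shuffled = [r[j] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled)]
        scores[j] = base - accuracy(permuted, labels, cols, centroids)
    return scores

def recursive_reduce(rows, labels, n_keep, rng):
    """Drop the least important column, re-score, repeat until n_keep remain."""
    cols = list(range(len(rows[0])))
    while len(cols) > n_keep:
        scores = permutation_importance(rows, labels, cols, rng)
        cols.remove(min(cols, key=scores.get))
    return cols

# Synthetic data (an assumption): 18 attributes, only the first 8 carry signal.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [
    [rng.gauss(y if j < 8 else 0.0, 1.0) for j in range(18)]
    for y in labels
]
selected = recursive_reduce(rows, labels, n_keep=8, rng=rng)
print(selected)  # indices of the 8 attributes that survive elimination
```

Re-scoring after every removal is what makes the procedure recursive: an attribute's importance can change once a correlated attribute has been dropped, which a single one-shot ranking would miss.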

Key words: Crowd Participant Identification System; Feature Reduction; Random Forests; Crowdsourcing Contests
Received: 27 November 2017      Published: 15 August 2018

Cite this article:

Cheng Zhou, Hongqin Wei. Identifying Crowd Participants with Modified Random Forests Algorithm. Data Analysis and Knowledge Discovery, 2018, 2(7): 46-54.
