Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (7): 46-54    DOI: 10.11925/infotech.2096-3467.2017.1193
Identifying Crowd Participants with Modified Random Forests Algorithm
Zhou Cheng, Wei Hongqin
Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China
Abstract  

[Objective] This study addresses the classic problems in identifying crowd participants. [Methods] We propose a recursive heuristic method for attribute reduction and use it to establish a new, ability-based crowd participant identification system. We then build a model that identifies crowd participants using the random forests algorithm and the proposed system. [Results] The new method reduced the number of attributes from 18 to 8 and yielded better recognition rates. [Limitations] The proposed model is simple and needs to be extended. The data were retrieved from crowdsourcing contest websites and may have integrity issues. [Conclusions] The modified machine learning method can effectively identify crowdsourcing participants.

Key words: Crowd Participant Identification System; Feature Reduction; Random Forests; Crowdsourcing Contests
Received: 27 November 2017      Published: 15 August 2018
CLC number: TP181

Cite this article:

Zhou Cheng, Wei Hongqin. Identifying Crowd Participants with Modified Random Forests Algorithm. Data Analysis and Knowledge Discovery, 2018, 2(7): 46-54.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1193     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I7/46

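No. | Attribute | No. | Attribute | No. | Attribute
1 | Education level | 7 | Communication intensity | 13 | Crowdsourcer recommendation
2 | Work experience | 8 | Participant ability level | 14 | Task difficulty
3 | Specialized skills | 9 | Participant integrity | 15 | Knowledge matching degree
4 | Number of winning bids | 10 | Positive review rate | 16 | Page views
5 | Overall rating | 11 | Number of refunds | 17 | Shop favorites
6 | Task income | 12 | Dispute rate | 18 | Bid outcome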
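No. | Attribute | Influence coefficient (%) | No. | Attribute | Influence coefficient (%)
1 | Work experience | 9.48 | 10 | Number of refunds | 0.30
2 | Knowledge matching degree | 6.52 | 11 | Crowdsourcer recommendation | 0.29
3 | Task difficulty | 5.19 | 12 | Positive review rate | 0.28
4 | Number of winning bids | 4.09 | 13 | Participant integrity | 0.27
5 | Page views | 3.79 | 14 | Communication intensity | 0.22
6 | Education level | 3.05 | 15 | Participant ability level | 0.17
7 | Overall rating | 2.82 | 16 | Dispute rate | 0.13
8 | Specialized skills | 1.5 | 17 | Shop favorites | 0
9 | Task income | 0.9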
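No. | Attribute | Improvement coefficient (%) | No. | Attribute | Improvement coefficient (%)
1 | Work experience | 3.42 | 10 | Number of refunds | -0.65
2 | Knowledge matching degree | 1.34 | 11 | Crowdsourcer recommendation | 2.44
3 | Task difficulty | 3.23 | 12 | Positive review rate | 0.43
4 | Number of winning bids | 2.05 | 13 | Participant integrity | -0.27
5 | Page views | 5.99 | 14 | Communication intensity | 2.63
6 | Education level | -2.29 | 15 | Participant ability level | 0.03
7 | Overall rating | 1.17 | 16 | Dispute rate | -1.99
8 | Specialized skills | -0.77 | 17 | Shop favorites | 0
9 | Task income | -0.44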
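No. | Attribute | No. | Attribute | No. | Attribute
1 | Shop favorites | 5 | Crowdsourcer recommendation | 9 | Participant ability level
2 | Work experience | 6 | Positive review rate | 10 | Task difficulty
3 | Overall rating | 7 | Communication intensity | 11 | Knowledge matching degree
4 | Page views | 8 | Number of winning bids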
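No. | Attribute | Importance coefficient | No. | Attribute | Importance coefficient
1 | Communication intensity | 8.953 | 10 | Overall rating | 1.472
2 | Number of winning bids | 5.405 | 11 | Specialized skills | 1.156
3 | Knowledge matching degree | 5.005 | 12 | Positive review rate | 1.098
4 | Shop favorites | 4.658 | 13 | Education level | 0.960
5 | Task income | 4.299 | 14 | Number of refunds | 0.359
6 | Participant ability level | 3.093 | 15 | Crowdsourcer recommendation | 0.270
7 | Page views | 2.392 | 16 | Dispute rate | 0.029
8 | Task difficulty | 2.201 | 17 | Participant integrity | 0.000
9 | Work experience | 2.171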
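The page does not say how the importance coefficients above were computed. As an illustration only, the sketch below shows permutation importance, one common way to score attributes for a fitted classifier such as a random forest: shuffle one attribute column at a time and record the average drop in accuracy. The function names, the toy data and the toy classifier `toy_predict` are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[Sequence[float]], int],
             rows: List[List[float]], labels: List[int]) -> float:
    """Fraction of rows for which the classifier returns the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict: Callable[[Sequence[float]], int],
                           rows: List[List[float]], labels: List[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean accuracy drop per attribute when that attribute's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the data set with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            total_drop += baseline - accuracy(predict, shuffled, labels)
        importances.append(total_drop / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it only looks at attribute 0.
def toy_predict(row: Sequence[float]) -> int:
    return 1 if row[0] > 0.5 else 0


toy_rows = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
            [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4]]
toy_labels = [toy_predict(r) for r in toy_rows]
toy_importances = permutation_importance(toy_predict, toy_rows, toy_labels)
print(toy_importances)
```

An attribute that the model never uses gets an importance of exactly zero under this scheme, which is one way a coefficient of 0.000 can arise in a ranking like the one above.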
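Classifier model | Recognition rate (%), based on recognition rate | Recognition rate (%), based on attribute importance
Random forests | 78.021 | 75.467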
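Number of attributes added | Highest recognition rate (%) | Lowest recognition rate (%) | Mean recognition rate (%) | Attribute set with the highest rate
1 | 58.150 | 32.212 | 46.121 | 6
2 | 74.709 | 51.656 | 63.659 | 6, 8
3 | 75.915 | 59.409 | 66.346 | 6, 8, 10
4 | 79.426 | 61.601 | 75.063 | 6, 8, 10, 2
5 | 76.340 | 65.306 | 73.826 | 6, 8, 10, 2, 7
6 | 77.587 | 67.053 | 75.579 | 6, 8, 10, 2, 7, 11
7 | 78.351 | 68.134 | 76.709 | 6, 8, 10, 2, 7, 11, 3
8 | 80.036 | 69.235 | 78.612 | 6, 8, 10, 2, 7, 11, 3, 5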
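The table above has the shape of a greedy forward search: at each step one more attribute is added, every remaining candidate is tried, and the set with the highest recognition rate is carried forward. The following is a minimal sketch of that kind of search, not the authors' implementation. The `score` argument stands in for "train a random forest on this attribute subset and return its recognition rate"; the toy scorer, the attribute names and the fixed `max_size` stopping rule are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One trace row: (attributes added, highest, lowest, mean score, best attribute set)
TraceRow = Tuple[int, float, float, float, List[str]]


def forward_select(candidates: Sequence[str],
                   score: Callable[[List[str]], float],
                   max_size: int) -> List[TraceRow]:
    """Greedy forward selection: keep adding the attribute that scores best."""
    selected: List[str] = []
    remaining = list(candidates)
    trace: List[TraceRow] = []
    while remaining and len(selected) < max_size:
        # Score every one-attribute extension of the current set.
        scored = [(score(selected + [c]), c) for c in remaining]
        _, best = max(scored, key=lambda pair: pair[0])
        selected.append(best)
        remaining.remove(best)
        rates = [s for s, _ in scored]
        trace.append((len(selected), max(rates), min(rates),
                      sum(rates) / len(rates), list(selected)))
    return trace


# Hypothetical scorer: in practice this would be a cross-validated
# random forest recognition rate, which is not reproduced here.
TOY_GAIN = {"a": 0.30, "b": 0.20, "c": 0.05, "d": 0.10}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(TOY_GAIN[name] for name in subset)


for row in forward_select(["a", "b", "c", "d"], toy_score, max_size=3):
    print(row)
```

Each trace row mirrors one table row: the step number, the highest, lowest and mean score over the candidates tried at that step, and the attribute set that achieved the highest score.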
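Goal layer | Criterion layer | Indicator layer
Crowd participant identification system | Knowledge accumulation level | Work experience
Crowd participant identification system | Knowledge accumulation level | Number of winning bids
Crowd participant identification system | Degree of mutual participation | Crowdsourcer recommendation
Crowd participant identification system | Degree of mutual participation | Communication intensity
Crowd participant identification system | Platform evaluation | Positive review rate
Crowd participant identification system | Platform evaluation | Overall rating
Crowd participant identification system | Crowdsourcing task level | Knowledge matching degree
Crowd participant identification system | Crowdsourcing task level | Task difficulty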
Task type | CART | SVM | Random forests
Logo design | 68.324 | 69.021 | 70.001
Website development | 67.479 | 70.213 | 71.213
Online promotion | 68.568 | 68.734 | 70.568
All samples | 67.765 | 68.679 | 70.195
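To make the comparison in the table above easier to read, the short sketch below computes how far random forests lead the better of the two other models for each task type. The numbers are copied from the table; the English task labels and the helper name `margin_over_best_other` are introduced here for illustration.

```python
# Values copied from the table above; unit assumed to be recognition rate in percent.
rates = {
    "Logo design": {"CART": 68.324, "SVM": 69.021, "Random forests": 70.001},
    "Website development": {"CART": 67.479, "SVM": 70.213, "Random forests": 71.213},
    "Online promotion": {"CART": 68.568, "SVM": 68.734, "Random forests": 70.568},
    "All samples": {"CART": 67.765, "SVM": 68.679, "Random forests": 70.195},
}


def margin_over_best_other(row: dict, model: str = "Random forests") -> float:
    """Difference between `model` and the best competing model, in percentage points."""
    best_other = max(value for name, value in row.items() if name != model)
    return round(row[model] - best_other, 3)


margins = {task: margin_over_best_other(row) for task, row in rates.items()}
for task, margin in margins.items():
    print(f"{task}: {margin:+.3f} percentage points")
```

In every row the margin is positive, so random forests have the highest rate for each task type and for the full sample.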