Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (11): 10-18    DOI: 10.11925/infotech.2096-3467.2018.0823
Predicting Repeat Purchase Intention of New Consumers
Zhang Liyi, Li Yiran, Wen Xuan
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper compares the prediction accuracy and efficiency of several machine learning algorithms in order to identify new consumers who intend to make repeat purchases. It also provides a theoretical framework for customer classification. [Methods] We collected the server logs of a seller on Taobao.com from 2015 to 2018, together with its order records and consumers' personal information, and then trained the proposed models with different algorithms. [Results] The SMOTE algorithm combined with the random forest algorithm achieved the highest prediction accuracy, 96%. [Limitations] The sample size needs to be expanded. [Conclusions] The fusion algorithm based on SMOTE and random forest performs better than the other models tested in predicting the repurchase intentions of new consumers.
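
The abstract names SMOTE as the oversampling step paired with random forest. As a rough illustration of what SMOTE does, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation, which this page does not give: the function name `smote`, the toy data, the seed and `k=2` are all assumptions, the base sample is drawn at random rather than looped over, and the random forest step is omitted.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (hypothetical): 8 "no repurchase" rows, 3 "repurchase" rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original minority data already occupies. Oversampling of this kind is normally applied to the training split only.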

Key words: Repeat Purchase; New Consumers; Intention Prediction; SMOTE; Random Forest
Received: 26 July 2018      Published: 11 December 2018
CLC Number: TP391; G35

Cite this article:

Zhang Liyi, Li Yiran, Wen Xuan. Predicting Repeat Purchase Intention of New Consumers. Data Analysis and Knowledge Discovery, 2018, 2(11): 10-18.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0823     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I11/10

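| Primary attribute | Secondary attribute | Meaning |
| --- | --- | --- |
| Basic personal information | Gender* | {male; female} |
| | Address* | {tier-1 city; tier-2 city; tier-3 city; tier-4 city} |
| | Buyer credit* | {1 heart; 2 hearts; 3 hearts; 4 hearts; 5 hearts; 1 diamond; 2 diamonds; 3 diamonds; 4 diamonds; 5 diamonds; 1 crown; 2 crowns; 3 crowns; 4 crowns; 5 crowns} |
| | Seller credit* | {1 heart; 2 hearts; 3 hearts; 4 hearts; 5 hearts; 1 diamond; 2 diamonds; 3 diamonds; 4 diamonds; 5 diamonds; 1 crown; 2 crowns; 3 crowns; 4 crowns; 5 crowns} |
| | Real-name verified* | {verified; not verified} |
| | Has profile picture* | {yes; no} |
| Abnormal situations | Times intercepted by other sellers | Number of times the consumer has been intercepted by other Taobao sellers |
| | Cloud blacklist member* | {yes; no} |
| Service difficulty | Has given other sellers neutral or negative reviews* | {yes; no} |
| | Number of refunds | The consumer's historical number of refunds on Taobao |
| | Refund rate | The consumer's historical number of refunds on Taobao / total number of completed transactions |
| Review reputation | Positive review rate (given) | Positive reviews given by the consumer / total reviews given |
| | Positive review rate (received) | Positive reviews received by the consumer / total reviews received |
| Consumption level | Purchasing power* | The consumer's purchasing power level on Taobao (levels 1-10) |
| | Purchase activeness* | The consumer's purchase activeness level on Taobao (levels 1-10) |
| Transaction data (last three months) | Historical transaction amount | The consumer's total transaction amount on Taobao in the last three months |
| | Historical completed transactions | Number of transactions the consumer completed on Taobao in the last three months |
| | Historical closed orders | Number of the consumer's orders closed on Taobao in the last three months |
| | Order payment rate | The consumer's paid orders / total orders on Taobao in the last three months |
| | Payment activeness* | The consumer's payment activeness level on Taobao in the last three months |
| | Average order value | The consumer's total transaction amount / number of completed transactions on Taobao in the last three months |
| | Visits to this store in the last three months | Number of times the consumer visited this store in the last three months |
| Transaction behavior preferences | Platform preference | {mobile; PC; Juhuasuan} |
| | Discount sensitivity* | {not sensitive; moderately sensitive; fairly sensitive; very sensitive} |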
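
Several attributes in the table are ordered levels (buyer and seller credit) or ratios of two counts (refund rate, order payment rate, average order value). The sketch below shows one plausible way to turn a raw record into numeric features. The page does not describe the authors' preprocessing, so the function names, dictionary keys, label strings such as `"3-diamond"` and the zero-denominator fallback are all hypothetical; only the ordering hearts < diamonds < crowns and the ratio definitions come from the table.

```python
# Credit levels ordered lowest to highest: hearts, then diamonds, then crowns.
# The label strings are hypothetical; only the ordering comes from the table.
CREDIT_LEVELS = [
    f"{n}-{tier}" for tier in ("heart", "diamond", "crown") for n in range(1, 6)
]
CREDIT_RANK = {label: rank for rank, label in enumerate(CREDIT_LEVELS, start=1)}


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero
    (the fallback value is an assumption, not taken from the source)."""
    return numerator / denominator if denominator else 0.0


def encode_consumer(raw):
    """Turn one raw consumer record (a dict with hypothetical keys)
    into numeric features that follow the table's definitions."""
    return {
        # ordinal encoding: 1 (1 heart) ... 15 (5 crowns)
        "buyer_credit": CREDIT_RANK[raw["buyer_credit"]],
        # historical refunds / total completed transactions
        "refund_rate": safe_ratio(raw["refund_count"], raw["lifetime_deals"]),
        # paid orders / total orders, last three months
        "order_payment_rate": safe_ratio(raw["paid_orders_3m"], raw["total_orders_3m"]),
        # total amount / completed transactions, last three months
        "avg_order_value": safe_ratio(raw["amount_3m"], raw["deals_3m"]),
    }


example = {
    "buyer_credit": "3-diamond",
    "refund_count": 2,
    "lifetime_deals": 40,
    "paid_orders_3m": 9,
    "total_orders_3m": 12,
    "amount_3m": 540.0,
    "deals_3m": 9,
}
features = encode_consumer(example)
```

An ordinal code keeps the ranking of credit levels visible to tree-based models, while unordered attributes such as platform preference would more naturally be one-hot encoded.
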
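
| Classification model | Precision | Recall | F-Score | Training time (s) |
| --- | --- | --- | --- | --- |
| Bagging_KNN | 0.88 | 0.90 | 0.86 | 0.320 |
| SMOTE-Bagging_KNN | 0.87 | 0.87 | 0.87 | 1.254 |
| Decision_tree | 0.83 | 0.82 | 0.82 | 0.446 |
| SMOTE-Decision_tree | 0.89 | 0.89 | 0.89 | 1.188 |
| RF | 0.88 | 0.90 | 0.86 | 2.458 |
| SMOTE-RF | 0.96 | 0.95 | 0.95 | 5.614 |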
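
The Precision, Recall and F-Score columns follow the standard definitions based on true positives, false positives and false negatives. The sketch below computes them for a single positive class on made-up labels. The page does not say whether the reported figures are per-class or averaged over classes, so this is an illustration of the definitions rather than a reproduction of the table; the function name and the toy labels are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = repeat purchase, 0 = no repeat purchase.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here three of four predicted positives are correct and three of four actual positives are found, so all three metrics equal 0.75. On imbalanced data these measures are more informative than raw accuracy, which a model can inflate by always predicting the majority class.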