[Objective] This paper compares the prediction accuracy and efficiency of several machine learning algorithms for identifying new consumers with repeat purchase intentions, and provides a theoretical framework for customer classification. [Methods] We collected the server logs of a seller on Taobao.com from 2015 to 2018, together with its order records and consumers' personal information, and then trained the proposed models with different algorithms. [Results] Combining the SMOTE algorithm with the random forest algorithm yielded the highest prediction accuracy, 96%. [Limitations] The sample size needs to be expanded. [Conclusions] Among the algorithms compared, the fusion algorithm based on SMOTE and random forest performs best in predicting the repurchase intentions of new consumers.
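The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. It assumes purely numeric features, Euclidean distance, and a hypothetical neighbour count `k=5`. The function name `smote_oversample` and the toy data are illustrative only. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    X_min : array of shape (n_samples, n_features), minority class only
            (needs at least two rows).
    k     : number of nearest minority neighbours to interpolate towards
            (an assumed value, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances; a sample is never its own neighbour.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random fraction of the way from base to neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy data: 95 non-repurchasing and 5 repurchasing consumers (hypothetical).
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(95, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

# Generate enough synthetic minority rows to balance the two classes.
X_synth = smote_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, X_synth])
y_balanced = np.concatenate([np.zeros(len(X_major)),
                             np.ones(len(X_minor) + len(X_synth))])
print(X_balanced.shape, int(y_balanced.sum()))
```

The balanced arrays could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Oversampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.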