[Objective] This study builds a user churn prediction model for smart home-based care services and uses the SHAP interpretation method to analyze how different features affect churn. [Methods] First, we retrieved more than 300,000 community home-based care service orders placed between 2019 and 2021. Second, we combined the RFM model (RFM-MLP), Maslow’s hierarchy of needs, the Andersen model, and the Boruta algorithm to identify 11 features across three categories: user value, service selection, and individual characteristics. Third, we compared five machine learning models and selected XGBoost, which performed best at predicting user churn. Finally, we applied the SHAP interpretation method to analyze feature importance, feature dependence, and individual samples. [Results] The predictive model achieves an accuracy and an F1 score of approximately 87%. The most notable features for predicting user churn on smart home-based care services are the number of domestic service purchases, length of use, and user age. [Limitations] The data come from a single region, and both data quality and algorithm complexity leave room for improvement in future work. [Conclusions] The SHAP interpretation method effectively balances accuracy and interpretability in machine learning prediction models. The findings provide a foundation for optimizing operational strategies and content design on smart home-based care service platforms.
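To make the attribution idea behind SHAP concrete, the following minimal sketch computes exact Shapley values for a toy churn score. Everything in it is an assumption made for illustration: the function `churn_score`, the feature names `domestic_orders`, `months_of_use`, and `age`, the coefficients, and the sample values are hypothetical and are not the study's XGBoost model or data. It also uses a single baseline user and brute-force enumeration of feature orderings, which is only feasible for a handful of features.

```python
from itertools import permutations


def churn_score(x):
    """Hypothetical toy churn score; NOT the paper's XGBoost model."""
    score = 0.6
    score -= 0.05 * x["domestic_orders"]  # more purchases lower the score
    score -= 0.01 * x["months_of_use"]    # longer use lowers the score
    # Made-up interaction: very old users who never bought domestic services.
    if x["age"] >= 80 and x["domestic_orders"] == 0:
        score += 0.2
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline`.

    Features not yet "present" take their baseline value. Each feature's
    value is its marginal contribution averaged over all feature orderings.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orderings) for name in features}


# Illustrative values only; not taken from the study's data.
baseline = {"domestic_orders": 4, "months_of_use": 12, "age": 70}
user = {"domestic_orders": 0, "months_of_use": 3, "age": 82}

phi = shapley_values(churn_score, user, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the user's score and the baseline score, which is the property that lets per-feature contributions be read as a decomposition of a single prediction. Practical SHAP tooling typically averages over a background dataset and relies on model-specific algorithms for tree ensembles rather than enumerating orderings as done here.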
Liu Tianchang, Wang Lei, Zhu Qinghua. Predicting User Churn of Smart Home-based Care Services Based on SHAP Interpretation. Data Analysis and Knowledge Discovery, 2024, 8(1): 40-54.