Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 80-92    DOI: 10.11925/infotech.2096-3467.2021.1062
Predicting Churners of Online Health Communities Based on the User Persona
Wang Ruojia1, Yan Chengxi2, Guo Fengying1, Wang Jimin3
1School of Management, Beijing University of Chinese Medicine, Beijing 100029, China
2School of Information Resource Management, Renmin University of China, Beijing 100872, China
3Department of Information Management, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper predicts user behavior in online health communities based on user persona technology, aiming to identify and retain potential churners. [Methods] We constructed a multi-dimensional label system for user personas using statistical analysis, social network analysis, natural language processing, and LDA topic clustering. We then used decision tree and ensemble learning models to predict potential churners. [Results] We evaluated the model on data from the Huaxia Traditional Chinese Medicine Forum, where its F1 score reached 88.77%. [Limitations] The algorithm still needs to be evaluated on other online health communities. [Conclusions] User persona technology can help predict potential user churn.

Keywords: Online Health Communities; User Persona; Churner Prediction; Machine Learning
Received: 21 September 2021      Published: 01 March 2022
ZTFLH:  G350  
Fund: Beijing University of Chinese Medicine Young Scientist Fund (2021-JYB-XJSJJ-038)
Corresponding Author: Wang Jimin, ORCID: 0000-0002-3573-7788, E-mail: wjm@pku.edu.cn

Cite this article:

Wang Ruojia, Yan Chengxi, Guo Fengying, Wang Jimin. Predicting Churners of Online Health Communities Based on the User Persona. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 80-92.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1062     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/80

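Component | Researchers | Specific indicators
Posting activity | Huh et al. [21], Zhang Haitao et al. [22] | Posting frequency, number of posts, number of replies, number of comments, sharing of original posts, sharing of reposted posts, questions asked in posts and comments, etc.
Text content | Zhai Shanshan et al. [23], Wu Jiang et al. [24], Sheng Shu et al. [14] | Illness-condition topics, disease topics, medical domain knowledge, types of social support, etc.
Professional authority | Wang Lingxiao et al. [25], Wang Kai et al. [26], Dong Wei et al. [27] | Number of likes received, number of followers, number of times answers were bookmarked by others, number of times answers were accepted, number of featured posts, certification level, experience points, title rank, etc.
Social interaction | Gu Bin et al. [28], Chen Ye et al. [29] | In-degree centrality, out-degree centrality, betweenness centrality, etc.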
Elements of Online Community User Persona
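The social-interaction indicators listed in the table can be illustrated on a small reply network. The following is a minimal sketch, not the authors' implementation: it assumes a directed edge (a, b) means "user a replied to user b", and the user IDs and the function name `degree_centrality` are hypothetical. In-degree and out-degree centrality are normalized by n - 1, the largest number of distinct neighbors a node can have.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Return normalized in-degree and out-degree centrality per node.

    `edges` is an iterable of (source, target) pairs. A pair (a, b) is
    assumed to mean "user a replied to user b" (illustrative convention).
    """
    nodes = set()
    in_neighbors = defaultdict(set)
    out_neighbors = defaultdict(set)
    for source, target in edges:
        nodes.update((source, target))
        if source == target:
            continue  # ignore self-replies
        out_neighbors[source].add(target)
        in_neighbors[target].add(source)
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for a single node
    in_c = {n: len(in_neighbors[n]) / denom for n in nodes}
    out_c = {n: len(out_neighbors[n]) / denom for n in nodes}
    return in_c, out_c

# Hypothetical reply edges, not data from the forum.
replies = [("u1", "u2"), ("u3", "u2"), ("u2", "u1"), ("u4", "u2")]
in_c, out_c = degree_centrality(replies)
```

In this toy network, u2 receives replies from every other user, so its in-degree centrality is the maximum possible value. Betweenness centrality needs shortest-path counting and is better left to a graph library.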
Technology Roadmap
Label Dimension and Calculation Method of the User Persona
Steps and Methods of User Churn Prediction
Churner Judgment Based on the Method of Sliding Window
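Only the caption of the sliding-window figure survives on this page, so the exact rule the authors used is not available here. The sketch below shows one common way to label churners with a sliding window, under a stated assumption: a user who is active in an observation window is a churner if they show no activity in the prediction window that immediately follows it. The function name `label_churners`, the window lengths, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta

def label_churners(activity, start, end, observe_days=30, predict_days=30, step_days=30):
    """Yield (window_start, user, is_churner) for each sliding window.

    Assumed rule (not taken from the paper): a user who is active in the
    observation window is a churner if they have no activity in the
    prediction window that immediately follows it.
    """
    observe = timedelta(days=observe_days)
    predict = timedelta(days=predict_days)
    step = timedelta(days=step_days)
    window_start = start
    while window_start + observe + predict <= end:
        split = window_start + observe
        window_end = split + predict
        for user, days in activity.items():
            active_before = any(window_start <= d < split for d in days)
            if not active_before:
                continue  # user is not part of this window's sample
            active_after = any(split <= d < window_end for d in days)
            yield window_start, user, not active_after
        window_start += step  # slide the window forward

# Hypothetical activity dates per user.
activity = {
    "u1": [date(2021, 1, 5), date(2021, 2, 10)],
    "u2": [date(2021, 1, 20)],
    "u3": [date(2021, 2, 15)],
}
labels = list(label_churners(activity, date(2021, 1, 1), date(2021, 3, 2)))
```

Here u1 posts in both windows and is labeled as retained, u2 posts only in the observation window and is labeled as a churner, and u3 is excluded because they were not active during observation.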
User ID | Username | Registration date | Posts | Likes received | Reputation
2 | 甘草 | 2008/5/5 | 701 | 3 | 0
32471 | 二戒老中医 | 2010/4/22 | 1,223 | 231 | 63
153282 | 风的季节 | 2020/12/10 | 4 | 0 | 1
Example Data of User Basic Information
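Post ID | Post title | Author | Posted | Replies | Views | Latest reply
1914 | Huaxia TCM Forum patient consultation questionnaire (must-read for those seeking treatment) | 甘草 | 2008/7/19 | 5 | 8K | 2020/12/10
434882 | Internal medicine consultation: gallbladder deficiency brings little sleep, gallbladder excess brings much sleep. Which formula can replenish the gallbladder? | 陶良义 | 2021/1/23 | 3 | 44 | 2021/1/23
415094 | Pediatric consultation: asking for help again: childhood enuresis!!! | 我是一个早产儿 | 2016/5/2 | 139 | 19K | 2021/1/22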
Example Data of Forum Posting Information
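Post ID | User ID | Username | Reply time | Reply content
434904 | 153838 | bleachpiece | 2021/01/26, 07:59 | Last year a doctor's treatment made me worse. I took many formulas modified from Liuwei Wan, plus several months of Wuzi Yanzong Wan. … But I show every sign of yang deficiency, such as cold limbs and copious white phlegm, so cold-natured medicines alone cannot be used. … Before the treatment went wrong, I had no problem at all taking warm-natured medicines.
434904 | 145132 | 小学生11 | 2021/01/26, 09:51 | This means the pathogenic fire has not been fully cleared, so of course you cannot take warm-natured medicines or foods. Apply the theory of the cold-and-cool school and expel the pathogenic fire, and things will return to normal. The medication should therefore clear the pathogenic fire and tonify the yang deficiency.
434904 | 11474 | 金钱草 | 2021/01/26, 10:08 | bleachpiece said: Last year and the year before, a doctor's treatment made me worse, click to expand... A rather unusual case. Liuwei is only a nourishing medicine and Wuzi is not a strong yang-warming one either, so how could it lead to itching whenever warm medicines are taken afterwards, all the more so when the body shows every sign of yang deficiency? Could it be caused by poor transportation and transformation? Following this thread to learn how the experts analyze it.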
Example Data of Forum Reply Information
User Category Visualization
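Rank | Keyword | Frequency
1 | acupuncture and moxibustion (针灸) | 12
2 | doctor (医生) | 8
3 | take effect (见效) | 7
4 | acupoint (穴位) | 6
5 | meridians (经络) | 4
6 | ailment (病痛) | 4
7 | Guanyuan acupoint (关元) | 3
8 | anti-inflammatory (消炎) | 3
9 | acupuncturist (针灸师) | 3
10 | needling (针刺) | 3
11 | fumigation (熏蒸) | 3
12 | oral administration (内服) | 3
13 | Chize acupoint (尺泽) | 2
14 | Shousanli acupoint (手三里) | 2
15 | light wheat grain (浮小麦) | 2
16 | fresh ginger (生姜) | 2
17 | syndrome differentiation (辩证) | 2
18 | electroacupuncture (电针) | 2
19 | acupoint injection (水针) | 2
20 | moxibustion (艾灸) | 2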
Top 20 Keywords of User ID 144512
User Persona of User ID 126769
Optimal Number of LDA Topics
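Type | Group | Algorithm | Precision | Recall | F1
Decision tree | | CART | 0.8093 | 0.8076 | 0.8073
Decision tree | | C4.5 | 0.8091 | 0.8076 | 0.8073
Decision tree | | Average | 0.8092 | 0.8076 | 0.8073
Ensemble learning | Bagging | RandomForest | 0.8074 | 0.8073 | 0.8072
Ensemble learning | Bagging | ExtraTrees | 0.8825 | 0.8825 | 0.8825
Ensemble learning | Bagging | Average | 0.8450 | 0.8449 | 0.8449
Ensemble learning | Boosting | AdaBoost | 0.8402 | 0.8397 | 0.8396
Ensemble learning | Boosting | Gradient Boosting | 0.8903 | 0.8879 | 0.8877
Ensemble learning | Boosting | Average | 0.8653 | 0.8638 | 0.8637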
Comparison of Model Prediction Results
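The table reports precision, recall, and F1 for each algorithm, but this page does not state how the scores were averaged over the churner and non-churner classes. As a minimal sketch, assuming macro averaging (one common choice, not confirmed by the source), the three metrics can be computed per class and then averaged. The labels below are made-up toy data, and the function names are hypothetical.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 when `label` is treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_scores(y_true, y_pred):
    """Unweighted mean of the per-class precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    n = len(labels)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Toy labels (1 = churner, 0 = retained), not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

An averaged F1 is not in general the harmonic mean of the averaged precision and recall. That is consistent with the Gradient Boosting row, where the reported F1 (0.8877) is lower than both the precision and the recall, which a single harmonic mean of those two values could not be.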
Prediction Results of Different Classification Algorithms in Specific Categories
[1] 中华人民共和国中央人民政府. 国务院关于积极推进“互联网+”行动的指导意见[EB/OL]. [2021-09-15]. http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm.
[1] (The Central People’s Government of the People’s Republic of China. The State Council’s Guiding Opinions on Actively Promoting the “Internet Plus” Action[EB/OL]. [2021-09-15]. http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm. )
[2] 中华人民共和国中央人民政府. 国务院办公厅关于促进“互联网+医疗健康”发展的意见 [EB/OL]. [2021-09-15]. http://www.gov.cn/zhengce/content/2018-04/28/content_5286645.htm.
[2] (The Central People’s Government of the People’s Republic of China. Opinions of the General Office of the State Council on Promoting the Development of “Internet Plus Medical Health”[EB/OL]. [2021-09-15]. http://www.gov.cn/zhengce/content/2018-04/28/content_5286645.htm. )
[3] 智研咨询. 2019-2025年中国互联网医疗行业市场前景分析及发展趋势预测报告[EB/OL]. [2021-09-15]. https://www.chyxx.com/research/201904/729290.html.
[3] (Zhiyan Consulting. Market Prospect Analysis and Development Trend Forecast Report of China’s Internet Medical Industry from 2019 to 2025[EB/OL]. [2021-09-15]. https://www.chyxx.com/research/201904/729290.html. )
[4] 赵栋祥. 国内在线健康社区研究现状综述[J]. 图书情报工作, 2018, 62(9):134-142.
[4] ( Zhao Dongxiang. Review on Domestic Research Status of Online Health Community[J]. Library and Information Service, 2018, 62(9):134-142.)
[5] Johansson V, Islind A S, Lindroth T, et al. Online Communities as a Driver for Patient Empowerment: Systematic Review[J]. Journal of Medical Internet Research, 2021, 23(2):e19910.
doi: 10.2196/19910
[6] Bouma G, Admiraal J M, de Vries E G E, et al. Internet-Based Support Programs to Alleviate Psychosocial and Physical Symptoms in Cancer Patients: A Literature Analysis[J]. Critical Reviews in Oncology/Hematology, 2015, 95(1):26-37.
doi: 10.1016/j.critrevonc.2015.01.011
[7] Zhang S D, Bantum E, Owen J, et al. Does Sustained Participation in an Online Health Community Affect Sentiment?[J]. AMIA Annual Symposium Proceedings, 2014: 1970-1979.
[8] Young C. Community Management That Works: How to Build and Sustain a Thriving Online Health Community[J]. Journal of Medical Internet Research, 2013, 15(6):e119.
doi: 10.2196/jmir.2501
[9] Skousen T, Safadi H, Young C, et al. Successful Moderation in Online Patient Communities: Inductive Case Study[J]. Journal of Medical Internet Research, 2020, 22(3):e15983.
doi: 10.2196/15983
[10] Zhou J J, Liu F, Zhou T T. Exploring the Factors Influencing Consumers to Voluntarily Reward Free Health Service Contributors in Online Health Communities: Empirical Study[J]. Journal of Medical Internet Research, 2020, 22(4):e16526.
doi: 10.2196/16526
[11] Alan Cooper. 交互设计之路:让高科技产品回归人性[M]. Chris Ding译. 第2版. 北京: 电子工业出版社, 2006.
[11] (Alan Cooper. The Inmates Are Running the Asylum: Why High-Tech Products Drive Us Crazy and How to Restore the Sanity[M]. Translated by Ding C. The 2nd Edition. Beijing: Publishing House of Electronics Industry, 2006.)
[12] 何振宇, 朱庆华, 白玫. 养老服务视角下城市老年人用户画像构建[J]. 情报杂志, 2021, 40(9):154-160.
[12] ( He Zhenyu, Zhu Qinghua, Bai Mei. The Construction of Urban Elderly User Portrait from the Perspective of Pension Service[J]. Journal of Intelligence, 2021, 40(9):154-160.)
[13] 刘丹, 张兴刚, 任淑敏. 基于用户画像的高校图书馆阅读疗法模式[J]. 中华医学图书情报杂志, 2018, 27(7):68-71.
[13] ( Liu Dan, Zhang Xinggang, Ren Shumin. User Profile-Based Bibliotherapy Model in Academic Library[J]. Chinese Journal of Medical Library and Information Science, 2018, 27(7):68-71.)
[14] 盛姝, 黄奇, 郑姝雅, 等. 在线健康社区中用户画像及主题特征分布下信息需求研究——以医享网结直肠癌圈数据为例[J]. 情报学报, 2021, 40(3):308-320.
[14] ( Sheng Shu, Huang Qi, Zheng Shuya, et al. Study of User Information Requirements in an Online Health Community Based on the Distribution of User Profile and Theme Features: Taking Colorectal Cancer Data from YiXiang as an Example[J]. Journal of the China Society for Scientific and Technical Information, 2021, 40(3):308-320.)
[15] 韩梅花, 赵景秀. 基于“用户画像”的阅读疗法模式研究——以抑郁症为例[J]. 大学图书馆学报, 2017, 35(6):105-110.
[15] ( Han Meihua, Zhao Jingxiu. Research on Bibliotherapy Model Based on User Profile—Take Depression as an Example[J]. Journal of Academic Libraries, 2017, 35(6):105-110.)
[16] Litchman M L, Walker H R, Fitzgerald C, et al. Patient-Driven Diabetes Technologies: Sentiment and Personas of the #WeAreNotWaiting and #OpenAPS Movements[J]. Journal of Diabetes Science and Technology, 2020, 14(6):990-999.
doi: 10.1177/1932296820932928
[17] 唐晓波, 高和璇. 基于特征分析和标签提取的医生画像构建研究[J]. 情报科学, 2020, 38(5):3-10.
[17] ( Tang Xiaobo, Gao Hexuan. Study of the Doctor Portrait Based on Feature Analysis and Label Extraction[J]. Information Science, 2020, 38(5):3-10.)
[18] 刘静, 安璐. 突发公共卫生事件中社交媒体用户应急信息搜寻行为画像研究[J]. 情报理论与实践, 2020, 43(11):8-15.
[18] ( Liu Jing, An Lu. The Profiling of Users’ Emergency Information Seeking Behavior on Social Media in the Context of Public Health Emergencies[J]. Information Studies: Theory & Application, 2020, 43(11):8-15.)
[19] 杜孟凯, 王蕾, 张维. 突发公共卫生事件中基于网络用户画像的医院舆情治理实践策略研究[J]. 中国医药科学, 2021, 11(6):218-221.
[19] ( Du Mengkai, Wang Lei, Zhang Wei. Research on the Hospital Public Opinion Management Practice Strategy Based on Portrait of Network Users in Public Health Emergencies[J]. China Medicine and Pharmacy, 2021, 11(6):218-221.)
[20] 宋美琦, 陈烨, 张瑞. 用户画像研究述评[J]. 情报科学, 2019, 37(4):171-177.
[20] ( Song Meiqi, Chen Ye, Zhang Rui. A Review of User Profile Research[J]. Information Science, 2019, 37(4):171-177.)
[21] Huh J, Kwon B C, Kim S H, et al. Personas in Online Health Communities[J]. Journal of Biomedical Informatics, 2016, 63:212-225.
doi: 10.1016/j.jbi.2016.08.019
[22] 张海涛, 崔阳, 王丹, 等. 基于概念格的在线健康社区用户画像研究[J]. 情报学报, 2018, 37(9):912-922.
[22] ( Zhang Haitao, Cui Yang, Wang Dan, et al. Study of Online Healthy Community User Profile Based on Concept Lattice[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(9):912-922.)
[23] 翟姗姗, 胡畔, 潘英增, 等. 融合知识图谱与用户病情画像的在线医疗社区场景化信息推荐研究[J]. 情报科学, 2021, 39(5):97-105.
[23] ( Zhai Shanshan, Hu Pan, Pan Yingzeng, et al. Scenario-Based Information Recommendation of Online Medical Community Based on Knowledge Graph and Disease Portrait[J]. Information Science, 2021, 39(5):97-105.)
[24] 吴江, 侯绍新, 靳萌萌, 等. 基于LDA模型特征选择的在线医疗社区文本分类及用户聚类研究[J]. 情报学报, 2017, 36(11):1183-1191.
[24] ( Wu Jiang, Hou Shaoxin, Jin Mengmeng, et al. LDA Feature Selection Based Text Classification and User Clustering in Chinese Online Health Community[J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(11):1183-1191.)
[25] 王凌霄, 沈卓, 李艳. 社会化问答社区用户画像构建[J]. 情报理论与实践, 2018, 41(1):129-134.
[25] ( Wang Lingxiao, Shen Zhuo, Li Yan. User Profiling of Social Q&A Community[J]. Information Studies: Theory & Application, 2018, 41(1):129-134.)
[26] 王凯, 潘玮, 杨枢, 等. 基于模糊概念格的丁香园社区用户多粒度画像研究[J]. 情报理论与实践, 2020, 43(8):103-111.
[26] ( Wang Kai, Pan Wei, Yang Shu, et al. Multi-Grained Portrait of Community Users Based on Fuzzy Concept Lattice: Taking Ding Xiang Yuan as Example[J]. Information Studies: Theory & Application, 2020, 43(8):103-111.)
[27] 董伟, 李建红, 陶金虎. 在线健康社区活跃用户识别及其交互类型分析[J]. 文献与数据学报, 2020, 2(1):89-101.
[27] ( Dong Wei, Li Jianhong, Tao Jinhu. Analysis of Active User Identification and Interactive Behavior in Online Health Community[J]. Journal of Library and Data, 2020, 2(1):89-101.)
[28] 谷斌, 徐菁, 黄家良. 专业虚拟社区用户分类模型研究[J]. 情报杂志, 2014, 33(5):203-207.
[28] ( Gu Bin, Xu Jing, Huang Jialiang. On Classifying Model for Professional Virtual Community Users[J]. Journal of Intelligence, 2014, 33(5):203-207.)
[29] 陈烨, 王乐, 陈天雨, 等. 基于社会网络分析的社会化问答平台用户画像研究[J]. 情报学报, 2021, 40(4):414-423.
[29] ( Chen Ye, Wang Le, Chen Tianyu, et al. Research on User Profile of Social Q&A Platform Participants Based on Social Network Analysis[J]. Journal of the China Society for Scientific and Technical Information, 2021, 40(4):414-423.)
[30] Kostić S M, Simić M I, Kostić M V. Social Network Analysis and Churn Prediction in Telecommunications Using Graph Theory[J]. Entropy, 2020, 22(7):753.
doi: 10.3390/e22070753
[31] Spanoudes P, Nguyen T. Deep Learning in Customer Churn Prediction: Unsupervised Feature Learning on Abstract Company Independent Feature Vectors[OL]. arXiv Preprint, arXiv:1703.03869.
[32] Kilimci Z H, Yörük H, Akyokus S. Sentiment Analysis Based Churn Prediction in Mobile Games Using Word Embedding Models and Deep Learning Algorithms[C]// Proceedings of 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA). IEEE, 2020: 1-7.
[33] Mohammadzadeh M, Hoseini Z Z, Derafshi H. A Data Mining Approach for Modeling Churn Behavior via RFM Model in Specialized Clinics Case Study: A Public Sector Hospital in Tehran[J]. Procedia Computer Science, 2017, 120:23-30.
doi: 10.1016/j.procs.2017.11.206 pmid: 32288897
[34] Tarokh M J, EsmaeiliGookeh M. Modeling Patient’s Value Using a Stochastic Approach: An Empirical Study in the Medical Industry[J]. Computer Methods and Programs in Biomedicine, 2019, 176:51-59.
doi: 10.1016/j.cmpb.2019.04.021
[35] Kwon H, Kim H H, An J, et al. Lifelog Data-Based Prediction Model of Digital Health Care App Customer Churn: Retrospective Observational Study[J]. Journal of Medical Internet Research, 2021, 23(1):e22184.
doi: 10.2196/22184
[36] Rowe M. Predicting Online Community Churners Using Gaussian Sequences[C]// Proceedings of International Conference on Social Informatics, Barcelona, Spain. New York, USA: Springer International Publishing, 2014.
[37] Rowe M. Mining User Development Signals for Online Community Churner Detection[J]. ACM Transactions on Knowledge Discovery from Data, 2016, 10(3):21.
[38] Wang X, Zhao K, Street N. Analyzing and Predicting User Participations in Online Health Communities: A Social Support Perspective[J]. Journal of Medical Internet Research, 2017, 19(4):e130.
doi: 10.2196/jmir.6834
[39] 艾金金. 电商平台客户流失预警分析及应用研究[D]. 南京: 南京大学, 2019.
[39] ( Ai Jinjin. Churn Prediction Analysis and Practical Research of E-Commerce Platform[D]. Nanjing: Nanjing University, 2019.)
[40] 李燕仪. 基于数据挖掘方法的汽车客户画像分析及流失客户预测[D]. 广州: 华南理工大学, 2017.
[40] ( Li Yanyi. A Study of Automobile Customer Portrait and Churn Prediction Based on Data Mining[D]. Guangzhou: South China University of Technology, 2017.)
[41] Jiawei Han, Micheline Kamber, Jian Pei. 数据挖掘概念与技术[M]. 范明, 孟小峰译. 第3版. 北京: 机械工业出版社, 2012.
[41] ( Han J W, Kamber M, Pei J. Data Mining Concepts and Techniques[M]. Translated by Fan Ming, Meng Xiaofeng. The 3rd Edition. Beijing: China Machine Press, 2012.)
[42] Mihalcea R, Tarau P. TextRank: Bringing Order into Text[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain. New York, USA: Association for Computational Linguistics, 2004: 404-411.
[43] Wang Y C, Kraut R, Levine J M. To Stay or Leave?: The Relationship of Emotional and Informational Support to Commitment in Online Health Support Groups[C]// Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 2012: 833-842.
[44] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3:993-1022.
[45] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python[J]. Journal of Machine Learning Research, 2011, 12:2825-2830.
[46] Röder M, Both A, Hinneburg A. Exploring the Space of Topic Coherence Measures[C]// Proceedings of the 8th ACM International Conference on Web Search and Data Mining. 2015: 399-408.
[47] Maclin R, Opitz D. An Empirical Evaluation of Bagging and Boosting[C]// Proceedings of the 14th National Conference on Artificial Intelligence and 9th Conference on Innovative Applications of Artificial Intelligence, Rhode Island. New York, USA: ACM, 1997: 546-551.
[48] Oza N C, Russell S J. Online Bagging and Boosting[C]// Proceedings of International Conference on Systems, Man and Cybernetics. Waikoloa, USA: IEEE, 2005.