Detecting Depression Factors with Gradient Boosting Tree and Explainable Machine Learning Model SHAP

DOI: 10.11925/infotech.2096-3467.2023.0052

Data Analysis and Knowledge Discovery, 2024, Vol. 8, Issue 3: 41-52



Nie Hui, Wu Xiaoyan

School of Information Management, Sun Yat-Sen University, Guangzhou 510275, China


Abstract

[Objective] This study constructs a predictive model for depression severity and examines its interpretability, aiming to improve the reliability and practicality of automated depression detection based on Internet user-generated content. [Methods] First, we built a corpus of depression-related medical consultations collected from the Good Doctor Online platform. Second, we extracted patients' psychological features with C-LIWC, a Chinese psychology lexicon. Third, we predicted the patients' conditions with a gradient boosting tree algorithm (LightGBM). Finally, we applied the explainable machine learning method SHAP to interpret the model. Using SHAP's visualizations, we analyzed the complex relationships between the occurrence of depression and patients' age, gender, cognition, emotions, perceptions, social/family contexts, and personal gains or losses. [Results] The psychological state expressed by patients with depression reflected their condition. Psychological features extracted from consultation records effectively detected severe depression, with an accuracy of 86%. SHAP analysis revealed multiple effects of patients' psychological features on depression. [Limitations] Because of the limited corpus, depression severity was predicted from single consultation records only. In addition, the model features were derived from psychological dictionaries; more factors related to depression risk could be included in future work. [Conclusions] The factors influencing the occurrence and development of depression are complex, and individual differences cause the same characteristic to affect disease prediction differently. Building an automated diagnostic model for depression should focus not only on the model's accuracy but also on understanding its predictions.
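The feature-extraction step in the Methods can be pictured with a small sketch. C-LIWC's actual categories and word lists are not reproduced here, so `TOY_LEXICON`, its category names, and the helper `liwc_style_features` are hypothetical stand-ins. The input is assumed to be already word-segmented (Chinese consultation text would first need a segmenter). Following the general LIWC convention, each feature is the percentage of a record's tokens that fall into a category.

```python
from collections import Counter

# Hypothetical mini-lexicon standing in for C-LIWC: category -> set of words.
# The real C-LIWC categories and (Chinese) word lists are NOT reproduced here.
TOY_LEXICON = {
    "negemo": {"sad", "hopeless", "tired"},
    "family": {"mother", "father", "husband"},
    "cogproc": {"think", "because", "maybe"},
}


def liwc_style_features(tokens, lexicon=TOY_LEXICON):
    """Return the percentage of tokens that fall into each lexicon category.

    `tokens` is assumed to be an already segmented list of words. A word may
    count toward several categories, as in LIWC-style dictionaries.
    """
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in lexicon.items():
            if tok in words:
                counts[category] += 1
    # Empty records get 0.0 for every category instead of dividing by zero.
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in lexicon
    }


record = ["i", "feel", "sad", "and", "tired", "because", "my", "mother", "is", "ill"]
print(liwc_style_features(record))
# {'negemo': 20.0, 'family': 10.0, 'cogproc': 10.0}
```

In a pipeline like the one described, such per-category percentages, together with variables such as age and gender, would form one row of the feature matrix passed to the classifier.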

Key words: Depression Prediction; Online User-Generated Content; Interpretable Machine Learning; Light Gradient Boosting Machine

Received: 27 January 2023; Published: 28 April 2023

ZTFLH: TP391, G350

Fund: Social Science Fund of Guangzhou (10000-42220402)

Corresponding author: Nie Hui, ORCID: 0000-0001-8567-3084, E-mail: issnh@mail.sysu.edu.cn


Cite this article:

Nie Hui, Wu Xiaoyan. Detecting Depression Factors with Gradient Boosting Tree and Explainable Machine Learning Model SHAP. Data Analysis and Knowledge Discovery, 2024, 8(3): 41-52.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0052 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I3/41

Figures and Tables

Six Sets of Psycholinguistic Words Associated with Depression

Features of the Depression Severity Prediction Model

Research Framework

Descriptive Statistics for Psycholinguistic Variables (N = 2,950)

Tuning the LightGBM Algorithm's Parameters

Performance of the LightGBM Model

The Confusion Matrix for Predicting Depression Level

Explaining the Depression Prediction Model Using SHAP Values

Interaction Effect Analysis Based on SHAP Values
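The SHAP values and interaction effects in the last two captions are built on Shapley values from cooperative game theory. For tree ensembles such as LightGBM, the SHAP library offers TreeSHAP, which computes these quantities efficiently from the tree structure. The sketch below takes a different, brute-force route: it enumerates every feature coalition. That is only feasible for a handful of features, but it makes the definitions explicit. Two simplifying assumptions apply. First, an absent feature is replaced by a fixed baseline value, which is one of several possible definitions of "removing" a feature. Second, `toy_model` is hypothetical and is not the paper's classifier.

```python
from itertools import combinations
from math import factorial


def coalition_value(f, x, baseline, coalition):
    """Evaluate f with every feature outside `coalition` set to its baseline."""
    z = [x[k] if k in coalition else baseline[k] for k in range(len(x))]
    return f(z)


def shapley_values(f, x, baseline):
    """Exact Shapley value of each feature, by enumerating all coalitions."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [k for k in range(m) if k != i]
        for size in range(m):  # coalition sizes 0 .. m-1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = set(s)
                gain = (coalition_value(f, x, baseline, s | {i})
                        - coalition_value(f, x, baseline, s))
                phi[i] += weight * gain
    return phi


def shapley_interaction(f, x, baseline, i, j):
    """Shapley interaction index for features i != j. The pairwise effect is
    split equally between (i, j) and (j, i), as in SHAP interaction values."""
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for size in range(m - 1):  # coalition sizes 0 .. m-2
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for s in combinations(others, size):
            s = set(s)
            delta = (coalition_value(f, x, baseline, s | {i, j})
                     - coalition_value(f, x, baseline, s | {i})
                     - coalition_value(f, x, baseline, s | {j})
                     + coalition_value(f, x, baseline, s))
            total += weight * delta
    return total


def toy_model(z):
    # Hypothetical model: additive effects of features 0 and 2,
    # plus an interaction between features 0 and 1.
    return 2 * z[0] + 3 * z[0] * z[1] + z[2]


x, baseline = [1, 1, 1], [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # [3.5, 1.5, 1.0]
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # 6.0 6
print(round(shapley_interaction(toy_model, x, baseline, 0, 1), 6))  # 1.5
```

The second line of output illustrates the local-accuracy (additivity) property: the attributions sum to f(x) minus f(baseline). The interaction term of 3 between features 0 and 1 is split into 1.5 for each ordering of the pair. This split is how SHAP interaction values assign a pairwise effect. The remaining "main effect" of feature 0 is its Shapley value minus its interaction values (3.5 - 1.5 - 0 = 2).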

