Feature Selection and Efficient Disease Early Warning Based on Optimized Ensemble Learning Model: Case Study of Geriatric Depression and Anxiety
Yan Ying1, Huang Qi1,2, Li Na1
1School of Information Management, Nanjing University, Nanjing 210023, China
2Nanjing Research Base of National Information Management, Nanjing University, Nanjing 210093, China
Abstract [Objective] This paper selects key disease risk variables so that disease prediction models can balance computational efficiency against prediction accuracy, with the aim of helping public health departments achieve efficient disease early warning. [Methods] We used two ensemble learning models, Random Forest and XGBoost, to learn from high-dimensional disease risk variable data and predict disease. Each model autonomously selects the subset of variables that contributes to its predictions. To ensure that the selected subset retains high prediction accuracy, we analyzed the ensemble strategies of Random Forest and XGBoost. Through hyperparameter tuning and cross-validation, we iteratively reduced the out-of-bag error rate of the Random Forest model and brought the loss curves of the XGBoost model to convergence on different training subsets. Finally, we proposed a tailored optimization scheme for each model to improve its disease prediction performance. [Results] We evaluated the optimized models on a dataset of geriatric depression and anxiety. Both models showed strong and comparable prediction performance, with accuracies of 88.6% and 89.7% and AUC values of 0.936 and 0.940, respectively. After optimized feature selection, however, the XGBoost model had a simpler and more efficient structure: it selected only 17 key variables out of 54 geriatric depression and anxiety risk variables while still achieving a prediction accuracy of 85.8% and an AUC of 0.917. [Limitations] We did not use the latest geriatric cohort data in our experiments. More research is needed to test how well the models adapt to complex and heterogeneous data environments. [Conclusions] The optimized XGBoost model provides the more effective feature selection, which improves the efficiency of disease early warning and supports decision-making in public health management.
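The workflow summarized in the abstract (tune an ensemble model, let it rank the variables, then retrain on a reduced subset and compare accuracy and AUC) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the grid of tree counts is hypothetical, the ranking uses scikit-learn's impurity-based importances, and a Random Forest is used for the selection step even though the abstract reports the 17-variable subset for XGBoost. Only the counts 54 and 17 echo the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 54 candidate risk variables, binary outcome.
X, y = make_classification(n_samples=800, n_features=54, n_informative=12,
                           n_redundant=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def fit_forest(features, labels, n_trees):
    """Fit a Random Forest and return it with its out-of-bag error rate."""
    model = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                   bootstrap=True, random_state=0)
    model.fit(features, labels)
    return model, 1.0 - model.oob_score_


def evaluate(model, features, labels):
    """Return (accuracy, AUC) on held-out data."""
    proba = model.predict_proba(features)[:, 1]
    return accuracy_score(labels, model.predict(features)), roc_auc_score(labels, proba)


# Choose the tree count with the lowest out-of-bag error (hypothetical grid).
candidates = [50, 100, 200]
results = {n: fit_forest(X_train, y_train, n) for n in candidates}
best_n = min(results, key=lambda n: results[n][1])
full_model, full_oob_error = results[best_n]

# Keep the 17 variables with the highest impurity-based importance.
top_k = 17
importances = full_model.feature_importances_
ranked = sorted(range(X.shape[1]), key=lambda j: importances[j], reverse=True)
selected = sorted(ranked[:top_k])

# Refit on the reduced variable set and compare held-out performance.
small_model, small_oob_error = fit_forest(X_train[:, selected], y_train, best_n)
full_acc, full_auc = evaluate(full_model, X_test, y_test)
small_acc, small_auc = evaluate(small_model, X_test[:, selected], y_test)

print(f"all 54 variables: accuracy={full_acc:.3f}, AUC={full_auc:.3f}")
print(f"top {top_k} variables: accuracy={small_acc:.3f}, AUC={small_auc:.3f}")
```

Comparing the two printed lines shows the trade-off the abstract describes: a smaller variable set is cheaper to collect and score, at the cost of whatever accuracy and AUC are lost relative to the full model. The numbers produced on synthetic data say nothing about the paper's reported results.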
Received: 18 July 2022
Published: 07 September 2023
Corresponding Authors:
Huang Qi, ORCID: 0000-0003-2394-148X, E-mail: huangqi@nju.edu.cn