Wang Yan, Xu Meimei, Tong Yujia, Gou Huan, Shan Zhiyi, An Xinying
[Objective] To build and evaluate machine-learning models that predict, and provide early warning of, deaths from circulatory system diseases, as a reference for disease prevention.
[Methods] Death data for circulatory system diseases in a region of China from 2014 to 2018 were analyzed. Prediction models were built with a generalized additive model (GAM), random forest (RF) and XGBoost. A distributed lag nonlinear model (DLNM) was used to estimate cumulative lag effects. These effects were then used to construct an early warning model, which was evaluated.
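The abstract does not give the DLNM specification. The sketch below only illustrates the general idea of a cumulative lag effect. It assumes an exposure term that is linear on the log scale, with one hypothetical log relative risk per lag (lags 0-6). Under that assumption, the 7-day cumulative relative risk for an exposure increment Δ is exp(Δ·Σβ_l). The coefficients and the helper name `cumulative_rr` are invented for illustration and are not the study's estimates. A full DLNM would also allow a nonlinear exposure-response curve, which this sketch omits.

```python
import math


def cumulative_rr(lag_log_rr, increment=1.0):
    """Cumulative relative risk over all lags for a log-linear exposure term.

    lag_log_rr: hypothetical per-unit log relative risks at lag 0, 1, ..., L.
    increment: exposure change relative to the reference value.
    """
    # Summing the lag-specific log-RRs gives the cumulative log-RR;
    # exponentiating converts it back to the relative-risk scale.
    return math.exp(increment * sum(lag_log_rr))


# Hypothetical lag 0-6 coefficients (NOT from the study) for a pollutant,
# evaluated for a 10-unit increase in concentration.
betas = [0.004, 0.003, 0.002, 0.002, 0.001, 0.001, 0.0005]
print(round(cumulative_rr(betas, increment=10), 3))
```

A cumulative RR above 1 means that the exposure increment, sustained over the 7-day window, is associated with higher mortality risk.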
[Results] The cumulative lag effects showed that sustained low and high temperatures, long sunshine hours and high concentrations of environmental pollutants increased the risk of death from circulatory system diseases. The 7-day cumulative relative risks were 1.236, 1.130, 1.56, 1.062, 1.218, 1.153 and 1.796, respectively. The RF and XGBoost models performed well, with root mean square errors (RMSE) of 4.979 and 5.341, respectively. The selected feature variables were age, sex, temperature, sunshine hours, and the concentrations of SO2, NO2, CO, O3, PM10 and PM2.5. Early warning values were determined from the cumulative lag effects of the selected variables, and the resulting warnings performed well. For the XGBoost predictions, the sensitivity, specificity and area under the curve (AUC) were 0.948, 0.939 and 0.941, respectively.
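The abstract reports RMSE for the count predictions, and sensitivity, specificity and AUC for the warnings. It does not state how warning days were labeled. The sketch below makes three assumptions: a day is a "true" warning day when observed deaths reach a hypothetical warning value, a model raises a warning when its predicted count reaches the same value, and the predicted count serves as the score for AUC. All data, the threshold, and the function names are hypothetical. AUC is computed in its rank form: the probability that a warning day scores higher than a non-warning day, with ties counted as 0.5.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted daily death counts."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def sens_spec(actual_alarm, predicted_alarm):
    """Sensitivity and specificity of binary warnings (True = warning day)."""
    pairs = list(zip(actual_alarm, predicted_alarm))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    return tp / (tp + fn), tn / (tn + fp)


def auc(actual_alarm, scores):
    """Rank-based AUC: P(score of a warning day > score of a non-warning day)."""
    pos = [s for a, s in zip(actual_alarm, scores) if a]
    neg = [s for a, s in zip(actual_alarm, scores) if not a]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical daily counts and warning value (NOT from the study).
observed = [10, 14, 22, 9, 25, 12]
predicted = [11, 13, 20, 10, 23, 19]
warning_value = 18

actual = [o >= warning_value for o in observed]
alarm = [p >= warning_value for p in predicted]
print(rmse(observed, predicted), sens_spec(actual, alarm), auc(actual, predicted))
```

In practice a library implementation (for example, scikit-learn's metrics module) would usually replace these hand-written helpers. They are spelled out here so that each metric's definition is visible.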
[Limitations] Independent data on comorbidities were not available.
[Conclusions] The increase in deaths in this region is associated with older age, male sex, and increases in temperature, sunshine hours and pollutant concentrations. The prediction and early warning model built with XGBoost performs well and can serve as a reference for disease prevention and intervention by the relevant authorities.