Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China
Chai Guorong, Wang Bin, Sha Yongzhong
School of Management, Lanzhou University, Lanzhou 730000, China; Research Center for Hospital Management, Lanzhou University, Lanzhou 730000, China; Research Center for Emergency Management, Lanzhou University, Lanzhou 730000, China
[Objective] This study explores the practicability and effectiveness of forecasting public health risks with machine learning, taking influenza as an example. [Methods] First, we collected data on influenza and meteorological factors in Lanzhou, China, from 2009 to 2016. Data from 2009 to 2015 served as the training set and data from 2016 as the test set. Then, we proposed three machine learning methods for influenza prediction, based on SARIMA, Kalman Filter, and VAR, respectively. We also designed two multi-method combined forecasting strategies. Finally, we evaluated and compared the forecasting performance of all these methods and strategies. [Results] SARIMA, VAR, and Kalman Filter achieved the best predictive performance in the whole period (WP), outbreak period (OP), and stabilization period (SP), respectively, with RMSE of 11.68, 19.23, and 1.60 and R2 of 0.932, 0.923, and 0.956. The multi-method combined strategies improved forecasting performance in all three scenarios; of the two, Comb_2 performed better, with RMSE of 10.82, 14.68, and 1.38 and R2 of 0.942, 0.934, and 0.963, respectively. [Limitations] Limited by the available data, this study considered only meteorological factors as external factors. [Conclusions] Predicting public health risks (such as influenza) with machine learning is practicable and effective and has great potential. However, the lack of multi-source data is the major obstacle. Therefore, barriers at the technical, organizational, and institutional levels should be broken down to promote the open exchange and sharing of data.
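The evaluation described in the abstract (RMSE and R2 computed for individual and combined forecasts) can be illustrated with a minimal sketch. The abstract does not specify how Comb_1 and Comb_2 combine the models, so the equal-weight average below is only an assumed stand-in, not the authors' strategy; the function names and the toy numbers are hypothetical and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def combine_equal_weight(*forecasts):
    """Assumed combination rule: pointwise mean of the model forecasts.

    This is an illustrative placeholder, not Comb_1 or Comb_2 from the paper.
    """
    return [sum(values) / len(values) for values in zip(*forecasts)]


# Toy weekly case counts and two hypothetical model forecasts (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 37.0]
model_b = [9.0, 23.0, 29.0, 42.0]
combined = combine_equal_weight(model_a, model_b)

for name, forecast in [("model_a", model_a), ("model_b", model_b), ("combined", combined)]:
    print(f"{name}: RMSE={rmse(actual, forecast):.3f}, R2={r2(actual, forecast):.3f}")
```

In this toy case the two models' errors partly cancel, so the averaged forecast scores better than either model alone. That is the general motivation for combined forecasting, though whether it holds depends on the data.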
Chai Guorong, Wang Bin, Sha Yongzhong. Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 90-98.