Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (1): 90-98    DOI: 10.11925/infotech.2096-3467.2020.0754
Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China
Chai Guorong, Wang Bin, Sha Yongzhong
School of Management, Lanzhou University, Lanzhou 730000, China
Research Center for Hospital Management, Lanzhou University, Lanzhou 730000, China
Research Center for Emergency Management, Lanzhou University, Lanzhou 730000, China
[Objective] This study explores the practicability and effectiveness of forecasting public health risks with machine learning, taking influenza as an example. [Methods] First, we collected data on influenza and meteorological factors in Lanzhou, China, from 2009 to 2016. Data from 2009 to 2015 were used for training and data from 2016 for testing. Then, we built three machine learning methods for influenza forecasting, based on SARIMA, the Kalman Filter (KF), and VAR, respectively. We also designed two multi-method combined forecasting strategies (Comb_1 and Comb_2). Finally, we evaluated and compared the forecasting performance of all methods and strategies. [Results] Among the individual methods, SARIMA, VAR, and the Kalman Filter achieved the best predictive performance in the whole period (WP), outbreak period (OP), and stabilization period (SP), respectively, with RMSE of 11.68, 19.23, and 1.60 and R² of 0.932, 0.923, and 0.956. The combined strategies improved forecasting performance in all three scenarios; Comb_2 performed best, with RMSE of 10.82, 14.68, and 1.38 and R² of 0.942, 0.934, and 0.963, respectively. [Limitations] Limited by data availability, this study considered only meteorological factors as external factors. [Conclusions] Forecasting public health risks such as influenza with machine learning is practicable and effective and has great potential, but the lack of multi-source data is a major obstacle. Barriers at the technical, organizational, and institutional levels should therefore be broken down to promote the open exchange and sharing of data.

Key words: Machine Learning; Influenza Forecast; Public Health Risk; Risk Forecast
Received: 03 August 2020      Published: 05 February 2021
ZTFLH:  C916  
Fund: This work was supported by the National Natural Science Foundation of China (Grant No. 71472079), the Fundamental Research Funds for the Central Universities (Grant No. 18LZUJBWZD07), and the Key Projects of Philosophy and Social Sciences Research, Ministry of Education (Grant No. 16JZD023).
Chai Guorong, Wang Bin, Sha Yongzhong. Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China. Data Analysis and Knowledge Discovery, 2021, 5(1): 90-98.

Forecasting Process
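The abstract says that forecasts from SARIMA, the Kalman Filter, and VAR are merged by two combined strategies, but this page does not define how Comb_1 and Comb_2 weight the individual methods. Purely as an illustration, the sketch below uses inverse-error weighting, one common way to combine forecasts. The function names `inverse_error_weights` and `combine`, the weighting rule, and the toy forecasts are assumptions and are not taken from the paper; only the three whole-period RMSE values come from the results table.

```python
def inverse_error_weights(errors):
    """Turn per-method error scores (lower is better) into weights that sum to 1."""
    if any(e <= 0 for e in errors.values()):
        raise ValueError("error scores must be positive")
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted average of several forecast series of equal length."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * preds[t] for name, preds in forecasts.items())
        for t in range(horizon)
    ]


# Whole-period (WP) RMSE of the individual methods, as reported in the results table.
# In a real pipeline the weights would be estimated on training or validation data,
# not on the test year; the reported values are reused here only for illustration.
reported_rmse = {"SARIMA": 11.68, "KF": 12.61, "VAR": 11.85}
weights = inverse_error_weights(reported_rmse)

# Hypothetical two-week forecasts (NOT data from the paper).
toy_forecasts = {
    "SARIMA": [12.0, 30.0],
    "KF": [10.0, 26.0],
    "VAR": [14.0, 34.0],
}
combined = combine(toy_forecasts, weights)
print(weights)
print(combined)
```

With this rule the method with the lowest error receives the largest weight, and each combined value stays between the smallest and largest individual forecast for that week.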
Factor                       Mean ± SD        Min      Max      Percentiles
                                                                25%      50%      75%
Influenza cases (n)          9 ± 20           0        241      2        5        10
Temperature (℃)              11.23 ± 9.79     -8.81    29.00    2.18     12.93    19.92
Atmospheric pressure (hPa)   846.78 ± 4.48    836.70   858.00   843.51   846.87   849.70
Wind speed (m/s)             1.24 ± 0.23      0.69     2.00     1.07     1.23     1.41
Relative humidity (%)        50.46 ± 12.24    18.06    78.43    42.66    51.37    59.29
Rainfall (mm)                0.82 ± 1.67      0        17.89    0        0.14     1.00
Descriptive Statistics of Influenza and Meteorological Factors, 2009-2016
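A minimal sketch of how a summary row of this kind (mean ± SD, minimum, maximum, and quartiles) could be computed with Python's standard library. The helper name `describe` and the toy input are hypothetical. The paper does not say whether it uses the sample or the population standard deviation or which percentile interpolation it applies, so the choices below (sample SD and inclusive quartiles) are assumptions.

```python
import statistics


def describe(values):
    """Summarize one variable: mean, sample SD, min, max, and quartiles."""
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator), an assumption
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Hypothetical weekly counts, not data from the paper.
toy_counts = [1, 2, 3, 4, 5]
summary = describe(toy_counts)
print(summary)
```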
Time Series Chart of Influenza and Meteorological Factors, 2009-2016
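Factor                 Temperature   Atmospheric pressure   Wind speed   Relative humidity   Rainfall
Atmospheric pressure   -0.741**
Wind speed             0.425**       -0.533**
Relative humidity      -0.032        0.166**                -0.439**
Rainfall               0.499**       -0.359**               0.135**      0.493**
Influenza cases        -0.482**      0.400**                -0.266**     -0.074*             -0.272**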
Spearman Correlations Between Weekly Meteorological Variables at Lag of 1 Week and Influenza Cases, 2009-2016
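For readers unfamiliar with lagged rank correlation, the following sketch shows one way to compute a Spearman coefficient between a meteorological series at week t-1 and influenza cases at week t. It uses only the standard library; the function names and the toy series are hypothetical, and the alignment convention (exposure leads outcome by `lag` weeks) is an assumption about how the lag in the caption is defined.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_spearman(exposure, outcome, lag=1):
    """Spearman correlation between exposure at week t-lag and outcome at week t."""
    x = exposure[: len(exposure) - lag]
    y = outcome[lag:]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical series: cases rise one week after temperature falls.
toy_temperature = [5, 3, 1, 2, 4, 6]
toy_cases = [0, 10, 20, 40, 30, 15]
rho = lagged_spearman(toy_temperature, toy_cases, lag=1)
print(rho)
```

Spearman's coefficient is the Pearson correlation of the ranks, which is why the sketch ranks both series first. A negative value, as for temperature in the table, means that colder weeks tend to be followed by weeks with more cases.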
Observed Values and Predicted Values of Influenza Incidence, 367-418 weeks
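The caption places the test year at weeks 367-418, which implies weeks 1-366 for training. The sketch below shows that split together with the two evaluation metrics named in the abstract. The function names and toy values are hypothetical, and R² is taken here to be the coefficient of determination 1 - SS_res/SS_tot; the paper may define it differently (for example, as a squared correlation).

```python
import math

TRAIN_WEEKS = 366  # assumption inferred from the caption: weeks 1-366 train, 367-418 test


def split_train_test(series, train_weeks=TRAIN_WEEKS):
    """Chronological split: no shuffling, the test period follows the training period."""
    return series[:train_weeks], series[train_weeks:]


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Week numbers stand in for the real series to show where the split falls.
weeks = list(range(1, 419))
train, test = split_train_test(weeks)

# Hypothetical observed and predicted counts, not data from the paper.
toy_observed = [2, 4, 6, 8]
toy_predicted = [3, 4, 6, 7]
print(len(train), len(test), test[0], test[-1])
print(rmse(toy_observed, toy_predicted), r_squared(toy_observed, toy_predicted))
```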
Metric   Scenario   SARIMA   KF       VAR      Combined forecasting
                                               Comb_1   Comb_2
RMSE     WP         11.68    12.61    11.85    10.88    10.82*
         OP         20.28    21.96    19.23    14.74    14.68*
         SP         1.72     1.60     3.13     1.58     1.38*
R²       WP         0.932    0.921    0.930    0.941    0.942#
         OP         0.920    0.910    0.923    0.933    0.934#
         SP         0.918    0.956    0.832    0.953    0.963#
RMSE and R² for Prediction Results of Independent Methods and Combined Strategies in Each Scenario
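To make the size of the gain concrete, the short calculation below takes the RMSE values from the table and computes, for each scenario, the relative RMSE reduction of Comb_2 against the best individual method. The variable and function names are illustrative; the numbers are copied from the table.

```python
# RMSE values copied from the results table.
rmse_table = {
    "WP": {"SARIMA": 11.68, "KF": 12.61, "VAR": 11.85, "Comb_1": 10.88, "Comb_2": 10.82},
    "OP": {"SARIMA": 20.28, "KF": 21.96, "VAR": 19.23, "Comb_1": 14.74, "Comb_2": 14.68},
    "SP": {"SARIMA": 1.72, "KF": 1.60, "VAR": 3.13, "Comb_1": 1.58, "Comb_2": 1.38},
}
SINGLE_METHODS = ("SARIMA", "KF", "VAR")


def best_single(scores):
    """Name and RMSE of the best individual (non-combined) method."""
    name = min(SINGLE_METHODS, key=lambda m: scores[m])
    return name, scores[name]


def reduction_pct(baseline, improved):
    """Relative reduction in percent when going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


improvement = {}
for scenario, scores in rmse_table.items():
    name, baseline = best_single(scores)
    improvement[scenario] = (name, reduction_pct(baseline, scores["Comb_2"]))
    print(scenario, name, round(improvement[scenario][1], 2))
```

On these figures Comb_2 lowers RMSE by about 7% over SARIMA in the whole period, about 24% over VAR in the outbreak period, and about 14% over the Kalman Filter in the stabilization period, so the largest gain appears in the outbreak period.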
