Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (1): 90-98    DOI: 10.11925/infotech.2096-3467.2020.0754
Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China
Chai Guorong, Wang Bin, Sha Yongzhong
School of Management, Lanzhou University, Lanzhou 730000, China
Research Center for Hospital Management, Lanzhou University, Lanzhou 730000, China
Research Center for Emergency Management, Lanzhou University, Lanzhou 730000, China

[Objective] This study explores the practicability and effectiveness of forecasting public health risks with machine learning, taking influenza as an example.
[Methods] First, we collected weekly data on influenza cases and meteorological factors in Lanzhou, China, from 2009 to 2016; data from 2009 to 2015 were used for training and data from 2016 for testing. Then, we built three influenza prediction methods based on SARIMA, the Kalman Filter, and VAR, respectively. Next, we designed two multi-method combined forecasting strategies (Comb_1 and Comb_2). Finally, we evaluated and compared the forecasting performance of the individual methods and the combined strategies.
[Results] SARIMA, VAR, and the Kalman Filter achieved the best prediction performance in the whole period (WP), outbreak period (OP), and stabilization period (SP), respectively, with RMSEs of 11.68, 19.23, and 1.60 and R² values of 0.932, 0.923, and 0.956. The multi-method combined strategies improved forecasting performance in all three scenarios, and Comb_2 performed best, with RMSEs of 10.82, 14.68, and 1.38 and R² values of 0.942, 0.934, and 0.963, respectively.
[Limitations] Because of data limitations, this study considered only meteorological factors as external variables.
[Conclusions] Predicting public health risks such as influenza with machine learning is practicable and effective and has great potential, but the lack of multi-source data is the major obstacle. Therefore, barriers at the technical, organizational, and institutional levels should be removed to promote the open exchange and sharing of data.
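The abstract names a Kalman Filter forecaster but does not give its state-space form. As a minimal sketch, assuming a univariate local-level (random walk plus noise) model on weekly case counts, with noise variances `q` and `r` fixed by hand (hypothetical values, not taken from the paper, which may also include meteorological inputs), one-step-ahead forecasts can be produced like this:

```python
def kalman_local_level(observations, q=1.0, r=4.0, x0=0.0, p0=1e6):
    """One-step-ahead forecasts from a local-level Kalman filter.

    Assumed model (illustrative only, not the paper's specification):
        state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns a list whose t-th entry is the forecast of y_t made
    before y_t is observed.
    """
    x, p = x0, p0  # current state estimate and its variance (diffuse prior)
    forecasts = []
    for y in observations:
        # Predict: the random-walk state carries over; its variance grows by q.
        p = p + q
        forecasts.append(x)
        # Update: move the estimate toward y by the Kalman gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1 - k) * p
    return forecasts
```

Calling `kalman_local_level` on a weekly case series returns forecasts aligned with it. In practice `q` and `r` would be estimated from the training weeks (for example by maximum likelihood) rather than fixed.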

Key words: Machine Learning; Influenza Forecast; Public Health Risk; Risk Forecast
Received: 03 August 2020; Published: 05 February 2021
CLC Number: C916
Fund: This work is supported by the National Natural Science Foundation of China (Grant No. 71472079), the Fundamental Research Funds for the Central Universities (Grant No. 18LZUJBWZD07), and the Key Projects of Philosophy and Social Sciences Research, Ministry of Education (Grant No. 16JZD023).
Corresponding Author: Sha Yongzhong

Cite this article:

Chai Guorong, Wang Bin, Sha Yongzhong. Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China. Data Analysis and Knowledge Discovery, 2021, 5(1): 90-98.


Figure: Forecasting Process
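| Factor | Mean ± SD | Min | Max | 25th percentile | 50th percentile | 75th percentile |
|---|---|---|---|---|---|---|
| Influenza cases (cases) | 9 ± 20 | 0 | 241 | 2 | 5 | 10 |
| Temperature (℃) | 11.23 ± 9.79 | -8.81 | 29.00 | 2.18 | 12.93 | 19.92 |
| Atmospheric pressure (hPa) | 846.78 ± 4.48 | 836.70 | 858.00 | 843.51 | 846.87 | 849.70 |
| Wind speed (m/s) | 1.24 ± 0.23 | 0.69 | 2.00 | 1.07 | 1.23 | 1.41 |
| Relative humidity (%) | 50.46 ± 12.24 | 18.06 | 78.43 | 42.66 | 51.37 | 59.29 |
| Rainfall (mm) | 0.82 ± 1.67 | 0 | 17.89 | 0 | 0.14 | 1.00 |

Descriptive Statistics of Influenza and Meteorological Factors, 2009-2016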
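Each row of the table above summarizes one weekly series. The page does not say which estimators were used, so this minimal sketch assumes the sample standard deviation and `statistics.quantiles` with the `"inclusive"` method; other quartile conventions can shift the percentile columns slightly. The helper names are hypothetical.

```python
import statistics


def describe(values):
    """Mean, SD, min, max and quartiles for one weekly series.

    Assumptions (not stated in the paper): sample standard deviation and
    statistics.quantiles(method="inclusive") for the 25/50/75th percentiles.
    """
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


def format_row(name, stats, digits=2):
    """Render one row as 'name  mean±sd  min  max  p25  p50  p75'."""
    cells = [f"{stats['mean']:.{digits}f}±{stats['sd']:.{digits}f}"]
    cells += [f"{stats[k]:.{digits}f}" for k in ("min", "max", "p25", "p50", "p75")]
    return "  ".join([name] + cells)
```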
Figure: Time Series Chart of Influenza and Meteorological Factors, 2009-2016
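| Factor | Temperature | Atmospheric pressure | Wind speed | Relative humidity | Rainfall |
|---|---|---|---|---|---|
| Atmospheric pressure | -0.741** | | | | |
| Wind speed | 0.425** | -0.533** | | | |
| Relative humidity | -0.032 | 0.166** | -0.439** | | |
| Rainfall | 0.499** | -0.359** | 0.135** | 0.493** | |
| Influenza cases | -0.482** | 0.400** | -0.266** | -0.074* | -0.272** |

Spearman Correlations Between Weekly Meteorological Variables at a Lag of 1 Week and Influenza Cases, 2009-2016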
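The correlations in the table above pair each meteorological variable in week t-1 with influenza cases in week t. A minimal standard-library sketch of that computation follows, assuming two aligned weekly lists and average ranks for ties (case counts contain many repeated values). The helper names are hypothetical, and the sketch returns only the coefficient, not the significance levels marked in the table.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_spearman(weather, cases, lag=1):
    """Spearman correlation of weather at week t-lag with cases at week t."""
    x = weather[:-lag] if lag else weather
    y = cases[lag:]
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman's coefficient is computed here as the Pearson correlation of the ranks, which stays correct when ties are present.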
Figure: Observed and Predicted Values of Influenza Incidence, Weeks 367-418
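The caption's week range matches the 2016 test year given in the abstract. Taking week 367 as the first test week leaves weeks 1-366 (2009-2015) for training. Below is a minimal sketch of that chronological split, assuming the weekly series is stored as a 0-indexed list that begins with week 1; the helper name is hypothetical.

```python
def chronological_split(series, first_test_week=367):
    """Split a weekly series into training and test segments without shuffling.

    Weeks are numbered from 1, so week `first_test_week` sits at list index
    first_test_week - 1. Preserving time order keeps later weeks out of the
    training data.
    """
    cut = first_test_week - 1
    return series[:cut], series[cut:]
```

With 418 weekly observations this yields 366 training weeks and 52 test weeks.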
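| Metric | Scenario | SARIMA | KF | VAR | Comb_1 (combined) | Comb_2 (combined) |
|---|---|---|---|---|---|---|
| RMSE | WP | 11.68 | 12.61 | 11.85 | 10.88 | 10.82* |
| RMSE | OP | 20.28 | 21.96 | 19.23 | 14.74 | 14.68* |
| RMSE | SP | 1.72 | 1.60 | 3.13 | 1.58 | 1.38* |
| R² | WP | 0.932 | 0.921 | 0.930 | 0.941 | 0.942# |
| R² | OP | 0.920 | 0.910 | 0.923 | 0.933 | 0.934# |
| R² | SP | 0.918 | 0.956 | 0.832 | 0.953 | 0.963# |

RMSE and R² for Prediction Results of Independent Methods and Combined Strategies in Each Scenario (WP: whole period; OP: outbreak period; SP: stabilization period; KF: Kalman Filter)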
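The table above compares RMSE and R² across methods and combined strategies. This page neither defines R² nor describes how Comb_1 and Comb_2 weight the individual forecasts, so the sketch below rests on labeled assumptions: R² is taken as 1 - SS_res/SS_tot, and the two combination rules (a simple average, and inverse-MSE weights fitted on a validation window) are hypothetical stand-ins, not the authors' strategies.

```python
import math


def rmse(obs, pred):
    """Root mean squared error between aligned observed and predicted lists."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def combine_equal(forecasts):
    """Hypothetical rule: average the methods' forecasts week by week."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]


def combine_inverse_mse(forecasts, val_obs, val_forecasts):
    """Hypothetical rule: weight each method by 1/MSE on a validation window.

    forecasts[m] and val_forecasts[m] belong to the same method m.
    """
    inv = [1 / rmse(val_obs, f) ** 2 for f in val_forecasts]
    total = sum(inv)
    weights = [w / total for w in inv]
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*forecasts)]
```

Inverse-MSE weighting shifts weight toward methods with smaller validation errors. As written, it raises ZeroDivisionError if any method fits the validation window exactly.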