Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (1): 90-98     https://doi.org/10.11925/infotech.2096-3467.2020.0754
Research Paper
Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China*
Chai Guorong, Wang Bin, Sha Yongzhong
School of Management, Lanzhou University, Lanzhou 730000, China
Research Center for Hospital Management, Lanzhou University, Lanzhou 730000, China
Research Center for Emergency Management, Lanzhou University, Lanzhou 730000, China
Abstract

[Objective] This study explores the feasibility and effectiveness of using machine learning to forecast public health risks, taking influenza as an example. [Methods] First, we collected influenza and meteorological data for Lanzhou, China, covering 2009 to 2016; data from 2009 to 2015 served as the training set and data from 2016 as the testing set. Then, we built three machine learning forecasting methods based on SARIMA, Kalman Filter, and VAR, respectively, and designed two multi-method combined forecasting strategies. Finally, we evaluated and compared the forecasting performance of these methods and strategies. [Results] In the three scenarios considered, namely the whole period (WP), the outbreak period (OP), and the stabilization period (SP), the best-performing single methods were SARIMA, VAR, and Kalman Filter, respectively (RMSE of 11.68, 19.23, and 1.60; R² of 0.932, 0.923, and 0.956). The multi-method combined strategies further improved forecasting performance in all three scenarios, with Comb_2 performing better (RMSE of 10.82, 14.68, and 1.38; R² of 0.942, 0.934, and 0.963). [Limitations] Owing to data constraints, this study considered only meteorological factors as external factors. [Conclusions] Forecasting public health risks such as influenza with machine learning is feasible and effective, and has great potential. The main obstacle at present is the lack of multi-source data; data barriers need to be broken down at the technical, organizational, and institutional levels to promote data sharing and openness.

Keywords: Machine Learning; Influenza Forecast; Public Health Risk; Risk Forecast
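The evaluation described in the abstract can be illustrated with a minimal sketch. It shows how RMSE and R² might be computed on a held-out period and how several forecasts could be merged into one. The abstract does not define Comb_1 or Comb_2, so the inverse-RMSE weighting below is a hypothetical stand-in and not the authors' strategy; the R² definition (1 − SS_res/SS_tot), the function names, and all numbers are likewise assumptions chosen for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def combine_inverse_rmse(forecasts, prior_rmse):
    """Hypothetical combination rule: weight each method by 1 / (its earlier RMSE)."""
    weights = {name: 1.0 / prior_rmse[name] for name in forecasts}
    total = sum(weights.values())
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts) / total
        for t in range(horizon)
    ]

# Made-up toy values, not data from the study.
observed = [10.0, 20.0, 40.0, 30.0]
forecasts = {
    "SARIMA": [12.0, 18.0, 35.0, 33.0],
    "KF": [9.0, 23.0, 44.0, 28.0],
    "VAR": [11.0, 21.0, 38.0, 27.0],
}
# Errors assumed to come from an earlier validation window (made up).
prior_rmse = {"SARIMA": 2.0, "KF": 4.0, "VAR": 4.0}

combined = combine_inverse_rmse(forecasts, prior_rmse)
for name, series in {**forecasts, "Combined": combined}.items():
    print(f"{name}: RMSE={rmse(observed, series):.3f}, R2={r_squared(observed, series):.3f}")
```

In this toy case the combined forecast has a lower RMSE than any single method because the individual errors partly cancel; whether that holds on real data depends on how correlated the methods' errors are.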
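Received: 2020-08-03      Published: 2021-02-05
Chinese Library Classification (ZTFLH):  C916
Funding: *This work is supported by the National Natural Science Foundation of China (Grant No. 71472079), the Key Project of the Fundamental Research Funds for the Central Universities (Grant No. 18LZUJBWZD07), and the Major Project of Philosophy and Social Sciences Research of the Ministry of Education (Grant No. 16JZD023).
Corresponding author: Sha Yongzhong     E-mail: shayzh@lzu.edu.cn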
Cite this article:
Chai Guorong, Wang Bin, Sha Yongzhong. Public Health Risk Forecasting with Multiple Machine Learning Methods Combined: Case Study of Influenza Forecasting in Lanzhou, China[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 90-98.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0754      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I1/90
Fig.1  Forecasting workflow
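| Factor | Mean ± SD | Min | Max | 25th percentile | 50th percentile | 75th percentile |
|---|---|---|---|---|---|---|
| Influenza cases (n) | 9±20 | 0 | 241 | 2 | 5 | 10 |
| Temperature (℃) | 11.23±9.79 | -8.81 | 29.00 | 2.18 | 12.93 | 19.92 |
| Atmospheric pressure (hPa) | 846.78±4.48 | 836.70 | 858.00 | 843.51 | 846.87 | 849.70 |
| Wind speed (m/s) | 1.24±0.23 | 0.69 | 2.00 | 1.07 | 1.23 | 1.41 |
| Relative humidity (%) | 50.46±12.24 | 18.06 | 78.43 | 42.66 | 51.37 | 59.29 |
| Rainfall (mm) | 0.82±1.67 | 0 | 17.89 | 0 | 0.14 | 1.00 |
Table 1  Descriptive statistics of influenza and meteorological factors (2009-2016)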
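Summary columns like those in Table 1 could be produced along the following lines. This is only a sketch: the source does not say which software or percentile convention was used, so the "inclusive" quantile method, the `describe` helper, and the toy series are all assumptions.

```python
import statistics

def describe(values):
    """Return mean, sample SD, min, max and quartiles for one series."""
    # The "inclusive" method is an assumption; other conventions give slightly different quartiles.
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }

# Made-up case counts with one large spike, not data from the study.
toy_cases = [0, 2, 3, 5, 5, 8, 10, 12, 45]
summary = describe(toy_cases)
print(summary)
```

In the toy series a single spike pulls the mean (10) well above the median (5). Table 1 shows the same pattern for influenza cases (mean 9, median 5, maximum 241), which points to a right-skewed distribution with occasional outbreaks.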
Fig.2  Time series of influenza and meteorological factors (2009-2016)
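| Factor | Temperature | Atmospheric pressure | Wind speed | Relative humidity | Rainfall |
|---|---|---|---|---|---|
| Atmospheric pressure | -0.741** | | | | |
| Wind speed | 0.425** | -0.533** | | | |
| Relative humidity | -0.032 | 0.166** | -0.439** | | |
| Rainfall | 0.499** | -0.359** | 0.135** | 0.493** | |
| Influenza cases | -0.482** | 0.400** | -0.266** | -0.074* | -0.272** |
Table 2  Spearman correlation coefficients between one-period-lagged meteorological factors and influenza cases (2009-2016)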
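A lagged Spearman coefficient of the kind reported in Table 2 can be sketched in plain Python as a Pearson correlation on average ranks, pairing the weather value at period t-1 with the case count at period t. The helper names and the toy series are hypothetical, and the page does not describe the authors' actual tooling or how ties were handled.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def lagged_spearman(driver, target, lag=1):
    """Correlate driver at period t - lag with target at period t."""
    if lag <= 0:
        raise ValueError("lag must be a positive integer")
    return spearman(driver[:-lag], target[lag:])

# Made-up series, not data from the study.
temperature = [-5.0, -2.0, 3.0, 9.0, 15.0, 20.0, 24.0]
flu_cases = [30, 40, 28, 20, 12, 6, 3]

rho_lag1 = lagged_spearman(temperature, flu_cases, lag=1)
rho_lag0 = spearman(temperature, flu_cases)
print(f"lag 1: {rho_lag1:.3f}, lag 0: {rho_lag0:.3f}")
```

A negative coefficient, such as the -0.482 between lagged temperature and influenza cases in Table 2, means that colder periods tend to be followed by more cases.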
Fig.3  Observed and predicted influenza cases (weeks 367-418)
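| Metric | Scenario | SARIMA | KF | VAR | Comb_1 (combined) | Comb_2 (combined) |
|---|---|---|---|---|---|---|
| RMSE | WP | 11.68 | 12.61 | 11.85 | 10.88 | 10.82* |
| RMSE | OP | 20.28 | 21.96 | 19.23 | 14.74 | 14.68* |
| RMSE | SP | 1.72 | 1.60 | 3.13 | 1.58 | 1.38* |
| R² | WP | 0.932 | 0.921 | 0.930 | 0.941 | 0.942# |
| R² | OP | 0.920 | 0.910 | 0.923 | 0.933 | 0.934# |
| R² | SP | 0.918 | 0.956 | 0.832 | 0.953 | 0.963# |
Table 3  RMSE and R² of the individual methods and combined strategies in each scenario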
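To put the gains in Table 3 in relative terms, the short calculation below compares Comb_2 with the best single method in each scenario, using only RMSE values from the table. The helper name `relative_reduction` is an illustrative choice, not something defined in the paper.

```python
# RMSE of the best single method per scenario, taken from Table 3.
best_single = {"WP": ("SARIMA", 11.68), "OP": ("VAR", 19.23), "SP": ("KF", 1.60)}
# RMSE of the Comb_2 strategy per scenario, taken from Table 3.
comb_2 = {"WP": 10.82, "OP": 14.68, "SP": 1.38}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0

reductions = {
    scenario: relative_reduction(best_single[scenario][1], comb_2[scenario])
    for scenario in comb_2
}

for scenario, pct in reductions.items():
    method = best_single[scenario][0]
    print(f"{scenario}: Comb_2 RMSE is {pct:.1f}% lower than {method}")
```

The reduction is roughly 7% for the whole period, 24% for the outbreak period, and 14% for the stabilization period, so on these figures the combined strategy helps most during outbreaks.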