Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (8): 122-133     https://doi.org/10.11925/infotech.2096-3467.2021.1269
Research Paper
Predicting Major Infectious Diseases Based on Grey Wolf Optimization and Multi-machine Learning: Case Study of COVID-19*
Qu Zongxi, Sha Yongzhong, Li Yutong
School of Management, Lanzhou University, Lanzhou 730099, China; Research Center for Emergency Management, Lanzhou University, Lanzhou 730099, China
Abstract

[Objective] Knowing in advance how a major infectious disease will develop allows countermeasures to be prepared early. This paper explores an ensemble prediction method based on multiple machine learning models to build an accurate and effective forecasting model for infectious disease outbreaks. [Methods] We built an ensemble prediction model that combines three machine learning models (ANFIS, LSSVM and LSTM), with the optimal combination weights found by the Grey Wolf Optimization (GWO) algorithm. We then designed experiments on COVID-19 epidemic data to assess the model's prediction performance. [Results] ANFIS, LSSVM and LSTM were best suited to the confirmed, death and recovered case scenarios, respectively. The GWO-based ensemble model reached an average R² of 0.989, 0.993 and 0.987 in the three scenarios, and its average RMSE was 37.37%, 63.93% and 53.37% lower than that of the individual models, respectively. [Limitations] The model needs further validation on data from other major infectious disease outbreaks. [Conclusions] Different machine learning models each have their own strengths in prediction, and the GWO-based ensemble prediction model can effectively merge these strengths to obtain stable and accurate results.

Keywords: Major Infectious Disease Outbreak; Ensemble Prediction; Grey Wolf Optimization; Machine Learning
Received: 2021-11-07      Published: 2022-09-23
CLC number (ZTFLH): R183; TP181
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China Youth Project (72004086).
Corresponding author: Sha Yongzhong, ORCID: 0000-0002-2479-2335     E-mail: shayzh@lzu.edu.cn
Cite this article:
Qu Zongxi, Sha Yongzhong, Li Yutong. Predicting Major Infectious Diseases Based on Grey Wolf Optimization and Multi-machine Learning: Case Study of COVID-19. Data Analysis and Knowledge Discovery, 2022, 6(8): 122-133.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1269      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I8/122
Fig.1  Schematic of grey wolf hunting
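| Criterion | Formula |
| --- | --- |
| RMSE | $\left[\frac{1}{N}\sum_{n=1}^{N}\left(y_n-\hat{y}_n\right)^2\right]^{1/2}$ |
| MAE | $\frac{1}{N}\sum_{n=1}^{N}\lvert y_n-\hat{y}_n\rvert$ |
| R² | $1-\frac{\sum_{n=1}^{N}\left(y_n-\hat{y}_n\right)^2}{\sum_{n=1}^{N}\left(y_n-\bar{y}\right)^2}$ |

Table 1  Model performance evaluation criteria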
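The three criteria in Table 1 can be sketched in a few lines of plain Python. This is only an illustration: the function names and the toy series are assumptions made here, not the authors' code or data, and R² is written in its usual squared-residual form.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n


def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy series, used only to exercise the functions.
actual = [100.0, 120.0, 150.0, 190.0]
predicted = [110.0, 115.0, 160.0, 185.0]

print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

Lower RMSE and MAE and an R² closer to 1 indicate a better fit, which is how the error tables on this page are read.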
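| Model | Parameter | Value |
| --- | --- | --- |
| ANFIS | FIS Generation Method | 'FCM' |
| ANFIS | Partition matrix exponent | 2 |
| ANFIS | Number of clusters | 10 |
| ANFIS | Maximum number of epochs | 200 |
| LSSVM | Kernel | 'RBF_kernel' |
| LSSVM | Type | 'function estimation' |
| LSSVM | Maximum number of epochs | 200 |
| LSTM | NumFeatures | 1 |
| LSTM | NumResponses | 1 |
| LSTM | NumHiddenUnits | 100 |
| LSTM | Maximum number of epochs | 200 |

Table 2  Parameter settings of the machine learning models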
Fig.2  Comparison of prediction errors of the different machine learning models
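| Country | Case type | Weight (ANFIS) | Weight (LSSVM) | Weight (LSTM) |
| --- | --- | --- | --- | --- |
| Brazil | Confirmed cases | 0.4337 | 0.2293 | 0.3361 |
| Brazil | Death cases | 0.2586 | 0.6337 | 0.1069 |
| Brazil | Recovered cases | 0.1278 | 0.1693 | 0.7029 |
| Germany | Confirmed cases | 0.7755 | 0.1922 | 0.0339 |
| Germany | Death cases | 0.0212 | 0.9043 | 0.0773 |
| Germany | Recovered cases | 0.8545 | 0.0081 | 0.1400 |
| India | Confirmed cases | 0.9598 | 0.0008 | 0.0397 |
| India | Death cases | 0.3863 | 0.3607 | 0.2528 |
| India | Recovered cases | 0.0213 | 0.2360 | 0.7434 |

Table 3  Ensemble model weights obtained by GWO optimization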
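To make the weight search behind Table 3 concrete, the following is a minimal sketch of a basic Grey Wolf Optimizer that looks for combination weights minimizing the ensemble RMSE. Several things here are assumptions and do not come from the paper: the fitness function (RMSE), the constraint that weights are non-negative and sum to 1 (enforced by normalization), the population size and iteration count, and the synthetic toy data.

```python
import math
import random


def normalize(raw):
    """Map a real vector to non-negative weights that sum to 1 (assumed constraint)."""
    mags = [abs(v) for v in raw]
    total = sum(mags)
    if total == 0.0:
        return [1.0 / len(raw)] * len(raw)
    return [m / total for m in mags]


def combine(weights, forecasts):
    """Weighted sum of forecasts; forecasts[m][t] is model m at time step t."""
    steps = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(steps)]


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def gwo_weights(actual, forecasts, n_wolves=20, n_iter=100, seed=0):
    """Search ensemble weights with a basic GWO; returns (weights, rmse)."""
    rng = random.Random(seed)
    dim = len(forecasts)

    def fitness(position):
        return rmse(actual, combine(normalize(position), forecasts))

    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for it in range(n_iter + 1):
        ranked = sorted(wolves, key=fitness)
        top_fit = fitness(ranked[0])
        if top_fit < best_fit:  # remember the best position seen so far
            best_pos, best_fit = list(ranked[0]), top_fit
        if it == n_iter:
            break
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta wolves
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 towards 0
        for i, wolf in enumerate(wolves):
            moved = []
            for d in range(dim):
                pulled = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    pulled += leader[d] - A * dist
                moved.append(pulled / 3.0)  # average of the three leader pulls
            wolves[i] = moved
    return normalize(best_pos), best_fit


# Synthetic toy data (not from the paper): one model over-predicts,
# one under-predicts, and one oscillates around the truth.
actual = [100.0, 130.0, 170.0, 220.0, 280.0, 350.0]
forecasts = [
    [1.1 * y for y in actual],
    [0.9 * y for y in actual],
    [y + (30.0 if t % 2 == 0 else -30.0) for t, y in enumerate(actual)],
]
weights, best_rmse = gwo_weights(actual, forecasts)
print(weights, best_rmse)
```

In this toy setting the over- and under-predicting models can cancel each other, so a good weight vector leans on those two. The same idea explains why the weights in Table 3 differ by country and case type: the optimizer favours whichever single model fits that series best.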
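| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
| --- | --- | --- | --- | --- |
| ANFIS | RMSE | 61907.389 | 9067.774 | 51677.476 |
| ANFIS | MAE | 53346.317 | 7823.019 | 41125.250 |
| ANFIS | R² | 0.988 | 0.670 | 0.992 |
| LSSVM | RMSE | 102874.191 | 3519.665 | 54504.770 |
| LSSVM | MAE | 93877.450 | 3138.636 | 44329.653 |
| LSSVM | R² | 0.966 | 0.950 | 0.992 |
| LSTM | RMSE | 65036.197 | 4691.670 | 44754.954 |
| LSTM | MAE | 57760.524 | 4207.787 | 31966.607 |
| LSTM | R² | 0.986 | 0.912 | 0.995 |
| Average ensemble | RMSE | 38347.996 | 543.311 | 45580.268 |
| Average ensemble | MAE | 35297.435 | 452.753 | 36317.756 |
| Average ensemble | R² | 0.995 | 0.999 | 0.994 |
| GWO-optimized ensemble | RMSE | 19004.084 | 351.125 | 43304.957 |
| GWO-optimized ensemble | MAE | 13251.499 | 300.861 | 33187.829 |
| GWO-optimized ensemble | R² | 0.999 | 1.000 | 0.995 |

Table 4  Comparison of prediction errors of the models for Brazil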
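One way to read Table 4 is as a relative RMSE reduction of the GWO-optimized ensemble against each single model. The sketch below does this for the confirmed-case column only. The helper name is an assumption, and the page does not spell out how the averaged percentages quoted in the abstract were aggregated, so this is not claimed to reproduce them.

```python
def pct_reduction(single_rmse, ensemble_rmse):
    """Relative RMSE reduction of the ensemble versus one single model, in percent."""
    return 100.0 * (single_rmse - ensemble_rmse) / single_rmse


# Confirmed-case RMSE values for Brazil, copied from Table 4.
single_models = {"ANFIS": 61907.389, "LSSVM": 102874.191, "LSTM": 65036.197}
gwo_ensemble = 19004.084

reductions = {name: pct_reduction(v, gwo_ensemble) for name, v in single_models.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% lower RMSE with the GWO-optimized ensemble")
```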
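| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
| --- | --- | --- | --- | --- |
| ANFIS | RMSE | 2570.413 | 36.020 | 3604.251 |
| ANFIS | MAE | 1464.941 | 21.973 | 2326.934 |
| ANFIS | R² | 0.966 | 0.988 | 0.912 |
| LSSVM | RMSE | 3636.940 | 36.004 | 5095.082 |
| LSSVM | MAE | 2541.193 | 28.337 | 4563.184 |
| LSSVM | R² | 0.933 | 0.988 | 0.824 |
| LSTM | RMSE | 3833.311 | 68.183 | 2748.320 |
| LSTM | MAE | 2490.947 | 63.268 | 1951.193 |
| LSTM | R² | 0.925 | 0.958 | 0.949 |
| Average ensemble | RMSE | 3222.102 | 37.925 | 3491.498 |
| Average ensemble | MAE | 2030.421 | 30.370 | 2801.723 |
| Average ensemble | R² | 0.947 | 0.987 | 0.917 |
| GWO-optimized ensemble | RMSE | 2379.995 | 28.624 | 2003.162 |
| GWO-optimized ensemble | MAE | 1516.312 | 19.324 | 1468.277 |
| GWO-optimized ensemble | R² | 0.971 | 0.993 | 0.973 |

Table 5  Comparison of prediction errors of the models for Germany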
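| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
| --- | --- | --- | --- | --- |
| ANFIS | RMSE | 53816.132 | 5749.633 | 65452.553 |
| ANFIS | MAE | 44922.194 | 5224.422 | 53986.358 |
| ANFIS | R² | 0.988 | 0.702 | 0.986 |
| LSSVM | RMSE | 56798.467 | 1670.177 | 375812.555 |
| LSSVM | MAE | 44066.724 | 1233.361 | 293712.684 |
| LSSVM | R² | 0.987 | 0.975 | 0.540 |
| LSTM | RMSE | 189399.641 | 2202.258 | 245591.605 |
| LSTM | MAE | 165611.399 | 1997.203 | 214948.692 |
| LSTM | R² | 0.853 | 0.956 | 0.804 |
| Average ensemble | RMSE | 99469.679 | 2380.371 | 228009.859 |
| Average ensemble | MAE | 84628.157 | 1779.095 | 187542.846 |
| Average ensemble | R² | 0.960 | 0.949 | 0.831 |
| GWO-optimized ensemble | RMSE | 32503.480 | 1187.918 | 48544.028 |
| GWO-optimized ensemble | MAE | 26845.140 | 980.660 | 41433.556 |
| GWO-optimized ensemble | R² | 0.996 | 0.987 | 0.992 |

Table 6  Comparison of prediction errors of the models for India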
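Fig.3  Predicted versus actual values of the different models for Brazil
Fig.4  Predicted versus actual values of the different models for Germany
Fig.5  Predicted versus actual values of the different models for India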
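| Country | Model | DM (confirmed cases) | DM (death cases) | DM (recovered cases) |
| --- | --- | --- | --- | --- |
| Brazil | ANFIS | 5.945 | 7.130 | 1.930 |
| Brazil | LSSVM | 8.588 | 7.583 | 2.273 |
| Brazil | LSTM | 6.746 | 8.539 | 0.445 |
| Brazil | Average ensemble | 5.364 | 3.373 | 0.976 |
| Brazil | GWO-optimized ensemble | - | - | - |
| Germany | ANFIS | 1.384 | 1.690 | 1.501 |
| Germany | LSSVM | 2.859 | 2.673 | 6.573 |
| Germany | LSTM | 2.683 | 7.020 | 2.974 |
| Germany | Average ensemble | 2.212 | 2.898 | 3.472 |
| Germany | GWO-optimized ensemble | - | - | - |
| India | ANFIS | 4.469 | 7.950 | 3.204 |
| India | LSSVM | 3.869 | 3.096 | 5.108 |
| India | LSTM | 6.614 | 4.783 | 6.648 |
| India | Average ensemble | 5.826 | 4.454 | 5.571 |
| India | GWO-optimized ensemble | - | - | - |

Table 7  Diebold-Mariano (DM) test results of the models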
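The DM statistics in Table 7 compare each model against the GWO-optimized ensemble. A minimal sketch of a Diebold-Mariano statistic follows, assuming squared-error loss and a one-step forecast horizon, so that no autocovariance correction is applied. The page states neither choice, so both are assumptions, as are the function names and the toy series.

```python
import math


def dm_statistic(actual, pred_a, pred_b):
    """DM statistic for forecasts A versus B (squared-error loss, horizon 1 assumed).

    A positive value means forecast B has the lower average loss.
    """
    diffs = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(actual, pred_a, pred_b)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    var_d = sum((d - d_bar) ** 2 for d in diffs) / n  # lag-0 autocovariance
    return d_bar / math.sqrt(var_d / n)


def two_sided_p(dm):
    """Two-sided p-value under the asymptotic standard normal distribution."""
    return math.erfc(abs(dm) / math.sqrt(2.0))


# Hypothetical toy series: pred_a plays a single model, pred_b the ensemble.
actual = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
pred_a = [13.0, 10.0, 19.0, 16.0, 26.0, 26.0]
pred_b = [11.0, 11.0, 16.0, 19.0, 25.0, 29.0]

dm = dm_statistic(actual, pred_a, pred_b)
print(dm, two_sided_p(dm))
```

Under these assumptions, an absolute DM value above about 1.96 points to a difference in accuracy that is significant at the 5% level. By that reading most entries in Table 7 are significant, while a value such as 0.445 (LSTM, recovered cases, Brazil) is not.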