Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (8): 122-133    DOI: 10.11925/infotech.2096-3467.2021.1269
Predicting Major Infectious Diseases Based on Grey Wolf Optimization and Multi-machine Learning: Case Study of COVID-19
Qu Zongxi, Sha Yongzhong, Li Yutong
School of Management, Lanzhou University, Lanzhou 730099, China; Research Center for Emergency Management, Lanzhou University, Lanzhou 730099, China
Abstract  

[Objective] This paper aims to build an accurate and effective forecasting model for major infectious diseases based on multiple machine learning models, in order to predict outbreak trends and help formulate countermeasures in advance. [Methods] We established an ensemble prediction model that combines three machine learning models (ANFIS, LSSVM and LSTM) using optimal weights obtained by the Grey Wolf Optimization algorithm. We then assessed the model’s prediction performance on COVID-19 epidemic data. [Results] ANFIS, LSSVM and LSTM were suitable for predicting confirmed cases, death cases and recovered cases, respectively. The average R² of the proposed model reached 0.989, 0.993 and 0.987 for the three scenarios, and its average RMSE was 37.37%, 63.93% and 53.37% lower than that of the single models, respectively. [Limitations] The model needs to be examined with data sets on other major infectious diseases. [Conclusions] The ensemble prediction model based on Grey Wolf Optimization can effectively merge the advantages of multiple machine learning models to obtain stable and accurate results.

Key words: Major Infectious Disease Outbreak; Ensemble Prediction; Grey Wolf Optimization; Machine Learning
Received: 07 November 2021      Published: 23 September 2022
ZTFLH: R183; TP181
Fund: National Natural Science Foundation of China (72004086)
Corresponding Author: Sha Yongzhong, ORCID: 0000-0002-2479-2335, E-mail: shayzh@lzu.edu.cn

Cite this article:

Qu Zongxi, Sha Yongzhong, Li Yutong. Predicting Major Infectious Diseases Based on Grey Wolf Optimization and Multi-machine Learning: Case Study of COVID-19. Data Analysis and Knowledge Discovery, 2022, 6(8): 122-133.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1269     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I8/122

Grey Wolf Hunting
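| Evaluation criterion | Formula |
|---|---|
| RMSE | $\left( \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2 \right)^{1/2}$ |
| MAE | $\frac{1}{N} \sum_{n=1}^{N} \lvert y_n - \hat{y}_n \rvert$ |
| R² | $1 - \frac{\sum_{n=1}^{N} (y_n - \hat{y}_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2}$ |

Here $y_n$ is the observed value, $\hat{y}_n$ the predicted value, $\bar{y}$ the mean of the observed values, and $N$ the number of samples.
Model Performance Evaluation Criteria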
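The three criteria in the table above can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function names and the toy series are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data for illustration only (not from the paper).
y_true = [10.0, 12.0, 15.0, 19.0]
y_pred = [11.0, 12.0, 14.0, 21.0]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and MAE indicate smaller errors, while R² closer to 1 indicates a better fit.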
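| Model | Parameter | Value |
|---|---|---|
| ANFIS | FIS generation method | 'FCM' |
| ANFIS | Partition matrix exponent | 2 |
| ANFIS | Number of clusters | 10 |
| ANFIS | Maximum number of epochs | 200 |
| LSSVM | Kernel | 'RBF_kernel' |
| LSSVM | Type | 'function estimation' |
| LSSVM | Maximum number of epochs | 200 |
| LSTM | NumFeatures | 1 |
| LSTM | NumResponses | 1 |
| LSTM | NumHiddenUnits | 100 |
| LSTM | Maximum number of epochs | 200 |

Machine Learning Model Parameter Setting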
Comparison of Prediction Errors of Different Machine Learning Models
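| Country | Case type | ANFIS weight | LSSVM weight | LSTM weight |
|---|---|---|---|---|
| Brazil | Confirmed cases | 0.4337 | 0.2293 | 0.3361 |
| Brazil | Death cases | 0.2586 | 0.6337 | 0.1069 |
| Brazil | Recovered cases | 0.1278 | 0.1693 | 0.7029 |
| Germany | Confirmed cases | 0.7755 | 0.1922 | 0.0339 |
| Germany | Death cases | 0.0212 | 0.9043 | 0.0773 |
| Germany | Recovered cases | 0.8545 | 0.0081 | 0.1400 |
| India | Confirmed cases | 0.9598 | 0.0008 | 0.0397 |
| India | Death cases | 0.3863 | 0.3607 | 0.2528 |
| India | Recovered cases | 0.0213 | 0.2360 | 0.7434 |

The Weight Coefficients of the Ensemble Model Obtained by GWO Optimization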
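To illustrate how such weight coefficients might be obtained, the sketch below applies the standard Grey Wolf Optimizer update rules (alpha, beta and delta leaders, with the control parameter a decreasing linearly from 2 to 0) to minimize the RMSE of a weighted combination of three forecasts. It is a sketch under stated assumptions, not the authors' implementation: the fitness function, the non-negativity and sum-to-one normalization, the population size, and the toy forecasts are all assumptions made for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def normalize(w):
    """Clip weights to be non-negative and rescale them to sum to 1 (assumed constraint)."""
    w = [max(x, 0.0) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)


def combine(weights, forecasts):
    """Weighted sum of the member forecasts at each time step."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]


def gwo_weights(y_true, forecasts, n_wolves=20, n_iter=200, seed=0):
    rng = random.Random(seed)
    dim = len(forecasts)

    def fitness(w):
        return rmse(y_true, combine(normalize(w), forecasts))

    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)
    for it in range(n_iter):
        # The three best wolves lead the hunt.
        alpha, beta, delta = sorted(wolves, key=fitness)[:3]
        a = 2.0 - 2.0 * it / n_iter  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            pos = []
            for d in range(dim):
                est = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    est += leader[d] - A * D
                # Average of the three leader-guided positions, kept in [0, 1].
                pos.append(min(max(est / 3.0, 0.0), 1.0))
            new_wolves.append(pos)
        wolves = new_wolves
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate  # keep the best solution seen so far
    return normalize(best)


# Toy series for illustration only: one model over-predicts by 10%,
# one under-predicts by 10%, and one over-predicts by 30%.
y_true = [100.0, 120.0, 150.0, 190.0, 240.0, 300.0]
forecasts = [[1.1 * y for y in y_true],
             [0.9 * y for y in y_true],
             [1.3 * y for y in y_true]]

w = gwo_weights(y_true, forecasts)
print("weights:", [round(x, 4) for x in w])
print("GWO ensemble RMSE:", rmse(y_true, combine(w, forecasts)))
print("equal-weight RMSE:", rmse(y_true, combine([1 / 3] * 3, forecasts)))
```

In this toy setting the optimizer should move weight away from the worst forecast, which mirrors the pattern in the table where the weakest model for a given series receives a small weight.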
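| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
|---|---|---|---|---|
| ANFIS | RMSE | 61907.389 | 9067.774 | 51677.476 |
| ANFIS | MAE | 53346.317 | 7823.019 | 41125.250 |
| ANFIS | R² | 0.988 | 0.670 | 0.992 |
| LSSVM | RMSE | 102874.191 | 3519.665 | 54504.770 |
| LSSVM | MAE | 93877.450 | 3138.636 | 44329.653 |
| LSSVM | R² | 0.966 | 0.950 | 0.992 |
| LSTM | RMSE | 65036.197 | 4691.670 | 44754.954 |
| LSTM | MAE | 57760.524 | 4207.787 | 31966.607 |
| LSTM | R² | 0.986 | 0.912 | 0.995 |
| Average ensemble | RMSE | 38347.996 | 543.311 | 45580.268 |
| Average ensemble | MAE | 35297.435 | 452.753 | 36317.756 |
| Average ensemble | R² | 0.995 | 0.999 | 0.994 |
| GWO-optimized ensemble | RMSE | 19004.084 | 351.125 | 43304.957 |
| GWO-optimized ensemble | MAE | 13251.499 | 300.861 | 33187.829 |
| GWO-optimized ensemble | R² | 0.999 | 1.000 | 0.995 |

Comparison of Prediction Errors of the Models in Brazil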
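As a worked example of reading this table, the snippet below computes how much lower the GWO-optimized ensemble's RMSE is than each single model's RMSE for confirmed cases in Brazil. The values are copied from the table; the helper name is hypothetical. This page does not state how the averaged percentages in the abstract were derived, so the sketch shows only the per-model arithmetic and is not a reproduction of those figures.

```python
# Confirmed-case RMSE values for Brazil, copied from the table above.
rmse_single = {"ANFIS": 61907.389, "LSSVM": 102874.191, "LSTM": 65036.197}
rmse_gwo = 19004.084


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


reductions = {m: relative_reduction(v, rmse_gwo) for m, v in rmse_single.items()}
for model, r in reductions.items():
    print(f"{model}: GWO ensemble RMSE is {100 * r:.1f}% lower")
```

For this series the reductions come out at roughly 69%, 82% and 71% relative to ANFIS, LSSVM and LSTM, respectively.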
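| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
|---|---|---|---|---|
| ANFIS | RMSE | 2570.413 | 36.020 | 3604.251 |
| ANFIS | MAE | 1464.941 | 21.973 | 2326.934 |
| ANFIS | R² | 0.966 | 0.988 | 0.912 |
| LSSVM | RMSE | 3636.940 | 36.004 | 5095.082 |
| LSSVM | MAE | 2541.193 | 28.337 | 4563.184 |
| LSSVM | R² | 0.933 | 0.988 | 0.824 |
| LSTM | RMSE | 3833.311 | 68.183 | 2748.320 |
| LSTM | MAE | 2490.947 | 63.268 | 1951.193 |
| LSTM | R² | 0.925 | 0.958 | 0.949 |
| Average ensemble | RMSE | 3222.102 | 37.925 | 3491.498 |
| Average ensemble | MAE | 2030.421 | 30.370 | 2801.723 |
| Average ensemble | R² | 0.947 | 0.987 | 0.917 |
| GWO-optimized ensemble | RMSE | 2379.995 | 28.624 | 2003.162 |
| GWO-optimized ensemble | MAE | 1516.312 | 19.324 | 1468.277 |
| GWO-optimized ensemble | R² | 0.971 | 0.993 | 0.973 |

Comparison of Prediction Errors of the Models in Germany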
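| Model | Error metric | Confirmed cases | Death cases | Recovered cases |
|---|---|---|---|---|
| ANFIS | RMSE | 53816.132 | 5749.633 | 65452.553 |
| ANFIS | MAE | 44922.194 | 5224.422 | 53986.358 |
| ANFIS | R² | 0.988 | 0.702 | 0.986 |
| LSSVM | RMSE | 56798.467 | 1670.177 | 375812.555 |
| LSSVM | MAE | 44066.724 | 1233.361 | 293712.684 |
| LSSVM | R² | 0.987 | 0.975 | 0.540 |
| LSTM | RMSE | 189399.641 | 2202.258 | 245591.605 |
| LSTM | MAE | 165611.399 | 1997.203 | 214948.692 |
| LSTM | R² | 0.853 | 0.956 | 0.804 |
| Average ensemble | RMSE | 99469.679 | 2380.371 | 228009.859 |
| Average ensemble | MAE | 84628.157 | 1779.095 | 187542.846 |
| Average ensemble | R² | 0.960 | 0.949 | 0.831 |
| GWO-optimized ensemble | RMSE | 32503.480 | 1187.918 | 48544.028 |
| GWO-optimized ensemble | MAE | 26845.140 | 980.660 | 41433.556 |
| GWO-optimized ensemble | R² | 0.996 | 0.987 | 0.992 |

Comparison of Prediction Errors of the Models in India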
Predicted and Actual Values of Different Models in Brazil
Predicted and Actual Values of Different Models in Germany
Predicted and Actual Values of Different Models in India
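| Country | Model | DM: confirmed cases | DM: death cases | DM: recovered cases |
|---|---|---|---|---|
| Brazil | ANFIS | 5.945 | 7.130 | 1.930 |
| Brazil | LSSVM | 8.588 | 7.583 | 2.273 |
| Brazil | LSTM | 6.746 | 8.539 | 0.445 |
| Brazil | Average ensemble | 5.364 | 3.373 | 0.976 |
| Brazil | GWO-optimized ensemble | - | - | - |
| Germany | ANFIS | 1.384 | 1.690 | 1.501 |
| Germany | LSSVM | 2.859 | 2.673 | 6.573 |
| Germany | LSTM | 2.683 | 7.020 | 2.974 |
| Germany | Average ensemble | 2.212 | 2.898 | 3.472 |
| Germany | GWO-optimized ensemble | - | - | - |
| India | ANFIS | 4.469 | 7.950 | 3.204 |
| India | LSSVM | 3.869 | 3.096 | 5.108 |
| India | LSTM | 6.614 | 4.783 | 6.648 |
| India | Average ensemble | 5.826 | 4.454 | 5.571 |
| India | GWO-optimized ensemble | - | - | - |

Results of DM Test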
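The DM (Diebold-Mariano) statistic tests whether two forecasts differ significantly in accuracy. The dashes in the GWO-optimized ensemble rows suggest that each other model is compared against that ensemble, although this page does not say so explicitly. The sketch below computes the statistic for two forecasts under assumptions that are not confirmed by the source: squared-error loss, a one-step horizon by default, and a normal approximation for the p-value. The function name and toy data are illustrative.

```python
import math


def dm_test(y_true, pred_a, pred_b, h=1):
    """Diebold-Mariano statistic for forecasts A and B under squared-error loss.

    A positive statistic means forecast A has the larger average loss.
    For h > 1 the long-run variance estimate can be non-positive in small
    samples; this sketch does not guard against that case.
    """
    # Loss differential series d_t = e_A,t^2 - e_B,t^2.
    d = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(y_true, pred_a, pred_b)]
    T = len(d)
    d_bar = sum(d) / T

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    # Long-run variance uses autocovariances up to lag h - 1.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    stat = d_bar / math.sqrt(long_run_var / T)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Toy data for illustration only (not from the paper).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
err_a = [2.0, -1.0, 3.0, -2.0, 2.0, -3.0, 1.0, -2.0]    # larger errors
err_b = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]    # smaller errors
pred_a = [y - e for y, e in zip(y_true, err_a)]
pred_b = [y - e for y, e in zip(y_true, err_b)]

stat, p_value = dm_test(y_true, pred_a, pred_b)
print(f"DM = {stat:.3f}, p = {p_value:.5f}")
```

Under the normal approximation, an absolute DM value above 1.96 rejects the hypothesis of equal accuracy at the 5% level.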