Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (8): 65-75    DOI: 10.11925/infotech.2096-3467.2021.0188
Predicting Surgical Infections Based on Machine Learning
Su Qiang 1, Hou Xiaoli 1, Zou Ni 2
1 School of Economics and Management, Tongji University, Shanghai 200092, China
2 Shanghai General Hospital, Shanghai 200240, China
Abstract  

[Objective] This paper proposes a prediction model for post-operative infection based on a combination of machine learning algorithms, with the aim of reducing the risk of surgical site infection (SSI). [Methods] First, we applied SMOTE, ADASYN, and random oversampling to reduce the class imbalance of the original data. Then, we combined five commonly used predictive models (Lasso, SVM, GBDT, ANN, and RF) into a hybrid prediction method. Finally, we used an improved artificial bee colony (ABC) algorithm to optimize the combination weights. [Results] With the ABC combination strategy, the G-mean and F1 values reached 0.7912 and 0.6693, respectively, which were 15.15% and 23.62% higher than those of existing methods. [Limitations] The sample size used in the study needs to be expanded. [Conclusions] The proposed model can effectively predict post-operative infections.
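As an illustration of the first step in the Methods, the following is a minimal sketch of random oversampling only, written with the Python standard library. The function name `random_oversample` and the toy data are assumptions made for this example and are not taken from the paper. SMOTE and ADASYN differ in that they synthesize new minority-class points rather than duplicating existing ones, and they are not shown here.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of this class that can be duplicated.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: four negative cases and one positive case.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling of this kind would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.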

Keywords: Surgical Site Infection; Forecast Combination; Artificial Bee Colony Algorithm; Oversampling; Machine Learning
Received: 01 March 2021      Published: 15 September 2021
ZTFLH:  R619  
Fund: National Natural Science Foundation of China (71972146); National Natural Science Foundation of China (71974127)
Corresponding Authors: Hou Xiaoli ORCID:0000-0003-3609-4734     E-mail: houxl@tongji.edu.cn

Cite this article:

Su Qiang, Hou Xiaoli, Zou Ni. Predicting Surgical Infections Based on Machine Learning. Data Analysis and Knowledge Discovery, 2021, 5(8): 65-75.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0188     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I8/65

Framework of Combined Forecasting Method
                      True positive   True negative
Predicted positive    TP              FP
Predicted negative    FN              TN
Confusion Matrix
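The evaluation measures reported in the result tables (accuracy, sensitivity, specificity, precision, GM, F1) can be derived from the four confusion-matrix cells using their standard definitions. The sketch below assumes those standard definitions; the function name `metrics` and the example counts are hypothetical and do not come from the paper's data.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator != 0 else float("nan")


def metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts, chosen only to illustrate the formulas.
m = metrics(tp=30, fp=10, fn=20, tn=140)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

When a classifier predicts no positives at all, sensitivity and GM are zero and F1 is undefined, which this sketch reports as NaN.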
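Variable group: General condition
  Categorical variables: sex, ICD diagnosis code, ICU admission
  Continuous variables: age, BMI, preoperative length of stay, ICU length of stay (days)
Variable group: Medical history
  Categorical variables: hypertension, diabetes, malignant tumor, tumor metastasis, history of myocardial infarction, COPD, history of kidney disease, history of liver disease, smoking history, drinking history
  Continuous variables: none listed
Variable group: Preoperative status
  Categorical variables: ascites within 30 days before surgery, preoperative blood transfusion history, preoperative mechanical ventilation
  Continuous variables: RBC count, WBC, hemoglobin, serum creatinine, blood urea nitrogen, total bilirubin, ALT, AST, albumin, prealbumin, fasting blood glucose
Variable group: Surgical information
  Categorical variables: second operation, surgical approach, surgery type, chief surgeon, incision type, stoma, drainage, number of drainage tubes, postoperative oxygen
  Continuous variables: operation duration, early postoperative blood glucose
SSI Data Details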
SSI Data Under PCA
Dataset            Variables  Total samples  Majority-class samples  Minority-class samples  Imbalance ratio
Wisconsin          9          683            444                     239                     1.86
Abalone17VS78910   8          2338           2280                    58                      39.31
Description of KEEL Data
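The imbalance ratio in this table is consistent with the number of majority-class samples divided by the number of minority-class samples. A short check, using the counts from the table (the function name `imbalance_ratio` is introduced here for illustration):

```python
def imbalance_ratio(majority, minority):
    """Majority-class count divided by minority-class count."""
    return majority / minority


# (majority, minority) sample counts as listed in the KEEL data description.
datasets = {
    "Wisconsin": (444, 239),
    "Abalone17VS78910": (2280, 58),
}

for name, (majority, minority) in datasets.items():
    print(f"{name}: {imbalance_ratio(majority, minority):.2f}")
# Wisconsin: 1.86
# Abalone17VS78910: 39.31
```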
Sampling method      Classifier  Accuracy  Sensitivity  Specificity  Precision  AUC     GM      F1
None                 Lasso       0.9144    0.3400       0.9763       0.6834     0.9324  0.5762  0.4541
                     GBDT        0.9008    0.5000       0.9441       0.5902     0.8778  0.6871  0.5414
                     SVM         0.8988    0.3400       0.9591       0.4923     0.8963  0.5711  0.4022
                     ANN         0.8736    0.4800       0.9161       0.5334     0.8494  0.6631  0.5053
                     RF          0.9066    0.4000       0.9613       0.7620     0.9062  0.6201  0.5246
SMOTE                Lasso       0.8523    0.7200       0.8667       0.4935     0.9162  0.7899  0.5856
                     GBDT        0.8970    0.6600       0.9226       0.5287     0.9077  0.7803  0.5871
                     SVM         0.8445    0.6800       0.8624       0.4738     0.8936  0.7658  0.5584
                     ANN         0.8756    0.5000       0.9161       0.5660     0.8506  0.6768  0.5309
                     RF          0.9124    0.5400       0.9527       0.6477     0.9306  0.7173  0.5890
Random oversampling  Lasso       0.8519    0.7200       0.8595       0.4595     0.9138  0.7867  0.5609
                     GBDT        0.8950    0.6200       0.9247       0.5308     0.9134  0.7572  0.5719
                     SVM         0.8407    0.6400       0.8624       0.4566     0.8950  0.7429  0.5330
                     ANN         0.8717    0.5000       0.9118       0.5219     0.8458  0.6752  0.5107
                     RF          0.9085    0.5000       0.9527       0.6699     0.9306  0.6902  0.5726
ADASYN               Lasso       0.8484    0.7200       0.8623       0.4136     0.9185  0.7880  0.5254
                     GBDT        0.8814    0.5600       0.9161       0.4910     0.9088  0.7163  0.5232
                     SVM         0.8270    0.6400       0.8473       0.3801     0.8944  0.7364  0.4769
                     ANN         0.8659    0.5200       0.9032       0.5090     0.8525  0.6853  0.5144
                     RF          0.9027    0.5400       0.9419       0.6134     0.9300  0.7132  0.5743
The Results of Single Models Combined with Different Sampling Methods (SSI Dataset)
Sampling method      Combination strategy  Accuracy  Sensitivity  Specificity  Precision  AUC     GM      F1
None                 Mean                  0.9124    0.4600       0.9613       0.6595     0.9240  0.6650  0.5420
                     Median                0.9124    0.4200       0.9656       0.5958     0.9044  0.6368  0.4927
                     ABC                   0.9299    0.4400       0.9828       0.7903     0.9345  0.6576  0.5653
SMOTE                Mean                  0.8892    0.6000       0.9204       0.5601     0.9300  0.7431  0.5793
                     Median                0.8931    0.6400       0.9204       0.5684     0.9263  0.7675  0.6021
                     ABC                   0.8989    0.6600       0.9247       0.6138     0.9302  0.7812  0.6361
Random oversampling  Mean                  0.8872    0.6200       0.9161       0.5480     0.9149  0.7537  0.5818
                     Median                0.8911    0.6200       0.9204       0.5583     0.9069  0.7554  0.5875
                     ABC                   0.9086    0.6200       0.9398       0.5999     0.9285  0.7633  0.6098
ADASYN               Mean                  0.8756    0.6200       0.9032       0.5257     0.9259  0.7483  0.5690
                     Median                0.8756    0.6200       0.9032       0.5367     0.9261  0.7483  0.5753
                     ABC                   0.9008    0.6000       0.9333       0.5942     0.9220  0.7483  0.5971
The Results of Combination Models Under Different Sampling Methods (SSI Dataset)
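The three combination strategies compared in this table can be illustrated on a single case as follows. This is a sketch under stated assumptions: the probabilities and weights are made-up values, and the "weighted" branch simply applies fixed weights, whereas in the paper the weights are obtained with the improved ABC algorithm, which is not reproduced here. The function name `combine` is hypothetical.

```python
from statistics import mean, median


def combine(probabilities, strategy="mean", weights=None):
    """Combine per-model predicted probabilities for one sample."""
    if strategy == "mean":
        return mean(probabilities)
    if strategy == "median":
        return median(probabilities)
    if strategy == "weighted":
        if weights is None or len(weights) != len(probabilities):
            raise ValueError("weighted strategy needs one weight per model")
        # Normalize so the result stays in [0, 1] even if weights do not sum to 1.
        total = sum(weights)
        return sum(w * p for w, p in zip(weights, probabilities)) / total
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical infection probabilities from Lasso, SVM, GBDT, ANN and RF.
probs = [0.62, 0.48, 0.55, 0.30, 0.71]
# Hypothetical weights standing in for the output of a weight optimizer.
weights = [0.3, 0.1, 0.2, 0.1, 0.3]

for strategy in ("mean", "median", "weighted"):
    score = combine(probs, strategy, weights)
    print(f"{strategy}: {score:.3f}")
```

The combined score would then be compared against a decision threshold to produce the positive or negative prediction that enters the confusion matrix.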
Combination strategy  Accuracy  Sensitivity  Specificity  Precision  AUC     GM      F1
Mean                  0.8911    0.6400       0.9183       0.6753     0.9351  0.7666  0.6572
Median                0.8892    0.6200       0.9183       0.6587     0.9331  0.7545  0.6388
ABC                   0.9164    0.6600       0.9484       0.6789     0.9369  0.7912  0.6693
The Results of the Full Model (SSI Dataset)
                                 Wisconsin                         Abalone
Sampling method      Classifier  Accuracy  AUC     GM      F1      Accuracy  AUC     GM      F1
None                 Lasso       0.9662    0.9935  0.9611  0.9518  0.9756    0.8397  0.1291  0.0328
                     GBDT        0.9574    0.9895  0.9503  0.9392  0.9641    0.8262  0.4850  0.2599
                     SVM         0.9633    0.9935  0.9610  0.9489  0.9752    0.9403  0.0000  N/A
                     ANN         0.9456    0.9780  0.9404  0.9222  0.9722    0.9312  0.5365  0.3905
                     RF          0.9604    0.9908  0.9567  0.9445  0.9748    0.8632  0.2234  0.0936
SMOTE                Lasso       0.9662    0.9936  0.9632  0.9523  0.8734    0.9329  0.8398  0.3756
                     GBDT        0.9618    0.9903  0.9578  0.9465  0.9209    0.8698  0.7485  0.3946
                     SVM         0.9604    0.9930  0.9578  0.9449  0.8837    0.9472  0.8468  0.3575
                     ANN         0.9441    0.9689  0.9403  0.9203  0.9055    0.9212  0.8201  0.4357
                     RF          0.9618    0.9886  0.9588  0.9463  0.8730    0.8522  0.7677  0.3352
Random oversampling  Lasso       0.9691    0.9942  0.9665  0.9564  0.8687    0.9321  0.8467  0.3821
                     GBDT        0.9662    0.9909  0.9652  0.9531  0.9521    0.9010  0.6805  0.4167
                     SVM         0.9663    0.9945  0.9663  0.9528  0.8837    0.9482  0.8382  0.3656
                     ANN         0.9471    0.9744  0.9436  0.9250  0.8888    0.9371  0.8301  0.3773
                     RF          0.9663    0.9918  0.9652  0.9529  0.9153    0.8609  0.7340  0.2938
ADASYN               Lasso       0.9662    0.9934  0.9682  0.9531  0.8713    0.9333  0.8480  0.3678
                     GBDT        0.9663    0.9873  0.9662  0.9530  0.9136    0.8801  0.7239  0.3556
                     SVM         0.9618    0.9932  0.9667  0.9477  0.8828    0.9481  0.8463  0.3485
                     ANN         0.9471    0.9836  0.9447  0.9256  0.9183    0.9493  0.8625  0.4262
                     RF          0.9678    0.9896  0.9684  0.9552  0.8636    0.8582  0.7718  0.3223
The Results of Single Models Combined with Different Sampling Methods (KEEL Dataset)
                                           Wisconsin                         Abalone
Sampling method      Combination strategy  Accuracy  AUC     GM      F1      Accuracy  AUC     GM      F1
None                 Mean                  0.9589    0.9928  0.9477  0.9413  0.9765    0.9209  0.3397  0.2070
                     Median                0.9647    0.9936  0.9548  0.9499  0.9765    0.9228  0.2853  0.1515
                     ABC                   0.9604    0.9932  0.9549  0.9449  0.9752    0.9319  0.3647  0.2304
SMOTE                Mean                  0.9707    0.9934  0.9702  0.9590  0.9072    0.9361  0.8462  0.4408
                     Median                0.9707    0.9932  0.9685  0.9590  0.8991    0.9263  0.8513  0.4215
                     ABC                   0.9707    0.9942  0.9706  0.9590  0.9012    0.9414  0.8619  0.4218
Random oversampling  Mean                  0.9647    0.9922  0.9612  0.9499  0.9123    0.9391  0.8396  0.4405
                     Median                0.9677    0.9935  0.9627  0.9545  0.9029    0.9335  0.8254  0.3897
                     ABC                   0.9677    0.9928  0.9648  0.9545  0.9123    0.9510  0.8600  0.4777
ADASYN               Mean                  0.9677    0.9925  0.9712  0.9549  0.9059    0.9421  0.8359  0.3971
                     Median                0.9707    0.9938  0.9726  0.9590  0.9021    0.9436  0.8710  0.4145
                     ABC                   0.9721    0.9938  0.9756  0.9612  0.8978    0.9375  0.8594  0.4276
Hybrid model         Mean                  0.9604    0.9925  0.9570  0.9443  0.9560    0.9448  0.7809  0.4489
                     Median                0.9604    0.9932  0.9591  0.9447  0.9200    0.9123  0.7439  0.4256
                     ABC                   0.9707    0.9949  0.9769  0.9593  0.9076    0.9534  0.8838  0.4793
The Results of Combination Models Under Three Combination Strategies (KEEL Dataset)
Ranking of GM and F1 Under Different Combination Strategies