Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 79-92    DOI: 10.11925/infotech.2096-3467.2022.0012
Prediction and Early Warning Model for Environmental Data and Circulatory System Disease Death with Machine Learning
Wang Yan, Xu Meimei, Tong Yujia, Gou Huan, Cai Rong, Shan Zhiyi, An Xinying
Institute of Medical Information, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
Abstract  

[Objective] This paper builds a prediction and early warning model for deaths from circulatory system diseases, aiming to improve disease prevention. [Methods] We retrieved death data for circulatory system diseases in a region of China from 2014 to 2018 and built prediction models with a generalized additive model (GAM), random forest (RF) and XGBoost. We then used a distributed lag non-linear model (DLNM) to estimate cumulative lag effects and built the early warning model from them. [Results] Sustained low and high temperatures, long sunshine hours and high concentrations of environmental pollutants increased the risk of death from circulatory system diseases. The cumulative weekly relative risks for temperature, sunshine hours, SO2, NO2, O3, PM10 and PM2.5 were 1.236, 1.130, 1.560, 1.062, 1.218, 1.153 and 1.796 respectively. The RF and XGBoost models performed well, with test-set RMSEs of 4.979 and 5.341. Age, sex, temperature, sunshine hours, and SO2, NO2, CO, O3, PM10 and PM2.5 concentrations served as the feature variables, and the early warning thresholds were determined from the cumulative lag effects. The early warning performed well: with the 7-day lag threshold, the sensitivity, specificity and area under the curve of the XGBoost predictions were 0.948, 0.939 and 0.941 respectively. [Limitations] Data on comorbidities and their progression still need to be added. [Conclusions] The regional number of deaths is associated with older age, male sex, temperature, sunshine hours and pollutant concentration. The new prediction and early warning model could benefit disease prevention and intervention.

Keywords: Circulatory Diseases; Prediction and Early Warning Model; XGBoost; DLNM; Random Forest
Received: 05 January 2022      Published: 16 November 2022
ZTFLH:  TP393 R122  
Fund: Medical and Health Science and Technology Innovation Project of Chinese Academy of Medical Sciences (2021-I2M-1-033)
Corresponding Author: An Xinying, ORCID: 0000-0002-9870-7009, E-mail: an.xinying@imicams.ac.cn

Cite this article:

Wang Yan, Xu Meimei, Tong Yujia, Gou Huan, Cai Rong, Shan Zhiyi, An Xinying. Prediction and Early Warning Model for Environmental Data and Circulatory System Disease Death with Machine Learning. Data Analysis and Knowledge Discovery, 2022, 6(10): 79-92.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0012     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I10/79

Flow Chart of Meteorological Sensitive Disease Prediction and Early Warning
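| Environmental factor | Minimum | Maximum | P25 (first quartile) | Mean ± SD | P75 (third quartile) |
|---|---|---|---|---|---|
| Daily maximum temperature (℃) | -2.0 | 39.0 | 14.7 | 21.972±8.747 | 28.7 |
| Daily mean temperature (℃) | -5.0 | 33.0 | 10.2 | 17.508±8.501 | 24.4 |
| Daily minimum temperature (℃) | -8.0 | 29.0 | 6.6 | 13.824±8.827 | 21.5 |
| Mean relative humidity (%) | 23.0 | 100.0 | 73.0 | 79.939±11.364 | 88.0 |
| Mean air pressure (hPa) | 892.0 | 1040.0 | 1008.5 | 1015.783±9.618 | 1022.9 |
| Mean wind speed (m/s) | 0.0 | 12.0 | 1.4 | 1.982±0.937 | 2.4 |
| Precipitation (0.1 mm) | 0.0 | 2762.0 | 0.0 | 51.621±141.541 | 37.0 |
| Sunshine hours (0.1 h) | 0.0 | 130.0 | 0.0 | 44.424±41.312 | 83.0 |
| SO2 (μg/m3) | 4.0 | 73.0 | 8.0 | 12.943±7.028 | 15.0 |
| NO2 (μg/m3) | 6.0 | 122.0 | 26.0 | 39.193±17.664 | 51.0 |
| CO (mg/m3) | 0.0 | 2.0 | 0.7 | 0.834±0.240 | 1.0 |
| O3, 8-hour concentration (μg/m3) | 6.0 | 249.0 | 65.0 | 94.829±41.288 | 121.0 |
| PM10 (μg/m3) | 7.0 | 282.0 | 38.0 | 63.083±36.275 | 79.0 |
| PM2.5 (μg/m3) | 4.0 | 219.0 | 23.0 | 39.743±25.127 | 49.0 |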
Basic Information of Daily Environmental Monitoring Data from 2014 to 2018
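The summary columns in this table (minimum, maximum, quartiles, mean ± SD) can be reproduced for any daily series with the Python standard library. The following is a minimal sketch, not the authors' code: the helper name `summarize`, the sample values, and the choice of the sample standard deviation and the "inclusive" quartile method are all assumptions.

```python
import statistics


def summarize(values):
    """Return min, max, P25, mean, SD and P75 for one daily series."""
    # "inclusive" treats the data as the full population range (an assumption).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "p25": q1,
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator (assumed)
        "p75": q3,
    }


# Hypothetical daily mean temperatures in ℃, not the study data.
daily_mean_temp = [10.0, 12.0, 14.0, 16.0, 18.0]
summary = summarize(daily_mean_temp)
print(f"{summary['mean']:.3f}±{summary['sd']:.3f}")
```

Different quartile conventions give slightly different P25 and P75 values on small samples, so the method should be matched to whatever software produced the published table.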
Time Series Diagram of Basic Information of Environmental Monitoring Data
Scatter Diagram of Various Environmental Monitoring Data and Death Data of Circulatory System
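| Correlation | Mean temperature | Mean relative humidity | Mean air pressure | Mean wind speed | Precipitation | Sunshine hours | SO2 | NO2 | CO | O3 | PM10 | PM2.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean temperature | 1.000 | 0.077 | -0.894 | 0.025 | 0.072 | 0.154 | -0.470 | -0.599 | -0.433 | 0.377 | -0.500 | -0.489 |
| Mean relative humidity | | 1.000 | -0.181 | -0.363 | 0.618 | -0.559 | -0.257 | 0.125 | 0.131 | -0.399 | -0.215 | -0.087 |
| Mean air pressure | | | 1.000 | -0.005 | -0.186 | -0.063 | 0.451 | 0.546 | 0.329 | -0.342 | 0.460 | 0.417 |
| Mean wind speed | | | | 1.000 | -0.037 | 0.097 | -0.188 | -0.472 | -0.307 | 0.049 | -0.264 | -0.300 |
| Precipitation | | | | | 1.000 | -0.519 | -0.235 | -0.018 | 0.055 | -0.316 | -0.291 | -0.194 |
| Sunshine hours | | | | | | 1.000 | 0.301 | -0.096 | -0.057 | 0.450 | 0.174 | 0.104 |
| SO2 | | | | | | | 1.000 | 0.701 | 0.633 | 0.031 | 0.782 | 0.750 |
| NO2 | | | | | | | | 1.000 | 0.687 | -0.231 | 0.740 | 0.738 |
| CO | | | | | | | | | 1.000 | -0.156 | 0.711 | 0.794 |
| O3 | | | | | | | | | | 1.000 | 0.068 | 0.031 |
| PM10 | | | | | | | | | | | 1.000 | 0.954 |
| PM2.5 | | | | | | | | | | | | 1.000 |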
Spearman Correlation Test among Environmental Monitoring Data
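A Spearman coefficient such as those in the matrix is the Pearson correlation of the ranks of the two series. The sketch below shows that calculation in plain Python; the function names and the five-day sample are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: pressure falls as temperature rises.
temperature = [5.0, 12.0, 20.0, 28.0, 33.0]
pressure = [1030.0, 1022.0, 1015.0, 1008.0, 1001.0]
rho = spearman(temperature, pressure)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient captures monotonic association without assuming a linear relationship between the variables.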
Sorting of Death Variables Based on Boruta Algorithm
Maximum relative risk of the cumulative lag effect [confidence interval] and the exposure level at which the maximum occurs:

| Variable | Lag 0 days | Lag 3 days | Lag 7 days | Lag 30 days |
|---|---|---|---|---|
| Temperature | 1.032 [1.008, 1.032] (-4 ℃) | 1.099 [1.031, 1.170] (-4 ℃) | 1.234 [1.099, 1.391] (-4 ℃) | 1.952 [1.553, 2.454] (-4 ℃) |
| Sunshine hours | 1.021 [1.004, 1.039] (13 h) | 1.062 [1.014, 1.112] (13 h) | 1.130 [1.035, 1.233] (13 h) | 1.269 [1.002, 1.584] (13 h) |
| SO2 | 1.073 [0.987, 1.166] (73 μg/m3) | 1.227 [0.986, 1.525] (73 μg/m3) | 1.560 [1.062, 2.290] (73 μg/m3) | 2.868 [1.470, 5.595] (73 μg/m3) |
| NO2 | 1.019 [0.972, 1.068] (122 μg/m3) | 1.046 [0.926, 1.182] (122 μg/m3) | 1.062 [0.857, 1.317] (6 μg/m3) | 1.027 [0.691, 1.523] (122 μg/m3) |
| CO | 0.558 [0.169, 1.841] (2.05 mg/m3) | 0.205 [0.008, 4.859] (2.05 mg/m3) | 0.046 [0.001, 16.063] (2.05 mg/m3) | 0.000 [0.000, 61.126] (2.05 mg/m3) |
| O3 | 1.043 [1.017, 1.064] (245 μg/m3) | 1.113 [1.047, 1.183] (245 μg/m3) | 1.218 [1.086, 1.367] (245 μg/m3) | 1.203 [0.912, 1.586] (245 μg/m3) |
| PM10 | 1.022 [0.972, 1.073] (280 μg/m3) | 1.064 [0.934, 1.212] (280 μg/m3) | 1.153 [0.913, 1.456] (280 μg/m3) | 2.048 [1.319, 3.181] (280 μg/m3) |
| PM2.5 | 1.105 [1.037, 1.167] (215 μg/m3) | 1.315 [1.127, 1.534] (215 μg/m3) | 1.796 [1.361, 2.370] (215 μg/m3) | 1.973 [1.429, 2.545] (215 μg/m3) |
Relative Risk of Death from Circulatory Diseases with Different Lag Time
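In a distributed lag model, lag-specific effects add up on the log scale, so a cumulative relative risk is the exponential of the summed lag-specific log relative risks. The sketch below illustrates that arithmetic and a simple check of whether an interval excludes 1. The function names and the per-lag log relative risks are hypothetical; only the two SO2 intervals are copied from the table. Confidence intervals for the cumulative effect need the coefficient covariance matrix and are not reproduced here.

```python
import math


def cumulative_relative_risk(lag_log_rr):
    """Cumulative RR over lags: exp of the sum of lag-specific log-RRs."""
    return math.exp(sum(lag_log_rr))


def excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below RR = 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical log relative risks for lags 0 to 7 at one exposure level.
lag_log_rr = [0.030, 0.028, 0.026, 0.024, 0.022, 0.020, 0.018, 0.016]
weekly_rr = cumulative_relative_risk(lag_log_rr)
print(f"cumulative RR over lags 0-7: {weekly_rr:.3f}")

# Intervals from the table: SO2 at 73 μg/m3, lag 0 and lag 7.
print(excludes_one(0.987, 1.166))  # lag 0: interval contains 1
print(excludes_one(1.062, 2.290))  # lag 7: interval lies above 1
```

Read this way, the table shows the SO2 association becoming distinguishable from no effect only once the lagged contributions accumulate.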
Environmental Monitoring Data and Death Contour Map of Circulatory System
Environmental Monitoring Data and Death Exposure Response Curve of Circulatory System
Fitting Effects of Three Prediction Models Based on Circulatory System Death Data
Fitting Curves of Three Prediction Models
| Model | Training RMSE | Training MAE | Test RMSE | Test MAE |
|---|---|---|---|---|
| GAM | 4.479 | 3.559 | 18.386 | 17.352 |
| RF | 2.150 | 1.697 | 4.979 | 4.008 |
| XGBoost | 1.273 | 0.986 | 5.341 | 4.220 |
Prediction Results of Different Models
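The two error measures in this table follow their standard definitions: RMSE is the square root of the mean squared error and MAE is the mean absolute error. A minimal sketch, with hypothetical daily death counts rather than the study data:

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical daily death counts and model predictions.
observed = [20, 25, 30, 22]
predicted = [18, 27, 33, 22]
print(f"RMSE={rmse(observed, predicted):.3f}  MAE={mae(observed, predicted):.3f}")
```

Computing both measures separately on the training and test sets, as the table does, makes a gap between fit and generalization visible, for example the GAM's test RMSE of 18.386 against a training RMSE of 4.479.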
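| Threshold compared | Model | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| Threshold without lag | GAM | 0.745 | 0.792 | 0.760 |
| Threshold without lag | RF | 0.866 | 0.832 | 0.854 |
| Threshold without lag | XGBoost | 0.879 | 0.821 | 0.862 |
| 7-day lag threshold | GAM | 0.924 | 0.935 | 0.927 |
| 7-day lag threshold | RF | 0.892 | 0.837 | 0.852 |
| 7-day lag threshold | XGBoost | 0.948 | 0.939 | 0.941 |
| P75 of deaths for each disease | GAM | 0.923 | 0.913 | 0.936 |
| P75 of deaths for each disease | RF | 0.942 | 0.916 | 0.940 |
| P75 of deaths for each disease | XGBoost | 0.952 | 0.969 | 0.967 |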
Early Warning Results of Different Models for Circulatory System Diseases
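One way to read these metrics is to treat a day as a warning day when the death count exceeds a threshold, then compare warnings derived from predicted counts against those derived from observed counts. The sketch below follows that reading under stated assumptions: the threshold of 34, the counts, and all function names are hypothetical, and the source does not confirm that the authors evaluated their warnings in exactly this way.

```python
def warning_flags(counts, threshold):
    """Flag each day whose count exceeds the warning threshold."""
    return [c > threshold for c in counts]


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    tn = sum(not a and not p for a, p in zip(actual, predicted))
    fp = sum(not a and p for a, p in zip(actual, predicted))
    return tp / (tp + fn), tn / (tn + fp)


def auc(actual, scores):
    """Chance that a warning day scores above a non-warning day (ties count 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed and predicted daily deaths, with an assumed threshold.
observed = [30, 42, 28, 50, 35, 45]
predicted = [31, 40, 36, 48, 29, 33]
threshold = 34

actual_warning = warning_flags(observed, threshold)
predicted_warning = warning_flags(predicted, threshold)
sens, spec = sensitivity_specificity(actual_warning, predicted_warning)
area = auc(actual_warning, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, which is why the table reports them separately for each threshold definition, while the AUC here is computed from the predicted counts themselves.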