
Data Analysis and Knowledge Discovery, 2018, 2(8): 10-15 https://doi.org/10.11925/infotech.2096-3467.2018.0205

Research Paper

Building Childhood Asthma Prediction Model with Artificial Neural Network and BRFSS Database*

Ma Xiaoyu1,2, Zhang Han1, Zhao Yuhong1,2

1 Department of Medical Informatics, China Medical University, Shenyang 110122, China
2 Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, Shenyang 110004, China

CLC number: R725.6 G35

Corresponding author: Zhao Yuhong, ORCID: 0000-0003-4265-6692, E-mail: joan@mail.cmu.edu.cn; zhaoyh@sj-hospital.org

Received: 2018-02-26

Revised: 2018-03-24

Published online: 2018-08-25

Copyright: 2018 Editorial Office of Data Analysis and Knowledge Discovery

Funding: *This work is one of the research outcomes of the project "Cohort Study of the Natural Population in Northeast China" (Grant No. 2017YFC0907400), funded under the "Precision Medicine" key special program of the 2017 National Key Research and Development Program of China.

Abstract

[Objective] Using the BRFSS database, this study identifies the variables most strongly associated with childhood asthma and builds a simple prediction model for childhood asthma that requires no invasive clinical indicators. [Methods] We screened variables with statistical methods, built the prediction model with a back propagation (BP) artificial neural network, and compared it with models built by traditional logistic regression, decision tree, and support vector machine methods. [Results] Four variables entered the final model: history of asthma, correct use of an inhaler, age at diagnosis, and family income. The BP artificial neural network model reached an accuracy of 0.723, a sensitivity of 0.697, and a specificity of 0.680. [Limitations] BRFSS is a call-back survey with missing data, which may affect prediction performance to some extent. [Conclusions] For asthma, a disease with many influencing factors and complex relationships among them, the BP artificial neural network can make better use of its strong adaptivity, and it produced the best-performing childhood asthma prediction model in this study.

Keywords: Back Propagation; Artificial Neural Network; Childhood Asthma; Prediction Model


Ma Xiaoyu, Zhang Han, Zhao Yuhong. Building Childhood Asthma Prediction Model with Artificial Neural Network and BRFSS Database[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 10-15 https://doi.org/10.11925/infotech.2096-3467.2018.0205

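1 Introduction

Asthma is one of the most common chronic diseases in children, affecting 9.6%-13.0% of children worldwide and imposing a heavy economic and social burden[1]. The 2017 Global Initiative for Asthma (GINA) report[2] notes that children, especially infants and young children aged 5 years and under, cannot undergo lung function testing and that coughing and wheezing are common at this age, so there is still no gold standard for diagnosing childhood asthma. At present, clinicians' diagnosis of childhood asthma has a sensitivity of only 29.0% and a positive predictive value of 23.0%, so misdiagnosis is very common[3]. Many risk-factor-based prediction models for childhood asthma have therefore emerged, but no single model can both identify children at high risk and rule out children at low risk; that is, no model achieves high sensitivity and high specificity at the same time[4]. Clinical practice urgently needs a childhood asthma prediction model that is built on a relatively large dataset and predicts well.

The US Centers for Disease Control and Prevention (CDC) established the Behavioral Risk Factor Surveillance System (BRFSS)[5] in 1984 to study the effects of behavioral risk factors on disease. Conducted by telephone, it completes more than 400,000 interviews each year; it is the CDC's primary tool for monitoring behavioral risk factors and the largest behavioral risk factor surveillance system in the world. A call-back survey on childhood asthma (Asthma Call-Back Survey, ACBS) was added in 2006. This study uses the ACBS data to select highly relevant predictors and to build a childhood asthma prediction model based mainly on behavioral risk factors. The model requires no invasive clinical indicators, is simple to apply, and offers patients and their families guidance and advice on asthma prevention.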

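2 Related Work

Existing childhood asthma prediction models fall into two groups: those based on general population cohorts, which predict whether a child will have asthma at school age[6,7]; and those based on infants and young children who have recurrent symptoms such as coughing and wheezing or who have already been diagnosed with asthma, which predict whether the child will have persistent asthma at school age[8,9]. Most models combine clinical indicators, characteristic indicators, and behavioral risk indicators. Clinical indicators (such as FeNO) are hard to obtain from children aged 5 years and under, compliance is poor, and the values are mostly estimates[2]. Characteristic indicators (such as the number of wheezing episodes in the past month) mostly come from parent-completed questionnaires and are subject to serious recall and reporting bias. Behavioral risk indicators (such as the socioeconomic environment) are easier to obtain for children and more reliable, and their association with disease has been increasingly confirmed[10,11]. This paper therefore uses behavioral risk indicators to build a simple childhood asthma prediction model.

Since the early 2000s, a growing number of researchers have used machine learning to assess the onset, progression, and prognosis of asthma and other diseases. Finkelstein et al.[12] used Bayesian networks and decision trees to build a 7-day monitoring system for childhood asthma that predicts whether asthma will worsen; Kim et al.[13] built a breast cancer recurrence prediction model based on support vector machines; Mu Dongmei et al.[14] used decision trees to build a risk-factor prediction model for pregnancy-induced hypertension syndrome. This paper applies a machine learning method with strong nonlinear mapping capability[15] to compensate for the inability of traditional logistic regression to handle collinearity and its tendency to overfit.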

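3 Construction Scheme for the Asthma Prediction Model

Studies show that four in five asthma patients develop wheezing symptoms before the age of 6, most of them before the age of 3[16]. Childhood asthma can resolve on its own: as children grow and their airways mature, symptoms gradually ease. However, one third of children continue to have asthma-like symptoms at age 6 and beyond[17], and if the disease persists uncontrolled into school age, lifelong treatment is needed. Existing childhood asthma prediction models therefore mostly enroll children aged 0-3 years and assess the outcome at 6-12 years. This study sets its enrollment and outcome ages accordingly. The research workflow is shown in Figure 1.

Figure 1   Research workflow for building a childhood asthma prediction model based on the BRFSS database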

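(1) Download data: From the Survey Data & Documentation module of the BRFSS database, we selected the Asthma Call-Back Survey (ACBS) sub-database and downloaded the child call-back survey data for 2011, 2012, and 2014. (BRFSS did not provide the 2013 child asthma call-back data at the time of download, so that year was not included.)

(2) Select variables: Candidate variables were the risk factors for childhood asthma suggested in the 2017 GINA guidelines[2], the risk factors included in previous studies[6,8,18], and the risk factors recommended by clinical pediatric experts. The collated data were entered into an Excel spreadsheet.

(3) Screen feature variables: To test whether each feature variable affects the outcome variable, we first ran a univariate analysis (independent-samples t-test or chi-square test) on each variable in SPSS 19.0 for initial screening, and then entered the variables retained by the univariate analysis into a logistic regression. Variables with p<0.05 were considered statistically significant and were included in the final prediction model.

(4) Build the asthma prediction model: We used supervised machine learning to build the model, working in Python within Eclipse. We first built a model with all variables and then a model with only the highly relevant variables obtained from screening, and compared the two to assess the value of variable screening.

(5) Evaluate the models: We compared the accuracy, sensitivity, and specificity of the artificial neural network model with those of traditional logistic regression, decision tree, and support vector machine models, using the Weka 3.6.0 package. Weka is data mining and machine learning software written in Java that includes a full set of data processing tools, learning algorithms, and evaluation methods[19].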
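The chi-square test named in step (3) can be illustrated with a small sketch. The study ran its tests in SPSS; the function below is only a standard-library illustration of the Pearson chi-square statistic for a 2x2 table (without continuity correction), and the counts and the function name are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Critical value of the chi-square distribution for p = 0.05 with 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841

# Hypothetical counts: rows = parent has asthma (yes/no),
# columns = child still has asthma at school age (yes/no).
stat = chi_square_2x2(60, 20, 40, 40)
print(round(stat, 3), stat > CRITICAL_0_05_DF1)
```

A variable whose statistic exceeds the critical value would pass the p<0.05 screen and move on to the logistic regression stage.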

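4 Research Process

4.1 Building the Asthma Prediction Model

(1) Study subjects

From the downloaded data, we selected children who were first diagnosed with asthma by a doctor at 0-3 years of age (How old was [child's name] when a doctor or other health professional first said [he/she] had asthma?) and who were 6-12 years old at the time of the asthma call-back survey, 810 children in total.

(2) Study variables

We extracted 19 risk factors for predicting childhood asthma from the ACBS database, divided into continuous (numeric) and categorical variables, as shown in Table 1.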

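Table 1   Predictor variables for childhood asthma

| Data type | Variable | Name in ACBS | Number of categories / value range |
| --- | --- | --- | --- |
| Categorical | Child's sex | RCSGENDR | 2 |
| Categorical | Child's race | @_RACE | 4 |
| Categorical | Whether a doctor taught inhaler use | INHALERH | 3 |
| Categorical | Whether the child has insurance | INS_TYP | 2 |
| Categorical | Whether the parent has insurance | INS1 | 2 |
| Categorical | Passive smoking | SMOKE | 2 |
| Categorical | Parental education | @_EDUCAG | 3 |
| Categorical | Family income | @_INCOMG | 6 |
| Categorical | Parental heart disease (or myocardial infarction) | CVDINFR4 | 2 |
| Categorical | Parental kidney disease | CHCKIDNY | 2 |
| Categorical | Parental diabetes | DIABETE3 | 2 |
| Categorical | Parental COPD or chronic bronchitis | CHCCOPD1 | 2 |
| Categorical | Parental arthritis | HAVARTH3 | 2 |
| Categorical | Parental depression | ADDEPEV2 | 2 |
| Categorical | Parental asthma | ASTHMA3 | 2 |
| Numeric | Child's age at diagnosis | AGEDX | 0-3 |
| Numeric | Child's birth month | BRTHMNTH | 1-12 |
| Numeric | Child's birth weight | BIRTHW1 | 0.7-12 |
| Numeric | Mother's age at delivery | AGEM | 9-65 |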

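Race has four categories: White (non-Hispanic), Black (non-Hispanic), Hispanic, and other. Inhaler instruction has three categories: taught, not taught, and never used an inhaler. Parental education has three categories: college or above, high school, and below high school. Family income has six categories: annual income below USD 15,000, below USD 25,000, below USD 35,000, below USD 50,000, above USD 50,000, and other. All of these categories follow the BRFSS coding of the variables.

Of the 19 variables, 17 are taken directly from the database (direct variables) and 2 are computed. Passive smoking (SMOKE) is derived from the child's age at diagnosis (AGEDX), the child's current age (CHILDAGE), and the time since the parent last smoked (LASTSMK2), and indicates whether the child was exposed to passive smoking at the time of diagnosis. Mother's age at delivery (AGEM) is derived from the parent's age (AGE), the parent's sex (SEX), and the child's age (CHILDAGE).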
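The derivation of AGEM can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that the respondent is the child's mother when SEX is coded as female, that female is coded as 2, and that both ages are in whole years. The function name is hypothetical.

```python
from typing import Optional

FEMALE = 2  # assumed coding of SEX (1 = male, 2 = female)

def mother_age_at_birth(age: int, sex: int, child_age: int) -> Optional[int]:
    """Return the respondent's age when the child was born, or None if the respondent is not female."""
    if sex != FEMALE:
        return None  # AGEM cannot be derived when the respondent is not the mother
    return age - child_age

print(mother_age_at_birth(age=35, sex=2, child_age=8))  # 27
```

Records for which the function returns None would have a missing AGEM value under this assumption.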

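(3) Outcome variable

The outcome variable is @_CUR_ASTH_C (STILL HAVE ASTHMA) from the database, which indicates whether the child still had asthma at the time of the asthma call-back survey (6-12 years of age). Of the 810 subjects, 526 had asthma at school age and 284 did not, a ratio of about 1.852:1, so the classes are reasonably balanced.

(4) Feature variable selection

To improve the accuracy of the prediction model, the candidate variables were screened for highly relevant factors. The variables retained by the univariate analysis (p<0.05) were: whether a doctor taught inhaler use, passive smoking, family income, parental COPD/chronic bronchitis, parental asthma, the child's age at diagnosis, and the child's birth month. Of these, logistic regression retained the following predictors (p<0.05): parental asthma, whether a doctor taught inhaler use, the child's age at diagnosis, and family income. The screening results are shown in Table 2.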

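Table 2   Variables retained by logistic regression

| Variable | B | S.E. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| ASTHMA3 | -0.814 | 0.220 | 0.443 | 0.288 | 0.682 | 0.000 |
| INHALERH | -0.263 | 0.040 | 0.769 | 0.711 | 0.831 | 0.000 |
| AGEDX | 0.159 | 0.070 | 1.173 | 1.022 | 1.345 | 0.023 |
| @_INCOMG | -0.094 | 0.042 | 0.910 | 0.839 | 0.988 | 0.025 |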
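The relationship between the columns of Table 2 can be checked with a short sketch. It assumes the usual Wald interval, Exp(B ± 1.96 × S.E.), which the source does not state explicitly; the function name is hypothetical and the B and S.E. values are copied from the table.

```python
import math

# (B, S.E.) pairs copied from Table 2.
coefficients = {
    "ASTHMA3": (-0.814, 0.220),
    "INHALERH": (-0.263, 0.040),
    "AGEDX": (0.159, 0.070),
    "@_INCOMG": (-0.094, 0.042),
}

def odds_ratio_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper) using the Wald interval exp(B +/- z * S.E.)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

for name, (b, se) in coefficients.items():
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    print(f"{name}: Exp(B)={odds_ratio:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Under this assumption the computed values agree with the table to within rounding of the third decimal place. An Exp(B) below 1 (ASTHMA3, INHALERH, @_INCOMG) means that a higher coded value of the variable is associated with lower odds of the outcome, and an Exp(B) above 1 (AGEDX) with higher odds.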

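(5) Data preprocessing

Records with missing values in any of the 19 predictor variables were deleted, as were records with recording errors, for example contradictory entries in which the "relationship of the parent to the child" is "mother" but the "parent's sex" is "male". The data were discretized with Weka's Discretize filter (full name weka.filters.unsupervised.attribute.Discretize) using equal-frequency binning, so that the samples are distributed as evenly as possible across the bins[19].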
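Equal-frequency binning can be sketched in a few lines. The code below is not Weka's implementation: it is a simplified, rank-based illustration that assigns sorted values to bins of (nearly) equal size and, unlike a cut-point approach, may place tied values in different bins. The function name and the sample values are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 so that bins hold (nearly) equal numbers of samples."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

ages = [22, 35, 19, 41, 28, 30, 25, 38, 33]  # hypothetical AGEM values
print(equal_frequency_bins(ages, 3))  # [0, 2, 0, 2, 1, 1, 0, 2, 1]
```

Each of the three bins receives three of the nine values, whereas equal-width binning over the same range could leave some bins nearly empty.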

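The data were then split at a ratio of about 7:3[20] into a training set (N1=567) and a test set (N2=243). The training set was used to train and tune the network and build the prediction model, and the test set was used to evaluate model performance. The split was performed in Python with the train_test_split function.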
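The split sizes can be reproduced with a small sketch. The study used train_test_split; the code below does not call that function and only illustrates a shuffled 70/30 split with the standard library. The function name, the seed, and the rounding rule are assumptions.

```python
import random

def split_train_test(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_train_test(810)
print(len(train_idx), len(test_idx))  # 567 243
```

Fixing the seed makes the split repeatable, which matters when models built on different variable sets are compared on the same test samples.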

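(6) Building the prediction model and evaluating its performance

An artificial neural network is an engineered system that simulates the structure and behavior of the human brain, based on an understanding of the organization and operating mechanisms of its neural tissue[21]. The type most widely used in medicine is the back propagation (BP) neural network, first proposed by the American scholar Rumelhart et al.[22]. During training, the difference between the output and the actual value (the error) is repeatedly propagated back through the network to adjust the weights between layers, so as to minimize the error between the predicted and actual values[23,24].

① Forward pass: starting from the initial weights $W_{ij}$ and biases $\theta_j$, each neuron $j$ computes the weighted sum of its inputs plus its bias, $I_j=\sum_i W_{ij}O_i+\theta_j$, and applies the nonlinear activation function $O_j=\frac{1}{1+e^{-I_j}}$ to obtain its output;

② Backward pass: the output is compared with the true value $T_j$, and each connection weight is updated to reduce the error $Err_j$. For the output layer, $Err_j=O_j(1-O_j)(T_j-O_j)$; for a hidden layer, $Err_j=O_j(1-O_j)\sum_k Err_k W_{jk}$. The bias is updated as $\theta_j=\theta_j+(l)Err_j$ and the weight as $W_{ij}=W_{ij}+(l)Err_jO_i$, where $l$ is the learning rate;

③ Stopping conditions: the weight updates fall below a threshold, the prediction error rate falls below a threshold, or the number of iterations reaches a preset value.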
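Steps ① and ② can be traced on a very small network. The sketch below is a standard-library illustration of the update rules for one hidden layer and a single output unit; it is not the authors' implementation, and the function name, initial weights, learning rate, and training sample are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    """One forward/backward pass; updates the weights in place and returns the pre-update output.

    w_hidden[i][j] is the weight from input i to hidden unit j, w_out[j] the weight
    from hidden unit j to the output unit, and b_out a one-element list holding its bias.
    """
    n_hidden = len(b_hidden)
    # Forward pass: I_j = sum_i W_ij * O_i + theta_j, then O_j = sigmoid(I_j).
    hidden = [sigmoid(sum(x[i] * w_hidden[i][j] for i in range(len(x))) + b_hidden[j])
              for j in range(n_hidden)]
    out = sigmoid(sum(hidden[j] * w_out[j] for j in range(n_hidden)) + b_out[0])
    # Backward pass: output-layer error, then hidden-layer errors (using pre-update weights).
    err_out = out * (1 - out) * (target - out)
    err_hidden = [hidden[j] * (1 - hidden[j]) * err_out * w_out[j] for j in range(n_hidden)]
    # Updates: W_ij += l * Err_j * O_i and theta_j += l * Err_j.
    for j in range(n_hidden):
        w_out[j] += lr * err_out * hidden[j]
        for i in range(len(x)):
            w_hidden[i][j] += lr * err_hidden[j] * x[i]
        b_hidden[j] += lr * err_hidden[j]
    b_out[0] += lr * err_out
    return out

w_h = [[0.2, -0.3], [0.4, 0.1]]
b_h = [0.0, 0.0]
w_o = [0.3, -0.2]
b_o = [0.1]
# Repeat the update on one sample with target 1.0; a fixed iteration count is the stopping rule.
outputs = [train_step([1.0, 0.0], 1.0, w_h, b_h, w_o, b_o) for _ in range(200)]
print(round(outputs[0], 3), round(outputs[-1], 3))
```

The output moves toward the target as the updates accumulate, which is the behavior that stopping condition ③ monitors.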

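The artificial neural network model in this paper was built with the sklearn library in Python, a machine learning library based on numpy and scipy that provides Python implementations of most machine learning algorithms. The core code is as follows.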

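    import numpy as np

    # Activation functions and their derivatives. The original listing does not
    # show these helpers; the definitions below are assumed.
    def tanh(x):
        return np.tanh(x)

    def tanh_deriv(x):
        return 1.0 - np.tanh(x) ** 2

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    def logistic_derivative(x):
        return logistic(x) * (1.0 - logistic(x))

    class NeuralNetwork:
        def __init__(self, layers, activation='tanh'):
            # layers lists the number of units in each layer, e.g. [4, 3, 1].
            if activation == 'logistic':
                self.activation = logistic
                self.activation_deriv = logistic_derivative
            elif activation == 'tanh':
                self.activation = tanh
                self.activation_deriv = tanh_deriv
            self.weights = []
            # Random initial weights in [-0.25, 0.25); the "+ 1" adds a bias unit.
            # As written, the shapes are consistent for a three-layer network.
            for i in range(1, len(layers) - 1):
                self.weights.append((2 * np.random.random((layers[i - 1] + 1, layers[i] + 1)) - 1) * 0.25)
                self.weights.append((2 * np.random.random((layers[i] + 1, layers[i + 1])) - 1) * 0.25)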

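Chatzimichail et al.[9] used an artificial neural network to predict whether children diagnosed with asthma before the age of 5 would still have asthma at 7-14 years. Their model reached an accuracy of 94.8%, which shows that this machine learning method is feasible for childhood asthma prediction. However, their study included only 112 children, so the generalization ability of the model remains to be verified. The present study addresses this limitation of small sample size by analyzing a larger sample from the BRFSS database and building the prediction model with a BP artificial neural network.

In Eclipse, the evaluation metrics of the prediction model were output with the metrics package in Python. On the 243 test samples, the model built with all variables reached an accuracy of 0.637, a sensitivity of 0.630, and a specificity of 0.556; the model built with the highly relevant variables reached an accuracy of 0.723, a sensitivity of 0.697, and a specificity of 0.680. These results indicate that the irrelevant variables reduced predictive performance and support the value of variable screening. Given that most existing models cannot achieve high sensitivity and high specificity at the same time[4], the performance of the final model in this study is above average, and the model is simple to apply with good compliance.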
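The three evaluation metrics can be computed directly from predicted and true labels. The sketch below uses only the standard library rather than the metrics package used in the study; the labels are hypothetical and the function name is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of true positives that are detected
    specificity = tn / (tn + fp)  # share of true negatives that are ruled out
    return accuracy, sensitivity, specificity

# Hypothetical labels: 1 = still has asthma at school age, 0 = does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
acc, sens, spec = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.7 0.667 0.75
```

Sensitivity reflects how well the model identifies children who will still have asthma, and specificity how well it rules out those who will not.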

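4.2 Comparison Experiments

Prediction models were also built with logistic regression, decision tree, and support vector machine methods, and their results were compared with those of the model in this study. In Weka, we used the corresponding classifiers Logistic, J48, and SMO, all with default parameters. (Common decision tree algorithms include ID3 and J48. ID3 selects the test attribute by information gain, which favors attributes with many branches even though such attributes are not necessarily the best. J48 selects the test attribute by information gain ratio, which gives a more reliable choice[25], so J48 was used.) The accuracy and performance of the four models are shown in Table 3.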

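Table 3   Predictive performance of the models built with the four algorithms

| Method | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- |
| BP artificial neural network | 0.723 | 0.697 | 0.680 |
| Logistic regression | 0.702 | 0.712 | 0.492 |
| Decision tree | 0.691 | 0.708 | 0.545 |
| Support vector machine | 0.696 | 0.712 | 0.523 |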
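The difference between information gain (ID3) and information gain ratio (J48), described in Section 4.2, can be illustrated with a toy example. The sketch below is not Weka's code; the data and function names are hypothetical. It compares an attribute with one branch per sample against a two-valued attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_and_ratio(feature, labels):
    """Return (information gain, gain ratio) of splitting `labels` by `feature`."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the partition itself
    return gain, (gain / split_info if split_info > 0 else 0.0)

labels = [1, 1, 0, 0, 1, 0, 1, 0]
record_id = list(range(8))          # unique per sample: a highly branching attribute
binary = [1, 1, 0, 0, 1, 0, 1, 1]   # a two-valued attribute

gain_id, ratio_id = info_gain_and_ratio(record_id, labels)
gain_bin, ratio_bin = info_gain_and_ratio(binary, labels)
print(round(gain_id, 3), round(ratio_id, 3))
print(round(gain_bin, 3), round(ratio_bin, 3))
```

In this example information gain ranks the highly branching attribute first, while gain ratio ranks the two-valued attribute first, because the gain is divided by the entropy of the split itself.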

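4.3 Analysis of Results

Table 3 shows that the BP artificial neural network model has the highest accuracy and specificity of the four algorithms, while its sensitivity is up to about 0.015 lower than that of the other three models. This helps address the difficulty existing models have in achieving high sensitivity and high specificity together, and suggests that the BP artificial neural network adapts better to the data and yields a model with better predictive performance.

Existing childhood asthma prediction models mostly rely on traditional logistic regression. By its principles[26], however, logistic regression requires conditions such as independence among the variables and cannot handle collinearity. Asthma is a complex disease shaped jointly by genetics, environmental exposure, socioeconomic status, and other factors, and its predictors can be collinear. Existing models are therefore limited, which helps explain their generally poor predictive performance. An artificial neural network, by contrast, is a highly complex nonlinear adaptive system made up of many processing units. It places no requirements on variable type or distribution, can model arbitrary mappings between inputs (independent variables) and outputs (dependent variables), processes information in a massively parallel way, copes well with collinearity and with interactions among variables, and is well suited to fuzzy, noisy, and nonlinear data[27]. At present, few studies have applied artificial neural networks to childhood asthma prediction, far fewer than have applied logistic regression. This study also has limitations. BRFSS is a telephone-based call-back survey; its response rate leaves room for improvement and the data contain missing values, which affects the predictive performance of the model to some extent.

5 Conclusion

This paper predicts whether children aged 0-3 years will have asthma at school age. Variables were screened with statistical analysis and a childhood asthma prediction model was built with a BP neural network, addressing the weak predictive performance of existing models: the model achieved higher accuracy and specificity than the logistic regression, decision tree, and support vector machine models built on the same data. This study applies the BRFSS behavioral surveillance database to this task for the first time. The behavioral risk indicators it uses are easy to obtain and have good compliance, avoid the drawbacks of clinical and characteristic indicators, and allow advice and guidance on asthma prevention to be given to patients and their families in terms of behavioral factors. Future work could add patients from other sources as an external validation set to further improve the model.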

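Author Contributions

Ma Xiaoyu: designed the study, analyzed the data, performed the modeling experiments, and wrote the paper;

Zhang Han: revised the paper;

Zhao Yuhong: proposed the research idea and revised the final version of the paper.

Conflict of Interest

The authors declare that they have no conflict of interest.

Supporting Data

The supporting data are stored by the authors, E-mail: 694558951@qq.com.

[1] Ma Xiaoyu. Child-ACBS.sav. Raw data of the BRFSS child asthma call-back survey.

[2] Ma Xiaoyu. Data.csv. Modeling data for children with asthma.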


References

[1] Valet R S, Gebretsadik T, Carroll K N, et al. High Asthma Prevalence and Increased Morbidity Among Rural Children in a Medicaid Cohort[J]. Annals of Allergy Asthma & Immunology, 2011, 106(6): 467-473.

[2] GINA. Global Strategy for Asthma Management and Prevention updated 2017[R/OL]. [2017-12-25].

[3] Caudri D, Wijga A, Schipper C M A, et al. Predicting the Long-Term Prognosis of Children with Symptoms Suggestive of Asthma at Preschool Age[J]. The Journal of Allergy and Clinical Immunology, 2009, 124(5): 903-910. https://doi.org/10.1016/j.jaci.2009.06.045

[4] Smit H A, Pinart M, Antó J M, et al. Childhood Asthma Prediction Models: A Systematic Review[J]. Lancet Respiratory Medicine, 2015, 3(12): 973-984. https://doi.org/10.1016/S2213-2600(15)00428-2

[5] Behavioral Risk Factor Surveillance System[DB/OL]. [2017-11-28].

[6] Castro-Rodríguez J A, Holberg C J, Wright A L, et al. A Clinical Index to Define Risk of Asthma in Young Children with Recurrent Wheezing[J]. American Journal of Respiratory and Critical Care Medicine, 2000, 162(4): 1403-1406. https://doi.org/10.1164/ajrccm.162.4.9912111

[7] Brand P L. The Asthma Predictive Index: Not a Useful Tool in Clinical Practice[J]. The Journal of Allergy and Clinical Immunology, 2011, 127(1): 293-294. https://doi.org/10.1016/j.jaci.2010.10.012

[8] Van Der Mark L B, Van Wonderen K E, Mohrs J, et al. Predicting Asthma in Preschool Children at High Risk Presenting in Primary Care: Development of a Clinical Asthma Prediction Score[J]. Primary Care Respiratory Journal, 2014, 23(1): 52-59. https://doi.org/10.4104/pcrj.2014.00003

[9] Chatzimichail E, Paraskakis E, Rigas A. An Evolutionary Two-Objective Genetic Algorithm for Asthma Prediction[C]// Proceedings of 2013 UKSim 15th International Conference on Computer Modelling and Simulation, Cambridge, United Kingdom. US: IEEE, 2013.

[10] Liu Miaomiao, Wang Da, Ren Wanhui, et al. Relationship Between Pet Keeping and Childhood Asthma in Shenyang City[J]. Chinese Journal of Public Health, 2012, 28(11): 1420-1430. https://doi.org/10.11847/zgggws2012-28-11-11

[11] Zhou Yanfeng. Risk Factors of Bronchial Asthma and Related Nursing Strategies[J]. Contemporary Medicine, 2011, 17(27): 133-134. https://doi.org/10.3969/j.issn.1009-4393.2011.27.094

[12] Finkelstein J, Jeong I C. Machine Learning Approaches to Personalize Early Prediction of Asthma Exacerbations[J]. Annals of the New York Academy of Sciences, 2017, 1387(1): 153-165. https://doi.org/10.1111/nyas.13218

[13] Kim W, Kim K S, Lee J E, et al. Development of Novel Breast Cancer Recurrence Prediction Model Using Support Vector Machine[J]. Journal of Breast Cancer, 2012, 15(2): 230-238. https://doi.org/10.4048/jbc.2012.15.2.230

[14] Mu Dongmei, Ren Ke. Discovering Knowledge from Electronic Medical Records with Three Data Mining Algorithms[J]. New Technology of Library and Information Service, 2016(6): 102-109.

[15] Li Lixia, Zhang Min, Gao Yanhui, et al. Application of Artificial Neural Network in Medical Research[J]. Journal of Mathematical Medicine, 2009, 22(1): 80-82.

[16] Bisgaard H, Szefler S. Prevalence of Asthma-Like Symptoms in Young Children[J]. Pediatric Pulmonology, 2007, 42(8): 723-728. https://doi.org/10.1002/ppul.v42:8

[17] Savenije O E, Granell R, Caudri D, et al. Comparison of Childhood Wheezing Phenotypes in 2 Birth Cohorts: ALSPAC and PIAMA[J]. The Journal of Allergy and Clinical Immunology, 2011, 127(6): 1505-1512. https://doi.org/10.1016/j.jaci.2011.02.002

[18] Caudri D, Wijga A, Maarten C, et al. Predicting the Long-Term Prognosis of Children with Symptoms Suggestive of Asthma at Preschool Age[J]. The Journal of Allergy and Clinical Immunology, 2009, 124(5): 903-910. https://doi.org/10.1016/j.jaci.2009.06.045

[19] Yuan Meiyu. Data Mining and Machine Learning: WEKA Application Technology and Practice[M]. Beijing: Tsinghua University Press, 2014.

[20] Kumar Y, Sahoo G. Prediction of Different Types of Liver Diseases Using Rule Based Classification Model[J]. Technology & Health Care, 2013, 21(5): 417-432. https://doi.org/10.3233/THC-130742

[21] Fang Jiqian. Advanced Medical Statistics[M]. Beijing: People's Medical Publishing House, 2002: 708-718.

[22] Rumelhart D E, Hinton G E, Williams R J. Learning Representations by Back-Propagating Errors[J]. Nature, 1986, 323: 533-536. https://doi.org/10.1038/323533a0

[23] Dayhoff J E, Deleo J M. Artificial Neural Networks[J]. Cancer, 2001, 91(8): 1615-1634. https://doi.org/10.1002/(ISSN)1097-0142

[24] Cross S S, Harrison R F, Kennedy R L. Introduction to Neural Networks[J]. Lancet, 1995, 346(8982): 1075-1079. https://doi.org/10.1016/S0140-6736(95)91746-2

[25] Zhang Liangjun, Wang Lu, Tan Liyun, et al. Python Practice of Data Analysis and Mining[M]. Beijing: China Machine Press, 2015.

[26] Sun Zhenqiu. Medical Statistics[M]. Beijing: People's Medical Publishing House, 2007: 333-341.

[27] Zhang Liangjun, Cao Jing, Cai Shizhong. Neural Network Practical Guide[M]. Beijing: China Machine Press, 2008: 31-36.
