Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 11: 63-73     https://doi.org/10.11925/infotech.2096-3467.2020.0469
Research Paper
Predicting Hospital Readmissions with Deep Learning: Case Study of Heart Diseases
Da Jingwei¹, Yan Jiaqi¹, Deng Sanhong¹,², Wang Zhongmin³
¹School of Information Management, Nanjing University, Nanjing 210023, China
²Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
³Jiangsu Province Hospital (The First Affiliated Hospital of Nanjing Medical University), Nanjing 210029, China
Abstract

[Objective] This paper uses deep learning to predict patient readmissions from electronic medical records, aiming to improve prediction accuracy and provide a reference for hospital management. [Methods] We propose a model that fuses structured and unstructured data: a character-level convolutional neural network learns from the unstructured text, and this is combined with structured data (demographic, clinical, and administrative data) to predict hospital readmissions. [Results] The deep learning model combining structured and unstructured data performed best, with an F1-score of 0.735. This is a relative improvement of 12.9% over the model using only structured data and of about 2.1% over the model using only unstructured data. [Limitations] The experimental dataset contains only part of the medical records of patients from a single hospital, which may affect the prediction results. [Conclusions] The proposed model achieves good predictive performance and can serve as a reference for researchers working on readmission prediction and for hospital administrators.

Key words: Hospital Readmission; Deep Learning; Heart Disease; Predictive Analysis
Received: 2020-05-27      Published: 2020-09-02
CLC number: TP391
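Funding: *This paper is one of the research outcomes of the National Natural Science Foundation of China Youth Project "Research on Blockchain-Based Intelligent System Models in Supply Chain Quality Management" (No. 71701091) and the Ministry of Education Humanities and Social Sciences Youth Project "Research on Knowledge Representation Methods for Information Resources in Blockchain Virtual Organizations" (No. 17YJC870020).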
Corresponding author: Yan Jiaqi     E-mail: jiaqiyan@nju.edu.cn
Cite this article:
Da Jingwei, Yan Jiaqi, Deng Sanhong, Wang Zhongmin. Predicting Hospital Readmissions with Deep Learning: Case Study of Heart Diseases. Data Analysis and Knowledge Discovery, 2020, 4(11): 63-73.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0469      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/63
Fig.1  Research framework
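Feature subset | Feature | Data type | Description
Demographic data | Gender | Categorical | Male (2,394, 65.4%); Female (1,266, 34.6%)
Demographic data | Marital status | Categorical | Married (3,510, 95.9%); Unmarried (150, 4.1%)
Clinical data | Systolic blood pressure | Numeric | Mean = 129.582; Variance = 18.512
Clinical data | Diastolic blood pressure | Numeric | Mean = 72.314; Variance = 12.130
Administrative data | Length of stay (days) | Numeric | Mean = 14.879; Variance = 10.311
Administrative data | ICD-10 code | Categorical | I25.101 (57.6%); I25.105 (29.3%); Other (13.1%)
Administrative data | Metoprolol use | Categorical | Yes (1,197, 32.7%); No (2,463, 67.3%)
Administrative data | Irbesartan use | Categorical | Yes (215, 5.9%); No (3,445, 94.1%)
Administrative data | Surgery | Categorical | Yes (201, 5.5%); No (3,459, 94.5%)
Table 1  Description of the structured data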
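The excerpt does not say how the structured features in Table 1 were encoded before modeling. The following is a minimal sketch under one common assumption: numeric variables are z-scored, binary variables become a single 0/1 value, and the ICD-10 code is one-hot encoded into the two listed codes plus an "other" bucket. All field names and the two toy records are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical field names: the paper does not publish its data schema.
NUMERIC = ["systolic_bp", "diastolic_bp", "length_of_stay"]
# Binary fields and the value that is encoded as 1.0
BINARY = {"gender": "male", "married": "yes", "metoprolol": "yes",
          "irbesartan": "yes", "surgery": "yes"}
# The two most frequent ICD-10 codes in Table 1; every other code goes to "other"
ICD10_CODES = ["I25.101", "I25.105"]


def fit_scalers(records):
    """Return {field: (mean, std)} for z-scoring; a zero std is replaced by 1."""
    scalers = {}
    for field in NUMERIC:
        values = [r[field] for r in records]
        sd = pstdev(values)
        scalers[field] = (mean(values), sd if sd > 0 else 1.0)
    return scalers


def encode(record, scalers):
    """Map one patient record to a flat list of floats."""
    vec = [(record[f] - scalers[f][0]) / scalers[f][1] for f in NUMERIC]
    vec += [1.0 if record[f] == positive else 0.0 for f, positive in BINARY.items()]
    code = record["icd10"]
    vec += [1.0 if code == c else 0.0 for c in ICD10_CODES]
    vec.append(0.0 if code in ICD10_CODES else 1.0)  # "other" bucket
    return vec


# Two made-up records for illustration only
records = [
    {"systolic_bp": 130, "diastolic_bp": 72, "length_of_stay": 14,
     "gender": "male", "married": "yes", "metoprolol": "yes",
     "irbesartan": "no", "surgery": "no", "icd10": "I25.101"},
    {"systolic_bp": 110, "diastolic_bp": 60, "length_of_stay": 8,
     "gender": "female", "married": "yes", "metoprolol": "no",
     "irbesartan": "no", "surgery": "yes", "icd10": "I21.0"},
]
scalers = fit_scalers(records)
vectors = [encode(r, scalers) for r in records]
```

In practice the scalers would be fitted on the training split only and then reused unchanged for validation and test records, so that no information leaks from evaluation data.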
Fig.2  Example of unstructured text data
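Variable | χ² | Spearman correlation coefficient | p
Gender | 1.493 | - | 0.224
Surgery | 34.905 | - | 0.000
ICD-10 code | 81.145 | - | 0.000
Marital status | 41.540 | - | 0.000
Metoprolol use | 88.998 | - | 0.000
Irbesartan use | 5.284 | - | 0.024
Length of stay | - | 0.391 | 0.000
Systolic blood pressure | - | -0.095 | 0.000
Diastolic blood pressure | - | -0.163 | 0.000
Table 2  Correlation analysis between the independent variables and the dependent variable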
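Table 2 uses the χ² test for categorical variables and the Spearman correlation for numeric ones. A minimal pure-Python sketch of both statistics follows. It assumes a binary readmission outcome and covers only the 2×2 case of the χ² test (Pearson's statistic without continuity correction); a variable with more levels, such as the ICD-10 code, would need a general r×c table. The excerpt does not state which χ² variant the authors used, and the function names are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and its p-value
    for the 2x2 contingency table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)), Z standard normal.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def average_ranks(xs):
    """1-based ranks of xs; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Made-up example: length of stay against a 0/1 readmission label
los = [5, 8, 12, 20, 25, 30]
readmitted = [0, 0, 1, 0, 1, 1]
rho = spearman(los, readmitted)
```

Average ranks matter here because the outcome is binary, so half of its values are tied.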
Fig.3  Network architecture of the fusion model (SUCM)
Parameter | Value
Word embedding dimension | 64
Sentence dimension | 300
Number of filters | 32
Filter length | 5
Learning rate | 0.001
Table 3  Experimental parameter settings of the SUCM model
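Fig.3 itself is not reproduced on this page, so the following is only a minimal forward-pass sketch of the kind of fusion the abstract describes: a character-level CNN turns the text into a feature vector, which is concatenated with the structured features before a prediction layer. Assumptions: the convolution is followed by ReLU and max-pooling over time; the several fully connected layers suggested by Table 6 are collapsed into a single logistic output unit; weights are random and untrained (the learning rate of 0.001 in Table 3 would only matter for training, which is not shown). The defaults for embedding size (64), number of filters (32), and filter length (5) follow Table 3; the class and function names are hypothetical.

```python
import math
import random


def conv1d_relu_maxpool(seq, kernels, biases):
    """1-D convolution over a sequence of embedding vectors, then ReLU and
    max-pooling over time. seq: L vectors of size d; kernels: F kernels, each
    k vectors of size d. Returns F non-negative features."""
    k = len(kernels[0])
    features = []
    for kernel, bias in zip(kernels, biases):
        best = 0.0  # max(ReLU(s_t)) == max(0, max(s_t)), so start from 0
        for start in range(len(seq) - k + 1):
            s = bias
            for offset in range(k):
                s += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
            best = max(best, s)
        features.append(best)
    return features


class FusionSketch:
    """Character-CNN branch for text, concatenated with structured features,
    followed by one logistic output unit (a simplification of Fig.3)."""

    def __init__(self, vocab_size, n_structured, emb_dim=64, n_filters=32,
                 filter_len=5, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.gauss(0.0, 0.1)  # small random initial weights

        self.embedding = [[rand() for _ in range(emb_dim)] for _ in range(vocab_size)]
        self.kernels = [[[rand() for _ in range(emb_dim)] for _ in range(filter_len)]
                        for _ in range(n_filters)]
        self.conv_bias = [0.0] * n_filters
        self.out_w = [rand() for _ in range(n_filters + n_structured)]
        self.out_b = 0.0

    def text_features(self, char_ids):
        seq = [self.embedding[i] for i in char_ids]
        return conv1d_relu_maxpool(seq, self.kernels, self.conv_bias)

    def predict_proba(self, char_ids, structured):
        fused = self.text_features(char_ids) + list(structured)  # fusion by concatenation
        z = self.out_b + sum(w * x for w, x in zip(self.out_w, fused))
        return 1.0 / (1.0 + math.exp(-z))  # estimated readmission probability


# Toy usage with reduced sizes so that pure Python stays fast
model = FusionSketch(vocab_size=50, n_structured=11, emb_dim=8, n_filters=4)
char_ids = [i % 50 for i in range(30)]
p = model.predict_proba(char_ids, [0.0] * 11)
```

A real implementation would use a deep learning framework for batching and back-propagation; the loops here only make the data flow explicit.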
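Data | Model | ACC | F1 | P | R | AUC
Structured data | NB | 0.704 | 0.649 | 0.705 | 0.639 | 0.701
Structured data | SVM | 0.717 | 0.647 | 0.746 | 0.619 | 0.713
Structured data | LR | 0.711 | 0.650 | 0.730 | 0.633 | 0.708
Structured data | MS(DL) | 0.728 | 0.651 | 0.771 | 0.604 | 0.723
Unstructured data | NB | 0.718 | 0.624 | 0.802 | 0.578 | 0.712
Unstructured data | SVM | 0.743 | 0.704 | 0.771 | 0.701 | 0.742
Unstructured data | LR | 0.731 | 0.687 | 0.762 | 0.685 | 0.729
Unstructured data | MU(DL) | 0.743 | 0.720 | 0.751 | 0.728 | 0.743
Structured + unstructured data | NB | 0.719 | 0.654 | 0.736 | 0.636 | 0.715
Structured + unstructured data | SVM | 0.745 | 0.710 | 0.771 | 0.713 | 0.744
Structured + unstructured data | LR | 0.734 | 0.687 | 0.760 | 0.689 | 0.732
Structured + unstructured data | SUCM(DL) | 0.754 | 0.735 | 0.752 | 0.749 | 0.754
Note: NB = naive Bayes; SVM = support vector machine; LR = logistic regression; DL = deep learning; ACC = accuracy; P = precision; R = recall.
Table 4  Prediction results of each model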
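For reference, the metrics in Table 4 can be computed as in the following minimal sketch, which assumes binary labels with readmission as the positive class and AUC defined as the probability that a random positive case scores higher than a random negative one. Note that the F1 values in Table 4 are not the harmonic mean of the P and R in the same row (for SUCM, 2·0.752·0.749/(0.752+0.749) ≈ 0.750, whereas 0.735 is reported). This suggests the metrics were averaged in a way the excerpt does not describe, for example per fold or per class. The function names and toy data are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels, positive class = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": (tp + tn) / len(pairs), "P": precision, "R": recall, "F1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count one half); O(n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```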
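Fig.4  Performance improvement of the models after adding unstructured data to structured data
Fig.5  Performance improvement of the models after adding structured data to unstructured data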
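The 12.9% and 2.1% gains quoted in the abstract are relative improvements in F1, computed from the deep learning rows of Table 4 (the absolute differences are 0.084 and 0.015). The following short sketch reproduces them; the helper name is hypothetical, and it is not claimed that Fig.4 and Fig.5 use exactly this formula for every metric.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline` (0.1 means +10%)."""
    return (new - baseline) / baseline


# F1-scores of the deep learning models in Table 4
f1 = {"MS(DL)": 0.651, "MU(DL)": 0.720, "SUCM(DL)": 0.735}

gain_vs_structured = relative_gain(f1["SUCM(DL)"], f1["MS(DL)"])
gain_vs_unstructured = relative_gain(f1["SUCM(DL)"], f1["MU(DL)"])
print(f"{gain_vs_structured:.1%}")    # 12.9%
print(f"{gain_vs_unstructured:.1%}")  # 2.1%
```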
Activation function | ACC | F1 | P | R | AUC
ReLU | 0.754 | 0.735 | 0.752 | 0.749 | 0.754
ELU | 0.750 | 0.728 | 0.767 | 0.724 | 0.749
Swish | 0.748 | 0.727 | 0.752 | 0.736 | 0.747
Table 5  Prediction results with different activation functions
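For reference, the three activation functions compared in Table 5 are defined below. The sketch assumes the common defaults α = 1 for ELU (Clevert et al.) and β = 1 for Swish (Ramachandran et al.; with β = 1 it is also called SiLU); the excerpt does not state which values the authors used.

```python
import math


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    # Identity for x > 0, alpha * (e^x - 1) otherwise (smooth, saturates at -alpha)
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x, beta=1.0):
    # x * sigmoid(beta * x), written as a single division
    return x / (1.0 + math.exp(-beta * x))


for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  elu={elu(x):.4f}  swish={swish(x):.4f}")
```

Unlike ReLU, ELU and Swish let small negative values pass through, which is the usual motivation for comparing them; in this experiment ReLU still gave the best F1.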
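Fully connected layers | ACC | F1 | P | R | AUC | Runtime (min)
(3, 1) | 0.750 | 0.732 | 0.760 | 0.738 | 0.750 | 1.3
(3, 2) | 0.750 | 0.734 | 0.744 | 0.748 | 0.750 | 1.8
(4, 1) | 0.754 | 0.735 | 0.752 | 0.749 | 0.754 | 1.2
(4, 2) | 0.738 | 0.725 | 0.732 | 0.754 | 0.739 | 1.9
(5, 1) | 0.741 | 0.725 | 0.739 | 0.740 | 0.741 | 1.4
(5, 2) | 0.749 | 0.730 | 0.750 | 0.743 | 0.749 | 1.7
Table 6  Prediction results with different numbers of fully connected layers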
Model | ACC | F1 | P | R | AUC | Runtime (min)
MU(Char-CNN) | 0.743 | 0.720 | 0.751 | 0.728 | 0.743 | 1.0
MU(Word2Vec) | 0.728 | 0.715 | 0.720 | 0.732 | 0.728 | 2.3
SUCM(Char-CNN) | 0.754 | 0.735 | 0.752 | 0.749 | 0.754 | 1.2
SUCM(Word2Vec) | 0.729 | 0.713 | 0.733 | 0.718 | 0.729 | 4.3
Table 7  Effect of different text processing methods
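Table 7 contrasts character-level input (Char-CNN) with word-level Word2Vec input. One practical difference for Chinese clinical text is that character-level input needs no word segmentation. The following minimal sketch shows one way to turn text into fixed-length character-id sequences for a character CNN. Assumptions: whitespace is dropped, ids 0 and 1 are reserved for padding and unseen characters, and max_len = 300 reads Table 3's "sentence dimension" as the padded text length, which the excerpt does not confirm. The function names and the sample snippets are hypothetical.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and unseen characters


def build_char_vocab(texts, min_count=1):
    """Assign an id to every non-whitespace character seen at least
    min_count times, most frequent first (ties broken by code point)."""
    counts = Counter(ch for t in texts for ch in t if not ch.isspace())
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode_chars(text, vocab, max_len=300):
    """Map text to character ids, truncated or right-padded to max_len."""
    ids = [vocab.get(ch, UNK) for ch in text if not ch.isspace()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Made-up snippets, not from the paper's data:
# "chest tightness and shortness of breath for 3 days", "chest pain with dyspnea"
notes = ["胸闷气短3天", "胸痛伴气促"]
vocab = build_char_vocab(notes)
print(encode_chars("胸闷1周", vocab, max_len=8))
```

The resulting id sequences are what an embedding layer, such as the one in a character CNN, would consume; a word-level pipeline would instead segment the text into words before looking up pretrained word vectors.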