Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (2): 41-46     https://doi.org/10.11925/infotech.2096-3467.2017.02.06
Research Paper
Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database*: Case Study of Non-Small Cell Lung Cancer
Yin Bincan1, Xin Shichao1, Zhang Han1, Zhao Yuhong1,2
1Department of Medical Informatics, China Medical University, Shenyang 110122, China
2Shengjing Hospital of China Medical University, Shenyang 110004, China
Abstract

[Objective] Using the SEER database, this study identifies the factors that influence the prognostic survival of patients with Non-Small Cell Lung Cancer (NSCLC) and predicts their survival status, in order to guide tumor prognosis assessment. [Methods] We first screened prognosis-related factors with univariate statistical methods and logistic regression. We then used a Bayesian Network to build a postoperative survival prediction model and compared its performance with models built by three other common machine learning classification algorithms. [Results] Five prognostic variables entered the final model: age, tumor size, histological grade, tumor stage, and the ratio of involved lymph nodes. The Bayesian Network model predicted the survival status of NSCLC patients with an accuracy of 72.87%. [Limitations] The SEER database includes a limited number of prognostic factors, which affects prediction performance to some extent. [Conclusions] The Bayesian Network can explore the relationships among variables and build a prognostic model for lung cancer patients that helps physicians judge prognosis and treatment effect. It outperformed the Decision Tree, Support Vector Machine, and Artificial Neural Network models.

Keywords: Bayesian Networks; Non-Small Cell Lung Cancer; Prognosis; Machine Learning
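Received: 2016-10-31      Published: 2017-03-27
CLC Number: R730.7 G35
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on the Construction of a Competency Model and Evaluation System for Chinese Clinical Physicians" (No. 71473268); the sub-project "Building an Informatization Service and Achievement Transformation Innovation Platform for the Liaoning (Benxi) Biomedical Science and Technology Industry Base" of the Liaoning Provincial Science and Technology Plan project "Construction of a Clinical Research Platform for Major Diseases such as Hepatitis and Tuberculosis" (No. 2013225079); and the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Multi-document Automatic Summarization Based on Semantic Predication Network Attributes: The Case of Biomedicine" (No. 13YJC870030).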
Cite this article:
Yin Bincan, Xin Shichao, Zhang Han, Zhao Yuhong. Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database: Case Study of Non-Small Cell Lung Cancer[J]. Data Analysis and Knowledge Discovery, 2017, 1(2): 41-46.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.02.06      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I2/41
  Figure: Research workflow for building the prognostic model for Asian NSCLC patients based on SEER
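Data type | Variable | Name in SEER | Number of classes / value range
Categorical | Sex | Sex | 2
Categorical | Nationality | Race recode (Asian) | 8
Categorical | Marital status | Marital status at diagnosis | 4
Categorical | Primary site | Primary Site - labeled | 5
Categorical | Pathological type | ICD-O-3 Hist/behav, malignant | 4
Categorical | Histological grade | Grade | 4
Categorical | Laterality | Laterality | 2
Categorical | Extent of adjacent organ invasion | CS extension | 18
Categorical | Extent of regional lymph node involvement | CS lymph nodes | 5
Categorical | Extent of distant metastasis | CS mets at dx | 5
Categorical | Tumor stage | Derived AJCC Stage Group | 7
Categorical | Surgery type | RX Summ--Surg Prim Site | 13
Categorical | Radiotherapy received | Radiation | 3
Continuous | Age at diagnosis | Age at diagnosis | 26-90
Continuous | Tumor size | CS tumor size | 4-132
Continuous | Number of positive lymph nodes | Regional nodes positive | 0-23
Continuous | Number of lymph nodes examined | Regional nodes examined | 1-45
  Table: Prognostic indicator information for NSCLC patients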
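The final model uses a lymph node ratio, while SEER supplies two separate counts (Regional nodes positive and Regional nodes examined). A minimal sketch of how such a ratio could be derived, assuming it is defined as positive nodes divided by examined nodes; the function name and the sample patients are hypothetical and not taken from the paper:

```python
from typing import Optional


def lymph_node_ratio(positive: int, examined: int) -> Optional[float]:
    """Return positive / examined, or None when no nodes were examined.

    Assumption: the ratio is positive nodes over examined nodes; this
    page does not spell out the exact definition used by the authors.
    """
    if positive < 0 or examined < 0:
        raise ValueError("node counts must be non-negative")
    if positive > examined:
        raise ValueError("positive nodes cannot exceed examined nodes")
    if examined == 0:
        return None  # ratio is undefined without examined nodes
    return positive / examined


# Hypothetical (positive, examined) pairs, for illustration only.
patients = [(0, 12), (3, 15), (23, 45)]
ratios = [lymph_node_ratio(p, e) for p, e in patients]

for (p, e), r in zip(patients, ratios):
    print(f"positive={p:2d} examined={e:2d} ratio={r:.3f}")
```

Returning None for zero examined nodes keeps an undefined ratio from being confused with a true ratio of 0.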
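Variable | B | S.E. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper | Sig.
Age at diagnosis | -0.066 | 0.011 | 0.936 | 0.916 | 0.957 | 0.000
Tumor size | -0.018 | 0.007 | 0.982 | 0.968 | 0.996 | 0.014
Histological grade | / | / | / | / | / | 0.001
Tumor stage | / | / | / | / | / | 0.013
Number of lymph nodes examined | 0.050 | 0.017 | 1.051 | 1.016 | 1.087 | 0.004
Number of positive lymph nodes | -0.199 | 0.067 | 0.819 | 0.719 | 0.934 | 0.003
  Table: Variables selected by logistic regression analysis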
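In a logistic regression table, Exp(B) is the odds ratio obtained by exponentiating the coefficient B. The sketch below recomputes Exp(B) and a 95% interval from the reported B and S.E. values, assuming a Wald interval of the form exp(B ± 1.96 × S.E.); the page does not state how the authors computed the interval, and the dictionary keys are illustrative names. Differences in the third decimal place are expected because the published B and S.E. are already rounded.

```python
import math

# (B, S.E.) as reported in the table for the four continuous predictors.
coefficients = {
    "age_at_diagnosis": (-0.066, 0.011),
    "tumor_size": (-0.018, 0.007),
    "nodes_examined": (0.050, 0.017),
    "nodes_positive": (-0.199, 0.067),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_with_ci(b: float, se: float, z: float = Z_95):
    """Return (odds ratio, lower bound, upper bound) for one coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {
    name: odds_ratio_with_ci(b, se) for name, (b, se) in coefficients.items()
}

for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: Exp(B)={odds_ratio:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The sign of each coefficient can only be interpreted once the coding of the outcome is known (which survival status is coded as 1), and that coding is not given on this page.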
  Figure: Bayesian network model of prognostic survival for Asian NSCLC patients
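A Bayesian network factorizes the joint distribution into one conditional probability table per node, and a prediction is a posterior probability computed from that factorization. The sketch below illustrates the idea by exact enumeration on a toy three-node network. The structure (stage influences lymph node ratio, and both influence survival) and every probability are hypothetical values chosen for illustration; they are not the network or the parameters learned in the paper.

```python
from itertools import product

# Hypothetical structure: stage -> lnr, and (stage, lnr) -> survival.
# All numbers below are made up for illustration.
p_stage = {"early": 0.6, "late": 0.4}
p_lnr_given_stage = {
    "early": {"low": 0.8, "high": 0.2},
    "late": {"low": 0.3, "high": 0.7},
}
p_survive_given = {
    ("early", "low"): 0.85,
    ("early", "high"): 0.60,
    ("late", "low"): 0.50,
    ("late", "high"): 0.25,
}


def joint(stage: str, lnr: str, survived: bool) -> float:
    """P(stage, lnr, survived) as a product of the three local tables."""
    p_s = p_survive_given[(stage, lnr)]
    return (
        p_stage[stage]
        * p_lnr_given_stage[stage][lnr]
        * (p_s if survived else 1.0 - p_s)
    )


def posterior_survival(evidence: dict) -> float:
    """P(survived=True | evidence), summing the joint over unobserved nodes."""
    numerator = 0.0
    denominator = 0.0
    for stage, lnr in product(p_stage, ("low", "high")):
        assignment = {"stage": stage, "lnr": lnr}
        if any(assignment[key] != value for key, value in evidence.items()):
            continue  # skip assignments that contradict the evidence
        numerator += joint(stage, lnr, True)
        denominator += joint(stage, lnr, True) + joint(stage, lnr, False)
    return numerator / denominator


print(f"P(survive | lnr=high) = {posterior_survival({'lnr': 'high'}):.3f}")
print(f"P(survive | stage=late) = {posterior_survival({'stage': 'late'}):.3f}")
```

Enumeration is only practical for very small networks; it is used here because it makes the factorization explicit.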
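Classification algorithm | Prediction accuracy (training set) | Prediction accuracy (test set)
Bayesian network | 0.683 | 0.729
Decision tree | 0.713 | 0.670
Support vector machine | 0.733 | 0.686
Artificial neural network | 0.784 | 0.649
  Table: Prediction accuracy of the BNNSCLC model compared with models built by three other classification algorithms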
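The training and test accuracies in the table can be compared directly. The short sketch below computes the difference (test minus training) for each algorithm from the reported figures. A large drop from training to test is commonly read as a sign of overfitting, although that reading is an interpretation and not a claim made on this page.

```python
# (training accuracy, test accuracy) as reported in the table.
accuracy = {
    "Bayesian network": (0.683, 0.729),
    "Decision tree": (0.713, 0.670),
    "Support vector machine": (0.733, 0.686),
    "Artificial neural network": (0.784, 0.649),
}

# Positive gap: better on the test set; negative gap: worse on the test set.
gaps = {
    name: round(test - train, 3) for name, (train, test) in accuracy.items()
}

best_on_test = max(accuracy, key=lambda name: accuracy[name][1])
largest_drop = min(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: test - train = {gap:+.3f}")
print("Highest test accuracy:", best_on_test)
print("Largest drop from training to test:", largest_drop)
```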
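Algorithm | Prediction accuracy | Precision | Area under the ROC curve
Bayesian network | 72.87% | 71.0% | 0.67
Decision tree | 67.02% | 66.3% | 0.568
Support vector machine | 68.62% | 68.2% | 0.611
Artificial neural network | 64.89% | 63.7% | 0.615
  Table: Performance comparison of models built by different algorithms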
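Accuracy and precision are both derived from a confusion matrix. A minimal sketch of the two definitions for a binary outcome follows; the class name and the counts are hypothetical and are not the paper's data. How the authors averaged precision across the two survival classes is not stated on this page, so the single-class definition below is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (the positive class is arbitrary here)."""

    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        """Share of all cases that were classified correctly."""
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        """Share of positive predictions that were actually positive."""
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=40, fp=10, tn=35, fn=15)
print(f"accuracy  = {cm.accuracy():.2%}")
print(f"precision = {cm.precision():.2%}")
```

The area under the ROC curve is computed from predicted probabilities, not from a single confusion matrix, which is why it can rank the algorithms differently from accuracy.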