Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue (2): 41-46. DOI: 10.11925/infotech.2096-3467.2017.02.06
Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database: Case Study of Non-Small Cell Lung Cancer
Yin Bincan¹, Xin Shichao¹, Zhang Han¹, Zhao Yuhong¹,²
¹Department of Medical Informatics, China Medical University, Shenyang 110122, China
²Shengjing Hospital of China Medical University, Shenyang 110004, China
[Objective] This study aims to improve prognostic assessment for Asian patients diagnosed with Non-Small Cell Lung Cancer (NSCLC). The proposed model identifies the factors that influence patients' survival status and predicts their prognosis.
[Methods] First, we used univariate statistical analysis and logistic regression to identify prognostic variables. Second, we employed a Bayesian Network algorithm to construct a prognostic survival model for Asian NSCLC patients. Finally, we compared the performance of our model with that of three other algorithms: Decision Tree, Support Vector Machine and Artificial Neural Network.
[Results] The identified prognostic variables include age at diagnosis, tumor size, histological grade, tumor stage and the lymph node ratio. The proposed model predicted the survival status of NSCLC patients effectively, reaching a prediction accuracy of 72.87% on the test set.
[Limitations] The SEER database records a limited number of prognostic factors, which may restrict prediction accuracy.
[Conclusions] Bayesian Networks can help build effective prognostic models for cancer patients, which may in turn help improve their survival. The proposed model outperformed the Decision Tree, Support Vector Machine and Artificial Neural Network models on the test set.
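The abstract does not state how the network structure was learned. As a hedged illustration only, the sketch below uses the simplest Bayesian-network structure, naive Bayes, in which the survival-status node is the sole parent of every predictor, so predictors are assumed conditionally independent given survival status. It is a swapped-in simplification, not the authors' model; the feature names, labels and the functions `fit_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(class) and P(feature = value | class) from discrete data.

    rows   -- list of dicts mapping a (hypothetical) feature name to a discrete value
    labels -- class label per row, e.g. "alive" / "dead"
    alpha  -- Laplace smoothing constant, so unseen values never get probability 0
    """
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[feature][class][value]
    values = defaultdict(set)  # distinct values observed for each feature
    for row, cls in zip(rows, labels):
        for feat, val in row.items():
            counts[feat][cls][val] += 1
            values[feat].add(val)
    return class_counts, counts, values, alpha


def predict(model, row):
    """Return the class with the highest posterior log-probability for one record."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls, c in class_counts.items():
        score = math.log(c / n)  # log prior P(class)
        for feat, val in row.items():
            k = len(values[feat])  # number of distinct values of this feature
            score += math.log((counts[feat][cls][val] + alpha) / (c + alpha * k))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical toy records; not SEER data.
train_rows = [
    {"stage": "I", "grade": "1"},
    {"stage": "I", "grade": "2"},
    {"stage": "III", "grade": "3"},
    {"stage": "IV", "grade": "3"},
]
train_labels = ["alive", "alive", "dead", "dead"]
model = fit_naive_bayes(train_rows, train_labels)
print(predict(model, {"stage": "I", "grade": "1"}))
```

A general Bayesian network can also learn dependencies among predictors (for example between stage and lymph node status); naive Bayes is used here only because it is short and its inference is easy to follow.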

Key words: Bayesian Networks; Non-Small Cell Lung Cancer; Prognosis; Machine Learning
Received: 31 October 2016; Published: 27 March 2017
Yin Bincan, Xin Shichao, Zhang Han, Zhao Yuhong. Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database: Case Study of Non-Small Cell Lung Cancer. Data Analysis and Knowledge Discovery, 2017, 1(2): 41-46.

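Variables extracted from the SEER database
| Data type | Variable | SEER field name | Number of categories / value range |
| Categorical | Sex | Sex | 2 |
| Categorical | Country of origin | Race recode (Asian) | 8 |
| Categorical | Marital status | Marital status at … | |
| Categorical | Primary site | Primary Site - labeled | 5 |
| Categorical | Pathological type | ICD-O-3 Hist/behav, … | |
| Categorical | Histological grade | Grade | 4 |
| Categorical | Laterality | Laterality | 2 |
| Categorical | | CS extension | 18 |
| Categorical | | CS lymph nodes | 5 |
| Categorical | Extent of distant metastasis | CS mets at dx | 5 |
| Categorical | Tumor stage | Derived AJCC Stage Group | |
| Categorical | Surgery type | RX Summ--Surg Prim Site | |
| Categorical | Radiotherapy given | Radiation | 3 |
| Continuous | Age at diagnosis | Age at diagnosis | 26-90 (years) |
| Continuous | Tumor size | CS tumor size | 4-132 (mm) |
| Continuous | Number of positive lymph nodes | Regional nodes … | |
| Continuous | Number of lymph nodes examined | Regional nodes … | |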
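The abstract lists the lymph node ratio among the prognostic variables, while the table above records counts of positive and examined regional lymph nodes. Assuming the usual definition (positive nodes divided by nodes examined; the paper's exact formula is not shown in this section), the derived variable could be computed as in this sketch. The function name is hypothetical, and SEER's special codes in the node-count fields (unknown, not applicable and similar) are assumed to have been filtered out beforehand.

```python
def lymph_node_ratio(positive, examined):
    """Lymph node ratio (LNR): positive regional nodes / regional nodes examined.

    Assumes SEER special codes (unknown, not applicable, etc.) were already
    filtered out, so both arguments are plain counts. Returns None when no
    nodes were examined, because the ratio is undefined in that case.
    """
    if examined <= 0:
        return None
    if not 0 <= positive <= examined:
        raise ValueError("positive must be between 0 and examined")
    return positive / examined


# Hypothetical patients: (positive nodes, nodes examined)
for pos, exam in [(3, 12), (0, 8), (0, 0)]:
    print(pos, exam, lymph_node_ratio(pos, exam))
```

Returning `None` rather than 0 for patients with no examined nodes keeps "no nodes examined" distinct from "nodes examined, none positive", which carry different prognostic information.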
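Logistic regression results
| Variable | B | S.E. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper | Sig. |
| Age at diagnosis | -0.066 | 0.011 | 0.936 | 0.916 | 0.957 | 0.000 |
| Tumor size | -0.018 | 0.007 | 0.982 | 0.968 | 0.996 | 0.014 |
| Histological grade | / | / | / | / | / | 0.001 |
| Tumor stage | / | / | / | / | / | 0.013 |
| | 0.050 | 0.017 | 1.051 | 1.016 | 1.087 | 0.004 |
| | -0.199 | 0.067 | 0.819 | 0.719 | 0.934 | 0.003 |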
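In a logistic-regression table of this kind, Exp(B) is the odds ratio e^B, and the 95% interval is conventionally the Wald interval exp(B ± 1.96 × S.E.). Assuming that convention, the sketch below recomputes the reported values from B and S.E. for the two labeled continuous variables. Because B and S.E. are printed to three decimals, recomputed limits can differ from the table in the third decimal (for tumor size the lower limit comes out near 0.969 rather than 0.968). The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper): odds ratio and Wald 95% interval exp(B ± z*S.E.)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# (variable, B, S.E.) transcribed from the table above
rows = [
    ("Age at diagnosis", -0.066, 0.011),
    ("Tumor size", -0.018, 0.007),
]
for name, b, se in rows:
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    print(f"{name}: Exp(B) = {odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

For a continuous predictor, Exp(B) is the multiplicative change in the odds of the modeled outcome per one-unit increase; which survival status is coded as the outcome is not stated in this section.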
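Prediction accuracy on the training and test sets
| Classification algorithm | Accuracy (training set) | Accuracy (test set) |
| Bayesian network | 0.683 | 0.729 |
| Decision tree | 0.713 | 0.670 |
| Support vector machine | 0.733 | 0.686 |
| Artificial neural network | 0.784 | 0.649 |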
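The table shows that the highest training accuracy did not lead to the highest test accuracy. A short calculation of test minus training accuracy makes this explicit; reading a large drop as overfitting is a common heuristic, not a conclusion stated in the table. The names `accuracy` and `generalization_gap` are illustrative.

```python
# Accuracy values transcribed from the table above: (training set, test set)
accuracy = {
    "Bayesian network": (0.683, 0.729),
    "Decision tree": (0.713, 0.670),
    "Support vector machine": (0.733, 0.686),
    "Artificial neural network": (0.784, 0.649),
}


def generalization_gap(train, test):
    """Test minus training accuracy; a large negative value suggests overfitting."""
    return test - train


gaps = {name: round(generalization_gap(tr, te), 3) for name, (tr, te) in accuracy.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: {gap:+.3f}")
```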
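Performance comparison of the four algorithms
| Algorithm | Prediction accuracy | Precision | Area under ROC curve |
| Bayesian network | 72.87% | 71.0% | 0.67 |
| Decision tree | 67.02% | 66.3% | 0.568 |
| Support vector machine | 68.62% | 68.2% | 0.611 |
| Artificial neural network | 64.89% | 63.7% | 0.615 |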
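The table reports accuracy, precision and area under the ROC curve. As a reference for how these metrics are defined, the sketch below computes them for a binary outcome in plain Python, with AUC taken as the probability that a randomly chosen positive record is scored above a randomly chosen negative one (ties count one half). The data and the 0.5 decision threshold are hypothetical, and the table does not say whether its precision refers to one class or is averaged over classes; this sketch computes single-class precision.

```python
def accuracy(y_true, y_pred):
    """Share of records whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Among records predicted as `positive`, the share that truly are positive."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event of interest) and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold (assumption)
print(accuracy(y_true, y_pred), precision(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and precision depend on the chosen threshold, whereas AUC uses only the ranking of the scores, which is why a model can rank well yet still have modest accuracy at a fixed cut-off.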
