Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (2): 41-46    DOI: 10.11925/infotech.2096-3467.2017.02.06
Original Article
Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database——Case Study of Non-Small Cell Lung Cancer
Yin Bincan1, Xin Shichao1, Zhang Han1, Zhao Yuhong1,2
1Department of Medical Informatics, China Medical University, Shenyang 110122, China
2Shengjing Hospital of China Medical University, Shenyang 110004, China

[Objective] This study aims to improve prognostic assessment for Asian patients diagnosed with Non-Small Cell Lung Cancer (NSCLC). The proposed model identifies the factors that influence patients’ survival status and predicts their prognosis. [Methods] First, we used univariate statistical analysis and logistic regression to identify the prognostic variables. Second, we used a Bayesian Network algorithm to construct a prognostic survival model for Asian NSCLC patients. Finally, we compared the performance of our model with that of three other algorithms. [Results] The identified prognostic variables include age, tumor size, grade, tumor stage, and the lymph node ratio. The proposed model predicted NSCLC patients’ survival status effectively. [Limitations] The SEER database contains a limited number of prognostic factors, which may affect prediction accuracy. [Conclusions] The Bayesian Network can support the construction of prognostic models for cancer patients, which may in turn help improve their survival rates. On the test set, the proposed model outperformed the Decision Tree, Support Vector Machine, and Artificial Neural Network models.

Key words: Bayesian Networks; Non-Small Cell Lung Cancer; Prognosis; Machine Learning
Received: 31 October 2016      Published: 27 March 2017
ZTFLH:  R730.7 G35  

Cite this article:

Yin Bincan,Xin Shichao,Zhang Han,Zhao Yuhong. Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database——Case Study of Non-Small Cell Lung Cancer. Data Analysis and Knowledge Discovery, 2017, 1(2): 41-46.


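| Data type | Variable | Name in SEER | Number of classes / value range |
|---|---|---|---|
| Categorical | Sex | Sex | 2 |
| | Nationality | Race recode (Asian) | 8 |
| | Marital status | Marital status at | |
| | Primary site | Primary Site - labeled | 5 |
| | Pathological type | ICD-O-3 Hist/behav, | |
| | Histological grade | Grade | 4 |
| | Laterality | Laterality | 2 |
| | | CS extension | 18 |
| | | CS lymph nodes | 5 |
| | Extent of distant metastasis | CS mets at dx | 5 |
| | Tumor stage | Derived AJCC Stage Group | |
| | Surgery type | RX Summ--Surg Prim Site | |
| | Radiotherapy received | Radiation | 3 |
| Continuous | Age at diagnosis | Age at diagnosis | 26-90 |
| | Tumor size | CS tumor size | 4-132 |
| | Number of positive lymph nodes | Regional nodes | |
| | Number of examined lymph nodes | Regional nodes | |
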
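The abstract lists the lymph node ratio among the prognostic variables, and the table above includes the numbers of positive and examined regional lymph nodes. The following is a minimal sketch, assuming the ratio is defined in the usual way as positive nodes divided by examined nodes. The function name and the example counts are hypothetical and are not taken from the study.

```python
from typing import Optional


def lymph_node_ratio(positive: int, examined: int) -> Optional[float]:
    """Return positive/examined, or None when no nodes were examined."""
    if positive < 0 or examined < 0:
        raise ValueError("node counts must be non-negative")
    if positive > examined:
        raise ValueError("positive nodes cannot exceed examined nodes")
    if examined == 0:
        return None  # the ratio is undefined without examined nodes
    return positive / examined


# Hypothetical patient: 3 positive nodes out of 12 examined.
print(lymph_node_ratio(3, 12))  # 0.25
```

Registry fields for node counts may also carry reserved codes for unknown or unexamined cases; such records would need to be filtered or recoded before a function like this is applied.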
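| Variable | B | S.E. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper | Sig. |
|---|---|---|---|---|---|---|
| Age at diagnosis | -0.066 | 0.011 | 0.936 | 0.916 | 0.957 | 0.000 |
| Tumor size | -0.018 | 0.007 | 0.982 | 0.968 | 0.996 | 0.014 |
| Histological grade | / | / | / | / | / | 0.001 |
| Tumor stage | / | / | / | / | / | 0.013 |
| | 0.050 | 0.017 | 1.051 | 1.016 | 1.087 | 0.004 |
| | -0.199 | 0.067 | 0.819 | 0.719 | 0.934 | 0.003 |
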
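The Exp(B) and interval columns in the table above can be reproduced from B and S.E. The sketch below assumes a Wald interval with the normal quantile 1.96, which is the usual convention for logistic regression output but is not stated on this page. The function name is hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (assumed Wald interval)


def odds_ratio_ci(b: float, se: float, z: float = Z_95):
    """Return (Exp(B), lower, upper) for a logistic regression coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and S.E. for age at diagnosis, taken from the table above.
or_, lower, upper = odds_ratio_ci(-0.066, 0.011)
print(f"{or_:.3f} ({lower:.3f}, {upper:.3f})")  # 0.936 (0.916, 0.957)
```

Because B and S.E. are themselves rounded to three decimals, a recomputed bound can differ from the tabulated one in the last digit. The direction of each effect depends on how the survival outcome was coded, which this page does not state.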
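| Classification algorithm | Prediction accuracy, training set | Prediction accuracy, test set |
|---|---|---|
| Bayesian Network | 0.683 | 0.729 |
| Decision Tree | 0.713 | 0.670 |
| Support Vector Machine | 0.733 | 0.686 |
| Artificial Neural Network | 0.784 | 0.649 |
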
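One way to read the training and test accuracies above is to look at the gap between them for each algorithm. The sketch below only does that subtraction on the tabulated values; the variable names are illustrative.

```python
# (training accuracy, test accuracy) per algorithm, from the table above.
accuracy = {
    "Bayesian Network": (0.683, 0.729),
    "Decision Tree": (0.713, 0.670),
    "Support Vector Machine": (0.733, 0.686),
    "Artificial Neural Network": (0.784, 0.649),
}

# A positive gap means the model scores better on the data it was trained on.
gaps = {name: round(train - test, 3) for name, (train, test) in accuracy.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:26s} {gap:+.3f}")
```

The Bayesian Network is the only model whose test accuracy is not below its training accuracy, and the Artificial Neural Network shows the largest drop. That pattern is consistent with overfitting in the other models, although a single train/test split cannot confirm it.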
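| Algorithm | Prediction accuracy | Precision | Area under the ROC curve |
|---|---|---|---|
| Bayesian Network | 72.87% | 71.0% | 0.67 |
| Decision Tree | 67.02% | 66.3% | 0.568 |
| Support Vector Machine | 68.62% | 68.2% | 0.611 |
| Artificial Neural Network | 64.89% | 63.7% | 0.615 |
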
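For readers unfamiliar with the three metrics reported above, the following sketch computes them for a binary outcome using their standard definitions. The labels and scores are invented toy values, not study data, and the page does not say how the study computed precision (for example, whether it was averaged across classes), so this is an illustration of the definitions only.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not hits:
        return 0.0
    return sum(t == positive for t in hits) / len(hits)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative
    (ties count as one half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(accuracy(y_true, y_pred))   # 0.625
print(precision(y_true, y_pred))  # 0.6
print(roc_auc(y_true, scores))    # 0.875
```

Accuracy and precision depend on the chosen threshold, while the area under the ROC curve summarizes ranking quality across all thresholds, which is why the three columns need not rank the algorithms identically.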