Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (2): 41-46    DOI: 10.11925/infotech.2096-3467.2017.02.06
Original Article
Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database——Case Study of Non-Small Cell Lung Cancer
Bincan Yin1, Shichao Xin1, Han Zhang1, Yuhong Zhao1,2
1Department of Medical Informatics, China Medical University, Shenyang 110122, China
2Shengjing Hospital of China Medical University, Shenyang 110004, China

[Objective] This study aims to improve prognostic assessment for Asian patients diagnosed with Non-Small Cell Lung Cancer (NSCLC). The proposed model identifies the factors that influence patients’ survival status and predicts their prognosis. [Methods] First, we used univariate statistical analysis and logistic regression to identify the prognostic variables. Second, we used a Bayesian Network algorithm to construct a prognostic survival model for Asian NSCLC patients. Finally, we compared the performance of our model with that of three other algorithms. [Results] The identified prognostic variables were age, tumor size, grade, tumor stage, and lymph node ratio. The proposed model predicted the prognostic survival status of NSCLC patients effectively. [Limitations] The SEER database contains a limited number of prognostic factors, which may affect prediction accuracy. [Conclusions] The Bayesian Network can help build an optimal prognostic model for cancer patients, which may help improve their survival rates. The proposed model performed better than the Decision Tree, Support Vector Machine, and Artificial Neural Network models.
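
As a rough illustration of the kind of model described above, the following minimal sketch estimates the conditional probability tables of a tiny discrete Bayesian network and queries survival probability, summing over an unobserved variable. Everything specific here is an assumption made for illustration: the fixed structure (stage → lnr_high, and stage and lnr_high → survived), the binary coding, the function names, and the toy records are invented and are not taken from the study or from SEER. The study builds its network with a Bayesian Network algorithm, whereas this sketch simply fixes the structure by hand.

```python
from collections import Counter


def lymph_node_ratio(positive, examined):
    """Lymph node ratio: positive nodes divided by examined nodes."""
    if examined <= 0:
        raise ValueError("examined must be positive")
    return positive / examined


def fit_cpt(records, child, parents, alpha=1.0):
    """Estimate P(child=True | parents) by counting with Laplace smoothing.

    Returns a dict keyed by the tuple of parent values. Only parent
    combinations that occur in `records` get an entry.
    """
    hits, totals = Counter(), Counter()
    for rec in records:
        key = tuple(rec[p] for p in parents)
        totals[key] += 1
        hits[key] += int(rec[child])
    return {k: (hits[k] + alpha) / (totals[k] + 2 * alpha) for k in totals}


def p_survival(stage, cpt_lnr, cpt_surv, lnr_high=None):
    """P(survived=True | stage[, lnr_high]).

    When lnr_high is unobserved, it is marginalized out using
    P(lnr_high | stage).
    """
    if lnr_high is not None:
        return cpt_surv[(stage, lnr_high)]
    p_high = cpt_lnr[(stage,)]
    return (p_high * cpt_surv[(stage, True)]
            + (1 - p_high) * cpt_surv[(stage, False)])


# Toy records invented for illustration only (NOT study or SEER data).
RECORDS = [
    dict(stage=s, lnr_high=h, survived=v)
    for s, h, v in [
        ("early", False, True), ("early", False, True),
        ("early", True, True), ("early", True, False),
        ("late", False, True), ("late", False, False),
        ("late", True, False), ("late", True, False),
    ]
]

# Hypothetical hand-specified structure: stage -> lnr_high, (stage, lnr_high) -> survived.
CPT_LNR = fit_cpt(RECORDS, "lnr_high", ["stage"])
CPT_SURV = fit_cpt(RECORDS, "survived", ["stage", "lnr_high"])

if __name__ == "__main__":
    print(lymph_node_ratio(3, 12))
    print(p_survival("early", CPT_LNR, CPT_SURV))
    print(p_survival("late", CPT_LNR, CPT_SURV, lnr_high=True))
```

A realistic model would learn the structure from data, discretize continuous variables such as age and tumor size, and be evaluated on held-out cases; none of that is attempted here.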

Key words: Bayesian Networks; Non-Small Cell Lung Cancer; Prognosis; Machine Learning
Received: 31 October 2016      Published: 27 March 2017

Cite this article:

Bincan Yin, Shichao Xin, Han Zhang, Yuhong Zhao. Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database——Case Study of Non-Small Cell Lung Cancer. Data Analysis and Knowledge Discovery, 2017, 1(2): 41-46.

