Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (1): 88-98    DOI: 10.11925/infotech.2096-3467.2017.1053
Original Article
Reducing Data Dimension of Electronic Medical Records: An Empirical Study
Mu Dongmei, Wang Ping, Zhao Danning
School of Public Health, Jilin University, Changchun 130021, China
Abstract  

[Objective] This paper explores strategies for reducing the dimensionality of electronic medical record data in order to improve knowledge discovery. [Methods] First, we performed a preliminary dimension reduction based on a literature review. Next, we carried out a second round of reduction with three methods: principal component analysis retaining the factors with eigenvalues greater than 1, principal component analysis retaining factors until the cumulative contribution rate exceeded 85%, and logistic regression retaining the factors with statistically significant differences. Finally, we compared the results of the three methods in an empirical study. [Results] The three methods extracted 8, 17 and 14 attributes, respectively. In the qualitative and quantitative evaluation, principal component analysis with the eigenvalue-greater-than-1 criterion performed best. [Limitations] The sample size needs to be expanded for more in-depth analysis. [Conclusions] The proposed method can effectively reduce the dimensionality of electronic medical record data.

Key words: Dimension Reduction; Data Mining; Knowledge Discovery; Electronic Medical Record
Received: 23 October 2017      Published: 05 February 2018
ZTFLH:  G353.1  

Cite this article:

Mu Dongmei, Wang Ping, Zhao Danning. Reducing Data Dimension of Electronic Medical Records: An Empirical Study. Data Analysis and Knowledge Discovery, 2018, 2(1): 88-98.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1053     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I1/88

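Table type | Number of tables | Records | Attributes
Diagnosis table | 1 | 16 508 | patient ID, visit ID, diagnosis, diagnosis time, diagnosis content, treatment time, treatment outcome
Basic information table | 1 | 10 791 | patient ID, birth place, date of birth, sex
Patient ID table | 2 | 10 791 | patient ID
Physiological indicator table | 1 | 65 536 | patient ID, visit ID, record date, weight, body temperature, pulse, respiration, blood pressure
Biochemical indicator table | 12 | 45 803 | subject term, request number, patient ID, sex, diagnosis, test time, fee type, blood sample type, sample notes, examination time, and biochemical indicators: total cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, lactate dehydrogenase, apolipoprotein A, apolipoprotein B, etc.
Medication table | 1 | 65 536 | patient ID, drug name, start and end time, usage and dosage, dosing frequency, cost, etc.
Glycation indicator table | 12 | 5 230 | subject term (whole-blood glycated hemoglobin), request number, patient ID, sex, age, fee type, blood sample type, specimen notes, time, measured whole-blood glycated hemoglobin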
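No. | Principal component | Abbreviation | Rotated factor loading | Eigenvalue
1 | Direct bilirubin | DBIL | 0.959 | 2.743
2 | Calcium | Ca | 0.820 | 2.563
3 | Sodium | Na | 0.857 | 2.339
4 | Apolipoprotein B | apoB | 0.786 | 2.190
5 | Total cholesterol | TC | 0.866 | 1.826
6 | Potassium | K | 0.686 | 1.738
7 | Whole-blood glycated hemoglobin | Hb | 0.790 | 1.507
8 | High-density lipoprotein | HDL | 0.731 | 1.119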
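The eigenvalue-greater-than-1 rule behind the table above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the eigenvalues are copied from the table, while the function name `kaiser_select` and the extra entry `hypothetical_9` are assumptions introduced only to show a component being rejected.

```python
# Eigenvalues as listed in the table above (components 1-8), keyed by abbreviation.
TABLE_EIGENVALUES = {
    "DBIL": 2.743, "Ca": 2.563, "Na": 2.339, "apoB": 2.190,
    "TC": 1.826, "K": 1.738, "Hb": 1.507, "HDL": 1.119,
}


def kaiser_select(eigenvalues, threshold=1.0):
    """Return the labels whose eigenvalue is strictly greater than the threshold."""
    return [label for label, value in eigenvalues.items() if value > threshold]


# Hypothetical ninth component (NOT from the source) to show what gets dropped.
candidates = dict(TABLE_EIGENVALUES, hypothetical_9=0.95)
kept = kaiser_select(candidates)
print(len(kept), kept)
```

All eight listed components pass the threshold, which matches the 8 attributes reported for this method; the hypothetical component is discarded.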
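No. | Principal component | Abbreviation | Rotated factor loading | Cumulative contribution rate (%)
1 | Direct bilirubin | DBIL | 0.961 | 10.158
2 | Serum albumin | ALB | 0.873 | 19.650
3 | Total cholesterol | TC | 0.944 | 28.314
4 | Chloride | Cl | 0.914 | 36.427
5 | Triglycerides | TG | 0.890 | 43.191
6 | Whole-blood glycated hemoglobin | Hb | 0.834 | 49.627
7 | Apolipoprotein A1 | apoA1 | 0.786 | 55.207
8 | Age | age | 0.870 | 59.350
9 | Carbon dioxide | CO2 | 0.912 | 63.033
10 | Potassium | K | 0.952 | 66.525
11 | Free calcium | f-Ca | 0.964 | 69.908
12 | Serum uric acid | SUA | 0.942 | 73.159
13 | Sex | sex | 0.888 | 76.242
14 | Magnesium | Mg | 0.971 | 79.214
15 | Lactate dehydrogenase | LDH | 0.972 | 81.920
16 | Serum lipoprotein | LP(a) | 0.973 | 84.609
17 | Glycated serum protein | GSP | 0.934 | 87.023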
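The cumulative-contribution rule can be checked directly against the column above. The sketch below is only an illustration under stated assumptions: the values are copied from the table, and the helper names `components_needed` and `individual_contributions` are hypothetical, not taken from the paper.

```python
# Cumulative contribution rates (%) as listed in the table above, components 1-17.
CUMULATIVE_PERCENT = [
    10.158, 19.650, 28.314, 36.427, 43.191, 49.627, 55.207, 59.350, 63.033,
    66.525, 69.908, 73.159, 76.242, 79.214, 81.920, 84.609, 87.023,
]


def components_needed(cumulative, threshold=85.0):
    """Return how many leading components are needed to exceed the threshold."""
    for count, total in enumerate(cumulative, start=1):
        if total > threshold:
            return count
    return None  # threshold never reached


def individual_contributions(cumulative):
    """Recover each component's own share by differencing the cumulative column."""
    previous = 0.0
    shares = []
    for total in cumulative:
        shares.append(round(total - previous, 3))
        previous = total
    return shares


needed = components_needed(CUMULATIVE_PERCENT)
shares = individual_contributions(CUMULATIVE_PERCENT)
print(needed, shares[:3])
```

With 16 components the cumulative rate is 84.609%, still below 85%, so the seventeenth component is the first to satisfy the condition.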
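No. | Attribute | B | Sig. | Exp(B)
1 | Glucose (GS) | 0.103 | 0.000 | 1.108
2 | Total cholesterol (TC) | 0.441 | 0.000 | 1.554
3 | Lactate dehydrogenase (LDH) | 0.005 | 0.000 | 1.005
4 | Low-density lipoprotein (LDL) | -0.544 | 0.000 | 0.580
5 | Whole-blood glycated hemoglobin (Hb) | -0.128 | 0.005 | 0.880
6 | Total bilirubin (STB) | -0.030 | 0.008 | 0.970
7 | Direct bilirubin (DBIL) | 0.246 | 0.000 | 1.279
8 | Total bile acid (TBA) | 0.048 | 0.004 | 1.049
9 | Potassium (K) | -0.670 | 0.000 | 0.512
10 | Chloride (Cl) | -0.105 | 0.000 | 0.901
11 | Inorganic phosphorus (P) | -1.215 | 0.000 | 2.97
12 | Magnesium (Mg) | 2.948 | 0.000 | 19.071
13 | Free calcium (f-Ca) | 3.009 | 0.035 | 20.265
14 | Carbon dioxide (CO2) | -0.176 | 0.000 | 0.839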
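In logistic regression output, Exp(B) is the odds ratio obtained by exponentiating the coefficient B, so the two columns above can be cross-checked. The following sketch does that with the listed values; the function names and the 1% relative tolerance are assumptions chosen for illustration. Under that tolerance only the inorganic phosphorus row is flagged: e^(-1.215) is about 0.297, whereas the table lists 2.97, which may be a misplaced decimal point in the source.

```python
import math

# (abbreviation, B, listed Exp(B)) as given in the table above.
ROWS = [
    ("GS", 0.103, 1.108), ("TC", 0.441, 1.554), ("LDH", 0.005, 1.005),
    ("LDL", -0.544, 0.580), ("Hb", -0.128, 0.880), ("STB", -0.030, 0.970),
    ("DBIL", 0.246, 1.279), ("TBA", 0.048, 1.049), ("K", -0.670, 0.512),
    ("Cl", -0.105, 0.901), ("P", -1.215, 2.97), ("Mg", 2.948, 19.071),
    ("f-Ca", 3.009, 20.265), ("CO2", -0.176, 0.839),
]


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


def inconsistent_rows(rows, rel_tol=0.01):
    """Return rows whose listed Exp(B) differs from exp(B) by more than rel_tol."""
    flagged = []
    for name, b, listed in rows:
        expected = odds_ratio(b)
        if abs(expected - listed) > rel_tol * expected:
            flagged.append((name, round(expected, 3), listed))
    return flagged


flagged = inconsistent_rows(ROWS)
print(flagged)
```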
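Item | Reduction method 1 | Reduction method 2 | Reduction method 3
Reduction principle | Eigenvalue of the eigenvector | Cumulative contribution rate of the eigenvectors | Significance of the difference in disease status
Reduction technique | Principal component analysis | Principal component analysis | Logistic regression
Reduction condition | Eigenvalue greater than 1 | Cumulative contribution rate greater than 85% | P value less than 0.05
Number of factors extracted | 8 | 17 | 14
Percentage of factors extracted | 29.63% | 62.96% | 51.85%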
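The percentages in the last row follow from the factor counts if the candidate set contained 27 attributes. That total is an assumption: it is not stated in the table and is inferred here only because 8/27, 17/27 and 14/27 reproduce the listed figures. A short check, with hypothetical names:

```python
# ASSUMPTION: 27 candidate attributes, inferred from the listed percentages,
# not stated explicitly in the table.
TOTAL_ATTRIBUTES = 27

# Number of factors extracted by each method, as listed in the table above.
EXTRACTED = {"method_1": 8, "method_2": 17, "method_3": 14}


def retained_percent(kept, total):
    """Share of attributes retained, as a percentage rounded to two decimals."""
    return round(100.0 * kept / total, 2)


percentages = {
    name: retained_percent(count, TOTAL_ATTRIBUTES)
    for name, count in EXTRACTED.items()
}
print(percentages)
```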
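Method | TP rate | FP rate | Precision | Recall | F-Measure | ROC Area | Kappa
No reduction | 0.935 | 0.096 | 0.935 | 0.935 | 0.935 | 0.925 | 0.838
Reduction method 1 | 0.975 | 0.045 | 0.974 | 0.975 | 0.974 | 0.975 | 0.936
Reduction method 2 | 0.933 | 0.093 | 0.934 | 0.933 | 0.934 | 0.925 | 0.836
Reduction method 3 | 0.941 | 0.088 | 0.941 | 0.941 | 0.941 | 0.937 | 0.879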
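As a rough consistency check on the evaluation table, the F-measure can be recomputed as the harmonic mean of precision and recall, and the methods can be ranked by Kappa. This is a sketch under assumptions: the dictionary keys and function names are hypothetical, the source does not say which tool produced the metrics, and agreement is only expected to within the three-decimal rounding of the table.

```python
# Metrics as listed in the evaluation table above.
RESULTS = {
    "no_reduction": {"precision": 0.935, "recall": 0.935, "f_measure": 0.935, "kappa": 0.838},
    "method_1": {"precision": 0.974, "recall": 0.975, "f_measure": 0.974, "kappa": 0.936},
    "method_2": {"precision": 0.934, "recall": 0.933, "f_measure": 0.934, "kappa": 0.836},
    "method_3": {"precision": 0.941, "recall": 0.941, "f_measure": 0.941, "kappa": 0.879},
}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2.0 * precision * recall / (precision + recall)


def max_f_measure_gap(results):
    """Largest absolute gap between the recomputed and the listed F-measure."""
    return max(
        abs(f_measure(m["precision"], m["recall"]) - m["f_measure"])
        for m in results.values()
    )


best_by_kappa = max(RESULTS, key=lambda name: RESULTS[name]["kappa"])
print(best_by_kappa, round(max_f_measure_gap(RESULTS), 4))
```

Reduction method 1 has the highest Kappa as well as the highest ROC area and F-measure, which is consistent with the abstract's conclusion that the eigenvalue-based principal component analysis performed best.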