Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (1): 88-98    DOI: 10.11925/infotech.2096-3467.2017.1053
Original Article
Reducing Data Dimension of Electronic Medical Records: An Empirical Study
Dongmei Mu, Ping Wang, Danning Zhao
School of Public Health, Jilin University, Changchun 130021, China

[Objective] This paper explores strategies for reducing the dimensionality of electronic medical record data in order to improve knowledge discovery. [Methods] First, we performed a preliminary dimension reduction based on a literature review. We then carried out a second round of reduction using three methods: extracting factors with eigenvalues greater than 1, extracting factors with a cumulative contribution rate greater than 85%, and extracting factors that showed significant differences. Finally, we compared the results of the three methods in an empirical study. [Results] The three methods extracted 8, 17, and 14 attributes, respectively. In qualitative and quantitative evaluation, principal component analysis with the eigenvalue-greater-than-1 criterion yielded the best result. [Limitations] The sample size needs to be expanded for more in-depth analysis. [Conclusions] The proposed method can effectively reduce the dimensionality of electronic medical record data.
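
The two eigenvalue-based selection rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `component_counts`, its parameters, and the use of NumPy on the feature correlation matrix are assumptions made for illustration only, and the synthetic data in the usage note stands in for real medical record attributes.

```python
import numpy as np


def component_counts(X, eig_threshold=1.0, cum_threshold=0.85):
    """Count principal components kept under two common selection rules.

    X is a samples-by-features array. The name and parameters are
    hypothetical; they are not taken from the paper.
    """
    # PCA on standardized features is equivalent to an eigendecomposition
    # of the correlation matrix.
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, descending

    # Rule 1: keep components whose eigenvalue is greater than the threshold.
    k_eigen = int(np.sum(eigvals > eig_threshold))

    # Rule 2: keep the fewest leading components whose cumulative share of
    # total variance is strictly greater than the threshold.
    cum_share = np.cumsum(eigvals) / np.sum(eigvals)
    k_cumulative = int(np.searchsorted(cum_share, cum_threshold, side="right") + 1)

    return k_eigen, k_cumulative
```

As a usage example, `component_counts(X)` on a matrix `X` with one row per patient and one numeric column per attribute returns the number of components each rule would retain. The cumulative rule typically keeps more components than the eigenvalue rule on real data, which is in line with the different attribute counts reported above, although the paper's exact procedure may differ.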

Keywords: Dimension Reduction; Data Mining; Knowledge Discovery; Electronic Medical Record
Received: 23 October 2017      Published: 05 February 2018

Cite this article:

Dongmei Mu, Ping Wang, Danning Zhao. Reducing Data Dimension of Electronic Medical Records: An Empirical Study. Data Analysis and Knowledge Discovery, 2018, 2(1): 88-98.
