New Technology of Library and Information Service  2016, Vol. 32 Issue (6): 102-109    DOI: 10.11925/infotech.1003-3513.2016.06.13
Original Article
Discovering Knowledge from Electronic Medical Records with Three Data Mining Algorithms
Mu Dongmei1, Ren Ke2
1School of Public Health, Jilin University, Changchun 130021, China
2School of Information Management, Wuhan University, Wuhan 430072, China

[Objective] This empirical study aims to identify disease risk factors from heterogeneous Electronic Medical Records (EMR). [Methods] First, we collected EMR data with various data structures. Second, we built models to predict disease risk factors using three algorithms (decision tree, logistic regression, and neural network). Finally, we compared and evaluated these models statistically. [Results] The Decision Tree model achieved higher recall and precision than the Logistic Regression and Neural Network models. However, the differences among them were not significant. [Limitations] We did not optimize the EMR’s properties. [Conclusions] The Decision Tree model performed better than the Logistic Regression and Neural Network models in discovering risk factors for predicting diseases. The framework of knowledge discovery based on data mining algorithms provides some directions for future research.
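The evaluation step in the abstract compares models by precision and recall. As a minimal sketch of how those two rates are computed for a binary disease label, the Python snippet below scores three sets of predictions against the same ground truth. The function name `precision_recall`, the model keys, and all labels are hypothetical and invented for illustration; they are not the study's data, code, or results.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, where 1 marks the disease class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for illustration only; they are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictions = {
    "decision_tree":       [1, 1, 1, 0, 0, 0, 0, 1],
    "logistic_regression": [1, 1, 0, 0, 0, 0, 1, 1],
    "neural_network":      [1, 1, 0, 0, 0, 0, 0, 1],
}

# Score every hypothetical model against the same held-out labels.
scores = {name: precision_recall(y_true, pred) for name, pred in toy_predictions.items()}
for name, (p, r) in scores.items():
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

Precision is the share of predicted positive cases that are truly positive, and recall is the share of truly positive cases that the model finds. Reporting both, as the abstract does, guards against a model that looks good on one rate only by sacrificing the other.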

Key words: Knowledge discovery; Electronic medical record; Data mining algorithms; Prediction model
Received: 19 February 2016      Published: 18 July 2016

Cite this article:

Mu Dongmei, Ren Ke. Discovering Knowledge from Electronic Medical Records with Three Data Mining Algorithms. New Technology of Library and Information Service, 2016, 32(6): 102-109.

