[Objective] This empirical study aims to identify risk factors for diseases from heterogeneous Electronic Medical Records (EMR). [Methods] First, we collected EMR with various data structures. Second, we built models to predict risk factors for diseases using three algorithms (decision tree, logistic regression and neural network). Finally, we compared and evaluated these models statistically. [Results] The decision tree model achieved higher recall and precision than the logistic regression and neural network models. However, there was no significant difference among the three models. [Limitations] We did not optimize the EMR's properties. [Conclusions] The decision tree model performs better than the logistic regression and neural network models in discovering risk factors for predicting diseases. The knowledge discovery framework based on data mining algorithms provides some directions for future research.
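The comparison described in the Methods and Results rests on recall and precision. The following is a minimal sketch of how those two rates could be computed for each model's predictions. It is not the authors' implementation: the labels, the predictions and the helper name `precision_recall` are all invented for illustration, and the numbers do not reproduce the study's results.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a model predicts no positives
    # or when the data contain no positive cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]

# Hypothetical predictions from the three kinds of model (made-up values).
predictions = {
    "decision_tree": [1, 1, 1, 0, 0, 0, 0, 0],
    "logistic_regression": [1, 1, 0, 0, 1, 0, 0, 0],
    "neural_network": [1, 0, 1, 1, 0, 0, 0, 0],
}

scores = {name: precision_recall(y_true, y_pred)
          for name, y_pred in predictions.items()}

for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

A higher value on both measures, as in this toy data, would not by itself show that one model is better; as the Results note, the difference also has to be tested for significance.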
牟冬梅,任珂. 三种数据挖掘算法在电子病历知识发现中的比较*[J]. 现代图书情报技术, 2016, 32(6): 102-109.
Mu Dongmei, Ren Ke. Discovering Knowledge from Electronic Medical Records with Three Data Mining Algorithms[J]. New Technology of Library and Information Service, 2016, 32(6): 102-109.