A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition
Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong
(School of Medical and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030)
(School of Public Health, Chengdu University of Traditional Chinese Medicine, Chengdu 611137)
(Sichuan Province Electronic Medical Record Engineering Technology Research Center, Chengdu 610041)
(Sichuan Jiuzhen Technology Co., Ltd., Chengdu 610041)
[Objective] To protect private information in clinical texts, this study proposes an automated de-identification scheme that uses a BiLSTM-CRF model to recognize protected health information (PHI) in unstructured clinical records. [Method] Discharge summaries from the electronic health records of a health information platform were selected as experimental data. Based on the 18 PHI identifiers specified by HIPAA and the characteristics of the experimental data, 7 PHI categories and 15 PHI types were defined. A BiLSTM-CRF model was then applied to recognize PHI in the unstructured clinical records. [Result] Across all entity categories, precision, recall, and F value were 98.66%, 99.36%, and 99.01%, respectively; the mislabeled entities were also summarized and analyzed. [Limitations] Model performance could be further optimized for the characteristics of the corpus, and the quality of the clinical text after automatic PHI recognition was not evaluated in this study. [Conclusion] The BiLSTM-CRF model recognizes named entities automatically without feature engineering, which can help promote the sharing and use of clinical information.
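The abstract does not describe how the CRF layer produces its output. As a minimal sketch, assuming a standard linear-chain CRF on top of per-token emission scores from the BiLSTM, the tag sequence is chosen by Viterbi decoding, which scores whole sequences (emissions plus tag-to-tag transitions) instead of tagging each token independently. The function name `viterbi_decode`, the tag set `O`/`B-NAME`/`I-NAME`, and all scores below are hypothetical illustrations, not the authors' implementation or their 15 PHI types; training (the BiLSTM and the CRF likelihood) is not shown.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (assumed to come from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j (assumed learned).
    """
    if not emissions:
        return []
    n_tags = len(tags)
    # score[j] = best total score of any path ending in tag j at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Pick the best final tag, then follow the backpointers to the first token
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical three-token example: tag order is O, B-NAME, I-NAME
TAGS = ["O", "B-NAME", "I-NAME"]
NEG = -10.0  # strongly penalize the invalid transition O -> I-NAME
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.5],  # from B-NAME
    [0.0, 0.0, 0.5],  # from I-NAME
]
EMISSIONS = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],  # per-token argmax would pick I-NAME right after O
    [0.0, 0.0, 2.0],
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

Taking the per-token maximum of these emissions would give `O, I-NAME, I-NAME`, an ill-formed entity; with the transition scores, the decoder returns `['O', 'B-NAME', 'I-NAME']`. This sequence-level consistency is one common motivation for adding a CRF layer on top of a BiLSTM.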
Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong. A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition. Data Analysis and Knowledge Discovery, 0, (): 0-.