A BiLSTM-CRF Model for Protected Health Information in Chinese
Liu Jingru1, Song Yang1, Jia Rui2,3, Zhang Yipeng1, Luo Yong2,4, Ma Jingdong1
1 School of Medical and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
2 Sichuan Province Electronic Medical Record Engineering Technology Research Center, Chengdu 610041, China
3 School of Public Health, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
4 Sichuan Jiuzhen Technology Co., Ltd., Chengdu 610041, China
[Objective] This paper proposes an automated scheme, based on the BiLSTM-CRF model, for identifying protected health information (PHI) in unstructured clinical records so that it can be removed to protect patient privacy. [Methods] We collected experimental data from the discharge summaries of a health information platform. Starting from the 18 PHI identifier categories specified by HIPAA, we defined 7 PHI categories and 15 PHI types, and used the BiLSTM-CRF model to identify PHI in the unstructured clinical records. [Results] Across all entity categories, precision, recall, and F1 score were 98.66%, 99.36%, and 99.01%, respectively. The mislabeled entities were summarized and analyzed. [Limitations] The corpus features need to be improved, and the quality of the clinical text after automatic PHI recognition was not evaluated. [Conclusions] The BiLSTM-CRF model can recognize named entities automatically without feature engineering, which facilitates the sharing and use of clinical information.
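The evaluation reported in the abstract can be illustrated with a minimal sketch of entity-level scoring. The abstract does not state the tagging scheme or the matching criterion, so the BIO tags, the exact-match rule, the label names `NAME` and `DATE`, and the function names below are all assumptions made for illustration, not the authors' implementation. The last lines only check that the reported F value is the harmonic mean of the reported precision and recall.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, label) spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            # A new span opens only on a B- tag; stray I- tags are ignored.
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span-and-label match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Hypothetical example: the predicted DATE span is off by one token,
# so it counts as both a false positive and a false negative.
gold_tags = ["B-NAME", "I-NAME", "O", "B-DATE", "O"]
pred_tags = ["B-NAME", "I-NAME", "O", "O", "B-DATE"]
print(prf(bio_to_spans(gold_tags), bio_to_spans(pred_tags)))

# Consistency check on the figures reported in the abstract.
print(round(f1(0.9866, 0.9936), 4))
```

Exact matching is the stricter choice: a prediction that covers only part of a name or date earns no credit, which fits de-identification, where a partially masked identifier can still leak information.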
Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong. A BiLSTM-CRF Model for Protected Health Information in Chinese[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 124-133.