Data Analysis and Knowledge Discovery    DOI: 10.11925/infotech.2096-3467.2020.0167
A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition
Liu Jingru,Song Yang,Jia Rui,Zhang Yipeng,Luo Yong,Ma Jingdong
(School of Medical and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030)
(School of Public Health, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137)
(Sichuan Province Electronic Medical Record Engineering Technology Research Center, Chengdu 610041)
(Sichuan Jiuzhen Technology Co., Ltd., Chengdu, 610041)
Abstract  

[Objective] To protect private information in clinical texts, this study proposes an automated scheme that uses a BiLSTM-CRF model to identify protected health information (PHI) in unstructured clinical records so that it can be removed. [Method] Discharge summaries from the electronic health records of a health information platform were selected as experimental data. Based on the 18 types of PHI identifiers specified by HIPAA and the characteristics of the experimental data, 7 PHI categories and 15 PHI types were defined. A BiLSTM-CRF model was then used to identify PHI in the unstructured clinical records. [Result] Across all entity categories, precision, recall, and F-score were 98.66%, 99.36%, and 99.01%, respectively; the mislabeled cases were also summarized and analyzed. [Limitations] Tuning of model performance to the characteristics of the corpus still needs improvement, and the quality of the clinical text after automatic PHI recognition was not evaluated. [Conclusion] The BiLSTM-CRF model recognizes named entities automatically without feature engineering, which helps promote the sharing and use of clinical information.
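The reported scores can be illustrated with a minimal Python sketch of entity-level evaluation. The abstract does not describe the authors' evaluation script, so everything here is an assumption: BIO-style tags, exact span matching, the helper names `bio_to_spans`, `prf`, and `f1`, and the example labels `NAME` and `DATE`, which are hypothetical and not taken from the paper's 7 categories. The last helper only checks that the reported F-score is consistent with the harmonic mean of the reported precision and recall.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, label) spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels; one predicted DATE span is too short, so it does not match.
gold_tags = ["B-NAME", "I-NAME", "O", "B-DATE", "I-DATE"]
pred_tags = ["B-NAME", "I-NAME", "O", "B-DATE", "O"]
scores = prf(bio_to_spans(gold_tags), bio_to_spans(pred_tags))

# Consistency check on the figures reported in the abstract (in percent).
reported_f = round(f1(98.66, 99.36), 2)
```

Under exact matching, a span with the wrong boundary counts as both a false positive and a false negative, which is why boundary errors lower precision and recall together.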

Key words: Chinese clinical text; protected health information; Long Short-Term Memory; named entity recognition; private information
Published: 10 July 2020
Chinese Library Classification number (ZTFLH): TP391.1

Cite this article:

Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong. A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition. Data Analysis and Knowledge Discovery.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech. 2096-3467. 2020.0167     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/0
