Data Analysis and Knowledge Discovery    DOI: 10.11925/infotech.2096-3467.2020.0167
A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition
Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong
(School of Medical and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030)
(School of Public Health, Chengdu University of Traditional Chinese Medicine, Chengdu 611137)
(Sichuan Province Electronic Medical Record Engineering Technology Research Center, Chengdu 610041)
(Sichuan Jiuzhen Technology Co., Ltd., Chengdu 610041)

[Objective] To protect private information in clinical texts, this paper proposes an automated scheme that uses a BiLSTM-CRF model to identify protected health information (PHI) in unstructured clinical text so that it can be removed from clinical records. [Method] Discharge summaries from the electronic health records of a health information platform were selected as experimental data. Based on the 18 PHI identifiers specified by HIPAA and the characteristics of the experimental data, 7 PHI categories and 15 PHI types were defined. The BiLSTM-CRF model was then used to identify PHI in the unstructured clinical records. [Result] Across all entity categories, precision, recall, and F value were 98.66%, 99.36%, and 99.01%, respectively; the mislabeled cases were summarized and analyzed. [Limitations] The model's performance has not yet been optimized for the characteristics of the corpus, and the quality of the clinical text after automatic PHI recognition was not evaluated in this study. [Conclusion] The BiLSTM-CRF model recognizes named entities automatically without feature engineering, which helps promote the sharing and use of clinical information.

Key words: Chinese clinical text; protected health information; Long Short-Term Memory; named entity recognition; private information
Published: 10 July 2020
ZTFLH:  TP391.1  
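
The three evaluation figures in the abstract are consistent with an F value computed as the harmonic mean of the first two. The sketch below shows one common way such entity-level scores are computed from BIO-tagged sequences; it is not taken from the paper. The BIO scheme, the exact-match criterion, the label names `NAME` and `DATE`, and the helper names `bio_to_spans` and `prf` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F value."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)  # a prediction counts only if boundaries and type both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical example: the DATE entity is predicted with a wrong right boundary.
gold = ["B-NAME", "I-NAME", "O", "B-DATE", "I-DATE", "O"]
pred = ["B-NAME", "I-NAME", "O", "B-DATE", "O", "O"]
print(prf(gold, pred))

# Harmonic mean of the first two percentages reported in the abstract.
reported_f = 2 * 98.66 * 99.36 / (98.66 + 99.36)
print(round(reported_f, 2))
```

Under exact matching, a prediction with a wrong boundary counts as both a false positive and a false negative, which is why boundary errors lower precision and recall together.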

Cite this article:

Liu Jingru, Song Yang, Jia Rui, Zhang Yipeng, Luo Yong, Ma Jingdong. A BiLSTM-CRF Model for Chinese Clinical Protected Health Information Recognition. Data Analysis and Knowledge Discovery.

