Identifying Medical Named Entities with Word Information
Ben Yanyan1, Pang Xueqin2
1 School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2 Archives of Wuhan University of Science and Technology, Wuhan 430081, China
[Objective] This paper uses word information to identify and infer key clinical features in online consultation records, addressing the difficulty of recognizing named entity boundaries. [Methods] First, we constructed a new model based on MacBERT and conditional random fields (CRF). Then, we embedded word position and part-of-speech information into the dialogue text representation, alongside the speaker role embedding. Finally, we used a weighted multi-class cross-entropy loss to address the imbalance among entity categories. [Results] In an empirical study on online consultation records from Chunyu Doctor, the proposed model achieved an F1 value of 74.35% on the named entity recognition task, nearly 2% higher than using the MacBERT model directly. [Limitations] We did not design a dedicated model for Chinese word segmentation. [Conclusions] By incorporating features from more dimensions, the new model effectively improves the recognition of key clinical finding features.
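The weighted multi-class cross-entropy mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say how the class weights were chosen, so the inverse-frequency scheme below, the function names, and the normalization by the sum of weights are assumptions made for illustration only, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical weighting scheme: rarer classes get larger weights.

    weight_c = N / (num_classes * count_c); classes that never occur get 0.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(logits_batch, labels, class_weights):
    """Weighted mean of per-token negative log-likelihoods.

    Each token's loss is scaled by the weight of its gold class, and the
    total is divided by the sum of the applied weights (an assumed
    normalization convention).
    """
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits)
        w = class_weights[y]
        weighted_loss += -w * math.log(probs[y])
        weight_sum += w
    return weighted_loss / weight_sum


# Toy example: class 0 plays the role of the frequent "O" tag,
# class 1 a rare entity tag that the scores currently get wrong.
logits = [[2.0, 0.0], [2.0, 0.0]]
gold = [0, 1]
plain = weighted_cross_entropy(logits, gold, [1.0, 1.0])
reweighted = weighted_cross_entropy(logits, gold, [1.0, 3.0])
print(plain < reweighted)
```

Raising the weight of the rare class makes errors on it contribute more to the loss, which is the general motivation for weighting under category imbalance.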
Ben Yanyan, Pang Xueqin. Identifying Medical Named Entities with Word Information[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 123-132.