[Objective] This paper decomposes neural network-based named entity recognition models for Chinese medical texts into modules, and investigates how individual modules and the collaboration of multiple modules affect entity recognition performance. [Methods] First, we chose benchmark datasets from CCKS2017, CCKS2019, and IMCS-NER for the named entity recognition task. Second, we conducted extensive experiments to compare the performance of different single modules within each layer of the model. Third, we built and compared entity recognition models based on ensemble, parallel, and serial combinations of neural models. [Results] Using hfl/chinese-macbert-base, hfl/chinese-roberta-wwm-ext, and hfl/chinese-bert-wwm-ext in the symbolic representation layer significantly improved the performance of the entity recognition models, with average F1-scores of 0.8816, 0.8816, and 0.8812, respectively. Stacking neural models at the context encoding layer improved the performance of the neural network. Moreover, ensembled neural networks achieved the best performance, with F1-scores of 0.9330, 0.8211, and 0.9181 on the three datasets, respectively. [Limitations] Further research is needed to verify our findings on datasets in other languages. [Conclusions] The characteristics of individual neural modules and the way they collaborate can significantly affect the performance of named entity recognition for Chinese medical texts.
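The F1-scores reported above are the standard measure for named entity recognition. As a minimal sketch of how an entity-level F1 can be computed, the Python below assumes BIO tagging and strict span matching (an entity counts as correct only if its type and both boundaries match). The abstract does not state the evaluation protocol, so these choices, the function names `extract_spans` and `entity_f1`, and the labels `DIS` and `DRUG` are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical helper type
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((label, start, i))  # close the span that just ended
            if tag.startswith("B-"):
                start, label = i, tag[2:]     # open a new span
            else:
                start, label = None, None     # "O" or a stray "I-" tag
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: one of two gold entities is recovered.
gold = [["B-DIS", "I-DIS", "O", "B-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "O"]]
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

In the toy example, precision is 1.0 and recall is 0.5, which gives an F1 of about 0.667. Scoring whole spans rather than individual tokens means that a prediction with a wrong boundary earns no partial credit.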
段宇锋, 贺国秀. 面向中文医学文本命名实体识别的神经网络模块分解分析*[J]. 数据分析与知识发现, 2023, 7(2): 26-37.
Duan Yufeng, He Guoxiu. Analysis of Neural Network Modules for Named Entity Recognition of Chinese Medical Texts. Data Analysis and Knowledge Discovery, 2023, 7(2): 26-37.