Identifying Medical Named Entities with Word Information
Ben Yanyan1, Pang Xueqin2
1 School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2 Archives of Wuhan University of Science and Technology, Wuhan 430081, China

Abstract [Objective] This paper uses word-level information to identify and infer key clinical features in online consultation records and to address the difficulty of recognizing named-entity boundaries. [Methods] First, we constructed a new model based on MacBERT and conditional random fields (CRF). Then, we embedded word position and part-of-speech information, and introduced dialogue text information through speaker-role embedding. Finally, we used a weighted multi-class cross-entropy loss to address the imbalance among entity categories. [Results] We conducted an empirical study with online consultation records from Chunyu Doctor. The proposed model achieved an F1 value of 74.35% on the named entity recognition task, nearly 2% higher than using the MacBERT model directly. [Limitations] We did not design a dedicated model for Chinese word segmentation. [Conclusions] By incorporating features from more dimensions, the new model improves the recognition of key features of clinical findings.
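
The weighted multi-class cross-entropy mentioned under [Methods] can be illustrated with a minimal sketch. The abstract does not state how the class weights were chosen, so the inverse-frequency scheme below, the function names, and the integer tag ids are all assumptions made for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme (not from the paper): w_c = N / (K * count_c).

    Rare classes receive larger weights; classes absent from `labels` get 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over one token's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted average of -w[y] * log p(y), normalised by the summed weights."""
    numerator = 0.0
    denominator = 0.0
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits)
        numerator += -weights[y] * math.log(probs[y])
        denominator += weights[y]
    return numerator / denominator


# Hypothetical toy data: tag id 0 is frequent, tag id 1 is rare.
toy_labels = [0, 0, 0, 1]
toy_logits = [[2.0, 0.5], [1.5, 0.2], [0.3, 0.1], [0.4, 1.0]]
toy_weights = inverse_frequency_weights(toy_labels, num_classes=2)
toy_loss = weighted_cross_entropy(toy_logits, toy_labels, toy_weights)
```

With these weights, an error on the rare tag contributes more to the loss than an error on the frequent tag, which is the general idea behind using a weighted loss for imbalanced entity categories.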
Received: 30 May 2022
Published: 04 July 2023
Fund: National Natural Science Foundation of China (11971185)
Corresponding author: Pang Xueqin, ORCID: 0000-0002-0097-8725, E-mail: 1046614047@qq.com