Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (5): 123-132    DOI: 10.11925/infotech.2096-3467.2022.0547
Identifying Medical Named Entities with Word Information
Ben Yanyan1, Pang Xueqin2
1School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2Archives of Wuhan University of Science and Technology, Wuhan 430081, China
Abstract  

[Objective] This paper uses word information to identify and infer key clinical features in online consultation records, addressing the difficulty of recognizing named entity boundaries. [Methods] First, we constructed a new model based on MacBERT and conditional random fields (CRF). Then, we embedded word position and part-of-speech information, and incorporated dialogue text information through speaker role embedding. Finally, we used weighted multi-class cross-entropy to address the imbalance among entity categories. [Results] We conducted an empirical study with online consultation records from Chunyu Doctor. The F1 value of the proposed model on the named entity recognition task was 74.35%, nearly 2 percentage points higher than that of the MacBERT model used directly. [Limitations] We did not design a specific model for Chinese word segmentation. [Conclusions] By incorporating features from more dimensions, the new model more effectively recognizes key features of clinical findings.
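The weighted multi-class cross-entropy mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say how the class weights were chosen, so the inverse-frequency weighting below, the tag names `O` and `B-SYM`, and the toy probabilities are all assumptions made for illustration, not the authors' settings.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: w_c = N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights."""
    num = den = 0.0
    for p, y in zip(probs, labels):
        num += -weights[y] * math.log(p[y])
        den += weights[y]
    return num / den


# Hypothetical toy data: three "O" tokens and one rare "B-SYM" token.
labels = ["O", "O", "O", "B-SYM"]
probs = [
    {"O": 0.9, "B-SYM": 0.1},
    {"O": 0.8, "B-SYM": 0.2},
    {"O": 0.9, "B-SYM": 0.1},
    {"O": 0.7, "B-SYM": 0.3},  # the rare entity token is predicted poorly
]

weights = inverse_frequency_weights(labels)
uniform = {c: 1.0 for c in weights}
unweighted_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
print(weights)
print("unweighted:", unweighted_loss)
print("weighted:  ", weighted_loss)
```

With these toy values the weighted loss is larger than the unweighted one, because the error on the rare class counts for more. That is the intended effect when entity categories are imbalanced.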

Key words: Chinese Named Entity Recognition; Online Medical Consultation; Word Information Embedding; MacBERT; Weighted Cross Entropy
Received: 30 May 2022      Published: 04 July 2023
ZTFLH: TP393; G250
Fund: National Natural Science Foundation of China (11971185)
Corresponding Author: Pang Xueqin, ORCID: 0000-0002-0097-8725, E-mail: 1046614047@qq.com

Cite this article:

Ben Yanyan, Pang Xueqin. Identifying Medical Named Entities with Word Information. Data Analysis and Knowledge Discovery, 2023, 7(5): 123-132.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0547     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I5/123

Figure: Model Architecture
Figure: Combined Sample Text Flow Chart
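| Algorithm | Category | P | R | F1 |
| MacBERT + Softmax | negative | 0.60217 | 0.70501 | 0.64955 |
| MacBERT + Softmax | positive | 0.66138 | 0.74165 | 0.69922 |
| MacBERT + Softmax | weighted | 0.65228 | 0.73602 | 0.69159 |
| MacBERT + CRF | negative | 0.68914 | 0.69458 | 0.69185 |
| MacBERT + CRF | positive | 0.73872 | 0.72885 | 0.73375 |
| MacBERT + CRF | weighted | 0.73110 | 0.72359 | 0.72731 |
| MacBERT + Weighted Cross-Entropy + CRF | negative | 0.70921 | 0.69025 | 0.69960 |
| MacBERT + Weighted Cross-Entropy + CRF | positive | 0.73603 | 0.75170 | 0.74378 |
| MacBERT + Weighted Cross-Entropy + CRF | weighted | 0.73191 | 0.74225 | 0.73704 |
| Model with word information | negative | 0.68741 | 0.72538 | 0.70588 |
| Model with word information | positive | 0.78159 | 0.72164 | 0.75042 |
| Model with word information | weighted | 0.76712 | 0.72222 | 0.74358 |
Table: Comparison of Experimental Results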
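As a quick sanity check on the table above, the per-class F1 values can be recomputed from the reported precision (P) and recall (R) using the standard harmonic-mean definition F1 = 2PR / (P + R). The sketch below copies three per-class rows from the table. The function and variable names are illustrative. The "weighted" rows are left out because this page does not state how they were aggregated.

```python
def f1_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (model, category, P, R, reported F1) copied from the results table.
rows = [
    ("MacBERT + Softmax", "negative", 0.60217, 0.70501, 0.64955),
    ("MacBERT + CRF", "positive", 0.73872, 0.72885, 0.73375),
    ("Model with word information", "positive", 0.78159, 0.72164, 0.75042),
]

for model, category, p, r, reported in rows:
    computed = f1_score(p, r)
    print(f"{model} ({category}): computed={computed:.5f}, reported={reported:.5f}")
```

For these rows the recomputed values agree with the reported ones to within rounding.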