Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (5): 123-132    DOI: 10.11925/infotech.2096-3467.2022.0547
Identifying Medical Named Entities with Word Information
Ben Yanyan1, Pang Xueqin2
1School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2Archives of Wuhan University of Science and Technology, Wuhan 430081, China
[Objective] This paper uses word information to identify and infer key clinical features in online consultation records, addressing the difficulty of recognizing named entity boundaries. [Methods] First, we constructed a new model based on MacBERT and conditional random fields. Then, we embedded word position and part-of-speech information, and represented the dialogue structure of the text with speaker role embeddings. Finally, we used a weighted multi-class cross-entropy loss to address the imbalance among entity categories. [Results] We conducted an empirical study with online consultation records from Chunyu Doctor. The proposed model achieved an F1 value of 74.35% on the named entity recognition task, nearly 2 percentage points higher than using the MacBERT model directly. [Limitations] We did not design a dedicated model for Chinese word segmentation. [Conclusions] By incorporating features from more dimensions, the new model improves the recognition of key features of clinical findings.
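The weighted multi-class cross-entropy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the class weights were chosen or how the loss was normalized, so the inverse-frequency weights and the weight-normalized mean below are assumptions, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme: weight_c = N / (num_classes * count_c).

    Rare classes get larger weights; classes absent from `labels` get 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted multi-class cross-entropy over a batch.

    probs   : list of per-class probability lists (each sums to 1)
    labels  : list of gold class indices
    weights : per-class weights
    The result is normalized by the sum of the weights of the gold labels
    (an assumption; other normalizations are possible).
    """
    weighted_loss = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        weighted_loss += -weights[y] * math.log(p[y])
        weight_sum += weights[y]
    return weighted_loss / weight_sum


# Toy batch: class 1 is rare, so it receives the larger weight.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
probs = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, the single misclassified example of the rare class contributes more to the loss than it would under an unweighted average, which is the intended effect when entity categories are imbalanced.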

Key words: Chinese Named Entity Recognition; Online Medical Consultation; Word Information Embedding; MacBERT; Weighted Cross Entropy
Received: 30 May 2022      Published: 04 July 2023
ZTFLH:  TP393  
Fund: National Natural Science Foundation of China (11971185)
Corresponding Author: Pang Xueqin, ORCID: 0000-0002-0097-8725

Ben Yanyan, Pang Xueqin. Identifying Medical Named Entities with Word Information. Data Analysis and Knowledge Discovery, 2023, 7(5): 123-132.

Figure: Model Architecture
Figure: Combined Sample Text Flow Chart
Algorithm                             Category   P         R         F1
(not given)                           negative   0.60217   0.70501   0.64955
                                      positive   0.66138   0.74165   0.69922
                                      weighted   0.65228   0.73602   0.69159
(not given)                           negative   0.68914   0.69458   0.69185
                                      positive   0.73872   0.72885   0.73375
                                      weighted   0.73110   0.72359   0.72731
(not given)                           negative   0.70921   0.69025   0.69960
                                      positive   0.73603   0.75170   0.74378
                                      weighted   0.73191   0.74225   0.73704
Model incorporating word information  negative   0.68741   0.72538   0.70588
                                      positive   0.78159   0.72164   0.75042
                                      weighted   0.76712   0.72222   0.74358
Table: Comparison of Experimental Results
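As a reading aid for the P, R and F1 columns, the sketch below recomputes F1 as the harmonic mean of precision and recall for the per-category rows of the model incorporating word information. The helper name `f1_score` is hypothetical. The weighted rows are presumably averages over categories, so they need not satisfy this identity and are not checked here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for the model incorporating word information.
rows = {
    "negative": (0.68741, 0.72538, 0.70588),
    "positive": (0.78159, 0.72164, 0.75042),
}

for category, (p, r, reported) in rows.items():
    recomputed = f1_score(p, r)
    # The recomputed value should agree with the table up to rounding.
    print(f"{category}: recomputed={recomputed:.5f} reported={reported:.5f}")
```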
