Abstract
[Objective] To address the difficulty of identifying named-entity boundaries, this study integrates word information to improve the recognition of key clinical findings in online consultation records.
[Methods] The model is built on MacBERT and conditional random fields. Word information such as word position and part of speech is incorporated through positional "soft" embeddings, and dialogue information is introduced through speaker-role embeddings. Weighted multi-class cross-entropy is used to address the imbalance among entity categories.
[Results] In an empirical study on online consultation records from Chunyu Doctor, the proposed model achieved an F1 score of 74.35% on the named entity recognition task, an improvement of nearly 2%.
[Limitations] No model was designed specifically for Chinese word segmentation.
[Conclusions] Compared with using the MacBERT model directly, incorporating multi-dimensional features such as word information effectively improves the model's ability to recognize key features of clinical findings.
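
As an illustration of the weighted multi-class cross-entropy named in the Methods, the following is a minimal sketch in plain Python. The abstract does not say how the class weights are chosen, so the inverse-frequency scheme, the function names, and the toy labels (`O`, `B-Symptom`) are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight_c = N / (K * count_c).

    Rare classes receive larger weights than frequent ones.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        weighted_loss += -w * math.log(p[y])
        weight_sum += w
    return weighted_loss / weight_sum


# Toy token-level labels (hypothetical): three "O" tokens, one entity token.
labels = ["O", "O", "O", "B-Symptom"]
weights = inverse_frequency_weights(labels)

# Hypothetical predicted distributions, one per token.
probs = [
    {"O": 0.9, "B-Symptom": 0.1},
    {"O": 0.8, "B-Symptom": 0.2},
    {"O": 0.7, "B-Symptom": 0.3},
    {"O": 0.6, "B-Symptom": 0.4},
]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, the misclassified entity token contributes more to the loss than any single `O` token, which is the intended effect when entity categories are under-represented.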