Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 5: 123-132     https://doi.org/10.11925/infotech.2096-3467.2022.0547
Research Paper
Identifying Medical Named Entities with Word Information*
Ben Yanyan1, Pang Xueqin2 (corresponding author)
1School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2Archives of Wuhan University of Science and Technology, Wuhan 430081, China
Abstract

[Objective] To address the difficulty of recognizing named entity boundaries, this paper incorporates word information to improve the recognition and inference of key clinical features in online consultation records. [Methods] We built a model based on MacBERT and conditional random fields (CRF). Word information, such as word position and part of speech, is incorporated through "soft" position embeddings, and dialogue information is introduced through speaker-role embeddings. We also used a weighted multi-class cross-entropy loss to address the imbalance among entity categories. [Results] In an empirical study on online consultation records from Chunyu Doctor, the proposed model reached an F1 of 74.35% on the named entity recognition task, nearly 2 percentage points higher than using the MacBERT model directly. [Limitations] We did not design a dedicated model for Chinese word segmentation. [Conclusions] Compared with using the MacBERT model directly, incorporating additional feature dimensions such as word information effectively improves the model's recognition ability.

Keywords: Chinese Named Entity Recognition; Online Medical Consultation; Word Information Embedding; MacBERT; Weighted Cross Entropy
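The abstract names a weighted multi-class cross-entropy as the remedy for entity category imbalance but does not give the weighting scheme. The following is a minimal sketch under stated assumptions: the inverse-frequency weights and the function names (`inverse_frequency_weights`, `weighted_cross_entropy`) are illustrative choices, not the authors' implementation. The loss is normalized by the sum of the applied weights, which is one common convention.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme: n / (num_classes * count) per class.

    Rare classes get weights above 1, frequent classes below 1.
    The paper does not state which scheme it uses.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(logits_batch, labels, class_weights):
    """Weighted mean of per-token negative log-likelihoods."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits)
        w = class_weights[label]
        weighted_sum += -w * math.log(probs[label])
        weight_total += w
    return weighted_sum / weight_total


# Toy batch: three tokens of a frequent class (0), one of a rare class (1).
toy_labels = [0, 0, 0, 1]
toy_logits = [[2.0, 0.0]] * 4  # the model always favors class 0
toy_weights = inverse_frequency_weights(toy_labels, num_classes=2)
uniform_loss = weighted_cross_entropy(toy_logits, toy_labels, [1.0, 1.0])
balanced_loss = weighted_cross_entropy(toy_logits, toy_labels, toy_weights)
```

In this toy batch the misclassified rare-class token contributes more to `balanced_loss` than to `uniform_loss`, which is the effect class weighting is meant to produce.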
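Received: 2022-05-30      Published: 2023-07-04
Chinese Library Classification (ZTFLH): TP393; G250
Funding: *This work is one of the research outcomes of a National Natural Science Foundation of China project (No. 11971185).
Corresponding author: Pang Xueqin, ORCID: 0000-0002-0097-8725, E-mail: 1046614047@qq.com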
Cite this article:
Ben Yanyan, Pang Xueqin. Identifying Medical Named Entities with Word Information[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 123-132.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0547      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I5/123
Fig. 1  Model architecture
Fig. 2  Workflow for combining sample texts
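The abstract says the model adds word position, part of speech, and speaker role as extra embeddings on top of MacBERT, but this page does not define the "soft" embedding. As a rough illustration only, the sketch below uses a plain lookup-and-sum, the same way BERT-style models add token, segment, and position embeddings. The table names, the dimension, the B/E/S word-position tags, and the hand-written part-of-speech tags are all assumptions made for the example.

```python
import random

DIM = 8  # illustrative embedding size, far smaller than a real model's


def make_table(keys, dim, rng):
    """Build a toy lookup table of small random vectors, one per key."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def embed_tokens(chars, pos_tags, word_positions, speakers, tables):
    """Sum character, part-of-speech, word-position and speaker vectors."""
    vectors = []
    for ch, pos, wp, spk in zip(chars, pos_tags, word_positions, speakers):
        parts = (
            tables["char"][ch],
            tables["pos"][pos],
            tables["word_position"][wp],
            tables["speaker"][spk],
        )
        # Element-wise sum across the four feature vectors.
        vectors.append([sum(values) for values in zip(*parts)])
    return vectors


rng = random.Random(0)
tables = {
    "char": make_table(["头", "痛", "三", "天"], DIM, rng),
    "pos": make_table(["n", "m", "q"], DIM, rng),          # hand-written toy tags
    "word_position": make_table(["B", "E", "S"], DIM, rng),  # begin / end / single
    "speaker": make_table(["patient", "doctor"], DIM, rng),
}

# One patient utterance, segmented by hand as 头痛 / 三 / 天.
vectors = embed_tokens(
    chars=["头", "痛", "三", "天"],
    pos_tags=["n", "n", "m", "q"],
    word_positions=["B", "E", "S", "S"],
    speakers=["patient"] * 4,
    tables=tables,
)
```

Each character ends up with one vector that carries its word-boundary and part-of-speech context, which is the kind of signal the abstract credits for better entity boundary recognition.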
Algorithm                               Class     P        R        F1
MacBERT + Softmax                       negative  0.60217  0.70501  0.64955
MacBERT + Softmax                       positive  0.66138  0.74165  0.69922
MacBERT + Softmax                       weighted  0.65228  0.73602  0.69159
MacBERT + CRF                           negative  0.68914  0.69458  0.69185
MacBERT + CRF                           positive  0.73872  0.72885  0.73375
MacBERT + CRF                           weighted  0.73110  0.72359  0.72731
MacBERT + weighted cross-entropy + CRF  negative  0.70921  0.69025  0.69960
MacBERT + weighted cross-entropy + CRF  positive  0.73603  0.75170  0.74378
MacBERT + weighted cross-entropy + CRF  weighted  0.73191  0.74225  0.73704
Proposed model with word information    negative  0.68741  0.72538  0.70588
Proposed model with word information    positive  0.78159  0.72164  0.75042
Proposed model with word information    weighted  0.76712  0.72222  0.74358
Table 1  Comparison of experimental results
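As a consistency check on Table 1, the per-class F1 values can be recomputed from precision and recall with F1 = 2PR / (P + R). The sketch below covers only the negative and positive rows. The weighted rows are presumably averages over classes rather than a harmonic mean of the weighted P and R, so they are left out; that reading is an assumption, since the page does not say how they were computed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (algorithm, class, P, R, reported F1) copied from Table 1.
TABLE_1_PER_CLASS = [
    ("MacBERT + Softmax", "negative", 0.60217, 0.70501, 0.64955),
    ("MacBERT + Softmax", "positive", 0.66138, 0.74165, 0.69922),
    ("MacBERT + CRF", "negative", 0.68914, 0.69458, 0.69185),
    ("MacBERT + CRF", "positive", 0.73872, 0.72885, 0.73375),
    ("MacBERT + weighted cross-entropy + CRF", "negative", 0.70921, 0.69025, 0.69960),
    ("MacBERT + weighted cross-entropy + CRF", "positive", 0.73603, 0.75170, 0.74378),
    ("Proposed model with word information", "negative", 0.68741, 0.72538, 0.70588),
    ("Proposed model with word information", "positive", 0.78159, 0.72164, 0.75042),
]


def max_f1_deviation(rows):
    """Largest gap between the recomputed and the reported F1."""
    return max(abs(f1_score(p, r) - f1) for _, _, p, r, f1 in rows)


if __name__ == "__main__":
    for name, label, p, r, reported in TABLE_1_PER_CLASS:
        print(f"{name:40s} {label:9s} recomputed={f1_score(p, r):.5f} reported={reported:.5f}")
```

Every per-class row agrees with the reported F1 to within rounding, so the reconstructed table is internally consistent.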