Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 46-56     https://doi.org/10.11925/infotech.2096-3467.2022.0085
Research Paper
Analyzing Structures of Medical Imaging Diagnosis Reports
Sheng Yu, Hu Huirong, Wang Congcong, Yang Shengyi
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Abstract

[Objective] This paper studies how to structure medical imaging diagnosis reports so that information can be extracted from these free-text reports accurately and efficiently. [Methods] We analyzed the textual characteristics of medical imaging diagnosis reports, proposed a structuring method that combines entity recognition with rule-based extraction, and annotated 800 reports to build a dataset for experimental evaluation. [Results] The proposed method reached a recognition precision of at least 0.87 for every entity type in the reports. Compared with BERT-BiLSTM-CRF, its precision was 4.03 percentage points higher and its recall 2.81 percentage points higher. Compared with a structuring method based on dependency analysis, it improved the recognition precision for examination items and examination results by 5.62 and 2.31 percentage points, respectively. [Limitations] The study used PET-CT imaging diagnosis reports from a single hospital, so the data come from one source only. [Conclusions] The method converts medical imaging diagnosis reports from free text into structured data. This improves the classification, retrieval, and storage of the reports and provides data support for later research on medical imaging.

Key words: Medical Imaging Diagnosis Report    Entity Recognition    Rule Extraction    Structure
Received: 2022-01-28      Published: 2022-11-16
ZTFLH (Chinese Library Classification number):  TP391
Funding: General Program of the National Natural Science Foundation of China (61877059)
Corresponding author: Sheng Yu, ORCID: 0000-0002-6347-0769      E-mail: shengyu@csu.edu.cn
Cite this article:
Sheng Yu, Hu Huirong, Wang Congcong, Yang Shengyi. Analyzing Structures of Medical Imaging Diagnosis Reports[J]. Data Analysis and Knowledge Discovery, 2022, 6(10): 46-56.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0085      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I10/46
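
| Image description | Diagnosis |
|---|---|
| Chest PET-CT shows a large lucent area without lung markings and a water-density shadow within the right thorax, occupying about 70% of the right thoracic cavity. The remaining lung field shows a large soft-tissue-density shadow in which only some lung markings and bronchial shadows are visible. PET of this region shows a large area of heterogeneously increased glucose metabolism, with SUVmax of 18.5, … | 1. Abnormally increased glucose metabolism in the basal segment of the right lower lobe; malignant tumor considered, most likely lung cancer. 2. Multiple enlarged lymph nodes with increased metabolism in the right hilum and mediastinum; lymph node metastasis considered. 3. Infection and atelectasis of the right lower lobe. |
Table 1  Example content of a PET-CT imaging diagnosis report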
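
| Entity name | Definition | Examples |
|---|---|---|
| Examined body part (BP) | Name of the organ examined by imaging | 大脑 (brain), 肺 (lung), 甲状腺 (thyroid), 肝脏 (liver), … |
| Attribute (BPC) | Attribute of the organ | 比例 (proportion), 密度 (density), 体积 (volume), 大小 (size), … |
| Predicate (PW) | Expresses the relation between an organ and a sign | 见 (seen), 弥漫 (diffuse), 呈 (appears as), 无 (no), 存在 (present), … |
| Connective (JW) | Expresses coordination, progression, and similar relations between neighboring elements | 及 (and), 同样 (likewise), 和 (and), 并 (and also), 与 (with), … |
| Sign (DI) | Description related to physical signs | 膨胀 (distension), 钙化灶 (calcification focus), 糖代谢 (glucose metabolism), 楔形变 (wedge-shaped deformation), … |
| Indirect sign (DIC) | Further description of a visible sign | 短径 (short-axis diameter), 最大SUV (maximum SUV), 最大横截面 (maximum cross-section), … |
Table 2  Entity definitions for medical imaging diagnosis reports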
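
| Category | Count |
|---|---|
| Body part | 31,425 |
| Attribute | 3,856 |
| Predicate | 20,395 |
| Connective | 11,999 |
| Sign | 30,893 |
| Indirect sign | 3,975 |
Table 3  The annotated dataset of medical imaging diagnosis reports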

Fig.1  Workflow of the structuring algorithm for medical imaging diagnosis reports
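
| Word segmentation tool | Error rate |
|---|---|
| jieba | 0.3215 |
| snownlp | 0.5486 |
| pkuseg | 0.3875 |
| thunlp | 0.4077 |
Table 4  Segmentation performance of different word segmentation tools on medical imaging diagnosis reports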
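
Fig.2  The Dic-BiLSTM-CRF named entity recognition model for medical imaging diagnosis reports
Fig.3  Structure of the embedding layer
Fig.4  Structure of the BiLSTM layer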
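
| Entity type | P | R | F1 |
|---|---|---|---|
| BP | 0.9485 | 0.9349 | 0.9416 |
| BPC | 0.8838 | 0.9007 | 0.8922 |
| DI | 0.9418 | 0.9300 | 0.9359 |
| DIC | 0.8791 | 0.7947 | 0.8348 |
| PW | 0.9562 | 0.9516 | 0.9538 |
| JW | 0.9838 | 0.9884 | 0.9861 |
Table 5  Recognition results of the Dic-BiLSTM-CRF model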
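
The F1 column of Table 5 can be sanity-checked against its P and R columns. The sketch below assumes that F1 is the usual harmonic mean, F1 = 2PR / (P + R); this page does not state the formula, and the names `TABLE5`, `f1_score`, and `max_f1_gap` are illustrative only.

```python
# Values copied from Table 5: entity type -> (P, R, reported F1).
TABLE5 = {
    "BP": (0.9485, 0.9349, 0.9416),
    "BPC": (0.8838, 0.9007, 0.8922),
    "DI": (0.9418, 0.9300, 0.9359),
    "DIC": (0.8791, 0.7947, 0.8348),
    "PW": (0.9562, 0.9516, 0.9538),
    "JW": (0.9838, 0.9884, 0.9861),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(rows: dict) -> float:
    """Largest absolute gap between the recomputed and the reported F1."""
    return max(abs(f1_score(p, r) - reported) for p, r, reported in rows.values())


if __name__ == "__main__":
    for tag, (p, r, reported) in TABLE5.items():
        print(f"{tag}: recomputed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Under that assumption, every row of Table 5 agrees with its reported F1 to within 0.001.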
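
| Model | P | R | F1 |
|---|---|---|---|
| CNN-CRF | 0.8492 | 0.8273 | 0.8367 |
| IDCNN-CRF | 0.8527 | 0.8186 | 0.8335 |
| BiLSTM-CRF | 0.8572 | 0.8513 | 0.8538 |
| BiLSTM-Attention-CRF | 0.8265 | 0.7961 | 0.8091 |
| BERT-BiLSTM-CRF | 0.8522 | 0.8511 | 0.8509 |
| Proposed model | 0.8925 | 0.8792 | 0.8854 |
Table 6  Recognition results of different models on medical imaging diagnosis reports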
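
| Model | P (no dictionary) | R (no dictionary) | F1 (no dictionary) | P (with dictionary) | R (with dictionary) | F1 (with dictionary) |
|---|---|---|---|---|---|---|
| CNN-CRF | 0.8492 | 0.8273 | 0.8367 | 0.8552 | 0.8557 | 0.8551 |
| IDCNN-CRF | 0.8527 | 0.8186 | 0.8335 | 0.8426 | 0.8524 | 0.8474 |
| BiLSTM-Attention-CRF | 0.8265 | 0.7961 | 0.8091 | 0.8231 | 0.8265 | 0.8239 |
| BiLSTM-CRF | 0.8572 | 0.8513 | 0.8538 | 0.8925 | 0.8792 | 0.8854 |
Table 7  Entity recognition results on medical imaging diagnosis reports with and without the dictionary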
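
| Structuring method | Examination item P | Examination item R | Examination item F1 | Examination result P | Examination result R | Examination result F1 |
|---|---|---|---|---|---|---|
| Manual rules | 0.9230 | 0.7205 | 0.8092 | 0.8552 | 0.4312 | 0.5733 |
| Dependency analysis | 0.8875 | 0.8343 | 0.8600 | 0.8872 | 0.7861 | 0.8335 |
| ERRE | 0.9437 | 0.8674 | 0.9039 | 0.9103 | 0.8443 | 0.8760 |
Table 8  Comparison of structuring methods for medical imaging diagnosis reports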
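
The improvements quoted in the abstract are differences in percentage points, and they can be recomputed from Tables 6 and 8. The short sketch below does that arithmetic; the dictionary layout and the helper name `pp_gain` are illustrative, and it treats the ERRE row of Table 8 as the proposed method because that is the row whose gains match the abstract.

```python
# Precision (P) and recall (R) copied from Table 6.
TABLE6 = {
    "BERT-BiLSTM-CRF": {"P": 0.8522, "R": 0.8511},
    "proposed": {"P": 0.8925, "R": 0.8792},
}

# Precision copied from Table 8 for examination items and examination results.
TABLE8_PRECISION = {
    "dependency": {"item": 0.8875, "result": 0.8872},
    "ERRE": {"item": 0.9437, "result": 0.9103},
}


def pp_gain(new: float, old: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return round((new - old) * 100, 2)


# Entity recognition: proposed model versus BERT-BiLSTM-CRF.
ner_gains = {
    metric: pp_gain(TABLE6["proposed"][metric], TABLE6["BERT-BiLSTM-CRF"][metric])
    for metric in ("P", "R")
}

# Structuring: ERRE versus the dependency-analysis baseline (precision only).
struct_gains = {
    field: pp_gain(TABLE8_PRECISION["ERRE"][field], TABLE8_PRECISION["dependency"][field])
    for field in ("item", "result")
}

if __name__ == "__main__":
    print(ner_gains)
    print(struct_gains)
```

The results match the abstract: 4.03 and 2.81 points for precision and recall against BERT-BiLSTM-CRF, and 5.62 and 2.31 points for item and result precision against dependency analysis.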
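
| Original report text | Structured data: item | Structured data: result |
|---|---|---|
| CT平扫示脑实质密度均匀,未见异常密度灶;脑室系统扩大,形态如常,脑沟、脑裂增宽,脑中线结构居中。 (Plain CT shows homogeneous density of the brain parenchyma with no abnormal-density foci; the ventricular system is enlarged with normal morphology; the sulci and fissures are widened; the midline structures are centered.) | 脑实质 (brain parenchyma) | 密度均匀 (homogeneous density) |
| | 脑实质 (brain parenchyma) | 未见异常密度灶 (no abnormal-density foci seen) |
| | 脑室系统 (ventricular system) | 扩大 (enlarged) |
| | 脑室系统 (ventricular system) | 形态如常 (normal morphology) |
| | 脑沟 (cerebral sulci) | 增宽 (widened) |
| | 脑裂 (cerebral fissures) | 增宽 (widened) |
| | 脑中线 (brain midline) | 结构居中 (structure centered) |
Table 9  Example of structured data from a medical imaging diagnosis report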
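
To make the mapping in Table 9 concrete, here is a minimal sketch of how tagged entities could be grouped into (item, result) pairs. It is not the paper's rule set, which this page does not give. It assumes a hypothetical upstream tagger that has already labeled each span with the tags from Table 2 (plus `O` for other text and `PUNC` for punctuation), and it applies one simple illustrative rule: within a clause, the body parts (BP) are the items and the text after the last BP is the result, while a clause without a BP inherits the items of the previous clause.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface text, tag); tags are assumed tagger output

# Punctuation marks treated as clause boundaries (an assumption of this sketch).
CLAUSE_BREAKS = {",", ";", "。"}


def _flush(clause: List[Token], subjects: List[str],
           pairs: List[Tuple[str, str]]) -> List[str]:
    """Turn one clause into (item, result) pairs and return the active items."""
    bp_positions = [i for i, (_, tag) in enumerate(clause) if tag == "BP"]
    if bp_positions:
        # New body parts replace the inherited ones; coordinated parts share a result.
        subjects = [clause[i][0] for i in bp_positions]
        rest = clause[bp_positions[-1] + 1:]
    else:
        rest = clause
    result = "".join(text for text, _ in rest)
    if result:
        pairs.extend((subject, result) for subject in subjects)
    return subjects


def structure(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group tagged tokens into (item, result) pairs, clause by clause."""
    pairs: List[Tuple[str, str]] = []
    subjects: List[str] = []
    clause: List[Token] = []
    for text, tag in tokens:
        if tag == "PUNC" and text in CLAUSE_BREAKS:
            subjects = _flush(clause, subjects, pairs)
            clause = []
        else:
            clause.append((text, tag))
    _flush(clause, subjects, pairs)
    return pairs


# The sentence from Table 9, hand-tagged for illustration.
SAMPLE: List[Token] = [
    ("CT平扫示", "O"), ("脑实质", "BP"), ("密度", "BPC"), ("均匀", "DI"), (",", "PUNC"),
    ("未见", "PW"), ("异常密度灶", "DI"), (";", "PUNC"),
    ("脑室系统", "BP"), ("扩大", "DI"), (",", "PUNC"),
    ("形态", "BPC"), ("如常", "DI"), (",", "PUNC"),
    ("脑沟", "BP"), ("、", "PUNC"), ("脑裂", "BP"), ("增宽", "DI"), (",", "PUNC"),
    ("脑中线", "BP"), ("结构", "BPC"), ("居中", "DI"), ("。", "PUNC"),
]

if __name__ == "__main__":
    for item, result in structure(SAMPLE):
        print(item, result)
```

On this hand-tagged sentence the sketch yields the seven pairs listed in Table 9. Real reports would need more rules than this, for example for predicates and connectives that link several findings.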