Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (10): 46-56    DOI: 10.11925/infotech.2096-3467.2022.0085
Analyzing Structures of Medical Imaging Diagnosis Reports
Sheng Yu, Hu Huirong, Wang Congcong, Yang Shengyi
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Abstract  

[Objective] This paper converts free-text medical imaging diagnosis reports into structured data so that the information they contain can be extracted effectively. [Methods] We first analyzed the textual characteristics of medical imaging diagnosis reports and proposed a structuring method based on entity recognition and rule extraction. We then annotated 800 reports to build datasets for model evaluation. [Results] The proposed method achieved a precision of 0.87 across all entity types in the medical imaging diagnosis reports, 4.03 percentage points higher than BERT-BiLSTM-CRF; its recall was 2.81 percentage points higher. Compared with the dependency-parsing method, the proposed method improved the precision of recognizing medical exam items and exam results by 5.62 and 2.31 percentage points, respectively. [Limitations] The method was evaluated only on PET-CT imaging diagnosis reports from a single hospital. [Conclusions] The study converts the free text of medical imaging diagnosis reports into structured data, which facilitates the classification, storage, and retrieval of medical reports and supports future research on medical imaging.

Key words: Medical Imaging Diagnosis Report; Entity Recognition; Rule Extraction; Structure
Received: 28 January 2022      Published: 16 November 2022
CLC number (ZTFLH): TP391
Fund: National Natural Science Foundation of China (61877059)
Corresponding Author: Sheng Yu, ORCID: 0000-0002-6347-0769, E-mail: shengyu@csu.edu.cn

Cite this article:

Sheng Yu, Hu Huirong, Wang Congcong, Yang Shengyi. Analyzing Structures of Medical Imaging Diagnosis Reports. Data Analysis and Knowledge Discovery, 2022, 6(10): 46-56.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0085     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I10/46

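Image description: The chest PET-CT images show a large lucent area without lung markings and a water-density shadow in the right thorax, occupying about 70% of the right thoracic cavity. The remaining lung field shows a large soft-tissue-density shadow, within which only some lung markings and tracheal shadows are visible. PET of this region shows a large area of heterogeneously increased glucose metabolism, with an SUVmax of 18.5, …
Diagnosis:
1. Abnormally increased glucose metabolism in the basal segments of the right lower lobe; malignant tumor considered, most likely lung cancer
2. Multiple enlarged lymph nodes with increased metabolism in the right hilum and mediastinum; lymph node metastasis considered
3. Infection and atelectasis of the right lower lobe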
Schematic of the PET-CT Imaging Diagnosis Report
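Entity name | Definition | Examples
Body part (BP) | Name of the organ examined | brain, lung, thyroid, liver, …
Attribute (BPC) | Attribute of the organ | proportion, density, volume, size, …
Predicate (PW) | Expresses the relation between an organ and a sign | seen, diffuse, appears as, no, present, …
Conjunction (JW) | Expresses coordination, progression, and similar relations between neighboring items | and, likewise, as well as, along with, with, …
Sign (DI) | Description of a sign | expansion, calcification focus, glucose metabolism, wedge-shaped deformity, …
Indirect sign (DIC) | Further description of a visible sign | short-axis diameter, maximum SUV, maximum cross-section, …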
Entity Definition for Medical Imaging Diagnosis Report
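To make the six entity types more concrete, the following minimal sketch turns character-level entity spans into sequence labels. The BIO scheme, the `to_bio` helper, and the entity types assigned to the sample clause are assumptions for illustration only; the page does not state which tagging scheme or annotation guidelines the authors used.

```python
from typing import List, Tuple

# Entity type codes taken from the entity definition table.
ENTITY_TYPES = {
    "BP": "body part examined",
    "BPC": "attribute of the body part",
    "PW": "predicate",
    "JW": "conjunction",
    "DI": "sign",
    "DIC": "indirect sign",
}


def to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) character spans (end exclusive) to BIO tags."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical annotation of one clause from the sample report:
# "脑室系统扩大" (the ventricular system is enlarged).
text = "脑室系统扩大"
spans = [(0, 4, "BP"), (4, 6, "DI")]
tags = to_bio(text, spans)
for char, tag in zip(text, tags):
    print(char, tag)
```

Character-level tags avoid depending on a word segmenter, which may matter here given the segmentation error rates reported for general-purpose tools on these reports.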
Category | Count
Body part | 31,425
Attribute | 3,856
Predicate | 20,395
Conjunction | 11,999
Sign | 30,893
Indirect sign | 3,975
Annotated Medical Imaging Diagnosis Report Dataset
Flow of Structured Algorithm for Medical Imaging Diagnosis Report
Word segmentation tool | Error rate
jieba | 0.3215
snownlp | 0.5486
pkuseg | 0.3875
thunlp | 0.4077
Performance of Different Tools in Medical Imaging Diagnosis Report
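The page does not define how the error rate in this table was computed. As one plausible reading, the sketch below counts the share of gold-standard words whose exact boundaries a tool fails to reproduce. The function names and the sample segmentations are hypothetical; in practice the predicted list could come from a segmenter such as `jieba.lcut(text)`.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_error_rate(gold: List[str], predicted: List[str]) -> float:
    """Fraction of gold words whose exact boundaries were not reproduced."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("both segmentations must cover the same text")
    gold_spans = word_spans(gold)
    if not gold_spans:
        return 0.0
    missed = gold_spans - word_spans(predicted)
    return len(missed) / len(gold_spans)


# Assumed example: a general-purpose tool splits the medical term "脑室系统".
gold = ["脑室系统", "扩大"]
predicted = ["脑室", "系统", "扩大"]
rate = segmentation_error_rate(gold, predicted)
print(rate)
```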
Structure of the Dic-BiLSTM-CRF Model
Structure of Embedding
Structure of BiLSTM
Entity type | P | R | F1
BP | 0.9485 | 0.9349 | 0.9416
BPC | 0.8838 | 0.9007 | 0.8922
DI | 0.9418 | 0.9300 | 0.9359
DIC | 0.8791 | 0.7947 | 0.8348
PW | 0.9562 | 0.9516 | 0.9538
JW | 0.9838 | 0.9884 | 0.9861
Recognition Result of Dic-BiLSTM-CRF
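For readers unfamiliar with the P, R, and F1 columns, here is a minimal sketch of entity-level scoring under exact-match evaluation. Exact matching of span boundaries and type is an assumption; the page does not say which matching criterion the authors applied, and the sample spans are invented.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type)


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1, counting a prediction as correct only if
    its boundaries and its type both match a gold entity."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: the second entity gets the wrong type.
gold = {(0, 4, "BP"), (4, 6, "DI"), (7, 9, "BP")}
predicted = {(0, 4, "BP"), (4, 6, "PW"), (7, 9, "BP")}
p, r, f1 = prf1(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Because F1 is the harmonic mean of P and R, it can never exceed the larger of the two.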
Model | P | R | F1
CNN-CRF | 0.8492 | 0.8273 | 0.8367
IDCNN-CRF | 0.8527 | 0.8186 | 0.8335
BiLSTM-CRF | 0.8572 | 0.8513 | 0.8538
BiLSTM-Attention-CRF | 0.8265 | 0.7961 | 0.8091
BERT-BiLSTM-CRF | 0.8522 | 0.8511 | 0.8509
Proposed model | 0.8925 | 0.8792 | 0.8854
Recognition Result of Different Models in Medical Imaging Diagnosis Report
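The gaps quoted in the abstract can be checked directly against this table. The short calculation below uses only the scores listed for the proposed model and BERT-BiLSTM-CRF; the variable names are illustrative.

```python
# Scores copied from the model comparison table.
proposed = {"P": 0.8925, "R": 0.8792, "F1": 0.8854}
baseline = {"P": 0.8522, "R": 0.8511, "F1": 0.8509}  # BERT-BiLSTM-CRF

# Absolute differences expressed in percentage points.
gaps = {
    metric: round((proposed[metric] - baseline[metric]) * 100, 2)
    for metric in proposed
}
print(gaps)
```

The precision and recall gaps come out at 4.03 and 2.81, which are the figures the abstract reports, so those figures are absolute differences in percentage points rather than relative improvements.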
Model | Without dictionary: P | R | F1 | With dictionary: P | R | F1
CNN-CRF | 0.8492 | 0.8273 | 0.8367 | 0.8552 | 0.8557 | 0.8551
IDCNN-CRF | 0.8527 | 0.8186 | 0.8335 | 0.8426 | 0.8524 | 0.8474
BiLSTM-Attention-CRF | 0.8265 | 0.7961 | 0.8091 | 0.8231 | 0.8265 | 0.8239
BiLSTM-CRF | 0.8572 | 0.8513 | 0.8538 | 0.8925 | 0.8792 | 0.8854
Entity Recognition Comparison Result in Medical Imaging Diagnosis Report with or Without Dictionary
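The page does not describe how the dictionary is supplied to the models in the "with dictionary" setting. One common approach, shown here purely as an assumption, is to run forward maximum matching against a domain lexicon and give each character a position feature (B, M, E, S, or O) that is fed to the tagger alongside its embedding. The function name, the `max_len` parameter, and the tiny lexicon are hypothetical.

```python
from typing import List, Set


def dictionary_features(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Label each character by its position in the longest lexicon match:
    B(egin), M(iddle), E(nd), S(ingle), or O (no match)."""
    feats = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first (forward maximum matching).
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                match_len = length
                break
        if match_len == 0:
            i += 1
            continue
        if match_len == 1:
            feats[i] = "S"
        else:
            feats[i] = "B"
            for j in range(i + 1, i + match_len - 1):
                feats[j] = "M"
            feats[i + match_len - 1] = "E"
        i += match_len
    return feats


# Assumed mini-lexicon of medical terms.
lexicon = {"脑室系统", "脑室", "扩大"}
feats = dictionary_features("脑室系统扩大", lexicon)
print(feats)
```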
Structuring method | Exam items: P | R | F1 | Exam results: P | R | F1
Manual rules | 0.9230 | 0.7205 | 0.8092 | 0.8552 | 0.4312 | 0.5733
Dependency parsing | 0.8875 | 0.8343 | 0.8600 | 0.8872 | 0.7861 | 0.8335
ERRE | 0.9437 | 0.8674 | 0.9039 | 0.9103 | 0.8443 | 0.8760
Comparison Results of the Structured Medical Imaging Diagnostic Reports
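Original report text: Plain CT scan shows uniform density of the brain parenchyma with no abnormal density foci; the ventricular system is enlarged with normal morphology; the cerebral sulci and cerebral fissures are widened; the brain midline structures are centered.
Structured data (exam item | exam result):
Brain parenchyma | uniform density
Brain parenchyma | no abnormal density foci
Ventricular system | enlarged
Ventricular system | normal morphology
Cerebral sulci | widened
Cerebral fissures | widened
Brain midline | structures centered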
Structured Data for Medical Imaging Diagnostic Reports
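The mapping from recognized entities to item/result pairs can be sketched with a single simplified rule: each clause's finding is attached to the body parts named in that clause, or to the most recent body parts when the clause names none. This is not the authors' ERRE rule set, which the page does not list. The `extract_pairs` helper, the clause splitting, and the entity types assigned below are assumptions, and the input uses the original Chinese tokens of the sample report.

```python
from typing import List, Tuple

Entity = Tuple[str, str]  # (surface text, entity type code)


def extract_pairs(clauses: List[List[Entity]]) -> List[Tuple[str, str]]:
    """Pair every finding with the body parts it describes."""
    pairs: List[Tuple[str, str]] = []
    current_parts: List[str] = []
    for clause in clauses:
        parts = [text for text, kind in clause if kind == "BP"]
        # The finding is everything in the clause except body parts and conjunctions.
        finding = "".join(text for text, kind in clause if kind not in ("BP", "JW"))
        if parts:
            current_parts = parts
        if not finding:
            continue
        for part in current_parts:
            pairs.append((part, finding))
    return pairs


# Hypothetical entity annotation of the sample CT report, one list per clause.
clauses = [
    [("脑实质", "BP"), ("密度", "BPC"), ("均匀", "DI")],   # brain parenchyma, uniform density
    [("未见", "PW"), ("异常密度灶", "DI")],                # no abnormal density foci
    [("脑室系统", "BP"), ("扩大", "DI")],                  # ventricular system enlarged
    [("形态", "BPC"), ("如常", "DI")],                     # normal morphology
    [("脑沟", "BP"), ("脑裂", "BP"), ("增宽", "DI")],      # sulci and fissures widened
    [("脑中线", "BP"), ("结构", "BPC"), ("居中", "DI")],   # midline structures centered
]
pairs = extract_pairs(clauses)
for item, result in pairs:
    print(item, result)
```

Carrying the last body part forward handles clauses such as "no abnormal density foci", which omit their subject, and a clause with two body parts yields one pair per part.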