Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 12: 33-44     https://doi.org/10.11925/infotech.2096-3467.2020.0951
Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment
Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao
Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
Abstract

[Objective] To support clinical research driven by real-world data, this study proposes a semantic-alignment-based method for extracting clinical scale information, which helps identify potential trial participants. [Methods] Taking the National Institutes of Health Stroke Scale (NIHSS) as the example, we analyzed how scale information is expressed in clinical trials and in real-world electronic medical records, built an extraction method based on semantic alignment, and validated it on clinical trial records from ClinicalTrials.gov and on the open electronic medical record dataset MIMIC-III. [Results] For NIHSS total scores and item scores extracted from patient discharge summaries, the F1 values were 0.9535 and 0.9267, respectively. In two test tasks that matched NIHSS eligibility criteria, the method effectively identified potential participants. [Limitations] The method has not been tested on other clinical scales, and its effectiveness and reliability have not been validated in a real-world trial recruitment setting. [Conclusions] The proposed method can effectively address the semantic consistency of clinical scale information between clinical research and electronic medical record data.

Keywords: Semantic Alignment; Clinical Scale; Clinical Trial; Eligibility Criteria; Cohort Identification
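Received: 2020-09-27      Published: 2020-12-25
CLC number: TP391
Funding: Fundamental Research Funds for Central Public Welfare Research Institutes, Chinese Academy of Medical Sciences, "Research on Key Issues in Medical Artificial Intelligence Technology and Human-Computer Interaction" (2018PT33024) and "Research on Perception and Intelligent Processing Technologies for Real-World Clinical Data" (2017PT63010); CAMS Innovation Fund for Medical Sciences, "Construction of a Standard Library for Evaluating Medical Artificial Intelligence Algorithms" (2018-I2M-AI-016)
Corresponding author: Li Jiao     E-mail: li.jiao@imicams.ac.cn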
Cite this article:
Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao. Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment. Data Analysis and Knowledge Discovery, 2020, 4(12): 33-44.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0951      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I12/33
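Fig.1  Semantic-alignment-based extraction of clinical scale information and its application to clinical trial cohort identification
Fig.2  Extraction of NIHSS eligibility criteria
Fig.3  Workflow for extracting NIHSS scale information from discharge summaries
Fig.4  Query representation of NIHSS eligibility criteria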
| Test task | Inclusion/Exclusion | NIHSS screening criteria |
|---|---|---|
| 1 | Inclusion | NIHSS level of consciousness score ≥ 2; Baseline NIHSS > 16 |
| 2 | Inclusion | NIHSS score of 6 - 22, inclusive, or at least 2 on the aphasia item of the NIHSS |
| 2 | Exclusion | score >= 2 on NIHSS Q1a; score of 2 on NIHSS Q2 |

Table 1  Test tasks
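
The criteria in Table 1 can be read as simple predicates over extracted NIHSS values. The following is a minimal sketch of that reading, not the query representation used in the paper. The `NihssRecord` class, the function names, the choice to combine the two Task 1 criteria with AND, and the rule that an undocumented item never triggers an exclusion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NihssRecord:
    """NIHSS values extracted for one patient; None means not documented."""
    total: Optional[int] = None    # NIHSS total score
    item_1a: Optional[int] = None  # 1a. Level of Consciousness
    item_2: Optional[int] = None   # 2. Best Gaze
    item_9: Optional[int] = None   # 9. Best Language (aphasia item)


def meets_task1(r: NihssRecord) -> bool:
    # Inclusion (assumed AND): level of consciousness >= 2 and baseline NIHSS > 16.
    return (r.item_1a is not None and r.item_1a >= 2
            and r.total is not None and r.total > 16)


def meets_task2(r: NihssRecord) -> bool:
    # Inclusion: total score in [6, 22], or aphasia item >= 2.
    included = ((r.total is not None and 6 <= r.total <= 22)
                or (r.item_9 is not None and r.item_9 >= 2))
    # Exclusion: Q1a >= 2, or Q2 == 2.
    # Assumption: an undocumented item (None) does not exclude the patient.
    excluded = ((r.item_1a is not None and r.item_1a >= 2)
                or r.item_2 == 2)
    return included and not excluded
```

Under these assumptions, a record with only a total score of 22 passes Task 2, while any record with a documented 1a score of 2 or more is excluded from it.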
| Clinical trial attribute | Category | Number of trials |
|---|---|---|
| Recruitment status | Completed | 287 |
| | Recruiting | 163 |
| | Unknown status | 101 |
| | Terminated | 59 |
| | Not yet recruiting | 54 |
| | Withdrawn | 17 |
| | Active, not recruiting | 17 |
| | Suspended | 10 |
| | Enrolling by invitation | 7 |
| Intervention | Drug | 570 |
| | Device | 251 |
| | Other | 233 |
| | Procedure | 87 |
| | Behavioral | 67 |
| | Biological | 24 |
| | Diagnostic Test | 19 |
| | Dietary Supplement | 9 |
| | Radiation | 4 |
| | Combination Product | 3 |
| | Genetic | 1 |

Table 2  Distribution of clinical trial registration data
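| Eligibility criteria | Category | Number of trials |
|---|---|---|
| NIHSS criteria | Appears only in inclusion criteria | 205 |
| | Appears only in exclusion criteria | 47 |
| | Appears in both inclusion and exclusion criteria | 34 |
| Screening granularity | Total score only | 258 |
| | Item scores only | 10 |
| | Both total score and item scores | 18 |
| Negation qualifier | - | 7 |
| Total | - | 286 |

Table 3  Distribution of NIHSS eligibility criteria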
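| ICD-9 | Disease name | Number of cases |
|---|---|---|
| 431 | Intracerebral hemorrhage | 1,294 |
| 43491 | Cerebral artery occlusion, unspecified, with cerebral infarction | 700 |
| 43411 | Cerebral embolism with cerebral infarction | 630 |
| 430 | Subarachnoid hemorrhage | 617 |
| 4321 | Subdural hemorrhage | 380 |
| 43311 | Occlusion and stenosis of carotid artery with cerebral infarction | 124 |
| 4329 | Unspecified intracranial hemorrhage | 72 |
| 43401 | Cerebral thrombosis with cerebral infarction | 60 |
| 43331 | Occlusion and stenosis of multiple and bilateral precerebral arteries with cerebral infarction | 49 |
| 43301 | Occlusion and stenosis of basilar artery with cerebral infarction | 32 |
| 43490 | Cerebral artery occlusion, unspecified, without mention of cerebral infarction | 29 |
| 43321 | Occlusion and stenosis of vertebral artery with cerebral infarction | 22 |
| 436 | Acute, but ill-defined, cerebrovascular disease | 21 |
| 43410 | Cerebral embolism without mention of cerebral infarction | 12 |
| 4320 | Nontraumatic extradural hemorrhage | 6 |
| 43400 | Cerebral thrombosis without mention of cerebral infarction | 5 |
| 43381 | Occlusion and stenosis of other specified precerebral artery with cerebral infarction | 3 |
| 43391 | Occlusion and stenosis of unspecified precerebral artery with cerebral infarction | 2 |

Table 4  Distribution of cases by disease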
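
Table 4 tallies cases by ICD-9 diagnosis code. A minimal sketch of such a tally follows, assuming diagnoses arrive as (admission id, ICD-9 code) pairs, similar in shape to rows of the MIMIC-III `DIAGNOSES_ICD` table. The function name and input shape are hypothetical, and this page does not describe how the authors actually selected cases.

```python
from collections import Counter
from typing import Iterable, Tuple

# ICD-9 codes listed in Table 4, written without the decimal point.
STROKE_ICD9 = {
    "431", "43491", "43411", "430", "4321", "43311", "4329", "43401",
    "43331", "43301", "43490", "43321", "436", "43410", "4320",
    "43400", "43381", "43391",
}


def count_stroke_cases(rows: Iterable[Tuple[int, str]]) -> Counter:
    """Count distinct admissions per stroke-related ICD-9 code."""
    seen = set()
    counts: Counter = Counter()
    for admission_id, code in rows:
        key = (admission_id, code)
        # Skip codes outside Table 4 and repeated (admission, code) pairs.
        if code in STROKE_ICD9 and key not in seen:
            seen.add(key)
            counts[code] += 1
    return counts
```

An admission carrying two different stroke codes is counted once under each code in this sketch; whether the paper counts it that way is not stated.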
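| Score type | Category | Subcategory | Number of cases |
|---|---|---|---|
| NIHSS total score | No score | - | 26 |
| | Score present | One value | 240 |
| | | Multiple values | 46 |
| | | With item scores | 128 |
| | | Without item scores | 158 |
| Item scores | No score | - | 179 |
| | Score present | One value | 129 |
| | | Multiple values | 4 |
| | | With total score | 128 |
| | | Without total score | 5 |

Table 5  Distribution of NIHSS information in cases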
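
The split in Table 5 between notes with no, one, or several NIHSS total score values can be illustrated with a small pattern-matching sketch. This is a simplified regular-expression stand-in, not the semantic alignment method of the paper. The pattern, the function names `extract_total_scores` and `classify_note`, and the phrasings it recognizes are assumptions for illustration; the bound of 0 to 42 is the valid range of the NIHSS total score.

```python
import re
from typing import List

# Matches mentions such as "NIHSS: 17", "NIHSS score of 22",
# or "NIH Stroke Scale was 5". Item mentions like "NIHSS 1a: 2" do not match,
# because the captured number must end at a word boundary.
TOTAL_RE = re.compile(
    r"\b(?:NIHSS|NIH\s+Stroke\s+Scale)\b"
    r"(?:\s+(?:total|score))*\s*(?:of|is|was|[:=])?\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_total_scores(note: str) -> List[int]:
    """Return every plausible NIHSS total score (0-42) in the note, in order."""
    values = [int(m.group(1)) for m in TOTAL_RE.finditer(note)]
    return [v for v in values if 0 <= v <= 42]


def classify_note(note: str) -> str:
    """Label a note with the Table 5 style category for total scores."""
    distinct = set(extract_total_scores(note))
    if not distinct:
        return "no score"
    return "one value" if len(distinct) == 1 else "multiple values"
```

A pattern like this misses paraphrased or tabular mentions, which is the kind of variation the semantic alignment approach is meant to handle.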
| Task | Precision | Recall | F1 |
|---|---|---|---|
| NIHSS total score | 0.9729 | 0.9349 | 0.9535 |
| NIHSS item scores | 0.9863 | 0.8739 | 0.9267 |
| 1a. Level of Consciousness | 0.9417 | 0.9327 | 0.9372 |
| 1b. LOC Questions | 0.9900 | 0.8462 | 0.9125 |
| 1c. LOC Commands | 1.0000 | 0.8990 | 0.9468 |
| 2. Best Gaze | 0.9900 | 0.9000 | 0.9429 |
| 3. Visual | 0.9902 | 0.8860 | 0.9352 |
| 4. Facial Palsy | 1.0000 | 0.8370 | 0.9113 |
| 5a. Motor Arm (Left Arm) | 0.9615 | 0.8621 | 0.9091 |
| 5b. Motor Arm (Right Arm) | 0.9596 | 0.8559 | 0.9048 |
| 6a. Motor Leg (Left Leg) | 0.9900 | 0.8684 | 0.9252 |
| 6b. Motor Leg (Right Leg) | 0.9789 | 0.8857 | 0.9300 |
| 7. Limb Ataxia | 1.0000 | 0.9245 | 0.9608 |
| 8. Sensory | 1.0000 | 0.8718 | 0.9315 |
| 9. Best Language | 0.9907 | 0.8618 | 0.9218 |
| 10. Dysarthria | 1.0000 | 0.8484 | 0.9180 |
| 11. Extinction and Inattention | 1.0000 | 0.8534 | 0.9209 |

Table 6  NIHSS information extraction performance
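
The F1 column in Table 6 is the harmonic mean of the precision and recall columns. The short sketch below shows the calculation for the two summary rows; the function name is illustrative, and the zero-denominator guard is a defensive choice rather than anything taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the two summary rows of Table 6.
rows = {
    "NIHSS total score": (0.9729, 0.9349),
    "NIHSS item scores": (0.9863, 0.8739),
}
for task, (p, r) in rows.items():
    print(task, round(f1_score(p, r), 4))
```

Rounded to four decimals, the computed values agree with the reported F1 values of 0.9535 and 0.9267.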
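| No. | NIHSS total score | Item 1a score |
|---|---|---|
| case01 | 17 | 2 |
| case02 | 18 | 2 |
| case03 | 21 | 3 |
| case04 | 22 | 2 |
| case05 | 22 | 2 |
| case06 | 23 | 2 |
| case07 | 22 | 3 |
| case08 | 24, 25 | 2 |
| case09 | 27, 28 | 2 |
| case10 | 29 | 3 |
| case11 | 32 | 2 |

Table 7  Identification results for test task 1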
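| No. | NIHSS total score | Item 9 score | Item 1a score | Item 2 score |
|---|---|---|---|---|
| case01 | 22 | - | - | - |
| case02 | 18 | 2 | 0 | 1 |
| case03 | 19 | 0 | 1 | 1 |
| case04 | - | 3 | 0 | 1 |

Table 8  Identification results for test task 2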
Fig.5  Information extraction results of the three methods