Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (12): 33-44    DOI: 10.11925/infotech.2096-3467.2020.0951
Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment
Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao
Institute of Medical Information/Medical Library, Chinese Academy of Medical Science & Peking Union Medical College, Beijing 100020, China
Abstract  

[Objective] This study develops a method for extracting clinical scale information based on semantic alignment, aiming to identify potential trial cohorts and support data-driven clinical research. [Methods] First, we analyzed how the National Institutes of Health Stroke Scale (NIHSS) is expressed in clinical trials and in real-world electronic medical records. Then, we proposed an extraction method for clinical scale information based on semantic alignment. Finally, we evaluated the method with data from ClinicalTrials.gov and the open electronic medical record dataset MIMIC-III. [Results] The F1 values for extracting the NIHSS total score and the NIHSS item scores were 0.9535 and 0.9267, respectively. The method effectively identified patients who met the NIHSS criteria. [Limitations] More research is needed to examine this method with other clinical scales and in real-world trial recruitment scenarios. [Conclusions] The proposed method could effectively address the issue of semantic consistency facing clinical scale information.

Key words: Semantic Alignment; Clinical Scale; Clinical Trial; Eligible Criteria; Cohort Identification
Received: 27 September 2020      Published: 25 December 2020
Chinese Library Classification (ZTFLH): TP391
Corresponding Authors: Li Jiao     E-mail: li.jiao@imicams.ac.cn

Cite this article:

Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao. Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment. Data Analysis and Knowledge Discovery, 2020, 4(12): 33-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0951     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I12/33

Figure: Semantic Alignment-based Clinical Scale Information Extraction and Its Application in Cohort Identification
Figure: NIHSS Eligibility Criteria Extraction
Figure: Information Extraction Process of Discharge Summary-NIHSS
Figure: Query Representation of NIHSS Eligibility Criteria
| Test task | Inclusion/Exclusion | NIHSS screening criteria |
|---|---|---|
| 1 | Inclusion | NIHSS level of consciousness score ≥ 2 |
| | | Baseline NIHSS > 16 |
| 2 | Inclusion | NIHSS score of 6 - 22, inclusive, or at least 2 on the aphasia item of the NIHSS |
| | Exclusion | score >= 2 on NIHSS Q1a |
| | | score of 2 on NIHSS Q2 |

Table: Test Tasks
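The criteria in the test tasks can be read as comparisons over NIHSS fields. The following is a minimal sketch of one possible structured representation, not the authors' query format: the `Criterion` class, the field names `"total"` and `"1a"`, and the rule that all inclusion criteria must hold together are assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Iterable

# Comparison operators that appear in the test-task criteria.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "=": operator.eq,
}


@dataclass(frozen=True)
class Criterion:
    """One NIHSS condition (hypothetical representation)."""
    item: str   # "total" or an item label such as "1a"
    op: str     # key of OPS
    value: int

    def matches(self, scores: Dict[str, int]) -> bool:
        observed = scores.get(self.item)
        if observed is None:
            # Assumption: a missing score never satisfies a criterion.
            return False
        return OPS[self.op](observed, self.value)


def eligible(scores: Dict[str, int],
             inclusion: Iterable[Criterion],
             exclusion: Iterable[Criterion] = ()) -> bool:
    """Assumption: all inclusion criteria must hold and no exclusion criterion may hold."""
    return (all(c.matches(scores) for c in inclusion)
            and not any(c.matches(scores) for c in exclusion))


# Test Task 1: level of consciousness (item 1a) >= 2 and baseline total > 16.
TASK1_INCLUSION = [Criterion("1a", ">=", 2), Criterion("total", ">", 16)]

if __name__ == "__main__":
    print(eligible({"total": 17, "1a": 2}, TASK1_INCLUSION))
    print(eligible({"total": 16, "1a": 3}, TASK1_INCLUSION))
```

Keeping each condition as a small immutable record makes it straightforward to add negated or range criteria later without changing the evaluation loop.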
| Attribute | Category | Number of trials |
|---|---|---|
| Recruitment status | Completed | 287 |
| | Recruiting | 163 |
| | Unknown status | 101 |
| | Terminated | 59 |
| | Not yet recruiting | 54 |
| | Withdrawn | 17 |
| | Active, not recruiting | 17 |
| | Suspended | 10 |
| | Enrolling by invitation | 7 |
| Intervention | Drug | 570 |
| | Device | 251 |
| | Other | 233 |
| | Procedure | 87 |
| | Behavioral | 67 |
| | Biological | 24 |
| | Diagnostic Test | 19 |
| | Dietary Supplement | 9 |
| | Radiation | 4 |
| | Combination Product | 3 |
| | Genetic | 1 |

Table: Distribution of Clinical Trials
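| Eligibility criteria aspect | Category | Number of trials |
|---|---|---|
| NIHSS criteria | In inclusion criteria only | 205 |
| | In exclusion criteria only | 47 |
| | In both inclusion and exclusion criteria | 34 |
| Screening granularity | Total score only | 258 |
| | Item scores only | 10 |
| | Both total score and item scores | 18 |
| Negation qualifier | - | 7 |
| Total | - | 286 |

Table: Distribution of NIHSS Eligibility Criteria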
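| ICD-9 | Disease name | Number of cases |
|---|---|---|
| 431 | Intracerebral hemorrhage | 1294 |
| 43491 | Cerebral artery occlusion, unspecified, with cerebral infarction | 700 |
| 43411 | Cerebral embolism with cerebral infarction | 630 |
| 430 | Subarachnoid hemorrhage | 617 |
| 4321 | Subdural hemorrhage | 380 |
| 43311 | Occlusion and stenosis of carotid artery with cerebral infarction | 124 |
| 4329 | Unspecified intracranial hemorrhage | 72 |
| 43401 | Cerebral thrombosis with cerebral infarction | 60 |
| 43331 | Occlusion and stenosis of multiple and bilateral precerebral arteries with cerebral infarction | 49 |
| 43301 | Occlusion and stenosis of basilar artery with cerebral infarction | 32 |
| 43490 | Cerebral artery occlusion, unspecified, without mention of cerebral infarction | 29 |
| 43321 | Occlusion and stenosis of vertebral artery with cerebral infarction | 22 |
| 436 | Acute, but ill-defined, cerebrovascular disease | 21 |
| 43410 | Cerebral embolism without mention of cerebral infarction | 12 |
| 4320 | Nontraumatic extradural hemorrhage | 6 |
| 43400 | Cerebral thrombosis without mention of cerebral infarction | 5 |
| 43381 | Occlusion and stenosis of other specified precerebral artery with cerebral infarction | 3 |
| 43391 | Occlusion and stenosis of unspecified precerebral artery with cerebral infarction | 2 |

Table: Disease Distribution of Cases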
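| Score type | Category | Subcategory | Number of cases |
|---|---|---|---|
| NIHSS total score | No score | - | 26 |
| | Score present | One score | 240 |
| | | Multiple scores | 46 |
| | | With item scores | 128 |
| | | Without item scores | 158 |
| Item scores | No score | - | 179 |
| | Score present | One score | 129 |
| | | Multiple scores | 4 |
| | | With total score | 128 |
| | | Without total score | 5 |

Table: Distribution of NIHSS Scores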
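The distribution above separates notes with no NIHSS total score, one score, and multiple scores. The sketch below shows how such a grouping could be produced with a simple pattern match. It is an illustrative baseline only, not the semantic alignment method of the paper: the regular expression, the function names, and the sample sentences are assumptions, and real discharge summaries use many more phrasings. The 0 to 42 bound is the valid range of the NIHSS total score.

```python
import re
from typing import List

# Hypothetical pattern: "NIHSS", optional filler words, optional ":" or "=", then a 1-2 digit number.
TOTAL_RE = re.compile(
    r"\bNIHSS\b(?:\s+(?:total|score|was|of|is))*\s*[:=]?\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_total_scores(note: str) -> List[int]:
    """Return the distinct NIHSS total scores mentioned in a note, in order of appearance."""
    scores: List[int] = []
    for match in TOTAL_RE.finditer(note):
        value = int(match.group(1))
        # The NIHSS total score ranges from 0 to 42; skip anything outside that range.
        if 0 <= value <= 42 and value not in scores:
            scores.append(value)
    return scores


def categorize(scores: List[int]) -> str:
    """Map extracted scores to the categories used in the distribution table."""
    if not scores:
        return "no score"
    return "one score" if len(scores) == 1 else "multiple scores"


if __name__ == "__main__":
    for text in (
        "NIHSS score was 17 on admission.",
        "NIHSS: 24 on arrival, NIHSS 25 at discharge.",
        "No stroke scale documented.",
    ):
        found = extract_total_scores(text)
        print(found, categorize(found))
```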
| Task | Precision | Recall | F1 |
|---|---|---|---|
| NIHSS total score | 0.9729 | 0.9349 | 0.9535 |
| NIHSS item scores | 0.9863 | 0.8739 | 0.9267 |
| 1a. Level of Consciousness | 0.9417 | 0.9327 | 0.9372 |
| 1b. LOC Questions | 0.9900 | 0.8462 | 0.9125 |
| 1c. LOC Commands | 1.0000 | 0.8990 | 0.9468 |
| 2. Best Gaze | 0.9900 | 0.9000 | 0.9429 |
| 3. Visual | 0.9902 | 0.8860 | 0.9352 |
| 4. Facial Palsy | 1.0000 | 0.8370 | 0.9113 |
| 5a. Motor Arm (Left Arm) | 0.9615 | 0.8621 | 0.9091 |
| 5b. Motor Arm (Right Arm) | 0.9596 | 0.8559 | 0.9048 |
| 6a. Motor Leg (Left Leg) | 0.9900 | 0.8684 | 0.9252 |
| 6b. Motor Leg (Right Leg) | 0.9789 | 0.8857 | 0.9300 |
| 7. Limb Ataxia | 1.0000 | 0.9245 | 0.9608 |
| 8. Sensory | 1.0000 | 0.8718 | 0.9315 |
| 9. Best Language | 0.9907 | 0.8618 | 0.9218 |
| 10. Dysarthria | 1.0000 | 0.8484 | 0.9180 |
| 11. Extinction and Inattention | 1.0000 | 0.8534 | 0.9209 |

Table: Performance of NIHSS Information Extraction
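The F1 values in the performance table are consistent with the harmonic mean of the reported precision and recall. The short check below recomputes F1 for the two headline rows. The function name and the `REPORTED` dictionary are illustrative; the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the performance table.
REPORTED = {
    "NIHSS total score": (0.9729, 0.9349, 0.9535),
    "NIHSS item scores": (0.9863, 0.8739, 0.9267),
}

if __name__ == "__main__":
    for task, (precision, recall, reported_f1) in REPORTED.items():
        recomputed = f1_score(precision, recall)
        print(f"{task}: recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1:.4f}")
```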
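| Case | NIHSS total score | Item 1a score |
|---|---|---|
| case01 | 17 | 2 |
| case02 | 18 | 2 |
| case03 | 21 | 3 |
| case04 | 22 | 2 |
| case05 | 22 | 2 |
| case06 | 23 | 2 |
| case07 | 22 | 3 |
| case08 | 24, 25 | 2 |
| case09 | 27, 28 | 2 |
| case10 | 29 | 3 |
| case11 | 32 | 2 |

Table: Example Results of Test Task 1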
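| Case | NIHSS total score | Item 9 score | Item 1a score | Item 2 score |
|---|---|---|---|---|
| case01 | 22 | - | - | - |
| case02 | 18 | 2 | 0 | 1 |
| case03 | 19 | 0 | 1 | 1 |
| case04 | - | 3 | 0 | 1 |

Table: Example Results of Test Task 2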
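The four example cases can be checked against the Test Task 2 criteria by hand or with a few lines of code. The sketch below encodes those criteria directly. It assumes that the aphasia item corresponds to item 9, that Q1a and Q2 correspond to items 1a and 2, and that a missing score (shown as "-" in the table) satisfies neither an inclusion nor an exclusion condition. The function and variable names are hypothetical.

```python
from typing import Dict, Optional

Scores = Dict[str, Optional[int]]


def at_least(value: Optional[int], threshold: int) -> bool:
    """True only when a score is present and meets the threshold."""
    return value is not None and value >= threshold


def task2_eligible(scores: Scores) -> bool:
    """Inclusion: total in 6..22 or item 9 >= 2. Exclusion: item 1a >= 2 or item 2 == 2."""
    total = scores.get("total")
    included = (total is not None and 6 <= total <= 22) or at_least(scores.get("9"), 2)
    excluded = at_least(scores.get("1a"), 2) or scores.get("2") == 2
    return included and not excluded


# Rows copied from the example results table; None stands for "-".
EXAMPLE_CASES: Dict[str, Scores] = {
    "case01": {"total": 22, "9": None, "1a": None, "2": None},
    "case02": {"total": 18, "9": 2, "1a": 0, "2": 1},
    "case03": {"total": 19, "9": 0, "1a": 1, "2": 1},
    "case04": {"total": None, "9": 3, "1a": 0, "2": 1},
}

if __name__ == "__main__":
    for case_id, scores in EXAMPLE_CASES.items():
        print(case_id, task2_eligible(scores))
```

Under these assumptions, case04 qualifies through the item 9 score alone even though its total score is missing, which is why the inclusion condition is written as a disjunction rather than requiring every field.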
Figure: Information Extraction Results of Three Methods