[Objective] This study develops a method for extracting clinical scale information based on semantic alignment, aiming to identify potential trial cohorts and support data-driven clinical research. [Methods] First, we analyzed the characteristics of National Institutes of Health Stroke Scale (NIHSS) information in clinical trial registrations and in real-world electronic medical records. Then, we proposed a semantic-alignment-based method for extracting clinical scale information. Finally, we evaluated the method with data from ClinicalTrials.gov and the open electronic medical record dataset MIMIC-III. [Results] The F1 scores for extracting the NIHSS total score and the NIHSS item scores were 0.9535 and 0.9267, respectively, and patients who met NIHSS eligibility criteria were identified effectively. [Limitations] The method still needs to be evaluated on other clinical scales and in real-world trial recruitment scenarios. [Conclusions] The proposed method can effectively address the semantic consistency problem in clinical scale information.
Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao. Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment. Data Analysis and Knowledge Discovery, 2020, 4(12): 33-44.
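The abstract does not spell out how semantic alignment is performed. As a rough illustration only, the sketch below assumes that alignment can be approximated by normalizing different surface forms of one NIHSS concept (for example "NIHSS Q1a" in an eligibility criterion and "Level of Consciousness" in a clinical note) to a single canonical item key. The synonym table, the `normalize` and `align` functions, and the key names are hypothetical; this is not the authors' model.

```python
import re
from typing import Dict, List, Optional

# Hypothetical synonym table (an assumption, not from the paper): surface forms
# mapped to canonical NIHSS item keys. Item names follow Table 6.
CANONICAL: Dict[str, List[str]] = {
    "total": ["nihss", "score", "total", "total score"],
    "1a": ["1a", "level of consciousness", "loc"],
    "1b": ["1b", "loc questions"],
    "1c": ["1c", "loc commands"],
    "2": ["2", "best gaze", "gaze"],
    "3": ["3", "visual"],
    "4": ["4", "facial palsy"],
    "5a": ["5a", "motor arm left arm", "left arm"],
    "5b": ["5b", "motor arm right arm", "right arm"],
    "6a": ["6a", "motor leg left leg", "left leg"],
    "6b": ["6b", "motor leg right leg", "right leg"],
    "7": ["7", "limb ataxia", "ataxia"],
    "8": ["8", "sensory"],
    "9": ["9", "best language", "language", "aphasia"],
    "10": ["10", "dysarthria"],
    "11": ["11", "extinction and inattention"],
}
# Reverse index: surface form -> canonical key.
LOOKUP = {form: key for key, forms in CANONICAL.items() for form in forms}


def normalize(mention: str) -> str:
    """Lower-case, strip punctuation, and drop common prefixes/suffixes."""
    text = re.sub(r"[^a-z0-9]+", " ", mention.lower()).strip()
    text = re.sub(r"^nihss\s+", "", text)             # "nihss q1a" -> "q1a"
    text = re.sub(r"^(?:item|q)\s*(?=\d)", "", text)  # "q1a" -> "1a", "item 9" -> "9"
    text = re.sub(r"\s+item$", "", text)              # "aphasia item" -> "aphasia"
    return text


def align(mention: str) -> Optional[str]:
    """Return the canonical NIHSS key for a mention, or None if unknown."""
    return LOOKUP.get(normalize(mention))


print(align("NIHSS Q1a"), align("Level of Consciousness"))  # prints: 1a 1a
```

Because a trial criterion and a clinical note that mention the same item map to the same key, criteria and patient data can then be compared item by item. A real system would need a much larger, curated vocabulary and context handling that a lookup table cannot provide.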
- NIHSS score of 6 - 22, inclusive, or at least 2 on the aphasia item of the NIHSS
Exclusion criteria
- score >= 2 on NIHSS Q1a
- score of 2 on NIHSS Q2
Table 1 Test tasks
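The criteria in Table 1 can be checked mechanically once NIHSS values have been extracted and aligned to item keys. The sketch below assumes a patient is represented as a dictionary from canonical item keys ("total" for the total score) to integer scores, and that an undocumented score neither satisfies an inclusion criterion nor triggers an exclusion; both are assumptions, and `meets_task2` is a hypothetical name. The aphasia item is NIHSS item 9 (Best Language), Q1a is Level of Consciousness and Q2 is Best Gaze. The four patients are the values reported for test task 2 in Table 8.

```python
from typing import Dict, Optional

# Canonical item key -> score; None (or a missing key) means not documented.
Scores = Dict[str, Optional[int]]


def meets_task2(scores: Scores) -> bool:
    total = scores.get("total")
    aphasia = scores.get("9")  # item 9, Best Language
    q1a = scores.get("1a")     # item 1a, Level of Consciousness
    q2 = scores.get("2")       # item 2, Best Gaze

    # Inclusion: total score 6-22 inclusive, or aphasia item >= 2.
    included = (total is not None and 6 <= total <= 22) or (
        aphasia is not None and aphasia >= 2
    )
    # Exclusion: Q1a >= 2, or Q2 equal to 2.
    excluded = (q1a is not None and q1a >= 2) or (q2 is not None and q2 == 2)
    return included and not excluded


# Values for test task 2 as listed in Table 8 ("-" encoded as None).
TABLE8 = {
    "case01": {"total": 22, "9": None, "1a": None, "2": None},
    "case02": {"total": 18, "9": 2, "1a": 0, "2": 1},
    "case03": {"total": 19, "9": 0, "1a": 1, "2": 1},
    "case04": {"total": None, "9": 3, "1a": 0, "2": 1},
}

for case, scores in TABLE8.items():
    print(case, meets_task2(scores))
```

Under this sketch all four cases satisfy the criteria (case04 only through the aphasia item), while a hypothetical patient with a total of 10 and item 1a scored 2 would be excluded. How missing values should be treated is a design decision that a real screening tool would have to make explicit.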
| Trial attribute | Category | Number of trials |
|---|---|---|
| Recruitment status | Completed | 287 |
| | Recruiting | 163 |
| | Unknown status | 101 |
| | Terminated | 59 |
| | Not yet recruiting | 54 |
| | Withdrawn | 17 |
| | Active, not recruiting | 17 |
| | Suspended | 10 |
| | Enrolling by invitation | 7 |
| Intervention type | Drug | 570 |
| | Device | 251 |
| | Other | 233 |
| | Procedure | 87 |
| | Behavioral | 67 |
| | Biological | 24 |
| | Diagnostic Test | 19 |
| | Dietary Supplement | 9 |
| | Radiation | 4 |
| | Combination Product | 3 |
| | Genetic | 1 |
Table 2 Distribution of clinical trial registration data
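| Eligibility criteria | Category | Number of trials |
|---|---|---|
| NIHSS criteria | In inclusion criteria only | 205 |
| | In exclusion criteria only | 47 |
| | In both inclusion and exclusion criteria | 34 |
| Screening granularity | Total score only | 258 |
| | Item scores only | 10 |
| | Both total score and item scores | 18 |
| Negation qualifier | - | 7 |
| Total | - | 286 |
Table 3 Distribution of NIHSS criteria in trial eligibility criteria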
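Table 3 groups criteria by screening granularity (total score, item scores, or both) and by whether they carry a negation. The paper does not describe how this labeling was done; the sketch below is a hypothetical keyword heuristic, with made-up regular expressions `TOTAL`, `ITEM` and `NEGATION`, that reproduces the categories for the criteria quoted in Table 1. It will misclassify phrasings it was not written for (for example "NIHSS score >= 2 on item 1a").

```python
import re

# Hypothetical heuristics (assumptions, not from the paper).
TOTAL = re.compile(
    r"\bNIHSS\s+(?:total\s+)?score\b"      # "NIHSS score of 6 - 22"
    r"|\btotal\s+(?:NIHSS\s+)?score\b"     # "total NIHSS score"
    r"|\bNIHSS\s*(?:[<>]=?|=|≥|≤)\s*\d",   # "NIHSS >= 10"
    re.IGNORECASE,
)
ITEM = re.compile(
    r"\bQ\d{1,2}[a-c]?\b|\bitem\b|aphasia|consciousness|gaze|visual|facial"
    r"|motor|ataxia|sensory|language|dysarthria|extinction|inattention",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(?:no|not|without|absence of)\b", re.IGNORECASE)


def granularity(criterion: str) -> str:
    """Classify a criterion as total only, item only, both, or neither."""
    has_total = TOTAL.search(criterion) is not None
    has_item = ITEM.search(criterion) is not None
    if has_total and has_item:
        return "total and item"
    if has_item:
        return "item only"
    if has_total:
        return "total only"
    return "none"


def is_negated(criterion: str) -> bool:
    """True if the criterion contains a simple negation cue."""
    return NEGATION.search(criterion) is not None


for text in [
    "NIHSS score of 6 - 22, inclusive, or at least 2 on the aphasia item of the NIHSS",
    "score >= 2 on NIHSS Q1a",
    "NIHSS >= 10",  # hypothetical criterion, not from the paper
]:
    print(granularity(text), is_negated(text))
```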
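| ICD-9 code | Disease name | Number of cases |
|---|---|---|
| 431 | Intracerebral hemorrhage | 1,294 |
| 43491 | Cerebral artery occlusion, unspecified, with cerebral infarction | 700 |
| 43411 | Cerebral embolism with cerebral infarction | 630 |
| 430 | Subarachnoid hemorrhage | 617 |
| 4321 | Subdural hemorrhage | 380 |
| 43311 | Occlusion and stenosis of carotid artery with cerebral infarction | 124 |
| 4329 | Unspecified intracranial hemorrhage | 72 |
| 43401 | Cerebral thrombosis with cerebral infarction | 60 |
| 43331 | Occlusion and stenosis of multiple and bilateral precerebral arteries with cerebral infarction | 49 |
| 43301 | Occlusion and stenosis of basilar artery with cerebral infarction | 32 |
| 43490 | Cerebral artery occlusion, unspecified, without mention of cerebral infarction | 29 |
| 43321 | Occlusion and stenosis of vertebral artery with cerebral infarction | 22 |
| 436 | Acute, but ill-defined, cerebrovascular disease | 21 |
| 43410 | Cerebral embolism without mention of cerebral infarction | 12 |
| 4320 | Nontraumatic extradural hemorrhage | 6 |
| 43400 | Cerebral thrombosis without mention of cerebral infarction | 5 |
| 43381 | Occlusion and stenosis of other specified precerebral artery with cerebral infarction | 3 |
| 43391 | Occlusion and stenosis of unspecified precerebral artery with cerebral infarction | 2 |
Table 4 Disease distribution of cases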
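Table 4 groups cases by ICD-9 diagnosis code. In MIMIC-III, diagnoses are stored in the `DIAGNOSES_ICD` table (its columns include `HADM_ID` and `ICD9_CODE`) without the decimal point, so 434.91 appears as `43491`. A minimal sketch of such a tally follows, assuming that a "case" means one hospital admission per code; the paper's exact counting unit is not stated here, and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# The 18 ICD-9 codes listed in Table 4, in MIMIC-III's undotted form.
STROKE_CODES = {
    "430", "431", "4320", "4321", "4329",
    "43301", "43311", "43321", "43331", "43381", "43391",
    "43400", "43401", "43410", "43411", "43490", "43491", "436",
}


def count_cases(rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Count distinct admissions per code from (HADM_ID, ICD9_CODE) pairs."""
    seen: Set[Tuple[int, str]] = set()
    counts: Counter = Counter()
    for hadm_id, code in rows:
        # Keep only the listed codes and count each admission once per code.
        if code in STROKE_CODES and (hadm_id, code) not in seen:
            seen.add((hadm_id, code))
            counts[code] += 1
    return dict(counts)


# Made-up sample rows in the shape of DIAGNOSES_ICD (HADM_ID, ICD9_CODE).
sample = [(1, "431"), (1, "431"), (2, "431"), (2, "43491"), (3, "4019")]
print(count_cases(sample))  # prints: {'431': 2, '43491': 1}
```

In practice the rows would be read from the MIMIC-III CSV or database rather than typed in; only the filtering and de-duplication logic is shown here.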
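| Score type | Category | Subcategory | Number of cases |
|---|---|---|---|
| NIHSS total score | No score | - | 26 |
| | With score | Single value | 240 |
| | | Multiple values | 46 |
| | | With item scores | 128 |
| | | Without item scores | 158 |
| Item scores | No score | - | 179 |
| | With score | Single value | 129 |
| | | Multiple values | 4 |
| | | With total score | 128 |
| | | Without total score | 5 |
Table 5 Distribution of NIHSS information in cases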
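Table 5 separates cases with no total score, a single total score, and several total scores (Table 7 lists such a case, case08, with "24, 25"). The paper extracts these values with its semantic alignment method; as a much simpler stand-in, a regular expression can pull total-score mentions out of a note and sort the note into the same categories. The pattern `TOTAL_SCORE`, the functions `total_scores` and `score_category`, and the de-duplication of repeated values are assumptions of this sketch; the 0-42 check reflects the valid range of the NIHSS total.

```python
import re
from typing import List

# Hypothetical pattern: "NIHSS 24", "NIHSS score was 25", "NIHSS: 12", "NIHSS of 17".
# Item mentions such as "NIHSS 1a: 2" do not match, because "1a" is not a
# standalone number.
TOTAL_SCORE = re.compile(
    r"\bNIHSS\b(?:\s+(?:total\s+)?score)?\s*(?:of|was|is|=|:)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def total_scores(note: str) -> List[int]:
    """Return distinct NIHSS total scores in order of first mention."""
    values: List[int] = []
    for match in TOTAL_SCORE.finditer(note):
        value = int(match.group(1))
        if 0 <= value <= 42 and value not in values:  # valid NIHSS total range
            values.append(value)
    return values


def score_category(note: str) -> str:
    """Map a note to 'no score', 'single score' or 'multiple scores'."""
    n = len(total_scores(note))
    if n == 0:
        return "no score"
    return "single score" if n == 1 else "multiple scores"


note = "NIHSS 24 on arrival; repeat NIHSS score was 25."
print(total_scores(note), score_category(note))  # prints: [24, 25] multiple scores
```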
| Task | Precision | Recall | F1 |
|---|---|---|---|
| NIHSS total score | 0.9729 | 0.9349 | 0.9535 |
| NIHSS item scores | 0.9863 | 0.8739 | 0.9267 |
| 1a. Level of Consciousness | 0.9417 | 0.9327 | 0.9372 |
| 1b. LOC Questions | 0.9900 | 0.8462 | 0.9125 |
| 1c. LOC Commands | 1.0000 | 0.8990 | 0.9468 |
| 2. Best Gaze | 0.9900 | 0.9000 | 0.9429 |
| 3. Visual | 0.9902 | 0.8860 | 0.9352 |
| 4. Facial Palsy | 1.0000 | 0.8370 | 0.9113 |
| 5a. Motor Arm (Left Arm) | 0.9615 | 0.8621 | 0.9091 |
| 5b. Motor Arm (Right Arm) | 0.9596 | 0.8559 | 0.9048 |
| 6a. Motor Leg (Left Leg) | 0.9900 | 0.8684 | 0.9252 |
| 6b. Motor Leg (Right Leg) | 0.9789 | 0.8857 | 0.9300 |
| 7. Limb Ataxia | 1.0000 | 0.9245 | 0.9608 |
| 8. Sensory | 1.0000 | 0.8718 | 0.9315 |
| 9. Best Language | 0.9907 | 0.8618 | 0.9218 |
| 10. Dysarthria | 1.0000 | 0.8484 | 0.9180 |
| 11. Extinction and Inattention | 1.0000 | 0.8534 | 0.9209 |
Table 6 NIHSS information extraction performance
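The F1 values in Table 6 are the harmonic mean of the first two metric columns (precision P and recall R), F1 = 2PR / (P + R). Recomputing F1 from the rounded P and R reproduces the reported figures to within rounding, which is consistent with the first column being precision. A short check for a few rows (the `f1` helper is written here for illustration):

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from Table 6.
TABLE6 = {
    "NIHSS total score": (0.9729, 0.9349, 0.9535),
    "NIHSS item scores": (0.9863, 0.8739, 0.9267),
    "1a. Level of Consciousness": (0.9417, 0.9327, 0.9372),
    "4. Facial Palsy": (1.0000, 0.8370, 0.9113),
    "6a. Motor Leg (Left Leg)": (0.9900, 0.8684, 0.9252),
}

for task, (p, r, reported) in TABLE6.items():
    # Inputs are already rounded to 4 decimals, so compare at that precision.
    print(f"{task}: recomputed {f1(p, r):.4f}, reported {reported:.4f}")
```

Because recall is lower than precision for every item, the harmonic mean pulls F1 toward recall; the gaps in F1 between items are driven mainly by missed mentions rather than false positives.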
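| Case | NIHSS total score | Item 1a score |
|---|---|---|
| case01 | 17 | 2 |
| case02 | 18 | 2 |
| case03 | 21 | 3 |
| case04 | 22 | 2 |
| case05 | 22 | 2 |
| case06 | 23 | 2 |
| case07 | 22 | 3 |
| case08 | 24, 25 | 2 |
| case09 | 27, 28 | 2 |
| case10 | 29 | 3 |
| case11 | 32 | 2 |
Table 7 Identification results for test task 1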
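| Case | NIHSS total score | Item 9 score | Item 1a score | Item 2 score |
|---|---|---|---|---|
| case01 | 22 | - | - | - |
| case02 | 18 | 2 | 0 | 1 |
| case03 | 19 | 0 | 1 | 1 |
| case04 | - | 3 | 0 | 1 |
Table 8 Identification results for test task 2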
Fig. 5 Information extraction results of the three methods