Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 123-132    DOI: 10.11925/infotech.2096-3467.2018.1454
Annotating Chinese E-Medical Record for Knowledge Discovery
Jiahui Hu, An Fang, Wanqing Zhao, Chenliu Yang, Huiling Ren
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
[Objective] This paper studies an annotation method for Chinese electronic medical records, aiming to improve the processing of massive clinical texts and to support clinical knowledge discovery. [Methods] First, we proposed an annotation method for Chinese electronic medical records and built a visual, interactive annotation platform. Then, based on the word and phrase features of these records, we identified medical named entities with natural language processing and machine learning approaches. [Results] A total of 700 annotated records were obtained. The overall F value of the pipeline-based annotation method reached 0.8772, which was 32.9% higher than that obtained on the original medical records. [Limitations] Because electronic medical records contain sensitive private information, this study was conducted on an open dataset, and the corpus size was limited. [Conclusions] The annotation method and platform constructed in this study can effectively process Chinese clinical texts and support the association of medical knowledge.

Key words: Chinese Electronic Medical Record      Text Annotation      Natural Language Processing      Machine Learning      Knowledge Discovery
Received: 24 December 2018      Published: 06 September 2019
ZTFLH:  TP391  
Corresponding Authors: An Fang     E-mail:

Jiahui Hu,An Fang,Wanqing Zhao,Chenliu Yang,Huiling Ren. Annotating Chinese E-Medical Record for Knowledge Discovery. Data Analysis and Knowledge Discovery, 2019, 3(7): 123-132.

Dataset       Symptoms and signs  Examinations and tests  Treatment  Diseases and diagnoses  Body parts  Total
Training set  6,486               7,987                   853        515                     8,942       24,783
Test set      1,345               1,559                   195        207                     1,777       5,083
No.  Code       Entity category
1    B-SYMPTOM  Symptoms and signs
5    B-CHECK    Examinations and tests
13   B-DISEASE  Diseases and diagnoses
17   B-BODY     Body parts
21   O          Non-medical entity
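The codes in the table above are spaced four apart (1, 5, 13, 17) and end with O at 21, which is what one would get from five entity categories with four position tags each plus a single outside tag. The sketch below reproduces that numbering under stated assumptions: only the `B-` prefix is confirmed by the table, so the `I`, `E`, and `S` prefixes are guesses, and `TREATMENT` is a hypothetical name for the category whose row does not appear in the table.

```python
# Category order follows the table. "TREATMENT" is a hypothetical name for the
# category whose row is missing from the table.
CATEGORIES = ["SYMPTOM", "CHECK", "TREATMENT", "DISEASE", "BODY"]

# Assumption: four position tags per category. Only "B" appears in the table;
# "I", "E" and "S" are placeholders chosen for illustration.
PREFIXES = ["B", "I", "E", "S"]


def build_tag_ids(categories, prefixes):
    """Number tags from 1, category by category, then append the outside tag O."""
    tag_ids = {}
    next_id = 1
    for category in categories:
        for prefix in prefixes:
            tag_ids[f"{prefix}-{category}"] = next_id
            next_id += 1
    tag_ids["O"] = next_id
    return tag_ids


TAG_IDS = build_tag_ids(CATEGORIES, PREFIXES)

# The codes listed in the table are reproduced by this numbering.
for tag, code in [("B-SYMPTOM", 1), ("B-CHECK", 5), ("B-DISEASE", 13),
                  ("B-BODY", 17), ("O", 21)]:
    assert TAG_IDS[tag] == code
```

Under this numbering the missing category would start at code 9, but that is an inference from the spacing, not a value given in the table.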
Metric   Symptoms and signs  Examinations and tests  Treatment  Diseases and diagnoses  Body parts  Overall
P        0.9898              0.9554                  0.9588     0.9703                  0.9237      0.9531
R        0.9864              0.8233                  0.9555     0.9515                  0.9358      0.9138
F value  0.9881              0.8845                  0.9571     0.9608                  0.9297      0.9331
Metric   Symptoms and signs  Examinations and tests  Treatment  Diseases and diagnoses  Body parts  Overall
P        0.9439              0.9091                  0.7945     0.7772                  0.8419      0.8860
R        0.9636              0.7505                  0.5949     0.6908                  0.8149      0.8210
F value  0.9536              0.8222                  0.6804     0.7315                  0.8281      0.8522
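The F values reported in the two result tables are consistent with the balanced F measure, the harmonic mean of precision P and recall R, F = 2PR / (P + R). The sketch below recomputes the final column of each table as a check; the function name `f_value` is an illustrative choice, and the tolerance allows for the four-decimal rounding of the published figures.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) triples copied from the final column of the two tables.
reported = [
    (0.9531, 0.9138, 0.9331),
    (0.8860, 0.8210, 0.8522),
]

for p, r, f in reported:
    # Inputs are rounded to four decimals, so compare with a small tolerance.
    assert abs(f_value(p, r) - f) < 5e-4
```

The same check can be applied to any per-category column by substituting its P and R values.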
Metric   Symptoms and signs  Examinations and tests  Treatment  Diseases and diagnoses  Body parts
Overlap  0.6148              0.3263                  0.1181     0.1803                  0.2423
