Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (8): 110-121    DOI: 10.11925/infotech.2096-3467.2021.1167
Text Semantic Representation with Structure-Function and Entity Recognition: Case Study of Medical Records
Hu Jiming1,2, Qian Wei1,2, Wen Peng3, Lv Xiaoguang4
1School of Information Management, Wuhan University, Wuhan 430072, China
2Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China
3School of Marxism, Wuhan University, Wuhan 430072, China
4Renmin Hospital of Wuhan University, Wuhan 430060, China
[Objective] This paper tries to improve the accuracy of text representation and mining, with the help of structural and functional information from Chinese medical records. [Methods] First, we proposed a new semantic representation strategy for the texts of Chinese medical records based on their structure-function features. Then, we used the BiLSTM-CRF model to recognize named entities, which introduced structure information at the word vector level. Finally, we utilized the TextCNN model to extract local context features, which helped us obtain a vector representation with richer text semantic connotations. [Results] The precision, recall and F values of the new model reached 93.20%, 95.19% and 94.19% respectively, while the classification accuracy rate reached 92.12%. [Limitations] Future research is needed to evaluate our model with more texts and refine the structure recognition process. [Conclusions] The proposed method could effectively improve the accuracy of named entity recognition, and enrich the semantic connotation and representation of the texts.

Key words: Chinese Medical Records; Text Structure and Function; Named Entity Recognition; Text Semantic Representation; BiLSTM-CRF Model
Received: 14 October 2021      Published: 23 September 2022
CLC Number (ZTFLH): TP391
Fund: National Natural Science Foundation of China (71874125); Young Top-notch Talent Cultivation Program of Hubei Province
Corresponding Author: Wen Peng, ORCID: 0000-0002-0278-7391

Hu Jiming, Qian Wei, Wen Peng, Lv Xiaoguang. Text Semantic Representation with Structure-Function and Entity Recognition: Case Study of Medical Records. Data Analysis and Knowledge Discovery, 2022, 6(8): 110-121.

Scholar | Research perspective | Research approach
Lu et al. [33] | Text blocks |
This paper | Structure function |
Research Methods of Text Representation Based on Structure Information
Text Representation Framework of Medical Records Based on Structure Function and Entity Recognition
Named Entity Recognition Model Based on Structure Function (CSF-BiLSTM-CRF)
TextCNN Text Representation Model
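No. | Structural module | Connotative functions
1 | Admission status | Chief complaint, past medical history, physical examination findings, main auxiliary examinations
2 | Admission diagnosis | Disease
3 | Treatment course | Admission examinations, treatment methods, drugs, pathological examination
4 | Discharge status | Chief complaint, physical examination findings
5 | Discharge diagnosis | Disease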
The Text Structure and Connotative Functions of Chinese Medical Records
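The abstract states that structure information is introduced at the word vector level, but this page does not say how the two are combined. The sketch below shows one simple possibility: appending a one-hot code for the structural module to every token vector. The concatenation scheme, the English module names, and the helper names `structure_one_hot` and `add_structure_feature` are all assumptions made for illustration, not the authors' implementation.

```python
# Structural modules and their connotative functions, transcribed from the
# table above (English renderings of the module names are our own).
STRUCTURE_FUNCTIONS = {
    "admission status": ["chief complaint", "past medical history",
                         "physical examination findings",
                         "main auxiliary examinations"],
    "admission diagnosis": ["disease"],
    "treatment course": ["admission examinations", "treatment methods",
                         "drugs", "pathological examination"],
    "discharge status": ["chief complaint", "physical examination findings"],
    "discharge diagnosis": ["disease"],
}
MODULES = list(STRUCTURE_FUNCTIONS)


def structure_one_hot(module):
    """Return a one-hot vector identifying the structural module."""
    if module not in STRUCTURE_FUNCTIONS:
        raise KeyError(f"unknown structural module: {module}")
    return [1.0 if name == module else 0.0 for name in MODULES]


def add_structure_feature(token_vectors, module):
    """Append the module's one-hot code to each token vector.

    Assumed fusion scheme: plain concatenation. The source does not
    specify how structure information enters the word vectors.
    """
    tag = structure_one_hot(module)
    return [list(vec) + tag for vec in token_vectors]


# Two toy 2-dimensional token vectors from the "treatment course" section.
enriched = add_structure_feature([[0.1, 0.2], [0.3, 0.4]], "treatment course")
print(len(enriched[0]))
```

Each enriched vector has the original dimensions plus one slot per structural module, so a downstream tagger can condition on where in the record a token appears.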
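Entity type | Definition | Examples | Label
Symptom | Symptoms subjectively described by the patient; found in the chief complaint | Abdominal pain, vomiting, abdominal distension | SYMPTOM
Body part | Anatomical parts or organs of the body | Abdomen, stomach, liver | BODY
Laboratory tests and examinations | Laboratory tests mainly refer to blood, stool, and urine indicators; examinations mainly refer to imaging and nuclear medicine results | T (body temperature), gastroscopy, CT | TEST&
Disease | Medical names and abbreviations of diseases; found in the past disease history and in the admission and discharge diagnoses | Gastric cancer, ulcer, hypertension | DISEASE
Sign | Objective abnormal findings from the physical examination | Tenderness, rebound tenderness, respiration | SIGN
Treatment | Hemostasis, nutritional support, and names of specific surgical procedures | Chemotherapy, surgery, nutrition | TREATMENT
Drug | Drug names; found in the past disease history, drug allergy history, and treatment course | Oxaliplatin, tegafur/gimeracil/oteracil (S-1), Weikangda (维康达) | DRUG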
The Entity Type of Chinese Medical Record
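Sequence labelers such as BiLSTM-CRF are usually trained on per-character tags derived from annotated entity spans. The sketch below converts spans into BIO tags using the label names from the table above. The BIO scheme, the helper name `spans_to_bio`, and the example sentence are assumptions; this page does not state which tagging scheme the authors used.

```python
def spans_to_bio(text, spans):
    """Convert (start, end, label) spans into one BIO tag per character.

    `end` is exclusive. Overlapping spans are not handled; later spans
    simply overwrite earlier ones.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical sentence: "患者腹痛呕吐" (the patient has abdominal pain and vomiting).
sentence = "患者腹痛呕吐"
annotations = [(2, 4, "SYMPTOM"), (4, 6, "SYMPTOM")]
for char, tag in zip(sentence, spans_to_bio(sentence, annotations)):
    print(char, tag)
```

Tagging at the character level avoids depending on a Chinese word segmenter, which is a common choice for clinical text, though the source does not confirm it here.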
Parameter | Value
Initial learning rate | 1.0
Dropout | 0.5
Hidden layer size | 300
Number of iterations | 50
Batch_size | 32
The Parameter Settings of CSF-BiLSTM-CRF Model
Model | P/% | R/% | F/%
HMM | 86.02 | 73.52 | 79.28
CRF | 82.17 | 85.88 | 83.99
BiLSTM | 81.42 | 78.21 | 79.78
BiLSTM-CRF | 92.39 | 92.51 | 92.48
CSF-BiLSTM-CRF | 93.20 | 95.19 | 94.19
Entity Recognition Results of Different Models
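As a reading aid, the F values in the table can be compared with the harmonic mean of precision and recall. The sketch below assumes the reported F is the balanced F1 score, which this page does not state explicitly; the function name `f_score` is our own.

```python
def f_score(precision, recall):
    """Balanced F-score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, copied from the table above.
reported = {
    "HMM": (86.02, 73.52, 79.28),
    "CRF": (82.17, 85.88, 83.99),
    "BiLSTM": (81.42, 78.21, 79.78),
    "BiLSTM-CRF": (92.39, 92.51, 92.48),
    "CSF-BiLSTM-CRF": (93.20, 95.19, 94.19),
}

for model, (p, r, f) in reported.items():
    print(f"{model}: reported F = {f:.2f}, recomputed F1 = {f_score(p, r):.2f}")
```

Under that assumption, every recomputed value lies within 0.05 percentage points of the reported one. Small gaps of this size are consistent with rounding or with averaging over entity types, but the page gives no detail on which applies.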
Parameter | Value
Text dimension | 800
Word dimension | 100
Convolution kernel sizes | 3, 4, 5
Dropout | 0.5
Batch_size | 64
Number of iterations | 50
Parameter Settings of TextCNN Model
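To make the role of the kernel sizes concrete, the sketch below runs a toy TextCNN feature extractor in plain Python: one filter per kernel size slides over the token embeddings, followed by ReLU and max-over-time pooling. Only the kernel sizes (3, 4, 5) and the word dimension (100) come from the table; using a single filter per size, random weights, and a 12-token input are simplifying assumptions, and the names `conv_max_pool` and `textcnn_features` are hypothetical.

```python
import random

KERNEL_SIZES = (3, 4, 5)  # from the parameter table
WORD_DIM = 100            # from the parameter table


def conv_max_pool(embeddings, kernel, bias=0.0):
    """Apply one filter over all windows, then ReLU and max-over-time pooling.

    `kernel` has shape (k, dim). Starting `best` at 0.0 is equivalent to
    applying ReLU to every window score before taking the maximum.
    """
    k = len(kernel)
    best = 0.0
    for start in range(len(embeddings) - k + 1):
        score = bias
        for offset in range(k):
            score += sum(w * x for w, x in
                         zip(kernel[offset], embeddings[start + offset]))
        best = max(best, score)
    return best


def textcnn_features(embeddings, filters):
    """Return one pooled feature per filter."""
    return [conv_max_pool(embeddings, kernel) for kernel in filters]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
seq_len = 12            # assumed toy length, far shorter than a real record
embeddings = [[rng.uniform(-1, 1) for _ in range(WORD_DIM)]
              for _ in range(seq_len)]
filters = [[[rng.uniform(-0.1, 0.1) for _ in range(WORD_DIM)]
            for _ in range(k)] for k in KERNEL_SIZES]

features = textcnn_features(embeddings, filters)
print(len(features))
```

A real model would use many filters per kernel size and learn their weights; the pooled features would then pass through dropout and a classification layer.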
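No. | Text representation method | Acc/% | Class | P/% | R/% | F/%
1 | Doc2Vec + structure (Baseline) | 74.55 | Adenocarcinoma | 72.58 | 64.29 | 68.18
  |  |  | Gastric cancer | 75.73 | 82.11 | 78.79
2 | Text vectors only | 55.76 | Adenocarcinoma | 58.57 | 48.24 | 52.90
  |  |  | Gastric cancer | 53.68 | 63.75 | 58.29
3 | Text vectors + entity structure information | 56.36 | Adenocarcinoma | 58.90 | 50.59 | 54.43
  |  |  | Gastric cancer | 54.35 | 62.50 | 58.14
4 | Text vectors only (TextCNN) | 87.27 | Adenocarcinoma | 84.81 | 88.16 | 86.45
  |  |  | Gastric cancer | 89.53 | 86.52 | 88.00
5 | Text vectors + ordinary entities (TextCNN) | 90.30 | Adenocarcinoma | 90.54 | 88.16 | 89.33
  |  |  | Gastric cancer | 90.11 | 92.13 | 91.11
6 | Text vectors + entity structure information (TextCNN) | 92.12 | Adenocarcinoma | 95.00 | 89.41 | 92.12
  |  |  | Gastric cancer | 89.41 | 95.00 | 92.12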
Classification Results Under Different Text Representation Methods
