Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 9: 75-84. DOI: 10.11925/infotech.2096-3467.2021.0015
Classification Model for Medical Entity Relations with Convolutional Neural Network
Fan Shaoping¹, Zhao Yuxuan², An Xinying¹, Wu Qingqiang³
¹ Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
² School of Finance, Central University of Finance and Economics, Beijing 102206, China
³ School of Informatics, Xiamen University, Xiamen 361005, China
[Objective] This paper proposes a new entity relation classification model based on a Convolutional Neural Network (CNN) with multi-feature embedding, aiming to improve classification results and simplify feature computation. [Methods] Building on existing embedded-feature algorithms, our CNN model integrates word position features and lexical features, and we describe how each feature is represented. These features require no complex algorithmic computation, yet they improve the model's performance. [Results] We evaluated the proposed model on the biomedical corpora AIMed, GENIA and ChemProt, obtaining F1 scores of 0.7342, 0.9764 and 0.8900, respectively. The model achieved the best results among the compared models on GENIA and ChemProt. [Limitations] Our model does not incorporate prior domain knowledge from the biomedical field. [Conclusions] The proposed model can effectively classify entity relations and can support research on relation extraction and knowledge base construction in the biomedical field.

Key words: Relation Classification; CNN; Position Features; Lexical Features
Received: 07 January 2021      Published: 29 June 2021
CLC number: G350
Fund: National Natural Science Foundation of China (71704188); National Key Research and Development Program of China (2016YFC0901902-2)
Corresponding author: Wu Qingqiang

Cite this article:

Fan Shaoping, Zhao Yuxuan, An Xinying, Wu Qingqiang. Classification Model for Medical Entity Relations with Convolutional Neural Network. Data Analysis and Knowledge Discovery, 2021, 5(9): 75-84.

Sentence: <e1>1,25D</e1> inhibited <e2>MYC gene</e2> expression and accelerated its protein turnover
Entity e1: 1,25D
Entity e2: MYC gene
Relation: inhibit(e1, e2)
A Sample of Relation Classification
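The position features mentioned in the abstract can be illustrated on this sample. The sketch below assumes the common relative-distance scheme: every token receives its signed distance to e1 and to e2 (0 for tokens inside the entity), clipped to a maximum. The excerpt does not give the paper's exact definition, and the helper names and `max_dist=30` are hypothetical.

```python
import re

# capturing group so that re.split keeps the entity tags in its output
TAG = re.compile(r"(</?e[12]>)")


def parse_marked_sentence(sentence):
    """Whitespace-tokenize a sentence with <e1>/<e2> markup.

    Returns (tokens, e1_span, e2_span); spans are (start, end) token indices, end exclusive.
    """
    tokens, spans = [], {}
    for piece in TAG.split(sentence):
        if piece in ("<e1>", "<e2>"):
            spans[piece[1:3]] = [len(tokens), None]   # entity starts at next token
        elif piece in ("</e1>", "</e2>"):
            spans[piece[2:4]][1] = len(tokens)        # entity ends before next token
        else:
            tokens.extend(piece.split())
    return tokens, tuple(spans["e1"]), tuple(spans["e2"])


def relative_distance(i, span):
    """Signed distance from token i to an entity span; 0 for tokens inside it."""
    start, end = span
    if i < start:
        return i - start
    if i >= end:
        return i - (end - 1)
    return 0


def position_features(n_tokens, e1, e2, max_dist=30):
    """Distances to e1 and e2 for every token, clipped to [-max_dist, max_dist]."""
    def clip(d):
        return max(-max_dist, min(max_dist, d))
    d1 = [clip(relative_distance(i, e1)) for i in range(n_tokens)]
    d2 = [clip(relative_distance(i, e2)) for i in range(n_tokens)]
    return d1, d2


sentence = ("<e1>1,25D</e1> inhibited <e2>MYC gene</e2> expression "
            "and accelerated its protein turnover")
tokens, e1, e2 = parse_marked_sentence(sentence)
d1, d2 = position_features(len(tokens), e1, e2)
```

Adding `max_dist` to each distance turns it into a non-negative index (0 to 60 here), which can then be used to look up a trainable position embedding for the token.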
Sentence: Demethylation experiments further confirmed that loss of <e1>ALX4</e1> expression was regulated by <e2>CpG island</e2> hypermethylation.
Entity e1: ALX4
Entity e2: CpG island
A Sample of Lexical Features
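The excerpt does not list which lexical features the model uses. As an assumption for illustration, the sketch below takes a frequently used set: each entity's tokens plus the token immediately to its left and right. It then averages word vectors for multi-word features (such as "CpG island") and concatenates the results. The function names, the `<PAD>` token and the toy embeddings are hypothetical.

```python
PAD = "<PAD>"  # hypothetical placeholder when an entity sits at a sentence boundary


def lexical_features(tokens, e1, e2):
    """Return [left(e1), e1, right(e1), left(e2), e2, right(e2)] as strings."""
    def around(span):
        start, end = span
        left = tokens[start - 1] if start > 0 else PAD
        right = tokens[end] if end < len(tokens) else PAD
        return [left, " ".join(tokens[start:end]), right]
    return around(e1) + around(e2)


def lexical_vector(features, embeddings, dim):
    """Concatenate one dim-sized vector per feature.

    Multi-word features are averaged; unknown words (including PAD) map to zero vectors.
    """
    vec = []
    for feat in features:
        rows = [embeddings.get(w, [0.0] * dim) for w in feat.split()]
        vec.extend(sum(col) / len(rows) for col in zip(*rows))
    return vec


tokens = ("Demethylation experiments further confirmed that loss of ALX4 "
          "expression was regulated by CpG island hypermethylation .").split()
e1, e2 = (7, 8), (12, 14)  # token spans of "ALX4" and "CpG island"
feats = lexical_features(tokens, e1, e2)
vec = lexical_vector(feats, {"CpG": [1.0, 0.0], "island": [0.0, 1.0]}, dim=2)
```

These features need only token lookups around the entities, with no parsing, which is in line with the abstract's point that the features avoid complex computation.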
CNN Network Architecture
CNN Architecture with Only Word Representation
CNN Architecture with Word Representation and Position Features
CNN Architecture with Word Representation, Position Features and Lexical Features
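The three architecture variants above differ only in what is fed to the network. The dependency-free sketch below shows one forward pass under assumed design choices that this excerpt does not confirm: each token vector is the word vector optionally followed by two position embeddings, a window-3 convolution with tanh is max-pooled over time, the lexical vector is appended after pooling, and a linear layer with softmax produces class probabilities. All sizes, names and the random weights are hypothetical, and training is omitted; a real implementation would normally use a deep learning framework.

```python
import math
import random


def conv_maxpool(seq, filters, bias, k):
    """Valid 1-D convolution (window k) over token vectors, then max-over-time pooling."""
    pooled = []
    for w, b in zip(filters, bias):
        best = -math.inf
        for start in range(len(seq) - k + 1):
            # flatten k consecutive token vectors into one window vector
            window = [x for vec in seq[start:start + k] for x in vec]
            act = math.tanh(sum(wi * xi for wi, xi in zip(w, window)) + b)
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def forward(words, pos1=None, pos2=None, lex=None, *, params, k=3):
    """One forward pass; passing pos1/pos2 and lex selects the richer variants."""
    # per-token input: word vector, optionally followed by both position embeddings
    seq = [list(w) + (pos1[i] + pos2[i] if pos1 is not None else [])
           for i, w in enumerate(words)]
    features = conv_maxpool(seq, params["filters"], params["conv_bias"], k)
    if lex is not None:
        features = features + lex  # lexical vector is appended after pooling, not convolved
    logits = [sum(wi * fi for wi, fi in zip(row, features)) + b
              for row, b in zip(params["out_w"], params["out_b"])]
    return softmax(logits)


def init_params(token_dim, n_filters, feat_dim, n_classes, k=3, seed=0):
    """Small random weights; training is out of scope for this sketch."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "filters": [[u() for _ in range(k * token_dim)] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_w": [[u() for _ in range(feat_dim)] for _ in range(n_classes)],
        "out_b": [0.0] * n_classes,
    }


# toy inputs: 10 tokens, 4-dim word vectors, 2-dim position embeddings
rng = random.Random(1)
n_tokens, d_w, d_p, n_filters, n_lex, n_classes = 10, 4, 2, 5, 12, 2
words = [[rng.gauss(0, 1) for _ in range(d_w)] for _ in range(n_tokens)]
pos1 = [[rng.gauss(0, 1) for _ in range(d_p)] for _ in range(n_tokens)]
pos2 = [[rng.gauss(0, 1) for _ in range(d_p)] for _ in range(n_tokens)]
lex = [rng.gauss(0, 1) for _ in range(n_lex)]

# variant 1: word representation only
p_word = forward(words, params=init_params(d_w, n_filters, n_filters, n_classes))
# variant 2: word representation + position features
p_pos = forward(words, pos1, pos2,
                params=init_params(d_w + 2 * d_p, n_filters, n_filters, n_classes))
# variant 3: word representation + position features + lexical features
p_full = forward(words, pos1, pos2, lex,
                 params=init_params(d_w + 2 * d_p, n_filters, n_filters + n_lex, n_classes))
```

Note how the position features widen every token vector (and therefore every filter), while the lexical features only widen the input to the final layer.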
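Corpus | Relation | Relation sentences | Training set (corpus total) | Test set (corpus total)
AIMed [35] | False | 4,834 | 4,861 | 973
AIMed [35] | True | 1,000 | |
GENIA [36] | Protein-Component | 1,302 | 1,547 | 310
GENIA [36] | Subunit-Complex | 555 | |
ChemProt [37] | Activator | 2,571 | 5,363 | 1,073
ChemProt [37] | Indirect-Downregulator | 446 | |
ChemProt [37] | Indirect-Upregulator | 3,225 | |
ChemProt [37] | Inhibitor | 194 | |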
The Corpora Used and the Number of Relations Used for Training and Testing
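The counts in this table can be cross-checked: for each corpus, the per-relation sentence counts should sum to the training plus test totals. The check below (values copied from the table) confirms this and computes the held-out fraction, which is about one sixth for every corpus. That suggests, though the excerpt does not state it, a roughly 5:1 train/test split.

```python
# (per-relation sentence counts, training total, test total), copied from the table
corpora = {
    "AIMed": ({"False": 4834, "True": 1000}, 4861, 973),
    "GENIA": ({"Protein-Component": 1302, "Subunit-Complex": 555}, 1547, 310),
    "ChemProt": ({"Activator": 2571, "Indirect-Downregulator": 446,
                  "Indirect-Upregulator": 3225, "Inhibitor": 194}, 5363, 1073),
}

summary = {}
for name, (relations, train, test) in corpora.items():
    total = sum(relations.values())
    if total != train + test:  # relation counts must add up to the split sizes
        raise ValueError(f"{name}: {total} != {train} + {test}")
    summary[name] = (total, round(test / total, 3))  # (corpus size, test fraction)
print(summary)
```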
Corpus | Model architecture | Accuracy | F1 score
AIMed | CNN + Word Representation + Position Features + Lexical Features | 0.8561 | 0.7342
GENIA | CNN + Word Representation + Position Features + Lexical Features | 0.9806 | 0.9764
ChemProt | CNN + Word Representation + Position Features + Lexical Features | 0.9236 | 0.8900
The Performance of Proposed Model on AIMed, GENIA and ChemProt
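On AIMed the accuracy and F1 scores differ noticeably (0.8561 vs 0.7342). The excerpt does not say how F1 is averaged (positive class only, micro or macro), so the sketch below only illustrates, on made-up labels, how the two metrics are computed and why accuracy can exceed F1 when one class dominates, as False does in AIMed (4,834 vs 1,000 sentences). The function names and toy labels are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        scores.append(prf(tp, fp, fn)[2])
    return sum(scores) / len(scores)


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# toy, imbalanced labels: one of the two True pairs is missed
gold = ["False"] * 8 + ["True"] * 2
pred = ["False"] * 8 + ["False", "True"]
acc = accuracy(gold, pred)                        # 9 of 10 correct
f1_true = prf(1, 0, 1)[2]                         # F1 of the minority class
f1_macro = macro_f1(gold, pred, ["False", "True"])
```

Here a single missed minority-class pair costs only 0.1 accuracy but drops the minority-class F1 to 2/3.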
The Performance of Different CNN Models on AIMed, GENIA and ChemProt
Corpus | Model | F1 score
AIMed | Proposed model | 0.7342
AIMed | Zhang et al. [22] (Word, Position, SDP) | 0.6170
AIMed | Peng et al. [19] (Word, Position, POS, Chunk, Dependency Information) | 0.6350
AIMed | Chang et al. [38] (Convolution Tree Kernel) | 0.5670
AIMed | Hsieh et al. [41] (LSTMpre) | 0.7690
AIMed | Yadav et al. [42] (Att-sdpLSTM) | 0.9329
GENIA | Proposed model | 0.9764
GENIA | Ramesh et al. [40] (SVM + CFR) | 0.7610
ChemProt | Proposed model | 0.8900
ChemProt | Corbett et al. [13] (RNNs + Word) | 0.6151
ChemProt | Lim et al. [43] (Tree-LSTM: Position + Syntactic Parse Tree) | 0.6410
ChemProt | Beltagy et al. [44] (SciBERT) | 0.8364
The Performance Comparison with Other Models
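The abstract's claim that the model performs best on GENIA and ChemProt can be read directly off this table. The short computation below (scores copied from the table) finds the strongest baseline per corpus and the proposed model's F1 margin over it. It takes the reported numbers at face value; the excerpt does not say whether every baseline was evaluated on the same data split.

```python
results = {
    "AIMed": {"Proposed model": 0.7342, "Zhang et al. [22]": 0.6170,
              "Peng et al. [19]": 0.6350, "Chang et al. [38]": 0.5670,
              "Hsieh et al. [41]": 0.7690, "Yadav et al. [42]": 0.9329},
    "GENIA": {"Proposed model": 0.9764, "Ramesh et al. [40]": 0.7610},
    "ChemProt": {"Proposed model": 0.8900, "Corbett et al. [13]": 0.6151,
                 "Lim et al. [43]": 0.6410, "Beltagy et al. [44]": 0.8364},
}


def margin_over_best_baseline(scores, ours="Proposed model"):
    """Return (best baseline, our F1 minus its F1); negative means we trail it."""
    baselines = {name: f1 for name, f1 in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours] - baselines[best], 4)


for corpus, scores in results.items():
    print(corpus, margin_over_best_baseline(scores))
```

The margins are positive for GENIA (+0.2154) and ChemProt (+0.0536) and negative for AIMed (-0.1987 against Yadav et al. [42]), consistent with the abstract.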
