Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 9: 75-84. DOI: 10.11925/infotech.2096-3467.2021.0015
Classification Model for Medical Entity Relations with Convolutional Neural Network
Fan Shaoping¹, Zhao Yuxuan², An Xinying¹, Wu Qingqiang³
¹Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
²School of Finance, Central University of Finance and Economics, Beijing 102206, China
³School of Informatics, Xiamen University, Xiamen 361005, China
Abstract  

[Objective] This paper proposes a new entity relation classification model based on a Convolutional Neural Network (CNN) with multi-feature embedding, aiming to improve classification results and simplify feature computation. [Methods] Building on existing feature-embedding methods, our CNN model integrated word position features and lexical features, and we described how each feature is represented. These features did not require complex computation, yet they improved the model's performance. [Results] We evaluated the proposed model on three biomedical corpora, AIMed, GENIA and ChemProt, obtaining F1 scores of 0.7342, 0.9764 and 0.8900, respectively. Among the compared models, ours achieved the best results on the GENIA and ChemProt datasets. [Limitations] Our model did not incorporate prior domain knowledge from the biomedical field. [Conclusions] The proposed model can effectively classify entity relations and may support research on relation extraction and knowledge base construction in the biomedical field.

Keywords: Relation Classification; CNN; Position Features; Lexical Features
Received: 07 January 2021; Published: 29 June 2021
CLC Number: G350
Fund: National Natural Science Foundation of China (71704188); National Key Research and Development Program of China (2016YFC0901902-2)
Corresponding Author: Wu Qingqiang, E-mail: wuqq@xmu.edu.cn

Cite this article:

Fan Shaoping,Zhao Yuxuan,An Xinying,Wu Qingqiang. Classification Model for Medical Entity Relations with Convolutional Neural Network. Data Analysis and Knowledge Discovery, 2021, 5(9): 75-84.

URL: https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0015 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I9/75

A Sample of Relation Classification

| Sentence | Entity e1 | Entity e2 | Relation |
| --- | --- | --- | --- |
| <e1>1,25D</e1> inhibited <e2>MYC gene</e2> expression and accelerated its protein turnover | 1,25D | MYC gene | inhibit(e1, e2) |
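Position features in relation classification are commonly defined, following Zeng et al. [23], as each token's signed distance to the two entities. As a sketch under that assumption (this page does not give the authors' exact encoding), the Python below strips the `<e1>`/`<e2>` markup from the sample sentence and computes both distance sequences. `parse_tagged_sentence` and `relative_positions` are hypothetical helper names, and whitespace tokenization is a simplification.

```python
import re


def parse_tagged_sentence(sentence):
    """Split a sentence marked with <e1>...</e1> and <e2>...</e2> into tokens.

    Returns (tokens, e1_span, e2_span); each span is a (start, end) token
    range with an exclusive end. Tokenization is plain whitespace splitting.
    """
    # Surround the tags with spaces so that split() isolates them.
    spaced = re.sub(r"(</?e[12]>)", r" \1 ", sentence)
    tokens, spans, opened = [], {}, {}
    for piece in spaced.split():
        tag = re.fullmatch(r"<(/?)(e[12])>", piece)
        if tag is None:
            tokens.append(piece)
        elif tag.group(1):  # closing tag: the span ends at the current length
            spans[tag.group(2)] = (opened[tag.group(2)], len(tokens))
        else:  # opening tag: remember where the entity starts
            opened[tag.group(2)] = len(tokens)
    return tokens, spans["e1"], spans["e2"]


def relative_positions(n_tokens, span):
    """Signed distance of every token to an entity span.

    Tokens inside the span get 0, tokens to the left get negative values
    and tokens to the right get positive values.
    """
    start, end = span
    distances = []
    for i in range(n_tokens):
        if i < start:
            distances.append(i - start)
        elif i >= end:
            distances.append(i - (end - 1))
        else:
            distances.append(0)
    return distances


if __name__ == "__main__":
    sample = ("<e1>1,25D</e1> inhibited <e2>MYC gene</e2> expression "
              "and accelerated its protein turnover")
    toks, e1, e2 = parse_tagged_sentence(sample)
    print(toks)
    print(relative_positions(len(toks), e1))
    print(relative_positions(len(toks), e2))
```

Before embedding, each distance is typically clipped to a maximum range and shifted to a non-negative index, so that it can select a row of a trainable position-embedding table.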
A Sample of Lexical Features

| Sentence | Entity e1 | Entity e2 |
| --- | --- | --- |
| Demethylation experiments further confirmed that loss of <e1>ALX4</e1> expression was regulated by <e2>CpG island</e2> hypermethylation. | ALX4 | CpG island |
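The lexical-feature sample lists only the sentence and its two entities. One common way to derive lexical features from such a row, assumed here following Zeng et al. [23], is to take the entity words themselves plus the tokens immediately to the left and right of each entity (levels L1 to L4; the WordNet hypernym level L5 is omitted). The sketch below is illustrative rather than the authors' code; the `PAD` filler token and the punctuation handling are assumptions.

```python
import re

PAD = "<PAD>"  # hypothetical filler when an entity sits at a sentence boundary


def tokenize_with_entities(sentence):
    """Whitespace-tokenize a sentence tagged with <e1>/<e2> markup.

    Returns (tokens, e1_span, e2_span) with exclusive-end token spans.
    Punctuation followed by a space or the end of the string is split off
    as its own token.
    """
    spaced = re.sub(r"(</?e[12]>)", r" \1 ", sentence)
    spaced = re.sub(r"([.,;:!?])(?=\s|$)", r" \1", spaced)
    tokens, spans, opened = [], {}, {}
    for piece in spaced.split():
        tag = re.fullmatch(r"<(/?)(e[12])>", piece)
        if tag is None:
            tokens.append(piece)
        elif tag.group(1):  # closing tag
            spans[tag.group(2)] = (opened[tag.group(2)], len(tokens))
        else:  # opening tag
            opened[tag.group(2)] = len(tokens)
    return tokens, spans["e1"], spans["e2"]


def lexical_features(tokens, e1, e2):
    """Lexical features in the style of Zeng et al. [23], levels L1-L4 only."""

    def neighbours(span):
        start, end = span
        left = tokens[start - 1] if start > 0 else PAD
        right = tokens[end] if end < len(tokens) else PAD
        return [left, right]

    return {
        "L1": tokens[e1[0]:e1[1]],  # words of entity e1
        "L2": tokens[e2[0]:e2[1]],  # words of entity e2
        "L3": neighbours(e1),       # left and right neighbours of e1
        "L4": neighbours(e2),       # left and right neighbours of e2
    }


if __name__ == "__main__":
    sentence = ("Demethylation experiments further confirmed that loss of "
                "<e1>ALX4</e1> expression was regulated by <e2>CpG island</e2> "
                "hypermethylation.")
    print(lexical_features(*tokenize_with_entities(sentence)))
```

For the sample sentence this yields `ALX4`, `CpG island`, the neighbours `of`/`expression` and `by`/`hypermethylation`. In a model, each level's tokens would be looked up in the same word-embedding table as the sentence and concatenated into a single vector.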
CNN Network Architecture
CNN Architecture with Only Word Representation
CNN Architecture with Word Representation and Position Features
CNN Architecture with Word Representation, Position Features and Lexical Features
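These architecture captions describe progressively richer inputs: word representations alone, then word representations plus position features, then lexical features as well. The dependency-free Python below is a hedged sketch of a forward pass for the fullest variant. It assumes the widely used design of Zeng et al. [23]: per-token word and position embeddings, one convolution layer with tanh, max pooling over time, lexical-feature embeddings concatenated after pooling, and a softmax output. All sizes, the random untrained weights, the clipping distance `MAX_DIST` and the averaging of multi-word lexical levels are hypothetical choices, and training is omitted.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration; not the paper's hyperparameters.
WORD_DIM, POS_DIM = 8, 2
WINDOW, N_FILTERS, N_CLASSES, N_LEVELS = 3, 4, 2, 4
MAX_DIST = 10  # relative distances are clipped to [-MAX_DIST, MAX_DIST]
IN_DIM = WORD_DIM + 2 * POS_DIM


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


word_table = {}  # toy word-embedding table, filled lazily with random vectors
pos_table_e1 = rand_matrix(2 * MAX_DIST + 1, POS_DIM)
pos_table_e2 = rand_matrix(2 * MAX_DIST + 1, POS_DIM)
conv_w = [rand_matrix(WINDOW, IN_DIM) for _ in range(N_FILTERS)]
conv_b = [0.0] * N_FILTERS
out_w = rand_matrix(N_CLASSES, N_FILTERS + N_LEVELS * WORD_DIM)
out_b = [0.0] * N_CLASSES


def embed_word(token):
    if token not in word_table:
        word_table[token] = [random.uniform(-0.1, 0.1) for _ in range(WORD_DIM)]
    return word_table[token]


def pos_index(distance):
    """Clip a signed distance and shift it to a row index of a position table."""
    return max(-MAX_DIST, min(MAX_DIST, distance)) + MAX_DIST


def token_vectors(tokens, d1, d2):
    """Word embedding concatenated with the two position embeddings."""
    return [embed_word(t) + pos_table_e1[pos_index(a)] + pos_table_e2[pos_index(b)]
            for t, a, b in zip(tokens, d1, d2)]


def conv_max_pool(vectors):
    """One zero-padded convolution layer (tanh), then max pooling over positions."""
    pad = [[0.0] * IN_DIM] * (WINDOW // 2)
    padded = pad + vectors + pad
    pooled = []
    for w, b in zip(conv_w, conv_b):
        best = -math.inf
        for s in range(len(padded) - WINDOW + 1):
            z = b + sum(w[k][j] * padded[s + k][j]
                        for k in range(WINDOW) for j in range(IN_DIM))
            best = max(best, math.tanh(z))
        pooled.append(best)
    return pooled


def lexical_vector(levels):
    """Average the word embeddings within each lexical level, then concatenate."""
    out = []
    for words in levels:
        embs = [embed_word(w) for w in words]
        out += [sum(col) / len(embs) for col in zip(*embs)]
    return out


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(tokens, d1, d2, levels):
    """Forward pass: pooled sentence features + lexical features -> probabilities."""
    feats = conv_max_pool(token_vectors(tokens, d1, d2)) + lexical_vector(levels)
    logits = [b + sum(wi * xi for wi, xi in zip(w, feats))
              for w, b in zip(out_w, out_b)]
    return softmax(logits)


if __name__ == "__main__":
    toks = ["1,25D", "inhibited", "MYC", "gene", "expression"]
    levels = [["1,25D"], ["MYC", "gene"], ["<PAD>", "inhibited"],
              ["inhibited", "expression"]]
    print(classify(toks, [0, 1, 2, 3, 4], [-2, -1, 0, 0, 1], levels))
```

Removing `lexical_vector` from `classify`, or the position tables from `token_vectors`, gives the two simpler variants, provided the input size of the corresponding layer is reduced to match.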
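The Corpora Used and the Number of Relations Used for Training and Testing

| Corpus | Relation type | Number of relation sentences | Training set (per corpus) | Test set (per corpus) |
| --- | --- | --- | --- | --- |
| AIMed [35] | False | 4,834 | 4,861 | 973 |
| | True | 1,000 | | |
| GENIA [36] | Protein-Component | 1,302 | 1,547 | 310 |
| | Subunit-Complex | 555 | | |
| ChemProt [37] | Activator | 2,571 | 5,363 | 1,073 |
| | Indirect-Downregulator | 446 | | |
| | Indirect-Upregulator | 3,225 | | |
| | Inhibitor | 194 | | |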
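The counts in this table are internally consistent: for each corpus, the relation counts sum to the training plus test sizes, and every corpus is split in roughly the same 5:1 ratio. The short Python check below uses only the table's numbers to confirm this, and also reports the share of the largest relation class, which is worth keeping in mind when reading accuracy next to F1 on imbalanced corpora.

```python
# Counts copied from the corpus table; training/test sizes are per corpus.
CORPORA = {
    "AIMed": ({"False": 4834, "True": 1000}, 4861, 973),
    "GENIA": ({"Protein-Component": 1302, "Subunit-Complex": 555}, 1547, 310),
    "ChemProt": ({"Activator": 2571, "Indirect-Downregulator": 446,
                  "Indirect-Upregulator": 3225, "Inhibitor": 194}, 5363, 1073),
}


def summarize(corpora):
    """Return {corpus: (total, training fraction, majority-class share)}."""
    summary = {}
    for name, (relations, train, test) in corpora.items():
        total = sum(relations.values())
        # The relation counts should add up to the training plus test sizes.
        assert total == train + test, name
        summary[name] = (total, round(train / total, 3),
                         round(max(relations.values()) / total, 3))
    return summary


if __name__ == "__main__":
    for corpus, row in summarize(CORPORA).items():
        print(corpus, row)
```

The totals are 5,834 (AIMed), 1,857 (GENIA) and 6,436 (ChemProt), each with a training fraction of about 0.833. The AIMed majority class ("False") alone accounts for about 0.829 of the data, close to the 0.8561 accuracy reported for the proposed model on AIMed, which is why F1 is the more informative figure there.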
The Performance of the Proposed Model on AIMed, GENIA and ChemProt

| Corpus | Model architecture | Accuracy | F1 |
| --- | --- | --- | --- |
| AIMed | CNN + Word Representation + Position Features + Lexical Features | 0.8561 | 0.7342 |
| GENIA | CNN + Word Representation + Position Features + Lexical Features | 0.9806 | 0.9764 |
| ChemProt | CNN + Word Representation + Position Features + Lexical Features | 0.9236 | 0.8900 |
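Accuracy and F1 can diverge sharply on imbalanced data, which is why both are reported. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). This page does not state whether the reported F1 is the positive-class score (natural for the binary AIMed task), micro-averaged or macro-averaged; the Python sketch below assumes per-class F1 with a macro average, and its toy labels are made up for illustration.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_scores(gold, pred):
    """Per-class F1 = 2PR / (P + R) and their unweighted (macro) average."""
    per_class = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class[label] = 2 * precision * recall / denom if denom else 0.0
    return per_class, sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    gold = ["True", "True", "False", "False", "False", "False"]
    pred = ["True", "False", "False", "False", "False", "True"]
    print(accuracy(gold, pred))  # 4 of 6 predictions are correct
    print(f1_scores(gold, pred))
```

On this toy example accuracy is 4/6, while the minority class "True" reaches only F1 = 0.5 against 0.75 for "False", giving a macro F1 of 0.625.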
The Performance of Different CNN Models on AIMed, GENIA and ChemProt
The Performance Comparison with Other Models

| Corpus | Model | F1 |
| --- | --- | --- |
| AIMed | Proposed model | 0.7342 |
| | Zhang et al. [22] (Word, Position, SDP) | 0.6170 |
| | Peng et al. [19] (Word, Position, POS, Chunk, Dependency Information) | 0.6350 |
| | Chang et al. [38] (Convolution Tree Kernel) | 0.5670 |
| | Hsieh et al. [41] (LSTMpre) | 0.7690 |
| | Yadav et al. [42] (Att-sdpLSTM) | 0.9329 |
| GENIA | Proposed model | 0.9764 |
| | Ramesh et al. [40] (SVM + CRF) | 0.7610 |
| ChemProt | Proposed model | 0.8900 |
| | Corbett et al. [13] (RNNs + Word) | 0.6151 |
| | Lim et al. [43] (Tree-LSTM: Position + Syntactic Parse Tree) | 0.6410 |
| | Beltagy et al. [44] (SciBERT) | 0.8364 |
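The abstract's claim that the model performs best on GENIA and ChemProt can be checked against this table by comparing the proposed model with the strongest listed baseline for each corpus. The Python sketch below does that arithmetic using only the table's values; the margins inherit any differences in data splits or evaluation setups between the cited studies.

```python
# F1 scores copied from the comparison table.
RESULTS = {
    "AIMed": {"Proposed model": 0.7342, "Zhang et al. [22]": 0.6170,
              "Peng et al. [19]": 0.6350, "Chang et al. [38]": 0.5670,
              "Hsieh et al. [41]": 0.7690, "Yadav et al. [42]": 0.9329},
    "GENIA": {"Proposed model": 0.9764, "Ramesh et al. [40]": 0.7610},
    "ChemProt": {"Proposed model": 0.8900, "Corbett et al. [13]": 0.6151,
                 "Lim et al. [43]": 0.6410, "Beltagy et al. [44]": 0.8364},
}


def margin_over_best_baseline(scores):
    """Return (strongest baseline, proposed F1 minus that baseline's F1)."""
    baselines = {name: f1 for name, f1 in scores.items() if name != "Proposed model"}
    best = max(baselines, key=baselines.get)
    return best, round(scores["Proposed model"] - baselines[best], 4)


if __name__ == "__main__":
    for corpus, scores in RESULTS.items():
        print(corpus, margin_over_best_baseline(scores))
```

The margins come out at -0.1987 on AIMed (Yadav et al. [42] is ahead), +0.2154 on GENIA and +0.0536 on ChemProt (over SciBERT), matching the abstract's statement.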
[1] The Precision Medicine Initiative [EB/OL]. [2019-12-01]. https://obamawhitehouse.archives.gov/precision-medicine
[2] Notice of the Ministry of Science and Technology on Issuing 2016 Annual Project Application Guidelines for National Key R&D Plan, Precision Medicine Research and Other Key Special Projects [EB/OL]. [2019-12-01]. http://www.most.gov.cn/tztg/201603/t20160308_124542.html
[3] Liu Lei, Wang Xing. Development of Knowledge Base for Precision Medicine[J]. Chinese Journal of Medical Library and Information Science, 2018, 27(6): 1-9.
[4] Disease Ontology [EB/OL]. [2019-12-01]. https://disease-ontology.org/
[5] KEGG: Kyoto Encyclopedia of Genes and Genomes [EB/OL]. [2019-12-01]. https://www.kegg.jp/
[6] PharmGKB [EB/OL]. [2019-12-01]. https://www.pharmgkb.org/
[7] Hendrickx I, Kim S N, Kozareva Z, et al. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals[C]//Proceedings of the 5th International Workshop on Semantic Evaluation. 2010: 33-38.
[8] Corpus for Relation Classification in Medical Field [EB/OL]. [2021-01-10]. https://github.com/yangshuothtf/corpus_relation_classification
[9] Afzal H, Eales J, Stevens R, et al. Mining Semantic Networks of Bioinformatics E-Resources from the Literature[J]. Journal of Biomedical Semantics, 2011, 2 (S1): Article No. S4.
[10] Segura-Bedmar I, Martínez P, de Pablo-Sanchez C. Using a Shallow Linguistic Kernel for Drug-Drug Interaction Extraction[J]. Journal of Biomedical Informatics, 2011, 44(5): 789-804. DOI: 10.1016/j.jbi.2011.04.005. PMID: 21545845.
[11] Zhao Z H, Yang Z H, Luo L, et al. Drug-Drug Interaction Extraction from Biomedical Literature Using Syntax Convolutional Neural Network[J]. Bioinformatics, 2016, 32(22):3444-3453.
[12] Zhao Z H, Yang Z H, Sun C, et al. A Hybrid Protein-Protein Interaction Triple Extraction Method for Biomedical Literature [C]//Proceedings of 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2017.
[13] Corbett P, Boyle J. Improving the Learning of Chemical-Protein Interactions from Literature Using Transfer Learning and Specialized Word Embeddings[J]. Database, 2018. DOI: 10.1093/database/bay066.
[14] Yan X, Mou L L, Li G, et al. Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths [C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1785-1794.
[15] Wang Tianshi. Research on Text Classification Method Based on Feature Embedding Representation[D]. Ji'nan: Shandong Normal University, 2020.
[16] Lee J, Seo S, Choi Y S. Semantic Relation Classification via Bidirectional LSTM Networks with Entity-Aware Attention Using Latent Entity Typing[J]. Symmetry, 2019, 11(6): 785. DOI: 10.3390/sym11060785.
[17] Sahu S K, Anand A, Oruganty K, et al. Relation Extraction from Clinical Texts Using Domain Invariant Convolutional Neural Network[C]//Proceedings of the 15th Workshop on Biomedical Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2016: 206-215.
[18] Quan C Q, Hua L, Sun X, et al. Multichannel Convolutional Neural Network for Biological Relation Extraction[J]. BioMed Research International, 2016: 1-10.
[19] Peng Y F, Lu Z Y. Deep Learning for Extracting Protein-Protein Interactions from Biomedical Literature [C]//Proceedings of the BioNLP 2017 Workshop. 2017: 29-38.
[20] Sahu S K, Anand A. Drug-Drug Interaction Extraction from Biomedical Texts Using Long Short Term Memory Network[J]. Journal of Biomedical Informatics, 2018, 86: 15-24. DOI: 10.1016/j.jbi.2018.08.005.
[21] Peng Y F, Rios A, Kavuluru R, et al. Extracting Chemical-Protein Relations with Ensembles of SVM and Deep Learning Models[J]. Database: The Journal of Biological Databases and Curation. DOI: 10.1093/database/bay073.
[22] Zhang Y J, Lin H F, Yang Z H, et al. A Hybrid Model Based on Neural Networks for Biomedical Relation Extraction[J]. Journal of Biomedical Informatics, 2018, 81: 83-92. DOI: 10.1016/j.jbi.2018.03.011.
[23] Zeng D J, Liu K, Lai S W, et al. Relation Classification via Convolutional Deep Neural Network [C]//Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers. 2014: 2335-2344.
[24] Socher R, Huval B, Manning C D, et al. Semantic Compositionality Through Recursive Matrix-Vector Spaces [C]// Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012:1201-1211.
[25] Nguyen T H, Grishman R. Relation Extraction: Perspective from Convolutional Neural Networks [C]//Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. 2015: 39-48.
[26] Nguyen T H, Grishman R. Employing Word Representations and Regularization for Domain Adaptation of Relation Extraction [C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014:68-74.
[27] Choi S P. Extraction of Protein-Protein Interactions (PPIs) from the Literature by Deep Convolutional Neural Networks with Various Feature Embeddings[J]. Journal of Information Science, 2018, 44(1): 60-73. DOI: 10.1177/0165551516673485.
[28] Porumb M, Barbantan I, Lemnaru C, et al. REMed: Automatic Relation Extraction from Medical Documents [C]//Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services. ACM, 2015: 19.
[29] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv: 1301.3781.
[30] Krizhevsky A, Sutskever I, Hinton G. ImageNet Classification with Deep Convolutional Neural Networks[J]. Communications of the ACM, 2017, 60(6): 84-90. DOI: 10.1145/3065386.
[31] Yang Y M. An Evaluation of Statistical Approaches to Text Categorization[J]. Information Retrieval, 1999, 1(1-2): 69-90. DOI: 10.1023/A:1009982220290.
[32] Bunescu R, Ge R F, Kate R J, et al. Comparative Experiments on Learning Information Extractors for Proteins and Their Interactions[J]. Artificial Intelligence in Medicine, 2005, 33(2): 139-155. PMID: 15811782.
[33] Ohta T, Pyysalo S, Kim J D, et al. A Re-evaluation of Biomedical Named Entity-Term Relations[J]. Journal of Bioinformatics and Computational Biology, 2010, 8(5): 917-928. DOI: 10.1142/S0219720010005014.
[34] Taboureau O, Nielsen S K, Audouze K, et al. ChemProt: A Disease Chemical Biology Database[J]. Nucleic Acids Research, 2011, 39(S1): D367-D372. DOI: 10.1093/nar/gkq906.
[35] AIMed [DB/OL]. [2021-01-12]. ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/
[36] Relation Annotation [EB/OL]. [2021-01-12]. http://www.geniaproject.org/genia-corpus/relation-corpus
[37] BioCreative VII [EB/OL]. [2021-01-12]. http://www.biocreative.org
[38] Chang Y C, Chu C H, Su Y C, et al. PIPE: A Protein-Protein Interaction Passage Extraction Module for BioCreative Challenge[J]. Database. DOI: 10.1093/database/baw101.
[39] Björne J, Salakoski T. Generalizing Biomedical Event Extraction [C]//Proceedings of the 2011 BioNLP Shared Task Workshop. 2011: 183-191.
[40] Ramesh B P, Prasad R, Miller T, et al. Automatic Discourse Connective Detection in Biomedical Text[J]. Journal of the American Medical Informatics Association, 2012, 19(5): 800-808. DOI: 10.1136/amiajnl-2011-000775.
[41] Hsieh Y L, Chang Y C, Chang N W, et al. Identifying Protein-Protein Interactions in Biomedical Literature Using Recurrent Neural Networks with Long Short-Term Memory [C]//Proceedings of the 8th International Joint Conference on Natural Language Processing. 2017.
[42] Yadav S, Ekbal A, Saha S, et al. Feature Assisted Stacked Attentive Shortest Dependency Path Based Bi-LSTM Model for Protein-Protein Interaction[J]. Knowledge-Based Systems, 2019, 166: 18-29. DOI: 10.1016/j.knosys.2018.11.020.
[43] Lim S, Kang J. Chemical-Gene Relation Extraction Using Recursive Neural Network[J]. Database. DOI: 10.1093/database/bay060.
[44] Beltagy I, Lo K, Cohan A. SciBERT: A Pretrained Language Model for Scientific Text [C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 3615-3620.
[45] Hu Zhengyin, Liu Leilei, Dai Bing, et al. Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14.
[46] Beltagy I, Lo K, Cohan A. SciBERT: Pretrained Contextualized Embeddings for Scientific Text[OL]. arXiv Preprint, arXiv: 1903.10676.
[47] Zhu Y, Li L S, Lu H B, et al. Extracting Drug-Drug Interactions from Texts with BioBERT and Multiple Entity-aware Attentions[J]. Journal of Biomedical Informatics, 2020, 106: 103451. PII: S1532-0464(20)30079-4. PMID: 32454243.