Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (11): 145-152    DOI: 10.11925/infotech.2096-3467.2021.0671
Improving PubMedBERT for CID-Entity-Relation Classification Using Text-CNN
Dong Miao (1,4), Su Zhongqi (2), Zhou Xiaobei (3), Lan Xue (4), Cui Zhigang (5), Cui Lei (4)
(1) Financial Section, China Medical University, Shenyang 110122, China
(2) China Medical University Library, Shenyang 110122, China
(3) Institute of Health Sciences, China Medical University, Shenyang 110122, China
(4) School of Health Management, China Medical University, Shenyang 110122, China
(5) Nursing School, China Medical University, Shenyang 110122, China
Abstract  

[Objective] This paper aims to improve the performance of PubMedBERT on chemical-induced disease (CID) entity relation classification. [Methods] We proposed a classification model based on PubMedBERT, which was fine-tuned together with Text-CNN. First, we input entity pairs and sentence pairs to the model. Second, we used PubMedBERT to encode the CID texts and obtain their global features. Finally, we used Text-CNN to capture important local information from the global features and decide whether an entity pair has a CID relation. [Results] On the BioCreative V CDR dataset, the precision, recall and F1 value of this method reached 78.3%, 73.5% and 75.8% respectively, which were at least 3.1, 1.5 and 3.3 percentage points higher than those of the other methods. [Limitations] The model was evaluated only on CID texts, and more research is needed to assess its performance on clinical data or corpora from other domains. [Conclusions] This method can capture the features of CID texts and improve their entity relation classification.
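The abstract does not spell out how an entity pair and its text are combined into one input. The sketch below shows one plausible reading, assuming the standard BERT sentence-pair layout (`[CLS] A [SEP] B [SEP]` with segment ids 0 and 1), with the entity pair as the first segment and the text as the second. The function name `build_pair_input`, the whitespace tokenization (a stand-in for WordPiece) and the 512-token limit are illustrative assumptions, not details taken from the paper.

```python
def build_pair_input(chemical, disease, text, max_tokens=512):
    """Build a BERT-style sentence-pair input from an entity pair and its text.

    Assumption: the entity pair forms segment A and the text forms segment B.
    """
    segment_a = [chemical, disease]
    # Whitespace splitting stands in for a real WordPiece tokenizer.
    segment_b = text.lower().split()

    # Reserve three positions for [CLS] and the two [SEP] markers.
    budget = max_tokens - len(segment_a) - 3
    segment_b = segment_b[:max(budget, 0)]

    tokens = ["[CLS]"] + segment_a + ["[SEP]"] + segment_b + ["[SEP]"]
    # Segment id 0 covers [CLS], segment A and the first [SEP];
    # segment id 1 covers segment B and the final [SEP].
    segment_ids = [0] * (len(segment_a) + 2) + [1] * (len(segment_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(
    "triazolam", "mania", "Large doses of triazolam induced mania"
)
print(tokens)
print(segment_ids)
```

The two returned lists always have the same length, which is what a BERT-style encoder expects for token and segment inputs.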

Key words: CID Entity Relation Classification; PubMedBERT; Text-CNN; Sentence Pair
Received: 06 July 2021      Published: 23 December 2021
ZTFLH:  TP391  
Corresponding Author: Cui Lei, ORCID: 0000-0001-9479-8225, E-mail: lcui@cmu.edu.cn

Cite this article:

Dong Miao, Su Zhongqi, Zhou Xiaobei, Lan Xue, Cui Zhigang, Cui Lei. Improving PubMedBERT for CID-Entity-Relation Classification Using Text-CNN. Data Analysis and Knowledge Discovery, 2021, 5(11): 145-152.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0671     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I11/145

Structure of Model
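The figure itself is not reproduced here. As a rough illustration of the Text-CNN stage described in the abstract, the following pure-Python sketch applies convolution filters of several widths over a sequence of token vectors, takes the ReLU and max-over-time pooling of each filter, and feeds the pooled values to a linear layer, following the general design in reference [7]. The token vectors stand in for PubMedBERT hidden states; the function names, filter widths and weights are hypothetical and are not the authors' configuration.

```python
def conv_relu_max_pool(hidden, kernel, bias):
    """Slide one filter along the token axis, apply ReLU, keep the maximum.

    hidden: token vectors, shape (seq_len, dim), as nested lists
    kernel: filter weights, shape (width, dim), as nested lists
    """
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(hidden) - width + 1):
        window = hidden[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        best = max(best, activation)
    return best


def text_cnn_logits(hidden, filters, out_weights, out_bias):
    """Pool every filter, then apply a linear layer to get class logits.

    filters: list of (kernel, bias) pairs; kernels may have different widths
    out_weights: one weight row per class, each with len(filters) entries
    """
    pooled = [conv_relu_max_pool(hidden, k, b) for k, b in filters]
    return [
        b + sum(w * p for w, p in zip(row, pooled))
        for row, b in zip(out_weights, out_bias)
    ]


# Toy example: 3 tokens with 2-dimensional vectors in place of encoder output.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),      # width-2 filter
    ([[-1.0, -1.0], [-1.0, -1.0]], 0.0),  # width-2 filter that never fires
]
logits = text_cnn_logits(hidden, filters, [[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.5])
print(logits)
```

In a real model the filters and the output layer would be trained jointly with the encoder; a framework such as PyTorch would replace these loops with batched tensor operations.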
BC5CDR5 Corpus (PMID: 354896)
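| Dataset | Documents | Chemical Mentions | Chemical IDs | Disease Mentions | Disease IDs | CID Relations |
|---|---|---|---|---|---|---|
| Training | 500 | 5203 | 1467 | 4182 | 1965 | 1038 |
| Development | 500 | 5347 | 1507 | 4244 | 1865 | 1012 |
| Test | 500 | 5385 | 1435 | 4424 | 1988 | 1066 |

Summary of BC5CDR5 Corpus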
| ID | e1 | e2 | sentence |
|---|---|---|---|
| 3693336_1 | triazolo | manic | triazolam-induced brief episodes of secondary mania in a depressed patient. large doses of triazolam repeatedly induced brief episodes of mania in a depressed elderly woman. features of organic mental disorder (delirium) were not present. manic excitement was coincident with the duration of action of triazolam. the possible contribution of the triazolo group to changes in affective status is discussed |

Positive Sample of BC5CDR5

| ID | e1 | e2 | sentence |
|---|---|---|---|
| 3693336_2 | triazolo | depressed | triazolam-induced brief episodes of secondary mania in a depressed patient. large doses of triazolam repeatedly induced brief episodes of mania in a depressed elderly woman. features of organic mental disorder (delirium) were not present. manic excitement was coincident with the duration of action of triazolam. the possible contribution of the triazolo group to changes in affective status is discussed |

Negative Sample of BC5CDR5
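The two samples share the same text and differ only in the disease entity, which suggests that each document yields one instance per chemical-disease pair, labeled by whether the pair is an annotated CID relation. A minimal sketch of that construction follows. The function `make_instances`, the `<PMID>_<index>` ID scheme (inferred from the sample IDs) and matching on surface strings rather than concept IDs are assumptions made for illustration.

```python
from itertools import product


def make_instances(pmid, chemicals, diseases, cid_pairs, text):
    """Pair every chemical with every disease and label each pair.

    cid_pairs: iterable of (chemical, disease) tuples annotated as CID relations
    Returns a list of dicts with the columns shown in the sample tables plus a label.
    """
    gold = set(cid_pairs)
    instances = []
    for index, (chemical, disease) in enumerate(product(chemicals, diseases), start=1):
        instances.append({
            "id": f"{pmid}_{index}",
            "e1": chemical,
            "e2": disease,
            "sentence": text,
            "label": 1 if (chemical, disease) in gold else 0,  # 1 = positive sample
        })
    return instances


rows = make_instances(
    pmid="3693336",
    chemicals=["triazolo"],
    diseases=["manic", "depressed"],
    cid_pairs=[("triazolo", "manic")],
    text="triazolam-induced brief episodes of secondary mania in a depressed patient.",
)
for row in rows:
    print(row["id"], row["e1"], row["e2"], row["label"])
```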
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Best Approach of BioCreative V CDR[24] | 55.6% | 58.4% | 57.0% |
| LSTM-based[20] | 64.9% | 49.3% | 56.0% |
| CNN-based[15] | 60.9% | 59.5% | 60.2% |
| BERT Original | 70.1% | 67.7% | 65.6% |
| BERT+Text-CNN | 71.2% | 68.3% | 69.7% |
| ClinicalBERT | 70.5% | 69.3% | 69.8% |
| ClinicalBERT+Text-CNN | 70.9% | 70.0% | 70.4% |
| BioBERT | 72.0% | 70.3% | 71.1% |
| BioBERT+Text-CNN | 73.1% | 72.0% | 72.5% |
| PubMedBERT | 75.2% | 69.1% | 72.0% |
| PubMedBERT+Text-CNN | 78.3% | 73.5% | 75.8% |

Performance for Models on BC5CDR5 Corpus
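F1 is the harmonic mean of precision and recall, so each row can be checked against F1 = 2PR / (P + R). The short sketch below does this for a few rows; the helper names and the 0.15-point rounding tolerance are choices made for this illustration. Under this formula the BERT Original row gives about 68.9 rather than the listed 65.6, so one of its three values may have been mis-transcribed; the other rows checked here agree with the formula.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table.
ROWS = {
    "BERT Original": (70.1, 67.7, 65.6),
    "BioBERT": (72.0, 70.3, 71.1),
    "PubMedBERT": (75.2, 69.1, 72.0),
    "PubMedBERT+Text-CNN": (78.3, 73.5, 75.8),
}


def inconsistent_rows(rows, tolerance=0.15):
    """Return the names of rows whose reported F1 differs from the recomputed one."""
    return [
        name
        for name, (p, r, reported) in rows.items()
        if abs(f1_score(p, r) - reported) > tolerance
    ]


for name, (p, r, reported) in ROWS.items():
    print(f"{name}: recomputed {f1_score(p, r):.1f}, reported {reported}")
print("Rows to double-check:", inconsistent_rows(ROWS))
```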
| Method | Precision | Recall | F1 |
|---|---|---|---|
| PubMed Embedding+Text-CNN | 62.7% | 56.3% | 59.3% |
| Glove Embedding+Text-CNN | 60.4% | 54.6% | 57.4% |
| PubMedBERT+Text-CNN | 78.3% | 73.5% | 75.8% |

Performance for Pre-trained Model and Word Embedding Models
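To make the comparison concrete, the gap between the pre-trained model and each word-embedding baseline can be expressed in percentage points. The sketch below computes those gaps from the values in the table; the dictionary layout and the function name `point_gains` are illustrative only.

```python
# (precision, recall, F1) in percent, copied from the table.
SCORES = {
    "PubMed Embedding+Text-CNN": (62.7, 56.3, 59.3),
    "Glove Embedding+Text-CNN": (60.4, 54.6, 57.4),
    "PubMedBERT+Text-CNN": (78.3, 73.5, 75.8),
}


def point_gains(scores, target):
    """Percentage-point gain of `target` over every other method, per metric."""
    target_values = scores[target]
    return {
        name: tuple(round(t - v, 1) for t, v in zip(target_values, values))
        for name, values in scores.items()
        if name != target
    }


gains = point_gains(SCORES, "PubMedBERT+Text-CNN")
for name, (dp, dr, df1) in gains.items():
    print(f"vs {name}: precision +{dp}, recall +{dr}, F1 +{df1}")
```

On these numbers the pre-trained encoder leads both static-embedding baselines by more than 15 points on every metric.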
[1] Dogan R I, Murray G C, Névéol A, et al. Understanding PubMed® User Search Behavior Through Log Analysis[J/OL]. Database, 2009. https://doi.org/10.1093/database/bap018.
[2] Lu Z Y. PubMed and Beyond: A Survey of Web Tools for Searching Biomedical Literature[J/OL]. Database, 2011. https://doi.org/10.1093/database/baq036.
[3] Kang N, Singh B, Bui C, et al. Knowledge-based Extraction of Adverse Drug Events from Biomedical Text[J]. BMC Bioinformatics, 2014, 15(1): Article No. 64.
[4] Davis A P, Grondin C J, Johnson R J, et al. The Comparative Toxicogenomics Database: Update 2017[J]. Nucleic Acids Research, 2017, 45:D972-D978. DOI: 10.1093/nar/gkw838.
[5] PharmGKB[EB/OL]. [2021-07-16]. https://www.pharmgkb.org/.
[6] Zhou D Y, Zhong D Y, He Y L. Biomedical Relation Extraction: From Binary to Complex[J]. Computational and Mathematical Methods in Medicine, 2014: Article ID 298473.
[7] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[8] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 1:4171-4186.
[9] Gu Y, Tinn R, Cheng H, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing[OL]. arXiv Preprint, arXiv: 2007.15779.
[10] Abacha A B, Zweigenbaum P. Automatic Extraction of Semantic Relations Between Medical Entities: Application to the Treatment Relation[C]// Proceedings of the 4th International Symposium for Semantic Mining in Biomedicine, Cambridge, United Kingdom. 2010.
[11] Li H, Tang B, Chen Q, et al. HITSZ_CDR: An End-to-End Chemical and Disease Relation Extraction System for BioCreative V[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw077.
[12] Peng Y F, Wei C H, Lu Z Y. Improving Chemical Disease Relation Extraction with Rich Features and Weakly Labeled Data[J]. Journal of Cheminformatics, 2016, 8: Article No.53.
[13] Giles C B, Wren J D. Large-scale Directional Relationship Extraction and Resolution[J]. BMC Bioinformatics, 2008, 9: Article No.S11.
[14] Alam F, Corazza A, Lavelli A, et al. A Knowledge-poor Approach to Chemical-Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw071.
[15] Gu J H, Sun F Q, Qian L H, et al. Chemical-induced Disease Relation Extraction via Convolutional Neural Network[J/OL]. Database, 2017. https://doi.org/10.1093/database/bax024.
[16] Zhou H W, Lang C K, Liu Z, et al. Knowledge-guided Convolutional Networks for Chemical-Disease Relation Extraction[J]. BMC Bioinformatics, 2019, 20: Article No.260.
[17] Gu J H, Sun F Q, Qian L H, et al. Chemical-induced Disease Relation Extraction via Attention-based Distant Supervision[J]. BMC Bioinformatics, 2019, 20: Article No.403.
[18] Li Z H, Yang Z H, Xiang Y, et al. Exploiting Sequence Labeling Framework to Extract Document-level Relations from Biomedical Texts[J]. BMC Bioinformatics, 2020, 21. DOI: 10.1186/s12859-020-3457-2.
[19] Mitra S, Saha S, Hasanuzzaman M. A Multi-view Deep Neural Network Model for Chemical-Disease Relation Extraction from Imbalanced Datasets[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24(11):3315-3325. DOI: 10.1109/JBHI.6221020.
[20] Zhou H W, Deng H J, Chen L, et al. Exploiting Syntactic and Semantics Information for Chemical-Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw048.
[21] Lee J, Yoon W, Kim S, et al. BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining[J]. Bioinformatics, 2020, 36(4):1234-1240.
[22] Alsentzer E, Murphy J R, Boag W, et al. Publicly Available Clinical BERT Embeddings[OL]. arXiv Preprint, arXiv: 1904.03323.
[23] Li J, Sun Y P, Johnson R J, et al. BioCreative V CDR Task Corpus: A Resource for Chemical Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw068.
[24] Bowman S R, Gauthier J, Rastogi A, et al. A Fast Unified Model for Parsing and Sentence Understanding[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, 3:1466-1477.
[25] Liao Kaiji, Huang Qiongying, Xi Yunjiang. Research on the Construction of Knowledge Graph of Q & A Text in Online Medical Community[J]. Information Science, 2021, 39(3):51-59, 75.
[26] Huang Mengxing, Li Menglong, Han Huirui. Research on Entity Recognition and Knowledge Graph Construction Based on Electronic Medical Record[J]. Application Research of Computers, 2019, 36(12):3735-3739.
[27] Li Dongqi, Li Mingxin, Zhang Xiao. Research on Open Domain Question Answering Based on Knowledge Base[J]. Computer Knowledge and Technology, 2020, 16(36):179-181.
[28] Gao Man, Cui Lei. Steps and Tools for Drug Repositioning Using Text Mining[J]. Chinese Journal of Medical Library and Information Science, 2017, 26(3):6-9.
[29] Sui Mingshuang, Cui Lei. Using Text Mining to Find the Side Effects of Drugs[J]. Chinese Journal of Medical Library and Information Science, 2015, 24(11):67-72.
[30] Wang Xiuyan, Cui Lei. A Hybrid Method to Extract Semantic Relation of Biomedical Entities[J]. New Technology of Library and Information Service, 2013(3):77-82.
[31] Wang Kejian, Shi Leming, He Lin, et al. New Opportunities for Drug Research and Development in China: Systematic Drug Repositioning Based on Big Data of Medicine[J]. Science Bulletin, 2014, 59(18):1790-1796.