Improving PubMedBERT for CID-Entity-Relation Classification Using Text-CNN
Dong Miao¹,⁴, Su Zhongqi², Zhou Xiaobei³, Lan Xue⁴, Cui Zhigang⁵, Cui Lei⁴ (corresponding author)

¹ Financial Section, China Medical University, Shenyang 110122, China
² China Medical University Library, Shenyang 110122, China
³ Institute of Health Sciences, China Medical University, Shenyang 110122, China
⁴ School of Health Management, China Medical University, Shenyang 110122, China
⁵ Nursing School, China Medical University, Shenyang 110122, China
Abstract
[Objective] This paper aims to improve the performance of PubMedBERT on chemical-induced disease (CID) entity relation classification.
[Methods] We proposed a classification model in which PubMedBERT was fine-tuned with a Text-CNN layer. First, we fed entity pairs and sentence pairs into the model. Then, we used PubMedBERT to encode the CID texts and obtain their global features. Finally, we used Text-CNN to capture important local information from these global features and decide whether each entity pair has a CID relation.
[Results] On the BioCreative V CDR dataset, the method achieved a precision, recall and F1 value of 78.3%, 73.5% and 75.8%, respectively, which were at least 3.1%, 1.5% and 3.3% higher than those of the compared methods.
[Limitations] The model was evaluated only on CID texts; further research is needed to assess its performance on clinical data and on corpora from other domains.
[Conclusions] The proposed method captures the features of CID texts and improves their entity relation classification.
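To make the [Methods] pipeline more concrete, here is a minimal pure-Python sketch of a Kim-style Text-CNN head applied on top of token-level encoder outputs: convolution over windows of consecutive token vectors, ReLU, max-over-time pooling, concatenation, and a classification output. It is an illustration under stated assumptions, not the authors' implementation. The random `token_features` are a hypothetical stand-in for PubMedBERT's per-token outputs, and the filter widths (2, 3, 4), the two filters per width, the single sigmoid output, and all function names are assumptions, since the abstract does not specify them.

```python
import math
import random


def make_filters(num_filters, width, hidden, rng):
    """Create `num_filters` convolution filters, each a (width x hidden)
    weight matrix plus a scalar bias (small random init, bias = 0)."""
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(width)], 0.0)
        for _ in range(num_filters)
    ]


def conv_max_pool(features, filters, width):
    """Slide each filter over every window of `width` consecutive token vectors,
    apply ReLU, and keep the largest activation (max-over-time pooling)."""
    assert len(features) >= width, "sequence must be at least as long as the filter"
    hidden = len(features[0])
    pooled = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid starting maximum
        for start in range(len(features) - width + 1):
            window = features[start:start + width]
            score = bias + sum(
                weights[i][j] * window[i][j] for i in range(width) for j in range(hidden)
            )
            best = max(best, max(0.0, score))
        pooled.append(best)
    return pooled


def text_cnn_head(features, banks, out_weights, out_bias):
    """Concatenate pooled features from all filter widths and map them to a
    probability that the entity pair holds a CID relation (assumed sigmoid output)."""
    pooled = []
    for width, filters in banks:
        pooled.extend(conv_max_pool(features, filters, width))
    logit = out_bias + sum(w * x for w, x in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
hidden, seq_len = 8, 6  # toy sizes; real encoder outputs are far wider
# Hypothetical stand-in for the encoder's per-token vectors ("global features").
token_features = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(seq_len)]
widths, per_width = (2, 3, 4), 2  # assumed filter sizes, not taken from the paper
banks = [(w, make_filters(per_width, w, hidden, rng)) for w in widths]
out_weights = [rng.uniform(-0.5, 0.5) for _ in range(len(widths) * per_width)]
prob = text_cnn_head(token_features, banks, out_weights, 0.0)
print(f"P(CID relation) = {prob:.3f}")
```

Max-over-time pooling keeps each filter's strongest n-gram response regardless of its position, which is how a Text-CNN can pick out short local cues from a sentence-wide encoder representation. In practice the weights would be learned jointly with the encoder during fine-tuning rather than drawn at random.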
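Assuming the standard definition, the F1 value is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be cross-checked: 2 × 0.783 × 0.735 / (0.783 + 0.735) ≈ 0.758. The short Python check below reproduces that arithmetic; the function name is illustrative and not taken from the paper.

```python
def f1_score(precision, recall):
    """F1 = harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in [Results] for the BioCreative V CDR dataset.
p, r = 0.783, 0.735
f1 = f1_score(p, r)
print(f"F1 = {f1:.1%}")  # prints "F1 = 75.8%"
```

The recomputed value matches the reported 75.8%, which indicates that the three reported scores are mutually consistent.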
Received: 06 July 2021
Published: 23 December 2021
Corresponding Author:
Cui Lei, ORCID: 0000-0001-9479-8225
E-mail: lcui@cmu.edu.cn
|
|