1 Financial Section, China Medical University, Shenyang 110122, China
2 China Medical University Library, Shenyang 110122, China
3 Institute of Health Sciences, China Medical University, Shenyang 110122, China
4 School of Health Management, China Medical University, Shenyang 110122, China
5 Nursing School, China Medical University, Shenyang 110122, China
[Objective] This paper aims to improve the performance of PubMedBERT on chemical-induced disease (CID) entity relation classification. [Methods] First, we built a classification model based on PubMedBERT and fine-tuned it together with a Text-CNN layer. Second, we fed entity pairs and sentence pairs into the model. Third, we used PubMedBERT to encode CID texts and obtain their global features. Finally, we used Text-CNN to capture important local information from these global features and decide whether an entity pair has a CID relation. [Results] On the BioCreative V CDR dataset, the method reached a precision, recall and F1 value of 78.3%, 73.5% and 75.8% respectively, at least 3.1, 1.5 and 3.3 percentage points higher than those of the other methods compared. [Limitations] The model was evaluated only on CID texts; more research is needed to assess its performance on clinical data or on corpora from other domains. [Conclusions] The method captures the features of CID texts and improves entity relation classification on them.
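The Text-CNN step in [Methods] follows the convolutional sentence classifier of Kim [7]: several kernel widths slide over the encoder's token vectors, each filter's responses pass through ReLU and are max-pooled over positions, and the pooled features feed a classifier. Below is a minimal pure-Python sketch of such a head, assuming that random vectors stand in for PubMedBERT's token-level outputs (a real PubMedBERT-base encoder emits 768-dimensional vectors) and that the output is a single sigmoid probability. The kernel widths (2, 3, 4), three filters per width, the toy hidden size of 8 and all function names are hypothetical choices, not the authors' configuration; a real implementation would use a deep-learning framework rather than Python lists.

```python
import math
import random


def conv_relu_maxpool(seq, filters, biases, width):
    """One Text-CNN branch over a sequence of token vectors.

    Each filter (a list of `width` weight vectors) is slid over every window
    of `width` consecutive tokens; the ReLU of each response is taken and the
    largest value over all positions is kept (max-over-time pooling).
    """
    pooled = []
    for filt, bias in zip(filters, biases):
        best = 0.0  # starting at 0.0 makes max(best, s) equal the max of ReLU(s)
        for start in range(len(seq) - width + 1):
            s = bias
            for k in range(width):  # filt[k] weights the k-th token of the window
                s += sum(w * x for w, x in zip(filt[k], seq[start + k]))
            best = max(best, s)
        pooled.append(best)
    return pooled


def init_params(hidden, widths=(2, 3, 4), n_filters=3, seed=0):
    """Small random weights for a toy head (hypothetical sizes)."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    convs = {w: ([[vec(hidden) for _ in range(w)] for _ in range(n_filters)],
                 vec(n_filters))
             for w in widths}
    return convs, vec(len(widths) * n_filters), 0.0


def text_cnn_probability(token_vectors, params):
    """Concatenate pooled features from every kernel width, then apply a
    linear layer and a sigmoid to get P(pair has a CID relation)."""
    convs, out_w, out_b = params
    features = []
    for width in sorted(convs):
        filters, biases = convs[width]
        features.extend(conv_relu_maxpool(token_vectors, filters, biases, width))
    logit = out_b + sum(w * f for w, f in zip(out_w, features))
    return 1.0 / (1.0 + math.exp(-logit)), features


# Random vectors stand in for PubMedBERT token outputs (12 tokens, hidden size 8).
rng = random.Random(1)
encoder_out = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
prob, feats = text_cnn_probability(encoder_out, init_params(hidden=8))
print(f"P(CID) = {prob:.3f} from {len(feats)} pooled features")
```

Max-over-time pooling is what lets the head keep the strongest local pattern (for example, a phrase such as "triazolam-induced ... mania") wherever it occurs in the sequence, while the encoder supplies the context-aware global representation.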
Dong Miao, Su Zhongqi, Zhou Xiaobei, Lan Xue, Cui Zhigang, Cui Lei. Improving PubMedBERT for CID-Entity-Relation Classification Using Text-CNN. Data Analysis and Knowledge Discovery, 2021, 5(11): 145-152.
triazolam-induced brief episodes of secondary mania in a depressed patient. large doses of triazolam repeatedly induced brief episodes of mania in a depressed elderly woman. features of organic mental disorder (delirium) were not present. manic excitement was coincident with the duration of action of triazolam. the possible contribution of the triazolo group to changes in affective status is discussed
Table 2 Positive sample from BC5CDR
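A positive sample such as the one in Table 2 is a chemical-disease pair annotated with a CID relation, while a negative sample (Table 3) is presumably a co-occurring pair without one. The source does not say how candidate pairs were generated. A common approach in CDR work, sketched here purely as an assumption, is to pair every chemical mention in a document with every disease mention and label each pair by membership in the gold CID set. The mention lists, the assumed gold pair (triazolam, mania), the document ID `demo`, the `<doc_id>_<n>` ID scheme and the function name `build_candidates` are all hypothetical; the numbering is simple enumeration order and is not meant to reproduce the IDs in the tables.

```python
from itertools import product


def build_candidates(doc_id, chemicals, diseases, gold_pairs):
    """Pair every chemical with every disease found in one document.

    A pair is labelled 1 (positive) if it appears in `gold_pairs`,
    otherwise 0 (negative). IDs follow a hypothetical <doc_id>_<n> scheme.
    """
    samples = []
    for n, (chem, dis) in enumerate(product(chemicals, diseases), start=1):
        samples.append({
            "id": f"{doc_id}_{n}",
            "e1": chem,
            "e2": dis,
            "label": 1 if (chem, dis) in gold_pairs else 0,
        })
    return samples


# Hypothetical mentions and gold annotation, for illustration only.
chemicals = ["triazolam", "triazolo"]
diseases = ["mania", "depressed"]
gold = {("triazolam", "mania")}

for s in build_candidates("demo", chemicals, diseases, gold):
    print(s["id"], s["e1"], s["e2"], s["label"])
```

Under these assumptions the document yields four candidates, of which only the gold pair is positive; the other three, including (triazolo, depressed), become negatives. This cross-product construction tends to produce many more negatives than positives.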
ID | e1 | e2 | Sentence
3693336_2 | triazolo | depressed | triazolam-induced brief episodes of secondary mania in a depressed patient. large doses of triazolam repeatedly induced brief episodes of mania in a depressed elderly woman. features of organic mental disorder (delirium) were not present. manic excitement was coincident with the duration of action of triazolam. the possible contribution of the triazolo group to changes in affective status is discussed
Table 3 Negative sample from BC5CDR
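The abstract states that entity pairs and sentence pairs are fed to the model. Assuming the standard BERT sentence-pair layout [8], with the entity pair as segment A and the text as segment B, the Table 3 row could become the input sketched below. The exact template the authors used is not shown in the source, so this layout is an assumption. Whitespace splitting stands in for PubMedBERT's WordPiece tokenizer, `max_len=16` is a toy length, and `build_pair_input` is a hypothetical name. With the Hugging Face transformers library, passing the two segments to a PubMedBERT tokenizer with `truncation="only_second"` produces the same layout using real subword tokens.

```python
def build_pair_input(e1, e2, text, max_len=32):
    """Build a BERT-style sentence-pair input: [CLS] e1 e2 [SEP] text [SEP].

    Whitespace splitting stands in for PubMedBERT's WordPiece tokenizer.
    Segment A (the entity pair) gets token_type_id 0, segment B (the text) 1.
    The text is truncated so the sequence fits in `max_len`, then padded.
    """
    seg_a = [e1, e2]
    budget = max(max_len - len(seg_a) - 3, 0)  # room left after [CLS] and two [SEP]
    seg_b = text.split()[:budget]
    tokens = ["[CLS]"] + seg_a + ["[SEP]"] + seg_b + ["[SEP]"]
    token_type_ids = [0] * (len(seg_a) + 2) + [1] * (len(seg_b) + 1)
    n_real = len(tokens)
    pad = max(max_len - n_real, 0)
    return (tokens + ["[PAD]"] * pad,
            token_type_ids + [0] * pad,
            [1] * n_real + [0] * pad)  # attention mask: 1 = real token, 0 = padding


tokens, types, mask = build_pair_input(
    "triazolo", "depressed",
    "triazolam-induced brief episodes of secondary mania in a depressed patient.",
    max_len=16)
print(tokens)
print(types)
print(mask)
```

Truncating only segment B keeps the entity pair intact even when the abstract is longer than the encoder's maximum sequence length.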
Method | Precision | Recall | F1
Best Approach of BioCreative V CDR[24] | 55.6% | 58.4% | 57.0%
LSTM-based[20] | 64.9% | 49.3% | 56.0%
CNN-based[15] | 60.9% | 59.5% | 60.2%
BERT Original | 70.1% | 67.7% | 65.6%
BERT+Text-CNN | 71.2% | 68.3% | 69.7%
ClinicalBERT | 70.5% | 69.3% | 69.8%
ClinicalBERT+Text-CNN | 70.9% | 70.0% | 70.4%
BioBERT | 72.0% | 70.3% | 71.1%
BioBERT+Text-CNN | 73.1% | 72.0% | 72.5%
PubMedBERT | 75.2% | 69.1% | 72.0%
PubMedBERT+Text-CNN | 78.3% | 73.5% | 75.8%
Table 4 Comparison of model results on the BC5CDR corpus
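F1 is the harmonic mean of precision and recall, F1 = 2PR/(P+R), so the F1 column of Table 4 can be checked against the other two columns. For every row except BERT Original the formula matches the reported F1 to within rounding; for BERT Original it gives about 68.9% rather than the listed 65.6%. The percentage-point lead of PubMedBERT+Text-CNN over the strongest other row in each column is 3.1 (precision), 1.5 (recall) and 3.3 (F1). The sketch below performs both checks; the table values are copied verbatim, while the 0.15-point tolerance and the function names are illustrative choices.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in %, copied from Table 4
TABLE4 = {
    "Best Approach of BioCreative V CDR": (55.6, 58.4, 57.0),
    "LSTM-based": (64.9, 49.3, 56.0),
    "CNN-based": (60.9, 59.5, 60.2),
    "BERT Original": (70.1, 67.7, 65.6),
    "BERT+Text-CNN": (71.2, 68.3, 69.7),
    "ClinicalBERT": (70.5, 69.3, 69.8),
    "ClinicalBERT+Text-CNN": (70.9, 70.0, 70.4),
    "BioBERT": (72.0, 70.3, 71.1),
    "BioBERT+Text-CNN": (73.1, 72.0, 72.5),
    "PubMedBERT": (75.2, 69.1, 72.0),
    "PubMedBERT+Text-CNN": (78.3, 73.5, 75.8),
}


def inconsistent_rows(table, tol=0.15):
    """Rows whose reported F1 differs from 2PR/(P+R) by more than `tol` points
    (a small tolerance absorbs rounding of the published P and R)."""
    return [name for name, (p, r, f) in table.items()
            if abs(f1_score(p, r) - f) > tol]


def margins_over_best_other(table, ours="PubMedBERT+Text-CNN"):
    """Percentage-point lead of `ours` over the strongest other row,
    computed separately for precision, recall and F1."""
    others = [row for name, row in table.items() if name != ours]
    return tuple(round(table[ours][i] - max(row[i] for row in others), 1)
                 for i in range(3))


print(inconsistent_rows(TABLE4))        # ['BERT Original']
print(margins_over_best_other(TABLE4))  # (3.1, 1.5, 3.3)
```

Note that the best precision baseline (PubMedBERT) and the best recall and F1 baseline (BioBERT+Text-CNN) are different rows, which is why the margins are computed per column.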
Method | Precision | Recall | F1
PubMed Embedding+Text-CNN | 62.7% | 56.3% | 59.3%
GloVe Embedding+Text-CNN | 60.4% | 54.6% | 57.4%
PubMedBERT+Text-CNN | 78.3% | 73.5% | 75.8%
Table 5 Comparison of pre-trained models and word embedding models
[1] Dogan R I, Murray G C, Névéol A, et al. Understanding PubMed® User Search Behavior Through Log Analysis[J/OL]. Database, 2009. https://doi.org/10.1093/database/bap018.
[2] Lu Z Y. PubMed and Beyond: A Survey of Web Tools for Searching Biomedical Literature[J/OL]. Database, 2011. https://doi.org/10.1093/database/baq036.
[3] Kang N, Singh B, Bui C, et al. Knowledge-based Extraction of Adverse Drug Events from Biomedical Text[J]. BMC Bioinformatics, 2014, 15(1): Article No. 64.
[4] Davis A P, Grondin C J, Johnson R J, et al. The Comparative Toxicogenomics Database: Update 2017[J]. Nucleic Acids Research, 2017, 45:D972-D978. DOI: 10.1093/nar/gkw838.
Zhou D Y, Zhong D Y, He Y L. Biomedical Relation Extraction: From Binary to Complex[J]. Computational and Mathematical Methods in Medicine, 2014: Article ID 298473.
[7] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[8] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 1:4171-4186.
[9] Gu Y, Tinn R, Cheng H, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing[OL]. arXiv Preprint, arXiv: 2007.15779.
[10] Abacha A B, Zweigenbaum P. Automatic Extraction of Semantic Relations Between Medical Entities: Application to the Treatment Relation[C]// Proceedings of the 4th International Symposium for Semantic Mining in Biomedicine, Cambridge, United Kingdom. 2010.
[11] Li H, Tang B, Chen Q, et al. HITSZ_CDR: An End-to-End Chemical and Disease Relation Extraction System for BioCreative V[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw077.
[12] Peng Y F, Wei C H, Lu Z Y. Improving Chemical Disease Relation Extraction with Rich Features and Weakly Labeled Data[J]. Journal of Cheminformatics, 2016, 8: Article No. 53.
[13] Giles C B, Wren J D. Large-scale Directional Relationship Extraction and Resolution[J]. BMC Bioinformatics, 2008, 9: Article No. S11.
[14] Alam F, Corazza A, Lavelli A, et al. A Knowledge-poor Approach to Chemical-Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw071.
[15] Gu J H, Sun F Q, Qian L H, et al. Chemical-induced Disease Relation Extraction via Convolutional Neural Network[J/OL]. Database, 2017. https://doi.org/10.1093/database/bax024.
[16] Zhou H W, Lang C K, Liu Z, et al. Knowledge-guided Convolutional Networks for Chemical-Disease Relation Extraction[J]. BMC Bioinformatics, 2019, 20: Article No. 260.
[17] Gu J H, Sun F Q, Qian L H, et al. Chemical-induced Disease Relation Extraction via Attention-based Distant Supervision[J]. BMC Bioinformatics, 2019, 20: Article No. 403.
[18] Li Z H, Yang Z H, Xiang Y, et al. Exploiting Sequence Labeling Framework to Extract Document-level Relations from Biomedical Texts[J]. BMC Bioinformatics, 2020, 21. DOI: 10.1186/s12859-020-3457-2.
[19] Mitra S, Saha S, Hasanuzzaman M. A Multi-view Deep Neural Network Model for Chemical-Disease Relation Extraction from Imbalanced Datasets[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24(11):3315-3325. DOI: 10.1109/JBHI.6221020.
[20] Zhou H W, Deng H J, Chen L, et al. Exploiting Syntactic and Semantics Information for Chemical-Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw048.
[21] Lee J, Yoon W, Kim S, et al. BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining[J]. Bioinformatics, 2020, 36(4):1234-1240.
[22] Alsentzer E, Murphy J R, Boag W, et al. Publicly Available Clinical BERT Embeddings[OL]. arXiv Preprint, arXiv: 1904.03323.
[23] Li J, Sun Y P, Johnson R J, et al. BioCreative V CDR Task Corpus: A Resource for Chemical Disease Relation Extraction[J/OL]. Database, 2016. https://doi.org/10.1093/database/baw068.
[24] Bowman S R, Gauthier J, Rastogi A, et al. A Fast Unified Model for Parsing and Sentence Understanding[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, 3:1466-1477.
(Liao Kaiji, Huang Qiongying, Xi Yunjiang. Research on the Construction of Knowledge Graph of Q & A Text in Online Medical Community[J]. Information Science, 2021, 39(3):51-59, 75.)
(Huang Mengxing, Li Menglong, Han Huirui. Research on Entity Recognition and Knowledge Graph Construction Based on Electronic Medical Record[J]. Application Research of Computers, 2019, 36(12):3735-3739.)
(Li Dongqi, Li Mingxin, Zhang Xiao. Research on Open Domain Question Answering Based on Knowledge Base[J]. Computer Knowledge and Technology, 2020, 16(36):179-181.)
(Gao Man, Cui Lei. Steps and Tools for Drug Repositioning Using Text Mining[J]. Chinese Journal of Medical Library and Information Science, 2017, 26(3):6-9.)
(Sui Mingshuang, Cui Lei. Using Text Mining to Find the Side Effects of Drugs[J]. Chinese Journal of Medical Library and Information Science, 2015, 24(11):67-72.)
(Wang Xiuyan, Cui Lei. A Hybrid Method to Extract Semantic Relation of Biomedical Entities[J]. New Technology of Library and Information Service, 2013(3):77-82.)
(Wang Kejian, Shi Leming, He Lin, et al. New Opportunities for Drug Research and Development in China: Systematic Drug Repositioning Based on Big Data of Medicine[J]. Science Bulletin, 2014, 59(18):1790-1796.)