Entity Recognition and Labeling for Medical Literature Based on Neural Network
Zhao Ruijie, Tong Xinyu, Liu Xiaohua, Lu Yonghe
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract [Objective] This paper proposes a new entity recognition model to discover new knowledge more effectively and make better use of medical papers. [Methods] We constructed a pharmaceutical entity recognition model based on Attention-BiLSTM-CRF and evaluated it on two public datasets: the GENIA Term Annotation Task and BioCreative II Gene Mention Tagging. We also used the model to annotate abstracts of biomedical scientific papers. [Results] On the two datasets, the model achieved F1 scores of 81.57% and 84.23% and accuracies of 92.51% and 97.85%, respectively, outperforming the baseline models. Its advantage is more pronounced on extremely imbalanced data. [Limitations] The experiments use a limited volume of data, and the entity labeling is applied to a relatively narrow range of scenarios. [Conclusions] The proposed model improves entity recognition performance and supports the mining of new medical knowledge.
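The gap between the reported accuracies (92.51%, 97.85%) and F1 scores (81.57%, 84.23%) is typical of imbalanced tagging data, where most tokens lie outside any entity. The abstract does not define either metric, so the sketch below assumes token-level accuracy and exact-match entity-level F1 over BIO tags, a common convention in NER evaluation. The function names (`bio_to_spans`, `token_accuracy`, `entity_f1`) and the toy tag sequence are hypothetical illustrations, not taken from the paper or from the GENIA or BioCreative data.

```python
from typing import List, Set, Tuple

# An entity span: (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (B-X starts, I-X continues).

    A stray I-X with no matching open entity is treated leniently as B-X.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.add((label, start, i))  # close the currently open entity
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
        # an I- tag matching the open label simply extends the current span
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag (O tags included)."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction counts only if type and both boundaries match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy sentence of 20 tokens with two gold entities; everything else is "O"
gold = ["O"] * 20
gold[2:4] = ["B-protein", "I-protein"]
gold[10:13] = ["B-DNA", "I-DNA", "I-DNA"]
pred = list(gold)
pred[12] = "O"  # boundary error: the DNA entity is truncated by one token

print(token_accuracy(gold, pred))  # 0.95
print(entity_f1(gold, pred))       # 0.5
```

A single boundary error costs only one token out of twenty (95% accuracy) but turns one of the two entities into both a false positive and a false negative, halving F1. On data dominated by `O` tags, accuracy alone therefore tends to overstate tagging quality, which is why entity-level F1 is usually the headline metric.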
Received: 15 December 2021
Published: 26 October 2022
Fund: Science and Technology Program of Guangzhou, China (202002020036)
Corresponding Author:
Lu Yonghe, ORCID: 0000-0002-7758-9365
E-mail: luyonghe@mail.sysu.edu.cn