Normalizing Chinese Disease Names with Multi-feature Fusion
Han Pu¹,², Zhang Zhanpeng¹, Zhang Mingtao¹, Gu Liang¹
¹School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210023, China
²Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

Abstract [Objective] This paper proposes a normalization model for Chinese disease names based on multi-feature fusion, aiming to address the problem of multiple alternative names for the same disease in online health communities. [Methods] First, we constructed a normalization dataset of Chinese disease names used in online health communities. Second, we conducted experiments in both Chinese and English with LSTM, GRU and CNN models. Third, we generated external semantic feature vectors with Word2vec and GloVe. Finally, we developed MFCF-CNN, a normalization model for Chinese disease names based on multi-feature fusion and a self-attention mechanism. [Results] We evaluated the proposed model on the constructed dataset. The accuracy of the MFCF-CNN model reached 85.48%, 8.84% higher than that of the basic CNN model, and the model made better use of both global and local semantic features. [Limitations] The experimental dataset needs to be expanded. [Conclusions] The proposed model promotes the normalization of Chinese disease names, which benefits medical knowledge graph construction and Chinese natural language understanding.
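
The abstract names the ingredients of MFCF-CNN (external Word2vec and GloVe vectors, a CNN, and self-attention) but not their exact wiring. As a rough illustration only, the sketch below assumes that normalization is treated as classification over a fixed set of standard disease names, that "multi-feature fusion" means concatenating each character's vectors from the two embedding tables, and that local features come from a CNN with max pooling while global features come from mean-pooled self-attention. The embedding tables and all weights are random placeholders, and every helper name (`make_table`, `fuse`, `conv_max_pool`, `predict`, etc.) is hypothetical; this is not the authors' implementation.

```python
import math
import random


def make_table(vocab, dim, seed):
    """Hypothetical stand-in for a pre-trained character embedding table (random values)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in sorted(vocab)}


def fuse(chars, tables):
    """Feature fusion, assumed here to be per-character concatenation across tables."""
    fused = []
    for c in chars:
        vec = []
        for table in tables:
            dim = len(next(iter(table.values())))
            vec.extend(table.get(c, [0.0] * dim))  # unknown character -> zero vector
        fused.append(vec)
    return fused


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq (no learned projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def conv_max_pool(seq, filters, width):
    """1-D convolution over characters, ReLU, then max-over-time pooling per filter."""
    d = len(seq[0])
    if len(seq) < width:  # zero-pad short names so at least one window exists
        seq = seq + [[0.0] * d] * (width - len(seq))
    windows = [sum(seq[i:i + width], []) for i in range(len(seq) - width + 1)]
    return [max(max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows)
            for f in filters]


def predict(chars, tables, filters, width, out_weights):
    """Return the index of the highest-scoring standard name from fused local + global features."""
    emb = fuse(chars, tables)
    local = conv_max_pool(emb, filters, width)          # local n-gram features
    attended = self_attention(emb)
    global_feat = [sum(col) / len(attended) for col in zip(*attended)]  # mean pooling
    feats = local + global_feat
    scores = [sum(w * x for w, x in zip(row, feats)) for row in out_weights]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    tables = [make_table("糖尿病高血压", 4, seed=1), make_table("糖尿病高血压", 3, seed=2)]
    d, width, n_filters, n_labels = 7, 2, 5, 3  # fused dim = 4 + 3
    rng = random.Random(0)
    filters = [[rng.uniform(-1, 1) for _ in range(width * d)] for _ in range(n_filters)]
    out_weights = [[rng.uniform(-1, 1) for _ in range(n_filters + d)] for _ in range(n_labels)]
    print(predict("高血压", tables, filters, width, out_weights))  # untrained, so arbitrary
```

In a trained model the filters, output weights and any attention projections would be learned jointly from labeled name pairs; the parameter-free attention here only shows how each character's representation is re-weighted by its similarity to every other character in the same name, which is one way to obtain a whole-name (global) feature alongside the CNN's local n-gram features.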

Received: 04 December 2020
Published: 27 May 2021

Fund: This work is supported by the National Social Science Fund of China (17CTQ022) and the Jiangsu Graduate Research and Innovation Program Fund Project (KYCX20_0844).
Corresponding Author: Han Pu
E-mail: hanpu@njupt.edu.cn