Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement
Zhao Yiming1,2,3, Pan Pei2,3,4, Mao Jin1,2
1 Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2 School of Information Management, Wuhan University, Wuhan 430072, China
3 Big Data Institute, Wuhan University, Wuhan 430072, China
4 National Demonstration Center for Experimental Library and Information Science Education, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a recognition model for the intensity of medical query intentions based on task knowledge fusion and text data enhancement, aiming to improve the representation of query word vectors and to expand the labeled dataset. [Methods] First, we used the SimBERT model to augment the small task dataset. Then, we incrementally pre-trained the BERT model on a medical query text corpus to obtain the MQ-BERT (Medical-Query BERT) model with task knowledge. Finally, we introduced Bi-LSTM and other models to compare classification performance before and after text data enhancement. [Results] The F-Score of our MQ-BERT model reached 92.22%, superior to the existing model from the Alibaba team on the same task dataset (F-Score=87.5%). With text data enhancement, the classification performance of our model improved further (F-Score=95.34%), 7.84 percentage points higher than the MC-BERT baseline. [Limitations] The data selection for the incremental pre-training process could be further optimized. [Conclusions] Task knowledge fusion and text data enhancement can effectively improve the recognition accuracy of the intensity of medical query intentions, which benefits the development of medical information retrieval systems.
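The comparisons above are reported as F-Score. As a point of reference, the following is a minimal sketch of how per-class precision, recall, and F1 are computed for intent-intensity labels; the label names and example data are hypothetical illustrations, not drawn from the paper's dataset:

```python
def precision_recall_f1(gold, pred, positive):
    """Per-class precision, recall, and F1 for one intent-intensity label."""
    # Count true positives, false positives, and false negatives
    # for the chosen positive class.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold vs. predicted intensity labels for five medical queries.
gold = ["strong", "strong", "weak", "strong", "weak"]
pred = ["strong", "weak", "weak", "strong", "weak"]
p, r, f = precision_recall_f1(gold, pred, positive="strong")
```

Macro- or weighted-averaged variants over all intensity classes follow the same per-class counts.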
Zhao Yiming, Pan Pei, Mao Jin. Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement. Data Analysis and Knowledge Discovery, 2023, 7(2): 38-47.
[1]
Huanqiu.com. Baidu Big Data: Health Search Demand Increased by 207%, AI Helped Accelerate the Implementation of Health Science[EB/OL]. [2022-05-18]. https://tech.huanqiu.com/article/44PFzGyVmgO.
[2]
Zhang Lu, Peng Xueying, Chen Jing. Intentions of Health Information Seeking in Public Health Emergency[J]. Information Science, 2022, 40(10): 51-59.
[3]
Broder A. A Taxonomy of Web Search[J]. ACM SIGIR Forum, 2002, 36(2): 3-10.
doi: 10.1145/792550.792552
[4]
Sushmita S, Piwowarski B, Lalmas M. Dynamics of Genre and Domain Intents[C]// Proceedings of the 6th Asia Information Retrieval Societies Conference on Information Retrieval Technology. Berlin: Springer, 2010: 399-409.
[5]
Gui Sisi, Lu Wei, Zhang Xiaojuan. Temporal Intent Classification with Query Expression Feature[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 66-75.
[6]
Zhang N Y, Jia Q H, Yin K P, et al. Conceptualized Representation Learning for Chinese Biomedical Text Mining[OL]. arXiv Preprint, arXiv: 2008.10813.
[7]
Segev E, Ahituv N. Popular Searches in Google and Yahoo!: A “Digital Divide” in Information Uses?[J]. The Information Society, 2010, 26(1): 17-37.
doi: 10.1080/01972240903423477
[8]
Kanhabua N, Nørvåg K. Determining Time of Queries for Re-Ranking Search Results[C]// Proceedings of the 14th International Conference on Theory and Practice of Digital Libraries. Berlin: Springer, 2010: 261-272.
[9]
Ross N C M, Wolfram D. End User Searching on the Internet: An Analysis of Term Pair Topics Submitted to the Excite Search Engine[J]. Journal of the American Society for Information Science, 2000, 51(10): 949-958.
[10]
Lu Wei, Zhou Hongxia, Zhang Xiaojuan. Review of Research on Query Intent[J]. Journal of Library Science in China, 2013, 39(1): 100-111.
[11]
Yang Z H, Gong J Y, Liu C Y, et al. iExplore: Accelerating Exploratory Data Analysis by Predicting User Intention[C]// Proceedings of the 2018 International Conference on Database Systems for Advanced Applications. Cham: Springer, 2018: 149-165.
[12]
Chen T, Yin H Z, Chen H X, et al. AIR: Attentional Intention-Aware Recommender Systems[C]// Proceedings of the 35th International Conference on Data Engineering. IEEE, 2019: 304-315.
[13]
Wang Ruixue, Fang Jing, Gui Sisi, et al. Deep Learning-Based Algorithm for Academic Query Intent Classification[J]. Library and Information Service, 2021, 65(3): 93-99.
doi: 10.13266/j.issn.0252-3116.2021.03.012
[14]
Figueroa A, Atkinson J. Ensembling Classifiers for Detecting User Intentions Behind Web Queries[J]. IEEE Internet Computing, 2016, 20(2): 8-16.
[15]
He C G, Chen S B, Huang S L, et al. Using Convolutional Neural Network with BERT for Intent Determination[C]// Proceedings of the 2019 International Conference on Asian Language Processing. IEEE, 2019: 65-70.
[16]
Qiu L R, Chen Y D, Jia H R, et al. Query Intent Recognition Based on Multi-Class Features[J]. IEEE Access, 2018, 6: 52195-52204.
doi: 10.1109/ACCESS.2018.2869585
[17]
Suresh S, Guru Rajan T S, Gopinath V. VoC-DL: Revisiting Voice of Customer Using Deep Learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018: 7843-7848.
[18]
Zhang J H, Ye Y X, Zhang Y, et al. Multi-Point Semantic Representation for Intent Classification[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence. 2020: 9531-9538.
[19]
Cai R C, Zhu B J, Ji L, et al. An CNN-LSTM Attention Approach to Understanding User Query Intent from Online Health Communities[C]// Proceedings of the 2017 IEEE International Conference on Data Mining Workshops. IEEE, 2017: 430-437.
[20]
Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[21]
Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st Conference on Neural Information Processing Systems. 2017: 5999-6009.
[22]
Sun C, Qiu X, Xu Y, et al. How to Fine-Tune BERT for Text Classification?[C]// Proceedings of the 2019 China National Conference on Chinese Computational Linguistics. Cham: Springer, 2019: 194-206.
[23]
Peng Y F, Yan S K, Lu Z Y. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets[C]// Proceedings of the 18th SIGBioMed Workshop on Biomedical Natural Language Processing. 2019: 58-65.
[24]
Huang K, Altosaar J, Ranganath R. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission[OL]. arXiv Preprint, arXiv: 1904.05342.
[25]
Beltagy I, Lo K, Cohan A. SciBERT: A Pretrained Language Model for Scientific Text[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 3615-3620.
[26]
Lee J, Yoon W, Kim S, et al. BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
doi: 10.1093/bioinformatics/btz682
[27]
Gu Y, Tinn R, Cheng H, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing[J]. ACM Transactions on Computing for Healthcare, 2022, 3(1): 1-23.
[28]
He Y, Zhu Z, Zhang Y, et al. Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020: 4604-4614.
[29]
Wang Dongbo, Liu Chang, Zhu Zihe, et al. Construction and Application of Pre-Trained Models of Siku Quanshu in Orientation to Digital Humanities[J]. Library Tribune, 2022, 42(6): 31-43.
[30]
Zhang Wei, Wang Hao, Chen Yuetong, et al. Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation[J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 167-183.
[31]
Wei J, Zou K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 6382-6388.
[32]
Wu X, Lv S, Zang L, et al. Conditional BERT Contextual Augmentation[C]// Proceedings of the 19th International Conference on Computational Science. Cham: Springer, 2019: 84-95.
[33]
Kobayashi S. Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018: 452-457.
[34]
Wang Y F, Xu C, Sun Q F, et al. PromDA: Prompt-Based Data Augmentation for Low-Resource NLU Tasks[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 4242-4255.
[35]
Shi Guoliang, Chen Yuqi. A Comparative Study on the Integration of Text Enhanced and Pre-Trained Language Models in the Classification of Internet Political Messages[J]. Library and Information Service, 2021, 65(13): 96-107.
doi: 10.13266/j.issn.0252-3116.2021.13.010
[36]
Su Jianlin. Fish and Bear's Paw: SimBERT Model for Fusion Retrieval and Generation[EB/OL]. [2022-05-18]. https://spaces.ac.cn/archives/7427.
[37]
Conneau A, Lample G. Cross-Lingual Language Model Pretraining[C]// Proceedings of the 33rd Annual Conference on Neural Information Processing Systems. 2019: 7057-7067.
[38]
Joshi M, Chen D Q, Liu Y H, et al. SpanBERT: Improving Pre-Training by Representing and Predicting Spans[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 64-77.
doi: 10.1162/tacl_a_00300
[39]
Liu Y, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[40]
Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging[OL]. arXiv Preprint, arXiv: 1508.01991.
[41]
Jiao Z, Sun S, Sun K. Chinese Lexical Analysis with Deep Bi-GRU-CRF Network[OL]. arXiv Preprint, arXiv: 1807.01882.
[42]
Li X, Wang Y Y, Acero A. Learning Query Intent from Regularized Click Graphs[C]// Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2008: 339-346.