Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (2): 38-47    DOI: 10.11925/infotech.2096-3467.2022.0919
Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement
Zhao Yiming1,2,3, Pan Pei2,3,4, Mao Jin1,2
1Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2School of Information Management, Wuhan University, Wuhan 430072, China
3Big Data Institute, Wuhan University, Wuhan 430072, China
4National Demonstration Center for Experimental Library and Information Science Education, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a model for recognizing the intensity of medical query intentions based on task knowledge fusion and text data enhancement, aiming to improve the vector representation of query words and to expand the labeled data set. [Methods] First, we used the SimBERT model to enhance the text data of the small task data set. Then, we incrementally pre-trained the BERT model on a medical query text corpus to obtain the MQ-BERT (Medical-Query BERT) model, which incorporates task knowledge. Finally, we introduced Bi-LSTM and other models to compare classification performance before and after text data enhancement. [Results] The F-Score of the new MQ-BERT model reached 92.22%, which is superior to that of the existing models by the Alibaba team on the same task data set (F-Score=87.5%). With text data enhancement, the classification performance of the new model improved further (F-Score=95.34%), which is 7.84 percentage points higher than that of MC-BERT. [Limitations] The data selection in the incremental pre-training process could be further optimized. [Conclusions] Task knowledge fusion and text data enhancement can effectively improve the accuracy of recognizing the intensity of medical query intentions, which benefits the development of medical information retrieval systems.

Key words: Medical Information Query; Intention Intensity Recognition; Text Data Enhancement; Task Knowledge Fusion; BERT Model
Received: 31 August 2022      Published: 28 March 2023
Chinese Library Classification (ZTFLH): TP393; G250
Fund: National Natural Science Foundation of China (71874130); National Natural Science Foundation of China (72274146); Ministry of Education Foundation on Humanities and Social Sciences (18YJC870026)
Corresponding Author: Zhao Yiming, ORCID: 0000-0001-8182-456X

Zhao Yiming, Pan Pei, Mao Jin. Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement. Data Analysis and Knowledge Discovery, 2023, 7(2): 38-47.

Technology Roadmap
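Category            Count    Example query
Strong intention    1,252    眼袋按摩能消除吗 (Can massage get rid of eye bags?)
Weak intention      579      觉得有焦虑症 (I feel like I have an anxiety disorder)
No intention        59       唾沫喷到脸上 (Saliva got sprayed on my face)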
Example of cMedIC Dataset
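The counts in the table above are strongly skewed toward the strong-intention class, which is part of the motivation for enhancing the text data. A minimal sketch of that skew follows, assuming the three counts cover the whole labeled set; the English label names are translations introduced here, not identifiers from the dataset.

```python
# Class counts copied from the cMedIC table above.
# Assumption: these three counts cover the whole labeled set.
counts = {
    "strong intention": 1252,
    "weak intention": 579,
    "no intention": 59,
}

total = sum(counts.values())

# Share of each class in the labeled data.
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} queries ({share:.1%})")

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

The smallest class accounts for only a few percent of the queries, so a classifier trained on the raw data sees very few no-intention examples.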
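Category            Original sentence    Sentence after SimBERT enhancement
Strong intention    正常的孩子一般什么时候说话 (When do normal children usually start talking?)    孩子什么时候才能说话 (When will a child be able to talk?)
Weak intention      晚上睡不踏实老做梦 (Sleeping restlessly at night and always dreaming)    睡觉不踏实总是做梦 (Restless sleep, always dreaming)
No intention        艾滋病长效药降价免费 (Long-acting AIDS drug price cut, free)    长效药降价了艾滋病 (Long-acting drug price cut, AIDS)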
Enhanced Data of SimBERT
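The table above pairs each original query with a generated paraphrase that keeps the same intention label. The sketch below illustrates that label-preserving expansion in general terms. It is not the authors' pipeline: `augment`, `toy_paraphrase`, and `TABLE_PAIRS` are hypothetical names, and the paraphraser is a stand-in that only replays the three pairs from the table instead of calling SimBERT.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (query text, intention label)


def augment(samples: List[Sample],
            paraphrase: Callable[[str], List[str]]) -> List[Sample]:
    """Return the originals plus paraphrases that inherit the source label."""
    seen = {text for text, _ in samples}
    augmented = list(samples)
    for text, label in samples:
        for candidate in paraphrase(text):
            # Skip empty outputs and exact duplicates of existing queries.
            if candidate and candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented


# Hypothetical stand-in for a SimBERT-style generator: it only replays the
# three pairs shown in the table above.
TABLE_PAIRS = {
    "正常的孩子一般什么时候说话": ["孩子什么时候才能说话"],
    "晚上睡不踏实老做梦": ["睡觉不踏实总是做梦"],
    "艾滋病长效药降价免费": ["长效药降价了艾滋病"],
}


def toy_paraphrase(text: str) -> List[str]:
    return TABLE_PAIRS.get(text, [])


original = [
    ("正常的孩子一般什么时候说话", "strong intention"),
    ("晚上睡不踏实老做梦", "weak intention"),
    ("艾滋病长效药降价免费", "no intention"),
]
enhanced = augment(original, toy_paraphrase)
```

Passing the generator in as a callable keeps the expansion logic independent of the model used, so a real paraphrase model could replace the stub without changing `augment`.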
Fine-tuning / classification model    batch_size    learning rate    epochs    dropout
Finetune                              64            1e-5             10        /
Bi-LSTM                               64            1e-5             10        0.2
Bi-GRU                                32            1e-5             15        0.2
Bi-LSTM+ATT                           32            1e-5             15        0.1
Bi-GRU+ATT                            64            1e-5             15        0.2
Parameter Settings Before Text Data Enhancement
Fine-tuning / classification model    batch_size    learning rate    epochs    dropout
Finetune                              64            1e-5             20        /
Bi-LSTM                               64            1e-5             10        0.2
Bi-GRU                                64            1e-5             15        0.2
Bi-LSTM+ATT                           32            2e-5             15        0.1
Bi-GRU+ATT                            64            2e-5             10        0.1
Parameter Settings After Text Data Enhancement
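The two parameter tables differ in only a few cells. The sketch below encodes both tables and lists which settings changed for each model. The values are copied from the tables; the `Hyperparams` class, the `changed` helper, and the use of `None` for the "/" dropout entry are illustrative choices introduced here, not part of the paper.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Hyperparams:
    batch_size: int
    learning_rate: float
    epochs: int
    dropout: Optional[float]  # None where the table shows "/"


# Settings before text data enhancement (first parameter table).
BEFORE = {
    "Finetune": Hyperparams(64, 1e-5, 10, None),
    "Bi-LSTM": Hyperparams(64, 1e-5, 10, 0.2),
    "Bi-GRU": Hyperparams(32, 1e-5, 15, 0.2),
    "Bi-LSTM+ATT": Hyperparams(32, 1e-5, 15, 0.1),
    "Bi-GRU+ATT": Hyperparams(64, 1e-5, 15, 0.2),
}

# Settings after text data enhancement (second parameter table).
AFTER = {
    "Finetune": Hyperparams(64, 1e-5, 20, None),
    "Bi-LSTM": Hyperparams(64, 1e-5, 10, 0.2),
    "Bi-GRU": Hyperparams(64, 1e-5, 15, 0.2),
    "Bi-LSTM+ATT": Hyperparams(32, 2e-5, 15, 0.1),
    "Bi-GRU+ATT": Hyperparams(64, 2e-5, 10, 0.1),
}


def changed(before: Hyperparams, after: Hyperparams) -> Dict[str, Tuple]:
    """Map each differing field name to its (before, after) values."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    }


changes = {model: changed(BEFORE[model], AFTER[model]) for model in BEFORE}

for model, diff in changes.items():
    print(model, diff if diff else "unchanged")
```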
Model               Text data enhancement    Precision    Recall    F-Score
MQ-BERT Finetune    Before                   92.25%       92.19%    92.22%
                    After                    93.82%       93.75%    93.78%
Bi-LSTM             Before                   90.77%       93.75%    92.24%
                    After                    95.38%       95.31%    95.34%
Bi-GRU              Before                   89.23%       92.19%    90.69%
                    After                    89.23%       92.19%    90.69%
Bi-LSTM+ATT         Before                   91.03%       93.75%    92.37%
                    After                    93.91%       93.75%    93.83%
Bi-GRU+ATT          Before                   90.58%       90.62%    90.60%
                    After                    93.91%       93.75%    93.83%
Test Results of Various Models Before and After Text Data Enhancement
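The F-Score column in the results table is consistent with the harmonic mean of the Precision and Recall columns (the usual F1 definition), although the page does not state the formula. The sketch below recomputes it for every row and also reproduces the 7.84-point gap between the best enhanced result (95.34%) and the 87.5% baseline quoted in the abstract. The function and variable names are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); inputs in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Score) rows copied from the results table.
ROWS = [
    (92.25, 92.19, 92.22),
    (93.82, 93.75, 93.78),
    (90.77, 93.75, 92.24),
    (95.38, 95.31, 95.34),
    (89.23, 92.19, 90.69),
    (89.23, 92.19, 90.69),
    (91.03, 93.75, 92.37),
    (93.91, 93.75, 93.83),
    (90.58, 90.62, 90.60),
    (93.91, 93.75, 93.83),
]

# Rows whose recomputed F1, rounded to two decimals, differs from the table.
mismatches = [
    row for row in ROWS
    if abs(round(f_score(row[0], row[1]), 2) - row[2]) > 1e-9
]

# Absolute gain over the 87.5% baseline, in percentage points.
gain_points = round(95.34 - 87.5, 2)

print("rows that disagree with F1:", mismatches)
print("gain over baseline (percentage points):", gain_points)
```

Note that the gap is an absolute difference in percentage points; relative to the 87.5% baseline it corresponds to an improvement of roughly 9%.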
