Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (3): 97-109    DOI: 10.11925/infotech.2096-3467.2022.0333
Automatic Question-Answering in Chinese Medical Q & A Community with Knowledge Graph
Wang Yinqiu1,2, Yu Wei1,2, Chen Junpeng3
1Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
2School of Information Management, Nanjing University, Nanjing 210023, China
3College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract  

[Objective] This paper proposes a new method for judging the reliability of answers in online Chinese medical question-and-answer (Q&A) communities, aiming to improve the accuracy of answer selection models for medical Q&A with the help of a professional medical knowledge graph. [Methods] Starting from an answer selection model built on a hybrid neural network (fusing an RNN and a multi-scale CNN to capture contextual and local information), we constructed a professional medical knowledge graph and integrated its entity and relation embeddings to enrich the semantic information of the Q&A text. Combined with a Q&A pair attention mechanism, the model computes the final similarity of each pair and selects the candidate answer with the highest score. [Results] We evaluated the proposed model on the cMedQA2.0 dataset. Compared with the hybrid neural network model without knowledge graph entity and relation embeddings, the Top-1 accuracy of answer selection with the new model increased by 2.3% (to 62.2%), demonstrating its effectiveness for answer selection. [Limitations] The medical knowledge graph used is small and includes only the entities common in medical community Q&A. Incomplete relations between medical entities may reduce answer selection effectiveness for niche questions. [Conclusions] Combining a professional Chinese medical knowledge graph with deep learning models can improve answer selection. This helps people seeking medical consultation obtain reliable medical advice in Q&A communities. The model could also help monitor the quality of information in online medical communities and reduce the burden on hospital outpatient services.

Key words: Question-Answering Community; Deep Learning; Answer Selection; Knowledge Graph
Received: 25 March 2022      Published: 13 April 2023
ZTFLH:  TP391  
Fund: National Social Science Fund of China (21BTQ030)
Corresponding author: Yu Wei, ORCID: 0000-0003-1933-5380, E-mail: yuwei@nju.edu.cn

Cite this article:

Wang Yinqiu, Yu Wei, Chen Junpeng. Automatic Question-Answering in Chinese Medical Q & A Community with Knowledge Graph. Data Analysis and Knowledge Discovery, 2023, 7(3): 97-109.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0333     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I3/97

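Source | Q&A text
Platform user | Hello doctor, I have a herniated lumbar disc, pain in my left leg, and bone hyperplasia compressing a nerve. Which medication can relieve the pain?
Platform professional doctor | Hello! Since it has been confirmed that the herniated lumbar disc itself is compressing the nerve, the condition is fairly serious. In the short term you should rest in bed and manage it actively with medication. In general, lumbar traction can be used together with drugs such as Huoxue Zhitong capsules (活血止痛胶囊), Fenbid (芬必得), and mecobalamin, and you should take care to rest during treatment.
Randomly drawn from the platform answer pool | Abdominal pain has many causes, broadly grouped into pain from disorders of the abdominal organs and pain from disorders of organs outside the abdomen; inflammation of the abdominal organs includes gastroenteritis, pancreatitis, cholecystitis, appendicitis, peritonitis, and so on. In your case, an examination at a regular hospital is recommended, followed by symptomatic treatment once the diagnosis is clear.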
Question and Answer Corpus of Chinese Medical Platform
Model Structure of Question and Answer Selection Based on Knowledge Graph
Schematic Diagram of Multi-scale Convolution Kernel CNN
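The idea behind a multi-scale convolution can be sketched in a few lines. This is an illustrative toy, not the authors' network: the kernel widths (1, 2, 3), the all-ones weights, the absence of bias and activation, and the names `conv1d_max` and `multi_scale_features` are all assumptions made here, since the page gives only the figure caption.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conv1d_max(seq, kernel):
    """Slide a kernel over a sequence of token vectors and max-pool over time.

    `kernel` is a list of weight vectors, one per position in the window,
    so len(kernel) is the kernel width.
    """
    width = len(kernel)
    responses = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        responses.append(sum(dot(x, w) for x, w in zip(window, kernel)))
    return max(responses)


def multi_scale_features(seq, kernels):
    """Apply kernels of different widths and concatenate the pooled responses."""
    return [conv1d_max(seq, kernel) for kernel in kernels]


# Toy input: three tokens with 2-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]

# Hypothetical kernels of width 1, 2 and 3 with all-ones weights.
kernels = [[[1.0, 1.0]] * width for width in (1, 2, 3)]

features = multi_scale_features(tokens, kernels)
print(features)  # [4.0, 6.0, 7.0]
```

Narrow kernels respond to single tokens while wider ones respond to short phrases, which is the sense in which several scales capture local information at different granularities.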
Interactive Attention
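Interactive attention between a question and a candidate answer can be illustrated with a minimal sketch. The page does not give the formulation, so the dot-product scoring, the softmax normalization, and the names `softmax` and `attend` below are assumptions for illustration rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys):
    """For each query vector, return the attention-weighted sum of the key vectors."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return attended


# Toy hidden states (made-up numbers): 2 question tokens, 3 answer tokens.
question = [[1.0, 0.0], [0.0, 1.0]]
answer = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

question_given_answer = attend(question, answer)  # question tokens attend to the answer
answer_given_question = attend(answer, question)  # answer tokens attend to the question
```

Running attention in both directions gives each side a representation conditioned on the other, which is what makes the attention interactive rather than one-way.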
cMedQA2.0 Word Frequency Statistics of Question and Answer Corpus (TOP20)
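Dataset | Questions | Answers | Average question length | Average answer length
Training set | 100,000 | 188,490 | 48 | 101
Validation set | 4,000 | 7,527 | 49 | 101
Test set | 4,000 | 7,552 | 49 | 100
Total | 108,000 | 203,569 | 49 | 101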
Overview of cMedQA2.0 Data
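As a quick consistency check on the dataset overview, the split sizes can be summed and compared with the reported totals. The numbers are copied from the table; the dictionary layout and the helper name `column_totals` are assumptions made for this sketch.

```python
# Split sizes copied from the cMedQA2.0 overview table.
splits = {
    "train": {"questions": 100_000, "answers": 188_490},
    "validation": {"questions": 4_000, "answers": 7_527},
    "test": {"questions": 4_000, "answers": 7_552},
}
reported_total = {"questions": 108_000, "answers": 203_569}


def column_totals(rows):
    """Sum each column across all splits."""
    totals = {}
    for row in rows.values():
        for column, value in row.items():
            totals[column] = totals.get(column, 0) + value
    return totals


totals = column_totals(splits)
assert totals == reported_total, "split sizes do not add up to the reported totals"

# Average number of candidate answers per question across the whole corpus.
answers_per_question = totals["answers"] / totals["questions"]
print(f"{answers_per_question:.2f} answers per question")
```

The splits add up to the reported totals, and the corpus holds a little under two answers per question on average.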
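Question ID | Correct answer ID | Wrong answer ID | Question entity ID | Relation ID for the question entity | KG entity ID for the correct answer | KG entity ID for the wrong answer
57765992 | 55336 | 21379 | 9613 | 6 | 4959 | 1734
3504339 | 205252 | 122378 | 13456 | 8 | 17643 | 42782
26039334 | 130857 | 150574 | 4274 | 10 | 44421 | 5648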
Overview of Training Data
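Question_ID | Question content | Entity-relation pair matched to the question
57765992 | Doctor, which type of cervical spondylosis is Jingfukang granules (颈复康颗粒) suitable for? Can I take it for bone hyperplasia or a herniated cervical disc? How effective is it? | Entity: cervical spondylosis (颈椎病); Relation: recommended drug for disease
3504339 | My left leg hurts for no apparent reason, and I cannot straighten it when sleeping or it hurts like a cramp. Is this due to calcium deficiency, or does it mean something is wrong with some part of my body? | Entity: lumbar and leg pain (腰腿痛); Relation: disease symptom
26039334 | What traditional Chinese medicine is good for late-stage lung cancer? The lung cancer has metastasized to the liver. The patient is otherwise fine for now, but there is some pain in the liver area. What is going on, and what medicine would help? | Entity: liver carcinoid (肝脏类癌); Relation: treatment method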
Question Content-ID Comparison
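Answer_ID | Answer content
55336 | Jingfukang granules (颈复康颗粒) promote blood circulation, unblock the collaterals, dispel wind, and relieve pain. They are used for cervical spondylosis caused by wind-damp stasis, with symptoms such as dizziness, neck stiffness, shoulder and back soreness, and arm numbness. They are therefore suitable for nerve-root-type and vertebral-artery-type cervical spondylosis caused by cervical bone hyperplasia. In your case oral use is fine, but the key to treatment is rest and keeping warm; avoid working with your head bent over a desk for long periods.
205252 | Avoid heavy physical labor and strenuous exercise. Do not sleep on a spring mattress. Be sure to avoid holding one posture for long periods while working or studying, especially prolonged bending over; rest for about 10 minutes roughly every hour. Traction, physiotherapy, infrared therapy, and massage can be used. Severe cases may require surgery. The "flying swallow" exercise is recommended: lie prone on the bed and, in turn, raise the legs alternately, raise both legs together, lift the upper body backward, and lift both ends of the body off the bed at once, ten or more repetitions each; keep up 30 minutes of exercise every day.
130857 | Metastasis means the disease is at a late stage, so conservative treatment is the mainstay. If the patient's physical condition allows, multimodal combined treatment works better; chemotherapy combined with traditional Chinese medicine is commonly used to draw on the strengths of each approach, suppress and shrink the tumor, improve efficacy, and reduce side effects. If the patient is weak, treatment with traditional Chinese medicine alone can also be considered; it can likewise achieve good results and effectively control recurrence and metastasis.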
Answer Content-ID Comparison
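Entity_ID | Entity_name | Relation_ID | Relation_name
9613 | cervical spondylosis (颈椎病) | 6 | recommand_drug (recommended drug for disease)
13456 | lumbar and leg pain (腰腿痛) | 8 | has_symptom (disease symptom)
4274 | liver carcinoid (肝脏类癌) | 10 | cure_way (treatment method)
Entity-ID and Relation-ID Mapping of the Knowledge Graph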
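The ID tables can be joined in code to read a training row back into names. The sketch below is an assumption-laden illustration: the seven-field whitespace-separated row format, the `TrainingRecord` class, and the helpers `parse_record` and `describe` are hypothetical; only the IDs and names are taken from the tables on this page.

```python
from dataclasses import dataclass

# Lookup tables limited to the three entities and relations shown on this page.
ENTITY_NAMES = {
    9613: "颈椎病",    # cervical spondylosis
    13456: "腰腿痛",   # lumbar and leg pain
    4274: "肝脏类癌",  # liver carcinoid
}
RELATION_NAMES = {6: "recommand_drug", 8: "has_symptom", 10: "cure_way"}


@dataclass(frozen=True)
class TrainingRecord:
    """One training row: a question, a correct and a wrong answer, plus KG IDs."""
    question_id: int
    correct_answer_id: int
    wrong_answer_id: int
    question_entity_id: int
    relation_id: int
    correct_answer_entity_id: int
    wrong_answer_entity_id: int


def parse_record(line: str) -> TrainingRecord:
    """Parse a whitespace-separated row of seven integer IDs (assumed format)."""
    fields = [int(token) for token in line.split()]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return TrainingRecord(*fields)


def describe(record: TrainingRecord) -> tuple:
    """Return the (entity name, relation name) pair matched to the question."""
    return (
        ENTITY_NAMES[record.question_entity_id],
        RELATION_NAMES[record.relation_id],
    )


record = parse_record("57765992 55336 21379 9613 6 4959 1734")
print(describe(record))  # ('颈椎病', 'recommand_drug')
```

The relation identifiers, including the spelling `recommand_drug`, are kept exactly as they appear in the source table so that lookups stay consistent with the graph.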
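Entity type | Meaning | Number of entities | Example
Disease | Disease | 8,808 | acute lung abscess (急性肺脓肿)
Drug | Drug | 3,828 | brinzolamide eye drops (布林佐胺滴眼液)
Food | Food | 4,870 | sesame (芝麻)
Check | Examination item | 3,353 | chest CT scan (胸部CT检查)
Department | Department | 54 | internal medicine (内科)
Producer | Drug product on sale | 17,201 | Qingyang dexamethasone acetate tablets (青阳醋酸地塞米松片)
Symptom | Disease symptom | 5,998 | fatigue (乏力)
Cure | Treatment method | 544 | antibiotic therapy (抗生素药物治疗)
Total | Total | 44,656 | on the order of 44,000 entities
Entity Types and Data Examples of the Knowledge Graph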
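Relation type | Meaning | Number of relations | Example
belongs_to | belongs to | 8,843 | <internal medicine, belongs to, respiratory medicine>
common_drug | common drug for disease | 14,647 | <adult respiratory distress syndrome, common drug, human albumin>
do_eat | recommended food for disease | 22,230 | <adult respiratory distress syndrome, recommended food, lotus seed>
drugs_of | on-sale product of drug | 17,315 | <human albumin, on sale as, 莱士人蛋白人血白蛋白>
need_check | examination required for disease | 39,418 | <unilateral emphysema, required examination, bronchography>
Total | Total | 312,159 | /
Relation Types and Data Examples of the Knowledge Graph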
ID | Model | Validation Top-1 accuracy (%) | Test Top-1 accuracy (%)
1 | Bi-LSTM | 56.6 | 57.3
2 | Multi-CNN(3) | 57.9 | 57.6
3 | Bi-LSTM+Multi-CNN(3)+Cross-attention | 59.0 | 59.5
4 | Bi-LSTM+Multi-CNN(3)+Cross-attention+KG Embedding | 61.1 | 62.2
Training Effect and Comparison of Models
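Top-1 accuracy, the metric reported in the table, counts a question as solved when the highest-scoring candidate is a correct answer. The sketch below shows one way to compute it; the input layout (a score list plus a set of correct candidate indices per question), the function name `top1_accuracy`, and the toy scores are assumptions, since the page does not describe the evaluation code.

```python
def top1_accuracy(questions):
    """Percentage of questions whose top-scored candidate is a correct answer.

    `questions` is a list of (scores, correct_indices) pairs, where `scores`
    holds one similarity score per candidate answer.
    """
    hits = 0
    for scores, correct_indices in questions:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best in correct_indices:
            hits += 1
    return 100.0 * hits / len(questions)


# Toy example with made-up scores: four questions, three candidates each.
toy = [
    ([0.9, 0.2, 0.1], {0}),     # hit: candidate 0 ranks first and is correct
    ([0.3, 0.8, 0.5], {2}),     # miss: candidate 1 ranks first
    ([0.1, 0.4, 0.7], {2}),     # hit
    ([0.6, 0.5, 0.2], {0, 1}),  # hit: a question may have several correct answers
]
print(top1_accuracy(toy))  # 75.0
```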
k1 value | Top-1 accuracy (%)
1 | 62.2
2 | 61.3
3 | 58.2
4 | 56.5
Influence of k1 value on model effect when k2 is 1
k2 value | Top-1 accuracy (%)
1 | 62.2
2 | 61.5
3 | 59.9
4 | 57.5
Influence of k2 value on model effect when k1 is 1
ID | Components | Top-1 accuracy (%) | Change (percentage points)
1 | Full model (this paper) | 62.2 | /
2 | -Bi-LSTM | 59.6 | -2.6
3 | -Bi-LSTM-Multi-CNN(3)-Cross-attention | 56.9 | -5.3
4 | -Bi-LSTM-Multi-CNN(3)-Cross-attention-KG Embedding | 55.2 | -7.0
Ablation Experiments
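The change column of the ablation table is each variant's Top-1 accuracy minus that of the full model. A short sketch reproduces it from the reported accuracies; the variable names are chosen here for illustration.

```python
# Top-1 accuracies (%) copied from the ablation table.
full_model = 62.2
ablations = {
    "-Bi-LSTM": 59.6,
    "-Bi-LSTM-Multi-CNN(3)-Cross-attention": 56.9,
    "-Bi-LSTM-Multi-CNN(3)-Cross-attention-KG Embedding": 55.2,
}

# Difference from the full model, rounded to one decimal to absorb float noise.
deltas = {name: round(acc - full_model, 1) for name, acc in ablations.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

The computed differences (-2.6, -5.3, -7.0) match the values reported in the table.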