Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (9): 88-97    DOI: 10.11925/infotech.2096-3467.2019.0147
Automatic Triage of Online Doctor Services Based on Machine Learning
Ruojia Wang (1,2), Lu Zhang (1), Jimin Wang (1)
1 Department of Information Management, Peking University, Beijing 100871, China
2 Institute of Ocean Research, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper compares the performance of several machine learning algorithms for automatic triage and aims to improve them by analyzing misclassified data. [Methods] First, we collected 33,073 real patient questions from the website “Chunyu Doctor”. Then, we compared the accuracy of two text vectorization methods and six classification models. Finally, we analyzed the misclassified data and extracted new features to improve model performance. [Results] The best automatic triage model used TF-IDF for text vectorization and a support vector machine for classification. After age and gender features were added, classification accuracy reached 76.3%. The classifier was least accurate for the surgery department, owing to how the platform defines its categories. [Limitations] We assumed that each patient’s choice of department was correct. [Conclusions] Machine learning techniques can improve the automatic triage services of online health consulting platforms.
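The two text vectorization methods compared above (raw counts and TF-IDF) can be illustrated with a minimal pure-Python sketch. It assumes the questions have already been segmented into tokens; the toy English tokens are made up for illustration. It also assumes one common TF-IDF variant (smoothed IDF, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization). The abstract does not state which variant or tooling the authors used.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Map each tokenized document to raw term counts over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows

def tfidf_vectors(docs):
    """Reweight counts by a smoothed inverse document frequency, then L2-normalize."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    out = []
    for row in rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        out.append([x / norm for x in weighted])
    return vocab, out

# Hypothetical, already-segmented questions.
docs = [
    ["heart", "rate", "normal"],
    ["glaucoma", "treatment"],
    ["heart", "pain", "treatment"],
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
```

In the first toy document, "normal" occurs in one document and "heart" in two, so TF-IDF gives "normal" the larger weight even though both have a raw count of 1. Either representation can then be passed to a classifier such as a support vector machine.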

Keywords: Ask the Doctor Service; Automatic Triage; Machine Learning; Support Vector Machine
Received: 11 February 2019      Published: 23 October 2019
CLC Number: TP393; G35

Cite this article:

Ruojia Wang, Lu Zhang, Jimin Wang. Automatic Triage of Online Doctor Services Based on Machine Learning. Data Analysis and Knowledge Discovery, 2019, 3(9): 88-97.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0147     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I9/88

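Department | Example question | Samples
Internal Medicine | My heart rate has been around 90 beats lately; is that normal? | 3,405
Surgery | A 60-year-old fell on the heel and there are small fragments inside; how should it be treated? | 2,362
Gynecology | Does pelvic inflammatory disease cause a dull abdominal ache? There is no odor, but the discharge is very sticky. | 4,205
Obstetrics | Does drinking, smoking, and staying up late at four months pregnant affect the fetus? | 1,937
Pediatrics | A 7-day-old newborn has an overall score of 36 and a stepping reflex score of 0; is it cerebral palsy? | 2,294
Andrology | My testicles feel tight and seem to have shrunk; what is going on? | 2,553
Orthopedics and Traumatology | My elbow joint is swollen; can I go without a plaster cast? | 1,914
Nutrition | Why do some people overeat every day and never gain weight? | 3,691
Oncology | A 59-year-old has urinary incontinence with a little blood after chemoradiotherapy for cervical cancer; what is happening? | 2,103
Ophthalmology | How is late-stage glaucoma treated in a 62-year-old? | 2,822
Otorhinolaryngology (ENT) | My nose is badly blocked, and medicated oil and nasal strips have had no effect at all; what should I do? | 2,036
Oral and Maxillofacial | Bleeding when I brush my teeth has been getting worse over the past few days; what is going on? | 1,926
Dermatology and Venereology | What are the characteristics of condyloma acuminatum (genital warts)? | 1,825
Total | | 33,073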
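
Classification algorithm | Count | TF-IDF
Support vector machine | 73.4% | 75.1%
Random forest | 68.6% | 70.0%
Multinomial naive Bayes | 71.8% | 69.1%
Logistic regression | 74.2% | 74.0%
k-nearest neighbors | 48.4% | 54.9%
Ensemble classifier | 74.5% | 74.4%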
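
Department | Samples | Triage accuracy
Ophthalmology | 565 | 94.9%
Nutrition | 738 | 85.0%
Oral and Maxillofacial | 385 | 84.2%
Otorhinolaryngology (ENT) | 407 | 82.6%
Oncology | 421 | 82.2%
Gynecology | 841 | 82.2%
Orthopedics and Traumatology | 383 | 72.6%
Internal Medicine | 681 | 72.2%
Andrology | 511 | 72.2%
Obstetrics | 387 | 66.1%
Pediatrics | 459 | 64.3%
Dermatology and Venereology | 365 | 62.5%
Surgery | 472 | 40.9%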
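
As a consistency check on the per-department figures, the short sketch below computes the sample-weighted overall accuracy. The sample counts sum to 6,615, about 20% of the 33,073 questions, which suggests (this page does not state it) that they describe a held-out evaluation set. The weighted average comes to about 75.2%, which agrees with the 75.1% reported for the support vector machine with TF-IDF up to rounding of the per-department values.

```python
# (evaluation samples, reported triage accuracy) per department, copied from the table
per_department = {
    "Ophthalmology": (565, 0.949),
    "Nutrition": (738, 0.850),
    "Oral and Maxillofacial": (385, 0.842),
    "Otorhinolaryngology (ENT)": (407, 0.826),
    "Oncology": (421, 0.822),
    "Gynecology": (841, 0.822),
    "Orthopedics and Traumatology": (383, 0.726),
    "Internal Medicine": (681, 0.722),
    "Andrology": (511, 0.722),
    "Obstetrics": (387, 0.661),
    "Pediatrics": (459, 0.643),
    "Dermatology and Venereology": (365, 0.625),
    "Surgery": (472, 0.409),
}

total = sum(n for n, _ in per_department.values())
# Each department contributes its accuracy weighted by its sample count.
correct = sum(n * acc for n, acc in per_department.values())
overall = correct / total
test_share = total / 33073  # share of the full corpus covered by these samples

print(f"{total} samples, overall accuracy {overall:.1%}, share of corpus {test_share:.1%}")
```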
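
Original department | Predicted department | Error rate
Surgery | Andrology | 25%
Obstetrics | Gynecology | 22%
Pediatrics | Internal Medicine | 10%
Andrology | Surgery | 10%
Gynecology | Obstetrics | 7%
Internal Medicine | Pediatrics | 6%
Orthopedics and Traumatology | Dermatology and Venereology | 5%
Dermatology and Venereology | Internal Medicine | 5%
Nutrition | Pediatrics | 5%
Oncology | Gynecology | 5%
Otorhinolaryngology (ENT) | Internal Medicine | 4%
Oral and Maxillofacial | Internal Medicine | 3%
Ophthalmology | Dermatology and Venereology | 1%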
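
A table of this kind can be derived from the true and predicted labels of an evaluation set. The sketch below is one possible way to do it, under the assumption that "error rate" means the share of an original department's samples that were assigned to its most frequent wrong department; the page does not define the term. The function name `top_confusions` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def top_confusions(y_true, y_pred):
    """For each true label, return (most frequent wrong label, share of that label's samples)."""
    totals = Counter(y_true)
    wrong = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        if t != p:
            wrong[t][p] += 1
    result = {}
    for label, errors in wrong.items():
        predicted, n = errors.most_common(1)[0]
        result[label] = (predicted, n / totals[label])
    return result

# Hypothetical labels, not data from the paper.
y_true = ["Surgery"] * 4 + ["Obstetrics"] * 5
y_pred = ["Andrology", "Surgery", "Andrology", "Orthopedics",
          "Gynecology", "Obstetrics", "Obstetrics", "Obstetrics", "Obstetrics"]
confusions = top_confusions(y_true, y_pred)
```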
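
Department pair | Examples of high-frequency, easily confused words
Surgery vs. Andrology | glans, penis, masturbation, erection, premature ejaculation, foreskin, urine, testicle, pain, sperm, sexual activity, prostatitis, itching, surgery, balanitis
Obstetrics vs. Gynecology | menstruation, pregnancy, miscarriage, examination, uterus, hemorrhage, severe, pain, spontaneous miscarriage, discharge, bleeding, child
Pediatrics vs. Internal Medicine | fever, cough, treatment, cold, medicine, recurrent, symptoms, stool, vomiting, examination, diarrhea, phlegm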
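
One simple way to surface words like these is to intersect the most frequent tokens of two departments. The sketch below illustrates that idea only; the page does not say how the authors selected the words, and the function name `shared_frequent_words`, the `top_k` cutoff, and the toy tokens are all assumptions.

```python
from collections import Counter

def shared_frequent_words(docs_a, docs_b, top_k=20):
    """Return words that are among the top_k most frequent tokens in both document sets."""
    freq_a = Counter(tok for doc in docs_a for tok in doc)
    freq_b = Counter(tok for doc in docs_b for tok in doc)
    top_a = {w for w, _ in freq_a.most_common(top_k)}
    top_b = {w for w, _ in freq_b.most_common(top_k)}
    shared = top_a & top_b
    # Most frequent overall first; alphabetical order breaks ties.
    return sorted(shared, key=lambda w: (-(freq_a[w] + freq_b[w]), w))

# Hypothetical, already-segmented questions for two departments.
pediatrics = [["fever", "cough", "baby"], ["fever", "cough", "vaccine"]]
internal = [["fever", "cough", "chest"], ["cough", "fever", "medicine"]]
confusable = shared_frequent_words(pediatrics, internal, top_k=2)
```

Words that rank highly in both departments carry little signal for telling them apart, which is consistent with the confusion pairs reported above.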
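
Department | Mean age | Male | Female
Gynecology | 27.2 | 3.5% | 96.5%
Obstetrics | 27.0 | 4.1% | 95.9%
Nutrition | 23.6 | 35.1% | 64.9%
Pediatrics | 10.9 | 43.1% | 56.9%
Oral and Maxillofacial | 26.6 | 43.5% | 56.5%
Dermatology and Venereology | 25.8 | 44.1% | 55.9%
Ophthalmology | 28.3 | 45.0% | 55.0%
Oncology | 47.3 | 45.5% | 54.5%
Otorhinolaryngology (ENT) | 27.8 | 47.8% | 52.2%
Internal Medicine | 34.8 | 48.9% | 51.1%
Orthopedics and Traumatology | 34.0 | 51.3% | 48.7%
Surgery | 31.3 | 64.0% | 36.0%
Andrology | 26.9 | 94.4% | 5.6%
Overall | 28.6 | 43.9% | 56.1%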
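
Because age and gender differ markedly across departments, they can be appended to the text features before classification. The sketch below shows one plausible encoding: age scaled to [0, 1] and gender one-hot encoded. The source only says that age and gender features were added, so the encoding, the function name `add_demographics`, and the `max_age` cap are assumptions.

```python
def add_demographics(text_vector, age, gender, max_age=100.0):
    """Append a scaled age and a one-hot gender encoding to a text feature vector."""
    # Clamp age to [0, max_age] and scale so it is comparable to normalized text weights.
    age_scaled = min(max(age, 0.0), max_age) / max_age
    # Unknown or missing gender maps to an all-zero encoding.
    gender_onehot = {"male": [1.0, 0.0], "female": [0.0, 1.0]}.get(gender, [0.0, 0.0])
    return list(text_vector) + [age_scaled] + gender_onehot

# Hypothetical 3-dimensional text vector for a 27-year-old female patient.
features = add_demographics([0.0, 0.6, 0.8], age=27, gender="female")
```

Scaling age keeps it from dominating distance- or margin-based classifiers when the text features are already normalized.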
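
Department | Accuracy before adding features | Accuracy after adding features | Improvement
Gynecology | 82.7% | 83.2% | 0.5%
Obstetrics | 67.4% | 67.9% | 0.5%
Nutrition | 86.7% | 87.5% | 0.8%
Pediatrics | 58.3% | 61.8% | 3.5%
Oral and Maxillofacial | 81.6% | 82.1% | 0.4%
Dermatology and Venereology | 60.6% | 60.6% | 0.0%
Ophthalmology | 99.4% | 99.4% | 0.0%
Oncology | 75.5% | 76.6% | 1.0%
Otorhinolaryngology (ENT) | 85.8% | 84.7% | -1.1%
Internal Medicine | 73.2% | 73.8% | 0.5%
Orthopedics and Traumatology | 70.4% | 71.1% | 0.7%
Surgery | 45.8% | 46.4% | 0.6%
Andrology | 70.0% | 73.5% | 3.5%
Overall | 75.5% | 76.3% | 0.8%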
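
The improvement column is the difference, in percentage points, between the accuracy after and before adding features. The sketch below recomputes it from the rounded values in the table and lists the rows where the result differs from the reported figure. Three rows differ by 0.1 point, which is most likely because the reported improvements were computed from unrounded accuracies; that explanation is an assumption, not something the page states.

```python
# (accuracy before, accuracy after, reported improvement), in percent, copied from the table
rows = {
    "Gynecology": (82.7, 83.2, 0.5),
    "Obstetrics": (67.4, 67.9, 0.5),
    "Nutrition": (86.7, 87.5, 0.8),
    "Pediatrics": (58.3, 61.8, 3.5),
    "Oral and Maxillofacial": (81.6, 82.1, 0.4),
    "Dermatology and Venereology": (60.6, 60.6, 0.0),
    "Ophthalmology": (99.4, 99.4, 0.0),
    "Oncology": (75.5, 76.6, 1.0),
    "Otorhinolaryngology (ENT)": (85.8, 84.7, -1.1),
    "Internal Medicine": (73.2, 73.8, 0.5),
    "Orthopedics and Traumatology": (70.4, 71.1, 0.7),
    "Surgery": (45.8, 46.4, 0.6),
    "Andrology": (70.0, 73.5, 3.5),
    "Overall": (75.5, 76.3, 0.8),
}

# Flag rows where the recomputed difference is off by more than half a reporting unit.
mismatches = [
    name
    for name, (before, after, reported) in rows.items()
    if abs((after - before) - reported) > 0.05
]
print(mismatches)
```

Pediatrics and Andrology gain the most (3.5 points each), which fits the strongly skewed age and gender profiles of those departments.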