Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (4): 34-43     https://doi.org/10.11925/infotech.2096-3467.2019.0815
Research Paper
Recommending Online Medical Experts with Labeled-LDA Model*
Pan Youneng, Ni Xiuli
School of Public Affairs, Zhejiang University, Hangzhou 310058, China
Abstract

[Objective] This paper improves the existing recommendation model for online medical experts so that doctors can answer health questions more efficiently and with higher quality. [Methods] We used the Labeled-LDA model to mine the latent topics of health questions and to identify doctors' specialties, which improves question-doctor matching, and we validated the method with data from 39 Health (http://www.39.net). [Results] The precision, recall and answer adoption ratio of the proposed method were 40.4%, 44.0% and 22.9%, compared with 20.4%, 29.7% and 6.8% for the website's existing figures. [Limitations] The method does not consider related information such as doctors' response speed or résumés, and it cannot reliably identify the specialties of newly joined doctors who have answered too few questions. [Conclusions] The proposed expert recommendation method exceeds the website's existing figures on all evaluation metrics and recommends experts effectively.

Key words: Labeled-LDA    Expert Recommendation    Topic Model    Online Healthcare
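Received: 2019-07-12      Published: 2020-06-01
Chinese Library Classification (ZTFLH):  G350
Funding: *This work is one of the research outcomes of the Zhejiang Provincial Philosophy and Social Sciences Planning Project "Research on Constructing Knowledge Maps Based on Domain Ontology" (13ZJQN043YB).
Corresponding author: Pan Youneng     E-mail: ynpan@zju.edu.cn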
Cite this article:
Pan Youneng, Ni Xiuli. Recommending Online Medical Experts with Labeled-LDA Model. Data Analysis and Knowledge Discovery, 2020, 4(4): 34-43.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0815      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I4/34
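Fig.1  Framework of the online medical expert recommendation model
Fig.2  Sample topic distribution of internal-medicine doctors
Fig.3  Topic distributions of selected health questions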
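Figs. 2 and 3 refer to topic distributions for doctors and for health questions. This page does not say how the two are compared, so the following is only a minimal sketch of one plausible matching step, assuming each doctor and each question is represented by a topic-probability vector and that cosine similarity is used for ranking. The function names, doctor ids and all numeric values are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_doctors(question_topics, doctor_topics, top_k=3):
    """Return up to top_k doctor ids, most similar to the question first."""
    scored = [
        (cosine(question_topics, vec), doc_id)
        for doc_id, vec in doctor_topics.items()
    ]
    # Sort by descending similarity; break ties by doctor id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical three-topic distributions, invented for illustration only.
doctor_topics = {
    "doctor_a": [0.7, 0.2, 0.1],
    "doctor_b": [0.1, 0.8, 0.1],
    "doctor_c": [0.3, 0.3, 0.4],
}
question_topics = [0.6, 0.3, 0.1]

ranking = rank_doctors(question_topics, doctor_topics, top_k=2)
print(ranking)
```

Cosine similarity is an assumption chosen for simplicity; a divergence measure between distributions would fit the same interface.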
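Group              Precision   Recall   MRR
Group 1            42%         41%      0.325
Group 2            37%         38%      0.301
Group 3            43%         42%      0.283
Group 4            35%         34%      0.247
Group 5            46%         44%      0.362
Group 6            43%         41%      0.344
Group average      41%         40%      0.312
Overall test set   40%         44%      0.314
Table 1  Evaluation results of online medical expert recommendation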
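Table 1 reports precision, recall and MRR (mean reciprocal rank). The exact definitions used in the paper are not given on this page; the sketch below uses the common textbook definitions for recommendation lists and runs on invented toy data, so it illustrates the metrics rather than reproducing the reported values. All identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of one recommendation list against a relevant set."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item; 0 when none is found."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / position
                break
    return total / len(rankings) if rankings else 0.0


# Toy data: three questions, each with a ranked list of recommended doctors
# and the set of doctors treated as correct for that question.
rankings = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant_sets = [{"d1", "d3"}, {"d5"}, {"d0"}]

p, r = precision_recall(rankings[0], relevant_sets[0])
mrr = mean_reciprocal_rank(rankings, relevant_sets)
print(p, r, mrr)
```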
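Group                            Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group average   Overall
Number of best recommendations   273       240       280       228       300       279       267             1,588
Table 2  Evaluation results for the number of best recommendations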
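Metric                                             Website (existing)   Expert recommendation method
Total internal-medicine health questions           407,189              6,000
Adopted answers from internal-medicine doctors     27,726               1,588
Total answers from all doctors                     1,371,877            14,022
Total answers from internal-medicine doctors       407,949              6,940
Precision                                          20.4%                40.4%
Recall                                             29.7%                44.0%
Answer adoption ratio                              6.8%                 22.9%
Table 3  Comparison of recommendation methods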
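The answer adoption ratios in Table 3 are consistent with dividing the number of adopted answers from internal-medicine doctors by the total number of answers from internal-medicine doctors. That reading is inferred from the reported counts, not stated on this page. A short check under that assumption (the function name is hypothetical):

```python
def adoption_ratio(adopted_answers, total_answers):
    """Share of answers that were adopted (assumed definition, see above)."""
    return adopted_answers / total_answers


# Counts taken from Table 3.
website = adoption_ratio(27_726, 407_949)
proposed = adoption_ratio(1_588, 6_940)

website_pct = round(website * 100, 1)
proposed_pct = round(proposed * 100, 1)
print(website_pct, proposed_pct)  # 6.8 22.9
```

Both values match the table after rounding to one decimal place.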