Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 118-128    DOI: 10.11925/infotech.2096-3467.2019.1156
Collaborative Tagging for Doctors in Online Medical Community
Ye Jiaxin1, Xiong Huixiang1, Tong Zhaoli1,2, Meng Qiuqing1
1School of Information Management, Central China Normal University, Wuhan 430079, China
2Hubei Communication Technical College, Wuhan 430079, China
[Objective] This paper aims to identify similar doctors and enrich the descriptions of their characteristics. [Methods] We used the Word2Vec model to generate vector representations of each doctor's consulting texts, article titles and service scopes, and used these vectors to identify similar doctors. We then analyzed the common characteristics of similar doctors and tagged them collaboratively. [Results] The accuracy of tagging based on doctors' consulting texts, article titles and service scopes was 0.667, 0.252 and 0.708, respectively. The accuracy of tagging based on mixed texts was 1.000. [Limitations] The performance of tagging based on a single text type needs to be improved. [Conclusions] Tags based on consulting texts are closely related to the immediate needs of patients, while tags based on article titles are strongly related to doctors' interests. Tags obtained from service scopes and mixed texts are more accurate.
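
The similar-doctor step described in [Methods] can be illustrated with a minimal sketch. It assumes that a doctor's text is represented by the average of its word vectors and that similarity is measured by cosine similarity; the abstract states neither choice, so both are assumptions. The toy three-dimensional vectors, the doctor IDs and the function names (doc_vector, cosine, most_similar_doctors) are hypothetical and stand in for vectors learned by a Word2Vec model.

```python
import numpy as np

def doc_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have a vector (assumed pooling)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    return np.mean(known, axis=0)

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def most_similar_doctors(target_id, doctor_vectors, top_k=8):
    """Rank all other doctors by cosine similarity to the target doctor."""
    target = doctor_vectors[target_id]
    scores = [
        (other, cosine(target, vec))
        for other, vec in doctor_vectors.items()
        if other != target_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Toy word vectors (hypothetical; real ones would come from Word2Vec training).
word_vectors = {
    "diabetes": np.array([1.0, 0.0, 0.0]),
    "insulin": np.array([0.9, 0.1, 0.0]),
    "thyroid": np.array([0.7, 0.3, 0.0]),
    "asthma": np.array([0.0, 1.0, 0.0]),
    "cough": np.array([0.0, 0.9, 0.1]),
}

# Toy tokenized texts for three hypothetical doctors.
doctor_texts = {
    "A": ["diabetes", "insulin"],
    "B": ["diabetes", "thyroid"],
    "C": ["asthma", "cough"],
}

doctor_vectors = {d: doc_vector(toks, word_vectors) for d, toks in doctor_texts.items()}
ranking = most_similar_doctors("A", doctor_vectors, top_k=2)
```

In this toy data, doctor B shares endocrine vocabulary with doctor A and therefore ranks above doctor C. The default top_k=8 is also an assumption, chosen only because the vote probabilities reported for the test doctor are multiples of 0.125.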

Key words: Word2Vec; Collaborative Tagging; Physician Tagging; Tag Recommendations
Received: 22 October 2019      Published: 07 July 2020
Xiong Huixiang

Ye Jiaxin, Xiong Huixiang, Tong Zhaoli, Meng Qiuqing. Collaborative Tagging for Doctors in Online Medical Community. Data Analysis and Knowledge Discovery, 2020, 4(6): 118-128.

Sample Data of 800 Doctors
Sample Training Texts of 596 Doctors
Disease              No.  Frequency  Probability
Lung cancer          1    119        0.200
Pulmonary nodules    2    112        0.188
Lung diseases        3    108        0.181
Pneumonia            4    92         0.154
Diabetes             5    89         0.149
Infertility          6    84         0.141
Respiratory failure  204  1          0.002
Patient Vote Data for 596 Doctors
Line Chart of Vote Frequencies for 204 Diseases
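
The probabilities in the vote table are consistent with dividing each disease's vote frequency by the 596 doctors in the sample. The sketch below reproduces that arithmetic. The division rule is an inference from the reported numbers, not something the page states, and the English disease names are translations of the table's row labels.

```python
# Vote frequencies as reported in the table (English names are translations).
vote_counts = {
    "lung cancer": 119,
    "pulmonary nodules": 112,
    "lung diseases": 108,
    "pneumonia": 92,
    "diabetes": 89,
    "infertility": 84,
    "respiratory failure": 1,
}

N_DOCTORS = 596  # number of doctors in the vote data

# Assumed rule: probability = frequency / number of doctors, rounded to 3 places.
baseline_prob = {
    disease: round(count / N_DOCTORS, 3)
    for disease, count in vote_counts.items()
}

for disease, prob in baseline_prob.items():
    print(f"{disease}: {prob:.3f}")
```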
Collaborative Tagging Model
Word Vectors of Partial Words
Test Doctor and Similar Doctors
The Number of Doctors Meeting the Tagging Criteria
Vote                Frequency  Probability  Original probability  Original probability × 2
Diabetes            4          0.500        0.149                 0.298
Hypertension        3          0.375        0.134                 0.268
Hyperthyroidism     3          0.375        0.129                 0.258
Hypothyroidism      3          0.375        0.112                 0.224
Endocrine diseases  2          0.250        0.020                 0.040
Infertility         1          0.125        0.141                 0.282
Hepatitis B         1          0.125        0.134                 0.268
IVF                 1          0.125        0.119                 0.238
Infection           1          0.125        0.015                 0.030
Votes of Similar Doctors for the Test Doctor
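
The last column of the vote table suggests that a vote's probability among the similar doctors is compared with twice its original probability across all doctors. The sketch below applies that comparison to the table's values. The selection rule (keep a tag when its local probability exceeds twice its original probability) is an assumption inferred from the column header, and the paper may apply further conditions, so the output should not be read as the published tagging result. The count of 8 similar doctors is inferred from a frequency of 4 giving a probability of 0.500, and the function name candidate_tags is hypothetical.

```python
# tag: (frequency among similar doctors, original probability over all doctors)
# Values are taken from the vote table; English names are translations.
votes = {
    "diabetes": (4, 0.149),
    "hypertension": (3, 0.134),
    "hyperthyroidism": (3, 0.129),
    "hypothyroidism": (3, 0.112),
    "endocrine diseases": (2, 0.020),
    "infertility": (1, 0.141),
    "hepatitis B": (1, 0.134),
    "IVF": (1, 0.119),
    "infection": (1, 0.015),
}

N_SIMILAR = 8  # inferred: a frequency of 4 corresponds to a probability of 0.500

def candidate_tags(votes, n_similar, factor=2.0):
    """Keep tags whose local probability exceeds factor x original probability.

    This threshold rule is an assumption, not the paper's confirmed criterion.
    """
    selected = []
    for tag, (freq, original_prob) in votes.items():
        local_prob = freq / n_similar
        if local_prob > factor * original_prob:
            selected.append(tag)
    return selected

tags = candidate_tags(votes, N_SIMILAR)
print(tags)
```

Under this assumed rule, votes that are common across all doctors, such as infertility, are filtered out because a single vote among the similar doctors does not exceed twice the original probability.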
Doctor Tagging Based on Different Texts
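Test doctor  Tags
13           Hypertension; coronary heart disease; heart disease; atrial fibrillation
21           Null
28           Pneumonia; cough; asthma; bronchitis; bronchiectasis
35           Diabetes; hyperthyroidism; hypothyroidism; thyroid diseases
38           Null
63           Asthma; allergy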
Doctor Tagging Based on Mixed Texts
Evaluation of Tagging Effect
