Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 222-232    DOI: 10.11925/infotech.2096-3467.2021.0883
Clustering and Characterizing Depression Patients Based on Online Medical Records
Nie Hui, Wu Xiaoyan, Lin Yun
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This study analyzes the online consultation records of depression patients to better understand their situation. [Methods] First, we retrieved depression consultation records from haodf.com, an online medical platform. Second, we modeled the patients with word vectors and identified patient groups with the K-means clustering algorithm. Third, we used visualization techniques, including t-SNE, heat maps, and word clouds, to analyze the structure of the groups and the relationships among them. Finally, we identified the emotional-psychological, social, and behavioral differences among the groups and determined their treatment needs with the LDA topic model. [Results] We found six groups of depression patients that differ in emotional and psychological state, social relationships, and behavioral performance. The patients’ needs include seeking advice on offline medical treatment, multi-faceted consultation, and inquiries about medication. [Limitations] The differences in group characteristics were analyzed with keywords selected for each dimension by part of speech and manual analysis. [Conclusions] The proposed method helps to understand patients and their needs, and can inform the construction of better online medical platforms.

Key words: Online Medical Care; Depression; Clustering; Visualization
Received: 23 August 2021      Published: 07 January 2022
ZTFLH:  G353  
Fund: Guangzhou Science and Technology Plan Project (202002020036)
Corresponding Author: Nie Hui, ORCID: 0000-0001-8567-3084, E-mail: issnh@mail.sysu.edu.cn

Cite this article:

Nie Hui, Wu Xiaoyan, Lin Yun. Clustering and Characterizing Depression Patients Based on Online Medical Records. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 222-232.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0883     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/222

Flow Chart of Research
Screenshot of the Consultation Records
Structure of LDA
Evaluation Results of Patient Clustering
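Patient group | Share of cases | Group topic words (word cloud)
C1 | 10.74% | tendency, taking one's own life, pessimistic, cannot muster, thoughts, self-harm, lose, socializing, not interested, irritable, repressed, crying, low mood, wild thoughts, decline, restless, confidence, easily angered, low self-esteem, temper, suicide, pointless, despair, upset, happy, world-weary, fear, uneasy, decrease, motivation
C2 | 17.58% | feel, fall asleep, sometimes, insomnia, cannot sleep, symptoms, sleep, shortness of breath, headache, unwell, fatigue, heartbeat, sweating, easily, nervous, appear, body, afraid, hours, head pain, chest tightness, somewhat, not have, child, at night, at times, head, asleep, dizziness, think
C3 | 19.27% | do not want, think, feel, others, at times, like, things, not have, easily, afraid, happy, being alive, child, every day, at heart, sometimes, sad, hate, for no reason, thinking of, willing, angry, talk, do what, pointless, emotion, know, unwell, meaning, world
C4 | 22.10% | child, work, parents, think, school, husband, study, mom, family members, mother, emotion, life, not have, attend school, father, classmate, know, others, do not want, willing, teacher, problem, like, mobile phone, baby, breakup, family, spouse (husband), divorce, patient
C5 | 12.09% | take (medication), mg, one tablet, milligram, sertraline, 奥氮 [olanza-], effect, taking medication, 氮平 [-zapine], stop medication, one pill, 黛力 [Deanxit, truncated], half a tablet, hydrochloride, paroxetine, citalopram, oxalate, side effects, 文拉法 [venlafa-], 阿立 [ari-], per day, valproic acid, lithium carbonate, relapse, drug, lorazepam, 克隆 [literally "clone"], Lexapro, Prozac, capsule
C6 | 18.22% | examination, treatment, doctor, child, hospital, drug, whether, patient, consult, symptoms, relapse, physician, hospitalization, test, take medicine, taking medication, condition, a bit, onset, chief physician, need, stop medication, problem, take (medication), surgery, diagnosis, pregnancy, medication use, month, visit
Note: tokens left in Chinese are apparent word-segmentation fragments; the bracketed glosses are best-effort readings.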
Depression Patient Group Generated Based on K-means Clustering
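The abstract describes modeling patients with word vectors and grouping them with K-means. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each consultation record is represented by the plain average of its token vectors (the paper may weight tokens differently), and the 2-D embeddings, the toy records, and the helper names `document_vector` and `kmeans` are all hypothetical.

```python
import numpy as np


def document_vector(tokens, word_vectors, dim):
    """Average the vectors of all tokens that have an embedding."""
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        return np.zeros(dim)  # no known token: fall back to the zero vector
    return np.mean(found, axis=0)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Hypothetical 2-D embeddings; real word vectors have far more dimensions.
word_vectors = {
    "insomnia": np.array([1.0, 0.0]),
    "headache": np.array([0.9, 0.1]),
    "sertraline": np.array([0.0, 1.0]),
    "dosage": np.array([0.1, 0.9]),
}

# Hypothetical tokenized consultation records.
records = [
    ["insomnia", "headache"],
    ["headache", "tired"],  # "tired" has no embedding and is skipped
    ["sertraline", "dosage"],
    ["dosage"],
]

X = np.array([document_vector(r, word_vectors, dim=2) for r in records])
labels, centroids = kmeans(X, k=2)
print(labels)
```

In this toy run the two symptom-oriented records fall into one cluster and the two medication-oriented records into the other, which loosely mirrors the split between groups such as C2 and C5. The numeric cluster ids depend on the random initialisation.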
Distribution of Depression Patient Groups
Co-word Network of Depression Patient Groups
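Feature dimension | Number of keywords | Keyword examples
Emotion and psychology | 32 | afraid, anxiety, nervous, headache, fear
Family and social roles and relationships | 26 | child, parents, husband, school, mother
Behavioral performance | 32 | work, insomnia, study, suicide, self-harm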
Characteristic Dimensions and Keywords of Patient
Analysis of Differences in Emotional Cognition of Patient Groups
Analysis of Differences in Interpersonal Environment of Patient Groups
Analysis of Differences in Behavior of Patient Groups
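Consultation demand | Topic | Topic words | Topic interpretation
Medication-related | T1 | whether, take medicine, need, depression, severe | Taking medication
Medication-related | T2 | consult, drug, medicine, side effects, medication use | Drug side effects
Medication-related | T3 | take, medicine, good, improve, effect | Drug efficacy
Visit-related | T4 | outpatient clinic, visit, whether, need, appointment | Whether to visit the outpatient clinic
Visit-related | T5 | hospital, see, examination, visit, department | Hospital examination and department
Visit-related | T6 | depression, whether, is it or not, confirmed diagnosis, doctor | Confirming whether it is depression
Visit-related | T7 | want, know, situation, this kind, problem | Understanding the situation
Treatment-related | T8 | depression, emotion, now, is it or not, psychological counseling | Relieving emotions
Treatment-related | T9 | hope, doctor, help, treatment, advice | Treatment advice
Treatment-related | T10 | control, condition, adjust, psychological, counseling | Controlling the condition
Other | T11 | doctor, problem, hope, help, advice | Other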
Patient’s Demand Model Based on LDA
Differences in Demand Distribution among Patient Groups
Correlation coefficient | C1 | C2 | C3 | C4 | C5 | C6
C1 | 1.000
C2 | 0.555 | 1.000
C3 | 0.900** | 0.527 | 1.000
C4 | 0.827** | 0.791** | 0.818** | 1.000
C5 | -0.109 | 0.500 | -0.118 | 0.082 | 1.000
C6 | 0.527 | 0.836** | 0.509 | 0.736** | 0.555 | 1.000
Results of Spearman Correlation
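For readers unfamiliar with the statistic, a Spearman coefficient like those in the table above can be computed as the Pearson correlation of the ranks of two series. The sketch below is illustrative only: the two demand distributions are made-up numbers, not the paper's data, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical share of consultations per topic for two patient groups.
group_a = [0.30, 0.10, 0.20, 0.25, 0.15]
group_b = [0.28, 0.05, 0.30, 0.22, 0.15]

rho = spearman(group_a, group_b)
print(round(rho, 3))
```

A value near 1 means the two groups rank the topics in almost the same order, while a value near 0 or below means their priorities differ.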