[Objective] This study examines the online consultation records of patients with depression to gain a thorough understanding of their situation. [Methods] First, we retrieved depression consultation records from haodf.com, an online medical platform. Second, we represented the patients with word vectors and identified patient groups with the K-means clustering algorithm. Third, we used visualization techniques, such as t-SNE, heat maps, and word clouds, to analyze the structure of the groups and the relationships among them. Finally, we identified the emotional-psychological, social, and behavioral differences among the groups and determined their treatment needs with the LDA topic model. [Results] We found six groups of depression patients that differ in emotional-psychological state, social relationships, and behavioral performance. The patients' needs include seeking advice on offline medical treatment, multi-faceted consultation, and inquiries about medication. [Limitations] We analyzed the differences in group characteristics by selecting keywords in each dimension based on part of speech and manual analysis. [Conclusions] The proposed method could help us understand patients and their needs, and thereby build better online medical platforms.
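The grouping step in [Methods] (word vectors followed by K-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-dimensional word vectors and tokenized records below are hypothetical toy values, averaging word vectors into one vector per record is an assumed aggregation scheme, and the helper names `document_vector` and `kmeans` are introduced only for this example.

```python
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def document_vector(tokens: Sequence[str],
                    word_vectors: Dict[str, Vector],
                    dim: int) -> Vector:
    """Average the vectors of known tokens (assumed aggregation scheme)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int,
           max_iter: int = 50, seed: int = 0) -> List[int]:
    """Plain K-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels or []


# Hypothetical toy word vectors and tokenized records (not data from the study).
toy_word_vectors = {
    "insomnia": [1.0, 0.1],
    "fatigue": [0.9, 0.2],
    "dosage": [0.1, 1.0],
    "medication": [0.2, 0.9],
}
toy_records = [
    ["insomnia", "fatigue"],
    ["fatigue", "insomnia", "fatigue"],
    ["medication", "dosage"],
    ["dosage"],
]

patient_vectors = [document_vector(r, toy_word_vectors, dim=2) for r in toy_records]
labels = kmeans(patient_vectors, k=2)
print(labels)
```

In this toy setting the two symptom-oriented records fall into one cluster and the two medication-oriented records into the other; which cluster receives which label number depends on the random initialization. With real data, the word vectors would come from a trained embedding model and the number of clusters would need to be selected, for example with a cluster-validity measure.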
Nie Hui, Wu Xiaoyan, Lin Yun. Clustering and Characterizing Depression Patients Based on Online Medical Records[J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 222-232.