New Technology of Library and Information Service  2016, Vol. 32 Issue (9): 78-87    DOI: 10.11925/infotech.1003-3513.2016.09.10
Original Article
Detecting Disease Associations with Word2Vec from Consumer Health Information
Luo Wenxin, Chen Chong, Deng Siyi
School of Government, Beijing Normal University, Beijing 100875, China

[Objective] Lay people are usually unaware of the complex associations among diseases, which hampers their search for health information. This study detects associations among diseases from popular medical information using Word2Vec, a neural word-embedding technique, with the aim of improving personalized information services. [Methods] First, we identified 30 common disease topics with the help of medical professionals and collected related reports from Medical News Today. Second, we built word vectors for each document with Word2Vec and calculated the semantic similarities among the documents. Finally, we compared the model's results with experts' scores to evaluate the performance of the proposed method. We also investigated how different models, optimization methods, data sizes and key parameters affected the results. [Results] The correlation coefficient between the Word2Vec results and the experts' scores reached 0.635 under optimal conditions. The Skip-Gram model with fewer than 20 negative samples, trained on a large-scale dataset, yielded the best results. [Limitations] The precision of the Word2Vec judgments was affected by the number of disease topics, and the granularity of the disease topics needs to be refined. [Conclusions] Word2Vec can be used to identify disease associations from consumer health information sources and to improve personalized health information services.
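The evaluation pipeline described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how word vectors are pooled into a document vector (averaging is assumed here), which similarity measure is used (cosine is assumed), or which correlation coefficient is reported (Pearson is assumed). The toy word vectors, disease documents and expert scores are all hypothetical stand-ins for trained Word2Vec output and real expert ratings.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2Vec output.
WORD_VECTORS = {
    "insulin": [0.9, 0.1, 0.0],
    "glucose": [0.8, 0.2, 0.1],
    "artery": [0.1, 0.9, 0.2],
    "pressure": [0.2, 0.8, 0.1],
    "cough": [0.0, 0.1, 0.9],
}


def document_vector(tokens, word_vectors):
    """Average the vectors of known tokens (assumed pooling choice)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("document contains no known tokens")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical tokenized documents, one per disease topic.
docs = {
    "diabetes": ["insulin", "glucose", "pressure"],
    "hypertension": ["artery", "pressure"],
    "influenza": ["cough"],
}

# Hypothetical expert relatedness scores for each pair of topics.
expert_scores = {
    ("diabetes", "hypertension"): 4.0,
    ("diabetes", "influenza"): 1.0,
    ("hypertension", "influenza"): 1.5,
}

vectors = {name: document_vector(tokens, WORD_VECTORS) for name, tokens in docs.items()}
pairs = sorted(expert_scores)
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
r = pearson(model_scores, [expert_scores[p] for p in pairs])
print(f"correlation between model and expert scores: {r:.3f}")
```

In a real run, the toy dictionary would be replaced by vectors trained on the collected reports, and the correlation would be computed over all pairs of the 30 disease topics.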

Key words: Word2Vec; Disease association; Non-professional medical information; Health information; Personalization
Received: 16 May 2016      Published: 19 October 2016

Cite this article:

Luo Wenxin, Chen Chong, Deng Siyi. Detecting Disease Associations with Word2Vec from Consumer Health Information. New Technology of Library and Information Service, 2016, 32(9): 78-87.

