New Technology of Library and Information Service  2016, Vol. 32 Issue (9): 78-87    DOI: 10.11925/infotech.1003-3513.2016.09.10
Original Article
Detecting Disease Associations with Word2Vec from Consumer Health Information
Luo Wenxin, Chen Chong, Deng Siyi
School of Government, Beijing Normal University, Beijing 100875, China

[Objective] Lay people often do not know the complex associations among diseases, which hampers their health information seeking. This study detects associations among diseases from popular medical information using deep learning technology (Word2Vec), with the aim of improving personalized information services. [Methods] First, we identified 30 common disease topics with the help of medical professionals and collected related reports from Medical News Today. Second, we built word vectors for each document with Word2Vec and calculated the semantic similarities among the documents. Finally, we compared the model's results with experts' scores to evaluate the proposed method. We also investigated how different models, optimization methods, data sizes, and key parameters affected the results. [Results] Under optimal conditions, the correlation coefficient between the Word2Vec results and the experts' scores reached 0.635. The Skip-Gram model with fewer than 20 negative samples, trained on a large-scale dataset, yielded the best results. [Limitations] The precision of the Word2Vec judgments was affected by the number of disease topics, and the granularity of the disease topics needs to be improved. [Conclusions] Word2Vec can be used to identify disease associations from consumer health information sources and to improve personalized health information services.
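
The evaluation pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy word vectors, the three topics, and the expert scores are invented for illustration. Averaging word vectors into a topic vector, cosine similarity, and the Pearson coefficient are assumptions, since the abstract does not say how vectors were combined or which correlation coefficient was used.

```python
import math

# Hypothetical toy word vectors. In a real run these would come from a
# trained Word2Vec model (for example Skip-Gram with negative sampling).
WORD_VECTORS = {
    "insulin": [0.9, 0.1, 0.0],
    "glucose": [0.8, 0.2, 0.1],
    "artery": [0.1, 0.9, 0.2],
    "cholesterol": [0.3, 0.8, 0.1],
    "pollen": [0.0, 0.1, 0.9],
    "sneeze": [0.1, 0.0, 0.8],
}


def topic_vector(tokens):
    """Average the vectors of all known tokens (an assumed composition rule)."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical disease topics, each reduced to a few tokens from its reports.
topics = {
    "diabetes": ["insulin", "glucose"],
    "heart disease": ["artery", "cholesterol"],
    "allergy": ["pollen", "sneeze"],
}
pairs = [
    ("diabetes", "heart disease"),
    ("diabetes", "allergy"),
    ("heart disease", "allergy"),
]
# Made-up expert relatedness scores, one per topic pair.
expert_scores = [3.0, 1.0, 1.5]

vectors = {name: topic_vector(tokens) for name, tokens in topics.items()}
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
r = pearson(model_scores, expert_scores)

for (a, b), score in zip(pairs, model_scores):
    print(f"{a} vs {b}: similarity = {score:.3f}")
print(f"correlation with expert scores: {r:.3f}")
```

With real data, the toy dictionary would be replaced by vectors from a model trained on the collected reports, and the pair list would cover all combinations of the 30 disease topics.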

Key words: Word2Vec; Disease association; Non-professional medical information; Health information; Personalization
Received: 16 May 2016      Published: 19 October 2016

Cite this article:

Luo Wenxin, Chen Chong, Deng Siyi. Detecting Disease Associations with Word2Vec from Consumer Health Information. New Technology of Library and Information Service, 2016, 32(9): 78-87.

