Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (8): 98-106    DOI: 10.11925/infotech.2096-3467.2018.0142
Finding Association Between Diseases and Genes from Literature Abstracts
Dongmei Mu, Shan Jin, Yuanhong Ju
School of Public Health, Jilin University, Changchun 130021, China
[Objective] This study aims to discover associations between genes and diseases from literature abstracts, in order to provide evidence for the prevention and treatment of diseases. [Methods] First, we established entity extraction rules using thesaurus-based recognition techniques. Then, we proposed a model to discover associations between disease and gene entities. Finally, we validated the model with abstracts of diabetic nephropathy studies. [Results] A total of 656 genes associated with diabetic nephropathy were obtained, comprising high-frequency, intermediate-frequency, and low-frequency genes. [Limitations] More research is needed to apply the proposed model to other diabetic complications. [Conclusions] (I) High-frequency associated genes are possibly the theoretical foundation of current research. (II) Intermediate-frequency associated genes are the focus of current research. (III) Low-frequency associated genes could become new fields for knowledge discovery.
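The pipeline outlined in the abstract (thesaurus-based entity recognition, then disease-gene association, then grouping by frequency) might be sketched roughly as follows. This is a minimal illustration and not the authors' model. The thesauri, the placeholder gene symbols (GENEA, GENEB, GENEC), the toy abstracts, the co-occurrence rule, and the tier cut-offs are all assumptions made for the example; the abstract does not state how the frequency groups were derived.

```python
import re
from collections import Counter

# Hypothetical thesauri with placeholder terms, not the paper's vocabularies.
DISEASE_TERMS = {"diabetic nephropathy", "diabetic kidney disease"}
GENE_TERMS = {"GENEA", "GENEB", "GENEC"}


def find_terms(text, terms, ignore_case):
    """Return the thesaurus terms that occur in text as whole words."""
    flags = re.IGNORECASE if ignore_case else 0
    return {t for t in terms
            if re.search(r"\b" + re.escape(t) + r"\b", text, flags)}


def count_associated_genes(abstracts):
    """Count, per gene, the abstracts mentioning it together with a disease term."""
    counts = Counter()
    for text in abstracts:
        if find_terms(text, DISEASE_TERMS, ignore_case=True):
            # Gene symbols are matched case-sensitively (an assumption).
            counts.update(find_terms(text, GENE_TERMS, ignore_case=False))
    return counts


def frequency_tier(count, high=3, mid=2):
    """Assign a frequency tier; the cut-offs are arbitrary illustrative values."""
    if count >= high:
        return "high"
    if count >= mid:
        return "mid"
    return "low"


# Invented toy abstracts, for illustration only.
abstracts = [
    "GENEA expression is altered in diabetic nephropathy.",
    "GENEA and GENEB were studied in diabetic kidney disease.",
    "Diabetic nephropathy patients showed GENEA and GENEB changes.",
    "GENEC was examined in diabetic nephropathy.",
    "GENEC in an unrelated condition.",
]
counts = count_associated_genes(abstracts)
tiers = {gene: frequency_tier(n) for gene, n in counts.items()}
```

Simple co-occurrence within one abstract is used here as a stand-in for the association model. The fixed cut-offs could be replaced by a clustering step over the gene counts, as the "Cluster Analysis" keyword suggests.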

Key words: Entity Recognition; Information Extraction; Cluster Analysis; Genes Association Relationship
Received: 02 February 2018      Published: 08 September 2018

Dongmei Mu, Shan Jin, Yuanhong Ju. Finding Association Between Diseases and Genes from Literature Abstracts. Data Analysis and Knowledge Discovery, 2018, 2(8): 98-106.

