Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (8): 98-106    DOI: 10.11925/infotech.2096-3467.2018.0142
Finding Association Between Diseases and Genes from Literature Abstracts
Dongmei Mu, Shan Jin, Yuanhong Ju
School of Public Health, Jilin University, Changchun 130021, China
Abstract  

[Objective] This study seeks to find associations between genes and diseases from literature abstracts, aiming to provide evidence for the prevention and treatment of diseases. [Methods] First, we established entity extraction rules using thesaurus-based recognition techniques. Then, we proposed a model to discover associations between disease and gene entities. Finally, we validated the new model with abstracts of diabetic nephropathy studies. [Results] A total of 656 genes associated with diabetic nephropathy were obtained, comprising high-frequency, mid-frequency, and low-frequency genes. [Limitations] More research is needed to apply the proposed model to other diabetic complications. [Conclusions] (I) The high-frequency associated genes of a disease are possibly the theoretical foundation of current research. (II) Mid-frequency associated genes are the focus of current research. (III) Low-frequency associated genes could become new areas for knowledge discovery.

Key words: Entity Recognition; Information Extraction; Cluster Analysis; Genes Association Relationship
Received: 02 February 2018      Published: 08 September 2018
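
The pipeline summarized in the abstract (thesaurus-based entity recognition, disease-gene association, and grouping of genes by frequency) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy thesauri, the abstract-level co-occurrence rule, the function names, and the tier thresholds are all assumptions made for illustration, and the abstract does not state how the frequency classes were actually delimited.

```python
import re
from collections import Counter

# Hypothetical toy thesauri; the paper's actual vocabularies are not given here.
GENE_THESAURUS = {"APOL1", "AQP2", "SMAD3", "CYP11B2"}
DISEASE_TERMS = {"diabetic nephropathy"}


def genes_in(text, thesaurus=GENE_THESAURUS):
    """Return thesaurus gene symbols that occur as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
    return {gene for gene in thesaurus if gene in tokens}


def mentions_disease(text, terms=DISEASE_TERMS):
    """True if any disease term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def gene_frequencies(abstracts):
    """Count, per gene, the abstracts in which it co-occurs with the disease."""
    counts = Counter()
    for abstract in abstracts:
        if mentions_disease(abstract):
            counts.update(genes_in(abstract))
    return counts


def tier(count, high=3, mid=2):
    """Assign a frequency tier; the thresholds are illustrative assumptions."""
    if count >= high:
        return "high"
    if count >= mid:
        return "mid"
    return "low"


# Made-up toy sentences standing in for literature abstracts.
toy_abstracts = [
    "APOL1 variants and diabetic nephropathy risk.",
    "Urinary Smad3 as a biomarker for diabetic nephropathy; APOL1 was also measured.",
    "AQP2 variation in nephrogenic diabetes insipidus.",
    "APOL1 and CYP11B2 in diabetic nephropathy.",
]

for gene, n in sorted(gene_frequencies(toy_abstracts).items()):
    print(gene, n, tier(n))
```

Counting each gene at most once per abstract keeps a single paper that repeats a symbol many times from inflating its frequency; a sentence-level co-occurrence rule would be a stricter alternative.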

Cite this article:

Dongmei Mu, Shan Jin, Yuanhong Ju. Finding Association Between Diseases and Genes from Literature Abstracts. Data Analysis and Knowledge Discovery, 2018, 2(8): 98-106.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0142     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I8/98
