[Objective] This study identifies associations between genes and diseases from literature abstracts, aiming to provide evidence for disease prevention and treatment. [Methods] First, we established entity extraction rules using thesaurus-based recognition techniques. Then, we proposed a model to discover associations between disease and gene entities. Finally, we validated the model on abstracts of diabetic nephropathy studies. [Results] A total of 656 genes associated with diabetic nephropathy were obtained, comprising high-frequency, intermediate-frequency and low-frequency genes. [Limitations] More research is needed to apply the proposed model to other diabetic complications. [Conclusions] (I) High-frequency associated genes are possibly the theoretical foundation of current research. (II) Intermediate-frequency associated genes are the focus of current research. (III) Low-frequency associated genes could become new fields for knowledge discovery.
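The pipeline summarized in the abstract (thesaurus-based entity recognition, association discovery, and grouping of genes by frequency) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an association is counted whenever a gene term and a disease term occur in the same abstract, and the term lists, the function names `find_terms`, `count_associations` and `frequency_tier`, and the tier thresholds are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical thesaurus entries (assumption); a real run would load a curated vocabulary.
GENE_TERMS = {"APOL1", "AQP2", "SMAD3", "CYP11B2"}
DISEASE_TERMS = {"diabetic nephropathy"}


def find_terms(text, terms):
    """Return the thesaurus terms found in text as whole words, ignoring case."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def count_associations(abstracts):
    """Count, per gene, the abstracts that mention both the gene and a disease term.

    Co-occurrence within one abstract is an assumed stand-in for the
    association model, which the abstract does not specify.
    """
    counts = Counter()
    for text in abstracts:
        if find_terms(text, DISEASE_TERMS):
            counts.update(find_terms(text, GENE_TERMS))
    return counts


def frequency_tier(count, high=3, mid=2):
    """Assign a frequency tier; the thresholds are illustrative assumptions."""
    if count >= high:
        return "high"
    if count >= mid:
        return "mid"
    return "low"


if __name__ == "__main__":
    # Made-up sample sentences, not data from the study.
    sample = [
        "APOL1 variants and diabetic nephropathy risk.",
        "Urinary Smad3 as a biomarker for diabetic nephropathy.",
        "APOL1 expression in diabetic nephropathy patients.",
        "AQP2 variation in nephrogenic diabetes insipidus.",
        "A cohort study of APOL1 in diabetic nephropathy.",
    ]
    for gene, n in count_associations(sample).most_common():
        print(gene, n, frequency_tier(n))
```

Whole-word matching keeps a short gene symbol from matching inside a longer token, and counting each gene at most once per abstract keeps a single paper that repeats a symbol from inflating its frequency.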
牟冬梅, 金姗, 琚沅红. 基于文献数据的疾病与基因关联关系研究[J]. 数据分析与知识发现, 2018, 2(8): 98-106.
Mu Dongmei, Jin Shan, Ju Yuanhong. Finding Association Between Diseases and Genes from Literature Abstracts[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 98-106.