Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (1): 134-144     https://doi.org/10.11925/infotech.2096-3467.2021.0612
Research Paper
Disease Knowledge Discovery Based on SPO Predications*
Cai Miaozhi, Li Xiaoying, Zhao Jiawei, Feng Fengxiang, Ren Huiling
Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
Abstract

[Objective] This study mines the high-level-evidence disease literature indexed in PubMed to discover knowledge that can inform clinical diagnosis and treatment as well as routine disease prevention and control. [Methods] Using the semantic extraction tool SemRep, we propose a disease knowledge discovery model based on SPO (subject-predicate-object) predications. We validated the model on diabetes-related literature and carried out knowledge discovery by combining visualization of the predications with clinical knowledge. [Results] We obtained 1,258 diabetes SPO predications covering 16 semantic relations, which reveal diabetes-related genes, common complications, detection methods, and treatments. [Limitations] The data come only from published literature; disease knowledge was not mined from real-world data such as knowledge bases and electronic medical records. [Conclusions] The results confirm that the SPO-based disease knowledge discovery model is feasible for revealing biomedical knowledge implicit in large-scale literature, and it can offer biomedical researchers potential research hypotheses and ideas.

Keywords: SPO    Diabetes Mellitus    Knowledge Discovery    Knowledge Organization
Received: 2021-06-21      Published: 2022-02-22
CLC number (ZTFLH):  G250
Funding: *This work is one of the research outcomes of a project under the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2019AAA0104901), a National Social Science Fund of China project (20BTQ062), and the China-WHO Biennial Collaborative Project (GJ2-2021-WHOSO-01).
Corresponding author: Ren Huiling, ORCID: 0000-0002-1067-408X     E-mail: wangjd@sic.gov.cn
Cite this article:
Cai Miaozhi, Li Xiaoying, Zhao Jiawei, Feng Fengxiang, Ren Huiling. Disease Knowledge Discovery Based on SPO Predications. Data Analysis and Knowledge Discovery, 2022, 6(1): 134-144.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0612      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I1/134
Fig.1  Disease knowledge discovery model
Fig.2  Example of SemRep output
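| Type | Semantic relation | Semantic pattern example | Predication example |
| --- | --- | --- | --- |
| Diagnosis and treatment | TREATS | phsu-TREATS-dsyn | Metformin-TREATS-Diabetes Mellitus, Non-Insulin-Dependent |
| Diagnosis and treatment | TREATS | topp-TREATS-dsyn | Interventional procedure-TREATS-Diabetes Mellitus, Non-Insulin-Dependent |
| Diagnosis and treatment | TREATS | horm-TREATS-dsyn | Insulin-TREATS-Diabetes Mellitus, Non-Insulin-Dependent |
| Diagnosis and treatment | DIAGNOSES | diap-DIAGNOSES-dsyn | Oral Glucose Tolerance Test-DIAGNOSES-Diabetes |
| Diagnosis and treatment | DIAGNOSES | lbpr-DIAGNOSES-dsyn | Glucose tolerance test-DIAGNOSES-Gestational Diabetes |
| Diagnosis and treatment | PREVENTS | dora-PREVENTS-dsyn | Exercise-PREVENTS-Gestational Diabetes |
| Diagnosis and treatment | PREVENTS | phsu-PREVENTS-dsyn | Metformin-PREVENTS-Diabetes |
| Related diseases | PRECEDES | dsyn-PRECEDES-dsyn | Myocardial Infarction-PRECEDES-Diabetes |
| Related diseases | COEXISTS_WITH | dsyn-COEXISTS_WITH-dsyn | Hypoglycemia-COEXISTS_WITH-Diabetes Mellitus, Insulin-Dependent |
| Related diseases | COEXISTS_WITH | patf-COEXISTS_WITH-dsyn | Insulin Resistance-COEXISTS_WITH-Diabetes Mellitus, Non-Insulin-Dependent |
| Disease characteristics | LOCATION_OF | bpoc-LOCATION_OF-dsyn | Eye-LOCATION_OF-Diabetic macular edema |
| Disease characteristics | ISA | dsyn-ISA-dsyn | Diabetes Mellitus, Non-Insulin-Dependent-ISA-Metabolic Diseases |
| Influencing/associated factors | CAUSES | dsyn-CAUSES-dsyn | Diabetic Nephropathy-CAUSES-Kidney Failure, Chronic |
| Influencing/associated factors | CAUSES | patf-CAUSES-dsyn | Insulin Resistance-CAUSES-Diabetes Mellitus, Non-Insulin-Dependent |
| Influencing/associated factors | AFFECTS | orch-AFFECTS-dsyn | Blood Glucose-AFFECTS-Diabetes Mellitus, Insulin-Dependent |
| Influencing/associated factors | PREDISPOSES | dsyn-PREDISPOSES-dsyn | Diabetes Mellitus, Non-Insulin-Dependent-PREDISPOSES-Cardiovascular Diseases |
| Influencing/associated factors | ASSOCIATED_WITH | aapp-ASSOCIATED_WITH-dsyn | Insulin-ASSOCIATED_WITH-Diabetes Mellitus, Insulin-Dependent |
| Influencing/associated factors | ASSOCIATED_WITH | gngm-ASSOCIATED_WITH-dsyn | IMPACT gene-ASSOCIATED_WITH-Diabetes Mellitus, Non-Insulin-Dependent |
| Pharmacological effects | AUGMENTS | aapp-AUGMENTS-celf | Insulin-AUGMENTS-glucose uptake |
| Pharmacological effects | STIMULATES | phsu-STIMULATES-aapp | Insulin-STIMULATES-Glucose |
| Pharmacological effects | INHIBITS | phsu-INHIBITS-bacs | canagliflozin-INHIBITS-Glucose |
| Pharmacological effects | DISRUPTS | aapp-DISRUPTS-dsyn | ranibizumab-DISRUPTS-Diabetic macular edema |
| Pharmacological effects | INTERACTS_WITH | aapp-INTERACTS_WITH-orch | CD69 protein, human-INTERACTS_WITH-Blood Glucose |

Table 1  SPO semantic relations and semantic patterns for diabetes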
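The predication strings in Table 1 join subject, predicate, and object with hyphens, but concept names such as "Diabetes Mellitus, Non-Insulin-Dependent" contain hyphens themselves, so splitting on every hyphen would break them. The sketch below is one possible way to split such strings on the predicate instead. The names `PREDICATES`, `Triple`, and `parse_triple` are hypothetical helpers introduced here for illustration; they are not part of SemRep or of the authors' pipeline.

```python
from typing import NamedTuple

# The 16 semantic relations listed in Table 1.
PREDICATES = (
    "TREATS", "DIAGNOSES", "PREVENTS", "PRECEDES", "COEXISTS_WITH",
    "LOCATION_OF", "ISA", "CAUSES", "AFFECTS", "PREDISPOSES",
    "ASSOCIATED_WITH", "AUGMENTS", "STIMULATES", "INHIBITS",
    "DISRUPTS", "INTERACTS_WITH",
)


class Triple(NamedTuple):
    """Hypothetical container for one subject-predicate-object string."""
    subject: str
    predicate: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Split an 'S-P-O' string on its predicate rather than on every hyphen."""
    for predicate in PREDICATES:
        marker = f"-{predicate}-"
        if marker in text:
            subject, obj = text.split(marker, 1)
            return Triple(subject.strip(), predicate, obj.strip())
    raise ValueError(f"no known predicate in {text!r}")


example = parse_triple("Metformin-TREATS-Diabetes Mellitus, Non-Insulin-Dependent")
pattern = parse_triple("phsu-TREATS-dsyn")
```

The same function handles both columns of the table: applied to a semantic pattern it yields the subject and object semantic types (here `phsu` and `dsyn`), and applied to a predication it yields the concept names.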
Fig.3  SPO visualization
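This page does not say which tool produced the visualization. As a rough sketch only, predications can be turned into a node-link graph by writing them out in the Graphviz DOT language, with concepts as nodes and predicates as edge labels. The function name `to_dot` and the graph name `spo` are assumptions made for this example.

```python
def to_dot(triples):
    """Render (subject, predicate, object) tuples as a Graphviz DOT digraph."""

    def esc(name):
        # Escape double quotes so concept names stay valid quoted DOT identifiers.
        return name.replace('"', '\\"')

    lines = ["digraph spo {"]
    for subject, predicate, obj in triples:
        lines.append(f'  "{esc(subject)}" -> "{esc(obj)}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)


# Two predications taken from Table 1.
dot_text = to_dot([
    ("Metformin", "TREATS", "Diabetes Mellitus, Non-Insulin-Dependent"),
    ("Insulin Resistance", "CAUSES", "Diabetes Mellitus, Non-Insulin-Dependent"),
])
```

The resulting text can be saved to a `.dot` file and rendered with any Graphviz-compatible viewer.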
| Type | S | P | O | Frequency |
| --- | --- | --- | --- | --- |
| Gene | SLC5A2 gene | ASSOCIATED_WITH | Diabetes Mellitus, Non-Insulin-Dependent | 5 |
| Gene | HSD11B1 wt Allele | ASSOCIATED_WITH | Diabetes Mellitus, Non-Insulin-Dependent | 3 |
| Gene | FABP4 gene | ASSOCIATED_WITH | Insulin Resistance | 3 |
| Complication | Hypoglycemia | COEXISTS_WITH | Diabetes Mellitus, Insulin-Dependent | 40 |
| Complication | Cardiovascular Diseases | COEXISTS_WITH | Diabetes Mellitus, Non-Insulin-Dependent | 20 |
| Complication | Diabetic Nephropathy | ISA | Complication | 18 |
| Complication | Diabetic Foot | ISA | Complication | 16 |
| Complication | Diabetic Retinopathy | ISA | Complication | 12 |
| Detection method | Body mass index procedure | DIAGNOSES | Diabetes Mellitus, Non-Insulin-Dependent | 13 |
| Detection method | Oral Glucose Tolerance Test | DIAGNOSES | Diabetes | 13 |
| Treatment | Metformin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 338 |
| Treatment | Insulin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 202 |
| Treatment | sitagliptin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 130 |
| Treatment | liraglutide | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 97 |
| Treatment | dapagliflozin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 67 |
| Treatment | pioglitazone | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 66 |
| Treatment | canagliflozin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 61 |
| Treatment | exenatide | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 59 |
| Treatment | empagliflozin | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 52 |
| Treatment | Exercise | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 101 |
| Treatment | Exercise Training | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 28 |
| Treatment | High-Intensity Interval Training | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 13 |
| Treatment | Diet, Carbohydrate-Restricted | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 13 |
| Treatment | Very low energy diet | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 13 |
| Treatment | Diet, High-Protein | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 5 |
| Treatment | Diet, Mediterranean | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 3 |
| Treatment | diabetes education | ISA | Self-Management | 68 |
| Treatment | Resistance education | TREATS | Diabetes Mellitus, Non-Insulin-Dependent | 26 |

Table 2  Examples of diabetes SPO predications
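Frequency tables like Table 2 lend themselves to simple aggregation. The following minimal sketch, which uses a handful of rows copied from the table, shows how occurrence counts might be totalled per predicate and how the most frequent subjects for a given predicate and object could be ranked. The helper names `frequency_by_predicate` and `top_subjects` are hypothetical, and this is not presented as the authors' actual analysis code.

```python
from collections import Counter

T2DM = "Diabetes Mellitus, Non-Insulin-Dependent"
T1DM = "Diabetes Mellitus, Insulin-Dependent"

# A subset of Table 2: (subject, predicate, object, frequency).
ROWS = [
    ("Metformin", "TREATS", T2DM, 338),
    ("Insulin", "TREATS", T2DM, 202),
    ("sitagliptin", "TREATS", T2DM, 130),
    ("Exercise", "TREATS", T2DM, 101),
    ("Hypoglycemia", "COEXISTS_WITH", T1DM, 40),
    ("Cardiovascular Diseases", "COEXISTS_WITH", T2DM, 20),
    ("SLC5A2 gene", "ASSOCIATED_WITH", T2DM, 5),
]


def frequency_by_predicate(rows):
    """Sum occurrence counts for each predicate."""
    totals = Counter()
    for _subject, predicate, _obj, count in rows:
        totals[predicate] += count
    return totals


def top_subjects(rows, predicate, obj, n=3):
    """Return the n most frequent subjects for one predicate/object pair."""
    matches = [(s, c) for s, p, o, c in rows if p == predicate and o == obj]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:n]


totals = frequency_by_predicate(ROWS)
leading_treatments = top_subjects(ROWS, "TREATS", T2DM)
```

On this subset, `leading_treatments` lists Metformin, Insulin, and sitagliptin in that order, matching the ranking visible in the treatment rows of Table 2.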