Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 1-14    DOI: 10.11925/infotech.2096-3467.2020.0681
Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph
Hu Zhengyin (1,2), Liu Leilei (1,2), Dai Bing (1,2), Qin Xiaochu (3,4)
(1) Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
(2) Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
(3) Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510700, China
(4) Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China

[Objective] This paper explores new methods for deep subject knowledge discovery from multi-source heterogeneous data. [Methods] First, we built an SPO (subject-predicate-object) semantic network from the literature to form the core domain knowledge graph. Then, we fused multi-source heterogeneous data through entity alignment, concept-level fusion, and relationship fusion to obtain the whole-domain knowledge graph. Finally, we used this knowledge graph to discover deep subject knowledge. We tested the method on data about Hematopoietic Stem Cell for Cancer Treatment (HSCCT). [Results] We propose a knowledge graph-based framework for subject knowledge discovery (KGSKD). It fuses multi-source heterogeneous data along multiple dimensions and at a fine granularity, enriches the semantic relationships among the data, and natively supports knowledge discovery techniques such as knowledge inference, pathfinder, and link prediction. [Limitations] KGSKD is limited by data supersaturation, the poor interpretability of its discovery results, and the difficulty of communicating with domain experts. [Conclusions] KGSKD offers richer data types, more comprehensive knowledge linkage, more advanced mining methods, and deeper discovery results, and it effectively supports research and services for deep knowledge discovery in the life sciences and medicine.

Key words: Subject Knowledge Discovery; Knowledge Graph; SPO Triples; Data Fusion; Entity Alignment
Received: 13 July 2020      Published: 04 December 2020
ZTFLH:  G251  
Corresponding Author: Hu Zhengyin

Cite this article:

Hu Zhengyin, Liu Leilei, Dai Bing, Qin Xiaochu. Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14.


A Diagram of Closed Discovery and Open Discovery [17]
No. | Subject | Subject Semantic Type | Predicate | Object | Object Semantic Type
1 | Hemofiltration | Therapeutic or Preventive Procedure | TREATS | Patients | Human
2 | Digoxin overdose | Injury or Poisoning | PROCESS_OF | Patients | Human
3 | Hyperkalemia | Pathologic Function | COMPLICATES | Digoxin overdose | Injury or Poisoning
4 | Hemofiltration | Therapeutic or Preventive Procedure | TREATS (INFER) | Digoxin overdose | Injury or Poisoning
Samples of SPO Triples
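Row 4 of the sample triples carries the predicate TREATS (INFER), which suggests it was derived from other rows rather than extracted directly. One plausible reading, stated here as an assumption rather than something this page confirms, is the rule "if X TREATS P and D PROCESS_OF P, then X TREATS (INFER) D". A minimal Python sketch of that rule follows; the `Triple` type and the `infer_treats` function are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def infer_treats(triples: List[Triple]) -> List[Triple]:
    """Apply the assumed rule: X TREATS P and D PROCESS_OF P => X TREATS (INFER) D."""
    treats = [t for t in triples if t.predicate == "TREATS"]
    process_of = [t for t in triples if t.predicate == "PROCESS_OF"]
    inferred = []
    for t in treats:
        for p in process_of:
            # Both triples must share the same object (here: "Patients").
            if t.obj == p.obj:
                inferred.append(Triple(t.subject, "TREATS (INFER)", p.subject))
    return inferred


# Rows 1-3 of the sample table.
triples = [
    Triple("Hemofiltration", "TREATS", "Patients"),
    Triple("Digoxin overdose", "PROCESS_OF", "Patients"),
    Triple("Hyperkalemia", "COMPLICATES", "Digoxin overdose"),
]
inferred = infer_treats(triples)
```

Under this assumed rule, the three extracted rows yield exactly one inferred triple, which matches row 4 of the table.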
Framework of KGSKD
A Sample of SPO Semantic Network [22]
No. | Mapping Type | Source Knowledge Entity (Term) | Target CUI | Target Concept Name | Target Semantic Type (STY)
1 | One-to-one | Abnormality of neutrophils | C0427515 | Neutrophil abnormality | Finding
2 | Many-to-one | Central Nervous System Neoplasms | C0085136 | Central Nervous System Neoplasms | Neoplastic Process
3 | One-to-many | RUNX1 | C1335654 | RUNX1 gene | Gene or Genome
3 | One-to-many | RUNX1 | C1435548 | RUNX1 protein, human | Amino Acid, Peptide, or Protein
4 | One-to-none | Conjunctival icterus | (none) | (none) | (none)
Mapping Types of Knowledge Entities to UMLS[30,31]
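The four mapping types in the table above can be told apart by counting targets per source term and source terms per target. The sketch below is one way to do that in Python, assuming the lookup results are already available as a plain dictionary. The function name `classify_mappings` and the synonym "CNS Neoplasms" are hypothetical and added only for illustration; the CUIs come from the table.

```python
from collections import Counter
from typing import Dict, List


def classify_mappings(term_to_cuis: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each source term with its mapping type.

    term_to_cuis maps a source term to the target CUIs found for it.
    """
    # Count how many distinct source terms point at each target concept.
    target_usage = Counter(
        cui for cuis in term_to_cuis.values() for cui in set(cuis)
    )
    labels = {}
    for term, cuis in term_to_cuis.items():
        unique = set(cuis)
        if not unique:
            labels[term] = "one-to-none"
        elif len(unique) > 1:
            labels[term] = "one-to-many"
        elif target_usage[next(iter(unique))] > 1:
            # Several source terms share this single target.
            labels[term] = "many-to-one"
        else:
            labels[term] = "one-to-one"
    return labels


mappings = {
    "Abnormality of neutrophils": ["C0427515"],
    "Central Nervous System Neoplasms": ["C0085136"],
    "CNS Neoplasms": ["C0085136"],  # hypothetical synonym, for illustration only
    "RUNX1": ["C1335654", "C1435548"],
    "Conjunctival icterus": [],
}
labels = classify_mappings(mappings)
```

A term labeled one-to-many, such as RUNX1 (gene versus protein), would still need disambiguation from context before fusion, and a one-to-none term would need manual review or a new concept; how the authors resolve those cases is not stated on this page.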
The Knowledge Graph-based Knowledge Discovery Techniques
Type | Database | Search Strategy | Data Volume
Papers | PubMed | (((((((stem cells) OR stem cell)) AND (((((stem cellulose) OR stem. Cellular) OR cello) OR cellar) OR cellphone))) OR ((((((((((((ESC) OR ASC) OR iPS) OR PGC) OR MSC) OR CSC) OR LSC) OR TSC) OR ADSC) OR HSC)) near ((cell) OR cells)))) AND ((Hematopoiet*) AND stem cell*) | 24,051 papers
Patents | Derwent | ((((ALLD=(("stem cells" OR "stem cell") NOT ("stem cellulose" or "stem. Cellular" or "cello" or "cellar" or "cellphone")) OR ALLD=((ESC or ASC or iPS or PGC or MSC or CSC or LSC or TSC or ADSC or HSC) near (cells OR cell)) OR ALLD=(("totipotent" or "pluripotent" or "multipotent" or "unipotent" or "progenitor" or "precursor") ADJ (cells OR cell)) OR ALLD=("tissue engineer*" OR "tissue scaffolding " OR "tissue regenerat*of regenerative medicine" OR "tissue expansion of regenerative medicine" OR "tissue therapy of regenerative medicine" OR "tissue culture of regenerative medicine" OR "tissue construction of regenerative medicine" OR "biological material*" OR "animal seed cells") OR ABD=(("skin" OR "cartilage" OR "bone" OR "tendon" OR "myocardiac" OR "cardiac" OR "vascular" OR "nerve" OR "cornea" OR "dental" OR "periodontal") ADJ ("tissue engineer*" or "regenerat*")) OR ALLD=("tissue engineer*" AND biomaterial*) OR SSTO=("regenerative medicine") OR ICR=("C12N0050735" OR "C12N005074" OR "C12N0050789" OR "C12N0050797" OR "C12N005095")) NOT ALLD=("seed*" or "herbicide insect hybrid" or "hybrid" or "root bud seeding" or "hybrid corn " or "plant tissue seed") NOT ALLD=(("fuel cell" or "in-plane switching" or "Intrusion Prevention System") NOT (("non-pluripotent") ADJ (CELL*))) NOT ICR=(H or D or E or F or A01B or A01C or A01H or A01G or A21 or A22 or A23 or A46 or A24 or A47 or A63 or A62 or A44 or A45 or C02 or C03C or C05or OR C06 or C10 or C21 or C07B or C07C or C07D or C07F or C07J))) AND (CC=((WO OR US OR EP OR JP)))) AND (ALLD=(Hematopoiet* and stem cell*)); | 3,986 patents
Search Strategies and Results for HSC Literature
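Semantic Type | Description
Chemicals_Drug | Chemicals and drugs
Disorder | Diseases
Genes_Molecular_Sequence | Genes and molecular sequences
Phenotype | Phenotypes
Mutation | Mutations
Hallmark | Cancer hallmarks
Phenomena | Phenomena
Procedure | Procedures
Device | Devices
Physiology | Physiology
Concepts | Concepts (including genes, cells, viruses, etc.)
Living_Being | Living beings
PN | Patents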
Semantic Types of HSCCT Knowledge Entities [30]
Semantic Relation Object | Semantic Group | Semantic Relationship
 | Interaction | ASSOCIATED_WITH (mutation_to_disease, mutation_to_phenotype, gene_to_mutation, gene_to_disease, gene_to_phenotype, gene Related);
 | Co-occurrence | cooccurrence
 | Membership | belong_to_PMID
Semantic Relations in HSCCT Knowledge Graph [30]
LinkPaths Between Vaccines and Placental Growth Factor
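A link path between two entities can be read as a chain of edges connecting them in the knowledge graph. The Python sketch below finds one shortest such chain with breadth-first search. It is an illustration under stated assumptions, not the authors' pathfinding method: the intermediate nodes X1, X2 and X3 are placeholders rather than entities from the paper, and `build_adjacency` and `shortest_path` are hypothetical helper names.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(
    adj: Dict[str, Set[str]], source: str, target: str
) -> Optional[List[str]]:
    """Return one shortest path from source to target, or None if unreachable."""
    if source not in adj or target not in adj:
        return None
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adj[node]):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Placeholder intermediates X1..X3; only the two endpoints appear in the source.
edges = [
    ("Vaccines", "X1"), ("X1", "X2"), ("X2", "Placental Growth Factor"),
    ("Vaccines", "X3"), ("X3", "Placental Growth Factor"),
]
adj = build_adjacency(edges)
path = shortest_path(adj, "Vaccines", "Placental Growth Factor")
```

In a real knowledge graph each edge would also carry its predicate and supporting documents, so a returned path could be checked by a domain expert step by step.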
[1] Liang Na, Zeng Yan. Promote Data-intensive Scientific Discovery, Enhance Scientific and Technological Innovation Capability: New Model, New Method, and New Challenges: Comments on “The Fourth Paradigm: Data-Intensive Scientific Discovery”[J]. Bulletin of the Chinese Academy of Sciences, 2013,28(1):115-121. (in Chinese)
[2] Zhang Zhiqiang, Hu Zhengyin, Yang Ning, et al. Big Data Platform for Subject Knowledge Discovery in the Stem Cell Field[A] // China’s e-Science Blue Book 2020[M]. Beijing: Science Press, 2020. (in Chinese)
[3] Lu Wei, Li Xin, Ren Ke. Research on Subject Profile of Medical Science from the Perspective of Anatomical Structure[J]. Journal of Information Resources Management, 2018,8(3):12-24. (in Chinese)
[4] Zhang Zhiqiang, Fan Shaoping. On the Emergence and Development of Subject Informatics[J]. Journal of the China Society for Scientific and Technical Information, 2015,34(10):1011-1023. (in Chinese)
[5] Zhang Zhiqiang, Fan Shaoping, Chen Xiujuan. Biomedical Informatics Studies for Knowledge Discovery in Precision Medicine[J]. Data Analysis and Knowledge Discovery, 2018,2(1):1-8. (in Chinese)
[6] Li Guangjian, Jiang Xinyu. On Computational Information Analysis[J]. Journal of Library Science in China, 2018,44(2):4-16. (in Chinese)
[7] Li Wenlin, Zeng Li, Yang Lan. Experiences and Problems in Literature-based Knowledge Discovery Service in University Libraries - Taking Nanjing University of Chinese Medicine Library as an Example[J]. Journal of Academic Library, 2015,33(2):61-65. (in Chinese)
[8] Qi Guilin, Gao Huan, Wu Tianxing. The Research Advances of Knowledge Graph[J]. Technology Intelligence Engineering, 2017,3(1):4-25. (in Chinese)
[9] Hu Z Y, Xu H Y, Qin X C. A Knowledge Graph of Stem Cell Oriented to Subject Knowledge Discovery [C]//Proceedings of the 7th IEEE International Conference on Healthcare Informatics. 2019.
[10] Lamurias A, Ferreira J D, Clarke L A, et al. Generating a Tolerogenic Cell Therapy Knowledge Graph from Literature[J]. Frontiers in Immunology, 2017,8:1-12.
doi: 10.3389/fimmu.2017.00001 pmid: 28149297
[11] Ma Ming, Wu Yishan. Methodological Enlightenment and Significance of Don R.Swanson’s Achievements in Information Science[J]. Journal of the China Society for Scientific and Technical Information, 2003,22(3):259-266. (in Chinese)
[12] Hu Zhengyin, Liu Chunjiang, Wei Ling, et al. Design and Practice of Domain Patent Tech Mining System Oriented to TRIZ[J]. Library and Information Service, 2017,61(1):117-124. (in Chinese)
[13] Swanson D R. Fish Oil, Raynaud’s Syndrome, and Undiscovered Public Knowledge[J]. Perspectives in Biology and Medicine, 1986,30(1):7-18.
doi: 10.1353/pbm.1986.0087 pmid: 3797213
[14] Swanson D R. Undiscovered Public Knowledge[J]. The Library Quarterly, 1986,56(2):103-118.
doi: 10.1086/601720
[15] Smalheiser N R. Literature-Based Discovery: Beyond the ABCs[J]. Journal of the American Society for Information Science and Technology, 2012,63(2):218-224.
doi: 10.1002/asi.21599
[16] Henry S, Mcinnes B. Literature Based Discovery: Models, Methods, and Trends[J]. Journal of Biomedical Informatics, 2017,74:20-32.
doi: 10.1016/j.jbi.2017.08.011 pmid: 28838802
[17] Pyysalo S, Baker S, Ali I, et al. LION LBD: A Literature-Based Discovery System for Cancer Biology[J]. Bioinformatics, 2019,35(9):1553-1561.
doi: 10.1093/bioinformatics/bty845 pmid: 30304355
[18] Kostoff R N. Literature-Related Discovery(LRD): Potential Treatments for Cataracts[J]. Technological Forecasting and Social Change, 2008,75(2):215-225.
doi: 10.1016/j.techfore.2007.11.006
[19] Kostoff R N, Briggs M B, Lyons T J. Literature-Related Discovery(LRD): Potential Treatments for Multiple Sclerosis[J]. Technological Forecasting and Social Change, 2008,75(2):239-255.
doi: 10.1016/j.techfore.2007.11.002
[20] Kostoff R N, Briggs M B. Literature-Related Discovery(LRD): Potential Treatments for Parkinson’s Disease[J]. Technological Forecasting and Social Change, 2008,75(2):226-238.
doi: 10.1016/j.techfore.2007.11.007
[21] Hou Yuefang, Zhu Jin, Cui Mengyao, et al. To Mine Disease-Related Potential Genes Using Non-Literature Related Knowledge Discovery Methods[J]. Chinese Journal of Medical Library and Information Science, 2010,19(5):1-4, 10. (in Chinese)
[22] Hu Z Y, Zeng R Q, Qin X C, et al. A Method of Biomedical Knowledge Discovery by Literature Mining Based on SPO Predications: A Case Study of Induced Pluripotent Stem Cells[C]// Proceedings of 2018 Machine Learning and Data Mining in Pattern Recognition. 2018: 383-393.
[23] Hu Z, Zeng R Q, Peng L, et al. Discovering Emerging Research Topics Based on SPO Predications[C]// Proceedings of 2019 Knowledge Management in Organizations. 2019: 110-121.
[24] Rindflesch T C, Fiszman M. The Interaction of Domain Knowledge and Linguistic Structure in Natural Language Processing: Interpreting Hypernymic Propositions in Biomedical Text[J]. Journal of Biomedical Informatics, 2003,36(6):462-477.
doi: 10.1016/j.jbi.2003.11.003
[25] Kilicoglu H, Rosemblat G, Fiszman M, et al. Constructing a Semantic Predication Gold Standard from the Biomedical Literature[J]. BMC Bioinformatics, 2011,12(1):1-17.
doi: 10.1186/1471-2105-12-1
[26] Zhang Y, Porter A L, Hu Z, et al. “Term Clumping” for Technical Intelligence: A Case Study on Dye-Sensitized Solar Cells[J]. Technological Forecasting and Social Change, 2014,85:26-39.
doi: 10.1016/j.techfore.2013.12.019
[27] Hu Zhengyin. Study on Patent Tech Mining Based on Personalized Semantic TRIZ[D]. Beijing: University of Chinese Academy of Sciences, 2015. (in Chinese)
[28] Fiszman M, Rindflesch T C, Kilicoglu H. Abstraction Summarization for Managing the Biomedical Research Literature[C]// Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics (CLS’04). ACM, 2004: 76-83.
[29] Wei Ling, Hu Zhengyin, Pang Hongshen, et al. Study on Knowledge Discovery in Biomedical Literature Based on SPO Predications: A Case Study of Induced Pluripotent Stem Cells[J]. Digital Library Forum, 2017(9):28-34. (in Chinese)
[30] Liu Leilei. Research on Multi-Source Data Fusion for the Question and Answer of Subject Knowledge - A Case Study of Hematopoietic Stem Cell for Cancer Treatment[D]. Beijing: University of Chinese Academy of Sciences, 2020. (in Chinese)
[31] Chris J L. The Specialist Lexicon and NLP Tools [EB/OL]. [2020-05-11].
[32] NLM. Metathesaurus[EB/OL]. [2020-05-11].
[33] NLM. Term Processing[EB/OL]. [2019-10-16]. Processing.pdf.
[34] Chris J L, Browne A C. Sub-Term Mapping Tools[EB/OL]. [2019-10-28].
[35] Hristovski D, Kastrin A, Peterlin B, et al. Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation[A]// Linking Literature, Information, and Knowledge for Biology[M].Heidelberg: Springer, 2010.
[36] Hu Zhengyin, Fang Shu, Zheng Ying, et al. Method of Development and Architecture of an Ontology-Based Intelligent Retrieval System[J]. Journal of Intelligences, 2009,28(5):159-162. (in Chinese)
[37] Chen C. Searching for Intellectual Turning Points: Progressive Knowledge Domain Visualization[J]. Proceedings of the National Academy of Sciences, 2004,101(S1):5303-5310.
[38] Song M, Heo G E, Ding Y. SemPathFinder: Semantic Path Analysis for Discovering Publicly Unknown Knowledge[J]. Journal of Informetrics, 2015,9(4):686-703.
[39] Kumar A, Singh S, Singh K, et al. Link Prediction Techniques, Applications, and Performance: A Survey[J]. Physica A: Statistical Mechanics and Its Applications, 2020,553:1-46.
[40] Adamic L, Adar E. Friends and Neighbors on the Web[J]. Social Networks, 2003,25(3):211-230.
[41] Hao Sha, Dong Fang, Hu Linping, et al. Biology and Clinical Application Research of Hematopoietic Stem Cells[J]. Chinese Journal of Cell Biology, 2018,40(13):2237-2248. (in Chinese)
[42] Zhou Yuanchun, Wang Weijun, Qiao Ziyue, et al. A Survey on the Construction Methods and Applications of Sci-Tech Big Data Knowledge Graph[J]. Scientia Sinica Informationis, 2020,50(7):957-987. (in Chinese)
[43] Zhang Zhiqiang, Hu Zhengyin, Wen Yi. Subject Informatics and Subject Knowledge Discovery[M]. Beijing: Science Press, 2020. (in Chinese)