Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph
Hu Zhengyin1,2, Liu Leilei1,2, Dai Bing1,2, Qin Xiaochu3,4
1 Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510700, China
4 Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
[Objective] This paper explores new methods for deep subject knowledge discovery from multi-source heterogeneous data. [Methods] First, we constructed an SPO (subject-predicate-object) semantic network from the literature to form the core domain knowledge graph. Then, we fused multi-source heterogeneous data through entity alignment, concept-level fusion, and relationship fusion to obtain the full domain knowledge graph. Finally, we used this knowledge graph to discover deep subject knowledge. We tested the method on data about Hematopoietic Stem Cell for Cancer Treatment (HSCCT). [Results] We propose a knowledge graph-based framework for subject knowledge discovery (KGSKD), which fuses multi-source heterogeneous data along multiple dimensions and at fine granularity, enriches the semantic relationships among data, and natively supports knowledge discovery techniques such as knowledge inference, path finding, and link prediction. [Limitations] The limitations of KGSKD include data supersaturation, poor interpretability of the discovery results, and difficulty in communicating with domain experts. [Conclusions] KGSKD offers richer data types, more comprehensive knowledge linkage, more advanced mining methods, and deeper discovery results, and it effectively supports research and services for deep knowledge discovery in life sciences and medicine.
Hu Zhengyin, Liu Leilei, Dai Bing, Qin Xiaochu. Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14.
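As a rough illustration of the link-prediction step named in the abstract, the sketch below collapses a handful of SPO triples into an undirected entity graph and ranks unconnected entity pairs by the Adamic-Adar index [40]. The triples, entity names, and function names are hypothetical placeholders, and the choice of Adamic-Adar is an assumption made for illustration. It is not the KGSKD pipeline or data from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical SPO (subject, predicate, object) triples with placeholder
# entity names; these are NOT data or results from the paper.
TRIPLES = [
    ("drug_A", "INHIBITS", "gene_X"),
    ("drug_A", "AFFECTS", "process_P"),
    ("gene_X", "ASSOCIATED_WITH", "disease_D"),
    ("process_P", "CAUSES", "disease_D"),
    ("drug_B", "INHIBITS", "gene_X"),
    ("gene_X", "AFFECTS", "process_P"),
]


def build_neighbors(triples):
    """Collapse SPO triples into an undirected entity-neighbor map."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    return neighbors


def adamic_adar(neighbors, u, v):
    """Adamic-Adar index: sum of 1/log(degree) over shared neighbors."""
    return sum(
        1.0 / math.log(len(neighbors[w]))
        for w in neighbors[u] & neighbors[v]
        if len(neighbors[w]) > 1  # guard against log(1) == 0
    )


def predict_links(triples, top_k=3):
    """Rank entity pairs that are not yet linked by their Adamic-Adar score."""
    neighbors = build_neighbors(triples)
    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already connected by some predicate
        score = adamic_adar(neighbors, u, v)
        if score > 0:
            scored.append((score, u, v))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_k]


if __name__ == "__main__":
    for score, u, v in predict_links(TRIPLES):
        print(f"{u} -- {v}: {score:.3f}")
```

In this toy graph the highest-ranked candidate is the pair (disease_D, drug_A), which shares two intermediate entities. This is the same open-discovery pattern as Swanson's ABC model [13], expressed as a graph score. Any such candidate is only a hypothesis that would still need review by domain experts.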
C0085136 | Central Nervous System Neoplasms | Neoplastic Process
CNS TUMORS
CNS NEOPL
No. 3, one-to-many mapping: RUNX1
C1335654 | RUNX1 gene | Gene or Genome
C1435548 | RUNX1 protein, human | Amino Acid, Peptide, or Protein
No. 4, one-to-none mapping: Conjunctival icterus
(no matching UMLS concept)
Table 2 Mapping of knowledge entities to UMLS[30,31]
Fig.4 Knowledge discovery methods based on the knowledge graph
Type: Papers
Database: PubMed
Search strategy: (((((((stem cells) OR stem cell)) AND (((((stem cellulose) OR stem. Cellular) OR cello) OR cellar) OR cellphone))) OR ((((((((((((ESC) OR ASC) OR iPS) OR PGC) OR MSC) OR CSC) OR LSC) OR TSC) OR ADSC) OR HSC)) near ((cell) OR cells)))) AND ((Hematopoiet*) AND stem cell*); Publication date: 2009/01/01-2017/12/31
Data volume: 24,051 papers

Type: Patents
Database: Derwent Innovation
Search strategy: ((((ALLD=(("stem cells" OR "stem cell") NOT ("stem cellulose" or "stem. Cellular" or "cello" or "cellar" or "cellphone")) OR ALLD=((ESC or ASC or iPS or PGC or MSC or CSC or LSC or TSC or ADSC or HSC) near (cells OR cell)) OR ALLD=(("totipotent" or "pluripotent" or "multipotent" or "unipotent" or "progenitor" or "precursor") ADJ (cells OR cell)) OR ALLD=("tissue engineer*" OR "tissue scaffolding " OR "tissue regenerat*of regenerative medicine" OR "tissue expansion of regenerative medicine" OR "tissue therapy of regenerative medicine" OR "tissue culture of regenerative medicine" OR "tissue construction of regenerative medicine" OR "biological material*" OR "animal seed cells") OR ABD=(("skin" OR "cartilage" OR "bone" OR "tendon" OR "myocardiac" OR "cardiac" OR "vascular" OR "nerve" OR "cornea" OR "dental" OR "periodontal") ADJ ("tissue engineer*" or "regenerat*")) OR ALLD=("tissue engineer*" AND biomaterial*) OR SSTO=("regenerative medicine") OR ICR=("C12N0050735" OR "C12N005074" OR "C12N0050789" OR "C12N0050797" OR "C12N005095")) NOT ALLD=("seed*" or "herbicide insect hybrid" or "hybrid" or "root bud seeding" or "hybrid corn " or "plant tissue seed") NOT ALLD=(("fuel cell" or "in-plane switching" or "Intrusion Prevention System") NOT (("non-pluripotent") ADJ (CELL*))) NOT ICR=(H or D or E or F or A01B or A01C or A01H or A01G or A21 or A22 or A23 or A46 or A24 or A47 or A63 or A62 or A44 or A45 or C02 or C03C or C05or OR C06 or C10 or C21 or C07B or C07C or C07D or C07F or C07J))) AND (CC=((WO OR US OR EP OR JP)))) AND (ALLD=(Hematopoiet* and stem cell*)); Application year: 1999-2018
[1]
Liang Na, Zeng Yan. Promote Data-intensive Scientific Discovery, Enhance Scientific and Technological Innovation Capability: New Model, New Method, and New Challenges Comments on “The Fourth Paradigm: Data-Intensive Scientific Discovery”[J]. Bulletin of the Chinese Academy of Sciences, 2013,28(1):115-121.
[2]
Zhang Zhiqiang, Hu Zhengyin, Yang Ning, et al. Big Data Platform for Subject Knowledge Discovery in the Stem Cell Field[A] // China’s e-Science Blue Book 2020[M]. Beijing: Science Press, 2020.
[3]
Lu Wei, Li Xin, Ren Ke. Research on Subject Profile of Medical Science from the Perspective of Anatomical Structure[J]. Journal of Information Resources Management, 2018,8(3):12-24.
[4]
Zhang Zhiqiang, Fan Shaoping. On the Emergence and Development of Subject Informatics[J]. Journal of the China Society for Scientific and Technical Information, 2015,34(10):1011-1023.
[5]
Zhang Zhiqiang, Fan Shaoping, Chen Xiujuan. Biomedical Informatics Studies for Knowledge Discovery in Precision Medicine[J]. Data Analysis and Knowledge Discovery, 2018,2(1):1-8.
[6]
Li Guangjian, Jiang Xinyu. On Computational Information Analysis[J]. Journal of Library Science in China, 2018,44(2):4-16.
[7]
Li Wenlin, Zeng Li, Yang Lan. Experiences and Problems in Literature-based Knowledge Discovery Service in University Libraries - Taking Nanjing University of Chinese Medicine Library as an Example[J]. Journal of Academic Library, 2015,33(2):61-65.
[8]
Qi Guilin, Gao Huan, Wu Tianxing. The Research Advances of Knowledge Graph[J]. Technology Intelligence Engineering, 2017,3(1):4-25.
[9]
Hu Z Y, Xu H Y, Qin X C. A Knowledge Graph of Stem Cell Oriented to Subject Knowledge Discovery [C]//Proceedings of the 7th IEEE International Conference on Healthcare Informatics. 2019.
[10]
Lamurias A, Ferreira J D, Clarke L A, et al. Generating a Tolerogenic Cell Therapy Knowledge Graph from Literature[J]. Frontiers in Immunology, 2017,8:1-12.
doi: 10.3389/fimmu.2017.00001
pmid: 28149297
[11]
Ma Ming, Wu Yishan. Methodological Enlightenment and Significance of Don R. Swanson’s Achievements in Information Science[J]. Journal of the China Society for Scientific and Technical Information, 2003,22(3):259-266.
[12]
Hu Zhengyin, Liu Chunjiang, Wei Ling, et al. Design and Practice of Domain Patent Tech Mining System Oriented to TRIZ[J]. Library and Information Service, 2017,61(1):117-124.
[13]
Swanson D R. Fish Oil, Raynaud’s Syndrome, and Undiscovered Public Knowledge[J]. Perspectives in Biology and Medicine, 1986,30(1):7-18.
doi: 10.1353/pbm.1986.0087
pmid: 3797213
[14]
Swanson D R. Undiscovered Public Knowledge[J]. The Library Quarterly, 1986,56(2):103-118.
doi: 10.1086/601720
[15]
Smalheiser N R. Literature-Based Discovery: Beyond the ABCs[J]. Journal of the American Society for Information Science and Technology, 2012,63(2):218-224.
doi: 10.1002/asi.21599
[16]
Henry S, McInnes B. Literature Based Discovery: Models, Methods, and Trends[J]. Journal of Biomedical Informatics, 2017,74:20-32.
doi: 10.1016/j.jbi.2017.08.011
pmid: 28838802
[17]
Pyysalo S, Baker S, Ali I, et al. LION LBD: A Literature-Based Discovery System for Cancer Biology[J]. Bioinformatics, 2019,35(9):1553-1561.
doi: 10.1093/bioinformatics/bty845
pmid: 30304355
[18]
Kostoff R N. Literature-Related Discovery (LRD): Potential Treatments for Cataracts[J]. Technological Forecasting and Social Change, 2008,75(2):215-225.
doi: 10.1016/j.techfore.2007.11.006
[19]
Kostoff R N, Briggs M B, Lyons T J. Literature-Related Discovery (LRD): Potential Treatments for Multiple Sclerosis[J]. Technological Forecasting and Social Change, 2008,75(2):239-255.
doi: 10.1016/j.techfore.2007.11.002
[20]
Kostoff R N, Briggs M B. Literature-Related Discovery (LRD): Potential Treatments for Parkinson’s Disease[J]. Technological Forecasting and Social Change, 2008,75(2):226-238.
doi: 10.1016/j.techfore.2007.11.007
[21]
Hou Yuefang, Zhu Jin, Cui Mengyao, et al. To Mine Disease-Related Potential Genes Using Non-Literature Related Knowledge Discovery Methods[J]. Chinese Journal of Medical Library and Information Science, 2010,19(5):1-4, 10.
[22]
Hu Z Y, Zeng R Q, Qin X C, et al. A Method of Biomedical Knowledge Discovery by Literature Mining Based on SPO Predications: A Case Study of Induced Pluripotent Stem Cells[C]// Proceedings of 2018 Machine Learning and Data Mining in Pattern Recognition. 2018: 383-393.
[23]
Hu Z, Zeng R Q, Peng L, et al. Discovering Emerging Research Topics Based on SPO Predications[C]// Proceedings of 2019 Knowledge Management in Organizations. 2019: 110-121.
[24]
Rindflesch T C, Fiszman M. The Interaction of Domain Knowledge and Linguistic Structure in Natural Language Processing: Interpreting Hypernymic Propositions in Biomedical Text[J]. Journal of Biomedical Informatics, 2003,36(6):462-477.
doi: 10.1016/j.jbi.2003.11.003
[25]
Kilicoglu H, Rosemblat G, Fiszman M, et al. Constructing a Semantic Predication Gold Standard from the Biomedical Literature[J]. BMC Bioinformatics, 2011,12(1):1-17.
doi: 10.1186/1471-2105-12-1
[26]
Zhang Y, Porter A L, Hu Z, et al. “Term Clumping” for Technical Intelligence: A Case Study on Dye-Sensitized Solar Cells[J]. Technological Forecasting and Social Change, 2014,85:26-39.
doi: 10.1016/j.techfore.2013.12.019
[27]
Hu Zhengyin. Study on Patent Tech Mining Based on Personalized Semantic TRIZ[D]. Beijing: University of Chinese Academy of Sciences, 2015.
[28]
Fiszman M, Rindflesch T C, Kilicoglu H. Abstraction Summarization for Managing the Biomedical Research Literature[C]// Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics (CLS’04). ACM, 2004: 76-83.
[29]
Wei Ling, Hu Zhengyin, Pang Hongshen, et al. Study on Knowledge Discovery in Biomedical Literature Based on SPO Predications: A Case Study of Induced Pluripotent Stem Cells[J]. Digital Library Forum, 2017(9):28-34.
[30]
Liu Leilei. Research on Multi-Source Data Fusion for the Question and Answer of Subject Knowledge - A Case Study of Hematopoietic Stem Cell for Cancer Treatment[D]. Beijing: University of Chinese Academy of Sciences, 2020.
[31]
Chris J L. The Specialist Lexicon and NLP Tools [EB/OL]. [2020-05-11]. https://lexsrv3.nlm.nih.gov/Specialist/Docs/Presentations/2017SummerLectures/2017-SLS-LexSynonym.pdf.
NLM. Term Processing[EB/OL]. [2019-10-16]. https://metamap.nlm.nih.gov/Docs/FAQ/Term Processing.pdf.
[34]
Chris J L, Browne A C. Sub-Term Mapping Tools[EB/OL]. [2019-10-28]. https://lexsrv3.nlm.nih.gov/Specialist/Summary/stmt.html.
[35]
Hristovski D, Kastrin A, Peterlin B, et al. Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation[A]// Linking Literature, Information, and Knowledge for Biology[M]. Heidelberg: Springer, 2010.
[36]
Hu Zhengyin, Fang Shu, Zheng Ying, et al. Method of Development and Architecture of an Ontology-Based Intelligent Retrieval System[J]. Journal of Intelligences, 2009,28(5):159-162.
[37]
Chen C. Searching for Intellectual Turning Points: Progressive Knowledge Domain Visualization[J]. Proceedings of the National Academy of Sciences, 2004,101(S1):5303-5310.
[38]
Song M, Heo G E, Ding Y. SemPathFinder: Semantic Path Analysis for Discovering Publicly Unknown Knowledge[J]. Journal of Informetrics, 2015,9(4):686-703.
[39]
Kumar A, Singh S, Singh K, et al. Link Prediction Techniques, Applications, and Performance: A Survey[J]. Physica A: Statistical Mechanics and Its Applications, 2020,553:1-46.
[40]
Adamic L, Adar E. Friends and Neighbors on the Web[J]. Social Networks, 2003,25(3):211-230.
[41]
Hao Sha, Dong Fang, Hu Linping, et al. Biology and Clinical Application Research of Hematopoietic Stem Cells[J]. Chinese Journal of Cell Biology, 2018,40(13):2237-2248.
[42]
Zhou Yuanchun, Wang Weijun, Qiao Ziyue, et al. A Survey on the Construction Methods and Applications of Sci-Tech Big Data Knowledge Graph[J]. Scientia Sinica Informationis, 2020,50(7):957-987.
[43]
Zhang Zhiqiang, Hu Zhengyin, Wen Yi. Subject Informatics and Subject Knowledge Discovery[M]. Beijing: Science Press, 2020.