Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph
Hu Zhengyin1,2, Liu Leilei1,2, Dai Bing1,2, Qin Xiaochu3,4
1 Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
2 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510700, China
4 Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Abstract [Objective] This paper explores new methods for deep subject knowledge discovery from multi-source heterogeneous data. [Methods] First, we constructed a subject-predicate-object (SPO) semantic network from the literature to form the core domain knowledge graph. Then, we fused multi-source heterogeneous data through entity alignment, concept-level fusion, and relationship fusion to obtain the whole-domain knowledge graph. Finally, we used this knowledge graph to discover deep subject knowledge. We evaluated the method on data about Hematopoietic Stem Cell for Cancer Treatment (HSCCT). [Results] We propose a knowledge graph-based framework for subject knowledge discovery (KGSKD). It fuses multi-source heterogeneous data along multiple dimensions and at fine granularity, enriches the semantic relationships among the data, and natively supports knowledge discovery techniques such as knowledge inference, pathfinder, and link prediction. [Limitations] KGSKD's limitations include data supersaturation, poor interpretability of the knowledge discovery results, and difficulty in communicating with domain experts. [Conclusions] KGSKD offers richer data types, more comprehensive knowledge linkage, more advanced mining methods, and deeper discovery results, and effectively supports research and services for deep knowledge discovery in the life and medical sciences.
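To make the pipeline in the abstract more concrete, the following is a minimal sketch of how SPO triples can be loaded into a graph and how unlinked entity pairs can be ranked by a link prediction score. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which link prediction method KGSKD uses, so the Adamic-Adar index is chosen here only as one common option, and the triples, entity names, and function names are all hypothetical toy values.

```python
import math
from collections import defaultdict
from itertools import combinations

# Toy SPO (subject, predicate, object) triples.
# Hypothetical placeholders for illustration only, not data from the paper.
TRIPLES = [
    ("drug_A", "TREATS", "disease_X"),
    ("drug_A", "AFFECTS", "gene_1"),
    ("drug_B", "AFFECTS", "gene_1"),
    ("drug_B", "AFFECTS", "gene_2"),
    ("gene_1", "ASSOCIATED_WITH", "disease_X"),
    ("gene_2", "ASSOCIATED_WITH", "disease_X"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map from SPO triples (predicates ignored)."""
    adj = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum of 1/log(degree) over common neighbours of u and v."""
    score = 0.0
    for z in adj[u] & adj[v]:
        degree = len(adj[z])
        if degree > 1:  # log(1) == 0 would divide by zero
            score += 1.0 / math.log(degree)
    return score


def predict_links(triples, top_k=3):
    """Rank entity pairs that are not yet linked by their Adamic-Adar score."""
    adj = build_adjacency(triples)
    candidates = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        score = adamic_adar(adj, u, v)
        if score > 0:
            candidates.append((u, v, score))
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[2], c[0], c[1]))
    return candidates[:top_k]


if __name__ == "__main__":
    for u, v, score in predict_links(TRIPLES):
        print(f"{u} -- {v}: {score:.3f}")
```

In this toy graph, `drug_B` and `disease_X` are not directly linked but share two neighbours, so the pair ranks among the top candidates. A real system would also keep predicate types and entity semantic types, which this sketch discards for brevity.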
Received: 13 July 2020
Published: 04 December 2020
Corresponding Author: Hu Zhengyin
E-mail: huzy@clas.ac.cn