Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper reviews literature-based discovery (LBD) studies from the past decade to identify the latest research progress, development trends, and challenges in this field. [Coverage] We searched Web of Science, CNKI, and Baidu Academic with the queries “literature based discovery” and “literature AND knowledge discovery”, together with the Chinese queries “文献知识发现” (literature knowledge discovery) and “文献AND知识挖掘” (literature AND knowledge mining), limited publication dates to 2010 through 2020, and selected 72 representative publications for review. [Methods] We organized these studies along four aspects: research objects, methods and techniques, evaluation of results, and typical applications. We then summarized the development trends of LBD and the challenges it faces. [Results] LBD research shows four trends: more complex research objects, more intelligent analysis methods, richer discovery results, and more practice-oriented applications and services. LBD faces major challenges in multi-source heterogeneous data fusion, interpretability of knowledge discovery, evaluation of result validity, and collaboration among experts from multiple fields. [Limitations] This review is based mainly on the published literature and does not adequately cover LBD tools and systems or industry applications. [Conclusions] As an interdisciplinary field spanning information science, informatics, and data science, LBD is of great significance for mining implicit knowledge across disciplines and providing high-quality subject-oriented knowledge services, but many challenges remain before it can truly support potential new scientific discoveries.
Dai Bing, Hu Zhengyin. Review of Studies on Literature-Based Discovery[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 1-12.