Constructing Large-scale Knowledge Graph for Massive Sci-Tech Literature
Du Yue1,2, Chang Zhijun1,2, Dong Mei1,2, Qian Li1,2, Wang Ying1
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper builds a large-scale knowledge graph for scientific research that meets the needs of sci-tech information services and improves on the data consistency of traditional models. [Methods] First, we propose an implicit knowledge graph construction method. Then, we use identification tools for entity feature fields and implicit relationships to continuously update entities and discover relationships between them. [Results] We evaluated the proposed model on a big data platform holding petabyte-scale sci-tech literature. When entity data changes, the implicit knowledge graph updates only the entity data and leaves the relationships untouched. The model could retrieve all scholars from one institution through a predefined interface, and its average processing time was about one hundredth of that of a triple-based knowledge graph. [Limitations] Cases that do not fit the implicit relational data structure are difficult to persist, and the entity data must be stored in a cluster equipped with a search engine. [Conclusions] The proposed method effectively mitigates the data consistency problems caused by changes in entity information. It helps construct a large-scale knowledge graph for scientific research, which benefits the management, dissemination, and utilization of sci-tech knowledge.
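The idea of keeping relationships implicit can be illustrated with a minimal sketch. It is not the authors' implementation: the class `ImplicitGraph`, the field name `institution_id`, and the in-memory inverted index (a toy stand-in for the search-engine cluster the abstract mentions) are all assumptions made for illustration. Each entity is stored once, and a relationship is resolved at query time by matching a feature field, so renaming an institution touches a single record and no relationship data.

```python
from collections import defaultdict


class ImplicitGraph:
    """Hypothetical toy store: entities plus an inverted index on their fields."""

    def __init__(self):
        self.entities = {}             # entity_id -> record (dict of fields)
        self.index = defaultdict(set)  # (field, value) -> set of entity ids

    def upsert(self, entity_id, record):
        """Insert or update one entity; only this entity's index entries change."""
        old = self.entities.get(entity_id)
        if old is not None:
            # Drop stale index entries before re-indexing the new record.
            for field, value in old.items():
                self.index[(field, value)].discard(entity_id)
        self.entities[entity_id] = dict(record)
        for field, value in record.items():
            self.index[(field, value)].add(entity_id)

    def related(self, field, value, entity_type):
        """Resolve an implicit relationship by looking up a feature field."""
        ids = self.index.get((field, value), set())
        return sorted(i for i in ids if self.entities[i]["type"] == entity_type)


g = ImplicitGraph()
g.upsert("inst:1", {"type": "institution", "name": "Old Name"})
g.upsert("sch:1", {"type": "scholar", "name": "A", "institution_id": "inst:1"})
g.upsert("sch:2", {"type": "scholar", "name": "B", "institution_id": "inst:1"})
g.upsert("sch:3", {"type": "scholar", "name": "C", "institution_id": "inst:2"})

# Renaming the institution rewrites one entity record; scholar records
# and the scholar-institution relationship are left as they were.
g.upsert("inst:1", {"type": "institution", "name": "New Name"})

scholars = g.related("institution_id", "inst:1", "scholar")
print(scholars)
```

In a triple-based store, by contrast, a change to an entity may require rewriting every triple that embeds the changed value, which is the consistency problem the abstract refers to.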
Received: 11 April 2022
Published: 28 March 2023
Fund: Literature and Information Capacity Building Project of Chinese Academy of Sciences (Y9100901)
Corresponding Author: Chang Zhijun, ORCID: 0000-0001-9211-8599, E-mail: changzj@mail.las.ac.cn