Constructing Large-scale Knowledge Graph for Massive Sci-Tech Literature
Du Yue1,2, Chang Zhijun1,2, Dong Mei1,2, Qian Li1,2, Wang Ying1
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper builds a large-scale knowledge graph for scientific research that meets the needs of sci-tech information services and improves on the data consistency of traditional models. [Methods] First, we proposed a method for constructing an implicit knowledge graph. Then, we used tools that identify entity feature fields and implicit relationships to continuously update entities and discover the relationships between them. [Results] We evaluated the proposed model on a big data platform for PB-level sci-tech literature. When entity data changes, the implicit knowledge graph updates only the entity data and does not modify the relationships. The model could retrieve all scholars from one institution through a predefined interface, and its average processing time was one hundredth of that of a triple-based knowledge graph. [Limitations] Cases that do not fit the implicit relational data structure are difficult to solidify, and the entity data must be stored in a cluster equipped with a search engine. [Conclusions] The proposed method effectively mitigates the data consistency issues caused by changes in entity information. It helps construct large-scale knowledge graphs for scientific research, which benefits the management, dissemination, and utilization of sci-tech knowledge.
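The abstract does not give the data structures behind the implicit knowledge graph, so the following is only a minimal sketch of the general idea under stated assumptions: relationships are not stored as triples but are resolved at query time from feature fields on the entity records. The class names (`Entity`, `ImplicitGraph`), the `institution_id` field, and the `scholars_of` method are hypothetical, and a plain in-memory dictionary stands in for the search-engine cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity record; `fields` holds its feature fields (hypothetical schema)."""
    entity_id: str
    entity_type: str
    fields: dict = field(default_factory=dict)


class ImplicitGraph:
    """Toy in-memory stand-in for an entity store backed by a search engine."""

    def __init__(self):
        self._entities = {}

    def upsert(self, entity):
        # Updating an entity touches only its own record; no edge list exists.
        self._entities[entity.entity_id] = entity

    def find(self, entity_type, **field_filters):
        # A relationship is resolved at query time by matching feature fields.
        return [
            e for e in self._entities.values()
            if e.entity_type == entity_type
            and all(e.fields.get(k) == v for k, v in field_filters.items())
        ]

    def scholars_of(self, institution_id):
        # Hypothetical "predefined interface": all scholars of one institution.
        return self.find("scholar", institution_id=institution_id)


graph = ImplicitGraph()
graph.upsert(Entity("inst-1", "institution", {"name": "Example Library"}))
graph.upsert(Entity("sch-1", "scholar", {"name": "A", "institution_id": "inst-1"}))
graph.upsert(Entity("sch-2", "scholar", {"name": "B", "institution_id": "inst-1"}))
graph.upsert(Entity("sch-3", "scholar", {"name": "C", "institution_id": "inst-2"}))

# Renaming the institution rewrites one entity record and nothing else.
graph.upsert(Entity("inst-1", "institution", {"name": "Example Library (renamed)"}))

scholar_ids = sorted(e.entity_id for e in graph.scholars_of("inst-1"))
print(scholar_ids)
```

In this sketch the scholar-to-institution link survives the rename because it depends only on the stable identifier held in the scholar's feature field. In a triple store that embeds entity attributes in its statements, the same change could require rewriting many triples.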