Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (1): 9-20    DOI: 10.11925/infotech.2096-3467.2017.1341
Original Article
Big Linked Data Management: Challenges, Solutions and Practices
Shen Zhihong1, Yao Chang2, Hou Yanfei1, Wu Linhuan3, Li Yuepeng1
1(Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
2(National Natural Science Foundation, Beijing 100085, China)
3(Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China)
Abstract  

[Objective] This article analyzed the concept, connotation, and characteristics of big linked data and explored possible solutions to the technical challenges of managing it. [Methods] We proposed a new model based on NoSQL data management, distributed graph computing, and big data pipeline technologies, and on that basis designed and developed gETL, a large-scale graph data warehouse processing system. [Results] The proposed system was used in the NSFC-KBMS and WDCM projects, where it effectively managed large-scale knowledge data and biological data. [Limitations] The proposed system could be further improved as it is applied to new applications. [Conclusions] NoSQL data storage, distributed graph computing, and big data pipeline technologies, together with the gETL system, help address the challenges of big linked data management.

Key words: Linked Data; Knowledge Graph; Big Data; Big Linked Data
Received: 12 December 2017      Published: 05 February 2018
ZTFLH:  TP393  

Cite this article:

Shen Zhihong, Yao Chang, Hou Yanfei, Wu Linhuan, Li Yuepeng. Big Linked Data Management: Challenges, Solutions and Practices. Data Analysis and Knowledge Discovery, 2018, 2(1): 9-20.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1341     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I1/9
