The Research Practices of DataBase Cloud Storage Using Hadoop/HBase for the Pharmacogenomics Data
Fan Yunman, Hong Na, Qian Qing, Fang An
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China

Abstract [Objective] To explore new ideas and methods for handling large-scale biomedical data, and to gain first-hand experience in importing, storing, retrieving, and bulk exporting it. [Methods] We analyze the characteristics of large-scale biomedical data and compare traditional relational databases (represented by Oracle) with NoSQL databases (represented by HBase) as solutions to the big data problem, covering their technologies, advantages, and disadvantages in terms of both theory and test results. Taking a pharmacogenomics data storage system as an example, we test the performance of Oracle and HBase. [Results] In practical application, HBase has a large advantage over Oracle when processing large volumes of data. [Limitations] The study lacks deep mining and analysis of the pharmacogenomics data, and future research needs in-depth technical optimization of Hadoop/HBase. [Conclusions] In this experiment, HBase can meet the storage requirements of large-scale biomedical data.
Received: 05 November 2014
Published: 11 June 2015