Analysis of MapReduce Principle and Its Main Implementation Platforms
Kang Liyun, Wang Xiaoyue, Bai Rujiang
Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049, China
Abstract: Motivated by the problems of processing speed, storage space, fault tolerance, and access time that arise in massive data processing, this paper analyzes the principle and implementation process of Google's MapReduce programming model and introduces four main MapReduce implementation platforms: Hadoop, Phoenix, Disco, and Mars. It then compares them in four aspects (programming language, underlying platform, functions and features, and applications) to provide a comprehensive understanding of the MapReduce programming model and its application platforms.
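The map/shuffle/reduce principle summarized above can be illustrated with a minimal, single-process Python sketch of the classic word-count job. This is an illustration under stated assumptions, not code from the paper or from Hadoop, Phoenix, Disco, or Mars: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and an in-memory dictionary stands in for the distributed shuffle that a real framework performs across machines, cores, or GPU threads.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Hypothetical mapper: emit an intermediate (word, 1) pair per word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Hypothetical reducer: combine all values that share one key."""
    return sum(counts)


def run_mapreduce(inputs, mapper, reducer) -> dict:
    """Single-process stand-in for a MapReduce runtime (an assumption).

    A real platform would run mappers and reducers in parallel and move
    intermediate data over the network; here everything stays in memory.
    """
    intermediate = defaultdict(list)
    # Map phase: each input record is processed independently.
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            # Shuffle: group intermediate values by their key.
            intermediate[k2].append(v2)
    # Reduce phase: one reducer call per distinct intermediate key,
    # visited in sorted key order as MapReduce sorts keys before reducing.
    return {k2: reducer(k2, vs) for k2, vs in sorted(intermediate.items())}


if __name__ == "__main__":
    docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the mapper sees only one record and the reducer sees only one key's values, neither needs shared state; that independence is what lets the platforms compared in the paper spread the two phases over a cluster, a multi-core machine, or a GPU.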
Received: 21 September 2011
Published: 23 March 2012