New Technology of Library and Information Service  2012, Vol. 28 Issue (2): 60-67    DOI: 10.11925/infotech.1003-3513.2012.02.10
Analysis of MapReduce Principle and Its Main Implementation Platforms
Kang Liyun, Wang Xiaoyue, Bai Rujiang
Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049, China
Abstract  Massive data processing raises problems of processing speed, storage space, fault tolerance, and access time, among others. To address them, this paper analyzes the principle and implementation process of the Google MapReduce programming model and introduces four main MapReduce implementation platforms: Hadoop, Phoenix, Disco, and Mars. It then compares the four platforms in four respects (programming language, building platform, functions and features, and applications) to give a comprehensive understanding of the MapReduce programming model and its application platforms.
Key words: MapReduce; Implementation platform; Hadoop; Phoenix; Disco; Mars
Received: 21 September 2011      Published: 23 March 2012
CLC number: G250 TP391

Cite this article:

Kang Liyun, Wang Xiaoyue, Bai Rujiang. Analysis of MapReduce Principle and Its Main Implementation Platforms. New Technology of Library and Information Service, 2012, 28(2): 60-67.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.02.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I2/60
