New Technology of Library and Information Service  2012, Vol. 28 Issue (2): 60-67    DOI: 10.11925/infotech.1003-3513.2012.02.10
Analysis of MapReduce Principle and Its Main Implementation Platforms
Kang Liyun, Wang Xiaoyue, Bai Rujiang
Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049, China
Abstract  In view of the problems of processing speed, storage space, fault tolerance, access time and other factors in massive data processing, this paper analyzes the principle and implementation process of the Google MapReduce programming model and introduces four main MapReduce implementation platforms: Hadoop, Phoenix, Disco and Mars. It then compares them in four respects (programming language, building platform, functions and features, and applications) to give a comprehensive understanding of the MapReduce programming model and its application platforms.
Key words  MapReduce; Implementation platform; Hadoop; Phoenix; Disco; Mars
Received: 21 September 2011      Published: 23 March 2012

G250 TP391

 

Cite this article:

Kang Liyun, Wang Xiaoyue, Bai Rujiang. Analysis of MapReduce Principle and Its Main Implementation Platforms. New Technology of Library and Information Service, 2012, 28(2): 60-67.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.02.10     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I2/60
