New Technology of Library and Information Service, 2012, Vol. 28, Issue 2: 60-67     https://doi.org/10.11925/infotech.1003-3513.2012.02.10
Information Analysis and Research
Analysis of MapReduce Principle and Its Main Implementation Platforms
Kang Liyun, Wang Xiaoyue, Bai Rujiang
Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049, China
Abstract: To address the problems that massive data processing faces in processing speed, storage space, fault tolerance, and access time, this paper analyzes the principle and execution flow of the Google MapReduce programming model and introduces four main MapReduce implementation platforms: Hadoop, Phoenix, Disco, and Mars. It then compares the four platforms in four respects (programming language, underlying platform, functional features, and application domains), with the aim of giving a comprehensive understanding of the MapReduce programming model and its application platforms.
Key words: MapReduce; Implementation platform; Hadoop; Phoenix; Disco; Mars
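
The paper's own code is not reproduced on this page. As a rough illustration of the programming model the abstract refers to, the following is a minimal single-process sketch of the map, shuffle, and reduce phases using the usual word-count example. The function names (`map_words`, `reduce_counts`, `run_mapreduce`) are hypothetical and are not taken from Hadoop, Phoenix, Disco, or Mars; a real platform would distribute these phases across machines, cores, or GPU threads and add fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit a (word, 1) pair per word."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase (hypothetical helper): sum all counts for one word."""
    return word, sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    # Shuffle: group every intermediate value under its intermediate key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce: call the reducer once per distinct key, in sorted key order.
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "map reduce map"), ("d2", "reduce shuffle")]
    print(run_mapreduce(docs, map_words, reduce_counts))
    # prints {'map': 2, 'reduce': 2, 'shuffle': 1}
```

The user supplies only the mapper and the reducer; grouping by key is done by the framework, which is the division of labor the MapReduce model is generally described as having.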
Received: 2011-09-21      Published: 2012-03-23
Classification: G250; TP391
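Funding:

This work is one of the research outcomes of the National Social Science Fund of China general project "Research on Automatic Classification of Massive Online Academic Literature" (No. 10BTQ047) and the Shandong Provincial Natural Science Foundation project "Research on Parallel Processing and Semantic Classification of Large-Scale Academic Literature" (No. ZR2011GL025).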

Cite this article:
Kang Liyun, Wang Xiaoyue, Bai Rujiang. Analysis of MapReduce Principle and Its Main Implementation Platforms[J]. New Technology of Library and Information Service, 2012, 28(2): 60-67.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2012.02.10      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2012/V28/I2/60
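Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn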