Building the Open Source Mass Data Mining Platform Based on Cloud Computing
Zhao Huaming
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract To meet the internal data processing needs of information organizations, this paper analyzes the framework of the Amazon Elastic MapReduce (EMR) platform and proposes building a dynamic, elastic, open source mass data mining platform based on cloud computing. It provides an implementation roadmap, an example of massive text data processing, and an analysis of the advantages of the open source EMR platform. The implementation plan has three parts: building a dynamic virtual environment for cloud computing, creating a virtual server template for Hadoop, and deploying and running Cloudera and Cloudera Desktop. Applying the open source EMR platform effectively addresses the problem of server sprawl, improves the utilization of network computing resources, and enhances the rapid deployment capability and agility of distributed data processing services.
Received: 26 September 2010
Published: 04 January 2011