To meet the internal data processing needs of information organizations, this paper analyzes the framework of the Amazon Elastic MapReduce (EMR) platform and proposes building a dynamic, elastic, open source mass data mining platform based on cloud computing. It provides a roadmap for a successful implementation, an example of massive text data processing, and an analysis of the advantages of the open source EMR platform. The implementation plan has three parts: building a dynamic virtual environment for cloud computing, creating a virtual server template for Hadoop, and deploying and running Cloudera and Cloudera Desktop. Applying the open source EMR platform effectively addresses server sprawl, improves the utilization of network computing resources, and enhances the rapid deployment capability and agility of distributed data processing services.
Zhao Huaming (赵华茗). Building the Open Source Mass Data Mining Platform Based on Cloud Computing[J]. New Technology of Library and Information Service (现代图书情报技术), 2010, 26(10): 76-81.