New Technology of Library and Information Service  2010, Vol. 26 Issue (10): 76-81    DOI: 10.11925/infotech.1003-3513.2010.10.13
Building the Open Source Mass Data Mining Platform Based on Cloud Computing
Zhao Huaming
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
To meet the internal data processing needs of information organizations, this paper analyzes the framework of the Amazon Elastic MapReduce (EMR) platform and proposes building a dynamic, elastic, open source mass data mining platform based on cloud computing. It provides a roadmap for successful implementation, an example of massive text data processing, and an analysis of the advantages of the open source EMR platform. The implementation plan has three parts: building a dynamic virtual cloud computing environment, creating a Hadoop virtual server template, and deploying and running Cloudera and Cloudera Desktop. Applying the open source EMR platform effectively addresses server sprawl, improves the utilization of network computing resources, and enhances the rapid deployment capability and agility of distributed data processing services.

Key words: Cloud computing; Mass data mining; Virtualization; Distributed computing; Xen; Cloudera; Hadoop
Received: 26 September 2010      Published: 04 January 2011



Zhao Huaming. Building the Open Source Mass Data Mining Platform Based on Cloud Computing. New Technology of Library and Information Service, 2010, 26(10): 76-81.

