New Technology of Library and Information Service, 2010, Vol. 26, Issue (10): 76-81. DOI: 10.11925/infotech.1003-3513.2010.10.13
Building the Open Source Mass Data Mining Platform Based on Cloud Computing
Zhao Huaming
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

Aiming to meet the internal data processing needs of information organizations, this paper analyzes the framework of the Amazon Elastic MapReduce (EMR) platform and proposes building a dynamic, elastic, open source mass data mining platform based on cloud computing. It provides a roadmap for a successful implementation, an example of massive text data processing, and an analysis of the advantages of the open source EMR platform. The implementation plan has three parts: building a dynamic virtual cloud computing environment, creating a Hadoop virtual server template, and deploying and running Cloudera and Cloudera Desktop. Applying the open source EMR platform effectively solves the problem of server sprawl, improves the utilization of network computing resources, and enhances the rapid deployment capability and agility of distributed data processing services.
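The abstract mentions a massive text-processing example run on Hadoop but does not reproduce its code. As a hedged illustration of the Map/Reduce model such a job follows, here is a minimal local sketch in Python. Word count, the tokenization rule, and the function names `mapper`, `reducer` and `run_job` are assumptions chosen for illustration, not the author's implementation. On a real cluster the mapper and reducer would run as distributed tasks, and Hadoop itself would perform the shuffle/sort step that `sorted()` imitates here.

```python
"""Local simulation of a Hadoop-style MapReduce word count.

Assumption: word count is a stand-in for the paper's text-processing job;
it is not taken from the paper.
"""
import re
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every lower-cased alphanumeric token.
    for line in lines:
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    # Reduce phase: input must be sorted by key, as Hadoop guarantees
    # between the map and reduce phases; sum the counts for each word.
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Local stand-in for the shuffle/sort phase: sort mapper output by key.
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    docs = ["Cloud computing and Hadoop", "Hadoop on Xen and cloud"]
    print(run_job(docs))
    # → {'and': 2, 'cloud': 2, 'computing': 1, 'hadoop': 2, 'on': 1, 'xen': 1}
```

Keeping the mapper and reducer as pure functions over iterables mirrors how Hadoop runs them as independent tasks, so the same logic could later be moved onto a cluster while the local `run_job` stays useful for testing on small inputs.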

Key words: Cloud computing; Mass data mining; Virtualization; Distributed computing; Xen; Cloudera; Hadoop
Received: 26 September 2010; Published: 04 January 2011
CLC number: TP393

 

Cite this article:

Zhao Huaming. Building the Open Source Mass Data Mining Platform Based on Cloud Computing. New Technology of Library and Information Service, 2010, 26(10): 76-81.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.10.13     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I10/76

