New Technology of Library and Information Service  2010, Vol. 26 Issue (10): 76-81    DOI: 10.11925/infotech.1003-3513.2010.10.13
Building the Open Source Mass Data Mining Platform Based on Cloud Computing
Zhao Huaming
National Science Library, Chinese Academy of Sciences, Beijing 100190, China

To meet the internal data processing needs of information organizations, this paper analyzes the framework of the Amazon Elastic MapReduce (EMR) platform and proposes building a dynamic, elastic, open source mass data mining platform based on cloud computing. It provides a roadmap for successful implementation, an example of massive text data processing, and an analysis of the advantages of the open source EMR platform. The implementation plan has three parts: building a dynamic virtual environment for cloud computing, creating a virtual server template for Hadoop, and deploying and running Cloudera and Cloudera Desktop. Applying the open source EMR platform effectively addresses the problem of server sprawl, improves the utilization of network computing resources, and enhances the rapid deployment capability and agility of distributed data processing services.
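
As a rough illustration of the kind of text processing job such a platform runs, the sketch below imitates the map, shuffle, and reduce stages of a word count in plain Python. It is not the example from the paper and does not call Hadoop or Cloudera; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the whole job runs in a single process so that the data flow is easy to follow.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Sort pairs by key so equal keys become adjacent.

    A MapReduce framework does this between the map and reduce stages;
    here it is an in-memory sort (an assumption made for illustration).
    """
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Sum the counts within each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run the three stages in sequence and collect the result in a dict."""
    return dict(reduce_phase(shuffle(map_phase(lines))))


if __name__ == "__main__":
    sample = ["Hadoop on Xen", "hadoop on cloud"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the map and reduce functions would run in parallel on separate nodes and the framework would handle the shuffle; the elasticity described in the abstract comes from adding or removing the virtual servers that host those tasks.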

Key words: Cloud computing; Mass data mining; Virtualization; Distributed computing; Xen; Cloudera; Hadoop
Received: 26 September 2010      Published: 04 January 2011



Cite this article:

Zhao Huaming. Building the Open Source Mass Data Mining Platform Based on Cloud Computing. New Technology of Library and Information Service, 2010, 26(10): 76-81.

