New Technology of Library and Information Service  2009, Vol. 25 Issue (4): 23-26    DOI: 10.11925/infotech.1003-3513.2009.04.05
A Method for Generating Co-occurrence Matrix of Mass Data Based on Hadoop
Yang Daiqing1,2  Zhang Zhixiong1
1 (National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2 (Institute of Scientific and Technical Information of China, Beijing 100038, China)

Mass data processing is a central concern in information technology. This paper introduces the architecture of Hadoop, an open-source parallel computing system, analyzes the MapReduce programming framework as implemented in Hadoop, and proposes a method for generating the co-occurrence matrix of mass data through multiple MapReduce operations.
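
The abstract does not spell out the individual MapReduce steps, so the following is only a minimal sketch of how a co-occurrence matrix might be built from two chained map/reduce passes. It is not the authors' implementation and uses no Hadoop API: the shuffle phase is simulated in memory, and all function names (`map_pairs`, `shuffle`, `cooccurrence`, `map_to_rows`, `build_matrix`) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_pairs(record):
    """Pass 1 map: emit ((a, b), 1) for each unordered pair of distinct terms."""
    terms = sorted(set(record))  # sort so (a, b) and (b, a) share one key
    for a, b in combinations(terms, 2):
        yield (a, b), 1


def shuffle(mapped):
    """Group values by key; an in-memory stand-in for the shuffle/sort phase."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def cooccurrence(records):
    """Pass 1: count how many records contain each term pair."""
    mapped = (kv for record in records for kv in map_pairs(record))
    # Reduce step: sum the 1s emitted for each pair.
    return {pair: sum(ones) for pair, ones in shuffle(mapped).items()}


def map_to_rows(pair, count):
    """Pass 2 map: re-key each pair count by term so rows can be assembled."""
    a, b = pair
    yield a, (b, count)
    yield b, (a, count)


def build_matrix(pair_counts):
    """Pass 2: collect the cells of each term into one symmetric matrix row."""
    mapped = (kv for pair, n in pair_counts.items() for kv in map_to_rows(pair, n))
    # Reduce step: turn each term's list of (neighbor, count) cells into a row.
    return {term: dict(cells) for term, cells in shuffle(mapped).items()}


# Hypothetical input: each record is the list of terms attached to one document.
records = [
    ["hadoop", "mapreduce", "hdfs"],
    ["hadoop", "mapreduce"],
    ["hdfs", "hadoop"],
]
counts = cooccurrence(records)
matrix = build_matrix(counts)
```

Here `counts` holds one entry per term pair and `matrix` holds one sparse row per term. On a real cluster each pass would run as a separate job, with the output of the first written to the distributed file system and read by the second.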

Key words: Hadoop; MapReduce; Co-occurrence matrix; Open-source software
Received: 28 March 2009      Published: 25 April 2009


Corresponding author: Yang Daiqing
About authors: Yang Daiqing, Zhang Zhixiong

Cite this article:

Yang Daiqing, Zhang Zhixiong. A Method for Generating Co-occurrence Matrix of Mass Data Based on Hadoop. New Technology of Library and Information Service, 2009, 25(4): 23-26.


