New Technology of Library and Information Service  2009, Vol. 3 Issue (1): 10-15    DOI: 10.11925/infotech.1003-3513.2009.01.03
Study on Harvest Strategy in Web Archive
Liu Lan1, Wu Zhenxin, Zhang Zhixiong, Xu Lin3
1(National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2(Graduate University of Chinese Academy of Sciences, Beijing 100049, China)
3(Library of Southwest Jiaotong University, Chengdu 610031, China)

This paper summarizes three commonly used harvest strategies in Web archiving: integrity harvest, selective harvest, and hybrid harvest. It then compares the characteristics, key issues, and representative projects of each strategy. Finally, it analyzes the key factors to consider when choosing a harvest strategy and makes general recommendations.

Key words: Web Archive; Harvest strategy; Integrity harvest; Selective harvest; Hybrid harvest
Received: 24 September 2008      Published: 25 January 2009


Corresponding Author: Liu Lan
About authors: Liu Lan, Wu Zhenxin, Zhang Zhixiong, Xu Lin

Cite this article:

Liu Lan, Wu Zhenxin, Zhang Zhixiong, Xu Lin. Study on Harvest Strategy in Web Archive. New Technology of Library and Information Service, 2009, 3(1): 10-15.


