New Technology of Library and Information Service  2009, Vol. 25 Issue (7-8): 6-10    DOI: 10.11925/infotech.1003-3513.2009.07-08.02
Constructing a System for Harvesting and Preserving Chinese Web Information Resources Based on Open Source Software
Wu Zhenxin, Qu Yunpeng, Li Chengwen, Xiang Jing1,2
1 (National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2 (Graduate University of Chinese Academy of Sciences, Beijing 100049, China)
3 (National Library of China, Beijing 100081, China)
Abstract  

This paper discusses how to use open source software to construct a system for harvesting and preserving Chinese Web information resources, introduces thematic harvesting experiments based on a selective strategy, and gives a preliminary analysis and summary of the experimental results.

Key words: Open source software; Web information resources; Harvest of resources; Preservation
Received: 02 March 2009      Published: 25 August 2009
CLC number: TP202
Corresponding author: Wu Zhenxin, E-mail: wuzx@mail.las.ac.cn
About authors: Wu Zhenxin, Qu Yunpeng, Li Chengwen, Xiang Jing

Cite this article:

Wu Zhenxin, Qu Yunpeng, Li Chengwen, Xiang Jing. Constructing a System for Harvesting and Preserving Chinese Web Information Resources Based on Open Source Software. New Technology of Library and Information Service, 2009, 25(7-8): 6-10.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.07-08.02     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I7-8/6
