New Technology of Library and Information Service, 2015, Vol. 31, Issue (4): 1-9. DOI: 10.11925/infotech.1003-3513.2015.04.01
Developing Web Archive System of International Institutions Based on IIPC Open Source Software
Wu Zhenxin, Zhang Zhixiong, Xie Jing, Hu Jiying
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] Develop a Web archive system for international institutions. [Methods] Based on the IIPC open source software framework, this paper applies a three-layer expansion strategy on the acquisition side, adds automatic uploading and reporting functions to the acquisition client, develops a WARC parser that analyzes the content of WARC files, and uses Solr as the indexer. [Results] The system implements acquisition expansion, raises the level of workflow automation by adding function modules to the acquisition client, extracts more information from WARC files through the WARC parser modules, and uses Solr to enrich indexing and retrieval services. [Limitations] The platform has not yet been verified against a large-scale Web archive. [Conclusions] The expanded Web archive framework is distributed, extensible, and fully automatic.
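The abstract only names the WARC parser and the Solr indexer, so the following is a minimal sketch, not the authors' implementation, of what "analyzing the content of WARC files" for indexing can involve. It assumes uncompressed WARC input laid out as in ISO 28500 (a `WARC/x.y` version line, `Name: value` header fields, a blank line, a content block of `Content-Length` bytes, then two CRLFs). The function names `iter_warc_records` and `to_index_doc` and the index field names (`id`, `url`, `date`, `type`, `content_length`) are hypothetical illustrations; a real Solr schema would define its own fields.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, Iterator, Union


@dataclass
class WarcRecord:
    version: str             # e.g. "WARC/1.0"
    headers: Dict[str, str]  # named fields, original capitalisation kept
    content: bytes           # raw content block (Content-Length bytes)


def _header(headers: Dict[str, str], name: str, default: str = "") -> str:
    """Case-insensitive header lookup (WARC field names are case-insensitive)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


def iter_warc_records(stream: BinaryIO) -> Iterator[WarcRecord]:
    """Yield records from an uncompressed WARC stream.

    Simplifications (assumptions): no gzip support, no folded header values.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in (b"\r\n", b"\n"):
            continue  # blank lines that terminate the previous record
        version = line.decode("utf-8").strip()
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        while True:
            hline = stream.readline()
            if hline in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = hline.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        length = int(_header(headers, "Content-Length", "0"))
        content = stream.read(length)
        if len(content) != length:
            raise ValueError("truncated WARC record")
        yield WarcRecord(version, headers, content)


def to_index_doc(record: WarcRecord) -> Dict[str, Union[str, int]]:
    """Flatten a record into a dict for an indexer such as Solr.

    The field names here are hypothetical placeholders.
    """
    return {
        "id": _header(record.headers, "WARC-Record-ID"),
        "url": _header(record.headers, "WARC-Target-URI"),
        "date": _header(record.headers, "WARC-Date"),
        "type": _header(record.headers, "WARC-Type"),
        "content_length": len(record.content),
    }


if __name__ == "__main__":
    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.org/\r\n"
        b"Content-Length: 3\r\n"
        b"\r\n"
        b"abc\r\n\r\n"
    )
    for rec in iter_warc_records(io.BytesIO(sample)):
        print(to_index_doc(rec))
```

Each resulting dict could then be sent to Solr as a JSON document. The sketch deliberately omits that network step, as well as the per-record gzip compression commonly used for `.warc.gz` files, both of which a production parser would need to handle.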

Key words: Open source software; Web archive; System development
Received: 03 September 2014      Published: 21 May 2015
CLC Number: G352

Cite this article:

Wu Zhenxin, Zhang Zhixiong, Xie Jing, Hu Jiying. Developing Web Archive System of International Institutions Based on IIPC Open Source Software. New Technology of Library and Information Service, 2015, 31(4): 1-9.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.04.01     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I4/1

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn