Research on Automatic Archiving System for Institutional Repositories
Cui Yuhong
Beijing Institute of Technology Library, Beijing 100081, China

Abstract: This paper introduces an experimental system (DAAS) that automatically harvests articles written by an institution's researchers and ingests their metadata into a local DSpace platform. The system implements a semi-automatic approach to populating institutional repositories (IRs), consisting of five steps: information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. Building on key Nutch components, the paper describes in detail how URLs are parsed and how metadata is extracted from unstructured Web pages using a rule-based filter. Future research will focus on machine-learning algorithms.
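
As a rough illustration of the rule-based filtering, metadata extraction, and metadata mapping steps named above, the following Python sketch may help. It is not the DAAS implementation: the rule patterns, the `citation_*` meta tag names, the field mapping, and the function names `accept_url` and `extract_metadata` are all assumptions made for this example, and reading metadata from `<meta>` tags is a simplification of extraction from unstructured pages.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL rules; the abstract does not list the system's actual rules.
INCLUDE_RULES = [
    re.compile(r"/(article|paper|abstract)s?/", re.I),
    re.compile(r"\.pdf$", re.I),
]
EXCLUDE_RULES = [re.compile(r"/(login|search|cart)\b", re.I)]

# Assumed mapping from page-level meta names to DSpace-style Dublin Core fields.
FIELD_MAP = {
    "citation_title": "dc.title",
    "citation_author": "dc.contributor.author",
    "citation_date": "dc.date.issued",
}

META_TAG = re.compile(r'<meta\s+name="([^"]+)"\s+content="([^"]*)"', re.I)


def accept_url(url: str) -> bool:
    """Information filtering: keep a URL only if no exclude rule and
    at least one include rule matches its path."""
    path = urlparse(url).path
    if any(rule.search(path) for rule in EXCLUDE_RULES):
        return False
    return any(rule.search(path) for rule in INCLUDE_RULES)


def extract_metadata(html: str) -> dict:
    """Metadata extraction and mapping: collect known meta tags and
    store their values under the mapped repository field names."""
    record = {}
    for name, content in META_TAG.findall(html):
        field = FIELD_MAP.get(name.lower())
        if field:
            # Fields such as author may repeat, so values are kept in lists.
            record.setdefault(field, []).append(content)
    return record


if __name__ == "__main__":
    page = (
        '<meta name="citation_title" content="A Sample Article">'
        '<meta name="citation_author" content="Author One">'
    )
    print(accept_url("http://example.org/articles/123"))
    print(extract_metadata(page))
```

In a Nutch-based crawler, the URL rules would more likely live in the crawler's filter configuration than in application code; they are kept inline here only so the sketch is self-contained. Copyright verification and the final deposit into DSpace are left out.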
Received: 08 October 2010
Published: 07 January 2011