This paper introduces an experimental system (DAAS) that automatically harvests articles written by an institution's researchers and ingests their metadata into a local DSpace platform. The system implements a semi-automatic approach to populating institutional repositories (IRs), consisting of information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. Building on key Nutch components, the paper describes in detail how URLs are parsed and how metadata is extracted from unstructured Web pages using a rule-based filter. Future research will focus on machine-learning algorithms.
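The filtering and extraction steps can be illustrated with a minimal sketch. It is not the DAAS implementation: the URL rules, the `accept_url` and `extract_metadata` helpers, and the `META_TO_DC` table are hypothetical, and the sketch assumes pages expose `citation_*` meta tags, which is simpler than the unstructured pages the paper targets. The rule list imitates the accept/reject style of Nutch's regex URL filter, and the target field names are DSpace's default Dublin Core fields.

```python
import re
from html.parser import HTMLParser

# Hypothetical rules in the spirit of Nutch's regex URL filter:
# the first matching rule wins; "+" accepts the URL, "-" rejects it.
URL_RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$", re.IGNORECASE)),
    ("+", re.compile(r"^https?://[^/]+/(article|abstract)/")),
    ("-", re.compile(r".")),  # reject everything else
]


def accept_url(url):
    """Return True if the first rule matching the URL is an accept rule."""
    for sign, pattern in URL_RULES:
        if pattern.search(url):
            return sign == "+"
    return False


# Assumed mapping from HTML <meta> names to Dublin Core fields in DSpace.
META_TO_DC = {
    "citation_title": "dc.title",
    "citation_author": "dc.contributor.author",
    "citation_date": "dc.date.issued",
}


class MetaExtractor(HTMLParser):
    """Collect mapped <meta name=... content=...> values into a record."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        field = META_TO_DC.get(attrs.get("name", ""))
        if field and attrs.get("content"):
            # Fields such as authors may repeat, so values are kept in lists.
            self.record.setdefault(field, []).append(attrs["content"])


def extract_metadata(html):
    """Parse an HTML page and return its metadata keyed by Dublin Core field."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.record
```

In such a pipeline, only pages whose URL passes `accept_url` would be handed to `extract_metadata`, and the resulting record would then go through copyright verification before being archived.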
Cui Yuhong. Research on Automatic Archiving System for Institutional Repositories[J]. New Technology of Library and Information Service, 2010, 26(12): 76-80.