New Technology of Library and Information Service  2010, Vol. 26 Issue (12): 76-80    DOI: 10.11925/infotech.1003-3513.2010.12.13
Research on Automatic Archiving System for Institutional Repositories
Cui Yuhong
Beijing Institute of Technology Library, Beijing 100081, China
Abstract  

This paper introduces an experimental system (DAAS) that automatically harvests articles by an institution's researchers and ingests their metadata into a local DSpace platform. The system implements a semi-automatic approach to populating institutional repositories (IRs) that consists of information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. The paper describes in detail how the system, built on key Nutch components, parses URLs and extracts metadata from unstructured Web pages using a rule-based filter. Future research will focus on machine-learning algorithms.
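
The filtering, extraction, and mapping stages can be illustrated with a minimal sketch. It is not the DAAS implementation, whose rules are not given in this abstract. The URL rules imitate the accept/reject style of Nutch's regex URL filter, the `example.edu` domain and the `citation_*` meta tags are assumptions, and the Dublin Core field names are chosen for illustration. Copyright verification and DSpace ingestion are left out.

```python
import re
from html.parser import HTMLParser

# Hypothetical rules in the style of Nutch's regex URL filter:
# "+" accepts, "-" rejects, and the first matching rule decides.
URL_RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$", re.IGNORECASE)),
    ("+", re.compile(r"^https?://([a-z0-9-]+\.)*example\.edu/")),
]


def accept_url(url):
    """Return True if the first matching rule is an accept rule."""
    for sign, pattern in URL_RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: reject


class MetaTagParser(HTMLParser):
    """Collect <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content:
                self.meta.setdefault(name.lower(), []).append(content)


# Assumed mapping from page-level tags to Dublin Core fields.
FIELD_MAP = {
    "citation_title": "dc.title",
    "citation_author": "dc.contributor.author",
    "citation_date": "dc.date.issued",
}


def extract_record(url, html):
    """Filter the URL, extract meta tags, and map them to Dublin Core."""
    if not accept_url(url):
        return None
    parser = MetaTagParser()
    parser.feed(html)
    record = {"dc.identifier.uri": [url]}
    for source, target in FIELD_MAP.items():
        if source in parser.meta:
            record[target] = parser.meta[source]
    return record
```

Reading `<meta>` tags is the simplest stand-in for extraction. Pages without such tags would need rules keyed to their visible layout, which is the harder case the abstract refers to as unstructured Web pages.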

Key words: Institutional repositories; Automatic archive; Information extraction; Nutch; DSpace
Received: 08 October 2010      Published: 07 January 2011
CLC Number: TP39

 

Cite this article:

Cui Yuhong. Research on Automatic Archiving System for Institutional Repositories. New Technology of Library and Information Service, 2010, 26(12): 76-80.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.12.13     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I12/76


