New Technology of Library and Information Service  2010, Vol. 26 Issue (12): 76-80    DOI: 10.11925/infotech.1003-3513.2010.12.13
Research on Automatic Archiving System for Institutional Repositories
Cui Yuhong
Beijing Institute of Technology Library, Beijing 100081, China
Abstract  

This paper introduces an experimental system (DAAS) that automatically harvests articles written by an institution's researchers and ingests their metadata into a local DSpace platform. The system implements a semi-automatic approach to populating institutional repositories (IRs), consisting of information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. Building on key Nutch components, the paper describes in detail how URLs are parsed and how metadata is extracted from unstructured Web pages using a rule-based filter. Future research will focus on machine-learning algorithms.
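
As an illustration only, the filtering, extraction, and mapping steps named in the abstract might look like the minimal Python sketch below. It is not the DAAS implementation, which the abstract says is built on Nutch components. The URL rules imitate the accept/reject, first-match-wins style of Nutch's regex URL filter, but the domain `example.edu`, the `citation_*` meta tag names, and the helper names `accept_url`, `MetaCollector`, and `extract_metadata` are all assumptions made for this example. The target field names are standard DSpace Dublin Core fields.

```python
import re
from html.parser import HTMLParser

# Hypothetical URL rules: "+" accepts, "-" rejects, and the first matching
# rule decides. The domain example.edu is a placeholder, not from the paper.
URL_RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$", re.IGNORECASE)),
    ("+", re.compile(r"^https?://([a-z0-9-]+\.)*example\.edu/")),
    ("-", re.compile(r".")),  # reject everything else
]


def accept_url(url):
    """Return True if the first rule matching the URL is an accept rule."""
    for sign, pattern in URL_RULES:
        if pattern.search(url):
            return sign == "+"
    return False


class MetaCollector(HTMLParser):
    """Collect the content of every <meta name=... content=...> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            self.fields.setdefault(name.lower(), []).append(content.strip())


# Assumed mapping from page-level tag names to DSpace Dublin Core fields.
FIELD_MAP = {
    "citation_title": "dc.title",
    "citation_author": "dc.contributor.author",
    "citation_publication_date": "dc.date.issued",
}


def extract_metadata(html):
    """Extract the mapped fields from one page as {dc_field: [values]}."""
    collector = MetaCollector()
    collector.feed(html)
    record = {}
    for source_name, dc_name in FIELD_MAP.items():
        if source_name in collector.fields:
            record[dc_name] = collector.fields[source_name]
    return record
```

In this sketch a crawler would call `accept_url` on each candidate link and pass the HTML of accepted pages to `extract_metadata`. Copyright verification and the actual deposit into DSpace are left out, since the abstract does not describe how they are performed.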

Key words: Institutional repositories; Automatic archive; Information extraction; Nutch; DSpace
Received: 08 October 2010      Published: 07 January 2011
CLC number: TP39

Cite this article:

Cui Yuhong. Research on Automatic Archiving System for Institutional Repositories. New Technology of Library and Information Service, 2010, 26(12): 76-80.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.12.13     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I12/76


