Research on Automatic Archiving System for Institutional Repositories

doi:10.11925/infotech.1003-3513.2010.12.13

New Technology of Library and Information Service

2010, Vol. 26, Issue (12): 76-80

Application Practice

Cui Yuhong

Beijing Institute of Technology Library, Beijing 100081, China

Abstract

This paper presents DAAS, an experimental system that automatically harvests an institution's scholarly output from online literature databases and deposits it in a local DSpace platform. The system implements a semi-automatic workflow for populating institutional repositories (IRs), consisting of information filtering, metadata extraction, copyright verification, metadata mapping, and data archiving. Built on the core components of Nutch, DAAS applies rule-based filters, configured separately for each journal database, to parse URLs and extract bibliographic information from unstructured Web pages; this process is described in detail. Machine-learning algorithms are identified as the focus of future research.

Keywords: institutional repositories, automatic archiving, information extraction, Nutch, DSpace
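The rule-based filtering, extraction, and mapping steps named in the abstract can be illustrated with a minimal Python sketch. It is not the DAAS implementation, which is built on Nutch; the host name, regular expressions, sample page, and the mapping to Dublin Core elements are all hypothetical and chosen only to show the idea of one rule set per journal database.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical per-database rules: each field maps to a regular expression
# whose single group captures the value. The host and patterns are invented
# for illustration and are not taken from the paper.
RULES: Dict[str, Dict[str, str]] = {
    "journals.example.org": {
        "title": r'<h1 class="article-title">(.*?)</h1>',
        "author": r'<span class="author">(.*?)</span>',
        "journal": r'<div class="source">(.*?),',
        "year": r'<div class="source">.*?(\d{4})',
    },
}

# Assumed mapping from extracted fields to Dublin Core elements.
DC_MAP: Dict[str, str] = {
    "title": "dc.title",
    "author": "dc.contributor.author",
    "journal": "dc.source",
    "year": "dc.date.issued",
}

# Invented sample page standing in for an unstructured article page.
SAMPLE_HTML = """
<h1 class="article-title">A Sample Article</h1>
<span class="author">Zhang San</span>
<span class="author">Li Si</span>
<div class="source">Journal of Examples, 2010, 26(12): 76-80</div>
"""


def select_rules(url: str) -> Optional[Dict[str, str]]:
    """Filtering step: pick the rule set for the URL's host, if one exists."""
    return RULES.get(urlparse(url).netloc)


def extract_metadata(url: str, html: str) -> Dict[str, List[str]]:
    """Extraction step: apply each rule and collect every captured value."""
    rules = select_rules(url)
    if rules is None:
        return {}  # pages from unknown databases are filtered out
    record: Dict[str, List[str]] = {}
    for field, pattern in rules.items():
        matches = re.findall(pattern, html, flags=re.DOTALL)
        if matches:
            record[field] = [m.strip() for m in matches]
    return record


def to_dublin_core(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Mapping step: rename extracted fields to Dublin Core elements."""
    return {DC_MAP[f]: values for f, values in record.items() if f in DC_MAP}


if __name__ == "__main__":
    url = "http://journals.example.org/article/1"
    print(to_dublin_core(extract_metadata(url, SAMPLE_HTML)))
```

Keeping the rules in a table keyed by host means that supporting another journal database only requires a new rule set, not new code. The cost is that each rule set breaks when the corresponding site changes its page layout, which is one motivation for the learning-based approach the abstract names as future work.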

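Received: 2010-10-08    Published: 2011-01-07

CLC number: TP39

Funding:

This work is one of the research outcomes of the Beijing Institute of Technology Basic Research Fund project "Research on the Construction of Institutional Repositories" (Project No. 20061442003).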

Cite this article:

Cui Yuhong. Research on Automatic Archiving System for Institutional Repositories[J]. New Technology of Library and Information Service, 2010, 26(12): 76-80.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.12.13 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I12/76
