New Development of Automatic Metadata Extraction*
Zeng Su1,2 Ma Jianxia1 Zhang Xiuxiu1
1 (The Lanzhou Branch of the National Science Library, Chinese Academy of Sciences, Lanzhou 730000, China)
2 (Graduate University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract: This paper analyzes the practical demand for automatic metadata extraction, reviews related research, and then compares three typical automatic metadata extractors: DROID, NLNZ Metadata Extractor, and Metadata Miner Catalogue PRO. After discussing the current limitations of automatic metadata extraction technology, it summarizes the technology and offers an outlook on its development.
Key words: Metadata, Automatic extraction, Extractor
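Received: 2007-12-17
Published: 2008-04-25
Funding: *This paper is one of the research outcomes of the National Social Science Fund of China project "Research on the Construction and Application of Institutional Repositories" (Project No. 07BTQ019).
Corresponding author: Zeng Su
E-mail: zengs@mail.las.ac.cn
About the authors: Zeng Su, Ma Jianxia, Zhang Xiuxiu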