New Technology of Library and Information Service  2007, Vol. 2 Issue (2): 18-23    DOI: 10.11925/infotech.1003-3513.2007.02.04
Optimizing Extraction of Science Documents’ Metadata in PDF Format Based on XSLT
Chen Junlin   Zhang Wende
(Library of Fuzhou University, Fuzhou 350002, China)
Abstract  

This paper first introduces a format conversion tool and XSLT, the language used to write the extraction rules, and then briefly analyzes the intermediate documents generated when converting PDF to HTML. It next discusses the problems with metadata in scientific documents in PDF format, and finally proposes methods to solve them.
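
The idea of applying extraction rules to the intermediate HTML can be illustrated with a minimal sketch. The paper writes its rules in XSLT; the sketch below instead uses Python's standard `xml.etree.ElementTree` as a stand-in, purely to show what a rule that matches on markup looks like. The sample markup, the font-size heuristics, and the function name `extract_metadata` are all assumptions for illustration and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate document: a well-formed HTML fragment of the kind a
# PDF-to-HTML converter might emit. The tags and font sizes are assumptions.
SAMPLE_HTML = """\
<html>
  <body>
    <p><font size="5">Optimizing Extraction of Metadata</font></p>
    <p><font size="3">Chen Junlin, Zhang Wende</font></p>
    <p><font size="2">Abstract: This paper introduces a conversion tool.</font></p>
  </body>
</html>
"""


def extract_metadata(html_text):
    """Apply two illustrative rules: largest font -> title, next block -> authors."""
    root = ET.fromstring(html_text)
    # Collect every <font> element that carries a numeric size attribute.
    fonts = [f for f in root.iter("font") if f.get("size", "").isdigit()]
    if not fonts:
        return {"title": None, "authors": None}
    # Rule 1 (assumed heuristic): the text set in the largest font is the title.
    title_el = max(fonts, key=lambda f: int(f.get("size")))
    # Rule 2 (assumed heuristic): the font block right after the title holds the authors.
    idx = fonts.index(title_el)
    authors_el = fonts[idx + 1] if idx + 1 < len(fonts) else None
    return {
        "title": (title_el.text or "").strip(),
        "authors": (authors_el.text or "").strip() if authors_el is not None else None,
    }


metadata = extract_metadata(SAMPLE_HTML)
print(metadata)
```

In an XSLT stylesheet, a comparable rule would typically be expressed as a template whose XPath expression selects the matching element. Real converter output is rarely this regular, which is presumably why the rules need the kind of tuning the paper discusses.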

Key words: PDF, PDF to HTML, XSLT, Metadata
Received: 10 November 2006      Published: 25 February 2007
TP311.13
Corresponding Author: Chen Junlin     E-mail: bluesea_cc@163.com
About authors: Chen Junlin, Zhang Wende

Cite this article:

Chen Junlin, Zhang Wende. Optimizing Extraction of Science Documents’ Metadata in PDF Format Based on XSLT. New Technology of Library and Information Service, 2007, 2(2): 18-23.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.02.04     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I2/18

