This paper first introduces a format conversion tool and XSLT, the language used to write the extraction rules, and then briefly analyses the intermediate documents generated when converting PDF to HTML. It next discusses the problems concerning metadata in scientific documents in PDF format, and finally presents methods for solving them.
陈俊林,张文德 . 基于XSLT的PDF论文元数据的优化抽取[J]. 现代图书情报技术, 2007, 2(2): 18-23.
Chen Junlin, Zhang Wende. Optimizing Extraction of Science Documents’ Metadata in PDF Format Based on XSLT. New Technology of Library and Information Service, 2007, 2(2): 18-23.