The article is structured as follows. First, we design a DTD for scientific and technical articles. Second, we analyze the structure of PDF documents. On that basis, we describe the design of a PDF information extraction system that uses the above-mentioned DTD as a template to convert a PDF-formatted scientific and technical article into a valid XML document.
宋艳娟,张文德. 基于XML的PDF文档信息抽取系统的研究[J]. 现代图书情报技术, 2005, 21(9): 10-13.
Song Yanjuan, Zhang Wende. Research on PDF Documents Information Extraction System Based on XML. New Technology of Library and Information Service, 2005, 21(9): 10-13.