New Technology of Library and Information Service  2009, Vol. 25 Issue (5): 44-49    DOI: 10.11925/infotech.1003-3513.2009.05.09
A Web Information Extractor Based on the Combination of Ontology and DOM
Liu Jiagang  Chen Shan  He Lingya
(Department of Computer Science, Hunan Institute of Technology, Hengyang 421002, China)
Information extraction based on an information-item Ontology of a Web page cannot accurately partition the areas to be extracted. To address this weakness, this paper designs an improved Web information extractor that combines Ontology and DOM. It uses the DOM tree to design an inductive learning algorithm for the paths of information items in sample Web pages. With this algorithm, the extraction areas can be partitioned accurately, noise in the sample Web pages can be reduced, and the Web pages can be preprocessed. Experiments show that the improved approach increases the precision of information extraction.
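The idea of inducing a DOM path from sample pages can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that each sample page comes with a few hand-labelled item values, and it takes the longest common tag path of those items as the extraction area. The names `PathCollector`, `text_paths`, `induce_region_path`, and `extract_region` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record the tag path (root to enclosing element) of every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, root first
        self.texts = []   # list of (path tuple, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append((tuple(self.stack), text))


def text_paths(html):
    """Return (path, text) pairs for all non-empty text nodes in html."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


def induce_region_path(samples):
    """samples: list of (html, labelled_values).

    Returns the longest tag path shared by all labelled items
    (assumption: this common prefix delimits the extraction area).
    """
    paths = [path
             for html, values in samples
             for path, text in text_paths(html)
             if text in values]
    if not paths:
        return ()
    prefix = paths[0]
    for path in paths[1:]:
        n = 0
        while n < min(len(prefix), len(path)) and prefix[n] == path[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def extract_region(html, region_path):
    """Keep only text nodes whose path starts with the induced region path."""
    k = len(region_path)
    return [text for path, text in text_paths(html) if path[:k] == region_path]


# Two toy sample pages with labelled items, then an unseen page.
page_a = ("<html><body><div><p>Ad banner</p></div>"
          "<table><tr><td><b>Title A</b></td><td>12.50</td></tr></table>"
          "</body></html>")
page_b = ("<html><body><div><p>Other ad</p></div>"
          "<table><tr><td><b>Title B</b></td><td>8.00</td></tr></table>"
          "</body></html>")
page_new = ("<html><body><div><p>Noise</p></div>"
            "<table><tr><td><b>Title C</b></td><td>3.20</td></tr></table>"
            "</body></html>")

region = induce_region_path([(page_a, ["Title A", "12.50"]),
                             (page_b, ["Title B", "8.00"])])
extracted = extract_region(page_new, region)
print(region, extracted)
```

In this sketch the noise text outside the induced path is dropped before any further processing. Matching the remaining text against the Ontology would be a later step and is not shown here.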

Key words: Information extraction; Wrapper; Ontology; DOM; Inductive learning
Received: 23 March 2009      Published: 25 May 2009


About authors: Liu Jiagang, Chen Shan, He Lingya

Cite this article:

Liu Jiagang, Chen Shan, He Lingya. A Web Information Extractor Based on the Combination of Ontology and DOM. New Technology of Library and Information Service, 2009, 25(5): 44-49.

