New Technology of Library and Information Service  2003, Vol. 19 Issue (5): 66-67    DOI: 10.11925/infotech.1003-3513.2003.05.21
The Research and Realization of Technology Converting HTML to XML
Chen Yanmei1   Zhang Bin2
1(Northeastern University Library, Shenyang 110004, China)
2(Information Engineering Institute of Northeastern University, Shenyang 110004, China)
The Web now lets people all over the world communicate with one another. Most Web content is written in HTML, which cannot meet the varied requirements of Internet applications and cannot describe the data it carries. To meet those requirements, information from Web sources needs to be accessible in a structured way. XML and its various extensions are a step in this direction. Unfortunately, the Web is not yet a well-organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, from which structure has to be extracted. This paper presents the design and implementation of a system for converting HTML to XML.

Key words: Web wrapper; Information extraction; HTML parsing; HTML to XML conversion
Received: 19 March 2003      Published: 25 October 2003


Corresponding Authors: Chen Yanmei, Zhang Bin
About authors: Chen Yanmei, Zhang Bin

Chen Yanmei, Zhang Bin. The Research and Realization of Technology Converting HTML to XML. New Technology of Library and Information Service, 2003, 19(5): 66-67.

