The Research and Realization of Technology Converting HTML to XML
Chen Yanmei (1), Zhang Bin (2)
(1) Northeastern University Library, Shenyang 110004, China
(2) Information Engineering Institute of Northeastern University, Shenyang 110004, China
Abstract: Today the web lets people all over the world communicate with one another. Most web content is written in HTML, which cannot meet the varied requirements of the Internet and cannot express the data itself. To meet those requirements, information from web sources needs to be accessible in a structured way. XML and its various extensions are a step in this direction. Unfortunately, the web is not yet a well-organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, from which structure has to be extracted. This paper presents the design and implementation of a system that converts HTML to XML.
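The abstract does not describe the conversion method itself. Purely as an illustrative sketch, and not the authors' system, the basic idea of turning tolerant HTML into well-formed XML can be shown with Python's standard `html.parser` and `xml.etree.ElementTree` modules. The class `HtmlToXml`, the function `html_to_xml`, and the wrapper element name `document` are hypothetical names chosen for this example. The sketch closes tags left open and treats void elements such as `br` as empty, but it does not apply HTML's implicit-closing rules or validate attribute names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Builds an ElementTree from loosely written HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as <input disabled> arrive with value None;
        # XML requires a value, so the attribute name is reused.
        clean = {k: (v if v is not None else k) for k, v in attrs}
        elem = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything nested in it;
        # a stray end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only runs between tags
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return a well-formed XML string for the given HTML fragment."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()  # flush buffered text; open elements are closed on output
    return ET.tostring(parser.root, encoding="unicode")
```

For example, `html_to_xml('<p>Price: <b>12</b><br>in stock')` yields a string that an XML parser accepts, with the unclosed `p` closed and `br` emitted as an empty element. A real wrapper system would additionally map presentation tags to domain-specific element names, which this sketch does not attempt.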
Received: 19 March 2003
Published: 25 October 2003
Corresponding Authors:
Chen Yanmei, Zhang Bin
About the authors: Chen Yanmei, Zhang Bin