Today, the web makes it possible for people all over the world to communicate with one another. Most web content is written in HTML, which cannot meet the varied requirements of Internet applications and cannot express the data itself. To address this, information from web sources needs to be accessible in a structured way. XML and its various extensions are a step in this direction. Unfortunately, the web is not yet a well-organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, from which structure has to be extracted. This paper presents the design and implementation of a system for converting HTML to XML.
Chen Yanmei, Zhang Bin. The Research and Realization of Technology Converting HTML to XML [J]. New Technology of Library and Information Service (现代图书情报技术), 2003, 19(5): 66-67.