New Technology of Library and Information Service  2003, Vol. 19 Issue (5): 66-67    DOI: 10.11925/infotech.1003-3513.2003.05.21
The Research and Realization of Technology Converting HTML to XML
Chen Yanmei1   Zhang Bin2
1(Northeastern University Library, Shenyang 110004, China)
2(Information Engineering Institute of Northeastern University, Shenyang 110004, China)

The web now makes communication possible among people all over the world. Most web content is written in HTML, which describes presentation rather than the data itself and so cannot meet the varied requirements of Internet applications. To meet them, information from web sources needs to be accessible in a structured way. XML and its various extensions are a step in this direction. Unfortunately, the web is not yet a well-organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, from which structure has to be extracted. This paper presents the design and implementation of a system for converting HTML to XML.
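
As a rough illustration of what an HTML to XML conversion involves, the sketch below uses only the Python standard library to turn loosely written HTML into a well-formed XML string. It is not the system described in this paper, whose design is not reproduced here. The class name `HtmlToXml`, the function `html_to_xml`, and the `document` wrapper element are all hypothetical names chosen for the example, and the handling of unclosed tags is deliberately simplistic.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Build an ElementTree from loosely written HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical wrapper element so the output has a single root.
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) become empty strings.
        node = ET.SubElement(self.stack[-1], tag,
                             {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element;
        # stray end tags with no matching open element are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only text
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return a well-formed XML string for the given HTML fragment."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<ul><li>a<li>b</ul><img src=x.png>"))
```

The output is always well formed because `ElementTree` serializes a tree rather than echoing the input markup. Note that unclosed elements such as the first `<li>` above end up nested rather than as siblings; a production converter would need HTML-aware rules for implied end tags, and a wrapper in the sense of the key words below would additionally map page regions to meaningful element names.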

Key words: Web wrapper; Information extraction; HTML parsing; HTML to XML conversion
Received: 19 March 2003      Published: 25 October 2003


Corresponding Authors: Chen Yanmei, Zhang Bin

Cite this article:

Chen Yanmei, Zhang Bin. The Research and Realization of Technology Converting HTML to XML. New Technology of Library and Information Service, 2003, 19(5): 66-67.


