The Technology of Web Information Extraction and Its Application in the TBT Early-Warning System
Zhai Dongsheng, Yu Yang, Li Li
(The Economics and Management School, Beijing University of Technology, Beijing 100022, China)
Abstract This paper investigates a technique for extracting information of interest from data-oriented Web pages in real time. The technique automatically identifies table structures and separates the different kinds of data they contain. To analyze and classify the data, it combines word-based classification with segmentation by table structure; the approach is grounded in ontology and integrates established models such as SVM and HMM. The technique has been tested and applied in the dynamic information gathering subsystem of a TBT early-warning system, where it performs well.
Received: 08 June 2005
Published: 25 September 2005
Corresponding Authors:
Yu Yang
E-mail: bgdyuyang@emails.bjut.edu.cn
About authors: Zhai Dongsheng, Yu Yang, Li Li