New Technology of Library and Information Service, 2008, Vol. 24, Issue (6): 41-45. DOI: 10.11925/infotech.1003-3513.2008.06.08
Design of Web Crawler for Deep Web Based on ID3 Algorithm
Wang Shunyan, Li Lei, Wu Binghua
(Department of Computer Science & Technology, Wuhan University of Technology, Wuhan 430070, China)
Abstract  

To address the poor information coverage of Web data mining, this paper proposes a configurable crawling method for the deep Web that can significantly improve the quality of the results returned by a general-purpose search engine. The method classifies Web pages and processes key information in the page content in order to formulate meaningful queries. Experimental results support the effectiveness of the approach.
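The title names ID3 as the basis of the page classification step. As a reminder of how ID3 works, the sketch below builds a decision tree by repeatedly splitting on the attribute with the highest information gain, Gain(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v), where H is the Shannon entropy of the class labels. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the page attributes or classes, so the features (`has_form`, `has_text_input`, `has_password`), the labels (`search_form`, `other`) and the training rows are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data (not from the paper): each page is described by
# categorical features a deep-Web crawler might extract from its HTML, and is
# labelled as either a searchable query form or something else.
TRAINING_PAGES = [
    ({"has_form": "yes", "has_text_input": "yes", "has_password": "no"}, "search_form"),
    ({"has_form": "yes", "has_text_input": "yes", "has_password": "yes"}, "other"),  # login page
    ({"has_form": "yes", "has_text_input": "no", "has_password": "no"}, "other"),
    ({"has_form": "no", "has_text_input": "no", "has_password": "no"}, "other"),
    ({"has_form": "yes", "has_text_input": "yes", "has_password": "no"}, "search_form"),
    ({"has_form": "no", "has_text_input": "no", "has_password": "no"}, "other"),
]
ATTRIBUTES = ["has_form", "has_text_input", "has_password"]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr):
    """Reduction in label entropy obtained by splitting rows on attr."""
    base = entropy([label for _, label in rows])
    remainder = 0.0
    for value in {features[attr] for features, _ in rows}:
        subset = [label for features, label in rows if features[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attrs):
    """Build an ID3 tree.

    A leaf is a class label (str); an internal node is a tuple
    (attribute, {value: subtree}, majority_label).
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes are left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {features[best] for features, _ in rows}:
        subset = [(f, label) for f, label in rows if f[best] == value]
        branches[value] = id3(subset, remaining)
    return (best, branches, majority)


def classify(tree, features):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        subtree = branches.get(features.get(attr))
        if subtree is None:
            return majority
        tree = subtree
    return tree


TREE = id3(TRAINING_PAGES, ATTRIBUTES)

if __name__ == "__main__":
    page = {"has_form": "yes", "has_text_input": "yes", "has_password": "no"}
    print(classify(TREE, page))  # search_form
```

On this toy data `has_text_input` has the highest gain and becomes the root, and `has_password` then separates search forms from login forms. In a crawler, pages classified as `search_form` would be the ones passed on to the query-generation step.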

Key words: Web crawler; Deep Web; ID3 algorithm
Received: 14 March 2008      Published: 25 June 2008
CLC Number: TP393
Corresponding Author: Li Lei    E-mail: lilei_lisa@163.com
About authors: Wang Shunyan, Li Lei, Wu Binghua

Cite this article:

Wang Shunyan, Li Lei, Wu Binghua. Design of Web Crawler for Deep Web Based on ID3 Algorithm. New Technology of Library and Information Service, 2008, 24(6): 41-45.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.06.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I6/41


  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn