Design of Web Crawler for Deep Web Based on ID3 Algorithm
Wang Shunyan, Li Lei, Wu Binghua
(Department of Computer Science & Technology, Wuhan University of Technology, Wuhan 430070, China)
Abstract: To address the problem of poor information coverage in Web data mining, this paper proposes a configurable crawling method for the deep Web that can significantly improve the result quality of a general search engine. The method classifies Web pages and processes key information in the page content in order to construct sensible queries. Experimental results confirm the improvement.
Received: 14 March 2008
Published: 25 June 2008
Corresponding Author:
Li Lei
E-mail: lilei_lisa@163.com
About the authors: Wang Shunyan, Li Lei, Wu Binghua