New Technology of Library and Information Service, 2008, Vol. 24, Issue 6: 83-87. DOI: 10.11925/infotech.1003-3513.2008.06.16
Web Crawler’s Design and Implementation Based on Dynamic Tunneling
Ren Xiaoyan, Kang Xiaojun, Zhang Hongwei1
1(The College of Electrical Engineering & Information Technology, China Three Gorges University, Yichang 443002, China)
2(Information Technology Center, China Three Gorges University, Yichang 443002, China)
Abstract  

Based on an analysis of the search mechanisms of traditional Web crawlers, this paper combines tunneling and Web page division with the crawler's search strategy and proposes a dynamic-tunneling search algorithm for Web crawlers. Experiments targeting "education resources" were carried out on four university Websites, and the results show that the new algorithm outperforms two standard crawlers for focused crawling.
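The abstract does not give the algorithm's details, so the Python below is only a minimal sketch of the two ideas it names, not the authors' method. Assumptions (all hypothetical): pages are already divided into blocks, each block is scored by simple keyword overlap with the topic, a page counts as relevant if its best block clears `threshold`, links inherit their own block's score rather than the whole page's, and tunneling lets the crawler pass through at most `max_tunnel` consecutive off-topic pages, with the counter reset whenever a relevant page is reached. The paper's actual "dynamic" tunneling rule may differ; `crawl`, `block_relevance`, `EXAMPLE_WEB`, and `TOPIC` are illustrative names.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web": each page is a list of blocks, and each
# block is (block_text, outgoing_links). This stands in for real fetching
# and for a real Web page division step.
Page = List[Tuple[str, List[str]]]


def block_relevance(text: str, topic: Set[str]) -> float:
    """Fraction of topic keywords present in a block (an assumed, simple scorer)."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic) if topic else 0.0


def crawl(web: Dict[str, Page], seed: str, topic: Set[str],
          threshold: float = 0.3, max_tunnel: int = 2,
          budget: int = 50) -> List[str]:
    """Best-first focused crawl with block-level link scoring and tunneling.

    Returns the relevant pages found, in the order they were visited.
    """
    # Frontier entries: (negated priority, url, off-topic pages passed so far).
    frontier: List[Tuple[float, str, int]] = [(-1.0, seed, 0)]
    seen: Set[str] = {seed}
    relevant: List[str] = []
    fetched = 0

    while frontier and fetched < budget:
        _, url, dist = heapq.heappop(frontier)
        page = web.get(url)
        if page is None:
            continue  # dead link in the toy graph
        fetched += 1

        scores = [block_relevance(text, topic) for text, _ in page]
        if max(scores, default=0.0) >= threshold:
            relevant.append(url)
            next_dist = 0          # back on topic: reset the tunnel counter
        else:
            next_dist = dist + 1   # one more step through an off-topic page
            if next_dist > max_tunnel:
                continue           # tunnel too long: do not expand this page

        for (_, links), score in zip(page, scores):
            for link in links:
                if link not in seen:
                    seen.add(link)
                    # Each link is prioritized by its own block's relevance.
                    heapq.heappush(frontier, (-score, link, next_dist))
    return relevant


# Toy site: "deep_edu" sits behind two off-topic pages, "lost_edu" behind three.
TOPIC = {"education", "resources", "courseware"}
EXAMPLE_WEB: Dict[str, Page] = {
    "home": [("campus news sports", ["n1"]), ("education resources", ["edu"])],
    "edu": [("courseware education resources", [])],
    "n1": [("news archive", ["n2"])],
    "n2": [("old news", ["deep_edu"])],
    "deep_edu": [("education courseware", ["n3"])],
    "n3": [("misc", ["n4"])],
    "n4": [("misc", ["n5"])],
    "n5": [("misc", ["lost_edu"])],
    "lost_edu": [("education resources", [])],
}

if __name__ == "__main__":
    print(crawl(EXAMPLE_WEB, "home", TOPIC))  # ['home', 'edu', 'deep_edu']
```

Setting `max_tunnel=0` turns the sketch into a plain best-first focused crawler that never expands an off-topic page, so it misses `deep_edu`; raising it to 3 also reaches `lost_edu`. This trade-off between recall and wasted fetches is what tunneling is meant to tune.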

Key words: Web crawlers; Tunneling; Web page division
Received: 05 December 2007      Published: 25 June 2008
CLC number: TP393

 
Corresponding author: Ren Xiaoyan     E-mail: rxy327@ctgu.edu.cn
About the authors: Ren Xiaoyan, Kang Xiaojun, Zhang Hongwei

Cite this article:

Ren Xiaoyan, Kang Xiaojun, Zhang Hongwei. Web Crawler's Design and Implementation Based on Dynamic Tunneling. New Technology of Library and Information Service, 2008, 24(6): 83-87.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.06.16 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I6/83


  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn