New Technology of Library and Information Service  2011, Vol. 27 Issue (3): 45-50    DOI: 10.11925/infotech.1003-3513.2011.03.07
Anchor and Link Text Expansion Based KBES Algorithm Tunneling Strategy
Qiao Jianzhong
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Educational Technology Center of PLA Academy of Arts, Beijing 100081, China; Graduate University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  Building on a summary of the “true or false tunnel” strategies used in focused crawlers, this paper proposes a new algorithm, KBES, to solve the “false tunnel” problem. Experiments show that, to some extent, KBES improves the efficiency with which heuristic strategies predict the relevance of new links from their anchor and link text.
Key words: Focused crawling; Tunneling; Search algorithm; Focused crawler
Received: 15 February 2011      Published: 05 May 2011
CLC number: G250.73

Cite this article:

Qiao Jianzhong. Anchor and Link Text Expansion Based KBES Algorithm Tunneling Strategy. New Technology of Library and Information Service, 2011, 27(3): 45-50.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.03.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I3/45
