New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 28-35    DOI: 10.11925/infotech.1003-3513.2013.07-08.04
An Improved Best-First Search Algorithm Based Focused Crawling Research
Qiao Jianzhong
Information Management Center of PLA Academy of Arts, Beijing 100081, China
Abstract  This paper refines and reclassifies the characteristic factors that focused crawlers use to predict the priority of Web links, introduces two new features, harvest rate and media type, as bases for judging relevance, and proposes an improved Best-First Search algorithm. The algorithm applies a "fine-grained" policy to filter out irrelevant Web pages, selects representative characteristic factors from multiple angles, and constructs a link priority formula to reveal and predict the topics of Web links comprehensively. A small-scale experiment comparing it with three other topic search algorithms shows that the improved algorithm performs better on harvest rate and on the average number of links submitted.
Key words  Focused crawling      Search algorithm      Best-First Search algorithm      Focused crawler      Characteristic factor
Qiao Jianzhong. An Improved Best-First Search Algorithm Based Focused Crawling Research. New Technology of Library and Information Service, 2013, 29(7/8): 28-35.
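
The abstract describes a Best-First Search crawler that ranks unvisited links with a priority formula built from several characteristic factors, including harvest rate and media type. The abstract does not give the formula itself, so the following is only a minimal sketch under stated assumptions: the weighted-sum form, the weight values in WEIGHTS, the scores in MEDIA_SCORE, and the names link_priority, harvest_rate and Frontier are all hypothetical and are not taken from the paper.

```python
import heapq

# Hypothetical weights (assumption): the paper's actual formula and
# coefficients are not stated in the abstract.
WEIGHTS = {"anchor": 0.4, "parent": 0.3, "harvest": 0.2, "media": 0.1}

# Hypothetical media-type scores (assumption): unknown types score 0.0.
MEDIA_SCORE = {"text/html": 1.0, "application/pdf": 0.5}


def harvest_rate(relevant_pages, fetched_pages):
    """Fraction of fetched pages judged relevant (0.0 if nothing fetched)."""
    if fetched_pages == 0:
        return 0.0
    return relevant_pages / fetched_pages


def link_priority(anchor_sim, parent_sim, site_harvest, media_type):
    """Weighted sum of four illustrative factors, each expected in [0, 1]."""
    media = MEDIA_SCORE.get(media_type, 0.0)
    return (WEIGHTS["anchor"] * anchor_sim
            + WEIGHTS["parent"] * parent_sim
            + WEIGHTS["harvest"] * site_harvest
            + WEIGHTS["media"] * media)


class Frontier:
    """Best-first frontier: always pops the highest-priority unseen URL."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker keeps insertion order stable

    def push(self, url, priority):
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    frontier.push("http://example.org/a",
                  link_priority(0.9, 0.8, harvest_rate(1, 2), "text/html"))
    frontier.push("http://example.org/b",
                  link_priority(0.1, 0.2, harvest_rate(0, 5), "image/png"))
    while len(frontier):
        print(frontier.pop())
```

In a real crawler the similarity inputs would come from a topic model applied to the anchor text and the parent page, and the per-site harvest rate would be updated as pages are fetched and classified; both are left out of this sketch.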

