%A Ding Shengchun, Liu Kai, Fang Zhen
%T Crawler with Dynamic Thesaurus and Improved Shark-Search Algorithm: Case Study of Military Equipment
%0 Journal Article
%D 2022
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.2096-3467.2021.1125
%P 52-60
%V 6
%N 8
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_5454.shtml}
%8 2022-08-25
%X

[Objective] This paper aims to address two problems with traditional topic crawlers: low indexing rates and insufficient topic relevance. [Methods] We propose a Two-step Dynamic Shark-Search (TDSS) algorithm, based on Shark-Search, that splits the topic relevance calculation into two parts: the relevance of the hyperlink and the relevance of the webpage. New keywords extracted from topic-related pages are then added to the established topic thesaurus, which improves the effectiveness of topic judgment. [Results] In the same experimental environment, the TDSS crawler’s accuracy and indexing efficiency were 14.2% and 35% higher, respectively, than those of the comparison algorithms. [Limitations] More research is needed to improve the crawler’s accuracy when the number of topic words is excessive. [Conclusions] The proposed algorithm can effectively improve the accuracy of the topic information collected and retrieve more topic-related webpages.
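
The two-step idea described under [Methods] can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the overlap-based `relevance` score, the thresholds, the frequency-based keyword extraction, and all function names (`tokenize`, `relevance`, `expand_thesaurus`, `crawl_step`) are assumptions made for illustration only. The sketch scores the hyperlink text first, fetches and scores the page only if the link passes, and adds frequent new terms from relevant pages to the thesaurus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(text, thesaurus):
    """Assumed score: fraction of tokens found in the thesaurus (0.0 if no tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in thesaurus) / len(tokens)


def expand_thesaurus(thesaurus, page_text, top_k=2, min_count=2):
    """Add up to top_k frequent unseen terms from a relevant page (assumed heuristic)."""
    counts = Counter(
        t for t in tokenize(page_text) if t not in thesaurus and len(t) > 3
    )
    new_terms = [t for t, c in counts.most_common(top_k) if c >= min_count]
    thesaurus.update(new_terms)
    return new_terms


def crawl_step(thesaurus, anchor_text, fetch_page,
               link_threshold=0.3, page_threshold=0.2):
    """Return True if the link and then the page both pass the relevance checks."""
    # Step 1: judge the hyperlink from its anchor text before downloading anything.
    if relevance(anchor_text, thesaurus) < link_threshold:
        return False
    # Step 2: fetch the page and judge its content.
    page_text = fetch_page()
    if relevance(page_text, thesaurus) < page_threshold:
        return False
    # Dynamic thesaurus: learn new keywords from the page judged relevant.
    expand_thesaurus(thesaurus, page_text)
    return True


thesaurus = {"tank", "missile"}
kept = crawl_step(
    thesaurus,
    "new tank missile test",
    lambda: "tank armor tank armor missile radar armor",
)
```

In this toy run the link and the page both pass, and `armor` (seen three times) joins the thesaurus while `radar` (seen once) does not. Filtering on the link first means irrelevant pages are never downloaded, which is one plausible reading of how a two-step check could raise indexing efficiency.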