Traditional search engines cannot obtain complete, up-to-date copies of the whole Web in time, especially for news and Weblog sites that update frequently. To address this problem, this paper designs a distributed news and Weblog search engine based on RSS syndicated data. The Pastry protocol is used to store and transfer the distributed data, and the index file is compressed with a Bloom filter. As a result, frequently updated news and Weblog sites can be searched in a timely manner and the storage cost is reduced. The system shows good prospects for application.
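The abstract does not describe how the Bloom filter is built, so the following is only a minimal sketch of the general technique, not the authors' implementation. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of seeded SHA-256 digests as hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter for a set of index terms (hypothetical design)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by prefixing the term with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        # Set every bit position derived from the term.
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, term):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(term)
        )


index_terms = BloomFilter()
for word in ["rss", "pastry", "weblog"]:
    index_terms.add(word)
```

A node could keep such a filter in place of a full term list: a membership test never misses a term that was added, at the cost of occasional false positives, and the filter occupies a fixed `num_bits / 8` bytes regardless of how many terms are inserted.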
Liu Feng, Shi Shuicai, Xiao Shibin, Wang Hongwei. A Design of Distributed News & Weblog Search Engine Based on RSS [J]. New Technology of Library and Information Service (现代图书情报技术), 2007, 2(1): 29-32.