Research on a Large-Scale URL Filter Based on the Bloom Filter
Ding Zhenguo (1), Wu Baogui (2), Xin Youqiang (2)
(1) College of Networking Education, Xidian University, Xi’an 710071, China
(2) College of Economics and Management, Xidian University, Xi’an 710071, China
Abstract: Provided that a small error rate is acceptable, the Bloom filter and its improved variants can be used to filter out duplicate URL pages by hashing their URLs. Experiments show that satisfactory results can be achieved by suitably tuning the filter's parameters.
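To make the idea concrete, the following is a minimal sketch of URL filtering with a standard Bloom filter. It is not the authors' implementation: the class name `BloomFilter`, the SHA-256-based double hashing, and the sizing parameters `expected_items` and `false_positive_rate` are all assumptions chosen for illustration, and "filtering" is taken here to mean skipping URLs that have already been seen. The bit-array size m and the number of hash functions k follow the textbook formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter for URL de-duplication (hypothetical names)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Textbook sizing: m bits and k hash functions for n items at target rate p.
        ln2 = math.log(2)
        self.m = max(1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2))
        self.k = max(1, round(self.m / expected_items * ln2))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))


def filter_new_urls(urls, seen: BloomFilter):
    """Yield only URLs the filter has not recorded yet, recording each one."""
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url
```

A crawler could call `filter_new_urls(candidates, BloomFilter(1_000_000, 0.01))` to drop repeats. Lowering `false_positive_rate` enlarges the bit array and raises the number of hash functions, which is the kind of parameter adjustment the abstract refers to; the cost of a false positive is that a genuinely new URL is occasionally skipped.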
Received: 06 December 2007
Published: 25 March 2008
Corresponding author: Wu Baogui
E-mail: bg1011@163.com
About the authors: Ding Zhenguo, Wu Baogui, Xin Youqiang