New Technology of Library and Information Service  2008, Vol. 24 Issue (3): 45-50    DOI: 10.11925/infotech.1003-3513.2008.03.08
Research of large-scale URL Filter Base on Bloom Filter
Ding Zhenguo, Wu Baogui, Xin Youqiang2
1(College of Networking Education, Xidian University, Xi’an 710071, China)
2(College of Economics and Management, Xidian University, Xi’an 710071, China)
Abstract  

When a small error rate is acceptable, the Bloom filter and its improved variants can be used to filter out pages with duplicate URLs by hashing the URLs. Experiments show that satisfactory results can be achieved by suitably adjusting the filter's parameters.
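The URL-hashing idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BloomFilter`, the helper `filter_new_urls`, the use of SHA-256 with double hashing to derive the bit positions, and the sizing from an expected item count and target false-positive rate are all assumptions chosen for illustration. The sizing uses the standard Bloom filter formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter (illustrative sketch, not the paper's code)."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
            raise ValueError("need expected_items > 0 and 0 < rate < 1")
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, url: str):
        # Assumption: derive k positions from one SHA-256 digest via double
        # hashing (h1 + i * h2); the paper's actual hash functions may differ.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, url: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(url))


def filter_new_urls(urls, seen: BloomFilter):
    """Return URLs not yet recorded in `seen`, recording each one as it passes."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A crawler could call `filter_new_urls(batch, seen)` on each batch of extracted links. Lowering `false_positive_rate` increases the bit array size and the number of hash positions, which is the kind of parameter trade-off the abstract refers to; a false positive here means a new URL is occasionally skipped as if it had already been seen.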

Key words: Bloom filter; Hash function; URL; URL filter
Received: 06 December 2007      Published: 25 March 2008
CLC number: TP391.3

Corresponding Authors: Wu Baogui     E-mail: bg1011@163.com
About authors: Ding Zhenguo, Wu Baogui, Xin Youqiang

Cite this article:

Ding Zhenguo,Wu Baogui,Xin Youqiang. Research of large-scale URL Filter Base on Bloom Filter. New Technology of Library and Information Service, 2008, 24(3): 45-50.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.03.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I3/45

