Design and Realization of Weblog Gathering System Based on RSS
Liu Li 1,2, Xiao Shibin 1,2, Wang Tao 1,2, Shi Shuicai 1,2
1 (Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101, China)
2 (Beijing TRS Information Technology Ltd, Beijing 100101, China)
Abstract This paper addresses how to crawl Weblogs effectively in certain sections of the Web and proposes an RSS-based Weblog gathering algorithm. The authors design two crawlers: one gathers RSS feeds by performing a breadth-first traversal of the Web, and the other automatically tracks updated Weblogs by performing a vertical search of every RSS feed. A prototype system is also implemented.
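
The two-crawler idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `discover_feeds` and `new_entries`, the `fetch_links` and `is_feed` callbacks, and the `max_pages` limit are all hypothetical, and link extraction and feed detection are left to the caller so that the sketch stays free of network code. The first function performs a breadth-first traversal and collects feed URLs; the second parses one RSS 2.0 document and reports the items that have not been seen before.

```python
from collections import deque
import xml.etree.ElementTree as ET


def discover_feeds(seeds, fetch_links, is_feed, max_pages=1000):
    """Breadth-first traversal from the seed URLs.

    fetch_links(url) returns the outgoing links of a page and
    is_feed(url) decides whether a URL is an RSS feed; both are
    assumed helpers supplied by the caller.
    Returns feed URLs in the order they were dequeued.
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(queue)
    feeds = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        if is_feed(url):
            # Feeds are handed to the second crawler, not expanded.
            feeds.append(url)
            continue
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return feeds


def new_entries(rss_xml, seen_links):
    """Parse one RSS 2.0 document and return (title, link) pairs
    for items whose link is not yet in seen_links.

    seen_links is updated in place, so a second call with the same
    feed content returns an empty list.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title", default=""), link))
    return fresh
```

In this sketch the second crawler would call `new_entries` periodically on each feed returned by `discover_feeds`, keeping one `seen_links` set per feed or a shared one; how often to poll and how to store the set are design choices the abstract does not specify.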
Received: 14 September 2007
Published: 25 November 2007
Corresponding author: Liu Li
E-mail: luili.luili.liuli@163.com
About the authors: Liu Li, Xiao Shibin, Wang Tao, Shi Shuicai