Abstract: Based on an analysis of existing open-source crawler frameworks, a precise resource acquisition system is designed and developed on top of Crawler4j. The system meets the real-time and accuracy requirements of resource collection for monitoring. The paper describes the design and implementation of the system.
谢靖, 曲云鹏, 刘建华. 面向网络科技监测的分布式定向资源精确采集研究和应用[J]. 现代图书情报技术, 2011, 27(7/8): 26-31.
Xie Jing, Qu Yunpeng, Liu Jianhua. Targeted Websites Distributed and Precise Harvest System for Network Monitoring Technology. New Technology of Library and Information Service, 2011, 27(7/8): 26-31.
[1] Nutch. http://wiki.apache.org/nutch
[2] Heritrix. http://crawler.archive.org/
[3] Open Source Web Crawler for Java. http://code.google.com/p/crawler4j/
[4] Trail: RMI. http://download.oracle.com/javase/tutorial/rmi/index.html
[5] Cobra: Java HTML Renderer & Parser. http://lobobrowser.org/cobra.jsp
[6] Regular Expression. http://en.wikipedia.org/wiki/Regular_expression