Research and Implementation of Nutch-based Website Harvest and Service System in Special Field
Chang Zhirong¹, Ma Ziwei², Li Gaohu³
¹(College of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China) ²(Beijing University of Posts and Telecommunications Library, Beijing 100876, China) ³(BUPT Assets Management Co., Ltd., Beijing 100876, China)
This paper proposes the design of a Nutch-based website harvest and service system for special fields within the framework of digital library systems integration. To improve the system's functionality and performance, it introduces an information filtering module, a dictionary-based Chinese analyzer module, a GUI information module, a topic-knowledge-based information processing module, and Web service-based search service modules. The paper focuses on text parsing filters, plugin development, and the application of automatic hierarchical clustering to search results. Finally, the system is integrated with other digital library subsystems through a Web service interface, enabling it to provide comprehensive and professional services.
常智荣,马自卫,李高虎. 基于Nutch的专题网页资源采集服务系统的设计与实现[J]. 现代图书情报技术, 2010, 26(3): 19-26.
Chang Zhirong, Ma Ziwei, Li Gaohu. Research and Implementation of Nutch-based Website Harvest and Service System in Special Field[J]. New Technology of Library and Information Service, 2010, 26(3): 19-26.