This paper studies an integrated retrieval mechanism for open access systems and uses Web crawling to build DSearch, a distributed search system based on Nutch that provides an efficient, flexible, and customizable search tool. Three key technologies are introduced: distributed cluster configuration, modification of the Chinese word segmenter, and index settings. Finally, the functions of DSearch are evaluated with the selected feed lists.
Cui Yuhong (崔宇红), Zhang Kui (张奎). Research on Building an Open Access Search Engine with Nutch [J]. New Technology of Library and Information Service (现代图书情报技术), 2010, 26(10): 82-86.