New Technology of Library and Information Service, 2010, Vol. 26, Issue 10: 82-86. DOI: 10.11925/infotech.1003-3513.2010.10.14
Research on Building an Open Access Search Engine with Nutch
Cui Yuhong, Zhang Kui
Beijing Institute of Technology Library, Beijing 100081, China

This paper studies an integrated retrieval mechanism for open access resources and uses Web crawling to build DSearch, a distributed search system based on Nutch that provides an efficient, flexible, and customizable search tool. Three key technologies are introduced: distributed cluster configuration, modification of the Chinese word segmenter, and index settings. Finally, the functions of DSearch are evaluated with selected feed lists.

Key words: Open access; Search engine; Nutch; Chinese academic resources
Received: 12 July 2010      Published: 04 January 2011



Cite this article:

Cui Yuhong, Zhang Kui. Research on Building an Open Access Search Engine with Nutch. New Technology of Library and Information Service, 2010, 26(10): 82-86.


