New Technology of Library and Information Service  2010, Vol. 26 Issue (10): 82-86    DOI: 10.11925/infotech.1003-3513.2010.10.14
Research on Building an Open Access Search Engine with Nutch
Cui Yuhong, Zhang Kui
Beijing Institute of Technology Library, Beijing 100081, China
Abstract  

This paper studies an integrated retrieval mechanism for open access systems and uses Web crawling to build DSearch, a distributed search system based on Nutch that provides an efficient, flexible, and customizable search tool. Three key technologies are introduced: distributed cluster configuration, modification of the Chinese word segmenter, and index settings. Finally, the functions of DSearch are evaluated using selected feed lists.
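The abstract names Chinese word segmentation and index construction on a distributed cluster as key technologies but gives no implementation details here. The sketch below is a minimal single-process illustration under stated assumptions. It uses forward maximum matching, a generic dictionary-based segmentation method swapped in for illustration (not necessarily how Paoding or DSearch works), together with a hypothetical toy dictionary. The map, shuffle and reduce steps of a MapReduce job are imitated with plain Python functions rather than with Hadoop's or Nutch's actual APIs. All function and variable names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy dictionary; a real Chinese analyzer ships far larger dictionaries.
DICTIONARY = {"开放", "存取", "开放存取", "搜索", "引擎", "搜索引擎", "学术", "资源"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when no longer word matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def map_phase(doc_id, text):
    """Map step (simulated): emit a (term, doc_id) pair for every token."""
    for term in fmm_segment(text):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    """Reduce step (simulated): collapse one term's doc ids into a posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map over all documents, sort by term to imitate the shuffle, then reduce."""
    pairs = sorted(
        (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)),
        key=itemgetter(0),
    )
    index = {}
    for term, group in groupby(pairs, key=itemgetter(0)):
        key, postings = reduce_phase(term, [doc_id for _, doc_id in group])
        index[key] = postings
    return index


if __name__ == "__main__":
    # Two hypothetical "crawled" documents keyed by document id.
    docs = {1: "开放存取学术资源", 2: "学术搜索引擎"}
    index = build_inverted_index(docs)
    for term in sorted(index):
        print(term, index[term])
```

With these two sample documents, the term 学术 ends up with the posting list [1, 2]. In a real cluster, the map and reduce functions would run on separate nodes, and the segmenter would use the analyzer's full dictionary rather than this toy set.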

Key words: Open access; Search engine; Nutch; Chinese academic resources
Received: 12 July 2010      Published: 04 January 2011

CLC Number: TP39

Cite this article:

Cui Yuhong, Zhang Kui. Research on Building an Open Access Search Engine with Nutch. New Technology of Library and Information Service, 2010, 26(10): 82-86.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.10.14     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I10/82


[1] DOAJ. http://www.doaj.org.

[2] OpenDOAR. http://www.opendoar.org.

[3] Li Chunwang. Open Access to Academic Information in the Network Environment (in Chinese) [J]. Journal of Library Science in China, 2005, 31(1): 33-37.

[4] The OAIster Database. http://www.oclc.org/oaister/.

[5] Norris M, Oppenheim C, Rowland F. Finding Open Access Articles Using Google, Google Scholar, OAIster and OpenDOAR [J]. Online Information Review, 2008, 32(6): 709-715.

[6] Welcome to Apache Hadoop. http://hadoop.apache.org/index.pdf.

[7] Welcome to Pig! http://hadoop.apache.org/pig/index.pdf.

[8] Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters. http://labs.google.com/papers/mapreduce-osdi04.pdf.

[9] Paoding. http://code.google.com/p/paoding/.
