New Technology of Library and Information Service  2007, Vol. 2 Issue (5): 41-44    DOI: 10.11925/infotech.1003-3513.2007.05.10
The Focused-crawler Based on Thesaurus
Xia Chongpu   Kang Li
(Department of Computer Science, China Agricultural University, Beijing 100083, China)
Abstract  

Combining a thesaurus with traditional information retrieval techniques, this paper presents a new method in which a family in the thesaurus is used to describe a predefined topic. A focused crawler based on this method is also developed, and its efficiency is compared with that of other well-known Web search engines. The experimental results show the effectiveness of the proposed models and algorithms.

Key words: Focused crawler      Thesaurus      Search engine
Received: 06 February 2007      Published: 25 May 2007
CLC Number: TP393
Corresponding Author: Kang Li     E-mail: kangli.cau@gmail.com
About authors: Xia Chongpu, Kang Li

Cite this article:

Xia Chongpu, Kang Li. The Focused-crawler Based on Thesaurus. New Technology of Library and Information Service, 2007, 2(5): 41-44.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.05.10     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I5/41

