Research on Deep Web Surfacing Based on Common Search Engines
Guo Shaoyou
(Department of Information Management, Zhengzhou University, Zhengzhou 450001, China)
Abstract: Building on existing related research, this paper analyzes the basic principle of deep Web surfacing based on general-purpose search engines. It then discusses several key issues in deep Web surfacing, including determining the value ranges of form fields, query processing, and setting hyperlinks in query result pages.
Keywords: Search engine, Deep Web, Surfacing, Database
Received: 2010-02-03
Published: 2010-02-25
Corresponding author: Guo Shaoyou
E-mail: gsy6@ha.edu.cn
About the author: Guo Shaoyou