Research on Deep Web Surfacing Based on Common Search Engines
Guo Shaoyou
(Department of Information Management, Zhengzhou University, Zhengzhou 450001, China)
Abstract: Building on related work, this paper analyzes the basic principle of deep Web surfacing based on common search engines. It then discusses several key issues in deep Web surfacing: determining the value ranges of form fields, processing queries, and setting hyperlinks in result pages.
Received: 03 February 2010
Published: 25 February 2010
Corresponding author: Guo Shaoyou
E-mail: gsy6@ha.edu.cn
About the author: Guo Shaoyou