New Technology of Library and Information Service  2010, Vol. 26 Issue (2): 24-30    DOI: 10.11925/infotech.1003-3513.2010.02.05
Research on Deep Web Surfacing Based on Common Search Engines
Guo Shaoyou
(Department of Information Management, Zhengzhou University, Zhengzhou 450001, China)
Abstract  

Building on related work, this paper analyzes the basic principle of deep Web surfacing for common search engines. It then discusses several key issues in deep Web surfacing: determining the value ranges of form fields, processing queries, and setting hyperlinks in result pages.

Key words: Search engine; Deep Web; Surfacing; Database
Received: 03 February 2010      Published: 25 February 2010
CLC Number: TP393
Corresponding Authors: Guo Shaoyou     E-mail: gsy6@ha.edu.cn
About author: Guo Shaoyou

Cite this article:

Guo Shaoyou. Research on Deep Web Surfacing Based on Common Search Engines. New Technology of Library and Information Service, 2010, 26(2): 24-30.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.02.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I2/24

