New Technology of Library and Information Service  2007, Vol. 2 Issue (2): 49-52    DOI: 10.11925/infotech.1003-3513.2007.02.10
Automated Extraction of Search Engine Results
Ou Jun   Ren Minglun
(Institute of Computer Network of Hefei University of Technology, Hefei 230009, China)
Abstract  

This paper presents a new method for automatically extracting Search Result Records (SRRs) and Subsequent Result Page Links (SRPLs) from a search engine's response page. The method compares the similarity of nodes in the HTML tag tree of a valid response page to recognize Candidate Record Blocks (CRBs), then identifies SRRs and SRPLs within the CRBs using several heuristic rules. A wrapper is then built for them from their locations in the tag tree. Experiments and comparisons with other methods show that the method is useful and efficient.
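The candidate-block step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's similarity measure, thresholds, or heuristic rules, so everything here is an assumption made for illustration: similarity is taken as the `difflib.SequenceMatcher` ratio over each subtree's pre-order tag sequence, and the names `Node`, `TagTreeBuilder`, `candidate_record_blocks`, `threshold`, `min_records`, and the sample `PAGE` are hypothetical. The sketch only finds runs of similar sibling subtrees and reports their tag-tree location; it does not attempt the heuristic separation of SRRs from SRPLs.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag


class Node:
    """One element of the HTML tag tree (text and attributes are ignored)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_sequence(self):
        """Pre-order list of tag names in this subtree."""
        seq = [self.tag]
        for child in self.children:
            seq.extend(child.tag_sequence())
        return seq

    def path(self):
        """Location on the tag tree, e.g. 'html[0]/body[0]/ul[1]'."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(f"{node.tag}[{node.parent.children.index(node)}]")
            node = node.parent
        return "/".join(reversed(parts))


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree under a synthetic '#root' node."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def similarity(a, b):
    """Assumed measure: ratio of matching tags between two subtrees."""
    return SequenceMatcher(None, a.tag_sequence(), b.tag_sequence()).ratio()


def candidate_record_blocks(root, threshold=0.8, min_records=3):
    """Return (path, run_length) for parents holding a run of similar siblings."""
    blocks = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = node.children
        run = kids[:1]
        for prev, cur in zip(kids, kids[1:]):
            if similarity(prev, cur) >= threshold:
                run.append(cur)
                continue
            if len(run) >= min_records:
                blocks.append((node.path(), len(run)))
            run = [cur]
        if len(run) >= min_records:
            blocks.append((node.path(), len(run)))
        stack.extend(reversed(kids))  # visit children in document order
    return blocks


# Hypothetical response page: a result list followed by pagination links.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<ul>
<li><a href="u1">Title 1</a><p>snippet</p></li>
<li><a href="u2">Title 2</a><p>snippet</p></li>
<li><a href="u3">Title 3</a><p>snippet</p></li>
</ul>
<div><a href="?p=2">2</a><a href="?p=3">3</a><a href="?p=4">Next</a></div>
</body></html>"""

builder = TagTreeBuilder()
builder.feed(PAGE)
blocks = candidate_record_blocks(builder.root)
for path, count in blocks:
    print(path, count)
```

In this toy page both the result list and the pagination strip are reported as candidate blocks, which is why a further rule-based step is needed to tell result records from next-page links. The reported path is the kind of tag-tree location a wrapper could store and reuse on later response pages from the same engine.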

Key words: Search engine; Web information extraction; Wrapper generation; HTML tags tree; Nodes similarity
Received: 24 November 2006      Published: 25 February 2007

TP391.3

 
Corresponding Author: Ou Jun     E-mail: 1717go@gmail.com
About authors: Ou Jun, Ren Minglun

Cite this article:

Ou Jun, Ren Minglun. Automated Extraction of Search Engine Results. New Technology of Library and Information Service, 2007, 2(2): 49-52.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.02.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I2/49

