Automated Extraction of Search Engine Results
Ou Jun, Ren Minglun
(Institute of Computer Network, Hefei University of Technology, Hefei 230009, China)
Abstract: This paper presents a new method for automatically extracting Search Result Records (SRRs) and Subsequent Result Page Links (SRPLs) from a search engine's response page. The method compares the similarity of nodes in the HTML tag tree of a valid response page to identify Candidate Record Blocks (CRBs), recognizes SRRs and SRPLs within the CRBs using several heuristic rules, and then builds wrappers for them based on their locations in the tag tree. Experiments and comparisons with other methods show that the method is useful and efficient.
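The abstract does not give the similarity measure or the heuristic rules, so the following is only a minimal sketch of the general idea of finding candidate record blocks by comparing sibling subtrees in a tag tree. Everything in it is an assumption made for illustration: the names `Node`, `TreeBuilder`, `parse`, `similarity`, and `candidate_blocks`, the use of `difflib.SequenceMatcher` as a stand-in similarity measure, and the `threshold` and `min_records` values.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of the tag tree (text content is ignored)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_sequence(self):
        """Preorder list of tag names in this subtree."""
        seq = [self.tag]
        for child in self.children:
            seq.extend(child.tag_sequence())
        return seq


class TreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def similarity(a, b):
    """Assumed measure: ratio of matching tags between two subtrees."""
    return SequenceMatcher(None, a.tag_sequence(), b.tag_sequence()).ratio()


def candidate_blocks(node, threshold=0.8, min_records=3):
    """Return nodes whose adjacent children all look structurally alike."""
    blocks = []
    kids = node.children
    if len(kids) >= min_records and len(kids) > 1:
        scores = [similarity(x, y) for x, y in zip(kids, kids[1:])]
        if min(scores) >= threshold:
            blocks.append(node)
    for child in kids:
        blocks.extend(candidate_blocks(child, threshold, min_records))
    return blocks


page = (
    "<html><body><div><a>home</a></div><ul>"
    "<li><a>t1</a><p>s1</p></li>"
    "<li><a>t2</a><p>s2</p></li>"
    "<li><a>t3</a><p>s3</p></li>"
    "</ul></body></html>"
)
blocks = candidate_blocks(parse(page))
print([b.tag for b in blocks])
```

In this toy page the three `li` subtrees share the same tag structure, so their parent is reported as a candidate block, while the navigation `div` is not. A real response page would still need the heuristic filtering step the abstract mentions to separate result records from other repeated regions such as menus or advertisements.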
Received: 24 November 2006
Published: 25 February 2007
Corresponding Authors:
Ou Jun
E-mail: 1717go@gmail.com
About the authors: Ou Jun, Ren Minglun