%A Zhang Jian, Ou Hong
%T Extracting the Content of Google Web Page with Regular Expressions
%0 Journal Article
%D 2005
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.1003-3513.2005.09.12
%P 50-53
%V 21
%N 9
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_1082.shtml}
%8 2005-09-25
%X

Extracting the content of search result Web pages correctly and completely is a basic precondition for processing the retrieved information. This paper analyzes the structural characteristics of Google Web pages, presents a set of regular expressions for matching the content of these pages, and implements a content extractor in Visual C#. Results from practical application to many Google Web pages show that the regular-expression matching method can extract the entire main content of these pages.
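
The matching approach described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the paper's Visual C#, and it does not reproduce the authors' expressions. The sample markup, the class names `result` and `snippet`, and the helper names `strip_tags` and `extract_results` are all assumptions made for illustration; real search result pages use different markup that changes over time.

```python
import re
from html import unescape

# Hypothetical result markup, invented for this sketch. It is NOT the
# structure of real Google pages and not the markup analyzed in the paper.
SAMPLE_HTML = """
<div class="result"><a href="http://example.com/a">First <b>title</b></a>
<span class="snippet">Snippet one &amp; more</span></div>
<div class="result"><a href="http://example.org/b">Second title</a>
<span class="snippet">Snippet two</span></div>
"""

# One expression per result block, with named groups for each field.
# Lazy quantifiers (.*?) keep each group from running into the next result.
RESULT_RE = re.compile(
    r'<div class="result">\s*'
    r'<a href="(?P<url>[^"]+)">(?P<title>.*?)</a>\s*'
    r'<span class="snippet">(?P<snippet>.*?)</span>',
    re.DOTALL,
)

# Matches any remaining inline tag, such as <b>, inside a captured field.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove inline tags and decode HTML entities in a captured field."""
    return unescape(TAG_RE.sub("", fragment)).strip()


def extract_results(page):
    """Return one dict (url, title, snippet) per matched result block."""
    return [
        {
            "url": m.group("url"),
            "title": strip_tags(m.group("title")),
            "snippet": strip_tags(m.group("snippet")),
        }
        for m in RESULT_RE.finditer(page)
    ]


results = extract_results(SAMPLE_HTML)
for item in results:
    print(item["url"], "|", item["title"], "|", item["snippet"])
```

Named groups keep each extracted field tied to its role, so if the page structure changes only the pattern needs revising, not the surrounding code.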