Study of Web Page Extraction Algorithm in Mobile Meta Search Engine
Nie Jing, Li Qiang, Pang Li, Ying Huijie
Computer School, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract This paper introduces a Web page extraction algorithm named WEAV (Web-page Extraction Algorithm based on VIPS). WEAV is used in a mobile meta search engine named M-Meta to extract the main content of Web pages and return it to users. It adapts the results for display on mobile devices, improves retrieval speed, and strengthens the usability of the Web on mobile devices.
Received: 04 August 2010
Published: 04 January 2011