This paper proposes and implements WEAV (Web-page Extraction Algorithm based on VIPS), a Web page content extraction algorithm built on VIPS (Vision-based Page Segmentation). WEAV is applied in the mobile meta search engine M-Meta, where it extracts the main content from mobile search result pages and returns it to users. This adapts the results to the displays of mobile devices, speeds up users' access to information, and improves the usability of the Web on mobile devices.
Nie Jing, Li Qiang, Pang Li, Ying Huijie. Study of Web Page Extraction Algorithm in Mobile Meta Search Engine [J]. New Technology of Library and Information Service (现代图书情报技术), 2010, 26(10): 54-58.