Study of Web Page Extraction Algorithm in Mobile Meta Search Engine

doi:10.11925/infotech.1003-3513.2010.10.09

New Technology of Library and Information Service

2010, Vol. 26, Issue (10): 54-58, https://doi.org/10.11925/infotech.1003-3513.2010.10.09

Knowledge Organization and Knowledge Management

Study of Web Page Extraction Algorithm in Mobile Meta Search Engine

Nie Jing, Li Qiang, Pang Li, Ying Huijie

Computer School, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract:

This paper proposes and implements WEAV (Web-page Extraction Algorithm based on VIPS), a Web page content extraction algorithm. WEAV is used in M-Meta, a mobile meta search engine, to extract the main content of search result pages and return it to users. This adapts the results to display on mobile devices, speeds up users' access to information, and improves the usability of the Web on mobile devices.

Keywords: Mobile search; Meta search engine; Web page extraction algorithm; Usability
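The abstract outlines the WEAV pipeline only at a high level: segment a result page into visual blocks with VIPS, keep the blocks that carry the main content, and return that content in a form suited to small screens. The sketch below illustrates such a pipeline under stated assumptions and is not the authors' WEAV. VIPS segmentation is not implemented here; the hypothetical `Block` records stand in for its output. The scoring rule (text length weighted by one minus link density), the `keep_ratio` threshold and the `paginate` helper are all assumptions, borrowed from a common text-density heuristic, since the abstract does not describe how WEAV scores blocks.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One visual block as a VIPS-style segmenter might emit it (hypothetical structure)."""
    text: str           # visible text of the block
    link_text_len: int  # how many of those characters are anchor (<a>) text


def link_density(block: Block) -> float:
    """Share of the block's text that is anchor text; 0.0 for an empty block."""
    n = len(block.text)
    return block.link_text_len / n if n else 0.0


def score(block: Block) -> float:
    """Long blocks with little anchor text score high; menus and link lists score low."""
    return len(block.text) * (1.0 - link_density(block))


def extract_main_content(blocks: list[Block], keep_ratio: float = 0.5) -> str:
    """Keep every block scoring at least keep_ratio * best score, preserving page order."""
    if not blocks:
        return ""
    best = max(score(b) for b in blocks)
    if best <= 0:
        return ""
    return "\n".join(b.text for b in blocks if score(b) >= keep_ratio * best)


def paginate(text: str, max_chars: int = 200) -> list[str]:
    """Cut the extracted text into fixed-size screens for a small display."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    page = [
        Block("Home | News | Sports | Contact", link_text_len=21),  # navigation bar
        Block("Main article text. " * 10, link_text_len=0),         # body text
        Block("Copyright 2010", link_text_len=0),                    # footer
    ]
    content = extract_main_content(page)
    for screen in paginate(content, max_chars=100):
        print(screen)
```

With the default `keep_ratio` of 0.5, only the body block survives in the demo: the navigation bar scores about 9 and the footer 14, against 190 for the body, and the result is split into two screens of 100 and 90 characters. A ratio threshold, rather than keeping only the single best block, lets an article that VIPS splits across several blocks survive extraction.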

Received: 2010-08-04; Published: 2011-01-04

CLC number: TP319

Cite this article:

Nie Jing, Li Qiang, Pang Li, Ying Huijie. Study of Web Page Extraction Algorithm in Mobile Meta Search Engine. New Technology of Library and Information Service, 2010, 26(10): 54-58.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.10.09 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I10/54
