New Technology of Library and Information Service  2010, Vol. 26 Issue (10): 54-58    DOI: 10.11925/infotech.1003-3513.2010.10.09
Study of Web Page Extraction Algorithm in Mobile Meta Search Engine
Nie Jing, Li Qiang, Pang Li, Ying Huijie
Computer School, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract  

This paper introduces WEAV (Web-page Extraction Algorithm based on VIPS), a Web page extraction algorithm used in a mobile meta search engine named M-Meta. WEAV is designed to extract the main content of Web pages and return it to users. It adapts the results for display on mobile devices, improves retrieval speed, and strengthens the usability of the Web on mobile devices.

Key words: Mobile search; Meta search engine; Web page extraction algorithm; Usability
Received: 04 August 2010      Published: 04 January 2011
CLC number: TP319

Cite this article:

Nie Jing, Li Qiang, Pang Li, Ying Huijie. Study of Web Page Extraction Algorithm in Mobile Meta Search Engine. New Technology of Library and Information Service, 2010, 26(10): 54-58.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.10.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I10/54


[1] Cai D, Yu S, Wen J R, et al. VIPS: A Vision-based Page Segmentation Algorithm. Microsoft Technical Report, MSR-TR-2003-79. 2003. http://research.microsoft.com/apps/pubs/default.aspx?id=70027.

[2] Yu Manquan, Chen Tierui, Xu Hongbo. Research and Design of a Block-based Web Page Information Parser[J]. Journal of Computer Applications, 2005, 25(4): 974-976.

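[3] Liu Hong, He Chen, Huang Heyan. Principle and Implementation of a WAP Page Conversion Proxy System[J]. Computer Engineering and Applications, 2002, 38(4): 177-179.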

[4] Sun Guihuang, Liu Fasheng. A Method for Extracting Web Page Main-Text Information Based on Text Features[J]. Modern Computer, 2008(9): 34-37.

[5] Yu S, Cai D, Wen J R, et al. Improving Pseudo-relevance Feedback in Web Information Retrieval Using Web Page Segmentation. In: Proceedings of the 12th International Conference on World Wide Web, Budapest, Hungary. 2003: 11-18.

[6] Geng H, Gao Q, Pan J. Extracting Content for News Web Pages Based on DOM[J]. International Journal of Computer Science and Network Security, 2007, 7(2): 124-129.

[7] Zhang Huaping. ICTCLAS. http://mtgroup.ect.ac.cn/~zhp/ICTCLAS.htm. 2002.

[8] Lu Songfeng, Wang Dandan. A Web Page Segmentation Algorithm for Mobile Devices[J]. Journal of Chinese Computer Systems, 2007, 28(9): 1672-1677.

[9] Hattori G, Hoashi K, Matsumoto K, et al. Robust Web Page Segmentation for Mobile Terminal Using Content-Distances and Page Layout Information. In: Proceedings of the 16th International Conference on World Wide Web, Alberta, Canada. 2007: 361-370.

[10] Burget R, Rudolfova I. Web Page Element Classification Based on Visual Features. In: Proceedings of the 1st Asian Conference on Intelligent Information and Database Systems, Dong Hoi, Vietnam. 2009: 67-72.

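[11] Gao Yan, Gu Shiwen, Tan Liqiu. A Page Content Extraction Algorithm Based on Multiple Strategies[J]. Journal of Southwest Jiaotong University, 2007, 42(4): 473-477.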
