New Technology of Library and Information Service  2010, Vol. 26 Issue (10): 54-58    DOI: 10.11925/infotech.1003-3513.2010.10.09
Study of Web Page Extraction Algorithm in Mobile Meta Search Engine
Nie Jing, Li Qiang, Pang Li, Ying Huijie
Computer School, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract  

This paper introduces a Web page extraction algorithm named WEAV (Web-page Extraction Algorithm based on VIPS). WEAV is used in a mobile meta search engine named M-Meta to extract the main content of Web pages and return it to users. It adapts the results for display on mobile devices, improves retrieval speed, and strengthens the usability of the Web on mobile devices.

Key words: Mobile search; Meta search engine; Web page extraction algorithm; Usability
Received: 04 August 2010      Published: 04 January 2011
TP319

Cite this article:

Nie Jing, Li Qiang, Pang Li, Ying Huijie. Study of Web Page Extraction Algorithm in Mobile Meta Search Engine. New Technology of Library and Information Service, 2010, 26(10): 54-58.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.10.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I10/54


