Research on Identifying Maximal Meaningful Node from Web Page

doi:10.11925/infotech.1003-3513.2009.10.04

New Technology of Library and Information Service

2009, Issue (10): 22-27

Li Yazi, Fang An, Chen Wei, Zhu Feng

(Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China)

Abstract

This paper analyzes existing research on identifying the maximal meaningful node of a web page, together with the algorithms used to implement it. By using and improving the style tree, it computes the importance of each node in order to find the maximal meaningful node. Finally, an example is given.
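The idea of scoring nodes and selecting the maximal meaningful node can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives neither the improved style tree nor the importance formula, so this sketch assumes a single page, uses each node's share of the page's text as a stand-in for importance, and descends from the root while one child still holds most of the text. The names `Node`, `TreeBuilder`, `maximal_meaningful_node` and the `threshold` parameter are all hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.own_text = 0  # characters of text directly inside this node

    def text_len(self):
        """Total text length in this node's subtree."""
        return self.own_text + sum(c.text_len() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:  # void elements never contain text
            return
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.own_text += len(data.strip())


def maximal_meaningful_node(root, threshold=0.6):
    """Assumed heuristic: importance = subtree text / page text.

    Walk down from the root, always following the most important child,
    and stop at the last node whose importance is at least `threshold`.
    """
    total = root.text_len()
    node = root
    if total == 0:
        return node
    while True:
        best = max(node.children, key=lambda c: c.text_len(), default=None)
        if best is None or best.text_len() / total < threshold:
            return node
        node = best
```

To try it, feed a page to `TreeBuilder().feed(html)` and pass the builder's `root` to `maximal_meaningful_node`. A style-tree approach would instead merge the DOM trees of several pages from the same site and weight nodes by how their presentation varies across pages, which this single-page sketch does not attempt.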

Key words: Style tree; Maximal meaningful node; Node importance; DOM tree

Received: 31 August 2009 Published: 25 October 2009

G250

Corresponding Authors: Li Yazi E-mail: 8982632@163.com

About author: Li Yazi, Fang An, Chen Wei, Zhu Feng

Cite this article:

Li Yazi, Fang An, Chen Wei, Zhu Feng. Research on Identifying Maximal Meaningful Node from Web Page. New Technology of Library and Information Service, 2009, (10): 22-27.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.10.04 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V/I10/22
