New Technology of Library and Information Service, 2009, (10): 22-27. DOI: 10.11925/infotech.1003-3513.2009.10.04
Research on Identifying Maximal Meaningful Node from Web Page
Li Yazi, Fang An, Chen Wei, Zhu Feng
(Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China)
Abstract  

This paper reviews existing research on identifying the maximal meaningful node of a Web page and describes an algorithm for doing so. Building on an improved style tree, the algorithm computes the importance of each node in order to find the maximal meaningful node. Finally, an example is given.

Key words: Style tree; Maximal meaningful node; Node importance; DOM tree
Received: 31 August 2009; Published: 25 October 2009
CLC Number: G250
Corresponding Author: Li Yazi, E-mail: 8982632@163.com
About authors: Li Yazi, Fang An, Chen Wei, Zhu Feng

Cite this article:

Li Yazi, Fang An, Chen Wei, Zhu Feng. Research on Identifying Maximal Meaningful Node from Web Page. New Technology of Library and Information Service, 2009, (10): 22-27.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.10.04     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V/I10/22


  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn