New Technology of Library and Information Service (现代图书情报技术)  2009, Issue (10): 22-27     https://doi.org/10.11925/infotech.1003-3513.2009.10.04
Digital Library
Research on Identifying Maximal Meaningful Node from Web Page
Li Yazi   Fang An   Chen Wei   Zhu Feng
(Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China)
Abstract

 The paper analyzes existing research on, and implementations of, algorithms for identifying the maximal meaningful node in a Web page, both in China and abroad. Building on and improving the style tree, it compresses multiple similar pages into a single style tree and computes the importance of each node to find the maximal meaningful node. Finally, a worked example is analyzed.

Key words: Style tree    Maximal meaningful node    Node importance    DOM tree
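
The idea summarized in the abstract (merge similar pages into one style tree, then score nodes) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the improved importance formula, so the sketch assumes an entropy-style measure in the spirit of earlier style-tree work, where an element whose children look the same on every page scores 0 and one whose children differ on every page scores 1. The names `ElementNode`, `StyleNode`, `merge`, and `node_importance`, and the `(tag, [children])` page representation, are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ElementNode:
    tag: str
    pages: int = 0                               # pages that contain this element
    styles: dict = field(default_factory=dict)   # child sequence -> StyleNode


@dataclass
class StyleNode:
    pages: int = 0                                # pages that use this child sequence
    children: list = field(default_factory=list)  # one ElementNode per child


def merge(node, dom):
    """Merge one DOM subtree into the style tree rooted at `node`.

    `dom` is either a text string or a (tag, [children]) tuple.
    """
    node.pages += 1
    if isinstance(dom, str):
        return
    _, kids = dom
    # A style is identified by the sequence of child tags; text children
    # contribute their content, so differing text yields differing styles.
    key = tuple(k if isinstance(k, str) else k[0] for k in kids)
    style = node.styles.get(key)
    if style is None:
        style = StyleNode(children=[
            ElementNode("#text" if isinstance(k, str) else k[0]) for k in kids
        ])
        node.styles[key] = style
    style.pages += 1
    for child_node, child_dom in zip(style.children, kids):
        merge(child_node, child_dom)


def node_importance(node):
    """Assumed entropy-style score: -sum(p_i * log_m(p_i)), with m = node.pages."""
    m = node.pages
    if m <= 1:
        return 1.0
    entropy = 0.0
    for style in node.styles.values():
        p = style.pages / m
        entropy -= p * math.log(p, m)
    return entropy


def page(text):
    # Hypothetical page: a fixed navigation list plus one varying paragraph.
    return ("body", [
        ("ul", [("li", ["Home"]), ("li", ["About"])]),
        ("div", [("p", [text])]),
    ])


root = ElementNode("body")
for text in ["first article", "second article", "third article"]:
    merge(root, page(text))
```

In this toy input the shared navigation list collapses into a single style, while the paragraph has a distinct style on every page, so the paragraph receives the higher score. How the paper selects the maximal meaningful node from such scores is not described in the abstract and is left out here.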
Received: 2009-08-31      Published: 2009-10-25
Classification number: G250
Corresponding author: Li Yazi     E-mail: 8982632@163.com
About the authors: Li Yazi, Fang An, Chen Wei, Zhu Feng
Cite this article:
Li Yazi, Fang An, Chen Wei, Zhu Feng. Research on Identifying Maximal Meaningful Node from Web Page[J]. New Technology of Library and Information Service, 2009, (10): 22-27.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2009.10.04      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2009/V/I10/22


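Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing  Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn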