New Technology of Library and Information Service  2007, Vol. 2 Issue (4): 48-51    DOI: 10.11925/infotech.1003-3513.2007.04.12
An Entropy-Based Approach for News Article Extraction from Web Page
Zhu Hongcan, Long Zhaoyang
(Management School of Xiangtan University, Xiangtan 411105, China)
This paper proposes an approach for extracting news articles from Web pages by applying information theory to the DOM tree. Experiments on several news Web sites show that the approach is practical.
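
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of scoring DOM blocks with Shannon entropy, not the authors' method. It assumes that a block's informativeness can be approximated by the entropy of its own word distribution; the names `BLOCK_TAGS`, `BlockCollector`, `shannon_entropy`, and `most_informative_block` are hypothetical and introduced here for illustration.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit candidate content blocks.
BLOCK_TAGS = {"div", "td", "article", "section"}


class BlockCollector(HTMLParser):
    """Collect the words that sit directly inside each block element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # word lists of the block elements currently open
        self.blocks = []  # word lists of the blocks already closed

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        # Simplification: assumes block tags are properly nested and closed.
        if tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        # Text is credited to the innermost open block only.
        if self.stack:
            self.stack[-1].extend(data.lower().split())


def shannon_entropy(words):
    """Shannon entropy (in bits) of the word frequency distribution."""
    total = len(words)
    if total == 0:
        return 0.0
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_informative_block(html):
    """Return the word list of the block with the highest entropy."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return max(parser.blocks, key=shannon_entropy, default=[])


page = (
    "<div>home news home news</div>"
    "<div>the council approved a new budget for city schools on monday</div>"
)
print(" ".join(most_informative_block(page)))
```

Under this assumption, a repetitive navigation block scores lower than a block of running text. A practical extractor would also need to handle malformed markup and would likely compare blocks across several pages of the same site.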

Key words: Entropy; Information extraction; Informative block; DOM
Received: 05 February 2007      Published: 25 April 2007


Corresponding author: Zhu Hongcan
About the authors: Zhu Hongcan, Long Zhaoyang

Zhu Hongcan, Long Zhaoyang. An Entropy-Based Approach for News Article Extraction from Web Page. New Technology of Library and Information Service, 2007, 2(4): 48-51.


