An Entropy-Based Approach for News Article Extraction from Web Page
Zhu Hongcan, Long Zhaoyang
(Management School of Xiangtan University, Xiangtan 411105, China)
Abstract: This paper proposes an approach for extracting news articles from Web pages by applying information theory to the DOM tree. Experiments on several news Web sites show that the approach is practical.
Received: 05 February 2007
Published: 25 April 2007
Corresponding Authors:
Zhu Hongcan
E-mail: zhuhongcan@xtu.edu.cn
About authors: Zhu Hongcan, Long Zhaoyang