In this paper,an approach for news article extraction from Web page is proposed and this approach applies information theory to DOM tree. Experiment on several news Web sites shows that it is practical.
朱红灿,龙朝阳 . 基于熵的新闻网页抽取方法的研究[J]. 现代图书情报技术, 2007, 2(4): 48-51.
Zhu Hongcan,Long Zhaoyang . An Entropy-Based Approach for News Article Extraction from Web Page. New Technology of Library and Information Service, 2007, 2(4): 48-51.
1Kao H Y,Ho J M,Chen M SWISDOM:Web Intrap age Informative Structure Mining Based on Document Object Model.IEEE Tansactions on Knowledge and Data Engineering:2005,17(5):614-630
2瞿有利,于浩,徐国伟等.Web页面信息块的自动分割. 中文信息学报,2004,18(1):6-13
3孙承杰,关毅. 基于统计的网页正文信息抽取方法的研究.中文信息学报,2004,18(5):17-22
4张敏,高剑峰,马少平. 基于链接描述文本及其上下文的Web信息检索.计算机研究与发展,2004,41(1):221-226