New Technology of Library and Information Service  2007, Vol. 2 Issue (4): 48-51    DOI: 10.11925/infotech.1003-3513.2007.04.12
An Entropy-Based Approach for News Article Extraction from Web Page
Zhu Hongcan   Long Zhaoyang
(Management School of Xiangtan University, Xiangtan 411105, China)
Abstract  

This paper proposes an approach for extracting news articles from Web pages by applying information theory to the DOM tree. Experiments on several news Web sites show that the approach is practical.
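
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of scoring DOM blocks by entropy, not the authors' method. It assumes well-formed HTML, treats a small hypothetical set of tags (BLOCK_TAGS) as blocks, and uses the Shannon entropy of each block's word distribution as an illustrative score; the names BlockCollector, shannon_entropy, and most_informative_block are invented for this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate blocks (not from the paper).
BLOCK_TAGS = {"div", "p", "td", "li", "article", "section"}


class BlockCollector(HTMLParser):
    """Collect the text that sits directly inside each block-level element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # indices of currently open blocks
        self.blocks = []  # (tag, list of text fragments) per block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append((tag, []))
            self.stack.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        # Assumes well-formed nesting: a closing block tag closes the top block.
        if tag in BLOCK_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text inside inline tags (a, span, ...) is credited to the enclosing block.
        if self.stack and data.strip():
            self.blocks[self.stack[-1]][1].append(data.strip())


def shannon_entropy(text):
    """Shannon entropy (in bits) of the word distribution of a text."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_informative_block(html):
    """Return (tag, text, entropy) for the highest-entropy block, or None."""
    parser = BlockCollector()
    parser.feed(html)
    scored = []
    for tag, parts in parser.blocks:
        text = " ".join(parts)
        scored.append((tag, text, shannon_entropy(text)))
    if not scored:
        return None
    return max(scored, key=lambda item: item[2])


page = (
    "<div><li><a>Home</a></li><li><a>Home</a></li></div>"
    "<div><p>Entropy measures the uncertainty of a term "
    "distribution in a block.</p></div>"
)
best = most_informative_block(page)
```

In this toy page the repeated navigation items carry a single word each and score zero, while the paragraph with varied vocabulary scores highest. A real extractor would need to handle scripts, malformed markup, and comparisons across pages, none of which this sketch attempts.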

Key words: Entropy; Information extraction; Informative block; DOM
Received: 05 February 2007      Published: 25 April 2007
ZTFLH: TP181
Corresponding Authors: Zhu Hongcan     E-mail: zhuhongcan@xtu.edu.cn
About authors: Zhu Hongcan, Long Zhaoyang

Cite this article:

Zhu Hongcan, Long Zhaoyang. An Entropy-Based Approach for News Article Extraction from Web Page. New Technology of Library and Information Service, 2007, 2(4): 48-51.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.04.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I4/48

