Please wait a minute...
New Technology of Library and Information Service  2007, Vol. 2 Issue (4): 48-51    DOI: 10.11925/infotech.1003-3513.2007.04.12
Current Issue | Archive | Adv Search |
An Entropy-Based Approach for News Article Extraction from Web Page
Zhu Hongcan   Long Zhaoyang
(Management School of Xiangtan University, Xiangtan 411105, China)
Download:
Export: BibTeX | EndNote (RIS)      
Abstract  

In this paper,an approach for news article extraction from Web page is proposed and this approach applies information theory to DOM tree. Experiment on several news Web sites shows that it is practical.

Key wordsEntropy      Information extraction      Informative block      DOM     
Received: 05 February 2007      Published: 25 April 2007
: 

TP181

 
Corresponding Authors: Zhu Hongcan     E-mail: zhuhongcan@xtu.edu.cn
About author:: Zhu Hongcan,Long Zhaoyang

Cite this article:

Zhu Hongcan,Long Zhaoyang . An Entropy-Based Approach for News Article Extraction from Web Page. New Technology of Library and Information Service, 2007, 2(4): 48-51.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.04.12     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I4/48

1Kao H Y,Ho J M,Chen M SWISDOM:Web Intrap age Informative Structure Mining Based on Document Object Model.IEEE Tansactions on Knowledge and Data Engineering:2005,17(5):614-630
2瞿有利,于浩,徐国伟等.Web页面信息块的自动分割. 中文信息学报,2004,18(1):6-13
3孙承杰,关毅. 基于统计的网页正文信息抽取方法的研究.中文信息学报,2004,18(5):17-22
4张敏,高剑峰,马少平. 基于链接描述文本及其上下文的Web信息检索.计算机研究与发展,2004,41(1):221-226

[1] Chen Jie,Ma Jing,Li Xiaofeng. Short-Text Classification Method with Text Features from Pre-trained Models[J]. 数据分析与知识发现, 2021, 5(9): 21-30.
[2] Shan Xiaohong,Wang Chunwen,Liu Xiaoyan,Han Shengxi,Yang Juan. Identifying Lead Users in Open Innovation Community from Knowledge-based Perspectives[J]. 数据分析与知识发现, 2021, 5(9): 85-96.
[3] Liu Yuanchen, Wang Hao, Gao Yaqi. Predicting Online Music Playbacks and Influencing Factors[J]. 数据分析与知识发现, 2021, 5(8): 100-112.
[4] Tan Ying, Tang Yifei. Extracting Citation Contents with Coreference Resolution[J]. 数据分析与知识发现, 2021, 5(8): 25-33.
[5] Zhu Hou,Fang Qingyan. Quantifying and Examining Privacy Paradox of Social Media Users[J]. 数据分析与知识发现, 2021, 5(7): 111-125.
[6] Chen Wenjie,Wen Yi,Yang Ning. Fuzzy Overlapping Community Detection Algorithm Based on Node Vector Representation[J]. 数据分析与知识发现, 2021, 5(5): 41-50.
[7] Cheng Bin,Shi Shuicai,Du Yuncheng,Xiao Shibin. Keyword Extraction for Journals Based on Part-of-Speech and BiLSTM-CRF Combined Model[J]. 数据分析与知识发现, 2021, 5(3): 101-108.
[8] Zheng Xinman, Dong Yu. Constructing Degree Lexicon for STI Policy Texts[J]. 数据分析与知识发现, 2021, 5(10): 81-93.
[9] Zhao Ping,Sun Lianying,Tu Shuai,Bian Jianling,Wan Ying. Identifying Scenic Spot Entities Based on Improved Knowledge Transfer[J]. 数据分析与知识发现, 2020, 4(5): 118-126.
[10] Li Chengliang,Zhao Zhongying,Li Chao,Qi Liang,Wen Yan. Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field[J]. 数据分析与知识发现, 2020, 4(5): 54-65.
[11] Qi Ruihua,Jian Yue,Guo Xu,Guan Jinghua,Yang Mingxin. Sentiment Analysis of Cross-Domain Product Reviews Based on Feature Fusion and Attention Mechanism[J]. 数据分析与知识发现, 2020, 4(12): 85-94.
[12] Peng Chen,Lv Xueqiang,Sun Ning,Zang Le,Jiang Zhaocai,Song Li. Building Phrase Dictionary for Defective Products with Convolutional Neural Network[J]. 数据分析与知识发现, 2020, 4(11): 112-120.
[13] Wang Sili,Zhu Zhongming,Yang Heng,Liu Wei. Automatically Identifying Hypernym-Hyponym Relations of Domain Concepts with Patterns and Projection Learning[J]. 数据分析与知识发现, 2020, 4(11): 15-25.
[14] Qin Chenglei,Zhang Chengzhi. Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network[J]. 数据分析与知识发现, 2020, 4(11): 26-42.
[15] Wang Yi,Shen Zhe,Yao Yifan,Cheng Ying. Domain-Specific Event Graph Construction Methods:A Review[J]. 数据分析与知识发现, 2020, 4(10): 1-13.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn