New Technology of Library and Information Service  2007, Vol. 2 Issue (8): 59-62    DOI: 10.11925/infotech.1003-3513.2007.08.14
Study of Statistics-rule Based Hierarchical Web Page Classification
Tan Jinbo1, Yang Xiaojiang2, Li Yi2
1 (Educational Technology Department, Shandong Normal University, Jinan 250014, China)
2 (Educational Technology Department, Nanjing Normal University, Nanjing 210097, China)

Statistics-based classification methods are commonly used in hierarchical Web classification. However, their classification precision often drops when categories are very similar to each other, because the categories' features overlap. In a hierarchical Web classification, categories that share the same parent (e.g., leaf categories in the hierarchy) are often very similar to each other. To improve classification precision, the paper proposes applying rule-based classification methods on top of statistics-based methods in hierarchical Web classification. Experiments show that the proposed methods perform well on the authors' education Web collections.
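The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical model or the rule format, so the sketch assumes a multinomial naive Bayes model with add-one smoothing for the statistical stage and simple keyword rules for the rule stage. The hierarchy, training pages, rules, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: parent category -> sibling leaf categories.
HIERARCHY = {
    "math": ["algebra", "geometry"],
    "language": ["grammar", "literature"],
}
LEAF_TO_PARENT = {leaf: p for p, leaves in HIERARCHY.items() for leaf in leaves}

# Hypothetical training pages as (leaf category, page text).
TRAINING = [
    ("algebra", "solve the equation for the unknown variable"),
    ("algebra", "polynomial equation and variable factoring"),
    ("geometry", "triangle angle and circle area"),
    ("geometry", "solve for the angle of the triangle"),
    ("grammar", "noun verb and sentence structure"),
    ("literature", "novel poem and author style"),
]

# Hypothetical keyword rules used only to separate similar siblings.
# Each entry is (leaf category, keywords); the first rule that matches wins.
RULES = {
    "math": [
        ("geometry", {"triangle", "circle", "angle"}),
        ("algebra", {"polynomial", "equation"}),
    ],
}


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Collect per-label word counts for a multinomial naive Bayes model."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for label, text in examples:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def nb_best(tokens, labels, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts[label] for label in labels)

    def score(label):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        log_prob = math.log(doc_counts[label] / total_docs)
        for token in tokens:
            log_prob += math.log((counts[token] + 1) / denom)
        return log_prob

    return max(labels, key=score)


leaf_model = train(TRAINING)
parent_model = train([(LEAF_TO_PARENT[leaf], text) for leaf, text in TRAINING])


def classify(text):
    """Statistical stage picks the parent; rules then separate the siblings."""
    tokens = tokenize(text)
    parent = nb_best(tokens, list(HIERARCHY), parent_model)
    for leaf, keywords in RULES.get(parent, []):
        if keywords & set(tokens):
            return parent, leaf
    # No rule fired: fall back to the statistical model among the siblings.
    return parent, nb_best(tokens, HIERARCHY[parent], leaf_model)


print(classify("solve for the angle of the triangle"))
```

In this sketch the rules are consulted only after the statistical stage has chosen a parent, so they only have to separate a handful of sibling categories, which is where the abstract says feature overlap hurts precision most.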

Key words: Statistics-based classification; Rule-based classification; Hierarchical Web classification; Statistics-rule based classification
Received: 11 June 2007      Published: 25 August 2007


Corresponding author: Tan Jinbo
About authors: Tan Jinbo, Yang Xiaojiang, Li Yi

Cite this article:

Tan Jinbo, Yang Xiaojiang, Li Yi. Study of Statistics-rule Based Hierarchical Web Page Classification. New Technology of Library and Information Service, 2007, 2(8): 59-62.


