New Technology of Library and Information Service, 2007, Vol. 2, Issue 8: 59-62     https://doi.org/10.11925/infotech.1003-3513.2007.08.14
Section: Knowledge Organization and Knowledge Management
Study of Statistics-rule Based Hierarchical Web Page Classification
Tan Jinbo1  Yang Xiaojiang2  Li Yi2
1(Educational Technology Department, Shandong Normal University, Jinan 250014, China)
2(Educational Technology Department, Nanjing Normal University, Nanjing 210097, China)

Abstract

Statistics-based classification methods are commonly used in hierarchical Web page classification. However, their precision drops sharply when categories are very similar to each other, because the categories' features overlap. The nature of hierarchical Web classification makes such overlap common: categories sharing the same parent (e.g., leaf categories in the hierarchy) often have many features in common. To improve classification precision, this paper combines the strengths of rule-based classification with statistics-based methods, applying rule-based classification on top of statistical classification in hierarchical Web classification. Experiments on our collection of educational Web pages show that the statistics-rule method achieves satisfactory classification results.
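The abstract does not specify the features, statistical classifier, or rules the authors used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a two-level category tree in which a statistical classifier (here multinomial naive Bayes, a stand-in chosen for brevity) assigns the parent category, and a small ordered set of keyword rules then separates sibling subcategories whose vocabularies overlap. It also assumes pages are already segmented into whitespace-separated tokens (Chinese pages would need word segmentation first). The category names, training texts, the `RULES` table, and the `NaiveBayes` and `classify` helpers are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical stand-in for the statistical stage; the paper's actual
    classifier and features are not described in the abstract.
    """

    def fit(self, docs, labels):
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def log_score(self, doc, label):
        # log P(label) + sum of log P(token | label), smoothed with add-one.
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / self.total_docs)
        for token in doc.lower().split():
            score += math.log((counts[token] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        return max(self.doc_counts, key=lambda label: self.log_score(doc, label))


# Hypothetical rule base: for each parent category, ordered (child, keywords)
# pairs. A rule fires when every keyword in its set occurs in the page text;
# the first rule that fires wins.
RULES = {
    "mathematics": [
        ("geometry", {"triangle"}),
        ("geometry", {"angle"}),
        ("algebra", {"equation"}),
    ],
    "language": [
        ("poetry", {"poem"}),
        ("grammar", {"verb"}),
    ],
}


def classify(doc, parent_model, rules=RULES):
    """Statistics picks the parent; rules pick the child (None if no rule fires)."""
    parent = parent_model.predict(doc)
    tokens = set(doc.lower().split())
    for child, keywords in rules.get(parent, []):
        if keywords <= tokens:  # all required keywords are present
            return parent, child
    return parent, None


if __name__ == "__main__":
    train_docs = [
        "triangle angle area lesson",
        "solve the equation for x lesson",
        "poem reading lesson",
        "verb tense exercise lesson",
    ]
    train_parents = ["mathematics", "mathematics", "language", "language"]
    model = NaiveBayes().fit(train_docs, train_parents)
    print(classify("measure each angle of the triangle", model))  # ('mathematics', 'geometry')
    print(classify("write a short poem", model))                  # ('language', 'poetry')
```

Keeping the rules ordered lets a more specific rule take precedence over a more general one. When no rule fires, this sketch returns `None` for the child; a real system would need a fallback, for example a statistical classifier trained only on the children of the chosen parent.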

Key words: Statistics-based classification; Rule-based classification; Hierarchical Web classification; Statistics-rule based classification
Received: 2007-06-11      Published: 2007-08-25
CLC number: G354.4

 
Corresponding author: Tan Jinbo      E-mail: yttjb@163.com
About the authors: Tan Jinbo, Yang Xiaojiang, Li Yi
Cite this article:
Tan Jinbo, Yang Xiaojiang, Li Yi. Study of Statistics-rule Based Hierarchical Web Page Classification. New Technology of Library and Information Service, 2007, 2(8): 59-62.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2007.08.14      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2007/V2/I8/59


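Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing, Postcode 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn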