New Technology of Library and Information Service  2011, Vol. 27 Issue (3): 73-79    DOI: 10.11925/infotech.1003-3513.2011.03.12
A New Classifier Design in a Topic Search Engine by Combining Multi-layer Classifier with Naive Bayes Classification Model
Zhang Hongbin, Cao Yiqin
School of Software, East China Jiaotong University, Nanchang 330013, China
Abstract  This paper first analyzes the distribution characteristics of computer education resources on the Web. It then designs a multi-layer classifier that combines topic words with resource forms to solve the topic classification problem in topic crawling, and describes how a Naive Bayes classifier model is used to fuse the classification results precisely and how the resources are stored correctly on the hard disk. Finally, experimental results show that the key design idea is feasible and that performance is acceptable: the average accuracy of the topic classification algorithm reaches 78%, the average recall rate reaches 61%, and the average resource parsing accuracy reaches 81.5%.
Key words: Multi-layer classifier; Topic search engine; Computer education resources; Naive Bayes
Received: 17 January 2011      Published: 05 May 2011
CLC number: TP393.08
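The abstract names a Naive Bayes classifier as the model that fuses the classification results, but it does not give the formulation. As a rough illustration only, the following is a minimal sketch of a multinomial Naive Bayes topic classifier with add-one smoothing. The class name, the topic labels, and the training words are all hypothetical, and the sketch does not reproduce the paper's multi-layer design or its handling of resource forms.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Illustrative sketch only; not the classifier described in the paper.
    """

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocabulary = set()                  # all words seen in training

    def train(self, labeled_docs):
        """labeled_docs: iterable of (list_of_words, topic_label) pairs."""
        for words, label in labeled_docs:
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def log_scores(self, words):
        """Return log P(topic) + sum of log P(word | topic) for each topic."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)
            for word in words:
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, words):
        """Return the topic with the highest log score."""
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Hypothetical training data: topic words extracted from crawled pages.
training_docs = [
    (["java", "class", "inheritance", "tutorial"], "programming"),
    (["python", "loop", "function", "tutorial"], "programming"),
    (["tcp", "router", "protocol", "packet"], "networking"),
    (["ip", "subnet", "router", "switch"], "networking"),
]

clf = NaiveBayesTopicClassifier()
clf.train(training_docs)
predicted_topic = clf.classify(["router", "protocol", "lecture"])
```

Working in log space avoids numeric underflow when a page contributes many words, and add-one smoothing keeps a single unseen word from driving a topic's probability to zero.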

 

Cite this article:

Zhang Hongbin, Cao Yiqin. A New Classifier Design in a Topic Search Engine by Combining Multi-layer Classifier with Naive Bayes Classification Model. New Technology of Library and Information Service, 2011, 27(3): 73-79.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.03.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I3/73
