Study on Automatic Text Categorization with Support Vector Machine
Shi Jiebin
(Zhejiang University Library, Hangzhou 310029, China)

Abstract: Support Vector Machine (SVM), a new machine learning method, is applied to automatic text categorization. Compared with the k-nearest neighbor (KNN) algorithm, SVM achieves higher accuracy, and its results are less affected by the choice of feature selection method than those of KNN. SVM is therefore a promising and competitive method for automatic text categorization. The feature selection method also affects the accuracy of text categorization.
Received: 23 February 2004
Published: 25 July 2004
Corresponding author: Shi Jiebin
E-mail: jbshi@lib.zju.edu.cn
About author: Shi Jiebin