Implementation and Comparison of Similarity and Probabilistic Mode in Text Categorization
Liu Hua
(College of Chinese Language and Culture of Jinan University, Guangzhou 510610, China)
Abstract: This paper implements a text categorization system based on the Vector Space Model (VSM) and Naive Bayes (NB). When estimating the category, the authors improve the accuracy of the parent-category decision by correcting it with the subcategory result, and they decide whether a document belongs to more than one category (multi-class, multi-label) by examining how close the classifier's final scores are to one another. The experiment shows that VSM outperforms NB for text representation: MicroF1 increases by 25.2 percent for parent categories and by 26.3 percent for subcategories.
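The multi-label decision and the MicroF1 measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `select_labels` and `micro_f1`, the relative-gap rule, and the threshold `max_gap` are assumptions made here for illustration, since the abstract does not state the exact criterion used to compare the classifier's final values.

```python
from typing import Dict, List, Set


def select_labels(scores: Dict[str, float], max_gap: float = 0.1) -> List[str]:
    """Return the top-scoring category plus every category whose score
    lies within a relative gap of the top score.

    The relative-gap rule and max_gap are illustrative assumptions,
    not values taken from the paper.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    # A document is treated as multi-label when runner-up scores are
    # close enough to the best score.
    return [label for label, s in ranked if best - s <= max_gap * abs(best)]


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and
    false negatives over all documents before computing P, R and F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical similarity scores for one document.
    doc_scores = {"sports": 0.82, "health": 0.78, "finance": 0.30}
    print(select_labels(doc_scores))
    print(micro_f1([{"a"}, {"b"}], [{"a"}, {"a"}]))
```

The same `select_labels` rule could be applied to cosine similarities from a VSM classifier or to normalized posterior scores from an NB classifier, although the two score scales differ and would likely need different thresholds.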
Received: 12 January 2006
Published: 25 April 2006
Corresponding Author:
Liu Hua
E-mail: liuhua0461@sina.com
About author: Liu Hua