Application of an Improved Information Gain Feature Selection Method to Text Clustering
Chen Tao1, Song Yan2, Xie Yangqun1
1(Department of Management Science and Engineering, Ningbo, Zhejiang 315211, China)
2(Department of Business Administration, Nanjing, Jiangsu 210093, China)
Abstract This paper applies an improved information gain method to text clustering. A set of 250 texts is retrieved from the corpus, text feature vectors are constructed using the Vector Space Model and the information gain feature selection method, and the texts are then clustered automatically with the C-means algorithm. The resulting precision, recall, and F-measure are 0.82, 0.88, and 0.83, respectively.
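
To make the feature selection step concrete, the following is a minimal Python sketch of the standard information gain score for a single term. It is not the improved variant the paper proposes, whose details are not given in this abstract. The function names, the toy documents, and the labels are all assumptions made for illustration; in particular, the sketch assumes each text carries a category label against which the term is scored.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Standard IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs   : list of sets of terms, one set per text (hypothetical input format)
    labels : list of category labels, aligned with docs
    """
    with_term = [c for d, c in zip(docs, labels) if term in d]
    without_term = [c for d, c in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_term) / n * entropy(with_term)
            - len(without_term) / n * entropy(without_term))


# Toy data, invented purely for illustration.
docs = [{"stock", "market"}, {"stock", "bank"}, {"match", "goal"}, {"goal", "team"}]
labels = ["finance", "finance", "sport", "sport"]

# Rank the vocabulary by information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In a pipeline like the one the abstract describes, the top-ranked terms would presumably serve as the dimensions of the Vector Space Model representation passed to the clustering step.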
Received: 07 July 2004
Published: 25 December 2004
Corresponding Authors:
Xie Yangqun
E-mail: xieyangqun1980@yahoo.com.cn
About authors: Chen Tao, Song Yan, Xie Yangqun