This paper first analyzes, from a theoretical standpoint, the defects of mutual-information-based feature selection, and then proposes an improved method. To address the problems of the vector space model, the authors represent text with a class space model that makes use of category information. On this basis, the paper implements a category-based text categorization algorithm, and experimental results on Chinese text categorization show that the method achieves better precision.
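For orientation, the following is a minimal sketch of the mutual information score as it is commonly estimated for term selection in the text categorization literature, MI(t, c) = log(P(t, c) / (P(t) P(c))), computed from document counts. The function name, the count parameters, and the example numbers are illustrative assumptions. The abstract does not give the authors' formula, says nothing about which defects they analyze, and does not describe their improved method, so none of that is reproduced here.

```python
import math

def mutual_information(a, b, c, n):
    """Estimate MI(t, c) = log(P(t, c) / (P(t) * P(c))) from document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    n: total number of documents in the collection
    (Parameter names are hypothetical, chosen for this sketch only.)
    """
    if a == 0:
        # The term never co-occurs with the category.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))

# Hypothetical collection: 1000 documents, 100 of them in the category.
# A term that appears in a single document of the category:
rare = mutual_information(a=1, b=0, c=99, n=1000)
# A term that appears in 80 category documents and 20 others:
common = mutual_information(a=80, b=20, c=20, n=1000)
```

With these made-up counts, the term seen in only one document scores higher than the term that covers most of the category. This sensitivity to low-frequency terms is a frequently noted weakness of the plain score.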
Liu Hai-Feng, Liu Shou-Sheng, Zhang Xue-Ren, Su Zhan. A Model of Text Categorization Automatically Based on Category[J]. New Technology of Library and Information Service, 2010, 26(4): 72-76.