New Technology of Library and Information Service  2010, Vol. 26 Issue (4): 72-76    DOI: 10.11925/infotech.1003-3513.2010.04.12
A Model of Text Categorization Automatically Based on Category
Liu Hai-Feng, Liu Shou-Sheng, Zhang Hua-Ren, Su Zhan
(Institute of Sciences, People's Liberation Army University of Science and Technology, Nanjing 210007, China)
Abstract  

This paper first analyzes, on theoretical grounds, the shortcomings of feature selection based on mutual information, and then proposes an improved method. To address the limitations of the vector space model, the authors represent texts with a class space model that makes use of category information. On this basis, the paper implements a category-based text categorization algorithm. Experiments on Chinese text categorization show that the method achieves better precision.
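
As a rough illustration of the two ideas named in the abstract, the Python sketch below first computes the standard count-based mutual information estimate, log(A·N / ((A+C)(A+B))), and shows one commonly cited weakness: a term seen in a single document can outscore a frequent, strongly category-related term. It then shows one simple reading of a class space representation, in which a text is mapped to one score per category by summing term-category weights. This is not the authors' improved measure or their exact model, which the abstract does not specify. The function names, toy counts, and weights are all hypothetical.

```python
import math


def mutual_information(a, b, c, n):
    """Count-based estimate of MI between a term and a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    n: total number of documents
    """
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((a * n) / ((a + c) * (a + b)))


def class_space_vector(tokens, weights, categories):
    """Represent a text as one score per category (assumed formulation)."""
    return [sum(weights.get((t, cat), 0.0) for t in tokens) for cat in categories]


def classify(tokens, weights, categories):
    """Pick the category with the largest class space score."""
    scores = class_space_vector(tokens, weights, categories)
    return categories[scores.index(max(scores))]


# Toy corpus: 100 documents, 50 of them in the target category.
N, NC = 100, 50
# Rare term: appears in exactly one document, which is in the category.
rare = mutual_information(a=1, b=0, c=NC - 1, n=N)
# Common term: appears in 45 category documents and 5 others.
common = mutual_information(a=45, b=5, c=NC - 45, n=N)

# Hypothetical term-category weights for the class space sketch.
categories = ["sports", "finance"]
weights = {
    ("goal", "sports"): 1.0,
    ("match", "sports"): 0.5,
    ("match", "finance"): 0.1,
    ("stock", "finance"): 1.0,
}
label = classify(["match", "goal"], weights, categories)
```

Here `rare` exceeds `common` even though the common term covers 90% of the category, which is the kind of bias toward low-frequency terms that motivates modifying the mutual information criterion. In the class space part, the text dimension equals the number of categories rather than the vocabulary size.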

Key words: Text categorization; Feature selection; Class space model; Feature reduction
Received: 08 March 2010      Published: 25 April 2010
CLC Number: TP391
Corresponding Authors: Liu Hai-Feng     E-mail: liuhaifeng19620717@sina.com

Cite this article:

Liu Hai-Feng, Liu Shou-Sheng, Zhang Hua-Ren, Su Zhan. A Model of Text Categorization Automatically Based on Category. New Technology of Library and Information Service, 2010, 26(4): 72-76.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.04.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I4/72

