[Objective] This paper proposes a method based on deep learning to improve the classification accuracy of Chinese news texts. [Methods] We first used denoising autoencoders to build a deep network that learns a compressed, distributed representation of Chinese news texts. We then used the SVM algorithm to classify these news texts. [Results] As the number of samples increased, the precision, recall, and F value of the proposed method also increased. The method outperformed the KNN, BP, and SVM algorithms, with an average precision above 95%. [Limitations] The data set was relatively small, so the proposed method did not fully exploit the parallel data processing capacity of deep learning. [Conclusions] The proposed method improves the performance of Chinese news text classification.
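The first step of the method, learning a compressed representation with a denoising autoencoder, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network depth, noise type, loss, or hyperparameters, so the single hidden layer, masking noise, tied weights, squared-error loss, learning rate, and toy data below are all assumptions chosen for brevity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, rate, rng):
    """Masking noise (assumed): set each input value to 0 with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


class DenoisingAutoencoder:
    """One hidden layer, tied weights, squared-error reconstruction loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # W[j][i] links input i to hidden unit j; the decoder reuses W transposed.
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # hidden bias
        self.c = [0.0] * n_in      # reconstruction bias

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + ci)
                for i, ci in enumerate(self.c)]

    def loss(self, x):
        """Mean squared error between x and its reconstruction (no noise)."""
        z = self.decode(self.encode(x))
        return sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / len(x)

    def train_step(self, x, rate=0.3, lr=0.5):
        x_noisy = corrupt(x, rate, self.rng)
        h = self.encode(x_noisy)
        z = self.decode(h)
        # The reconstruction target is the clean x, not the corrupted copy.
        d_out = [(zi - xi) * zi * (1 - zi) for zi, xi in zip(z, x)]
        d_hid = [sum(self.W[j][i] * d_out[i] for i in range(len(x)))
                 * h[j] * (1 - h[j]) for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights: decoder gradient term plus encoder gradient term.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_noisy[i])
            self.b[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.c[i] -= lr * d_out[i]


def train(dae, data, epochs=300):
    for _ in range(epochs):
        for x in data:
            dae.train_step(x)
    return dae


# Toy binary bag-of-words vectors (made up): two loose "topics".
docs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
dae = DenoisingAutoencoder(n_in=6, n_hidden=3)
loss_before = sum(dae.loss(x) for x in docs) / len(docs)
train(dae, docs)
loss_after = sum(dae.loss(x) for x in docs) / len(docs)
features = [dae.encode(x) for x in docs]  # inputs for a downstream classifier
```

In the second step, vectors such as `features` would be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) together with the class labels. That step is omitted here to keep the sketch free of third-party dependencies.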
Liu Hongguang, Ma Shuanggang, Liu Guifeng. Classifying Chinese News Texts with Denoising Auto Encoder[J]. New Technology of Library and Information Service, 2016, 32(6): 12-19.