Classifying Chinese News Texts with Denoising Auto Encoder
Liu Hongguang, Ma Shuanggang, Liu Guifeng
Institute of Scientific & Technical Information, Jiangsu University, Zhenjiang 212013, China |
Abstract [Objective] This paper proposes a new method, based on deep learning theory, to improve the classification accuracy of Chinese news texts. [Methods] First, we used denoising autoencoders to build a deep network that learns compressed, distributed representations of the Chinese news texts. Second, we classified these news texts with the SVM algorithm. [Results] As the number of samples increased, the precision, recall and F value of the proposed method also increased. The results were better than those obtained with the KNN, BP and SVM algorithms, and the average precision exceeded 95%. [Limitations] The data set was relatively small, so the proposed method did not fully exploit the parallel data-processing capacity of deep learning. [Conclusions] The proposed method improves the performance of Chinese news text classification.
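The abstract describes a two-stage pipeline: a deep network built from denoising autoencoders learns a compressed representation of each text, and an SVM then classifies those representations. The pure-Python sketch below illustrates that idea under assumptions the abstract does not state: inputs are term vectors scaled to [0, 1], corruption is masking noise at level 0.3, each layer uses tied weights, sigmoid units and a cross-entropy reconstruction loss, layers are pretrained greedily without fine-tuning, and the classifier is a binary linear SVM trained by subgradient descent instead of a kernel SVM library. All names (`DAELayer`, `train_stacked_dae`, `train_linear_svm`, `precision_recall_f1`) and hyperparameters are hypothetical; this is not the authors' implementation.

```python
import math
import random
# Hypothetical sketch: names and hyperparameters are illustrative assumptions, not from the paper.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class DAELayer:
    """One denoising autoencoder layer with tied weights (the decoder uses W transposed)."""

    def __init__(self, n_in, n_hidden, rng):
        bound = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder bias
        self.c = [0.0] * n_in      # decoder bias
        self.rng = rng

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.c[i])
                for i in range(len(self.c))]

    def corrupt(self, x, noise):
        # Assumed masking noise: each component is set to 0 with probability `noise`.
        return [0.0 if self.rng.random() < noise else xi for xi in x]

    def train_step(self, x, noise=0.3, lr=0.1):
        x_t = self.corrupt(x, noise)
        h = self.encode(x_t)
        z = self.decode(h)
        # Cross-entropy between the reconstruction z and the CLEAN input x.
        eps = 1e-12
        loss = -sum(xi * math.log(zi + eps) + (1 - xi) * math.log(1 - zi + eps)
                    for xi, zi in zip(x, z))
        d_out = [zi - xi for zi, xi in zip(z, x)]  # dL/d(decoder pre-activation)
        d_hid = [h[j] * (1 - h[j]) * sum(self.W[j][i] * d_out[i] for i in range(len(x)))
                 for j in range(len(h))]           # dL/d(encoder pre-activation)
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights receive gradient from both the decoder and the encoder.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_t[i])
            self.b[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.c[i] -= lr * d_out[i]
        return loss

def train_stacked_dae(data, layer_sizes, epochs=20, noise=0.3, seed=0):
    # Greedy layer-wise pretraining: each layer learns to denoise the previous layer's codes.
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        layer = DAELayer(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                layer.train_step(x, noise=noise)
        layers.append(layer)
        inputs = [layer.encode(x) for x in inputs]  # clean (uncorrupted) codes feed the next layer
    return layers

def transform(layers, x):
    # Compressed representation produced by the top layer of the stack.
    for layer in layers:
        x = layer.encode(x)
    return x

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    # Binary linear SVM: subgradient descent on the L2-regularized hinge loss, y in {-1, +1}.
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            margin = y[i] * (sum(wk * xk for wk, xk in zip(w, X[i])) + b)
            w = [wk * (1 - lr * lam) for wk in w]  # regularization shrink
            if margin < 1:
                w = [wk + lr * y[i] * xk for wk, xk in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

def precision_recall_f1(y_true, y_pred, positive=1):
    # Per-class precision, recall and F1 (F1 assumed to be the abstract's "F value").
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In use, one would call `train_stacked_dae` on the document vectors, map each document through `transform`, and train the SVM on the resulting codes. News corpora have many categories, so a one-vs-rest wrapper around `train_linear_svm` would be needed; that extension, like the layer sizes and noise level, is an assumption of this sketch rather than something the abstract specifies.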
Received: 13 January 2016
Published: 18 July 2016