[Objective] This study addresses the difficulty of representing long texts and applies a capsule network (CapsNet) to improve the accuracy of Chinese text classification. [Methods] First, we proposed a long-text representation that uses an LDA matrix together with word vectors. Second, we constructed a Chinese text classification model based on CapsNet. Third, we evaluated the model on the Sogou news corpus and the Fudan University text classification corpus. Finally, we compared its results with those of classic models such as TextCNN and DNN. [Results] The CapsNet model outperformed the comparison models. Its five-category classification accuracy reached 89.6% on short texts and 96.9% on long texts. The proposed model converged almost twice as fast as the CNN model. [Limitations] The model's high computational complexity limited the size of the test corpus. [Conclusions] The proposed Chinese text representation method and the modified CapsNet model achieve better accuracy, convergence speed, and robustness than the existing models they were compared with.
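For readers unfamiliar with capsule networks, the following is a minimal sketch of the squash nonlinearity and the routing-by-agreement step from the original CapsNet formulation (Sabour et al., 2017). It is not the modified model evaluated in this study. The function names, the toy prediction vectors, and the choice of three routing iterations are illustrative assumptions.

```python
import math


def squash(s, eps=1e-9):
    """Capsule nonlinearity: keep the direction of s, map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def length(vec):
    """Euclidean length; in CapsNet this is read as the class score."""
    return math.sqrt(sum(x * x for x in vec))


def route(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output (class) capsule j. Returns one output vector per class.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        # Each input capsule distributes its vote across the output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the resulting output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


# Toy input (hypothetical): three input capsules, two classes, 2-D vectors.
# All inputs agree on class 0; their predictions for class 1 conflict.
demo_u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.1]],
]
demo_v = route(demo_u_hat)
print([round(length(vec), 3) for vec in demo_v])
```

In this toy case the class whose predictions agree ends up with the longer output vector, which is the property a capsule classifier relies on when it reads vector length as class confidence.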