[Objective] This paper applies knowledge distillation to improve the performance of a small-parameter model under the guidance of a high-performance large-parameter model when labeled samples are insufficient. It aims to address sample scarcity and reduce the cost of high-performance large-parameter models in natural language processing. [Methods] First, we used noise purification to select valuable data from an unlabeled corpus, then assigned pseudo labels to these samples to increase the number of labeled examples. Meanwhile, we added a knowledge review mechanism and a teaching assistant model to the traditional distillation framework to achieve comprehensive knowledge transfer from the large-parameter model to the small-parameter model. [Results] We evaluated the proposed model on text classification and sentiment analysis tasks using the IMDB, AG_NEWS, and Yahoo! Answers datasets. With only 5% of the original data labeled, the new model's accuracy was only 1.45%, 2.75%, and 7.28% lower than that of the traditional distillation model trained on the full original data. [Limitations] We only examined the new model on text classification and sentiment analysis tasks in natural language processing; other tasks remain to be explored in future work. [Conclusions] The proposed method achieves a better distillation effect and improves the performance of the small-parameter model.
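The two core ingredients named in [Methods] can be illustrated with a minimal sketch: the classic temperature-scaled distillation loss (soft-target KL plus hard-label cross-entropy) and confidence-based pseudo-labeling of unlabeled samples. This is a generic NumPy illustration, not the paper's implementation; the function names, the `threshold` and `alpha` values, and the use of a confidence cutoff as a stand-in for the paper's noise-purification step are all assumptions for illustration.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Classic KD objective: alpha * KL(teacher || student) at temperature T
    # plus (1 - alpha) * cross-entropy on the hard labels. The teacher-assistant
    # variant described in the paper chains this loss teacher -> assistant ->
    # student instead of distilling teacher -> student directly.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)

def pseudo_label(teacher_logits, threshold=0.9):
    # Keep only unlabeled samples the teacher predicts confidently
    # (a simplified stand-in for the paper's noise-purification step);
    # returns the kept indices and their pseudo labels.
    probs = softmax(teacher_logits)
    conf = probs.max(axis=-1)
    keep = conf >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=-1)[keep]
```

In this formulation, the `T ** 2` factor keeps the gradient magnitudes of the soft and hard terms comparable as the temperature changes, and low-confidence unlabeled samples are simply discarded rather than given noisy pseudo labels.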