Knowledge Distillation with Few Labeled Samples
Liu Tong, Ren Xinru, Yin Jinhui, Ni Weijian
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract [Objective] This paper applies knowledge distillation to improve the performance of a small-parameter model under the guidance of a high-performance large-parameter model when labeled samples are insufficient, aiming to address sample scarcity and reduce the cost of high-performance large-parameter models in natural language processing. [Methods] First, we used noise purification to select valuable data from an unlabeled corpus, assigned pseudo labels to them, and thereby increased the number of labeled samples. We then added a knowledge review mechanism and a teaching assistant model to the traditional distillation framework to realize comprehensive knowledge transfer from the large-parameter model to the small-parameter model. [Results] We evaluated the proposed model on text classification and sentiment analysis tasks with the IMDB, AG_NEWS, and Yahoo! Answers datasets. With only 5% of the original data labeled, the new model's accuracy was only 1.45%, 2.75%, and 7.28% lower than that of the traditional distillation model trained on the original data. [Limitations] We only examined the new model on text classification and sentiment analysis tasks in natural language processing; other tasks need to be explored in the future. [Conclusions] The proposed method achieves a better distillation effect and improves the performance of the small-parameter model.
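To make the pipeline in the abstract concrete, the sketch below shows, in rough terms, how confidence-based pseudo-labeling of unlabeled text and a teacher-assistant distillation step could be wired together in PyTorch. It is a minimal, hypothetical illustration, not the authors' implementation: the threshold, temperature T, weight alpha, and the pseudo_label/distill_loss/assistant_step helpers are all assumptions, confidence filtering stands in for the noise-purification step, and the knowledge review mechanism is omitted.

```python
# Hypothetical sketch (not the paper's code): pseudo-labeling + teacher-assistant distillation.
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_batch, threshold=0.9):
    """Keep only unlabeled samples the model predicts with high confidence
    (a simple stand-in for the noise-purification step)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold  # discard low-confidence (likely noisy) samples
    return unlabeled_batch[keep], labels[keep]

def distill_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Soft-target distillation loss plus cross-entropy on (pseudo) labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard

def assistant_step(teacher, assistant, x, y, optimizer):
    """One update of the mid-sized assistant against the frozen teacher;
    the same step would then be repeated with (assistant, student)."""
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distill_loss(assistant(x), t_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the large model first distills into a mid-sized assistant and the assistant then distills into the small student, with pseudo-labeled samples enlarging the training set at each stage.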
Received: 04 November 2022
Published: 08 January 2024
Fund: Natural Science Foundation of Shandong Province (ZR2022MF319); Young Teachers Teaching Top Talent Training Project of Shandong University of Science and Technology (BJ20211110); Graduate Students' Teaching Case Library Construction Project of Shandong University of Science and Technology
Corresponding Author:
Ni Weijian, ORCID: 0000-0002-7924-7350, E-mail: niweijian@sdust.edu.cn.