Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (1): 104-113    DOI: 10.11925/infotech.2096-3467.2022.1155
Knowledge Distillation with Few Labeled Samples
Liu Tong,Ren Xinru,Yin Jinhui,Ni Weijian()
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract  

[Objective] This paper uses knowledge distillation to improve the performance of a small-parameter model under the guidance of a high-performance large-parameter model when labeled samples are insufficient. It aims to address sample scarcity and reduce the cost of deploying high-performance large-parameter models in natural language processing. [Methods] First, we used noise purification to select valuable data from an unlabeled corpus and assigned pseudo labels to these samples, thereby enlarging the labeled training set. Then, we added a knowledge review mechanism and a teaching assistant model to the traditional distillation framework to achieve comprehensive knowledge transfer from the large-parameter model to the small-parameter model. [Results] We evaluated the proposed model on text classification and sentiment analysis tasks with the IMDB, AG_NEWS, and Yahoo!Answers datasets. With only 5% of the original data labeled, its accuracy was only 1.45%, 2.75%, and 7.28% lower, respectively, than that of the traditional distillation model trained on the full original data. [Limitations] We only examined the model on text classification and sentiment analysis tasks; other natural language processing tasks remain to be explored. [Conclusions] The proposed method achieves a better distillation effect and improves the performance of the small-parameter model.
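The [Methods] description above combines two steps: noise purification plus pseudo-labeling of an unlabeled corpus, followed by staged distillation through a teaching assistant with a knowledge review mechanism. The paper's page does not include code, so the following is a minimal PyTorch sketch of the pseudo-labeling step only, under the assumption that purification keeps samples on which the large model is confident; the softmax-confidence criterion and the `confidence_threshold` value are illustrative assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch: keep only unlabeled samples whose teacher prediction is
# confident enough ("noise purification"), and attach the predicted class as a
# pseudo label so they can join the small labeled training set.
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(teacher, unlabeled_loader, confidence_threshold=0.9, device="cpu"):
    """Return (inputs, pseudo_labels) for high-confidence unlabeled samples."""
    teacher.eval()
    kept_inputs, kept_labels = [], []
    for batch in unlabeled_loader:                  # batch: tensor of encoded texts
        batch = batch.to(device)
        probs = F.softmax(teacher(batch), dim=-1)   # assumes teacher returns logits
        confidence, prediction = probs.max(dim=-1)
        mask = confidence >= confidence_threshold   # confidence-based purification
        kept_inputs.append(batch[mask].cpu())
        kept_labels.append(prediction[mask].cpu())
    return torch.cat(kept_inputs), torch.cat(kept_labels)
```

The retained pairs would then be merged with the few genuinely labeled samples before distillation.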

Key words: Knowledge Distillation; Semi-Supervised Learning; Few Labeled Samples; Text Classification
Received: 04 November 2022      Published: 08 January 2024
ZTFLH: G250; TP393
Fund: Natural Science Foundation of Shandong Province (ZR2022MF319); Young Teachers Teaching Top Talent Training Project of Shandong University of Science and Technology (BJ20211110); Graduate Students' Teaching Case Library Construction Project of Shandong University of Science and Technology
Corresponding Author: Ni Weijian, ORCID: 0000-0002-7924-7350, E-mail: niweijian@sdust.edu.cn.

Cite this article:

Liu Tong, Ren Xinru, Yin Jinhui, Ni Weijian. Knowledge Distillation with Few Labeled Samples. Data Analysis and Knowledge Discovery, 2024, 8(1): 104-113.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1155     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/104

Figure: The Architecture of HoliKD
Figure: Data Preprocessing
Parameter                   Value
Teacher model               BERT
Teaching assistant model    DistilBERT
Student model               DPCNN[14]
Word embeddings             GloVe 300d
BERT learning rate          2e-5
DPCNN learning rate         1e-3
Number of epochs            100
Maximum sentence length     128
Optimizer                   AdamW
Dropout                     0.5
Experimental Parameter Settings
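The settings above (BERT teacher, DistilBERT teaching assistant, DPCNN student, AdamW, the listed learning rates) imply a two-stage distillation chain: teacher to assistant, then assistant to student. A minimal sketch of a single soft-label distillation stage under those assumptions is given below; the temperature and loss weighting are illustrative choices, not values reported in the table, and the review-distillation term described in the paper is omitted for brevity.

```python
# Hypothetical sketch of one stage in the teacher -> assistant -> student chain
# implied by the settings above (soft-target KL term + hard-label cross-entropy).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of the softened KL term and the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_stage(student, teacher, loader, lr, epochs, device="cpu"):
    """Run one distillation stage (teacher->assistant or assistant->student)."""
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr)  # AdamW, as in the table
    teacher.eval()
    for _ in range(epochs):
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            with torch.no_grad():
                teacher_logits = teacher(inputs)                # assumes models return logits
            loss = distillation_loss(student(inputs), teacher_logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The same `train_stage` routine would be called twice: first with BERT as teacher and DistilBERT as student (lr=2e-5 for fine-tuning scale), then with the trained DistilBERT as teacher and DPCNN as student (lr=1e-3).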
Dataset          Total samples   Number of classes   Labeled samples
AG_NEWS          120,000         4                   6,000
Yahoo!Answers    1,400,000       10                  70,000
IMDB             50,000          2                   2,500
Information of Experimental Datasets
Data (K)   AG_NEWS   Yahoo!Answers   IMDB
30%        82.76%    69.81%          81.40%
40%        89.45%    76.20%          85.51%
50%        94.51%    80.79%          93.90%
60%        93.28%    79.32%          91.76%
70%        91.26%    77.91%          88.39%
Experimental Effects under Different K Values
Model                              Data                             AG_NEWS   Yahoo!Answers   IMDB
Teacher model (BERT)               Original data                    91.56%    83.15%          96.72%
Student model (DPCNN)              Original data                    74.35%    64.17%          85.81%
Distillation model (BERT+DPCNN)    Original data                    83.27%    74.61%          95.35%
MixText                            Few labeled + unlabeled data     67.13%    52.19%          84.30%
UDA                                Few labeled + unlabeled data     73.25%    55.11%          85.82%
HoliKD (ours)                      Few labeled + unlabeled data     80.52%    67.33%          93.90%
Performance of Different Models on Three Datasets
Removed component          1,000 samples   5,000 samples
Noise purification         74.32%          83.23%
Teaching assistant model   81.95%          87.75%
Review distillation        65.19%          73.22%
Full framework             86.43%          92.97%
Results of Ablation Experiment
[1] Liu Tong, Liu Chen, Ni Weijian. A Semi-Supervised Sentiment Analysis Method for Chinese Based on Multi-Level Data Augmentation[J]. Data Analysis and Knowledge Discovery, 2021, 5(5): 51-58.
[2] Tzelepi M, Passalis N, Tefas A. Online Subclass Knowledge Distillation[J]. Expert Systems with Applications, 2021, 181: 115132. DOI: 10.1016/j.eswa.2021.115132.
[3] Romero A, Ballas N, Kahou S E, et al. FitNets: Hints for Thin Deep Nets[C]// Proceedings of the 3rd International Conference on Learning Representations. 2015.
[4] Chen P G, Liu S, Zhao H S, et al. Distilling Knowledge via Knowledge Review[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2021: 5006-5015.
[5] Mirzadeh S I, Farajtabar M, Li A, et al. Improved Knowledge Distillation via Teacher Assistant[C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2020: 5191-5198.
[6] Li D, Liu Y, Song L. Adaptive Weighted Losses with Distribution Approximation for Efficient Consistency-Based Semi-Supervised Learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7832-7842. DOI: 10.1109/TCSVT.2022.3186041.
[7] Lee D H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks[C]// Proceedings of the ICML 2013 Workshop on Challenges in Representation Learning. 2013.
[8] Chen J A, Yang Z C, Yang D Y. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 2147-2157.
[9] Xie Q Z, Dai Z H, Hovy E, et al. Unsupervised Data Augmentation for Consistency Training[C]// Proceedings of the Annual Conference on Neural Information Processing Systems. 2020.
[10] Chen H T, Guo T Y, Xu C, et al. Learning Student Networks in the Wild[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2021: 6428-6437.
[11] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the Annual Conference on Neural Information Processing Systems. 2017: 5998-6008.
[12] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[13] Sanh V, Debut L, Chaumond J, et al. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter[C]// Proceedings of the Annual Conference on Neural Information Processing Systems. 2019.
[14] Johnson R, Zhang T. Deep Pyramid Convolutional Neural Networks for Text Categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 562-570.
[15] Fursov I, Zaytsev A, Burnyshev P, et al. A Differentiable Language Model Adversarial Attack on Text Classifiers[J]. IEEE Access, 2022, 10: 17966-17976. DOI: 10.1109/ACCESS.2022.3148413.
[16] Zhao X, Huang J X. BERT-QAnet: BERT-Encoded Hierarchical Question-Answer Cross-Attention Network for Duplicate Question Detection[J]. Neurocomputing, 2022, 509: 68-74. DOI: 10.1016/j.neucom.2022.08.044.
[17] Bataineh A A, Kaur D. Immunocomputing-Based Approach for Optimizing the Topologies of LSTM Networks[J]. IEEE Access, 2021, 9: 78993-79004. DOI: 10.1109/ACCESS.2021.3084131.
Related articles in this journal:
[1] Cheng Quan, Dong Jia. Hierarchical Multi-label Classification of Children's Literature for Graded Reading[J]. Data Analysis and Knowledge Discovery, 2023, 7(7): 156-169.
[2] Xu Guixian, Zhang Zixin, Yu Shaona, Dong Yushuang, Tian Yuan. Tibetan News Text Classification Based on Graph Convolutional Networks[J]. Data Analysis and Knowledge Discovery, 2023, 7(6): 73-85.
[3] Ye Guanghui, Li Songye, Song Xiaoying. Text Classification Method for Urban Portrait Based on Multi-Label Annotation Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 60-70.
[4] Gao Haoxin, Sun Lijuan, Wu Jingchen, Gao Yutong, Wu Xu. Online Sensitive Text Classification Model Based on Heterogeneous Graph Convolutional Network[J]. Data Analysis and Knowledge Discovery, 2023, 7(11): 26-36.
[5] Wang Weijun, Ning Zhiyuan, Du Yi, Zhou Yuanchun. Identifying Interdisciplinary Sci-Tech Literature Based on Multi-Label Classification[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 102-112.
[6] Wang Jinzheng, Yang Ying, Yu Bengong. Classifying Customer Complaints Based on Multi-head Co-attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 128-137.
[7] Ye Han, Sun Haichun, Li Xin, Jiao Kainan. Classification Model for Long Texts with Attention Mechanism and Sentence Vector Compression[J]. Data Analysis and Knowledge Discovery, 2022, 6(6): 84-94.
[8] Tu Zhenchao, Ma Jing. Item Categorization Algorithm Based on Improved Text Representation[J]. Data Analysis and Knowledge Discovery, 2022, 6(5): 34-43.
[9] Chen Guo, Ye Chao. News Classification with Semi-Supervised and Active Learning[J]. Data Analysis and Knowledge Discovery, 2022, 6(4): 28-38.
[10] Xiao Yuejun, Li Honglian, Zhang Le, Lv Xueqiang, You Xindong. Classifying Chinese Patent Texts with Feature Fusion[J]. Data Analysis and Knowledge Discovery, 2022, 6(4): 49-59.
[11] Yang Lin, Huang Xiaoshuo, Wang Jiayang, Ding Lingling, Li Zixiao, Li Jiao. Identifying Subtypes of Clinical Trial Diseases with BERT-TextCNN[J]. Data Analysis and Knowledge Discovery, 2022, 6(4): 69-81.
[12] Xu Yuemei, Fan Zuwei, Cao Han. A Multi-Task Text Classification Model Based on Label Embedding of Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 105-116.
[13] Bai Simeng, Niu Zhendong, He Hui, Shi Kaize, Yi Kun, Ma Yuanchi. Biomedical Text Classification Method Based on Hypergraph Attention Network[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 13-24.
[14] Huang Xuejian, Liu Yuyang, Ma Tinghuai. Classification Model for Scholarly Articles Based on Improved Graph Neural Network[J]. Data Analysis and Knowledge Discovery, 2022, 6(10): 93-102.
[15] Xie Xingyu, Yu Bengong. Automatic Classification of E-commerce Comments with Multi-Feature Fusion Model[J]. Data Analysis and Knowledge Discovery, 2022, 6(1): 101-112.