Knowledge Distillation with Few Labeled Samples*

doi:10.11925/infotech.2096-3467.2022.1155

Data Analysis and Knowledge Discovery, 2024, Vol. 8, Issue (1): 104-113 https://doi.org/10.11925/infotech.2096-3467.2022.1155

Research Paper

Liu Tong, Ren Xinru, Yin Jinhui, Ni Weijian

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Abstract

[Objective] Labeled samples are scarce in natural language processing, and high-performance models with large numbers of parameters are costly to train. This paper uses knowledge distillation to improve the performance of a small-parameter model guided by a high-performance large-parameter model when labeled samples are insufficient. [Methods] We used noise purification to select valuable data from an unlabeled corpus and assigned pseudo labels to them, which increased the number of labeled samples. We also added a knowledge review mechanism and a teaching assistant model to the traditional distillation model to achieve comprehensive knowledge transfer from the large-parameter model to the small-parameter model. [Results] On text classification and sentiment analysis tasks with the IMDB, AG_NEWS, and Yahoo! Answers datasets, using only 5% of the original data as labeled data, the accuracy of the proposed model was only 1.45%, 2.75%, and 7.28% lower, respectively, than that of the traditional distillation model trained on all the data. [Limitations] We only examined text classification and sentiment analysis tasks in natural language processing; the range of tasks needs to be expanded in future work. [Conclusions] With a small number of labeled samples, the proposed method achieves a good distillation effect and significantly improves the performance of the small-parameter model.

Keywords: Knowledge Distillation, Semi-Supervised Learning, Few Labeled Samples, Text Classification
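As a rough illustration of the pipeline the abstract describes, the following Python sketch pseudo-labels an unlabeled example with a teacher model and then computes a standard soft-target distillation loss for the student. It is not the authors' HoliKD implementation: confidence thresholding stands in for the paper's noise purification step, the knowledge review mechanism and the teaching assistant model are omitted, and the function names and the `threshold`, `temperature`, and `alpha` parameters are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits, threshold=0.9):
    """Return the teacher's predicted class if it is confident enough, else None.

    Assumption: simple confidence thresholding, used here as a stand-in
    for the paper's noise purification step.
    """
    probs = softmax(teacher_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Hard-label term: cross-entropy of the student against the (pseudo) label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL(teacher || student) at temperature T, scaled by T^2.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits for one unlabeled example with three classes.
teacher = [4.0, 0.5, -1.0]
student = [2.0, 1.0, 0.0]
label = pseudo_label(teacher)
if label is not None:  # keep only examples the teacher labels confidently
    loss = distillation_loss(student, teacher, label)
```

Examples for which `pseudo_label` returns `None` are simply left out of the student's training set in this sketch; a real system would likely use a more careful selection rule.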

Received: 2022-11-04 Published: 2024-01-08

CLC Number:	G250
	TP393

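Funding: *Natural Science Foundation of Shandong Province (ZR2022MF319); Outstanding Young Teachers Teaching Talent Training Program of Shandong University of Science and Technology (BJ20211110); Professional Degree Graduate Teaching Case Library Construction Project of Shandong University of Science and Technology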

Corresponding author: Ni Weijian, ORCID: 0000-0002-7924-7350, E-mail: niweijian@sdust.edu.cn.

Cite this article:

Liu Tong, Ren Xinru, Yin Jinhui, Ni Weijian. Knowledge Distillation with Few Labeled Samples[J]. Data Analysis and Knowledge Discovery, 2024, 8(1): 104-113.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1155 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I1/104

Fig.1 The HoliKD model

Fig.2 Data preprocessing

Table 1 Experimental parameter settings

Table 2 Experimental dataset information

Data ($K$)	AG_NEWS	Yahoo! Answers	IMDB
30%	82.76%	69.81%	81.40%
40%	89.45%	76.20%	85.51%
50%	94.51%	80.79%	93.90%
60%	93.28%	79.32%	91.76%
70%	91.26%	77.91%	88.39%

Table 3 Comparison of experimental results for different values of $K$
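The trend in Table 3 can be checked with a few lines of Python. This is only a sketch over the figures reported in the table: the `results` dictionary and the `best_k` helper are illustrative names, and what $K$ denotes is not defined in this excerpt, so it is treated here as an opaque setting label.

```python
# Accuracy (%) copied from Table 3, keyed by the data setting K.
results = {
    "30%": {"AG_NEWS": 82.76, "Yahoo! Answers": 69.81, "IMDB": 81.40},
    "40%": {"AG_NEWS": 89.45, "Yahoo! Answers": 76.20, "IMDB": 85.51},
    "50%": {"AG_NEWS": 94.51, "Yahoo! Answers": 80.79, "IMDB": 93.90},
    "60%": {"AG_NEWS": 93.28, "Yahoo! Answers": 79.32, "IMDB": 91.76},
    "70%": {"AG_NEWS": 91.26, "Yahoo! Answers": 77.91, "IMDB": 88.39},
}


def best_k(table):
    """Return, for each dataset, the K setting with the highest accuracy."""
    datasets = next(iter(table.values())).keys()
    return {d: max(table, key=lambda k: table[k][d]) for d in datasets}


best = best_k(results)
for dataset, k in best.items():
    print(f"{dataset}: best K = {k} ({results[k][dataset]:.2f}%)")
```

On all three datasets the highest accuracy falls at $K$ = 50%, with accuracy rising up to that setting and declining beyond it.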

Table 4 Performance of different models on the three datasets

Table 5 Ablation study results
