A Multi-Label Classification Model with Two-Stage Transfer Learning
Lu Quan1,2, He Chao1, Chen Jing3, Tian Min1, Liu Ting1
1 Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2 Big Data Research Institute, Wuhan University, Wuhan 430072, China
3 School of Information Management, Central China Normal University, Wuhan 430079, China
[Objective] This paper proposes a multi-label classification model that improves data sampling and incorporates common cross-domain features missing from existing models. [Methods] We constructed a two-stage transfer learning model spanning "general domain - single-label data in the target domain - multi-label data". The model was first trained in the general domain, then fine-tuned on target-domain single-label data balanced with over-sampling, and finally transferred to the multi-label data to produce multi-label classifications. [Results] We evaluated the new model on image annotations from medical literature. On multi-label classification tasks for both images and texts, its F1 score improved by more than 50% compared with the one-stage transfer learning model. [Limitations] Further research is needed to choose better base models and sampling methods for different tasks. [Conclusions] The proposed method could support the annotation, retrieval, and utilization of big data sets under labeling constraints.
Lu Quan, He Chao, Chen Jing, Tian Min, Liu Ting. A Multi-Label Classification Model with Two-Stage Transfer Learning. Data Analysis and Knowledge Discovery, 2021, 5(7): 91-100.
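The over-sampling step in the Methods can be illustrated with a minimal sketch. The function below is a hypothetical, simplified stand-in for the paper's balancing procedure: it performs plain random over-sampling, duplicating minority-class samples until every single-label class matches the majority-class count, before the second-stage fine-tuning. The staging comments describe the pipeline only at the level stated in the abstract; the concrete backbone and fine-tuning code are not part of this sketch.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Balance a single-label dataset by randomly duplicating
    minority-class samples until every class reaches the
    majority-class count (random over-sampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group samples by class so we can draw duplicates per class.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for cls, xs in by_class.items():
        for _ in range(target - counts[cls]):
            out_x.append(rng.choice(xs))
            out_y.append(cls)
    return out_x, out_y

# Two-stage transfer pipeline, as described in the abstract (outline only):
# 1. Start from a model pre-trained on a general domain.
# 2. Fine-tune it on target-domain single-label data balanced
#    with random_oversample above.
# 3. Transfer the fine-tuned model to target-domain multi-label data
#    and train the multi-label classifier.
```

For example, balancing five "a" samples against one "b" sample yields five of each, after which the balanced set would feed the first fine-tuning stage.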