[Objective] This paper proposes a classification model for few-shot texts, aiming to address data scarcity and poor generalization. [Methods] First, we divided the text classification task into multiple subtasks using the episode-based training mechanism of meta-learning. Second, we proposed a Bi-directional Temporal Convolutional Network (Bi-TCN) to capture the long-term contextual information of the text in each subtask. Third, we developed a Bi-directional Long-term Attention Network (BLAN), which combines Bi-TCN with a multi-head attention mechanism to capture more discriminative features. Finally, we used a Neural Tensor Network to measure the correlation between the query samples and the support set of each subtask, completing the few-shot text classification. [Results] We evaluated the model on the ARSC dataset. In the few-shot learning setting, its classification accuracy reached 86.80%, which was 3.68 and 1.17 percentage points higher than those of the ROBUSTTC-FSL and Induction-Network-Routing models, respectively. [Limitations] BLAN performs less satisfactorily on long texts. [Conclusions] BLAN alleviates data scarcity and captures comprehensive text features, effectively improving the performance of few-shot text classification.
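The episode-based training mechanism in [Methods] can be illustrated with a minimal Python sketch of N-way K-shot episode sampling, in which each episode (subtask) has its own small support set and query set. The function `sample_episode`, its parameters, and the toy data are hypothetical. The abstract does not state how many classes, shots, or queries each episode uses, so the 2-way 5-shot call below is an assumed example only.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, q_query, rng):
    """Draw one N-way K-shot episode from a list of (text, label) pairs.

    Returns (support, query); each is a list of (text, episode_label) pairs,
    where episode_label is the class index 0..n_way-1 inside this episode.
    """
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append(text)
    # Only classes with enough examples to fill both the support and query sets.
    eligible = sorted(lab for lab, texts in by_label.items() if len(texts) >= k_shot + q_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + q_query examples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for idx, label in enumerate(classes):
        # Sample without replacement so no text appears in both sets.
        texts = rng.sample(by_label[label], k_shot + q_query)
        support += [(t, idx) for t in texts[:k_shot]]
        query += [(t, idx) for t in texts[k_shot:]]
    return support, query


# Toy usage (hypothetical data): 40 labelled "reviews", one 2-way 5-shot episode
# with 3 query texts per class.
data = [(f"review {i}", "pos" if i % 2 else "neg") for i in range(40)]
support, query = sample_episode(data, n_way=2, k_shot=5, q_query=3, rng=random.Random(0))
print(len(support), len(query))  # 10 6
```

Relabelling classes as 0..N-1 inside every episode is a common design choice. It forces the model to compare queries against the support set instead of memorizing global class identities.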
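A simplified, single-channel sketch of a bidirectional temporal convolutional network follows. The abstract does not specify how Bi-TCN is built, so this sketch assumes one stack of dilated causal convolutions runs over the text left-to-right, another runs right-to-left, and their outputs are paired at each position. Generic TCN residual blocks (Bai et al.) also use multiple channels, two convolutions per block, weight normalization, and dropout. Those are omitted here, and all function names and weights are hypothetical.

```python
def causal_dilated_conv(x, weights, dilation):
    """Causal 1-D convolution: out[t] = sum_j weights[j] * x[t - j*dilation].

    Positions before the start of the sequence count as zero (left padding),
    so out[t] never depends on x[t+1:].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            src = t - j * dilation
            if src >= 0:
                acc += w * x[src]
        out.append(acc)
    return out


def tcn_stack(x, weights, dilations):
    """Stack of dilated causal conv layers, each followed by ReLU and a residual add."""
    h = list(x)
    for d in dilations:
        conv = causal_dilated_conv(h, weights, d)
        h = [a + max(0.0, c) for a, c in zip(h, conv)]
    return h


def bi_tcn(x, fwd_weights, bwd_weights, dilations=(1, 2, 4)):
    """Run one stack left-to-right and another right-to-left over x.

    Returns a (forward, backward) feature pair per time step; the backward
    output is reversed back so that index t refers to the same token.
    """
    fwd = tcn_stack(x, fwd_weights, dilations)
    bwd = tcn_stack(x[::-1], bwd_weights, dilations)[::-1]
    return list(zip(fwd, bwd))


a = bi_tcn([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], [0.5, 0.5])
b = bi_tcn([1.0, 2.0, 3.0, 9.0], [0.5, 0.5], [0.5, 0.5])
# Changing the last token leaves earlier forward features unchanged,
# but it does reach position 0 through the backward direction.
print([p[0] for p in a[:3]] == [p[0] for p in b[:3]], a[0][1] != b[0][1])  # True True
```

With kernel size 2 and dilations 1, 2, 4, each direction covers 1 + 1 × (1 + 2 + 4) = 8 time steps. Doubling the dilation per layer is how a shallow stack reaches long-range context.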
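The final [Methods] step can be sketched with the Neural Tensor Network scoring function of Socher et al., g = uᵀ tanh(cᵀ W^{[1:k]} q + V[c; q] + b), applied to a class vector c and a query vector q. This sketch represents each class by the mean of its support embeddings. That aggregation is an assumption, since the abstract does not describe how the support set is summarized. The names `ntn_score`, `mean_vector`, and `classify` and the toy parameters are hypothetical. In a trained model, W, V, b, and u are learned.

```python
import math


def ntn_score(c, q, W, V, b, u):
    """NTN score u^T tanh(c^T W[i] q + V[i]·[c; q] + b[i]) over k tensor slices.

    c, q: length-d vectors; W: k slices of d x d matrices; V: k rows of length 2d;
    b, u: length-k vectors.
    """
    cq = c + q  # list concatenation gives [c; q]
    hidden = []
    for Wi, Vi, bi in zip(W, V, b):
        bilinear = sum(c[r] * Wi[r][s] * q[s] for r in range(len(c)) for s in range(len(q)))
        linear = sum(v * x for v, x in zip(Vi, cq))
        hidden.append(math.tanh(bilinear + linear + bi))
    return sum(ui * h for ui, h in zip(u, hidden))


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed class representation)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(query, support_by_class, params):
    """Score the query against every class; return (best_label, scores)."""
    scores = {label: ntn_score(mean_vector(vs), query, *params)
              for label, vs in support_by_class.items()}
    return max(scores, key=scores.get), scores


# Toy parameters with d = 2 and k = 1: W is the identity, V and b are zero, u = [1],
# so the score reduces to tanh(c · q).
params = ([[[1.0, 0.0], [0.0, 1.0]]], [[0.0, 0.0, 0.0, 0.0]], [0.0], [1.0])
support = {"pos": [[1.0, 0.0], [1.0, 0.0]], "neg": [[0.0, 1.0]]}
label, scores = classify([0.9, 0.1], support, params)
print(label)  # pos
```

The bilinear tensor term lets each of the k slices model a different multiplicative interaction between c and q. A plain cosine or Euclidean distance captures only one fixed interaction.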