Classification Model for Few-shot Texts Based on Bi-directional Long-term Attention Features
Xu Tongtong, Sun Huazhi, Ma Chunmei, Jiang Lifen, Liu Yichen
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
Abstract [Objective] This paper proposes a classification model for few-shot texts that addresses data scarcity and poor generalization. [Methods] First, we divided the text classification task into multiple subtasks using the episode training mechanism of meta-learning. Second, we proposed a Bi-directional Temporal Convolutional Network (Bi-TCN) to capture the long-term contextual information of the text in each subtask. Third, we built a Bi-directional Long-term Attention Network (BLAN) on Bi-TCN and a multi-head attention mechanism to capture more discriminative features. Finally, we used a Neural Tensor Network to measure the correlation between the query samples and the support set of each subtask, which completes the few-shot text classification. [Results] We evaluated the model on the ARSC dataset. Its classification accuracy reached 86.80% in the few-shot learning setting, which is 3.68 and 1.17 percentage points higher than the ROBUSTTC-FSL and Induction-Network-Routing models, respectively. [Limitations] The performance of BLAN on long texts is not satisfactory. [Conclusions] BLAN mitigates data scarcity and captures comprehensive text features, which effectively improves few-shot text classification.
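The episode training step in [Methods] can be illustrated with a minimal sketch. It assumes an N-way K-shot setup in which each subtask draws a support set and a query set from the same classes; the function name `sample_episode`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def sample_episode(examples, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from a list of (text, label) pairs.

    Illustrative only: the paper does not publish its sampling code.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)

    # A class qualifies only if it can fill both its support and query slots.
    eligible = sorted(
        label for label, texts in by_label.items()
        if len(texts) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")

    support, query = [], []
    for label in rng.sample(eligible, n_way):
        # Sampling without replacement keeps support and query disjoint.
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(text, label) for text in picked[:k_shot]]
        query += [(text, label) for text in picked[k_shot:]]
    return support, query


# Toy data: two classes with ten short texts each (assumed, not from ARSC).
toy = [(f"review {i} of class {c}", c) for c in ("pos", "neg") for i in range(10)]
support, query = sample_episode(toy, n_way=2, k_shot=5, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 10 6
```

In such a setup, the model would encode the support and query texts of each episode and score every query sample against the support set, so that training mimics the few-shot conditions met at test time.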
Received: 18 March 2020
Published: 09 November 2020
Corresponding Author:
Ma Chunmei
E-mail: mcmxhd@163.com