Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 113-123    DOI: 10.11925/infotech.2096-3467.2020.0206
Classification Model for Few-shot Texts Based on Bi-directional Long-term Attention Features
Xu Tongtong, Sun Huazhi, Ma Chunmei, Jiang Lifen, Liu Yichen
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
Abstract

[Objective] This paper proposes a classification model for few-shot texts, aiming to address the issues of data scarcity and low generalization performance. [Methods] First, we divided the text classification task into multiple subtasks using the episodic training mechanism from meta-learning. Then, we proposed a Bi-directional Temporal Convolutional Network (Bi-TCN) to capture the long-term contextual information of the text in each subtask. Third, we developed a Bi-directional Long-term Attention Network (BLAN) to capture more discriminative features by combining Bi-TCN with a multi-head attention mechanism. Finally, we used a Neural Tensor Network to measure the correlation between query samples and the support set of each subtask to complete few-shot text classification. [Results] We evaluated our model on the ARSC dataset. Its classification accuracy reached 86.80% in the few-shot learning setting, which is 3.68 and 1.17 percentage points higher than the ROBUSTTC-FSL and Induction-Network-Routing models, respectively. [Limitations] The performance of BLAN on long texts is not satisfactory. [Conclusions] BLAN overcomes the issue of data scarcity and captures comprehensive text features, which effectively improves the performance of few-shot text classification.
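For a concrete picture of the pipeline, the following is a minimal PyTorch sketch of the components named in the abstract: a Bi-TCN encoder (a temporal convolutional network run over the token sequence and its reverse), multi-head attention over the concatenated features, and a Neural Tensor Network (NTN) relation scorer. Layer sizes follow the Parameter Settings table below, but everything else is our assumption rather than the authors' released code: the module names (CausalConv1d, TCN, BiTCNEncoder, NTN) are illustrative, the attention width is kept at the encoder width (256) instead of the table's attention dimension of 64 for simplicity, and the table's value of 100 for the relation comparison module is mapped onto the NTN slice count.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D causal dilated convolution: the output at step t sees only inputs <= t."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                      # x: (B, C, T)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad else out  # trim the look-ahead

class TCN(nn.Module):
    """Stack of dilated causal conv layers with dilations 1, 2, 4, ..."""
    def __init__(self, in_ch, hidden, kernel_size=3, levels=3):
        super().__init__()
        layers = []
        for i in range(levels):
            layers += [CausalConv1d(in_ch if i == 0 else hidden,
                                    hidden, kernel_size, dilation=2 ** i),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                      # x: (B, T, C)
        return self.net(x.transpose(1, 2)).transpose(1, 2)  # (B, T, hidden)

class BiTCNEncoder(nn.Module):
    """Bi-TCN: run a TCN over the sequence and its reverse, concatenate,
    then apply multi-head attention to sharpen the features."""
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.fwd = TCN(emb_dim, hidden)
        self.bwd = TCN(emb_dim, hidden)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=8,
                                          batch_first=True)

    def forward(self, emb):                    # emb: (B, T, 300) word vectors
        h = torch.cat([self.fwd(emb),
                       self.bwd(emb.flip(1)).flip(1)], dim=-1)  # (B, T, 256)
        h, _ = self.attn(h, h, h)              # multi-head attention features
        return h.mean(dim=1)                   # (B, 256) text representation

class NTN(nn.Module):
    """Neural Tensor Network relation score between a query vector and a
    class (support-set) vector, in the style of Socher et al. [28]."""
    def __init__(self, dim=256, k=100):
        super().__init__()
        self.W = nn.Parameter(torch.randn(k, dim, dim) * 0.01)  # k bilinear slices
        self.V = nn.Linear(2 * dim, k)
        self.u = nn.Linear(k, 1)

    def forward(self, q, s):                   # q, s: (B, dim)
        bilinear = torch.einsum('bd,kde,be->bk', q, self.W, s)
        return self.u(torch.tanh(bilinear + self.V(torch.cat([q, s], -1))))
```

In an episode, one way to use these pieces is to average the encoded support samples of each class into a class vector, score every query against every class vector with the NTN, and predict the class with the highest score.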

Key words: Few-shot Text Classification; Attention Mechanism; Few-shot Learning; Bi-TCN
Received: 18 March 2020      Published: 09 November 2020
CLC Number (ZTFLH): TP393
Corresponding Authors: Ma Chunmei     E-mail: mcmxhd@163.com

Cite this article:

Xu Tongtong, Sun Huazhi, Ma Chunmei, Jiang Lifen, Liu Yichen. Classification Model for Few-shot Texts Based on Bi-directional Long-term Attention Features. Data Analysis and Knowledge Discovery, 2020, 4(10): 113-123.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0206     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/113

[Figure] Sample of a 2-way 5-shot Task in the ARSC Dataset
[Figure] Architecture of BLAN
[Figure] Architecture of the Improved TCN
[Figure] Architecture of Bi-TCN
[Figure] Sample of a Test Task in the ARSC Dataset
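The 2-way 5-shot sample in the first figure corresponds to the episodic training mechanism: each subtask draws a small support set and a query set from two classes. Below is a hedged sketch of such a sampler, where `corpus` is a placeholder mapping each class label to its list of texts, not the actual ARSC loader:

```python
import random

def sample_episode(corpus, n_way=2, k_shot=5, n_query=5):
    """Draw N classes, then K support texts and a query batch per class."""
    classes = random.sample(list(corpus), n_way)
    support, query = [], []
    for label in classes:
        texts = random.sample(corpus[label], k_shot + n_query)
        support += [(t, label) for t in texts[:k_shot]]
        query += [(t, label) for t in texts[k_shot:]]
    return support, query
```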
Table: Parameter Settings
Parameter                                       Value
Word embedding dimension                        300
Bi-TCN hidden layer size                        128
Bi-TCN convolution kernel size                  3
Attention dimension                             64
Number of attention heads                       8
Relation comparison module convolution layers   100
Learning rate                                   1×10⁻⁴
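Read as a training configuration, the table suggests something like the following setup, reusing the illustrative BiTCNEncoder and NTN modules from the sketch above; the page states only the learning rate, so the choice of Adam is our assumption:

```python
import torch

encoder = BiTCNEncoder(emb_dim=300, hidden=128)   # 300-d word vectors, 128 hidden units
ntn = NTN(dim=256, k=100)                         # 100 relation slices (our mapping)
optimizer = torch.optim.Adam(                     # optimizer choice is an assumption
    list(encoder.parameters()) + list(ntn.parameters()), lr=1e-4)
```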
Table: Average Accuracy of the Models on the ARSC Dataset
Model                       Average Accuracy/%
Matching Network            65.73
Prototypical Network        68.15
MAML                        78.33
Relation Network            83.74
ROBUSTTC-FSL                83.12
Induction-Network-Routing   85.63
BLAN (ours)                 86.80
Table: Number of Parameters of Different Models
Model                       Parameters
Induction-Network-Routing   1.986×10⁹
BLAN (ours)                 2.269×10⁹
[Figure] Comparison of Loss Curves
Table: Average Accuracy When Different Methods Act as the Feature Extraction Module
Method    Average Accuracy/%
TCN       76.70
Bi-TCN    86.80
[Figure] Results of the Long-Term Feature Learning Models
Table: Average Accuracy of BLAN With and Without the Attention Module
Model               Average Accuracy/%
BLAN (-Attention)   85.29
BLAN                86.80
[1] Tao Zhiyong, Li Xiaobing, Liu Ying, et al. Classifying Short Texts with Improved-Attention Based Bidirectional Long Memory Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(12): 21-29. (in Chinese)
[2] Yu Bengong, Cao Yumeng, Chen Yangnan, et al. Classification of Short Texts Based on nLD-SVM-RF Model[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 111-120. (in Chinese)
[3] Koch G, Zemel R, Salakhutdinov R. Siamese Neural Networks for One-Shot Image Recognition[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML) Workshop on Deep Learning. 2015.
[4] Wang Y X, Girshick R, Hebert M, et al. Low-shot Learning from Imaginary Data[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7278-7286.
[5] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative Adversarial Nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. 2014: 2672-2680.
[6] Vinyals O, Blundell C, Lillicrap T, et al. Matching Networks for One Shot Learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016: 3637-3645.
[7] Snell J, Swersky K, Zemel R. Prototypical Networks for Few-Shot Learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 4080-4090.
[8] Sung F, Yang Y X, Zhang L, et al. Learning to Compare: Relation Network for Few-Shot Learning[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1199-1208.
[9] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[10] Blitzer J, Dredze M, Pereira F. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification[C]// Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL). 2007: 440-447.
[11] Joachims T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features[C]//Proceedings of the 10th European Conference on Machine Learning. 1998: 137-142.
[12] Mladenic D, Grobelnik M. Feature Selection for Unbalanced Class Distribution and Naive Bayes[C]//Proceedings of the 16th International Conference on Machine Learning. 1999: 258-267.
[13] Kim Y. Convolutional Neural Networks for Sentence Classification [OL]. arXiv Preprint, arXiv:1408.5882, 2014.
[14] Liu P F, Qiu X P, Huang X J. Recurrent Neural Network for Text Classification with Multi-task Learning[OL]. arXiv Preprint, arXiv:1605.05101, 2016.
[15] Cai Q, Pan Y W, Yao T, et al. Memory Matching Networks for One-Shot Image Recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4080-4088.
[16] Pahde F, Jähnichen P, Klein T, et al. Cross-modal Hallucination for Few-Shot Fine-Grained Recognition[OL]. arXiv Preprint, arXiv:1806.05147, 2018.
[17] Schwartz E, Karlinsky L, Shtok J, et al. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition[C]//Advances in Neural Information Processing Systems. 2018: 2845-2855.
[18] Finn C, Abbeel P, Levine S. Model-agnostic Meta-learning for Fast Adaptation of Deep Networks[C]//Proceedings of the 34th International Conference on Machine Learning. 2017: 1126-1135.
[19] Wang Y, Wu X M, Li Q, et al. Large Margin Meta-Learning for Few-Shot Classification[C]//Proceedings of the 2nd Neural Information Processing Systems (NIPS) Workshop on Meta-Learning. 2018.
[20] Geng R Y, Li B H, Li Y B, et al. Induction Networks for Few-Shot Text Classification[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019: 3895-3904.
[21] Sabour S, Frosst N, Hinton G E. Dynamic Routing Between Capsules[C]//Advances in Neural Information Processing Systems. 2017: 3856-3866.
[22] Yu M, Guo X X, Yi J F, et al. Diverse Few-Shot Text Classification with Multiple Metrics[OL]. arXiv Preprint, arXiv:1805.07513, 2018.
[23] Zhang N Y, Sun Z L, Deng S M, et al. Improving Few-shot Text Classification via Pretrained Language Representations[OL]. arXiv Preprint, arXiv:1908.08788, 2019.
[24] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805, 2018.
[25] Bai S J, Kolter J Z, Koltun V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling[OL]. arXiv Preprint, arXiv:1803.01271, 2018.
[26] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781, 2013.
[27] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
[28] Socher R, Chen D Q, Manning C D, et al. Reasoning with Neural Tensor Networks for Knowledge Base Completion[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems-Volume 1. 2013: 926-934.
[29] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[30] Cho K, van Merriënboer B, Gulcehre C, et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation[OL]. arXiv Preprint, arXiv:1406.1078, 2014.
[31] Graves A, Schmidhuber J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures[J]. Neural Networks, 2005, 18(5-6): 602-610. DOI: 10.1016/j.neunet.2005.06.042.