Data Analysis and Knowledge Discovery, 2020, Vol. 4 Issue (10): 113-123     https://doi.org/10.11925/infotech.2096-3467.2020.0206
Research Paper
Classification Model for Few-shot Texts Based on Bi-directional Long-term Attention Features
Xu Tongtong,Sun Huazhi,Ma Chunmei(),Jiang Lifen,Liu Yichen
College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China

Abstract

[Objective] This paper proposes a classification model for few-shot texts, aiming to address the issues of training data scarcity and low generalization performance in text classification. [Methods] First, we divided the text classification task into multiple subtasks based on the episode training mechanism in meta-learning. Then, we proposed a Bi-directional Temporal Convolutional Network (Bi-TCN) to capture the long-term contextual information of the text in each subtask. Third, we built a Bi-directional Long-term Attention Network (BLAN) on top of Bi-TCN and a multi-head attention mechanism to capture more discriminative features. Finally, we used a Neural Tensor Network to measure the correlation between the query samples and the support set of each subtask to complete few-shot text classification. [Results] We evaluated the model on the ARSC dataset. Its classification accuracy reached 86.80% in the few-shot setting, 3.68 and 1.17 percentage points higher than that of the ROBUSTTC-FSL and Induction-Network-Routing models, respectively. [Limitations] The performance of BLAN on long texts is not satisfactory. [Conclusions] BLAN overcomes the issue of training data scarcity and captures comprehensive semantic features of the text, effectively improving few-shot text classification.
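As a reading aid, the following is a minimal sketch of the bidirectional TCN idea the abstract describes, assuming PyTorch and the settings reported in Table 1 (300-dimensional word embeddings, hidden size 128, kernel size 3). The class names and the single-layer depth are our assumptions, not the authors' released code.

```python
# Minimal Bi-TCN sketch: two causal convolutions, one over the reversed
# sequence, concatenated so each position sees past and future context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated causal convolution: the output at step t sees inputs <= t only."""
    def __init__(self, in_dim, hidden, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_dim, hidden, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.left_pad, 0)))

class BiTCN(nn.Module):
    """Forward and backward causal TCNs whose outputs are concatenated."""
    def __init__(self, emb_dim=300, hidden=128, kernel_size=3):
        super().__init__()
        self.fwd = CausalConv1d(emb_dim, hidden, kernel_size)
        self.bwd = CausalConv1d(emb_dim, hidden, kernel_size)

    def forward(self, x):                      # x: (batch, time, emb_dim)
        x = x.transpose(1, 2)                  # -> (batch, emb_dim, time)
        h_fwd = self.fwd(x)
        h_bwd = self.bwd(x.flip(-1)).flip(-1)  # reverse, convolve, restore order
        return torch.cat([h_fwd, h_bwd], 1).transpose(1, 2)  # (batch, time, 2*hidden)
```

Under these assumptions, a sentence embedded with 300-dimensional word vectors comes out as a 256-dimensional contextual feature per token, which is what a downstream attention module would consume.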

Keywords: Few-shot Text Classification; Attention Mechanism; Few-shot Learning; Bi-TCN
Received: 2020-03-18      Published online: 2020-11-09
CLC number: TP393
Funding: This work was supported by the Scientific Research Program of the Tianjin Municipal Education Commission, "Research on a Nutritionally Balanced Quantitative Meal Recommendation Algorithm Based on Immune Principles" (JW1702), and by the Natural Science Foundation of Tianjin projects "Research on Key Technologies of Context Awareness in IoT Environments" (18JCYBJC85900) and "Smartphone-Based Vehicle Detection Technology and Its Applications for Intelligent Transportation" (18JCQNJC70200).
Corresponding author: Ma Chunmei, E-mail: mcmxhd@163.com
Cite this article:
Xu Tongtong,Sun Huazhi,Ma Chunmei,Jiang Lifen,Liu Yichen. Classification Model for Few-shot Texts Based on Bi-directional Long-term Attention Features. Data Analysis and Knowledge Discovery, 2020, 4(10): 113-123.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0206      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/113
Fig.1  Example of a 2-way 5-shot task from the Amazon Review Sentiment Classification dataset
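To make the 2-way 5-shot setup of Fig.1 concrete, here is a hedged sketch of episode sampling; the `texts_by_class` mapping and the query-set size are illustrative assumptions rather than the paper's exact protocol.

```python
# Sketch of C-way K-shot episode sampling for meta-learning.
import random

def sample_episode(texts_by_class, n_way=2, k_shot=5, n_query=5):
    """Build one episode: a labeled support set and a query set."""
    classes = random.sample(list(texts_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picked = random.sample(texts_by_class[cls], k_shot + n_query)
        support += [(text, label) for text in picked[:k_shot]]
        query += [(text, label) for text in picked[k_shot:]]
    return support, query
```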
Fig.2  Architecture of the BLAN model
Fig.3  Architecture of the improved TCN model
Fig.4  Architecture of the Bi-TCN model
Fig.5  Sample test-task data from the ARSC dataset
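The matching step in the BLAN architecture (Fig.2) scores each query against the support set with a Neural Tensor Network [28]. Below is a minimal sketch of the standard NTN scoring function; the class name, the slice count k=100, and the initialization scale are our assumptions, not the authors' implementation.

```python
# Neural Tensor Network score: r(q, s) = u^T tanh(q^T W[1:k] s + V [q; s] + b)
import torch
import torch.nn as nn

class NTNScore(nn.Module):
    def __init__(self, dim, k=100):
        super().__init__()
        self.W = nn.Parameter(torch.randn(k, dim, dim) * 0.01)  # k bilinear slices
        self.V = nn.Linear(2 * dim, k)                          # linear term (bias b inside)
        self.u = nn.Linear(k, 1, bias=False)

    def forward(self, q, s):                   # q, s: (batch, dim)
        bilinear = torch.einsum('bd,kde,be->bk', q, self.W, s)
        linear = self.V(torch.cat([q, s], dim=-1))
        return self.u(torch.tanh(bilinear + linear)).squeeze(-1)  # (batch,)
```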
Parameter                                   Value
Word embedding dimension                    300
Bi-TCN hidden layer size                    128
Bi-TCN convolution kernel size              3
Attention dimension                         64
Number of attention heads                   8
Relation comparison module conv. layer      100
Learning rate                               1×10⁻⁴
Table 1  Parameter settings
Model                        Average accuracy (%)
Matching Network             65.73
Prototypical Network         68.15
MAML                         78.33
Relation Network             83.74
ROBUSTTC-FSL                 83.12
Induction-Network-Routing    85.63
BLAN (ours)                  86.80
Table 2  Average accuracy of each model on the ARSC dataset
Model                        Number of parameters
Induction-Network-Routing    1.986×10⁹
BLAN (ours)                  2.269×10⁹
Table 3  Parameter counts of the compared models
Fig.6  Loss curves
Method     Average accuracy (%)
TCN        76.70
Bi-TCN     86.80
Table 4  Average accuracy with different feature extraction modules
Fig.7  Comparison results of long-term feature learning models
Model                Average accuracy (%)
BLAN (-Attention)    85.29
BLAN                 86.80
Table 5  Average accuracy with and without the attention feature representation module
[1] Tao Zhiyong, Li Xiaobing, Liu Ying, et al. Classifying Short Texts with Improved-Attention Based Bidirectional Long Memory Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(12): 21-29.
[2] Yu Bengong, Cao Yumeng, Chen Yangnan, et al. Classification of Short Texts Based on nLD-SVM-RF Model[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 111-120.
[3] Koch G, Zemel R, Salakhutdinov R. Siamese Neural Networks for One-Shot Image Recognition[C]//Proceedings of the 32nd International Conference on Machine Learning (ICML) Workshop on Deep Learning. 2015.
[4] Wang Y X, Girshick R, Hebert M, et al. Low-shot Learning from Imaginary Data[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7278-7286.
[5] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative Adversarial Nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. 2014: 2672-2680.
[6] Vinyals O, Blundell C, Lillicrap T, et al. Matching Networks for One Shot Learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016: 3637-3645.
[7] Snell J, Swersky K, Zemel R. Prototypical Networks for Few-Shot Learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 4080-4090.
[8] Sung F, Yang Y X, Zhang L, et al. Learning to Compare: Relation Network for Few-Shot Learning[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1199-1208.
[9] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[10] Blitzer J, Dredze M, Pereira F. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification[C]//Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL). 2007: 440-447.
[11] Joachims T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features[C]//Proceedings of the 10th European Conference on Machine Learning. 1998: 137-142.
[12] Mladenic D, Grobelnik M. Feature Selection for Unbalanced Class Distribution and Naive Bayes[C]//Proceedings of the 16th International Conference on Machine Learning. 1999: 258-267.
[13] Kim Y. Convolutional Neural Networks for Sentence Classification [OL]. arXiv Preprint, arXiv:1408.5882, 2014.
[14] Liu P F, Qiu X P, Huang X J. Recurrent Neural Network for Text Classification with Multi-task Learning[OL]. arXiv Preprint, arXiv:1605.05101, 2016.
[15] Cai Q, Pan Y W, Yao T, et al. Memory Matching Networks for One-Shot Image Recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4080-4088.
[16] Pahde F, Jähnichen P, Klein T, et al. Cross-modal Hallucination for Few-Shot Fine-Grained Recognition[OL]. arXiv Preprint, arXiv:1806.05147, 2018.
[17] Schwartz E, Karlinsky L, Shtok J, et al. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition[C]//Advances in Neural Information Processing Systems. 2018: 2845-2855.
[18] Finn C, Abbeel P, Levine S. Model-agnostic Meta-learning for Fast Adaptation of Deep Networks[C]//Proceedings of the 34th International Conference on Machine Learning. 2017: 1126-1135.
[19] Wang Y, Wu X M, Li Q, et al. Large Margin Meta-Learning for Few-Shot Classification[C]//Proceedings of the 2nd Neural Information Processing Systems (NIPS) Workshop on Meta-Learning. 2018.
[20] Geng R Y, Li B H, Li Y B, et al. Induction Networks for Few-Shot Text Classification[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019: 3895-3904.
[21] Sabour S, Frosst N, Hinton G E. Dynamic Routing Between Capsules[C]//Advances in Neural Information Processing Systems. 2017: 3856-3866.
[22] Yu M, Guo X X, Yi J F, et al. Diverse Few-Shot Text Classification with Multiple Metrics[OL]. arXiv Preprint, arXiv:1805.07513, 2018.
[23] Zhang N Y, Sun Z L, Deng S M, et al. Improving Few-shot Text Classification via Pretrained Language Representations[OL]. arXiv Preprint, arXiv:1908.08788, 2019.
[24] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805, 2018.
[25] Bai S J, Kolter J Z, Koltun V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling[OL]. arXiv Preprint, arXiv:1803.01271, 2018.
[26] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781, 2013.
[27] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.
[28] Socher R, Chen D Q, Manning C D, et al. Reasoning with Neural Tensor Networks for Knowledge Base Completion[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems-Volume 1. 2013: 926-934.
[29] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[30] Cho K, van Merriënboer B, Gulcehre C, et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation[OL]. arXiv Preprint, arXiv:1406.1078, 2014.
[31] Graves A, Schmidhuber J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures[J]. Neural Networks, 2005, 18(5-6): 602-610. DOI: 10.1016/j.neunet.2005.06.042.