Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue (2/3): 105-116     https://doi.org/10.11925/infotech.2096-3467.2021.0912
A Multi-Task Text Classification Model Based on Label Embedding of Attention Mechanism
Xu Yuemei 1, Fan Zuwei 2,3, Cao Han 1
1School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China
2Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
3School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

[Objective] This paper adjusts the text classification algorithm according to task-specific features, aiming to improve classification accuracy on different tasks. [Methods] We propose a text classification algorithm based on a label-attention mechanism. Through label embedding, the model jointly learns the semantic word vectors of a text and its TF-IDF classification matrix, assigning different weights to words to extract the features most relevant to the classification task and thereby improving attention weight learning. [Results] The proposed method improves accuracy by an average of 3.78%, 5.43%, and 11.78% over the existing LSTMAtt, LEAM, and SelfAtt methods, respectively; visualization analysis further verifies its classification performance. [Limitations] The impact of different word vector representations on classification performance was not compared. [Conclusions] This paper presents an effective way to improve and optimize multi-task text classification algorithms.
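The methods paragraph describes word-level attention weights derived from label embeddings. Below is a minimal sketch of that label-attention idea, with all shapes, names, and values as illustrative assumptions; per the abstract, the actual model also learns a TF-IDF classification matrix jointly with the word vectors, which this sketch omits.

```python
import torch
import torch.nn.functional as F

L, K, d = 12, 5, 100            # words in one text, number of labels, embedding dim
words = torch.randn(L, d)       # word embeddings of the text (random placeholders)
labels = torch.randn(K, d)      # one trainable embedding per class label

# Cosine similarity between every label and every word: shape (K, L)
sim = F.normalize(labels, dim=1) @ F.normalize(words, dim=1).T

# A word's attention score is its strongest label affinity, so words that are
# indicative of some class receive larger weights.
scores = sim.max(dim=0).values      # shape (L,)
attn = F.softmax(scores, dim=0)     # attention weights over the words

# Text representation: attention-weighted sum of word vectors, fed to a classifier
z = attn @ words                    # shape (d,)
print(attn.sum().item(), z.shape)   # weights sum to 1.0
```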

Key words: Text Classification; Label Embedding; Attention Mechanism; Multi-Task
Received: 2021-08-05      Published: 2022-04-14
CLC No.: TP393
Funding: Supported by the Fundamental Research Funds for the Central Universities (2022JJ006)
Corresponding author: Xu Yuemei, ORCID: 0000-0002-0223-7146, E-mail: xuyuemei@bfsu.edu.cn
Cite this article:
Xu Yuemei, Fan Zuwei, Cao Han. A Multi-Task Text Classification Model Based on Label Embedding of Attention Mechanism. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 105-116.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0912      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I2/3/105
Fig.1  Structure of the text classification model based on the label-improved attention mechanism
Fig.2  Experimental workflow
Dataset   Classes   Total samples   Training set   Test set
TREC      6         5,952           5,452          500
CR        2         3,769           2,638          1,131
SST-1     5         10,754          8,544          2,210
Table 1  Statistics of the experimental datasets
Parameter                    Value
Hidden layer dimension       100
Word embedding dimension     100
Batch size                   16
Epochs                       100
Learning rate                1×10^-3 / 2×10^-5
Penalty coefficient          0.1
Table 2  Main experimental parameter settings
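As a hedged illustration of how the Table 2 settings might be wired up, the sketch below uses the listed hyperparameters; the encoder structure, vocabulary size, and class count are placeholder assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Hyperparameters from Table 2; everything else is assumed for illustration.
HIDDEN_DIM, EMBED_DIM = 100, 100
BATCH_SIZE, EPOCHS = 16, 100
LR = 1e-3          # Table 2 lists 1e-3 / 2e-5, presumably per experimental setting
PENALTY = 0.1      # coefficient of the penalty (regularization) term in the loss

class PlaceholderClassifier(nn.Module):
    """Stand-in encoder: embedding + LSTM + linear classifier."""
    def __init__(self, vocab_size=20000, num_classes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.fc = nn.Linear(HIDDEN_DIM, num_classes)

    def forward(self, x):                  # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.embed(x))
        return self.fc(h[:, -1])           # classify from the last hidden state

model = PlaceholderClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()
```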
Actual class   Predicted as y_i   Predicted as not y_i
y_i            TP_i               FN_i
not y_i        FP_i               TN_i
Table 3  Confusion matrix for class i
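From the Table 3 counts, the standard per-class metrics follow directly; the helper below uses the usual definitions and is not code from the paper.

```python
# Standard per-class metrics from the Table 3 counts (TP_i, FN_i, FP_i, TN_i).
def class_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

print(class_metrics(tp=90, fn=10, fp=5, tn=395))   # illustrative counts only
```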
Model        TREC dataset   CR dataset   SST-1 dataset
LSTMAtt      88.80          79.92        41.13
SelfAtt      86.00          78.16        35.02
LEAM         84.93          76.96        42.65
LabelAtt     92.20          81.96        43.17
LabelAtt-    90.00          80.63        41.40
LabelAtt--   22.60          63.22        17.60
Table 4  Accuracy (%) comparison of the models on the TREC, CR, and SST-1 datasets
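The averaged gains quoted in the abstract (3.78%, 5.43%, and 11.78%) can be reproduced from Table 4 as the mean relative accuracy improvement of LabelAtt over each baseline across the three datasets:

```python
# Mean relative accuracy improvement of LabelAtt over each baseline (Table 4).
acc = {
    "LabelAtt": [92.20, 81.96, 43.17],   # TREC, CR, SST-1
    "LSTMAtt":  [88.80, 79.92, 41.13],
    "LEAM":     [84.93, 76.96, 42.65],
    "SelfAtt":  [86.00, 78.16, 35.02],
}
for baseline in ("LSTMAtt", "LEAM", "SelfAtt"):
    gains = [(a - b) / b * 100 for a, b in zip(acc["LabelAtt"], acc[baseline])]
    print(f"vs {baseline}: {sum(gains) / len(gains):.2f}%")
# Output: vs LSTMAtt: 3.78%, vs LEAM: 5.43%, vs SelfAtt: 11.78%
```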
Fig.3  Text classification visualization of LabelAtt on the TREC dataset at Epoch = 1, 5, 20, and 25
Fig.4  Text classification visualization of LSTMAtt on the TREC dataset at Epoch = 1, 5, 20, and 25
Model      Epoch=1   Epoch=5   Epoch=20   Epoch=25
LabelAtt   76.60     86.60     91.20      91.80
LSTMAtt    72.60     83.40     88.00      88.20
Table 5  Accuracy (%) of LabelAtt and LSTMAtt on the TREC dataset
Fig.5  Example comparison of attention weight distributions of LSTMAtt and LabelAtt on the TREC dataset
Fig.6  Example comparison of attention weight distributions of LSTMAtt and LabelAtt on the CR dataset