Data Analysis and Knowledge Discovery  2020, Vol. 4, Issue 9: 111-122     https://doi.org/10.11925/infotech.2096-3467.2020.0204
Text Representation Learning Model Based on Attention Mechanism with Task-specific Information
Huang Lu,Zhou Enguo,Li Daifeng()
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China

Abstract

[Objective] This study uses label embedding to modify the attention mechanism, learning task-specific information and generating task-related attention weights, with the aim of improving the quality of text representation vectors. [Methods] First, we adopted a multi-layer LSTM to extract latent semantic representations of texts. Then, through label embedding, we identified the words receiving the most attention under each label, captured task-specific background semantics, and generated attention weights. Finally, we computed a text representation vector fused with task-specific information and used it for classification. [Results] Compared with the TextCNN, BiGRU, TLSTM, LSTMAtt, and SelfAtt models, the proposed model improved classification accuracy on multiple datasets (sentiment, topic, subjectivity, and domain) by 0.60% to 11.95%, with an overall average improvement of 5.27%. It also converged quickly and had low complexity. [Limitations] The scale and task types of the experimental datasets are relatively limited and could be expanded for further validation and optimization. [Conclusions] The proposed model is task-oriented and lightweight, effectively improves the expressiveness of text semantics and classification performance, and has strong practical value.
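The pipeline described in [Methods] — hidden states scored against a label embedding, softmax-normalized into task-related attention weights, then pooled into a single text vector — can be sketched in plain Python. This is a minimal illustration only: the 2-d vectors and the dot-product scoring are simplifications of the paper's multi-layer LSTM setup, and all names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def task_attention(hidden_states, label_embedding):
    """Score each word's hidden state against a label embedding,
    normalize the scores with softmax, and pool the hidden states
    into a single task-aware text vector."""
    scores = [dot(h, label_embedding) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states))
              for i in range(dim)]
    return weights, pooled

# Toy example: three "word" hidden states in 2-d; the label embedding
# points toward the second word, which should get the largest weight.
H = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
label = [0.0, 1.0]
weights, text_vec = task_attention(H, label)
```

The key difference from ordinary attention is that the query is a learned label embedding rather than a generic context vector, so the weights change with the task.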

Keywords: Deep Learning; Text Representation; Attention Mechanism; Task-specific Information
Received: 2020-03-17      Online: 2020-06-05
CLC number: TP393
Funding: This work was supported by the National Natural Science Foundation of China Youth Program "User Long-tail Demand Modeling Based on Knowledge Graphs" (61702564), the Sun Yat-Sen University "Hundred Talents Program" Research Fund (20000-18841202), and the Guangdong Soft Science General Program "Construction and Application of Bank Knowledge Graphs" (2019a101002020)
Corresponding author: Li Daifeng     E-mail: lidaifeng@mail.sysu.edu.cn
Cite this article:
Huang Lu,Zhou Enguo,Li Daifeng. Text Representation Learning Model Based on Attention Mechanism with Task-specific Information. Data Analysis and Knowledge Discovery, 2020, 4(9): 111-122.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0204      或      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I9/111
Fig.1  Example of texts for classification
Fig.2  Architecture of the FTIA model
Dataset | Classes | Total   | Training Set | Test Set | Notes
CR      | 2       | 3,769   | 2,638        | 1,131    | Random 7:3 train/test split
SST-1   | 5       | 10,754  | 8,544        | 2,210    | Pre-defined train/test split
Subj    | 2       | 10,000  | 7,000        | 3,000    | Random 7:3 train/test split
TREC    | 6       | 5,952   | 5,452        | 500      | Pre-defined train/test split
Patent  | 6       | 18,000  | 12,600       | 5,400    | Random 7:3 train/test split
Table 1  Dataset statistics
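The random 7:3 train/test split used for the CR, Subj, and Patent datasets in Table 1 can be reproduced with a few lines of Python (the seed value is an arbitrary illustrative choice, not one reported by the paper):

```python
import random

def split_7_3(samples, seed=42):
    """Shuffle the samples, then split 70% train / 30% test
    (the random split scheme noted in Table 1)."""
    data = list(samples)
    rng = random.Random(seed)  # fixed seed for reproducibility
    rng.shuffle(data)
    cut = round(len(data) * 0.7)
    return data[:cut], data[cut:]

# The Subj dataset has 10,000 samples, giving 7,000 / 3,000.
train, test = split_7_3(range(10000))
```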
Model   | Word Embedding             | Hidden Size | Learning Rate   | Epochs | Batch Size | N Layers | Penalty Coefficient
TextCNN | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | -        | -
BiGRU   | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | 2        | -
TLSTM   | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | 2        | -
LSTMAtt | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | 2        | -
SelfAtt | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | 2        | 0.1
FTIA    | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³ | 100    | 32         | 2        | 0.1
Table 2  Hyperparameter settings of each model
Fig.3  Data preprocessing workflow
Attention          | Model   | CR    | SST-1 | Subj  | TREC | Patent
Without attention  | TextCNN | 67.02 | 31.67 | 86.27 | 79.8 | 78.89
                   | BiGRU   | 72.50 | 36.47 | 87.33 | 83.0 | 81.33
                   | TLSTM   | 71.71 | 34.93 | 86.03 | 82.8 | 76.28
With attention     | LSTMAtt | 73.83 | 37.51 | 87.47 | 81.6 | 81.41
                   | SelfAtt | 74.71 | 37.01 | 86.53 | 85.8 | 81.15
                   | FTIA    | 77.54 | 43.62 | 92.43 | 86.4 | 82.96
Table 3  Comparison of experimental results (%)
Fig.4  Attention-weight visualization of SelfAtt and FTIA on a positive CR sentiment review
Fig.5  Attention-weight visualization of SelfAtt and FTIA on TREC question data
Model   | Parameters
TextCNN | 3,676,232
BiGRU   | 4,870,802
TLSTM   | 4,236,802
LSTMAtt | 4,308,802
SelfAtt | 16,392,082
FTIA    | 4,349,402
Table 4  Number of parameters in each model
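For context on Table 4, most of each model's parameter budget comes from the embedding table plus the recurrent layers. A rough count can be sketched as follows; the 18,000-word vocabulary and the single-bias-per-gate convention are illustrative assumptions, not figures stated on this page:

```python
def lstm_layer_params(d_in, d_hidden, bidirectional=False):
    """Standard LSTM gate count: four gates, each with an
    input-to-hidden weight, a hidden-to-hidden weight, and a bias."""
    per_direction = 4 * ((d_in + d_hidden) * d_hidden + d_hidden)
    return per_direction * (2 if bidirectional else 1)

def embedding_params(vocab_size, d_embed):
    """Size of the word-embedding lookup table."""
    return vocab_size * d_embed

# Illustrative totals for a 200-d setup with two stacked LSTM layers
# (the second layer's input is the first layer's 200-d hidden state).
emb = embedding_params(18000, 200)
lstm = lstm_layer_params(200, 200) + lstm_layer_params(200, 200)
```

A 200-d embedding table alone contributes about 3.6M parameters, which is consistent in magnitude with the totals in Table 4; frameworks such as PyTorch use two bias vectors per gate, so their reported counts run slightly higher.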
Fig.6  Running-time comparison of models on the CR dataset (unit: s)
Fig.7  Visualization of FTIA and LSTMAtt text representations in early training
Fig.8  Visualization of FTIA and LSTMAtt text representations in mid-to-late training
Penalty Coefficient | CR    | SST-1 | Subj  | TREC
0.0                 | 78.69 | 44.30 | 92.60 | 86.40
0.1                 | 77.54 | 43.62 | 92.43 | 86.40
0.2                 | 77.98 | 44.34 | 92.60 | 86.60
0.3                 | 80.11 | 44.43 | 92.50 | 86.80
0.4                 | 79.05 | 44.43 | 92.77 | 87.00
0.5                 | 78.69 | 44.39 | 92.53 | 87.00
0.6                 | 78.96 | 44.80 | 92.63 | 87.20
0.7                 | 77.98 | 44.03 | 92.80 | 86.20
0.8                 | 78.43 | 43.48 | 92.40 | 86.20
0.9                 | 80.02 | 44.57 | 92.13 | 87.40
1.0                 | 78.69 | 44.16 | 92.53 | 87.00
Table 5  Accuracy under different penalty coefficients (%)
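In structured self-attention models of the kind SelfAtt is based on (Lin et al.'s structured self-attentive embedding), the penalty coefficient varied in Table 5 typically scales a redundancy penalty ‖AAᵀ − I‖²_F over the attention matrix A, which pushes different attention rows to focus on different words. The sketch below assumes that form; the paper's exact penalty term is not reproduced on this page.

```python
def frobenius_penalty(A):
    """Squared Frobenius norm of (A Aᵀ − I) for an attention matrix A,
    where each row of A is one attention distribution over the words."""
    r = len(A)
    # Gram matrix G = A @ A.T
    G = [[sum(A[i][k] * A[j][k] for k in range(len(A[0])))
          for j in range(r)] for i in range(r)]
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(r) for j in range(r))

# Two identical attention rows are penalized; two rows that attend
# to different words with full confidence are not.
overlapping = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss_term = 0.3 * frobenius_penalty(overlapping)  # scaled by a Table 5 coefficient
```

During training this term is added to the classification loss, so a larger coefficient trades classification fit for more diverse attention.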