Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 111-122    DOI: 10.11925/infotech.2096-3467.2020.0204
Text Representation Learning Model Based on Attention Mechanism with Task-specific Information
Huang Lu, Zhou Enguo, Li Daifeng
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This study uses the label embedding technique to modify the attention mechanism so that it learns task-specific information and generates task-related attention weights, aiming to improve the quality of text representation vectors. [Methods] First, we adopted a multi-level LSTM to extract the latent semantic representation of texts. Then, we used label embeddings to identify the words that receive the most attention under different labels and to generate the corresponding attention weights. Finally, we computed a text representation vector carrying task-specific information and used it for text classification. [Results] Compared with the TextCNN, BiGRU, TLSTM, LSTMAtt, and SelfAtt models, the proposed model improved performance on multiple datasets by 0.60% to 11.95% (an overall average of 5.27%), while converging quickly and keeping model complexity low. [Limitations] The experimental datasets and task types need to be expanded. [Conclusions] The proposed model effectively improves semantic text classification and has considerable practical value.
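The article page does not reproduce the model's code, but the abstract's description of FTIA (an LSTM encoder whose attention weights are derived from label embeddings) can be illustrated with a short sketch. The following PyTorch module is an assumption-laden illustration, not the authors' implementation; all names (LabelAttentionClassifier, num_labels, and so on) are invented here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttentionClassifier(nn.Module):
    """Sketch of a label-embedding attention classifier (not the authors' code)."""

    def __init__(self, vocab_size, embed_dim, hidden_size, num_labels, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)     # could be initialized with GloVe
        self.encoder = nn.LSTM(embed_dim, hidden_size, num_layers=num_layers,
                               batch_first=True, bidirectional=True)
        self.label_embedding = nn.Embedding(num_labels, 2 * hidden_size)  # one vector per class
        self.classifier = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices
        h, _ = self.encoder(self.embedding(token_ids))           # (batch, seq_len, 2*hidden)
        # Score every word state against every label embedding.
        scores = torch.einsum('bsh,lh->bls', h, self.label_embedding.weight)
        attn = F.softmax(scores, dim=-1)                         # (batch, num_labels, seq_len)
        # Label-specific context vectors, pooled into one task-aware text vector.
        contexts = torch.bmm(attn, h)                            # (batch, num_labels, 2*hidden)
        text_vec = contexts.mean(dim=1)                          # (batch, 2*hidden)
        return self.classifier(text_vec), attn
```

A forward pass returns both the class logits and the label-specific attention weights; returning the weights is what makes attention visualizations like those referenced further down the page possible.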

Keywords: Deep Learning; Text Representation; Attention Mechanism; Task-specific Information
Received: 17 March 2020      Published: 05 June 2020
Chinese Library Classification (ZTFLH): TP393
Corresponding Author: Li Daifeng   E-mail: lidaifeng@mail.sysu.edu.cn

Cite this article:

Huang Lu, Zhou Enguo, Li Daifeng. Text Representation Learning Model Based on Attention Mechanism with Task-specific Information. Data Analysis and Knowledge Discovery, 2020, 4(9): 111-122.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0204     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I9/111

Figure: Example of Classification Text
Figure: Model Framework of FTIA
Dataset | Classes | Total Size | Training Set | Test Set | Notes
CR      | 2 | 3,769  | 2,638  | 1,131 | Training and test sets randomly split 7:3
SST-1   | 5 | 10,754 | 8,544  | 2,210 | Training and test sets pre-split
Subj    | 2 | 10,000 | 7,000  | 3,000 | Training and test sets randomly split 7:3
TREC    | 6 | 5,952  | 5,452  | 500   | Training and test sets pre-split
Patent  | 6 | 18,000 | 12,600 | 5,400 | Training and test sets randomly split 7:3
Table: Dataset Statistics
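For CR, Subj, and Patent the table notes a random 7:3 train/test split. A minimal sketch of such a split, assuming the texts and labels have already been loaded into parallel lists (the random seed is an assumption, since none is stated):

```python
from sklearn.model_selection import train_test_split

# Toy stand-in corpus; in practice texts/labels come from the raw dataset files.
texts = ["great battery life", "screen is too dim", "works as advertised", "stopped charging"]
labels = [1, 0, 1, 0]

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels,
    test_size=0.3,     # 7:3 train/test ratio, as in the table above
    random_state=42,   # fixed seed for reproducibility (an assumption)
)
```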
Model   | Word Embedding             | Hidden Size | Learning Rate     | Epochs | Batch Size | N Layers | Penalty Coefficient
TextCNN | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | -        | -
BiGRU   | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | 2        | -
TLSTM   | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | 2        | -
LSTMAtt | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | 2        | -
SelfAtt | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | 2        | 0.1
FTIA    | Open-source GloVe vectors  | 200         | 2×10⁻⁵ / 1×10⁻³   | 100    | 32         | 2        | 0.1
Table: Hyperparameter Settings of the Models
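The table above can be read as a training configuration. The sketch below shows one way to wire those values into a PyTorch optimizer, reusing the LabelAttentionClassifier sketch from earlier; the optimizer choice (Adam) and the interpretation of the two learning rates (a small rate for the pre-trained embeddings, a larger one for everything else) are assumptions, not statements from the paper.

```python
import torch

# Assumed interpretation: 2e-5 for the GloVe-initialized embedding layer, 1e-3 elsewhere.
model = LabelAttentionClassifier(vocab_size=20_000, embed_dim=300,   # 300-d GloVe assumed
                                 hidden_size=200, num_labels=2, num_layers=2)
optimizer = torch.optim.Adam([
    {"params": model.embedding.parameters(), "lr": 2e-5},
    {"params": [p for n, p in model.named_parameters()
                if not n.startswith("embedding")], "lr": 1e-3},
])
loss_fn = torch.nn.CrossEntropyLoss()
EPOCHS, BATCH_SIZE = 100, 32   # as listed in the hyperparameter table
```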
Figure: Data Preprocessing Process
Attention Mechanism | Model   | CR    | SST-1 | Subj  | TREC | Patent
Without attention   | TextCNN | 67.02 | 31.67 | 86.27 | 79.8 | 78.89
Without attention   | BiGRU   | 72.50 | 36.47 | 87.33 | 83.0 | 81.33
Without attention   | TLSTM   | 71.71 | 34.93 | 86.03 | 82.8 | 76.28
With attention      | LSTMAtt | 73.83 | 37.51 | 87.47 | 81.6 | 81.41
With attention      | SelfAtt | 74.71 | 37.01 | 86.53 | 85.8 | 81.15
With attention      | FTIA    | 77.54 | 43.62 | 92.43 | 86.4 | 82.96
Table: Experiment Results (%)
Figure: Visual Comparison of the Attention Weights of SelfAtt and FTIA for Positive Emotional Comments in CR
Figure: Visual Comparison of the Attention Weights of SelfAtt and FTIA for Questions in TREC
Model   | Number of Parameters
TextCNN | 3,676,232
BiGRU   | 4,870,802
TLSTM   | 4,236,802
LSTMAtt | 4,308,802
SelfAtt | 16,392,082
FTIA    | 4,349,402
Table: Number of Model Parameters
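Parameter counts like those in the table can be read straight off a framework's model object. A small, generic PyTorch utility (not tied to any of the models above):

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy example; the figures in the table come from the full models, not this module.
toy = nn.LSTM(input_size=300, hidden_size=200, num_layers=2, bidirectional=True)
print(count_trainable_parameters(toy))
```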
Figure: Running Time on the CR Dataset
Figure: Visual Comparison of FTIA and LSTMAtt Text Representations in Early Training
Figure: Visual Comparison of FTIA and LSTMAtt Text Representations in Mid-to-Late Training
Penalty Coefficient | CR    | SST-1 | Subj  | TREC
0.0                 | 78.69 | 44.30 | 92.60 | 86.40
0.1                 | 77.54 | 43.62 | 92.43 | 86.40
0.2                 | 77.98 | 44.34 | 92.60 | 86.60
0.3                 | 80.11 | 44.43 | 92.50 | 86.80
0.4                 | 79.05 | 44.43 | 92.77 | 87.00
0.5                 | 78.69 | 44.39 | 92.53 | 87.00
0.6                 | 78.96 | 44.80 | 92.63 | 87.20
0.7                 | 77.98 | 44.03 | 92.80 | 86.20
0.8                 | 78.43 | 43.48 | 92.40 | 86.20
0.9                 | 80.02 | 44.57 | 92.13 | 87.40
1.0                 | 78.69 | 44.16 | 92.53 | 87.00
Table: Accuracy Corresponding to Penalty Coefficients (%)
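The penalty coefficient studied above presumably weights a regularization term on the attention matrix. Assuming it is the Frobenius-norm penalty ||AAᵀ − I||²_F used by structured self-attention models, which pushes different attention rows to focus on different words, a minimal sketch is:

```python
import torch

def attention_penalty(attn: torch.Tensor) -> torch.Tensor:
    """Frobenius-norm penalty ||A·Aᵀ - I||_F² on an attention matrix A.

    attn: (batch, num_rows, seq_len) attention weights, each row summing to 1.
    """
    eye = torch.eye(attn.size(1), device=attn.device).unsqueeze(0)
    gram = torch.bmm(attn, attn.transpose(1, 2))    # (batch, num_rows, num_rows)
    return ((gram - eye) ** 2).sum(dim=(1, 2)).mean()

# Training sketch: total_loss = classification_loss + coeff * attention_penalty(attn)
```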