[Objective] This study uses the Label Embedding technique to modify the attention mechanism, learning task-specific information and generating task-related attention weights in order to improve the quality of text representation vectors. [Methods] First, we adopted a multi-level LSTM to extract latent semantic representations of texts. Then, we identified the words receiving the most attention under each label and generated attention weights through Label Embedding. Finally, we computed a text representation vector carrying task-specific information, which was used for text classification. [Results] Compared with the TextCNN, BiGRU, TLSTM, LSTMAtt, and SelfAtt models, the proposed model improved performance on multiple datasets by 0.60% to 11.95% (an overall average of 5.27%), while also converging faster with lower complexity. [Limitations] The experimental datasets and task types need to be expanded. [Conclusions] The proposed model effectively improves semantics-based text classification and has considerable practical value.
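To make the pipeline described in [Methods] concrete, the following is a minimal PyTorch sketch of label-embedding-guided attention over stacked LSTM states. It is a reconstruction under stated assumptions, not the authors' exact formulation: the class name LabelEmbeddingAttention, the two-layer bidirectional LSTM standing in for the multi-level LSTM, and the max-over-labels scoring rule are all illustrative choices.

import torch
import torch.nn as nn

class LabelEmbeddingAttention(nn.Module):
    # Hypothetical sketch: names, layer sizes, and scoring rule are assumptions.
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_labels):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # "Multi-level LSTM" approximated as two stacked bidirectional layers.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            batch_first=True, bidirectional=True)
        # One trainable embedding per class label (the label embedding matrix).
        self.label_embed = nn.Parameter(torch.randn(num_labels, 2 * hidden_dim))
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):                      # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))        # (batch, seq_len, 2*hidden)
        # Affinity between every word state and every label embedding.
        scores = h @ self.label_embed.t()              # (batch, seq_len, num_labels)
        # Keep each word's strongest label affinity, then normalize over words
        # to obtain task-related attention weights.
        weights = torch.softmax(scores.max(dim=-1).values, dim=1)   # (batch, seq_len)
        # Attention-weighted sum of word states = text representation vector.
        text_vec = (weights.unsqueeze(-1) * h).sum(dim=1)           # (batch, 2*hidden)
        return self.classifier(text_vec)               # class logits

Under these assumptions, a single forward pass maps a batch of token IDs to class logits; taking each word's strongest label affinity before the softmax is one plausible way to let label embeddings pick out task-relevant words, and the paper's actual scoring function may differ.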
Huang Lu, Zhou Enguo, Li Daifeng. Text Representation Learning Model Based on Attention Mechanism with Task-specific Information[J]. Data Analysis and Knowledge Discovery, 2020, 4(9): 111-122.
[1] Li Fenglin, Ke Jia. Text Representation Method Based on Deep Learning[J]. Information Science, 2019, 37(1): 156-164.
[2] Ma Feicheng. Historical Review of the Development of Information Science with Proposing Frontier Topics[J]. Library and Information Science, 2013, 29(2): 4-12.
[3] Minsky M, Papert S A. Perceptrons: An Introduction to Computational Geometry[M]. USA: MIT Press, 2017.
[4] Le Q, Mikolov T. Distributed Representations of Sentences and Documents[C]// Proceedings of the 31st International Conference on Machine Learning. 2014: 1188-1196.
[5] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[6] Liu Y, Liu Z Y, Chua T S, et al. Topical Word Embeddings[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 2418-2424.
[7] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[8] Chung J, Gulcehre C, Cho K H, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling[C]// Proceedings of the NIPS 2014 Deep Learning and Representation Learning Workshop. 2014: 1-9.
[9] Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[OL]. arXiv Preprint, arXiv:1409.0473.
[10] Wang Y Q, Huang M L, Zhao L, et al. Attention-based LSTM for Aspect-level Sentiment Classification[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 606-615.
[11] Lin Z H, Feng M W, Santos C N D, et al. A Structured Self-attentive Sentence Embedding[OL]. arXiv Preprint, arXiv:1703.03130.
[12] Papadimitriou C H. Latent Semantic Indexing: A Probabilistic Analysis[J]. Journal of Computer & System Sciences, 2000, 61(2): 217-235.
[13] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[14] Liu Tingting, Zhu Wendong, Liu Guangyi. Advances in Deep Learning Based Text Classification[J]. Electric Power Information and Communication Technology, 2018, 16(3): 1-7.
[15] Bengio Y, Vincent P, Janvin C. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3(6): 1137-1155.
[16] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[17] Kim Y. Convolutional Neural Networks for Sentence Classification[OL]. arXiv Preprint, arXiv:1408.5882.
[18] Graves A, Schmidhuber J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures[J]. Neural Networks, 2005, 18(5-6): 602-610.
[19] Nair V, Hinton G E. Rectified Linear Units Improve Restricted Boltzmann Machines[C]// Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010: 807-814.
[20] Lai S W, Xu L H, Liu K, et al. Recurrent Convolutional Neural Networks for Text Classification[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015: 2267-2273.
[21] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[OL]. arXiv Preprint, arXiv:1802.05365.
[22] Yang Z C, Yang D Y, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 1480-1489.
[23] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[24] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[OL]. arXiv Preprint, arXiv:1906.08237.
[25] Hu M Q, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004: 168-177.
[26] Socher R, Perelygin A, Wu J, et al. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1631-1642.
[27] Pang B, Lee L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts[C]// Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. 2004: 271.
[28] Li X, Roth D. Learning Question Classifiers[C]// Proceedings of the 19th International Conference on Computational Linguistics, Volume 1. 2002: 1-7.
[29] Van der Maaten L, Hinton G. Visualizing Data Using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.