Classification Model for Long Texts with Attention Mechanism and Sentence Vector Compression
Ye Han, Sun Haichun, Li Xin, Jiao Kainan
School of Information and Cyber Security, People’s Public Security University of China, Beijing 102627, China

Abstract [Objective] This paper addresses the input length limitation of pre-trained language models, aiming to improve the accuracy of long text classification. [Methods] We designed an algorithm that uses the punctuation in natural texts to segment sentences and feed them into the pre-trained language model in order. We then compressed and encoded the classification feature vectors with average pooling and a weighted attention mechanism. Finally, we evaluated the new algorithm with multiple pre-trained language models. [Results] Compared with methods that directly truncate the text, the classification accuracy of the proposed method improved by up to 3.74%. After applying the attention mechanism, the classification F1-scores on the two datasets increased by 1.61% and 0.83%, respectively. [Limitations] The improvements are not significant on some pre-trained language models. [Conclusions] The proposed model can effectively classify long texts without changing the pre-trained language model’s architecture.

Received: 24 October 2021
Published: 25 January 2022

Fund: Ministry of Public Security Technology Research Program (2020JSYJC22); People’s Public Security University of China Basic Research Fund (2021JKF215)
Corresponding Author: Li Xin
E-mail: lixin@ppsuc.edu.cn
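The following is a minimal, illustrative sketch of the pipeline summarized in the [Methods] part of the abstract: split a long text into sentences at punctuation marks, encode each sentence with a pre-trained language model, and compress the resulting sentence vectors with average pooling and a weighted attention mechanism before classification. The backbone name (bert-base-chinese), the way the pooled and attended vectors are fused, and the linear classifier head are assumptions for illustration, not the authors’ exact configuration.

# Sketch only: sentence segmentation + sentence-vector compression, assuming a
# BERT-style encoder from the transformers library; not the authors' exact model.
import re
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-chinese"   # assumed backbone; any BERT-style encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def split_sentences(text: str) -> list[str]:
    # Segment on common Chinese/English end-of-sentence punctuation.
    parts = re.split(r"(?<=[。！？!?；;])", text)
    return [p.strip() for p in parts if p.strip()]

@torch.no_grad()
def encode_sentences(sentences: list[str]) -> torch.Tensor:
    # Use the [CLS] vector of each sentence as its feature vector.
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    outputs = encoder(**batch)
    return outputs.last_hidden_state[:, 0, :]      # (num_sentences, hidden_size)

class SentenceAttentionClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768, num_classes: int = 2):
        super().__init__()
        self.attn = nn.Linear(hidden_size, 1)        # scores each sentence vector
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, sent_vecs: torch.Tensor) -> torch.Tensor:
        # sent_vecs: (num_sentences, hidden_size)
        weights = torch.softmax(self.attn(sent_vecs), dim=0)   # (S, 1) attention weights
        attended = (weights * sent_vecs).sum(dim=0)            # attention-weighted sum
        pooled = sent_vecs.mean(dim=0)                         # average pooling
        doc_vec = (attended + pooled) / 2                      # assumed fusion of the two views
        return self.classifier(doc_vec)

if __name__ == "__main__":
    text = "这是第一句。这是第二句！这是第三句？"
    sents = split_sentences(text)
    vecs = encode_sentences(sents)
    model = SentenceAttentionClassifier(hidden_size=vecs.size(-1), num_classes=2)
    logits = model(vecs)
    print(logits.shape)   # torch.Size([2])

Because each sentence is encoded separately and only the compressed document vector is classified, the encoder’s fixed input length limit is never exceeded and its architecture stays unchanged, which is the property the abstract highlights.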
[1] Katakis I, Tsoumakas G, Vlahavas I. Multilabel Text Classification for Automated Tag Suggestion[C]// Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. 2008.
[2] Wan Jiashan, Wu Yunzhi. Review of Text Classification Research Based on Deep Learning[J]. Journal of Tianjin University of Technology, 2021, 37(2): 41-47. (In Chinese)
[3] Sun C, Qiu X, Xu Y, et al. How to Fine-Tune BERT for Text Classification?[C]// Proceedings of the 18th China National Conference on Chinese Computational Linguistics. Springer, Cham, 2019: 194-206.
[4] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[5] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[6] Peinelt N, Nguyen D, Liakata M. tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 7047-7055.
[7] Ding M, Zhou C, Yang H, et al. CogLTX: Applying BERT to Long Texts[J]. Advances in Neural Information Processing Systems, 2020, 33: 12792-12804.
[8] Dai Z H, Yang Z L, Yang Y M, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 2978-2988.
[9] Yang Z C, Yang D Y, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 1480-1489.
[10] Lu Ling, Yang Wu, Wang Yuanlun, et al. Long Text Classification Combined with Attention Mechanism[J]. Journal of Computer Applications, 2018, 38(5): 1272-1277. doi: 10.11772/j.issn.1001-9081.2017112652. (In Chinese)
[11] Adhikari A, Ram A, Tang R, et al. DocBERT: BERT for Document Classification[OL]. arXiv Preprint, arXiv: 1904.08398.
[12] Hinton G, Vinyals O, Dean J. Distilling the Knowledge in a Neural Network[OL]. arXiv Preprint, arXiv: 1503.02531.
[13] Wang W, Yan M, Wu C. Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 1705-1714.
[14] Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[OL]. arXiv Preprint, arXiv: 1409.0473.
[15] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[16] Xu L, Hu H, Zhang X W, et al. CLUE: A Chinese Language Understanding Evaluation Benchmark[C]// Proceedings of the 28th International Conference on Computational Linguistics. 2020: 4762-4772.
[17] Howard J, Ruder S. Universal Language Model Fine-Tuning for Text Classification[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 328-339.
[18] Sun Y, Wang S H, Li Y K, et al. ERNIE: Enhanced Representation Through Knowledge Integration[OL]. arXiv Preprint, arXiv: 1904.09223.
[19] Clark K, Luong M T, Le Q V, et al. ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators[OL]. arXiv Preprint, arXiv: 2003.10555.
[20] Lan Z, Chen M, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.