1 School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; 2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China; 3 School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This study systematically examines the principles of traditional deep representation models and the latest pre-trained ones, and evaluates their performance on text mining tasks. [Methods] We compared the models from both the model-design and the experimental perspective. All tests were conducted on six datasets: CR, MR, MPQA, Subj, SST-2, and TREC. [Results] The XLNet model achieved the best average F1 value (0.9186), higher than ELMo (0.8090), BERT (0.8983), Word2Vec (0.7692), GloVe (0.7576), and FastText (0.7506). [Limitations] Our research focused on text classification tasks and did not compare the word representation methods on machine translation, question answering, or other tasks. [Conclusions] Traditional deep representation learning models and the latest pre-trained ones yield different results on text mining tasks.
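The comparison above rests on each model's F1 value averaged over the six datasets. As a minimal illustration (the averaging and ranking logic is standard and our own sketch; only the reported average F1 values come from the abstract), the reported scores can be compared as follows:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Average F1 values reported in the abstract, one per model.
avg_f1 = {
    "XLNet": 0.9186, "BERT": 0.8983, "ELMo": 0.8090,
    "Word2Vec": 0.7692, "GloVe": 0.7576, "FastText": 0.7506,
}

# Rank the models by average F1, best first.
ranking = sorted(avg_f1, key=avg_f1.get, reverse=True)
```

Under this ranking the two pre-trained contextual models (XLNet, BERT) lead the static embedding models (Word2Vec, GloVe, FastText), consistent with the paper's conclusion.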
Yu Chuanming, Wang Manyi, Lin Hongjun, Zhu Xingyu, Huang Tingting, An Lu. A Comparative Study of Word Representation Models Based on Deep Learning. Data Analysis and Knowledge Discovery, 2020, 4(8): 28-40.
[1] Yuan Shuhan, Xiang Yang. A Review of Lexical Semantic Representation[J]. Journal of Chinese Information Processing, 2016, 30(5): 1-8.
[2] Turian J P, Ratinov L A, Bengio Y. Word Representations: A Simple and General Method for Semi-supervised Learning[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010: 384-394.
[3] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[4] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[5] Bojanowski P, Grave E, Joulin A, et al. Enriching Word Vectors with Subword Information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[6] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017: 427-431.
[7] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018: 2227-2237.
[8] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training[EB/OL]. [2019-10-13]. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
[9] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[EB/OL]. [2019-10-01]. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[10] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[11] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[OL]. arXiv Preprint, arXiv:1906.08237.
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[13] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[14] Zhou Lian. Exploration of the Working Principle and Application of Word2vec[J]. Sci-Tech Information Development & Economy, 2015, 25(2): 145-148.
[15] Bellman R E. Dynamic Programming[M]. New York: Dover Publications, Inc., 2003.
[16] Bordag S. A Comparison of Co-occurrence and Similarity Measures as Simulations of Context[C]// Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing. 2008: 52-63.
[17] Aharon M, Elad M, Bruckstein A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[18] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[19] Dai Z H, Yang Z L, Yang Y M, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-length Context[OL]. arXiv Preprint, arXiv:1901.02860.
Zhao Yaou, Zhang Jiachong, Li Yibin, et al. Sentiment Analysis Using ELMo and Multi-scale Convolutional Neural Networks[J/OL]. Journal of Computer Applications. http://kns.cnki.net/kcms/detail/51.1307.TP.20190927.0949.004.html
Zhao Hong, Wang Fang, Wang Xiaoyu, et al. Research on Construction and Application of a Knowledge Discovery System Based on Intelligent Processing of Large-scale Governmental Documents[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(8): 805-812.
Zhang Xiaojuan. Personalized Query Reformulations with Embeddings[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(6): 621-630.
Yang Piao, Dong Wenyong. Chinese NER Based on BERT Embedding[J/OL]. Computer Engineering. https://doi.org/10.19678/j.issn 0054272
[26] Hu M Q, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004: 168-177.
[27] Pang B, Lee L. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales[C]// Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 2005: 115-124.
[28] Wiebe J, Wilson T, Cardie C. Annotating Expressions of Opinions and Emotions in Language[J]. Language Resources and Evaluation, 2005, 39(2-3): 165-210.
[29] Pang B, Lee L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts[C]// Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. 2004: 271-278.
[30] Socher R, Perelygin A, Wu J Y, et al. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1631-1642.
[31] Li X, Roth D. Learning Question Classifiers[C]// Proceedings of the 19th International Conference on Computational Linguistics, Volume 1. 2002: 1-7.