A Comparative Study of Word Representation Models Based on Deep Learning
Yu Chuanming1, Wang Manyi2, Lin Hongjun1, Zhu Xingyu1, Huang Tingting2, An Lu3
1 School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
3 School of Information Management, Wuhan University, Wuhan 430072, China
Abstract [Objective] This study systematically explores the principles of traditional deep representation models and the latest pre-trained models, and examines their performance on text mining tasks. [Methods] We compared the models from both the theoretical and the experimental perspectives, running all experiments on six datasets: CR, MR, MPQA, Subj, SST-2 and TREC. [Results] The XLNet model achieved the best average F1 score (0.9186), higher than ELMo (0.8090), BERT (0.8983), Word2Vec (0.7692), GloVe (0.7576) and FastText (0.7506). [Limitations] Our research focused on the classification tasks of text mining and did not compare the word representation methods on machine translation, question answering and other tasks. [Conclusions] The traditional deep representation learning models and the latest pre-trained models yield different results on text mining tasks.
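The evaluation setup described in [Methods] can be illustrated with a minimal sketch. The snippet below uses gensim and scikit-learn; the pretrained vector file, the toy sentences and the linear classifier are illustrative stand-ins for the paper's actual datasets and its CNN-based classifier, not the authors' exact configuration. It averages pretrained word vectors into sentence features, fits a classifier, and reports the F1 score used for comparison.

```python
# Minimal sketch of the comparison pipeline: embed sentences with pretrained
# word vectors, train a classifier, and report F1. The vector file, toy data
# and linear classifier are illustrative stand-ins, not the paper's setup.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Assumption: pretrained Word2Vec vectors on disk (path is illustrative).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed(sentence):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    tokens = [t for t in sentence.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(vectors.vector_size)
    return np.mean([vectors[t] for t in tokens], axis=0)

# Toy stand-in for one of the six datasets (e.g. MR: positive/negative reviews).
train_texts = ["a gripping and enjoyable film", "dull , lifeless and overlong",
               "sharp writing and fine acting", "a tedious mess of cliches"]
train_labels = [1, 0, 1, 0]
test_texts = ["an engaging , well acted story", "painfully slow and boring"]
test_labels = [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(np.vstack([embed(t) for t in train_texts]), train_labels)
pred = clf.predict(np.vstack([embed(t) for t in test_texts]))
print("Macro F1:", f1_score(test_labels, pred, average="macro"))
```

For contextual models such as ELMo, BERT or XLNet, the averaging step would be replaced by the model's own sentence-level encoding, with the same F1 evaluation applied afterwards.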
Received: 08 November 2019
Published: 14 September 2020
Corresponding Author:
Yu Chuanming
E-mail: yucm@zuel.edu.cn
References

[1] Yuan Shuhan, Xiang Yang. A Review of Lexical Semantic Representation[J]. Journal of Chinese Information Processing, 2016, 30(5): 1-8. (in Chinese)

[2] Turian J P, Ratinov L A, Bengio Y. Word Representations: A Simple and General Method for Semi-supervised Learning[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010: 384-394.

[3] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv preprint, arXiv:1301.3781.

[4] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.

[5] Bojanowski P, Grave E, Joulin A, et al. Enriching Word Vectors with Subword Information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.

[6] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017: 427-431.

[7] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018: 2227-2237.

[8] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training[EB/OL]. [2019-10-13]. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.

[9] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[EB/OL]. [2019-10-01]. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

[10] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv preprint, arXiv:1810.04805.

[11] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[OL]. arXiv preprint, arXiv:1906.08237.

[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.

[13] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.

[14] Zhou Lian. Exploration of the Working Principle and Application of Word2vec[J]. Sci-Tech Information Development & Economy, 2015, 25(2): 145-148. (in Chinese)

[15] Bellman R E. Dynamic Programming[M]. New York: Dover Publications, Inc., 2003.

[16] Bordag S. A Comparison of Co-occurrence and Similarity Measures as Simulations of Context[C]// Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing. 2008: 52-63.

[17] Aharon M, Elad M, Bruckstein A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.

[18] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.

[19] Dai Z H, Yang Z L, Yang Y M, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-length Context[OL]. arXiv preprint, arXiv:1901.02860.

[20] Yu Chuanming. A Cross-domain Text Sentiment Analysis Based on Deep Recurrent Neural Network[J]. Library and Information Service, 2018, 62(11): 23-34. (in Chinese)

[21] Zhao Yaou, Zhang Jiachong, Li Yibin, et al. Sentiment Analysis Using ELMo and Multi-scale Convolutional Neural Networks[J/OL]. Journal of Computer Applications. http://kns.cnki.net/kcms/detail/51.1307.TP.20190927.0949.004.html. (in Chinese)

[22] Li Lin, Li Hui. Computing Text Similarity Based on Concept Vector Space[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 48-58. (in Chinese)

[23] Zhao Hong, Wang Fang, Wang Xiaoyu, et al. Research on Construction and Application of a Knowledge Discovery System Based on Intelligent Processing of Large-scale Governmental Documents[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(8): 805-812. (in Chinese)

[24] Zhang Xiaojuan. Personalized Query Reformulations with Embeddings[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(6): 621-630. (in Chinese)

[25] Yang Piao, Dong Wenyong. Chinese NER Based on BERT Embedding[J/OL]. Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0054272. (in Chinese)

[26] Hu M Q, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004: 168-177.

[27] Pang B, Lee L. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales[C]// Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 2005: 115-124.

[28] Wiebe J, Wilson T, Cardie C. Annotating Expressions of Opinions and Emotions in Language[J]. Language Resources and Evaluation, 2005, 39(2-3): 165-210.

[29] Pang B, Lee L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts[C]// Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. 2004: 271-278.

[30] Socher R, Perelygin A, Wu J Y, et al. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1631-1642.

[31] Li X, Roth D. Learning Question Classifiers[C]// Proceedings of the 19th International Conference on Computational Linguistics - Volume 1. 2002: 1-7.