1 School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; 2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China; 3 School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This study systematically examines the principles of traditional deep representation models and the latest pre-trained ones, and evaluates their performance on text mining tasks. [Methods] We compared the models from both the model-design and the experimental perspective. All tests were conducted on six datasets: CR, MR, MPQA, Subj, SST-2, and TREC. [Results] The XLNet model achieved the best average F1 value (0.9186), higher than ELMo (0.8090), BERT (0.8983), Word2Vec (0.7692), GloVe (0.7576), and FastText (0.7506). [Limitations] Our research focused on text classification tasks and did not compare the word representation methods on machine translation, question answering, or other tasks. [Conclusions] Traditional deep representation learning models and the latest pre-trained ones yield different results on text mining tasks.
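The comparison above rests on each model's F1 value averaged over the six datasets. As a minimal illustration (the averaging and ranking logic is standard and our own sketch; only the reported average F1 values come from the abstract), the reported scores can be compared as follows:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Average F1 values reported in the abstract, one per model.
avg_f1 = {
    "XLNet": 0.9186, "BERT": 0.8983, "ELMo": 0.8090,
    "Word2Vec": 0.7692, "GloVe": 0.7576, "FastText": 0.7506,
}

# Rank the models by average F1, best first.
ranking = sorted(avg_f1, key=avg_f1.get, reverse=True)
```

Under this ranking the two pre-trained contextual models (XLNet, BERT) lead the static embedding models (Word2Vec, GloVe, FastText), consistent with the paper's conclusion.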
Yu Chuanming, Wang Manyi, Lin Hongjun, Zhu Xingyu, Huang Tingting, An Lu. A Comparative Study of Word Representation Models Based on Deep Learning. Data Analysis and Knowledge Discovery, 2020, 4(8): 28-40.
[1] Yuan Shuhan, Xiang Yang. A Review of Lexical Semantic Representation[J]. Journal of Chinese Information Processing, 2016, 30(5): 1-8.
[2] Turian J P, Ratinov L A, Bengio Y. Word Representations: A Simple and General Method for Semi-supervised Learning[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010: 384-394.
[3] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[4] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[5] Bojanowski P, Grave E, Joulin A, et al. Enriching Word Vectors with Subword Information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[6] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017: 427-431.
[7] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018: 2227-2237.
[8] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training[EB/OL]. [2019-10-13]. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
[9] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners[EB/OL]. [2019-10-01]. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[10] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[11] Yang Z L, Dai Z H, Yang Y M, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[OL]. arXiv Preprint, arXiv:1906.08237.
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[13] Kim Y. Convolutional Neural Networks for Sentence Classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1746-1751.
[14] Zhou Lian. Exploration of the Working Principle and Application of Word2vec[J]. Sci-Tech Information Development & Economy, 2015, 25(2): 145-148.
[15] Bellman R E. Dynamic Programming[M]. New York: Dover Publications, Inc., 2003.
[16] Bordag S. A Comparison of Co-occurrence and Similarity Measures as Simulations of Context[C]// Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing. 2008: 52-63.
[17] Aharon M, Elad M, Bruckstein A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[18] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[19] Dai Z H, Yang Z L, Yang Y M, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-length Context[OL]. arXiv Preprint, arXiv:1901.02860.
Zhao Yaou, Zhang Jiachong, Li Yibin, et al. Sentiment Analysis Using ELMo and Multi-scale Convolutional Neural Networks[J/OL]. Journal of Computer Applications. http://kns.cnki.net/kcms/detail/51.1307.TP.20190927.0949.004.html
Zhao Hong, Wang Fang, Wang Xiaoyu, et al. Research on Construction and Application of a Knowledge Discovery System Based on Intelligent Processing of Large-scale Governmental Documents[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(8): 805-812.
Zhang Xiaojuan. Personalized Query Reformulations with Embeddings[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(6): 621-630.
Yang Piao, Dong Wenyong. Chinese NER Based on BERT Embedding[J/OL]. Computer Engineering. https://doi.org/10.19678/j.issn 0054272
[26] Hu M Q, Liu B. Mining and Summarizing Customer Reviews[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004: 168-177.
[27] Pang B, Lee L. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales[C]// Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 2005: 115-124.
[28] Wiebe J, Wilson T, Cardie C. Annotating Expressions of Opinions and Emotions in Language[J]. Language Resources and Evaluation, 2005, 39(2-3): 165-210.
[29] Pang B, Lee L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts[C]// Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. 2004: 271-278.
[30] Socher R, Perelygin A, Wu J Y, et al. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 1631-1642.
[31] Li X, Roth D. Learning Question Classifiers[C]// Proceedings of the 19th International Conference on Computational Linguistics, Volume 1. 2002: 1-7.