Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 12: 21-29     https://doi.org/10.11925/infotech.2096-3467.2019.0267
Research Paper
Classifying Short Texts with an Improved Attention-Based Bidirectional Long Short-Term Memory Network
Zhiyong Tao1, Xiaobing Li1,2, Ying Liu1, Xiaofang Liu1
1 School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
2 Fuxin Lixing Technology Co., Ltd., Fuxin 123000, China

Abstract

[Objective] This paper proposes an end-to-end short text classification model based on a bidirectional long short-term memory network with improved attention, aiming to address the short length and feature sparsity of short texts. [Methods] First, we used pre-trained word vectors to digitize the original texts. Then, we extracted semantic features with a bidirectional long short-term memory network. Third, the improved attention layer fused the forward and backward features to compute global attention scores, yielding short-text vector representations with deep semantic features. Finally, we used Softmax to predict the sample labels. [Results] Compared with traditional CNN, LSTM and BLSTM models, the proposed model improved classification accuracy on several Chinese and English datasets, by up to 19.1%. [Limitations] The model targets short text classification; its accuracy gains on longer documents are limited. [Conclusions] The proposed model makes full use of contextual semantic features, effectively alleviates the feature sparsity of short texts, and improves short text classification performance.
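
The Methods pipeline (pre-trained word vectors, a BLSTM encoder, attention over fused directional features, Softmax output) can be summarized in a short sketch. The PyTorch code below is a minimal illustration under our own assumptions: the class name, dimensions, and the summation used to fuse the forward and backward states are ours, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IABLSTM(nn.Module):
    """Illustrative sketch: BiLSTM + attention over fused directional states."""
    def __init__(self, embeddings, hidden_size=128, num_classes=2):
        super().__init__()
        # Embedding layer initialized from pre-trained word vectors,
        # as the Methods section describes; fine-tuned during training.
        self.embedding = nn.Embedding.from_pretrained(embeddings, freeze=False)
        self.bilstm = nn.LSTM(embeddings.size(1), hidden_size,
                              batch_first=True, bidirectional=True)
        # Attention parameters: a projection plus a global context vector.
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.context = nn.Parameter(torch.randn(hidden_size))
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        h, _ = self.bilstm(self.embedding(token_ids))    # (batch, seq, 2*hidden)
        # "Improved" attention: fuse the forward and backward features
        # (summation is one plausible reading of the abstract's "fusion")
        # instead of attending over the concatenated states.
        fwd, bwd = h.chunk(2, dim=-1)
        fused = fwd + bwd                                # (batch, seq, hidden)
        u = torch.tanh(self.proj(fused))
        alpha = F.softmax(u @ self.context, dim=1)       # global attention scores
        sentence = (alpha.unsqueeze(-1) * fused).sum(1)  # short-text representation
        return F.log_softmax(self.classifier(sentence), dim=-1)

Given an embedding matrix of shape (vocab_size, embedding_dim) and padded batches of token ids, the model returns per-class log-probabilities; training with a negative log-likelihood loss follows the usual recipe.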

Key words: Short Text Classification; Bidirectional Long Short-Term Memory Network; Attention Mechanism
Received: 2019-03-07      Published: 2019-12-25
CLC Number: TP391.9
Funding: This work was supported by the National Key R&D Program of China "Development and Application of an Integrated Inspection and Testing Service Platform for Emerging Industries" (Grant No. 2018YFB1403303), the Doctoral Start-up Foundation of Liaoning Province "Research on a Poisson Hybrid Routing Protocol and Data Analysis Model for Large-Scale Wireless Sensor Networks" (Grant No. 20170520098), and the Natural Science Foundation of Liaoning Province "Research on Robust Bimodal Feature Extraction and Recognition Methods for Contactless Palmprint and Palm Vein" (Grant No. 2015020100).
Corresponding author: Xiaobing Li, E-mail: lixiaobing_lgd@163.com
Cite this article:
Zhiyong Tao, Xiaobing Li, Ying Liu, Xiaofang Liu. Classifying Short Texts with an Improved Attention-Based Bidirectional Long Short-Term Memory Network. Data Analysis and Knowledge Discovery, 2019, 3(12): 21-29.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0267      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I12/21
Figure: Model architecture
Figure: Principle of the improved attention layer
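
From the abstract, the improved attention layer computes global attention scores over a fusion of the forward and backward BLSTM features rather than over their concatenation. A plausible formulation in LaTeX follows; the summation fusion and the scoring function are our reconstruction from the description, not the paper's published equations:

\begin{aligned}
h_t      &= \overrightarrow{h}_t + \overleftarrow{h}_t                 && \text{fuse forward and backward features} \\
u_t      &= \tanh(W h_t + b)                                           && \text{hidden representation} \\
\alpha_t &= \frac{\exp(u_t^{\top} u_w)}{\sum_{k} \exp(u_k^{\top} u_w)} && \text{global attention score} \\
s        &= \sum_{t} \alpha_t h_t                                      && \text{short-text vector}
\end{aligned}

Here u_w is a learned global context vector and s is the representation passed to the Softmax classifier.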
Dataset             Classes   Samples   Train     Validation   Test     Avg. Words   Max Length   Vocabulary
Chinese_news (CNH)  18        192,000   156,000   18,000       18,000   12           29           137,890
MR                  2         10,658    7,462     1,598        1,598    20           57           18,159
TREC                6         5,949     5,357     -            592      10           35           9,337
IMDB                2         50,000    25,000    12,500       12,500   239          2,525        141,902
IMDB_10             10        50,000    25,000    12,500       12,500   239          2,525        141,902
Yelp                5         35,000    25,000    5,000        5,000    129          984          104,352
Table: Dataset statistics (word counts are per text; lengths are in words)
Attention           Model       CNH     MR      TREC    IMDB    IMDB_10   Yelp
Without attention   CNN         60.0%   72.1%   81.7%   74.4%   35.1%     46.0%
                    LSTM        75.3%   73.7%   85.4%   88.5%   40.3%     55.3%
                    BLSTM_ave   78.7%   78.7%   87.3%   90.8%   47.4%     59.3%
                    BLSTM       78.5%   80.3%   89.4%   89.7%   44.2%     61.8%
With attention      ABLSTM      78.7%   80.7%   89.0%   91.5%   46.8%     62.3%
                    HAN         79.0%   80.3%   89.0%   90.2%   49.4%     62.1%
                    IABLSTM     79.1%   81.5%   90.9%   91.4%   49.4%     62.8%
Table: Classification accuracy of the models