Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue (12): 21-29    DOI: 10.11925/infotech.2096-3467.2019.0267
Research Paper
Classifying Short Texts with Improved-Attention Based Bidirectional Long Short-Term Memory Network
Zhiyong Tao1,Xiaobing Li1,2(),Ying Liu1,Xiaofang Liu1
1 School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
2 Fuxin Lixing Technology Co., Ltd., Fuxin 123000, China

Abstract

[Objective] To address the short length and sparse features of short texts, this paper proposes an end-to-end short text classification model based on a bidirectional long short-term memory (BLSTM) network with improved attention. [Methods] First, we used pre-trained word vectors to digitize the original texts. Then, we extracted semantic features with the BLSTM network. Third, the improved attention layer fused the forward and backward features to compute global attention scores, yielding short-text vector representations with deep semantic features. Finally, Softmax produced the classification labels. [Results] Compared with traditional CNN, LSTM and BLSTM models, the proposed model improved classification accuracy on several Chinese and English datasets, by up to 19.1%. [Limitations] The model targets short texts; for longer texts the accuracy gains are limited. [Conclusions] The proposed model fully exploits contextual semantic features, effectively alleviates the feature sparsity of short texts, and improves short text classification performance.
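The pipeline described in the abstract (BLSTM features → fusion of the two directions → global attention scores → weighted sentence vector → Softmax) can be illustrated with a minimal NumPy forward-pass sketch. The BLSTM hidden states are random stand-ins, and the additive fusion with tanh scoring is an assumption for illustration; the abstract does not specify the exact fusion operation or dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy stand-ins for the model's intermediate tensors (hypothetical sizes).
seq_len, hidden, num_classes = 8, 16, 5
h_fwd = rng.normal(size=(seq_len, hidden))   # forward LSTM hidden states
h_bwd = rng.normal(size=(seq_len, hidden))   # backward LSTM hidden states

# Improved attention layer: fuse the two directions, then score globally.
fused = np.tanh(h_fwd + h_bwd)               # (seq_len, hidden)
w = rng.normal(size=hidden)                  # learned attention weight vector
alpha = softmax(fused @ w)                   # (seq_len,) global attention scores

# Attention-weighted sum gives the deep-semantic sentence vector.
sentence = alpha @ fused                     # (hidden,)

# Softmax classifier produces the label distribution.
W_out = rng.normal(size=(hidden, num_classes))
probs = softmax(sentence @ W_out)            # (num_classes,) sums to 1
```

In a trained model the stand-in states would come from the BLSTM over pre-trained word embeddings, and `w` and `W_out` would be learned jointly end to end.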

Key words: Short Text Classification; Bidirectional Long Short-Term Memory Network; Attention Mechanism
Received: 2019-03-07
CLC Number: TP391.9
Funding: National Key R&D Program of China, "Development and Application of an Integrated Inspection and Testing Service Platform for Emerging Industries" (Grant No. 2018YFB1403303); Doctoral Scientific Research Start-up Foundation of Liaoning Province, "Poisson Hybrid Routing Protocols and Data Analysis Models for Large-Scale Wireless Sensor Networks" (Grant No. 20170520098); Natural Science Foundation of Liaoning Province, "Robust Bimodal Feature Extraction and Recognition for Contactless Palmprint and Palm Vein" (Grant No. 2015020100)
Corresponding author: Li Xiaobing, E-mail: lixiaobing_lgd@163.com
Cite this article:
Zhiyong Tao, Xiaobing Li, Ying Liu, Xiaofang Liu. Classifying Short Texts with Improved-Attention Based Bidirectional Long Short-Term Memory Network. Data Analysis and Knowledge Discovery, 2019, 3(12): 21-29. DOI: 10.11925/infotech.2096-3467.2019.0267.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0267
Figure 1  Model architecture
Figure 2  Principle of the improved attention layer
Dataset            | Classes | Samples | Training Set | Validation Set | Test Set | Avg. Words | Max Text Length | Total Words
Chinese_news (CNH) | 18      | 192,000 | 156,000      | 18,000         | 18,000   | 12         | 29              | 137,890
MR                 | 2       | 10,658  | 7,462        | 1,598          | 1,598    | 20         | 57              | 18,159
TREC               | 6       | 5,949   | 5,357        | -              | 592      | 10         | 35              | 9,337
IMDB               | 2       | 50,000  | 25,000       | 12,500         | 12,500   | 239        | 2,525           | 141,902
IMDB_10            | 10      | 50,000  | 25,000       | 12,500         | 12,500   | 239        | 2,525           | 141,902
Yelp               | 5       | 35,000  | 25,000       | 5,000          | 5,000    | 129        | 984             | 104,352
Table 1  Dataset information
Attention          | Model     | CNH   | MR    | TREC  | IMDB  | IMDB_10 | Yelp
Without attention  | CNN       | 60.0% | 72.1% | 81.7% | 74.4% | 35.1%   | 46.0%
                   | LSTM      | 75.3% | 73.7% | 85.4% | 88.5% | 40.3%   | 55.3%
                   | BLSTM_ave | 78.7% | 78.7% | 87.3% | 90.8% | 47.4%   | 59.3%
                   | BLSTM     | 78.5% | 80.3% | 89.4% | 89.7% | 44.2%   | 61.8%
With attention     | ABLSTM    | 78.7% | 80.7% | 89.0% | 91.5% | 46.8%   | 62.3%
                   | HAN       | 79.0% | 80.3% | 89.0% | 90.2% | 49.4%   | 62.1%
                   | IABLSTM   | 79.1% | 81.5% | 90.9% | 91.4% | 49.4%   | 62.8%
Table 2  Model classification accuracy
Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn