Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (11): 68-79     https://doi.org/10.11925/infotech.2096-3467.2021.0339
Research Article
Sentiment Analysis of Weibo Posts on Public Health Emergency with Feature Fusion and Multi-Channel*
Han Pu1,2, Zhang Wei1, Zhang Zhanpeng1, Wang Yuxin1, Fang Haoyu1
1 School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003, China
2 Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
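Abstract (translated from Chinese)

[Objective] To further mine the deep semantic information in Weibo posts about public health emergencies, this paper proposes a multi-channel Weibo sentiment analysis model based on feature fusion and an attention mechanism. [Methods] First, at the feature-vector embedding layer, word vectors are generated with Word2Vec and FastText and fused with part-of-speech feature vectors and position feature vectors. Second, a multi-channel layer is built on CNN and BiLSTM to extract local and global features of the posts. Next, an attention layer is built to extract the important semantic features. Finally, the multi-channel outputs are merged in the fusion layer, and the output layer applies a Softmax function for sentiment classification. [Results] In controlled experiments on 42 384 Weibo posts about the COVID-19 public health emergency, the proposed model reaches an F1 of 90.21%, which is 9.71 and 9.14 percentage points higher than the CNN and BiLSTM baselines, respectively. [Limitations] The constructed dataset is relatively small, and multi-modal information such as images and speech has not yet been considered. [Conclusions] Building on deep learning and a multi-channel design, the proposed model achieves the best results by introducing an attention mechanism and fusing the local and global semantic features captured by CNN and BiLSTM, further advancing research on Weibo sentiment analysis.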

Keywords: multi-channel; feature fusion; deep learning; sentiment analysis; public health emergencies
Abstract

[Objective] This paper proposes MCMF-A, a multi-channel sentiment analysis model for Weibo posts based on feature fusion and an attention mechanism, to further mine the deep semantic information in posts about public health emergencies. [Methods] First, at the feature-vector embedding layer, we generated word vectors with Word2Vec and FastText and fused them with part-of-speech and position feature vectors. Second, we built a multi-channel layer based on CNN and BiLSTM to extract local and global features of the posts. Third, we added an attention layer to extract the important semantic features of the texts. Finally, we merged the multi-channel outputs in a fusion layer and applied the softmax function in the output layer for sentiment classification. [Results] We evaluated the MCMF-A model on 42 384 Weibo posts about COVID-19. The F1 value of the proposed model reached 90.21%, which is 9.71 and 9.14 percentage points higher than the CNN and BiLSTM baseline models, respectively. [Limitations] The dataset is relatively small, and multi-modal information such as images and voice has not yet been considered. [Conclusions] The proposed model can effectively perform sentiment analysis on Weibo posts.
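
The last three steps in the abstract (attention over a channel's hidden states, merging the channel outputs, softmax classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the dot-product scoring against a `query` vector, fusion by concatenation, and all function and parameter names are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h . query) and sum.

    hidden_states: list of equal-length vectors, one per token.
    query: hypothetical learned vector (assumption, not from the paper).
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def classify(cnn_features, bilstm_features, weight_rows, bias):
    """Merge two channel outputs and return class probabilities.

    Fusion is done by concatenation here, which is an assumption.
    weight_rows holds one weight vector per class.
    """
    fused = cnn_features + bilstm_features
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weight_rows, bias)]
    return softmax(logits)
```

With equal scores the attention weights are uniform and the pooled vector is the mean of the hidden states, which is a quick sanity check for the pooling step.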

Keywords: Multi-Channel; Feature Fusion; Deep Learning; Sentiment Analysis; Public Health Emergencies
Received: 2021-04-07      Published: 2021-12-23
ZTFLH:  G350
Funding: *National Social Science Fund of China (17CTQ022); National Undergraduate Innovation Training Program (SZDG2020040); Jiangsu Postgraduate Research and Innovation Program (KYCX20_0844)
Corresponding author: Han Pu, ORCID: 0000-0001-5867-4292     E-mail: hanpu@njupt.edu.cn
Cite this article:
Han Pu, Zhang Wei, Zhang Zhanpeng, Wang Yuxin, Fang Haoyu. Sentiment Analysis of Weibo Posts on Public Health Emergency with Feature Fusion and Multi-Channel. Data Analysis and Knowledge Discovery, 2021, 5(11): 68-79.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0339      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I11/68
Fig.1  Experimental workflow
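Sentiment word category     Description                  POS tag
Positive Comment Words      Positive evaluation words    PC
Positive Sentiment Words    Positive emotion words       PS
Negative Comment Words      Negative evaluation words    NC
Negative Sentiment Words    Negative emotion words       NS
Degree Words                Degree words                 ADV
Negative Words              Negation words               INVER
Table 1  Part-of-speech tagging scheme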
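
As a rough illustration of how the tags in Table 1 might be attached to segmented text, the sketch below looks each token up in a sentiment lexicon. The lexicon entries, the fallback tag `"O"`, and the function name are hypothetical; the page does not describe the lexicon or how untagged words are handled.

```python
# Tag names from Table 1, keyed by category.
SENTIMENT_TAGS = {
    "positive_comment": "PC",
    "positive_sentiment": "PS",
    "negative_comment": "NC",
    "negative_sentiment": "NS",
    "degree": "ADV",
    "negation": "INVER",
}

# Hypothetical toy lexicon: word -> tag. A real lexicon would be far larger,
# and these category assignments are illustrative only.
TOY_LEXICON = {
    "成功": SENTIMENT_TAGS["positive_comment"],
    "很": SENTIMENT_TAGS["degree"],
    "不": SENTIMENT_TAGS["negation"],
}


def tag_tokens(tokens, lexicon, default="O"):
    """Pair each token with its sentiment tag, or `default` if not in the lexicon."""
    return [(tok, lexicon.get(tok, default)) for tok in tokens]
```

For example, `tag_tokens(["很", "成功", "疫情"], TOY_LEXICON)` pairs the first two tokens with `ADV` and `PC` and leaves the third with the fallback tag.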
Fig.2  Multi-feature vector fusion process
Fig.3  Neural network structure based on WMF
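Parameter                           Description                                  Value
Max Length of Sentences             Maximum text sequence length                 200
Size of Word Vector                 Word vector dimension                        100
Size of Sentiment Feature Vector    Part-of-speech feature vector dimension      30
Size of Position Feature Vector     Position feature vector dimension            20
Batch Size                          Number of samples per batch                  256
Window Size                         Convolution kernel window sizes              [3, 4, 5]
Number of Feature Map               Number of convolution kernels                100
Hidden Size of BiLSTM               BiLSTM hidden layer size                     256
epochs                              Number of training epochs                    10
Learning Rate                       Learning rate                                0.01
Dropout                             Fraction of input units randomly dropped     0.50
Optimizer                           Optimizer                                    Adam
Table 2  Model parameter settings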
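
The settings in Table 2 can be collected in a small configuration object, which also makes some derived sizes easy to check. The sketch below is illustrative: the field names are invented, and the derived sizes assume that the three feature vectors are fused by concatenation, that each convolution is unpadded with max-over-time pooling, and that the BiLSTM hidden size is per direction. None of these assumptions is confirmed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Values from Table 2; field names are illustrative."""
    max_len: int = 200
    word_dim: int = 100
    pos_tag_dim: int = 30
    position_dim: int = 20
    batch_size: int = 256
    window_sizes: tuple = (3, 4, 5)
    num_feature_maps: int = 100
    bilstm_hidden: int = 256
    epochs: int = 10
    learning_rate: float = 0.01
    dropout: float = 0.5
    optimizer: str = "Adam"

    def fused_dim(self):
        # Per-token input width if the three vectors are concatenated (assumption).
        return self.word_dim + self.pos_tag_dim + self.position_dim

    def conv_output_lengths(self):
        # Sequence length after an unpadded 1-D convolution for each window size.
        return [self.max_len - k + 1 for k in self.window_sizes]

    def cnn_output_dim(self):
        # One max-pooled value per feature map per window size (assumption).
        return len(self.window_sizes) * self.num_feature_maps

    def bilstm_output_dim(self):
        # Forward and backward states concatenated (assumes 256 per direction).
        return 2 * self.bilstm_hidden


if __name__ == "__main__":
    cfg = ModelConfig()
    print(cfg.fused_dim(), cfg.conv_output_lengths(),
          cfg.cnn_output_dim(), cfg.bilstm_output_dim())
```

Under these assumptions each token is represented by a 150-dimensional vector, and the CNN and BiLSTM channels produce 300 and 512 features respectively before fusion.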
Dataset         Class       Count     Total
Weibo corpus    Positive    21 192    42 384
                Negative    21 192
Table 3  Statistics of the annotated Weibo data
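Sentiment polarity    Weibo text
Positive (1)          ["Their fight against the epidemic is very successful" #New York Times reporter praises China's fight against the epidemic#] "China's fight against the epidemic is very successful!" "Don't think the dancing in the Fangcang hospitals is ridiculous; it is a clever part of the treatment." "Isolation is China's key to cutting off the spread of the epidemic." Donald McNeil, a senior health and science reporter at The New York Times, recently praised China's "fight against the epidemic" in public, corrected the distortions and misreadings of Western media, and said it stands in sharp contrast to the responses of Italy and the United States. The video has been viewed and shared millions of times worldwide.
Negative (0)          [#The pathogen of the Wuhan pneumonia outbreak is not the SARS virus#] In recent days, the claim that the Wuhan pneumonia outbreak is caused by a new type of SARS virus has been circulating online. On January 18, the official account of the Chinese Center for Disease Control and Prevention published a popular-science article, "About the Wuhan Viral Pneumonia, Do Not Believe These 5 Rumors!", which states that the pathogen causing the Wuhan viral pneumonia outbreak is not the SARS virus. Current investigations show that the virus's human-to-human transmissibility and pathogenicity are both weaker than those of SARS. #Wuhan reports 4 new cases of novel coronavirus pneumonia#.
Table 4  Sample Weibo posts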
Model     P/%      R/%      F1/%
SVM       75.71    70.09    72.79
CNN       80.98    80.02    80.50
RNN       78.01    78.19    78.10
LSTM      79.13    79.33    79.23
BiLSTM    81.32    80.83    81.07
Table 5  Results of the five baseline models
Model P/% R/% F1/%
CNN-G 82.04 81.41 81.72
CNN-W2V 82.16 81.73 81.94
CNN-WMF 83.77 82.90 83.33
CNN-FT 82.54 81.92 82.23
CNN-FMF 84.13 83.74 83.93
BiLSTM-G 83.26 83.00 83.13
BiLSTM-W2V 83.34 83.19 83.26
BiLSTM-WMF 84.49 84.57 84.53
BiLSTM-FT 83.67 83.79 83.73
BiLSTM-FMF 85.32 85.09 85.20
Table 6  Results of the multi-feature fusion experiments
Model P/% R/% F1/%
CNN-WMF-A 85.02 84.87 84.94
CNN-FMF-A 85.34 85.03 85.18
BiLSTM-WMF-A 85.97 85.73 85.85
BiLSTM-FMF-A 86.42 86.14 86.28
Table 7  Results after introducing the attention mechanism
Model P/% R/% F1/%
CNN-BiLSTM-WMF-A 87.63 87.56 87.59
CNN-BiLSTM-FMF-A 88.07 88.43 88.25
MCMF-A 90.45 89.98 90.21
Table 8  Results of the multi-channel experiments
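
The reported F1 values are consistent with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R), and the gains quoted in the abstract are differences in percentage points. The sketch below recomputes both from the figures in Tables 5 and 8; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) in percent, copied from Tables 5 and 8.
REPORTED = {
    "SVM": (75.71, 70.09, 72.79),
    "CNN": (80.98, 80.02, 80.50),
    "BiLSTM": (81.32, 80.83, 81.07),
    "MCMF-A": (90.45, 89.98, 90.21),
}


def recomputed_f1(model):
    """F1 recomputed from the reported P and R, rounded to two decimals."""
    p, r, _ = REPORTED[model]
    return round(f1_score(p, r), 2)


def gain_in_points(model, baseline):
    """Difference between reported F1 values, in percentage points."""
    return round(REPORTED[model][2] - REPORTED[baseline][2], 2)


if __name__ == "__main__":
    for name, (_, _, reported) in REPORTED.items():
        print(f"{name}: recomputed {recomputed_f1(name):.2f}, reported {reported:.2f}")
    for base in ("CNN", "BiLSTM"):
        print(f"MCMF-A vs {base}: +{gain_in_points('MCMF-A', base):.2f} points")
```

The gains of 9.71 and 9.14 are absolute differences between F1 percentages, not relative improvements, which is why they are best read as percentage points.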