Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (12): 102-113    DOI: 10.11925/infotech.2096-3467.2022.1028
Micro-Blog Fine-Grained Sentiment Analysis Based on Multi-Feature Fusion
Wu Xuxu1,Chen Peng1(),Jiang Huan2
1School of Information and Cyber Security, People’s Public Security University of China, Beijing 100045, China
2School of E-Business and Logistics, Beijing Technology and Business University, Beijing 100048, China
Abstract  

[Objective] This paper proposes the RB-LCM model to improve fine-grained sentiment analysis of Weibo texts. [Methods] First, we used RoBERTa to encode character- and sentence-level features of Weibo posts. Then, we used a Bi-LSTM and a capsule network to capture deep global and local features of the sentences. Third, we fused the resulting multi-dimensional features with multi-head self-attention. Finally, we trained the model with an improved Focal Loss and FGM adversarial training to mitigate label imbalance in the datasets and improve the model's robustness. [Results] The proposed model reached 80.64% accuracy and a 77.41% F1 score on the SMP2020-EWECT dataset, 67.17% and 51.08% on the NLPCC2013 task 2 dataset, 71.27% and 58.25% on the NLPCC2014 task 1 dataset, and 98.45% and 98.44% on the binary weibo_senti_100k dataset. All results exceeded those of advanced sentiment analysis models on each dataset. [Limitations] Our model does not use accompanying pictures, videos, audio, or other modalities for sentiment analysis. [Conclusions] The proposed model can effectively analyze the sentiment of Weibo posts.
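The full RB-LCM architecture is specified in the paper itself; below is only a minimal PyTorch sketch of the pipeline the abstract describes (RoBERTa features → Bi-LSTM global branch + capsule local branch → multi-head self-attention fusion → classifier). All module names, the random stand-in for RoBERTa output, and the way the two branches are pooled into "feature tokens" are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1):
    # Capsule non-linearity: scales each vector's length into (0, 1).
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1 + n2)) * s / (n2.sqrt() + 1e-8)

class RBLCMSketch(nn.Module):
    """Toy RB-LCM head: encoder output -> Bi-LSTM (global features) +
    primary capsules (local features) -> multi-head self-attention fusion
    -> emotion classifier. Hyperparameters follow the paper's tables."""
    def __init__(self, d_model=768, hidden=384, n_caps=24, d_caps=32,
                 n_heads=3, n_classes=6):
        super().__init__()
        self.bilstm = nn.LSTM(d_model, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.primary = nn.Linear(d_model, n_caps * d_caps)   # primary capsules
        self.cap_proj = nn.Linear(n_caps * d_caps, d_model)
        self.n_caps, self.d_caps = n_caps, d_caps
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                       # x: (batch, seq, d_model)
        g, _ = self.bilstm(x)                   # global context, dim 2*hidden = d_model
        caps = self.primary(x).view(x.size(0), x.size(1),
                                    self.n_caps, self.d_caps)
        l = self.cap_proj(squash(caps).flatten(2))          # local capsule features
        feats = torch.stack([g.mean(1), l.mean(1)], dim=1)  # two feature "tokens"
        fused, _ = self.fusion(feats, feats, feats)         # self-attention fusion
        return self.fc(fused.mean(1))           # logits over emotion classes

x = torch.randn(2, 175, 768)    # stand-in for RoBERTa character/sentence features
logits = RBLCMSketch()(x)       # (2, 6): batch of 2, six emotion classes
```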

Key words: RoBERTa; Multi-Head Self-Attention Fusion; Bi-LSTM; Microblog Sentiment Analysis; Capsule Network
Received: 28 September 2022      Published: 13 September 2023
ZTFLH: TP391; G350
Fund:Fundamental Research Funds for the Central Universities, People’s Public Security University of China Project(2022JKF02018)
Corresponding Author: Chen Peng, E-mail: chenpeng@ppsuc.edu.cn

Cite this article:

Wu Xuxu, Chen Peng, Jiang Huan. Micro-Blog Fine-Grained Sentiment Analysis Based on Multi-Feature Fusion. Data Analysis and Knowledge Discovery, 2023, 7(12): 102-113.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1028     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I12/102

[Figure: Structure of RB-LCM]
[Figure: Capsule Flow]
[Figure: Adversarial Learning]
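The adversarial-learning figure corresponds to FGM training (Miyato et al.'s Fast Gradient Method on word embeddings). A common implementation pattern is sketched below; the class and the `emb_name` matching rule are a generic sketch, not the authors' code.

```python
import torch

class FGM:
    """Fast Gradient Method: perturb the embedding weights by an
    epsilon-scaled step along the gradient direction, then restore them."""
    def __init__(self, model, eps=1.0):
        self.model, self.eps, self.backup = model, eps, {}

    def attack(self, emb_name="embedding"):
        for name, p in self.model.named_parameters():
            if p.requires_grad and emb_name in name and p.grad is not None:
                self.backup[name] = p.data.clone()   # save clean weights
                norm = torch.norm(p.grad)
                if norm != 0 and not torch.isnan(norm):
                    p.data.add_(self.eps * p.grad / norm)  # r_adv = eps * g/||g||

    def restore(self, emb_name="embedding"):
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data = self.backup[name]
        self.backup = {}

# Typical training step:
#   loss.backward()          # 1) gradients on the clean input
#   fgm.attack()             # 2) perturb embeddings
#   loss_adv.backward()      # 3) adversarial pass accumulates gradients
#   fgm.restore()            # 4) restore embeddings
#   optimizer.step(); optimizer.zero_grad()
```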
Dataset            Training  Test    Validation  Emotion Classes
SMP2020-EWECT      38 699    9 675   -           6
NLPCC2013          11 200    2 800   -           8
NLPCC2014          14 000    6 000   -           8
weibo_senti_100k   95 990    11 999  11 999      2
Dataset Information
Module                     Parameter                        Value
RoBERTa                    learning rate                    2e-5
                           layer-wise lr decay coefficient  0.95
                           weight decay                     1e-5
                           word-vector dimension            768
                           max sentence length              175
Bi-LSTM                    learning rate                    1e-4
                           hidden dimension                 384
                           number of layers                 1
                           weight decay                     1e-5
Capsule                    learning rate                    1e-4
                           number of capsules               24
                           capsule dimension                32
                           weight decay                     1e-5
Multi-head self-attention  number of heads                  3
FC                         dropout rate                     0.4
Training                   optimizer                        Adam
                           batch size                       16
                           epochs                           10
Model Parameter Settings
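The parameter table pairs a base RoBERTa learning rate (2e-5) with a layer-wise decay coefficient (0.95) and higher rates (1e-4) for the other modules. One common way to realize this is with Adam parameter groups; the helper below is a sketch under the assumption that encoder parameters carry names like `...layer.3...`, which is how Hugging Face-style BERT/RoBERTa modules are named, not a detail given in the paper.

```python
import torch

def layerwise_param_groups(model, base_lr=2e-5, decay=0.95,
                           head_lr=1e-4, weight_decay=1e-5, n_layers=12):
    """Build Adam parameter groups: the top encoder layer keeps base_lr,
    each layer below it is scaled by `decay` once more; every parameter
    outside the encoder layers gets head_lr."""
    groups = []
    for name, p in model.named_parameters():
        if "layer." in name:                             # e.g. "encoder.layer.3.…"
            i = int(name.split("layer.")[1].split(".")[0])
            lr = base_lr * decay ** (n_layers - 1 - i)   # deeper layer -> larger lr
        else:
            lr = head_lr                                 # Bi-LSTM, capsule, FC, …
        groups.append({"params": [p], "lr": lr, "weight_decay": weight_decay})
    return groups

# optimizer = torch.optim.Adam(layerwise_param_groups(model))
```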
Environment               Configuration
Operating system          Windows 10
GPU                       Quadro P4000
RAM                       8 GB
Programming language      Python 3.8
Deep learning framework   PyTorch 1.12.1
Experiment Environment
[Figure: Influence of Routing Iteration Times]
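The routing-iteration experiment refers to the dynamic routing-by-agreement procedure of the capsule network (Sabour et al.). A minimal sketch of that procedure is below; the tensor layout `(batch, n_in, n_out, d_out)` is our convention for illustration.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1):
    # Capsule non-linearity: output norm is strictly below 1.
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1 + n2)) * s / (n2.sqrt() + 1e-8)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement over prediction vectors.
    u_hat: (batch, n_in, n_out, d_out) predictions from lower capsules.
    Returns the output capsules, shape (batch, n_out, d_out)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(n_iter):
        c = F.softmax(b, dim=2).unsqueeze(-1)     # coupling coefficients
        v = squash((c * u_hat).sum(dim=1))        # weighted vote -> output capsule
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)  # raise logits where votes agree
    return v
```

More iterations sharpen the coupling coefficients toward the capsules that agree, which is the quantity the figure varies.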
Model  ACC(%)  F1(%)
BERT-BiLSTM 71.52 63.23
BERT-BiGRU-Attention 76.15 67.79
BERT-HAN 78.77 72.63
RB-LCM 80.64 77.41
Performance of Different Models on SMP2020-EWECT Dataset
Model  ACC(%)  F1(%)
C-BiLSTM 54.68 42.82
CNN_BiLSTM 55.07 44.17
EMCNN 63.12 47.23
RB-LCM 67.17 51.08
Performance of Different Models on NLPCC2013 Dataset
Model  ACC(%)  F1(%)
TextRCNN 67.23 53.24
Transformer 66.08 53.29
BLLC-CL 70.57 56.59
RB-LCM 71.27 58.25
Performance of Different Models on NLPCC2014 Dataset
Model  ACC(%)  F1(%)
TextRCNN 95.73 95.75
CNN-LSTM 96.81 96.81
CBMA 97.65 97.51
RB-LCM 98.45 98.44
Performance of Different Models on weibo_senti_100k Dataset
Model        NLPCC2013        NLPCC2014        SMP2020-EWECT    weibo_senti_100k
             ACC(%)  F1(%)    ACC(%)  F1(%)    ACC(%)  F1(%)    ACC(%)  F1(%)
RB-LCM 67.17 51.08 71.27 58.25 80.64 77.41 98.45 98.44
RB-LCM-P 66.25 49.93 66.18 57.56 80.34 76.90 98.21 97.80
RB-LCM-Hn 66.53 50.39 68.78 56.22 79.86 77.10 97.98 97.10
RB-LCM-M 66.46 47.80 67.46 56.90 80.04 74.71 97.65 96.87
RB-LCM-F 65.71 48.68 69.60 56.62 79.53 75.21 97.42 96.88
RB-LCM-FGM 66.86 50.21 70.38 57.36 79.17 76.84 98.11 97.67
Fuse1 66.04 48.13 69.54 54.18 80.14 75.78 98.04 97.88
Fuse2 66.78 50.10 70.12 56.34 80.02 76.23 97.87 97.71
Fuse3 65.98 49.23 68.78 56.32 79.54 75.47 97.44 97.23
Fuse4 66.92 50.42 69.55 57.20 79.92 76.38 97.99 97.95
Ls1 66.14 49.55 69.95 57.42 79.86 77.23 98.15 98.10
Ls2 66.22 50.13 69.47 56.84 79.97 77.06 98.07 97.86
Performance of Ablation Experiments
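The ablation rows RB-LCM-F and RB-LCM-FGM remove the improved Focal Loss and the adversarial training, respectively. The paper's improved variant is not reproduced here; for reference, a standard multi-class focal loss (Lin et al.), which down-weights easy examples to counter label imbalance, can be sketched as:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss: scales cross entropy by (1 - p_t)^gamma so
    confident (easy) examples contribute less; optional per-class weights
    alpha rebalance rare emotion labels."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = log_pt.exp()
    loss = -((1 - pt) ** gamma) * log_pt
    if alpha is not None:
        loss = loss * alpha[targets]            # class-frequency weighting
    return loss.mean()
```

With `gamma=0` and no `alpha` this reduces exactly to cross entropy, which makes the down-weighting effect easy to verify.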