Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (3): 53-62     https://doi.org/10.11925/infotech.2096-3467.2023.0080
Research Article
Sentiment Analysis of Online Health Community Based on Emotional Enhancement and Knowledge Fusion*
Zhang Wei¹, Xu Zonghuang¹, Cai Hongyu¹, Han Pu²˒³ (corresponding author), Shi Jin¹
¹ School of Information Management, Nanjing University, Nanjing 210023, China
² School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
³ Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract

[Objective] This study uses the sentiment knowledge embedded in the dependency syntactic structure of online health community texts for sentiment analysis, and proposes WoBEK-GAT, a sentiment analysis model for online health communities based on emotional enhancement and knowledge fusion. [Methods] First, WoBERT Plus produces dynamic word embeddings. Second, a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) extract semantic features. Finally, emotional enhancement and knowledge fusion strategies combine the key syntactic information in pruned dependency trees with external sentiment knowledge, and the result is fed into a graph attention network (GAT) that outputs the sentiment category. [Results] In comparative experiments on a self-constructed Chinese dataset, WoBEK-GAT reached a MacroF1 of 88.48%, which is 15.49, 14.15, and 13.15 percentage points higher than the baseline models CNN, BiLSTM, and GAT, respectively. [Limitations] Sentiment knowledge in multimodal information such as images and speech was not considered. [Conclusions] Adding dependency syntactic information and combining the emotional enhancement and knowledge fusion strategies effectively improves the model's sentiment analysis capability.

Key words: Online Health Community; Sentiment Analysis; Emotional Enhancement; Knowledge Fusion; Graph Attention Network (GAT)
Received: 2023-02-09      Published: 2023-09-12
Chinese Library Classification (ZTFLH):  G350
Funding: * National Social Science Fund of China (21BTQ012); National Social Science Fund of China (22BTQ096)
Corresponding author: Han Pu, ORCID: 0000-0001-5867-4292, E-mail: hanpu@njupt.edu.cn.
Cite this article:
Zhang Wei, Xu Zonghuang, Cai Hongyu, Han Pu, Shi Jin. Sentiment Analysis of Online Health Community Based on Emotional Enhancement and Knowledge Fusion*[J]. Data Analysis and Knowledge Discovery, 2024, 8(3): 53-62.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0080      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I3/53
Fig.1  Framework of the WoBEK-GAT model
Fig.2  Construction process of EK-GAT
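| Dataset | Class | Count | Total |
|---|---|---|---|
| Online health community corpus | Positive | 23,315 | 69,724 |
| | Neutral | 23,237 | |
| | Negative | 23,172 | |

Table 1  Annotation statistics of the online health community data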
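| Parameter | Description | Value |
|---|---|---|
| Max Length of Sentences | Maximum text sequence length | 150 |
| Size of Word Vector | Word vector dimension | 768 |
| Batch Size | Number of samples per batch | 16 |
| Numbers of Feature Map | Number of CNN convolution kernels | 200 |
| Hidden Size of BiLSTM | BiLSTM hidden layer size | 256 |
| Numbers of Multi-Head Attention | Number of attention heads | 4 |
| Negative Input Slope of LeakyReLU | LeakyReLU negative input slope | 0.2 |
| Best Degree of Knowledge Fusion | Optimal knowledge fusion degree | 0.3 |
| Epochs | Number of training epochs | 10 |
| Different Learning Rate | Differential learning rates | 0.001 (other layers), 2e-5 (WoBERT Plus) |

Table 2  Model parameter settings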
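Table 2 lists four attention heads and a LeakyReLU negative slope of 0.2 for the graph attention layer. As a rough illustration of what these settings control, the sketch below computes the standard GAT attention coefficients for a single head over a toy graph. It is not the authors' implementation: the toy graph, the two-dimensional node features, the attention vector values, and the helper names (`leaky_relu`, `attention_weights`, `aggregate`) are all assumptions made for the example, and the learned linear projection is omitted.

```python
import math

NEGATIVE_SLOPE = 0.2  # LeakyReLU negative input slope, taken from Table 2


def leaky_relu(x, slope=NEGATIVE_SLOPE):
    """LeakyReLU: identity for non-negative inputs, scaled by `slope` otherwise."""
    return x if x >= 0 else slope * x


def attention_weights(features, neighbors, attn_vector):
    """Single-head GAT attention: softmax of LeakyReLU(a . [h_i || h_j]) over neighbors.

    features:    per-node feature vectors (assumed to be already projected)
    neighbors:   dict mapping each node to its neighbors (self-loops included)
    attn_vector: attention parameters of length 2 * feature_dim (hypothetical values)
    """
    weights = {}
    for i, nbrs in neighbors.items():
        scores = []
        for j in nbrs:
            concat = features[i] + features[j]  # list concatenation [h_i || h_j]
            raw = sum(a * h for a, h in zip(attn_vector, concat))
            scores.append(leaky_relu(raw))
        # Numerically stable softmax over the neighborhood of node i.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        weights[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return weights


def aggregate(features, weights):
    """Attention-weighted sum of neighbor features for every node."""
    dim = len(features[0])
    return [
        [sum(w * features[j][d] for j, w in weights[i].items()) for d in range(dim)]
        for i in range(len(features))
    ]


# Toy graph with three word nodes and undirected edges 0-1 and 1-2 (hypothetical).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
attn_vector = [0.5, -0.5, 0.3, 0.1]

weights = attention_weights(features, neighbors, attn_vector)
updated = aggregate(features, weights)
```

With four heads, the same computation would run with four independent parameter sets, and the per-head outputs would be concatenated or averaged, as in the standard multi-head GAT formulation.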
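| Category | Model | MacroP /% | MacroR /% | MacroF1 /% |
|---|---|---|---|---|
| Traditional neural networks | CNN | 73.15 | 72.83 | 72.99 |
| | RNN | 72.79 | 72.28 | 72.53 |
| | LSTM | 74.41 | 73.03 | 73.71 |
| | BiLSTM | 74.15 | 74.51 | 74.33 |
| Graph neural networks | GCN | 74.26 | 74.95 | 74.60 |
| | GCN-P | 74.83 | 75.00 | 74.91 |
| | GAT | 75.38 | 75.29 | 75.33 |
| | GAT-P | 76.04 | 75.80 | 75.92 |

Table 3  Experimental results of the 6 baseline models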
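The MacroF1 values reported in Table 3 are numerically consistent with the harmonic mean of the reported MacroP and MacroR. The page does not state the averaging formula, so this is an observation drawn from the numbers rather than the authors' stated definition; another common convention averages per-class F1 scores instead. A minimal sketch that recomputes the Table 3 values, where the helper name `harmonic_f1` and the dictionary layout are illustrative:

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (MacroP, MacroR, reported MacroF1) copied from Table 3.
TABLE_3 = {
    "CNN": (73.15, 72.83, 72.99),
    "RNN": (72.79, 72.28, 72.53),
    "LSTM": (74.41, 73.03, 73.71),
    "BiLSTM": (74.15, 74.51, 74.33),
    "GCN": (74.26, 74.95, 74.60),
    "GCN-P": (74.83, 75.00, 74.91),
    "GAT": (75.38, 75.29, 75.33),
    "GAT-P": (76.04, 75.80, 75.92),
}

# Absolute gap between the recomputed and the reported MacroF1 for each model.
deviations = {
    model: abs(harmonic_f1(p, r) - reported)
    for model, (p, r, reported) in TABLE_3.items()
}

for model, gap in deviations.items():
    print(f"{model:7s} |recomputed - reported| = {gap:.4f}")
```

Every gap stays below the rounding resolution of the table (0.01), which is what one would expect if the reported scores were rounded to two decimals after taking the harmonic mean.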
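| Model | MacroP /% | MacroR /% | MacroF1 /% |
|---|---|---|---|
| W2V-CNN | 75.12 | 75.88 | 75.50 |
| W2V-BiLSTM | 76.27 | 77.07 | 76.67 |
| W2V-CNN-BiLSTM | 77.59 | 78.01 | 77.80 |
| WoB-CNN | 79.81 | 80.24 | 80.02 |
| WoB-BiLSTM | 82.04 | 82.71 | 82.37 |
| WoB-CNN-BiLSTM | 82.93 | 83.29 | 83.11 |

Table 4  Experimental results for different feature extraction methods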
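| Model | MacroP /% | MacroR /% | MacroF1 /% |
|---|---|---|---|
| WoB-CNN-BiLSTM-GAT-P | 84.68 | 84.92 | 84.80 |
| WoB-CNN-BiLSTM-EGAT1 | 85.83 | 86.47 | 86.15 |
| WoB-CNN-BiLSTM-EGAT2 | 84.87 | 85.39 | 85.13 |
| WoB-CNN-BiLSTM-EGAT3 | 86.76 | 87.25 | 87.00 |

Table 5  Results of the emotional enhancement experiments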
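| Knowledge fusion degree (K) | MacroP /% | MacroR /% | MacroF1 /% |
|---|---|---|---|
| K=0 | 84.68 | 84.92 | 84.80 |
| K=0.1 | 84.91 | 85.13 | 85.02 |
| K=0.2 | 85.20 | 85.61 | 85.40 |
| K=0.3 | 85.49 | 85.93 | 85.71 |
| K=0.4 | 85.37 | 85.73 | 85.55 |
| K=0.5 | 85.35 | 85.51 | 85.43 |
| K=0.6 | 84.22 | 84.69 | 84.45 |
| K=0.7 | 83.83 | 84.07 | 83.95 |
| K=0.8 | 82.21 | 82.62 | 82.41 |
| K=0.9 | 78.24 | 78.95 | 78.59 |
| K=1.0 | 67.75 | 69.01 | 68.37 |

Table 6  Results of the knowledge fusion experiments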
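Table 6 sweeps the knowledge fusion degree K from 0 to 1, and the K=0 row matches the GAT-P row of Table 5. The page does not give the fusion formula. One plausible reading, assumed here purely for illustration, is that K linearly interpolates between a syntax-based edge weight and an external sentiment-knowledge edge weight, so that K=0 uses syntax only and K=1 uses knowledge only. The function `fuse_adjacency` and both toy matrices below are hypothetical; only the MacroF1 values are copied from Table 6.

```python
def fuse_adjacency(syntax_adj, knowledge_adj, k):
    """Blend two edge-weight matrices as (1 - k) * syntax + k * knowledge.

    This interpolation is an assumed reading of "knowledge fusion degree",
    not a formula taken from the paper.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return [
        [(1 - k) * s + k * e for s, e in zip(syntax_row, knowledge_row)]
        for syntax_row, knowledge_row in zip(syntax_adj, knowledge_adj)
    ]


# MacroF1 (%) for each fusion degree, copied from Table 6.
TABLE_6_F1 = {
    0.0: 84.80, 0.1: 85.02, 0.2: 85.40, 0.3: 85.71, 0.4: 85.55, 0.5: 85.43,
    0.6: 84.45, 0.7: 83.95, 0.8: 82.41, 0.9: 78.59, 1.0: 68.37,
}

# Pick the fusion degree with the highest reported MacroF1.
best_k = max(TABLE_6_F1, key=TABLE_6_F1.get)

# Toy 3-word sentence: dependency edges (with self-loops) and hypothetical
# sentiment-lexicon scores attached to the same word pairs.
syntax_adj = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
knowledge_adj = [[0.0, 0.8, 0.0], [0.8, 0.0, 0.2], [0.0, 0.2, 0.0]]

fused = fuse_adjacency(syntax_adj, knowledge_adj, best_k)
```

Under this reading, the sharp drop at K=1.0 in Table 6 would correspond to discarding the syntactic edges entirely, while small K values add sentiment cues without overriding the dependency structure.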
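| Number of layers (N) | MacroP /% | MacroR /% | MacroF1 /% |
|---|---|---|---|
| N=1 | 87.15 | 87.64 | 87.39 |
| N=2 | 88.25 | 88.72 | 88.48 |
| N=3 | 73.63 | 74.35 | 73.99 |
| N=4 | 53.14 | 54.01 | 53.57 |
| N=5 | 32.75 | 34.12 | 33.42 |

Table 7  Experimental results of the WoBEK-GAT model with different numbers of layers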
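The abstract quotes gains of 15.49, 14.15, and 13.15 over CNN, BiLSTM, and GAT. These are absolute differences between MacroF1 scores (percentage points), not relative percentages, and they can be recomputed from Table 3 and the best row of Table 7. A small sketch of that arithmetic, with illustrative variable names:

```python
# MacroF1 (%) by number of GAT layers, copied from Table 7.
TABLE_7_F1 = {1: 87.39, 2: 88.48, 3: 73.99, 4: 53.57, 5: 33.42}

# MacroF1 (%) of the three baselines named in the abstract, copied from Table 3.
BASELINE_F1 = {"CNN": 72.99, "BiLSTM": 74.33, "GAT": 75.33}

# Best-performing depth and its score.
best_layers = max(TABLE_7_F1, key=TABLE_7_F1.get)
best_f1 = TABLE_7_F1[best_layers]

# Absolute gain in percentage points: a plain difference of two percentages.
point_gains = {name: round(best_f1 - f1, 2) for name, f1 in BASELINE_F1.items()}

# Relative gain in percent, shown only to contrast with the point gains.
relative_gains = {
    name: round(100 * (best_f1 - f1) / f1, 2) for name, f1 in BASELINE_F1.items()
}

print("best number of layers:", best_layers)
print("gain in percentage points:", point_gains)
print("relative gain in percent:", relative_gains)
```

The point gains reproduce the figures quoted in the abstract, while the relative gains are noticeably larger, which is why the unit matters when reporting the improvement.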
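Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn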