Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (5): 81-91     https://doi.org/10.11925/infotech.2096-3467.2022.0613
Research Paper
Detecting Weibo Rumors Based on Hierarchical Semantic Feature Learning Model*
Huang Xuejian 1,2, Ma Tinghuai 1 (corresponding author), Wang Gensheng 3
1 College of Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
2 VR College of Modern Industry, Jiangxi University of Finance and Economics, Nanchang 330013, China
3 College of Humanities, Jiangxi University of Finance and Economics, Nanchang 330013, China
Abstract

[Objective] To improve the accuracy and timeliness of rumor event detection on Weibo. [Methods] We propose a rumor event detection method based on a hierarchical semantic feature learning model (BCGA). First, the BERT pre-trained model extracts the semantic features of each individual text in an event. Second, the event propagation data are dynamically partitioned by time domain, and a convolutional neural network learns the semantic correlation features of the text set within each time domain. Third, the semantic correlation features of the time domains are fed into a deep bidirectional gated recurrent neural network, which learns the deep semantic temporal features of the propagation process. Finally, an attention mechanism makes the model focus on the parts of the semantic temporal features that carry rumor characteristics. [Results] Experiments on a public Weibo dataset show that the model reaches a detection accuracy of 95.39% with a detection delay within 12 hours. [Limitations] The model needs a certain number of reposts and comments, so its advantage is not pronounced for events with low popularity. [Conclusions] The hierarchical semantic feature learning model learns from local semantics to global semantics and improves the performance of Weibo rumor event detection.

Keywords: Rumor Detection; Deep Learning; Semantic Features; Temporal Data; Hierarchical Semantic
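The time-domain partitioning step in the abstract can be illustrated with a small sketch. The abstract does not give the dynamic partitioning rule, so the code below assumes the simplest variant: equal-width time intervals between the first and last post of an event. The function name `partition_by_time`, the `(timestamp, text)` record layout, and the `n_blocks` parameter are hypothetical names chosen for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (timestamp in seconds, text of the post).
Post = Tuple[float, str]


def partition_by_time(posts: Sequence[Post], n_blocks: int) -> List[List[str]]:
    """Group the texts of one event into n_blocks equal-width time intervals.

    Assumption: intervals have equal width between the first and last post.
    The paper's dynamic partitioning rule may differ.
    """
    if n_blocks <= 0:
        raise ValueError("n_blocks must be positive")
    blocks: List[List[str]] = [[] for _ in range(n_blocks)]
    if not posts:
        return blocks
    ordered = sorted(posts, key=lambda p: p[0])
    start, end = ordered[0][0], ordered[-1][0]
    span = end - start
    for timestamp, text in ordered:
        if span == 0:
            index = 0  # all posts share one timestamp
        else:
            # Clamp so the last post falls into the final block.
            index = min(int((timestamp - start) / span * n_blocks), n_blocks - 1)
        blocks[index].append(text)
    return blocks


if __name__ == "__main__":
    event = [(0.0, "source post"), (30.0, "repost A"),
             (95.0, "comment B"), (100.0, "repost C")]
    for i, block in enumerate(partition_by_time(event, n_blocks=4)):
        print(i, block)
```

In a pipeline like the one described, each block of texts would then be encoded and passed to the convolutional stage. Empty blocks, such as quiet periods in the propagation, would need padding or merging, which this sketch does not handle.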
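Received: 2022-06-14      Published: 2023-07-04
CLC number (ZTFLH):  TP393; G250
Funding: *This work is one of the research outcomes of the National Key R&D Program of China (2021YFE0104400), the National Natural Science Foundation of China (72061015), and the Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ200539).
Corresponding author: Ma Tinghuai, ORCID: 0000-0003-2320-1692, E-mail: thma@nuist.edu.cn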
Cite this article:
Huang Xuejian, Ma Tinghuai, Wang Gensheng. Detecting Weibo Rumors Based on Hierarchical Semantic Feature Learning Model[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 81-91.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0613      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I5/81
Fig.1  BCGA model
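Statistic                                   Count
Total events                                4,664
Rumor events                                2,313
Non-rumor events                            2,351
Total reposts and comments, all events      3,805,656
Average reposts and comments per event      816
Maximum reposts and comments per event      59,318
Minimum reposts and comments per event      10
Table 1  Statistics of the Weibo rumor dataset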
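The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the table. The variable names are illustrative, and the class-balance figure is derived here, not reported in the source.

```python
# Values copied from Table 1.
total_events = 4_664
rumor_events = 2_313
non_rumor_events = 2_351
total_interactions = 3_805_656  # reposts plus comments over all events

# The two classes should add up to the total number of events.
assert rumor_events + non_rumor_events == total_events

# Average interactions per event; the table reports this rounded to 816.
average = total_interactions / total_events
assert round(average) == 816

# Share of rumor events (derived, not stated in the table).
rumor_share = rumor_events / total_events
print(f"average per event: {average:.2f}")
print(f"rumor share: {rumor_share:.2%}")
```

The two classes are close to balanced, so the accuracy figures in the later tables are not dominated by a majority class.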
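Category              Parameter                                   Value
Model parameters      Number of BERT layers, L                    12
Model parameters      Number of BERT multi-head heads, A          12
Model parameters      BERT output dimension, H                    768
Model parameters      Number of event information blocks, N       50
Model parameters      CNN kernel heights, h                       3, 4, 5
Model parameters      Number of kernels per height, m             80
Model parameters      Number of bidirectional GRU layers, L       3
Model parameters      Regularization parameter, λ                 2e-4
Model parameters      Dropout keep-prob                           0.8
Training parameters   Learning rate, learning_rate                0.001
Training parameters   Maximum number of epochs, epoch_num         300
Training parameters   Batch size, batch_size                      64
Training parameters   Cross-validation folds, K-fold              6
Table 2  Main parameter settings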
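To make the settings in Table 2 easier to reuse, they can be collected in a small configuration object with a few derived quantities. This is a sketch and not the authors' code: the class name `BCGAConfig` and the field names are hypothetical. The derived CNN feature size assumes the common text-CNN layout of one max-pooled value per kernel, concatenated across kernel heights, which the table itself does not confirm.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BCGAConfig:
    # Values copied from Table 2; field names are illustrative.
    bert_layers: int = 12
    bert_heads: int = 12
    bert_hidden: int = 768
    n_blocks: int = 50
    kernel_heights: Tuple[int, ...] = (3, 4, 5)
    kernels_per_height: int = 80
    gru_layers: int = 3
    l2_lambda: float = 2e-4
    dropout_keep_prob: float = 0.8
    learning_rate: float = 0.001
    epoch_num: int = 300
    batch_size: int = 64
    k_fold: int = 6

    @property
    def dropout_rate(self) -> float:
        # Some frameworks expect the drop probability instead of keep-prob.
        return round(1.0 - self.dropout_keep_prob, 10)

    @property
    def bert_head_dim(self) -> int:
        # Per-head width of BERT multi-head attention.
        return self.bert_hidden // self.bert_heads

    @property
    def cnn_feature_dim(self) -> int:
        # Assumption: one max-pooled value per kernel, concatenated over heights.
        return len(self.kernel_heights) * self.kernels_per_height

    @property
    def gru_input_shape(self) -> Tuple[int, int]:
        # (time steps, features per step) under the assumption above.
        return (self.n_blocks, self.cnn_feature_dim)


if __name__ == "__main__":
    config = BCGAConfig()
    print(config.dropout_rate, config.bert_head_dim, config.gru_input_shape)
```

Under these assumptions, each event would reach the recurrent layers as a sequence of 50 time blocks with 240 features each.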
Depth   Accuracy/%   Class   Precision/%   Recall/%   F1/%
1       92.39        R       91.18         93.74      92.44
1       92.39        N       93.65         91.06      92.34
2       94.21        R       92.87         95.68      94.26
2       94.21        N       95.61         92.77      94.17
3       95.39        R       93.75         97.19      95.44
3       95.39        N       97.13         93.62      95.34
4       93.68        R       92.26         95.25      93.73
4       93.68        N       95.16         92.13      93.62
5       90.14        R       89.38         90.93      90.15
5       90.14        N       90.91         89.36      90.13
6       85.64        R       84.93         86.39      85.65
6       85.64        N       86.36         84.89      85.62
Table 3  Comparison of experimental results for bidirectional GRUs of different depths
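The F1 column in Table 3 is consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short check below recomputes the depth-3 rows. It assumes that R and N denote the rumor and non-rumor classes, which matches the two classes of the dataset but is not spelled out on this page.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Depth-3 rows from Table 3: class -> (precision %, recall %).
depth_3 = {"R": (93.75, 97.19), "N": (97.13, 93.62)}

for label, (precision, recall) in depth_3.items():
    print(label, round(f1_score(precision, recall), 2))
```

The recomputed values round to 95.44 for R and 95.34 for N, matching the table.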
Fig.2  Comparison of experimental results for different partitioning methods
Combined model   Accuracy/%   Class   Precision/%   Recall/%   F1/%
BERT             89.92        R       88.20         92.01      90.06
BERT             89.92        N       91.78         87.87      89.78
DCGA             93.25        R       92.19         94.38      93.28
DCGA             93.25        N       94.34         92.13      93.22
BCG              93.35        R       91.86         95.03      93.42
BCG              93.35        N       94.93         91.70      93.29
BCGA_s           93.57        R       92.07         95.25      93.63
BCGA_s           93.57        N       95.15         91.91      93.51
BCSA             94.00        R       92.31         95.90      94.07
BCSA             94.00        N       95.80         92.13      93.93
BCLA             94.86        R       92.96         96.98      94.93
BCLA             94.86        N       96.89         92.77      94.78
BCGA             95.39        R       93.75         97.19      95.44
BCGA             95.39        N       97.13         93.62      95.34
Table 4  Comparison of experimental results for different combined models
Model     Accuracy/%   Class   Precision/%   Recall/%   F1/%
DTC       82.96        R       84.70         80.13      82.35
DTC       82.96        N       81.41         85.74      83.52
SVM-RBF   81.56        R       82.26         80.13      81.18
SVM-RBF   81.56        N       80.91         82.98      81.93
SVM-TS    85.85        R       85.14         86.61      85.87
SVM-TS    85.85        N       86.58         85.11      85.84
TGBiA     91.21        R       89.28         93.52      91.35
TGBiA     91.21        N       93.30         88.94      91.07
GRU-2     89.28        R       87.42         91.58      89.45
GRU-2     89.28        N       91.29         87.02      89.11
CNN-GRU   90.68        R       89.17         92.44      90.77
CNN-GRU   90.68        N       92.27         88.94      90.57
BCGA      95.39        R       93.75         97.19      95.44
BCGA      95.39        N       97.13         93.62      95.34
Table 5  Comparison of experimental results with baseline models
Fig.3  Comparison of early rumor detection results