Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (5): 81-91    DOI: 10.11925/infotech.2096-3467.2022.0613
Detecting Weibo Rumors Based on Hierarchical Semantic Feature Learning Model
Huang Xuejian1,2, Ma Tinghuai1, Wang Gensheng3
1College of Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
2VR College of Modern Industry, Jiangxi University of Finance and Economics, Nanchang 330013, China
3College of Humanities, Jiangxi University of Finance and Economics, Nanchang 330013, China
Abstract  

[Objective] This paper aims to improve the accuracy and timeliness of Weibo rumor detection. [Methods] We propose a rumor detection method based on a hierarchical semantic feature learning model (BCGA). First, we extract the semantic features of each individual text in an event with the BERT model. Second, we dynamically group the event propagation data by time domain. Third, we use a convolutional neural network to learn the semantic correlation features of the text set in each time domain. Fourth, we feed the semantic correlation features of each time domain into a deep bidirectional gated recurrent neural network to learn the deep semantic temporal features of the event propagation process. Finally, we integrate an attention mechanism so that the model focuses on the rumor-related features within the semantic temporal features. [Results] Experiments on public Weibo data sets show that the model reaches a detection accuracy of 95.39% with a detection delay within 12 hours. [Limitations] The model requires a certain amount of repost and comment information, so its detection performance is less pronounced for events that are not popular enough. [Conclusions] The hierarchical semantic feature learning model learns from local to global semantics and improves the performance of Weibo rumor detection.
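
The time-domain grouping step can be illustrated with a minimal sketch. It assumes equal-width time intervals over an event's time span; the abstract does not state the paper's dynamic grouping rule, which may differ. The `Post` record layout and the `group_by_time` function are hypothetical names introduced only for this illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (timestamp in seconds, post text).
Post = Tuple[float, str]


def group_by_time(posts: Sequence[Post], num_blocks: int) -> List[List[str]]:
    """Split an event's posts into `num_blocks` equal-width time intervals.

    Assumption: equal-width intervals over the event's time span. The
    paper's dynamic grouping rule is not given in the abstract.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    blocks: List[List[str]] = [[] for _ in range(num_blocks)]
    if not posts:
        return blocks
    ordered = sorted(posts, key=lambda p: p[0])
    start, end = ordered[0][0], ordered[-1][0]
    width = (end - start) / num_blocks
    for timestamp, text in ordered:
        if width == 0:
            index = 0  # every post shares the same timestamp
        else:
            # Clamp so the latest post falls into the last block.
            index = min(int((timestamp - start) / width), num_blocks - 1)
        blocks[index].append(text)
    return blocks


posts = [
    (0.0, "source post"),
    (30.0, "repost A"),
    (95.0, "comment B"),
    (100.0, "comment C"),
]
blocks = group_by_time(posts, num_blocks=4)
```

Each resulting block would then hold the texts whose BERT features are passed jointly to the convolutional layer. Blocks may be empty when an event has quiet periods, which a real pipeline would need to pad or merge.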

Key words: Rumor Detection; Deep Learning; Semantic Features; Temporal Data; Hierarchical Semantic
Received: 14 June 2022      Published: 04 July 2023
ZTFLH: TP393; G250
Fund: National Key R&D Program of China (2021YFE0104400); National Natural Science Foundation of China (72061015); Science and Technology Project of Jiangxi Provincial Department of Education (GJJ200539)
Corresponding Author: Ma Tinghuai, ORCID: 0000-0003-2320-1692, E-mail: thma@nuist.edu.cn

Cite this article:

Huang Xuejian, Ma Tinghuai, Wang Gensheng. Detecting Weibo Rumors Based on Hierarchical Semantic Feature Learning Model. Data Analysis and Knowledge Discovery, 2023, 7(5): 81-91.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0613     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I5/81

[Figure: BCGA Model]
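Table: Statistics of the Weibo Rumor Dataset
Statistic                                       Count
Total events                                    4,664
Rumor events                                    2,313
Non-rumor events                                2,351
Total reposts and comments across all events    3,805,656
Average reposts and comments per event          816
Maximum reposts and comments for an event       59,318
Minimum reposts and comments for an event       10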
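Table: Main Parameter Setting
Category              Parameter                                       Value
Model parameters      Number of BERT layers, L                        12
                      Number of BERT multi-head attention heads, A    12
                      BERT output dimension, H                        768
                      Number of event information blocks, N           50
                      CNN kernel heights, h                           3, 4, 5
                      Number of kernels per height, m                 80
                      Number of bidirectional GRU layers, L           3
                      Regularization parameter, λ                     2e-4
                      Dropout keep-prob                               0.8
Training parameters   Learning rate, learning_rate                    0.001
                      Maximum number of iterations, epoch_num         300
                      Batch size, batch_size                          64
                      Cross-validation folds, K-fold                  6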
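
As a rough illustration, the parameter settings above can be collected into a single configuration object. The class name `BCGAConfig` and all field names are hypothetical; only the values come from the table. The derived CNN feature dimension assumes one max-pooled value per kernel, concatenated across kernel heights, which is a common design for text CNNs but is not confirmed by this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BCGAConfig:
    """Values from the parameter table; the field names are hypothetical."""

    bert_layers: int = 12
    bert_heads: int = 12
    bert_hidden: int = 768
    num_blocks: int = 50
    kernel_heights: Tuple[int, ...] = (3, 4, 5)
    kernels_per_height: int = 80
    gru_layers: int = 3
    l2_lambda: float = 2e-4
    dropout_keep_prob: float = 0.8
    learning_rate: float = 0.001
    epoch_num: int = 300
    batch_size: int = 64
    k_fold: int = 6

    @property
    def cnn_feature_dim(self) -> int:
        # Assumption: one max-pooled value per kernel, concatenated
        # across all kernel heights.
        return len(self.kernel_heights) * self.kernels_per_height

    @property
    def dropout_rate(self) -> float:
        # keep-prob is the probability of keeping a unit, so the drop
        # rate is its complement.
        return 1.0 - self.dropout_keep_prob


config = BCGAConfig()
print(config.cnn_feature_dim, round(config.dropout_rate, 2))
```

The `dropout_rate` property is included because some frameworks expect the probability of dropping a unit rather than the probability of keeping it, so a keep-prob of 0.8 corresponds to a drop rate of 0.2.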
Table: Experimental Results of Bidirectional GRU with Different Layers
Depth   Accuracy/%   Class   Precision/%   Recall/%   F1/%
1       92.39        R       91.18         93.74      92.44
                     N       93.65         91.06      92.34
2       94.21        R       92.87         95.68      94.26
                     N       95.61         92.77      94.17
3       95.39        R       93.75         97.19      95.44
                     N       97.13         93.62      95.34
4       93.68        R       92.26         95.25      93.73
                     N       95.16         92.13      93.62
5       90.14        R       89.38         90.93      90.15
                     N       90.91         89.36      90.13
6       85.64        R       84.93         86.39      85.65
                     N       86.36         84.89      85.62
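
The F1 column in the table above is consistent with the harmonic mean of precision and recall. The short check below recomputes it for the depth 1 and depth 3 rows; the function name `f1_score` is just a label for this sketch, and small differences from the reported values are expected because the inputs are already rounded to two decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (depth, class, precision %, recall %, reported F1 %) copied from the table.
rows = [
    (1, "R", 91.18, 93.74, 92.44),
    (1, "N", 93.65, 91.06, 92.34),
    (3, "R", 93.75, 97.19, 95.44),
    (3, "N", 97.13, 93.62, 95.34),
]

for depth, label, precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    print(f"depth={depth} class={label} computed={computed:.2f} reported={reported:.2f}")
```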
[Figure: Experimental Results of Different Blocking Methods]
Table: Experimental Results of Different Combination Models
Model     Accuracy/%   Class   Precision/%   Recall/%   F1/%
BERT      89.92        R       88.20         92.01      90.06
                       N       91.78         87.87      89.78
DCGA      93.25        R       92.19         94.38      93.28
                       N       94.34         92.13      93.22
BCG       93.35        R       91.86         95.03      93.42
                       N       94.93         91.70      93.29
BCGA_s    93.57        R       92.07         95.25      93.63
                       N       95.15         91.91      93.51
BCSA      94.00        R       92.31         95.90      94.07
                       N       95.80         92.13      93.93
BCLA      94.86        R       92.96         96.98      94.93
                       N       96.89         92.77      94.78
BCGA      95.39        R       93.75         97.19      95.44
                       N       97.13         93.62      95.34
Table: The Experimental Results Compared with the Benchmark Models
Model     Accuracy/%   Class   Precision/%   Recall/%   F1/%
DTC       82.96        R       84.70         80.13      82.35
                       N       81.41         85.74      83.52
SVM-RBF   81.56        R       82.26         80.13      81.18
                       N       80.91         82.98      81.93
SVM-TS    85.85        R       85.14         86.61      85.87
                       N       86.58         85.11      85.84
TGBiA     91.21        R       89.28         93.52      91.35
                       N       93.30         88.94      91.07
GRU-2     89.28        R       87.42         91.58      89.45
                       N       91.29         87.02      89.11
CNN-GRU   90.68        R       89.17         92.44      90.77
                       N       92.27         88.94      90.57
BCGA      95.39        R       93.75         97.19      95.44
                       N       97.13         93.62      95.34
[Figure: Comparison of Early Rumor Detection Results]