Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (11): 56-67    DOI: 10.11925/infotech.2096-3467.2022.1012
Rumor Detection of Public Health Emergencies Based on Data Augmentation and Multi-Task Learning
Zeng Ziming, Zhang Yu
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper proposes a rumor detection model that combines data augmentation with multi-task learning to address class imbalance and the shortage of labeled data in rumor detection during public health emergencies. [Methods] First, we extracted the text features of public health emergency rumors to construct a replacement word list. Second, we developed the CEDA method, which uses the extended synonym table to augment the imbalanced rumor dataset. Third, we built a multi-task learning model that integrates domain information from two tasks, sentiment classification and rumor detection for public health emergencies. Fourth, we obtained the shared features with a Transformer and extracted the features specific to the rumor detection task with a BiLSTM model, which improved the accuracy of rumor detection. [Results] The proposed model achieved an F1 value of 0.972, which is 0.006 higher than the model trained on the imbalanced dataset and 0.007 higher than the single-task learning model. Compared with the DC-CNN model, the F1 value increased by 0.024. [Limitations] The multi-task learning model uses only binary sentiment classification; finer-grained classification of negative sentiment is still needed. [Conclusions] The proposed method can effectively classify public health emergency rumors.

Key words: Public Health Emergencies; Rumor Detection; Data Augmentation; Multi-Task Learning; Shared Transformer
Received: 25 September 2022      Published: 22 March 2023
ZTFLH:  TP393 G350  
Fund: National Social Science Fund of China (21BTQ046)
Corresponding Author: Zeng Ziming, ORCID: 0000-0001-9847-0358, E-mail: zmzeng1977@aliyun.com

Cite this article:

Zeng Ziming, Zhang Yu. Rumor Detection of Public Health Emergencies Based on Data Augmentation and Multi-Task Learning. Data Analysis and Knowledge Discovery, 2023, 7(11): 56-67.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1012     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I11/56

The Framework of Rumor Detection in Public Health Emergencies Based on Data Augmentation and Multi-Task Learning
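| Topic | Word 1 | Word 2 | Word 3 | Word 4 | Word 5 |
|---|---|---|---|---|---|
| 新冠病毒 (novel coronavirus) | 冠状病毒 (coronavirus) | 新冠肺炎 (COVID-19) | 新冠 (COVID) | Coronavirus | SARS-CoV-2 |
| 流行病 (epidemic) | 疫情 (outbreak) | 疫区 (epidemic area) | 感染 (infection) | 确诊 (confirmed diagnosis) | 死亡病例 (deaths) |
| 政策 (policy) | 隔离 (quarantine) | 封城 (lockdown) | 防控 (prevention and control) | 群体免疫 (herd immunity) | 健康码 (health code) |
| 人物组织 (people and organizations) | 世界卫生组织 (World Health Organization) | 钟南山 (Zhong Nanshan) | 张文宏 (Zhang Wenhong) | 李文亮 (Li Wenliang) | 福奇 (Fauci) |
| 医疗用品 (medical supplies) | 试剂盒 (test kit) | 核酸检测 (nucleic acid test) | 疫苗 (vaccine) | 抗体 (antibody) | 雷神山 (Leishenshan) |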
Glossary of Public Health Emergency Rumor Topics
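| Epidemic-related word | Count | Positive sentiment word | Count | Negative sentiment word | Count | Degree adverb | Count |
|---|---|---|---|---|---|---|---|
| 疫情 (epidemic) | 119 | 需要 (need) | 24 | 恐慌 (panic) | 5 | 一定 (certainly) | 19 |
| 病毒 (virus) | 106 | 希望 (hope) | 23 | 掉以轻心 (let one's guard down) | 3 | 非常 (very) | 15 |
| 感染 (infection) | 89 | 同意 (agree) | 10 |  | 3 |  | 15 |
| 新冠病毒 (novel coronavirus) | 83 | 注意 (pay attention) | 9 | 憋气 (feel stifled) | 3 | 完全 (completely) | 14 |
| 口罩 (mask) | 81 | 重视 (take seriously) | 8 | 原谅 (forgive) | 3 |  | 7 |
| … | … | … | … | … | … | … | … |

Frequency of Public Health Emergency Rumor Words (Partial)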
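| Word | TF-IDF | Word | TF-IDF | Word | TF-IDF |
|---|---|---|---|---|---|
| 中国 (China) | 0.019 | 武汉 (Wuhan) | 0.014 | 感染 (infection) | 0.012 |
| 美国 (United States) | 0.019 | 视频 (video) | 0.013 | 隔离 (quarantine) | 0.012 |
| 病毒 (virus) | 0.015 | 口罩 (mask) | 0.013 | 新型冠状病毒 (novel coronavirus) | 0.012 |
| 正常化 (normalization) | 0.014 | 开学 (school reopening) | 0.012 | 医院 (hospital) | 0.011 |
| 疫情 (epidemic) | 0.014 | 没有 (not, none) | 0.012 | … | … |

Keyword List of Public Health Emergency Rumor (Partial)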
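| Word list | Original word | Replacement words |
|---|---|---|
| Domain | 口罩 (mask) | N95 |
| Domain | 开学 (school reopening) | 上班 (going to work), 复工 (resuming work) |
| Domain | 干咳 (dry cough) | 咳嗽 (cough), 咳痰 (coughing up phlegm) |
| Domain | 确诊病例 (confirmed case) | 新冠肺炎患者 (COVID-19 patient) |
| Domain | 驱疫 (dispelling the epidemic) | 防疫 (epidemic prevention), 抗疫 (fighting the epidemic) |
| Topic | 新冠肺炎 (COVID-19) | 新冠病毒 (novel coronavirus), 新型肺炎 (novel pneumonia) |
| Topic | 疫情 (epidemic) | 病情 (state of illness), 疾病 (disease) |
| Topic | 疫区 (epidemic area) | 灾区 (disaster area), 污染区 (contaminated area) |
| Topic | 感染 (infection) | 传染 (contagion), 沾染 (contamination) |
| Topic | 确诊 (confirmed diagnosis) | 诊断 (diagnosis) |
| Sentiment | 恐慌 (panic) | 惊慌失措 (panic-stricken), 惶恐 (terrified) |
| Sentiment | 掉以轻心 (let one's guard down) | 草率 (careless), 潦草 (sloppy) |
| Sentiment | 憋气 (feel stifled) | 烦躁 (agitated), 郁闷 (gloomy) |
| Sentiment | 原谅 (forgive) | 宽容 (tolerate), 谅解 (pardon) |
| Sentiment | 生气 (angry) | 暴怒 (furious) |
| Degree | 一定 (certainly) | 必定 (surely), 肯定 (definitely) |
| Degree | 非常 (very) | 特别 (especially), 格外 (exceptionally) |
| Degree | 完全 (completely) | 全然 (entirely), 通通 (all) |
| Degree | 不少 (quite a few) | 许多 (many), 众多 (numerous) |
| Degree | 千万 (by all means) | 万万 (absolutely), 绝对 (absolutely) |
| … | … | … |

Thesaurus of COVID-19 Rumor (Partial)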
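| Operation | Text |
|---|---|
| Original Weibo post | 易感染者可以在未与患者见面的情况下,因为吸入了悬浮在空气中的病毒感染新冠肺炎。 (Susceptible people can contract COVID-19 without meeting a patient, by inhaling virus suspended in the air.) |
| Random swap | 易感染者可以在未与患者见面的情况下,因吸为入了悬浮在空气中的感毒病染新冠肺炎。 |
| Random deletion | 易感染者可以在与患者见面情况,因为吸入悬浮在空气中的病毒感染新冠肺炎 |
| Random insertion | 易见面感染者可以在未与患者见面的情况下来龙去脉,因为吸入了悬浮在空气中的病毒感染新冠肺炎。 |
| Synonym replacement | 易感染者同意在未与病人会面的情况下,因吸吮了漂流在氛围中的病毒感染新冠病毒。 |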
CEDA Data Example
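The four operations in the example above can be sketched in plain Python. This is an illustrative sketch, not the authors' CEDA implementation: the function names are hypothetical, `SYNONYMS` is a small excerpt of the thesaurus table used as a stand-in for the full extended synonym table, the input is assumed to be already tokenized, and how CEDA combines the operations is not reproduced here. The `change_rate` parameter follows the name used in the results table.

```python
import random

# Hypothetical excerpt of the extended synonym table (entries taken from the thesaurus table).
SYNONYMS = {
    "口罩": ["N95"],
    "恐慌": ["惊慌失措", "惶恐"],
    "非常": ["特别", "格外"],
    "感染": ["传染", "沾染"],
}


def _n_changes(tokens, change_rate):
    """Number of edits to apply: at least one for any non-empty input."""
    return max(1, int(len(tokens) * change_rate)) if tokens else 0


def random_swap(tokens, change_rate, rng):
    """Swap the positions of two randomly chosen tokens, repeated n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(_n_changes(out, change_rate)):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, change_rate, rng):
    """Drop each token independently with probability change_rate."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= change_rate]
    return kept or [rng.choice(tokens)]  # never return an empty text


def random_insertion(tokens, change_rate, rng, table=SYNONYMS):
    """Insert synonyms of tokens already in the text at random positions."""
    out = list(tokens)
    pool = [s for t in tokens for s in table.get(t, [])]
    if not pool:
        return out
    for _ in range(_n_changes(tokens, change_rate)):
        out.insert(rng.randint(0, len(out)), rng.choice(pool))
    return out


def synonym_replacement(tokens, change_rate, rng, table=SYNONYMS):
    """Replace up to n tokens that have an entry in the synonym table."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in table]
    rng.shuffle(candidates)
    for i in candidates[:_n_changes(out, change_rate)]:
        out[i] = rng.choice(table[out[i]])
    return out
```

A call such as `synonym_replacement(["大家", "非常", "恐慌"], 0.3, random.Random(0))` returns a new token list and leaves the input unchanged. Passing an explicit `random.Random` instance keeps the augmentation reproducible.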
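| Environment | Configuration |
|---|---|
| GPU | NVIDIA GeForce RTX 3090 |
| CPU | AMD EPYC 7601 |
| GPU memory | 24 GB |
| RAM | 64 GB |
| Programming language | Python 3.8 |
| Deep learning framework | PyTorch 1.10.0 + CUDA 11.3 |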
Experimental Environment
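| Hyperparameter group | Parameter | Value |
|---|---|---|
| Training | batch_size | 64 |
| Training | epoch | 5 |
| Training | learning_rate | 0.0001 |
| Transformer | Self-attention layers (N) | 6 |
| Transformer | Self-attention heads | 8 |
| Transformer | dropout | 0.1 |
| BiLSTM | Word embedding dimension | 768 |
| BiLSTM | Hidden units | 768 |
| BiLSTM | Layers | 2 |
| BiLSTM | dropout | 0.1 |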
Parameter Settings of Deep Learning Algorithms
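The settings above could be wired into a model roughly as follows. This is a minimal PyTorch sketch, not the authors' implementation: the class name `SharedTransformerMTL`, the learned token and position embeddings, mean pooling for the sentiment head, and the two linear classifiers are assumptions made for illustration. It covers only the case in which all Transformer layers are shared between the two tasks, and it omits padding masks for brevity.

```python
import torch
from torch import nn


class SharedTransformerMTL(nn.Module):
    """Shared Transformer encoder with a BiLSTM branch for rumor detection
    and a pooled linear branch for sentiment classification (illustrative)."""

    def __init__(self, vocab_size, d_model=768, n_layers=6, n_heads=8,
                 lstm_hidden=768, lstm_layers=2, dropout=0.1, max_len=512):
        super().__init__()
        # Assumption: learned embeddings stand in for whatever input encoding is used.
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.shared = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.bilstm = nn.LSTM(
            d_model, lstm_hidden, num_layers=lstm_layers, batch_first=True,
            bidirectional=True, dropout=dropout if lstm_layers > 1 else 0.0,
        )
        self.rumor_head = nn.Linear(2 * lstm_hidden, 2)
        self.sentiment_head = nn.Linear(d_model, 2)

    def forward(self, ids):
        # ids: (batch, seq_len) integer token ids, assumed unpadded.
        positions = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        shared = self.shared(self.tok(ids) + self.pos(positions))

        # Sentiment branch: mean-pool the shared features.
        sentiment_logits = self.sentiment_head(shared.mean(dim=1))

        # Rumor branch: BiLSTM over the shared features; concatenate the
        # final forward and backward hidden states of the top layer.
        _, (h_n, _) = self.bilstm(shared)
        rumor_features = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.rumor_head(rumor_features), sentiment_logits
```

Under these assumptions, joint training would typically sum a cross-entropy loss on each output, possibly with a weight on the sentiment term, and step an optimizer with the listed learning rate of 0.0001.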
Effects of Different Text Change Rates on F1-score
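| No. | Data augmentation method | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | None | 0.955 | 0.952 | 0.980 | 0.966 |
| 2 | Simple duplication | 0.956 | 0.945 | 0.989 | 0.966 |
| 3 | SimBERT | 0.957 | 0.955 | 0.981 | 0.967 |
| 4 | Random swap (change_rate=0.1) | 0.963 | 0.955 | 0.989 | 0.971 |
| 5 | Random deletion (change_rate=0.4) | 0.964 | 0.957 | 0.989 | 0.972 |
| 6 | Random insertion (change_rate=0.1) | 0.963 | 0.955 | 0.989 | 0.971 |
| 7 | Synonym replacement (change_rate=0.3) | 0.954 | 0.958 | 0.975 | 0.965 |
| 8 | Extended synonym replacement (change_rate=0.3) | 0.963 | 0.955 | 0.989 | 0.971 |
| 9 | EDA | 0.963 | 0.952 | 0.992 | 0.971 |
| 10 | CEDA | 0.964 | 0.962 | 0.984 | 0.972 |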
Results of Data Augmentation Comparative Experiment
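Every method in this comparison serves the same purpose: adding minority-class samples until the dataset is balanced. A generic way to do that is sketched below. The function name `balance_by_augmentation` and the `augment(text, rng)` callback signature are hypothetical, and the authors' exact balancing procedure is not described on this page. With an identity callback the routine reduces to simple duplication.

```python
import random
from collections import Counter


def balance_by_augmentation(samples, augment, rng):
    """Return a new list of (text, label) pairs in which every class has as
    many items as the largest class. Extra items are produced by applying
    `augment` to randomly chosen texts of the under-represented class."""
    if not samples:
        return []
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_label.values())

    balanced = list(samples)  # originals are kept untouched, in order
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):
            balanced.append((augment(rng.choice(texts), rng), label))
    return balanced


# Usage: five non-rumor posts, two rumor posts, identity augmentation.
data = [("post", 0)] * 5 + [("rumor", 1)] * 2
balanced = balance_by_augmentation(data, lambda text, rng: text, random.Random(0))
print(Counter(label for _, label in balanced))
```

Augmenting only the training split, after the train/test split has been made, avoids leaking near-duplicates of test items into training.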
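| Shared layers | Unshared layers | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | 6 | 0.947 | 0.932 | 0.990 | 0.959 |
| 1 | 5 | 0.949 | 0.941 | 0.982 | 0.961 |
| 2 | 4 | 0.952 | 0.955 | 0.974 | 0.964 |
| 3 | 3 | 0.952 | 0.968 | 0.961 | 0.964 |
| 4 | 2 | 0.949 | 0.971 | 0.955 | 0.962 |
| 5 | 1 | 0.949 | 0.963 | 0.962 | 0.962 |
| 6 | 0 | 0.955 | 0.952 | 0.980 | 0.966 |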
Results of Different Hyperparameters(MTL)
| No. | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | TextCNN | 0.893 | 0.893 | 0.893 | 0.893 |
| 2 | DPCNN | 0.900 | 0.869 | 0.768 | 0.816 |
| 3 | BERT | 0.846 | 0.853 | 0.914 | 0.881 |
| 4 | BERT-Att-BiLSTM | 0.886 | 0.841 | 0.991 | 0.908 |
| 5 | BERT-RCNN | 0.935 | 0.926 | 0.922 | 0.924 |
| 6 | DC-CNN | 0.948 | 0.947 | 0.949 | 0.948 |
| 7 | Single-Task | 0.947 | 0.932 | 0.990 | 0.959 |
| 8 | CEDA-Single-Task | 0.954 | 0.947 | 0.984 | 0.965 |
| 9 | MTL | 0.955 | 0.952 | 0.980 | 0.966 |
| 10 | CEDA-MTL | 0.964 | 0.962 | 0.984 | 0.972 |
Results of Multi-Task Learning Comparative Experiment
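The F1 column is the harmonic mean of precision and recall, so the reported values and the gains quoted in the abstract can be checked directly from this table. The short script below does that for four rows. The helper name `f1_score` and the `ROWS` dictionary are illustrative. Because precision and recall are themselves rounded to three decimals, recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
ROWS = {
    "DC-CNN": (0.947, 0.949, 0.948),
    "CEDA-Single-Task": (0.947, 0.984, 0.965),
    "MTL": (0.952, 0.980, 0.966),
    "CEDA-MTL": (0.962, 0.984, 0.972),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name}: recomputed F1 {f1_score(p, r):.4f}, reported {reported:.3f}")

# Gains of CEDA-MTL over the baselines named in the abstract.
best = ROWS["CEDA-MTL"][2]
for baseline in ("MTL", "CEDA-Single-Task", "DC-CNN"):
    print(f"CEDA-MTL vs {baseline}: +{best - ROWS[baseline][2]:.3f}")
```

The differences in reported F1 are 0.006 against MTL (no augmentation), 0.007 against CEDA-Single-Task, and 0.024 against DC-CNN, matching the figures in the abstract.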