[Objective] This paper proposes a model that combines data augmentation and multi-task learning to address class imbalance and the shortage of labeled data in rumor detection during public health emergencies. [Methods] First, we extracted the text features of public health emergency rumors to construct a replacement word list. Second, we developed the CEDA method, based on the extended synonym table, to augment the unbalanced rumor dataset. Third, we built a multi-task learning model that integrates domain information from two tasks: sentiment classification and rumor detection for public health emergencies. Fourth, we obtained the shared features with a Transformer and extracted the features specific to the rumor detection task with a BiLSTM model, which improved the accuracy of rumor detection. [Results] The proposed model achieved an F1 value of 0.972, which was 0.006 and 0.007 higher than the model trained on the unbalanced dataset and the single-task learning model, respectively. Compared with the DC-CNN model, the F1 value increased by 0.024. [Limitations] The multi-task learning model includes only binary sentiment classification; more fine-grained classification of negative sentiment is still needed. [Conclusions] The proposed method can effectively classify public health emergency rumors.
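The augmentation step in the Methods can be illustrated with a minimal sketch of synonym-replacement oversampling. This is not the CEDA implementation: the `SYNONYMS` table, the function names, the sample sentences, and the label convention (1 = rumor) are all hypothetical, and only the synonym-replacement idea is shown, assuming a binary-labeled, pre-tokenized dataset.

```python
import random

# Hypothetical replacement word list; the paper's extended synonym table is not reproduced here.
SYNONYMS = {
    "virus": ["pathogen", "coronavirus"],
    "cure": ["treat", "heal"],
    "vaccine": ["jab", "inoculation"],
}


def synonym_replace(tokens, synonyms, n, rng):
    """Return a copy of tokens with up to n tokens swapped for a listed synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def balance_by_augmentation(samples, minority_label, synonyms, seed=0):
    """Append synonym-replaced copies of minority samples until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    augmented = list(samples)
    needed = len(majority) - len(minority)
    while needed > 0 and minority:
        tokens, label = rng.choice(minority)
        augmented.append((synonym_replace(tokens, synonyms, 1, rng), label))
        needed -= 1
    return augmented


# Toy data: label 1 (assumed here to mean "rumor") is the minority class.
data = [
    (["this", "vaccine", "can", "cure", "the", "virus"], 1),
    (["officials", "report", "new", "cases"], 0),
    (["hospital", "opens", "new", "ward"], 0),
    (["city", "extends", "testing", "hours"], 0),
]
balanced = balance_by_augmentation(data, minority_label=1, synonyms=SYNONYMS)
```

In this toy run the single minority sample is copied twice with one word replaced each time, so both classes end up with three samples. A fuller method would likely vary the number of replacements with sentence length and draw on a domain-specific word list.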
Zeng Ziming, Zhang Yu. Rumor Detection of Public Health Emergencies Based on Data Augmentation and Multi-Task Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(11): 56-67.
Gupta A, Li H, Farnoush A, et al. Understanding Patterns of COVID Infodemic: A Systematic and Pragmatic Approach to Curb Fake News[J]. Journal of Business Research, 2022, 140: 670-683.
doi: 10.1016/j.jbusres.2021.11.032
(Kuang Wenbo, Wu Xiaoli. Research on Network Rumor Propagation Model and Characteristics in Public Health Emergencies[J]. News and Writing, 2020(4): 83-87.)
[3]
Zimbra D, Ghiassi M, Lee S A. Brand-Related Twitter Sentiment Analysis Using Feature Engineering and the Dynamic Architecture for Artificial Neural Networks[C]// Proceedings of the 49th Hawaii International Conference on System Sciences. IEEE, 2016: 1930-1938.
(Shou Huanrong, Deng Shuqing, Xu Jian. Detecting Online Rumors with Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2017, 1(7): 44-51.)
[5]
Vosoughi S, Roy D, Aral S. The Spread of True and False News Online[J]. Science, 2018, 359(6380): 1146-1151.
doi: 10.1126/science.aap9559
pmid: 29590045
(Shi Kaiwen, Liu Kan. Weibo Rumor Identification in Public Health Emergencies[J]. Library and Information Service, 2021, 65(13): 87-95.)
doi: 10.13266/j.issn.0252-3116.2021.13.009
(Yin Pengbo, Pan Weimin, Peng Cheng, et al. Research on Early Detection of Weibo Rumors Based on User Characteristics Analysis[J]. Journal of Intelligence, 2020, 39(7): 81-86.)
[9]
Yang F, Liu Y, Yu X. Automatic Detection of Rumor on Sina Weibo[C]// Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, New York: Association for Computing Machinery, 2012: 1-7.
(He Gang, Lü Xueqiang, Li Zhuo, et al. Automatic Rumor Identification on Microblog[J]. Library and Information Service, 2013, 57(23): 114-120.)
doi: 10.7536/j.issn.0252-3116.2013.23.019
[11]
Zhang Q, Zhang S Y, Dong J, et al. Automatic Detection of Rumor on Social Network[C]// Proceedings of the 4th CCF International Conference on Natural Language Processing and Chinese Computing. Cham: Springer, 2015: 113-122.
[12]
Liang G, He W B, Xu C, et al. Rumor Identification in Microblogging Systems Based on Users’ Behavior[J]. IEEE Transactions on Computational Social Systems, 2015, 2(3): 99-108.
doi: 10.1109/TCSS.2016.2517458
[13]
Wu K, Yang S, Zhu K Q. False Rumors Detection on Sina Weibo by Propagation Structures[C]// Proceedings of the 31st IEEE International Conference on Data Engineering. IEEE, 2015: 651-662.
[14]
Ma J, Gao W, Wei Z Y, et al. Detect Rumors Using Time Series of Social Context Information on Microblogging Websites[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 2015: 1751-1754.
[15]
Ma J, Gao W, Mitra P, et al. Detecting Rumors from Microblogs with Recurrent Neural Networks[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. ACM, 2016: 3818-3824.
[16]
Chen T, Li X, Yin H Z, et al. Call Attention to Rumors: Deep Attention Based Recurrent Neural Networks for Early Rumor Detection[C]// Proceedings of the 2018 Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham: Springer, 2018: 40-52.
(Li Yuechen, Qian Lingfei, Ma Jing. Early Detection of Micro Blog Rumors Based on BERT-RCNN Model[J]. Information Studies: Theory & Application, 2021, 44(7): 173-177.)
[18]
He H B, Garcia E A. Learning from Imbalanced Data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
doi: 10.1109/TKDE.2008.239
[19]
Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic Minority Over-Sampling Technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
doi: 10.1613/jair.953
[20]
Han H, Wang W Y, Mao B H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning[C]// Proceedings of the 2005 International Conference on Advances in Intelligent Computing. ACM, 2005: 878-887.
[21]
Zhang X, Zhao J B, LeCun Y. Character-Level Convolutional Networks for Text Classification[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. ACM, 2015: 649-657.
[22]
Wei J, Zou K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019: 6382-6388.
(Su Zhizhong, Xi Yaoyi, Chen Yufei, et al. Data Augmentation Approach for Social Media Stance Detection[J]. Journal of Information Engineering University, 2022, 23(1): 58-65.)
(Shi Guoliang, Chen Yuqi. A Comparative Study on the Integration of Text Enhanced and Pre-Trained Language Models in the Classification of Internet Political Messages[J]. Library and Information Service, 2021, 65(13): 96-107.)
doi: 10.13266/j.issn.0252-3116.2021.13.010
[25]
Han S, Gao J, Ciravegna F. Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model with Large-Scale Credibility Corpus[C]// Proceedings of the 7th International Conference on Learning Representations. 2019: 1-6.
[26]
Chen X Y, Zhu D D, Lin D Z, et al. Rumor Knowledge Embedding Based Data Augmentation for Imbalanced Rumor Detection[J]. Information Sciences, 2021, 580: 352-370.
doi: 10.1016/j.ins.2021.08.059
(Liu Kan, Huang Zheying. Rumor Identification in Major Sudden Epidemic Situation[J]. Journal of South China University of Technology (Natural Science Edition), 2021, 49(1): 18-28.)
Collobert R, Weston J. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning[C]// Proceedings of the 25th International Conference on Machine Learning. ACM, 2008: 160-167.
[30]
Liu P F, Qiu X P, Huang X J. Recurrent Neural Network for Text Classification with Multi-Task Learning[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. ACM, 2016: 2873-2879.
[31]
Song X M, Nie L Q, Zhang L M, et al. Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning[C]// Proceedings of the 24th International Conference on Artificial Intelligence. ACM, 2015: 2371-2377.
[32]
Ma J, Gao W, Wong K F. Detect Rumor and Stance Jointly by Neural Multi-Task Learning[C]// Proceedings of the 2018 Web Conference. ACM, 2018: 585-593.
[33]
Kochkina E, Liakata M, Zubiaga A. All-in-One: Multi-Task Learning for Rumour Verification[OL]. arXiv Preprint, arXiv: 1806.03713.
[34]
Li Q Z, Zhang Q, Si L. Rumor Detection by Exploiting User Credibility Information, Attention and Multi-Task Learning[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019: 1173-1179.
(Yang Hanxun, Zhou Dequn, Ma Jing, et al. Detecting Rumors with Uncertain Loss and Task-Level Attention Mechanism[J]. Data Analysis and Knowledge Discovery, 2021, 5(7): 101-110.)
[36]
Kumari R, Ashok N, Ghosal T, et al. Misinformation Detection Using Multitask Learning with Mutual Learning for Novelty Detection and Emotion Recognition[J]. Information Processing & Management, 2021, 58(5): Article No.102631.
(Liu Zhiyuan, Zhang Le, Tu Cunchao, et al. Statistical and Semantic Analysis of Rumors in Chinese Social Media[J]. Scientia Sinica (Informationis), 2015, 45(12): 1536-1546.)
[38]
Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. ACM, 2017: 6000-6010.
[39]
Graves A, Schmidhuber J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures[J]. Neural Networks, 2005, 18(5-6): 602-610.
doi: 10.1016/j.neunet.2005.06.042
pmid: 16112549
[40]
Graves A, Mohamed A R, Hinton G. Speech Recognition with Deep Recurrent Neural Networks[C]// Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 6645-6649.
[41]
Yang C, Zhou X Y, Zafarani R. CHECKED: Chinese COVID-19 Fake News Dataset[J]. Social Network Analysis and Mining, 2021, 11(1): Article No.58.
(Su Jianlin. Fish and Bear’s Paw: SimBERT Model for Fusion Retrieval and Generation[EB/OL]. [2022-05-18]. https://spaces.ac.cn/archives/7427.html.)
[43]
Ma K, Tang C H, Zhang W J, et al. DC-CNN: Dual-Channel Convolutional Neural Networks with Attention-Pooling for Fake News Detection[J]. Applied Intelligence, 2023, 53(7): 8354-8369.
doi: 10.1007/s10489-022-03910-9