Detecting Mis/Dis-information from Social Media with Semantic Enhancement
Wang Hao¹,², Gong Lijuan¹,², Zhou Zeyu¹,², Fan Tao¹,², Wang Yongsheng¹,²
¹School of Information Management, Nanjing University, Nanjing 210023, China
²Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210233, China
[Objective] This paper builds an automated detection model to identify mis/dis-information on social media, aiming to balance speed and accuracy when processing massive amounts of data. [Methods] Classification models are the mainstream technique for detecting mis/dis-information. However, most of them cannot extract deep semantic features from texts. We therefore used BFID (BERT False-Information-Detection), a model based on a single text feature, as the baseline, and proposed two new models that fuse semantic enhancement features to detect mis/dis-information. [Results] We evaluated the new models on data from Sina Weibo. Compared with the baseline, the accuracy of BFID-SEN (BFID-Sentiment), which fuses sentiment features, increased by about 1.59 percentage points, and the accuracy of BFID-IMG (BFID-Image), which fuses image features, increased by 0.78 percentage points. [Limitations] The benefit of fusing semantic enhancement features is constrained by the small size of the corpus, the limited sentiment categories, and the small multimodal disinformation training datasets. [Conclusions] The proposed methods identify false information on social media more effectively than the baseline.
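The abstract does not specify how the sentiment or image features are combined with the BERT text representation. Purely as an illustration, the sketch below assumes simple feature-level fusion by concatenation followed by a linear softmax classifier. The vectors, weights, and function names (`concat_fusion`, `classify`, `percentage_point_gain`) are hypothetical and are not taken from BFID-SEN or BFID-IMG. The last helper shows that the reported gains are absolute differences in percentage points, not relative percentages.

```python
import math
from typing import List, Sequence

# Hypothetical stand-ins: in a BFID-style model the text vector would come from BERT
# (for example a sentence embedding); here short hand-written lists are used instead.


def concat_fusion(text_vec: Sequence[float], aux_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion by concatenation (an assumed fusion operator)."""
    return list(text_vec) + list(aux_vec)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Linear layer followed by softmax; returns [P(true), P(false)]."""
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def percentage_point_gain(acc_new: float, acc_base: float) -> float:
    """Absolute accuracy difference expressed in percentage points."""
    return (acc_new - acc_base) * 100


if __name__ == "__main__":
    text_vec = [0.2, -0.1, 0.4]   # hypothetical text embedding
    sentiment = [0.7, 0.2, 0.1]   # hypothetical (negative, neutral, positive) scores
    fused = concat_fusion(text_vec, sentiment)
    weights = [[0.1] * 6, [0.3, 0.0, 0.2, 0.5, -0.1, 0.0]]  # hypothetical weights
    bias = [0.0, 0.0]
    probs = classify(fused, weights, bias)
    print(len(fused), [round(p, 3) for p in probs])
```

Swapping the sentiment scores for an image feature vector gives the analogous image-fusion case; only the length of the weight rows changes. A relative gain would instead be `(acc_new - acc_base) / acc_base`, which is why "percentage points" is the appropriate unit for the reported results.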
Wang Hao, Gong Lijuan, Zhou Zeyu, Fan Tao, Wang Yongsheng. Detecting Mis/Dis-information from Social Media with Semantic Enhancement. Data Analysis and Knowledge Discovery, 2023, 7(2): 48-60.