Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 21-29     https://doi.org/10.11925/infotech.2096-3467.2020.0884
Research Paper
Detecting Social Media Fake News with Semantic Consistency Between Multi-model Contents*
Zhang Guobiao1,2, Li Jie3
1School of Information Management, Wuhan University, Wuhan 430072, China
2Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072, China
3School of Sociology, Soochow University, Suzhou 215000, China
Abstract

[Objective] This study aims to detect fake news on social media at an early stage and curb the wide spread of mis- and disinformation. [Methods] Using both image and text features, we mapped news images to semantic tags, designed a method for computing the semantic consistency between image and text content, and built a fake news detection model. We evaluated the model on FakeNewsNet, a standard fake news detection dataset. [Results] The full-feature model, which incorporates the image-text semantic consistency feature, achieved an F1 score of 0.775 on the PolitiFact data and 0.879 on the GossipCop data, indicating good detection performance. [Limitations] Existing image semantic annotation models cannot yet describe image content accurately, so the computed semantic consistency contains errors. [Conclusions] Multimodal feature fusion effectively improves fake news detection performance, and the proposed text-image semantic consistency feature enriches and extends the evidence available for detecting fake news.

Keywords: Fake News Detection; Social Media; Multi-modal Feature Fusion; Semantic Consistency; Deep Learning
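A minimal sketch of the image-text semantic consistency idea described in the abstract. The abstract does not state the formula, so this assumes cosine similarity between the averaged word vectors of the image's semantic tags and of the news text. `TOY_VECTORS`, `mean_vector`, and `semantic_consistency` are hypothetical names, and the three-dimensional vectors are made up for illustration rather than taken from a trained embedding model.

```python
import math

# ASSUMPTION: toy 3-d word vectors invented for this example.
# A real pipeline would load pretrained word embeddings instead.
TOY_VECTORS = {
    "flood": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "street": [0.6, 0.3, 0.2],
    "concert": [0.0, 0.9, 0.3],
    "guitar": [0.1, 0.8, 0.4],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have an embedding."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_consistency(image_tags, text_tokens, vectors=TOY_VECTORS):
    """Assumed consistency score: cosine of the two averaged vectors."""
    tag_vec = mean_vector(image_tags, vectors)
    text_vec = mean_vector(text_tokens, vectors)
    if tag_vec is None or text_vec is None:
        return 0.0  # nothing to compare
    return cosine(tag_vec, text_vec)


# Image tags would come from an image classifier; here they are typed by hand.
consistent = semantic_consistency(["flood", "water"], ["flood", "street"])
inconsistent = semantic_consistency(["concert", "guitar"], ["flood", "street"])
print(f"consistent pair:   {consistent:.3f}")
print(f"inconsistent pair: {inconsistent:.3f}")
```

In this toy setup, tags that share vocabulary with the text score close to 1, while unrelated tags score much lower. Such a score could then be passed to a classifier alongside the text and image features.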
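Received: 2020-09-08      Published: 2020-11-24
CLC number:  TP393
Funding: *This work is one of the research outcomes of the Soochow University 2020 Outstanding Academic Team in Humanities and Social Sciences (Project Cultivation) program (NH33711520).
Corresponding author: Li Jie     E-mail: allison_lijie@163.com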
Cite this article:
Zhang Guobiao, Li Jie. Detecting Social Media Fake News with Semantic Consistency Between Multi-model Contents*[J]. Data Analysis and Knowledge Discovery, 2021, 5(5): 21-29.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0884      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I5/21
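Fig.1  Example of semantic inconsistency between the image and text content of fake news
Fig.2  Image label mapping process
Fig.3  Social media fake news detection model based on multimodal feature fusion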
Item              PolitiFact            GossipCop
                  Fake      True        Fake      True
Training set      2,466     3,190       14,737    17,922
Validation set    352       456         2,105     2,560
Test set          705       912         4,210     5,121
Total             3,523     4,558       21,052    25,603
Table 1  FakeNewsNet experimental data
Parameter                                        Value
Epoch                                            50
Dropout                                          0.4
Batch_size                                       32
Activation function                              ReLU
Learning rate                                    0.0001
Neurons in the image fully connected layer       200
Neurons in each MLP layer                        500, 200, 100
Table 2  
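The settings in Table 2 can be collected into a small configuration, as in the sketch below. It is illustrative only: the dictionary keys, the helper names, the single-unit output layer, and the 401-dimensional fused input are assumptions, not details given on this page. The training-set sizes come from Table 1 (Fake plus True).

```python
import math

# Values copied from Table 2; the key names are hypothetical.
HYPERPARAMS = {
    "epochs": 50,
    "dropout": 0.4,
    "batch_size": 32,
    "activation": "relu",
    "learning_rate": 1e-4,
    "image_fc_units": 200,
    "mlp_units": (500, 200, 100),
}

# Training-set sizes from Table 1 (Fake + True).
TRAIN_SIZES = {
    "PolitiFact": 2466 + 3190,
    "GossipCop": 14737 + 17922,
}


def mlp_layer_shapes(input_dim, hidden_units, output_dim=1):
    """Return (in_features, out_features) for each dense layer."""
    dims = [input_dim, *hidden_units, output_dim]
    return list(zip(dims[:-1], dims[1:]))


def count_parameters(shapes):
    """Count weights plus biases over all dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover the training set once."""
    return math.ceil(n_samples / batch_size)


# ASSUMPTION: 401 = 200 text + 200 image + 1 consistency score.
# The page does not state the fused input size; this is for illustration.
ASSUMED_INPUT_DIM = 401

shapes = mlp_layer_shapes(ASSUMED_INPUT_DIM, HYPERPARAMS["mlp_units"])
for name, n_train in TRAIN_SIZES.items():
    batches = steps_per_epoch(n_train, HYPERPARAMS["batch_size"])
    print(f"{name}: {n_train} training items, {batches} batches per epoch")
print("MLP weights and biases:", count_parameters(shapes))
```

With a batch size of 32, one epoch covers the PolitiFact training set in far fewer updates than the GossipCop training set, which is worth keeping in mind when the same 50-epoch budget is applied to both.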
Feature type                    PolitiFact                                GossipCop
                                Accuracy  Precision  Recall  F1           Accuracy  Precision  Recall  F1
Text features                   0.761     0.768      0.773   0.753        0.836     0.810      0.821   0.815
Image features                  0.540     0.520      0.560   0.520        0.654     0.704      0.702   0.653
Semantic consistency feature    0.520     0.450      0.524   0.480        0.564     0.530      0.545   0.548
Text and image features         0.782     0.784      0.813   0.770        0.857     0.827      0.838   0.836
All features                    0.791     0.792      0.803   0.775        0.883     0.864      0.853   0.879
EANN                            0.776     0.764      0.798   0.768        0.841     0.814      0.796   0.806
Table 3  
Fig.4  Mean semantic consistency for each CNN model
Fig.5  Case examples of news image-text semantic consistency