Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (2): 97-107     https://doi.org/10.11925/infotech.2096-3467.2022.0293
Research Paper
Detecting Fake News Based on Title-Content Difference*
Liu Shang, Shen Yifan
School of Science & Technology, Tianjin University of Finance and Economics, Tianjin 300222, China
Abstract

[Objective] News comments are hard to collect, and short news texts yield little usable information, which limits fake news detection. This paper proposes a detection method based on the difference between a news item's title and its body content. [Methods] First, we design the Cos-Gap difference measure to capture the textual and emotional differences between title and content. Then, building on Heterogeneous Graph Attention Networks, we use these difference features to construct a News Differential Heterogeneous Graph Network (NDHN). The network contains edges built from the difference features and three types of nodes (title, content, and emotion) built from semantic and emotional features. [Results] On the open GossipCop dataset, the proposed method improves classification accuracy by about 2.7 percentage points and F1 by about 3.2 percentage points. [Limitations] The method applies to news with titles and is limited for untitled texts such as Sina Weibo or Twitter posts. [Conclusions] Incorporating title-content difference features effectively improves fake news detection accuracy and supports the rapid detection of fake news on social media.

Key words: Fake News Detection    Heterogeneous Graph Network    Differential Features    Public Opinion Analysis
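The abstract names a Cos-Gap measure for title-content difference but does not define it on this page. As a rough illustration only, the sketch below assumes one plausible cosine-based reading: the difference is one minus the cosine similarity of a title vector and a content vector. The function names, the toy vectors, and the formula itself are assumptions, not the authors' definition.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_gap(u, v):
    """Hypothetical difference score: near 0 for aligned vectors, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(u, v)


# Toy embeddings (assumed); a real system would use learned text or emotion features.
title_vec = [1.0, 0.0, 1.0]
matching_content_vec = [1.0, 0.0, 1.0]
unrelated_content_vec = [0.0, 1.0, 0.0]

same_gap = cosine_gap(title_vec, matching_content_vec)
diff_gap = cosine_gap(title_vec, unrelated_content_vec)
print(same_gap, diff_gap)
```

Under this assumed definition, a title that matches its content scores close to zero and an unrelated title scores higher, which is the kind of signal the method would attach to graph edges.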
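Received: 2022-04-02      Published: 2023-03-28
CLC number:  TP391
Funding: *This work was supported by the Humanities and Social Sciences Research Planning Fund of the Ministry of Education (19YJA630046), the Natural Science Foundation of Tianjin (20JCQNJC00970), and the Tianjin Art Science Planning Project (C22030).
Corresponding author: Liu Shang, ORCID: 0000-0002-3797-7339, E-mail: liushangw@tjufe.edu.cn.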
Cite this article:
Liu Shang, Shen Yifan. Detecting Fake News Based on Title-Content Difference[J]. Data Analysis and Knowledge Discovery, 2023, 7(2): 97-107.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0293      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I2/97
Fig.1  Structure of NDHN
Fig.2  Title-content adjacency matrix $A_{\text{title-content}}$
Fig.3  Emotion-title adjacency matrix $A_{\text{emotion-title}}$
Fig.4  Emotion-content adjacency matrix $A_{\text{emotion-content}}$
Fig.5  Adjacency matrix of the NDHN network
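The captions of Figs. 2 to 5 suggest that the overall NDHN adjacency matrix is assembled from three pairwise blocks (title-content, emotion-title, emotion-content). The figures themselves are not reproduced here, so the sketch below is only one plausible assembly: nodes are ordered titles, then contents, then emotions, same-type blocks are left at zero, and the result is symmetric. The node ordering, the zero diagonal blocks, the helper names, and the toy weights are all assumptions.

```python
def transpose(matrix):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def zeros(rows, cols):
    """Return a rows x cols matrix filled with 0.0."""
    return [[0.0] * cols for _ in range(rows)]


def hstack(*blocks):
    """Concatenate matrices with equal row counts side by side."""
    return [sum((list(block[i]) for block in blocks), []) for i in range(len(blocks[0]))]


def build_block_adjacency(a_tc, a_et, a_ec):
    """Assemble a symmetric adjacency matrix over [titles, contents, emotions].

    a_tc: n_title x n_content, a_et: n_emotion x n_title, a_ec: n_emotion x n_content.
    """
    n_t, n_c, n_e = len(a_tc), len(a_tc[0]), len(a_et)
    top = hstack(zeros(n_t, n_t), a_tc, transpose(a_et))
    middle = hstack(transpose(a_tc), zeros(n_c, n_c), transpose(a_ec))
    bottom = hstack(a_et, a_ec, zeros(n_e, n_e))
    return top + middle + bottom


# Toy example (assumed values): 2 titles, 2 contents, 1 emotion node.
a_title_content = [[0.3, 0.0], [0.0, 0.7]]   # e.g. difference-based edge weights
a_emotion_title = [[1.0, 1.0]]               # emotion node linked to both titles
a_emotion_content = [[1.0, 0.0]]             # emotion node linked to first content

adjacency = build_block_adjacency(a_title_content, a_emotion_title, a_emotion_content)
for row in adjacency:
    print(row)
```

Keeping each pairwise block separate until the final assembly makes it easy to swap in a different weighting for one edge type without touching the others.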
Dataset    Metric    SVM     RFC     DTC     GRU-2   B-TransE  KAN     NDHN
GossipCop  Accuracy  0.6643  0.6918  0.6959  0.7180  0.7394    0.7766  0.8039
GossipCop  F1        0.5955  0.6691  0.6919  0.7079  0.7340    0.7713  0.8037
Table 1  Comparison of experimental results for different models
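The gains quoted in the abstract can be checked against Table 1. The short script below, a sketch with a hypothetical helper name, finds the strongest baseline for each metric and reports the NDHN gain in percentage points; the scores are copied from the table.

```python
# Scores copied from Table 1 (GossipCop).
accuracy = {"SVM": 0.6643, "RFC": 0.6918, "DTC": 0.6959, "GRU-2": 0.7180,
            "B-TransE": 0.7394, "KAN": 0.7766, "NDHN": 0.8039}
f1 = {"SVM": 0.5955, "RFC": 0.6691, "DTC": 0.6919, "GRU-2": 0.7079,
      "B-TransE": 0.7340, "KAN": 0.7713, "NDHN": 0.8037}


def gain_over_best_baseline(scores, proposed="NDHN"):
    """Return (best baseline name, gain of the proposed model in percentage points)."""
    baselines = {name: s for name, s in scores.items() if name != proposed}
    best = max(baselines, key=baselines.get)
    gain_pp = round((scores[proposed] - baselines[best]) * 100, 2)
    return best, gain_pp


best_acc, acc_gain = gain_over_best_baseline(accuracy)
best_f1, f1_gain = gain_over_best_baseline(f1)
print(f"accuracy: +{acc_gain} points over {best_acc}")
print(f"F1: +{f1_gain} points over {best_f1}")
```

The differences are absolute (percentage points), not relative improvements, which matches the wording of the Chinese abstract.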
Method            Accuracy  F1
-Word Similarity  0.7895    0.7894
-Attention        0.8000    0.7991
-Emotion Gap      0.8013    0.8003
NDHN              0.8039    0.8037
Table 2  Ablation results for NDHN
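To read Table 2 at a glance, the drop caused by removing each component can be computed relative to the full NDHN model. The sketch below uses the table's values directly; the variable names are illustrative.

```python
# (accuracy, F1) pairs copied from Table 2.
full_model = (0.8039, 0.8037)
ablations = {
    "-Word Similarity": (0.7895, 0.7894),
    "-Attention": (0.8000, 0.7991),
    "-Emotion Gap": (0.8013, 0.8003),
}

# Drop in percentage points when each component is removed.
drops = {
    name: (round((full_model[0] - acc) * 100, 2), round((full_model[1] - f1) * 100, 2))
    for name, (acc, f1) in ablations.items()
}

# Component whose removal costs the most accuracy.
largest = max(drops, key=lambda name: drops[name][0])

for name, (acc_drop, f1_drop) in drops.items():
    print(f"{name}: accuracy -{acc_drop} points, F1 -{f1_drop} points")
print("largest accuracy drop:", largest)
```

On these numbers, removing word similarity costs the most on both metrics, while removing attention or the emotion gap costs less than half a percentage point each.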
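Fig.6  Number of title-content differences under different emotion labels
Fig.7  Frequency histogram of title-content textual differences
Fig.8  Frequency histogram of title-content emotional differences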
[1] Vosoughi S, Roy D, Aral S. The Spread of True and False News Online[J]. Science, 2018, 359(6380): 1146-1151.
doi: 10.1126/science.aap9559 pmid: 29590045
[2] Mian A, Khan S. Coronavirus: The Spread of Misinformation[J]. BMC Medicine, 2020, 18(1): Article No.89.
[3] Kwak H, Lee C, Park H, et al. What is Twitter, a Social Network or a News Media?[C]// Proceedings of the 19th International Conference on World Wide Web. 2010: 591-600.
[4] Gabielkov M, Ramachandran A, Chaintreau A, et al. Social Clicks: What and Who Gets Read on Twitter?[C]// Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. 2016: 179-192.
[5] Ecker U K H, Lewandowsky S, Chang E P, et al. The Effects of Subtle Misinformation in News Headlines[J]. Journal of Experimental Psychology: Applied, 2014, 20(4): 323-335.
doi: 10.1037/xap0000028
[6] Horne B, Adali S. This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire Than Real News[C]// Proceedings of the 2nd International Workshop on News and Public Opinion at ICWSM. 2017.
[7] Hu L M, Yang T C, Shi C, et al. Heterogeneous Graph Attention Networks for Semi-Supervised Short Text Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4823-4832.
[8] Castillo C, Mendoza M, Poblete B. Information Credibility on Twitter[C]// Proceedings of the 20th International Conference on World Wide Web. ACM, 2011: 675-684.
[9] Shu K, Sliva A, Wang S, et al. Fake News Detection on Social Media: A Data Mining Perspective[C]// Proceedings of the 2017 ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2017: 22-36.
[10] Ma J, Gao W, Mitra P, et al. Detecting Rumors from Microblogs with Recurrent Neural Networks[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. ACM, 2016: 3818-3824.
[11] Potthast M, Kiesel J, Reinartz K, et al. A Stylometric Inquiry into Hyperpartisan and Fake News[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 231-240.
[12] Shu K, Cui L M, Wang S H, et al. dEFEND: Explainable Fake News Detection[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019: 395-405.
[13] Rashkin H, Choi E, Jang J Y, et al. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 2931-2937.
[14] Liu Y, Wu Y F. Early Detection of Fake News on Social Media Through Propagation Path Classification with Recurrent and Convolutional Networks[C]// Proceedings of the 2018 AAAI Conference on Artificial Intelligence. 2018: 254-261.
[15] Shrestha A, Spezzano F. Textual Characteristics of News Title and Body to Detect Fake News: A Reproducibility Study[C]// Proceedings of the 2021 European Conference on Information Retrieval. 2021: 120-133.
[16] Ajao O, Bhowmik D, Zargari S. Sentiment Aware Fake News Detection on Online Social Networks[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. 2019: 2507-2511.
[17] Giachanou A, Rosso P, Crestani F. Leveraging Emotional Signals for Credibility Detection[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2019: 877-880.
[18] Wu L W, Rao Y. Adaptive Interaction Fusion Networks for Fake News Detection[C]// Proceedings of the 24th European Conference on Artificial Intelligence. 2020: 2220-2227.
[19] Zhang X Y, Cao J, Li X R, et al. Mining Dual Emotion for Fake News Detection[C]// Proceedings of the 2021 International Conference on World Wide Web. ACM, 2021: 3465-3476.
[20] Ghanem B, Rosso P, Rangel F. An Emotional Analysis of False Information in Social Media and News Articles[J]. ACM Transactions on Internet Technology, 2020, 20(2): Article No.19.
[21] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 4171-4186.
[22] Kant N, Puri R, Yakovenko N, et al. Practical Text Classification with Large Pre-Trained Language Models[OL]. arXiv Preprint, arXiv: 1812.01207.
[23] Kipf T N, Welling M. Semi-Supervised Classification with Graph Convolutional Networks[OL]. arXiv Preprint, arXiv: 1609.02907.
[24] Shu K, Mahudeswaran D, Wang S H, et al. FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media[J]. Big Data, 2020, 8(3): 171-188.
doi: 10.1089/big.2020.0062
[25] Yang F, Liu Y, Yu X H, et al. Automatic Detection of Rumor on Sina Weibo[C]// Proceedings of the 2012 ACM SIGKDD Workshop on Mining Data Semantics. 2012: Article No.13.
[26] Kwon S, Cha M, Jung K, et al. Prominent Features of Rumor Propagation in Online Social Media[C]// Proceedings of the IEEE 13th International Conference on Data Mining. 2013: 1103-1108.
[27] Pan J Z, Pavlova S, Li C, et al. Content Based Fake News Detection Using Knowledge Graphs[C]// Proceedings of the 17th International Semantic Web Conference. 2018: 669-683.
[28] Dun Y, Tu K, Chen C, et al. KAN: Knowledge-Aware Attention Network for Fake News Detection[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021: 81-89.
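Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing  Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn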