Data Analysis and Knowledge Discovery
An Interpretable Model of Fake News' Group Interaction Quality Based on RF-GA-XGBoost and SHAP
Wen Tingxin, Bai Yunhe
(School of Business Administration, Liaoning Technical University, Huludao 125105, China)
Abstract

[Objective] Benign group interaction plays a positive guiding role in the spread of fake news. To make full use of the inhibitory effect that the quality of social media users' group interaction has on the negative impact of fake news, and to accurately identify the causes of benign interaction and how they operate, this paper proposes an interpretable model of fake news group interaction quality that combines RF-GA-XGBoost and SHAP.

[Methods] The study takes 500 fake news items and 7,029 comments from the Weibo21 dataset as its research object. First, the group interaction quality of fake news is measured along three dimensions of the comments: content, form, and emotion. Second, text features of the fake news are extracted along the same three dimensions. Next, a sequential forward search strategy based on random forest is used to select the optimal subset of fake news text features, and a group interaction quality prediction model based on GA-XGBoost is built and compared experimentally with mainstream machine learning algorithms such as LR, SVM, and XGBoost. Finally, SHAP is used to provide causal explanations for the impact of important features on group interaction quality.
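
The feature selection step can be illustrated with a minimal sketch. It assumes that features are ranked by an importance score (for example, from a random forest) and added one at a time, keeping the subset with the best validation score. The abstract does not give the exact procedure, so this is one plausible reading rather than the authors' implementation. The feature names, importances, and the `toy_score` function are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sequential_forward_search(
    importance: Dict[str, float],
    score: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Add features in descending importance order and keep the best prefix.

    `importance` maps feature name -> importance (e.g. from a random forest).
    `score` evaluates a candidate subset (e.g. a cross-validated F1-score).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for name in ranked:
        current.append(name)
        s = score(current)
        if s > best_score:  # strict comparison: ties keep the smaller subset
            best_subset, best_score = list(current), s
    return best_subset, best_score


# Hypothetical importances and per-feature gains (NOT taken from the paper).
toy_importance = {"char_count": 0.40, "word_count": 0.30,
                  "neg_emotion_words": 0.20, "noise_feature": 0.10}
toy_gain = {"char_count": 0.50, "word_count": 0.20,
            "neg_emotion_words": 0.10, "noise_feature": -0.05}


def toy_score(subset: Sequence[str]) -> float:
    # Stand-in for a validation metric: the sum of per-feature gains.
    return sum(toy_gain[f] for f in subset)


subset, best = sequential_forward_search(toy_importance, toy_score)
print(subset, round(best, 2))
```

In practice `score` would train and validate the downstream classifier on the candidate subset; the toy function only makes the search behaviour easy to follow.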

[Results] The experimental results show that the F1-score and AUC of the GA-XGBoost model both exceed 86%, and that the model outperforms the comparison models on all six selected performance metrics. In addition, features of the fake news text such as the number of content characters, the number of words, and the number of negative emotion words are important factors affecting the quality of group interaction around fake news on social media.
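
For readers unfamiliar with the two metrics named above, the following sketch computes F1-score and AUC from scratch. It assumes binary labels (1 for high-quality interaction, 0 otherwise), which the abstract does not state explicitly; the labels and scores are invented for illustration and are not the paper's data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example data.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(round(f1, 3), round(auc, 3))
```

F1 depends on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two are often reported together.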

[Limitations] This paper does not analyze interactions among multiple features in its interpretation, nor does it use timestamps to mine the patterns of early high-quality group interaction.

[Conclusions] In summary, the interpretable prediction model can accurately capture how each feature affects group interaction quality, which helps provide effective decision support for social media platforms in improving their operational strategies and functional design.
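
The idea of attributing a prediction to individual features rests on Shapley values, which SHAP estimates. The brute-force sketch below computes exact Shapley values for a small hypothetical model, treating a feature as "absent" by setting it to a baseline value. That baseline substitution, the feature names, and the `toy_model` coefficients are all assumptions made for illustration; the SHAP library uses faster algorithms for tree ensembles and averages over background data instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of features."""
    names = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        # Features in the coalition take the instance's value;
        # the remaining features stay at the baseline value.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi: Dict[str, float] = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_model(f: Dict[str, float]) -> float:
    # Hypothetical scoring function with one interaction term.
    return (0.01 * f["char_count"] + 0.02 * f["word_count"]
            + 0.1 * f["neg_emotion_words"]
            + 0.001 * f["char_count"] * f["neg_emotion_words"])


x = {"char_count": 120.0, "word_count": 40.0, "neg_emotion_words": 3.0}
baseline = {"char_count": 80.0, "word_count": 30.0, "neg_emotion_words": 1.0}
phi = shapley_values(toy_model, x, baseline)
print({k: round(v, 4) for k, v in phi.items()})
```

The contributions sum to the difference between the model output at `x` and at `baseline`, which is the additivity property that makes such attributions easy to read feature by feature.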

Key words: Fake news; Group interaction quality; GA; XGBoost; SHAP
Published: 2024-04-19
CLC number: TP391, G206
Cite this article:
Wen Tingxin, Bai Yunhe. An Interpretable Model of Fake News' Group Interaction Quality Based on RF-GA-XGBoost and SHAP [J]. Data Analysis and Knowledge Discovery, 10.11925/infotech.2096-3467.2023.0881.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0881 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1