Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning

doi:10.11925/infotech.2096-3467.2022-0442

Data Analysis and Knowledge Discovery

Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning

Liu Meiling,Shang Yue,Zhao Tiejun,Zhou Jiyun

(School of Information and Computer Engineering, Northeast Forestry University, Harbin 150006, China) (Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China) (Lieber Institute, Johns Hopkins University, Baltimore, MD 21218, USA)

Abstract:

[Objective] This study aims to strengthen the model's learning of deep semantic information in text for fake review detection and to address the severe data imbalance in this task. [Methods] Using the user behavior features and text features of the data itself, the model computes inter-class separability to learn a cost-sensitive matrix automatically, which improves its ability to learn from unbalanced data; BERT's text encoding capability is used to further optimize the model. [Results] In extensive experiments on the YelpCHI dataset, the proposed model improves the F1 score by 18% and the AUC by 12% over existing state-of-the-art methods. [Limitations] Applying the proposed method to other research fields remains to be explored. [Conclusions] Treating user behavior features and review text features as the feature set that separates the fake review class from the genuine class, and computing class separability on that set, effectively improves fake review detection performance.

Keywords: fake review detection, class separability computation, cost-sensitive learning, unbalanced data processing
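
The abstract does not give the formula used to turn class separability into a cost-sensitive matrix, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each review is already represented as a numeric feature vector (for example, behavioral features concatenated with a text embedding), uses a Fisher-style ratio as the separability measure, and maps that ratio to costs with a hypothetical rule; the function names `fisher_separability`, `cost_matrix`, and `expected_cost` are illustrative and not taken from the paper.

```python
import numpy as np

def fisher_separability(X, y):
    """Fisher-style ratio of between-class to within-class scatter.

    Assumption: binary labels, 0 = genuine review, 1 = fake review,
    and both classes are present in y.
    """
    X0, X1 = X[y == 0], X[y == 1]
    between = np.sum((X0.mean(axis=0) - X1.mean(axis=0)) ** 2)
    within = X0.var(axis=0).sum() + X1.var(axis=0).sum()
    return between / (within + 1e-12)  # small epsilon avoids division by zero

def cost_matrix(X, y):
    """Build a 2x2 cost matrix where C[i, j] is the cost of predicting j when
    the true class is i.

    Hypothetical rule (not the paper's): missing a fake review costs more when
    fakes are rarer and when the two classes are harder to separate.
    """
    n_genuine, n_fake = np.sum(y == 0), np.sum(y == 1)
    imbalance = n_genuine / n_fake
    separability = fisher_separability(X, y)
    miss_fake_cost = imbalance * (1.0 + 1.0 / (1.0 + separability))
    return np.array([[0.0, 1.0],
                     [miss_fake_cost, 0.0]])

def expected_cost(probs, y, C):
    """Mean expected misclassification cost given predicted class probabilities."""
    return float(np.mean(np.sum(C[y] * probs, axis=1)))

# Tiny illustrative dataset: four genuine reviews, two fake ones, one feature.
X_demo = np.array([[0.0], [1.0], [0.0], [1.0], [4.0], [6.0]])
y_demo = np.array([0, 0, 0, 0, 1, 1])
C_demo = cost_matrix(X_demo, y_demo)
print(C_demo)
```

In a training loop, a quantity like `expected_cost` would replace or reweight the usual cross-entropy loss so that errors on the minority (fake) class are penalized more heavily than errors on the majority class.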

Publication date: 2022-11-11

CLC number: TP393, G250

Cite this article:

Liu Meiling, Shang Yue, Zhao Tiejun, Zhou Jiyun. Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning [J]. Data Analysis and Knowledge Discovery, doi:10.11925/infotech.2096-3467.2022-0442.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022-0442 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1

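[1]	Liu Meiling, Shang Yue, Zhao Tiejun, Zhou Jiyun. Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning [J]. Data Analysis and Knowledge Discovery, 2023, 7(6): 113-122.
[2]	Zhang Yunqiu, Li Bocheng, Chen Yan. Automatic Classification of Electronic Medical Records for Unbalanced Data [J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 233-241.
[3]	Wu Jiafen, Ma Feicheng. A Review of Methods for Identifying Fake Product Review Texts [J]. Data Analysis and Knowledge Discovery, 2019, 3(9): 1-15.
[4]	Chen Yanfang, Li Zhiyu. Fake Review Identification Based on Sentiment Orientation Evaluation of Reviewed Product Attributes [J]. New Technology of Library and Information Service, 2014, 30(9): 81-90.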
