Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning

doi:10.11925/infotech.2096-3467.2022.0442

Data Analysis and Knowledge Discovery

2023, Vol. 7, Issue (6): 113-122



Liu Meiling¹, Shang Yue¹, Zhao Tiejun², Zhou Jiyun³

¹School of Information and Computer Engineering, Northeast Forestry University, Harbin 150006, China
²School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
³Lieber Institute, Johns Hopkins University, Baltimore, MD 21218, USA


Abstract

[Objective] This study aims to enhance the detection of fake reviews by improving the model’s ability to learn deep semantic information from text and addressing the problem of data imbalance. [Methods] User behavior and text characteristics of the dataset were analyzed to automatically calculate a cost-sensitive matrix based on inter-class separability, thereby improving the model’s ability to learn from unbalanced data. Additionally, the text encoding ability of BERT was utilized to optimize the model further. [Results] Extensive experiments on the YelpCHI dataset showed that the proposed model outperformed existing advanced methods with an 18% improvement in F1 value and a 12% improvement in AUC value. [Limitations] While the proposed method has achieved promising results, further research is needed to explore its applicability to other domains. [Conclusions] Leveraging user behavior and text features for category separability calculation effectively enhances the performance of the model in detecting fake reviews. The proposed method’s integration of cost-sensitive matrix and BERT’s text encoding ability holds great potential for improving the detection of fake reviews.
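The abstract does not state how the cost-sensitive matrix is derived from inter-class separability, so the following is only a minimal sketch of the general idea and not the MIANA-B formulation. It assumes a Fisher-style ratio as a stand-in separability measure and a hypothetical rule that raises the cost of missing a fake review when the classes are imbalanced and hard to separate. All function names and the cost formula are illustrative assumptions.

```python
import math


def fisher_separability(pos, neg):
    """Fisher-style score for one feature: squared mean gap over summed variances.

    Assumption: used here as a stand-in for the paper's separability measure.
    """
    mean_p = sum(pos) / len(pos)
    mean_n = sum(neg) / len(neg)
    var_p = sum((x - mean_p) ** 2 for x in pos) / len(pos)
    var_n = sum((x - mean_n) ** 2 for x in neg) / len(neg)
    return (mean_p - mean_n) ** 2 / (var_p + var_n + 1e-12)


def cost_matrix(separability, n_fake, n_genuine):
    """Hypothetical 2x2 cost matrix: rows are true class, columns are predicted class.

    Index 0 is genuine, index 1 is fake. The cost of predicting "genuine" for a
    fake review grows with the imbalance ratio and shrinks as separability rises.
    """
    imbalance = n_genuine / n_fake
    miss_fake_cost = max(1.0, imbalance / (1.0 + separability))
    return [[0.0, 1.0], [miss_fake_cost, 0.0]]


def cost_sensitive_loss(p_fake, label, costs):
    """Cross-entropy for one example, scaled by the cost of misclassifying its true class."""
    p_fake = min(max(p_fake, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    if label == 1:
        return -costs[1][0] * math.log(p_fake)
    return -costs[0][1] * math.log(1.0 - p_fake)


# Toy feature values (made up for illustration, not taken from YelpCHI).
score = fisher_separability([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
costs = cost_matrix(separability=0.5, n_fake=100, n_genuine=900)
loss_fake = cost_sensitive_loss(0.5, 1, costs)
loss_genuine = cost_sensitive_loss(0.5, 0, costs)
```

In this sketch an equally uncertain prediction is penalized more heavily when the true label is the minority (fake) class, which is the basic mechanism by which cost-sensitive learning counteracts class imbalance. In a BERT-based classifier the same weighting would be applied to the loss computed on the model's output probabilities.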

Keywords: Fake Review Detection; Class Separability Computation; Cost-Sensitive Learning; Unbalanced Data Processing

Received: 06 May 2022 Published: 09 August 2023

ZTFLH: TP393; G250

Fund: Natural Science Foundation of Heilongjiang Province (LH2022F002); National Natural Science Foundation of China (61702091)

Corresponding Author: Liu Meiling, ORCID: 0000-0003-4208-7274, E-mail: mlliu@nefu.edu.cn


Cite this article:

Liu Meiling, Shang Yue, Zhao Tiejun, Zhou Jiyun. Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning. Data Analysis and Knowledge Discovery, 2023, 7(6): 113-122.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0442 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I6/113

Fake Review Detection Research Dataset Description

The Model Structure of MIANA-B

User Behavior Characteristics

Descriptive Statistics of User Behavior Characteristics and Review Text Characteristics

Statistics Description of YelpCHI Dataset

MIANA-B Experimental Results

Precision of MIANA-B and Comparison Models for Two Categories

Recall of MIANA-B and Comparison Models for Two Categories

F1-Score of MIANA-B and Comparison Models for Two Categories

