Unbalanced Fake Review Processing Model Based on Cost-Sensitive Learning
Liu Meiling1, Shang Yue1, Zhao Tiejun2, Zhou Jiyun3
1School of Information and Computer Engineering, Northeast Forestry University, Harbin 150006, China; 2School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 3Lieber Institute, Johns Hopkins University, Baltimore, MD 21218, USA
[Objective] This study aims to enhance the detection of fake reviews by improving the model’s ability to learn deep semantic information from text and by addressing the problem of data imbalance. [Methods] User behavior and text characteristics of the dataset were analyzed to automatically compute a cost-sensitive matrix based on inter-class separability, thereby improving the model’s ability to learn from unbalanced data. Additionally, the text encoding ability of BERT was utilized to further optimize the model. [Results] Extensive experiments on the YelpCHI dataset showed that the proposed model outperformed existing advanced methods, with an 18% improvement in F1 score and a 12% improvement in AUC. [Limitations] While the proposed method has achieved promising results, further research is needed to explore its applicability to other domains. [Conclusions] Leveraging user behavior and text features to compute inter-class separability effectively enhances the performance of the model in detecting fake reviews. The integration of a cost-sensitive matrix with BERT’s text encoding ability holds great potential for improving the detection of fake reviews.
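The abstract does not reproduce the paper’s exact inter-class separability computation, so the following is only a minimal sketch of the general cost-sensitive idea: derive a misclassification-cost matrix from class frequencies (here, inversely proportional weighting, a common simplification) and use it to reweight the cross-entropy loss so that errors on the minority (fake-review) class are penalized more heavily. All function names are illustrative, not the authors’ implementation.

```python
import numpy as np

def cost_matrix_from_frequencies(labels, n_classes=2):
    """Build a simple cost-sensitive matrix where misclassifying a
    minority-class sample costs more, in proportion to class imbalance.

    cost[i, j] is the cost of predicting class j when the true class is i;
    the diagonal (correct predictions) stays at zero.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    total = counts.sum()
    # weight grows as the class becomes rarer
    weights = total / (n_classes * counts)
    cost = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        for j in range(n_classes):
            if i != j:
                cost[i, j] = weights[i]
    return cost

def weighted_cross_entropy(probs, labels, cost):
    """Cross-entropy where each sample is scaled by the total cost of
    misclassifying its true class (the row sum of the cost matrix)."""
    row_cost = cost.sum(axis=1)
    sample_w = row_cost[labels]
    # probability assigned to each sample's true class
    p_true = probs[np.arange(len(labels)), labels]
    ce = -np.log(p_true + 1e-12)
    return float(np.mean(sample_w * ce))
```

With a 9:1 genuine-to-fake split, the matrix assigns a misclassified fake review roughly nine times the cost of a misclassified genuine one, which is the behavior a frequency-based cost-sensitive loss is meant to produce; the paper’s separability-based matrix refines this by also looking at how well the classes separate in feature space.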
I went here on Friday night during Restaurant Week. The lamb meat was excellent. The fillet mignon wasn’t as good as I had hoped. I loved their pao de queijo. However, I was disappointed when the server brought the banana cream pie. It was too sweet, and we didn’t even get a chance to choose our dessert.
My girlfriend and I reserve this as a “special” dinner place for certain occasions. Tremendous neopolitan pizza, great insalata mista, and easily the best limoncello anywhere. It’s tough to get a table at prime time on a weekend - go there on a weeknight. One of the fun things about going here is how relaxed and happy all the patrons seem. The owner is very nice, too, and will visit your table if things aren’t too busy.