Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 55-70    DOI: 10.11925/infotech.2096-3467.2021.1259
Identifying Fake Accounts with User-Review-Shop Relationship and User Deviation Analysis
Meng Yuan, Wang Yue
School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
Abstract  

[Objective] This paper proposes a model for identifying fake accounts that combines the user-review-shop (URS) fake degree relationship with measures of user deviation. [Methods] First, we measured each user's deviation in review content and behavior using the mean method, JS divergence and KL divergence. Then, we constructed the URS-FDIRM model and used it to identify fake users in experimental data from mafengwo.com. [Results] The proposed model effectively measured users' deviations in content and behavior, and the F1 value of the URS-FDIRM model reached 92.57%. [Limitations] The method relies mainly on conventional measurements to extract the deviation index and does not include further deviation measures based on user behavior. [Conclusions] The proposed method helps reveal false relationships among users, reviews and shops, and helps monitor abnormal user behavior.

Key words: User Deviation; Reinforcing Relationship; Fake User Identification; Mean Deviation; Fake Degree
Received: 04 November 2021      Published: 28 July 2022
ZTFLH:  TP391  
Fund: Shanghai Philosophy and Social Sciences Planning Project (2020BGL009); Graduate Research Innovation Cultivation Project of Shanghai University of International Business and Economics (2021-030800-05)
Corresponding Authors: Meng Yuan     E-mail: nancymeng@suibe.edu.cn

Cite this article:

Meng Yuan, Wang Yue. Identifying Fake Accounts with User-Review-Shop Relationship and User Deviation Analysis. Data Analysis and Knowledge Discovery, 2022, 6(6): 55-70.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1259     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I6/55

Design of URS-FDIRM
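| Type | Indicator (abbreviation) | Number of indicators |
|---|---|---|
| User fake degree indicators | User activity level (UL); Number of social followers (UF); Question-to-answer ratio (UQA); Time interval (UTS); Number of burst reviews (URB); Number of reviews (URN); Review frequency (URF); Review concentration (URC); Review concentration within shops (USC) | 9 |
| Review fake degree indicators | Review length (RL); Extreme rating (RR); Number of review images (RPN); Review similarity (RS) | 4 |
| Shop fake degree indicators | Shop age (SA); Shop size (SS); Number of early reviews (SRN); Number of high-review users (SUN) | 4 |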
Indicators Related to User-Review-Shop Fake Degree
Reinforced Relationship of URS
Process of Fake Users Identification
Cumulative Distribution of Number of User Reviews and Number of Users
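| Object | Indicator | Description | Min | Max | Median | Mean | Std. dev. |
|---|---|---|---|---|---|---|---|
| User | UL | User activity level | 6.0000 | 45.0000 | 17.0000 | 16.8400 | 3.4600 |
| User | UF | Number of social followers | 0.0000 | 5927.0000 | 1182.0000 | 960.8200 | 520.7300 |
| User | UQA | Question-to-answer ratio | 0.0000 | 1.0000 | 0.0000 | 0.0407 | 0.1566 |
| User | UTS | Time interval between reviews | 1.0000 | 4087.0000 | 2049.0000 | 1985.2700 | 433.9700 |
| User | URB | Number of burst reviews | 1.0000 | 132.0000 | 2.0000 | 2.7800 | 7.2900 |
| User | URN | Number of reviews | 51.0000 | 301.0000 | 54.0000 | 56.6000 | 12.6000 |
| User | URF | Review frequency | 0.0142 | 64.0000 | 0.0269 | 0.0930 | 1.5396 |
| User | URC | Review concentration | 0.0122 | 0.9876 | 0.0370 | 0.0430 | 0.0662 |
| User | USC | Review concentration within shops | 0.1818 | 1.0000 | 1.0000 | 0.9943 | 0.0225 |
| Review | RL | Review length | 0.0000 | 576.0000 | 9.0000 | 12.4700 | 13.3300 |
| Review | RR | Extreme rating | 0.0000 | 5.0000 | 5.0000 | 4.4300 | 0.7700 |
| Review | RPN | Number of images | 0.0000 | 10.0000 | 0.0000 | 0.0020 | 0.0580 |
| Review | RS | Review similarity | 0.0000 | 1.0000 | 0.0280 | 0.0390 | 0.0450 |
| Shop | SA | Shop age | 0.0000 | 108.0000 | 8.0000 | 9.8076 | 7.6127 |
| Shop | SS | Shop size | 1.0000 | 1500.0000 | 80.0000 | 102.0700 | 92.6100 |
| Shop | SRN | Number of early reviews | 1.0000 | 73.0000 | 5.0000 | 7.5965 | 9.9107 |
| Shop | SUN | Number of high-review users (≥50) | 2.0000 | 73.0000 | 5.0000 | 7.5513 | 9.7820 |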
Descriptive Statistics
Distribution of User-Review and Shop-User
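| Test object | \|cor\| before iteration | \|cor\| after iteration | \|T value\| before iteration | \|T value\| after iteration | P value before iteration | P value after iteration |
|---|---|---|---|---|---|---|
| User-review fake degree | 0.3797 | 0.0513 | 17.5440 | 2.1971 | 0.0000*** | 0.0281*** |
| Shop-user fake degree | 0.0948 | 0.0983 | 11.0750 | 11.4880 | 0.0000*** | 0.0000*** |
| Review-shop fake degree | 0.4582 | 0.4582 | 165.3000 | 165.3000 | 0.0000*** | 0.0000*** |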
Correlation Coefficient Test
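The correlation test reports a correlation coefficient together with a T value and a P value. As a minimal sketch of how such a pair is usually obtained, the Python below computes a Pearson correlation and the standard statistic t = r·sqrt(n − 2)/sqrt(1 − r²). The function names, the sample vectors and the sample size are illustrative assumptions; this section does not state the sample sizes or confirm which test the authors ran.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations (|r| < 1)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical fake-degree scores for four users and their reviews.
user_fake = [1.0, 2.0, 3.0, 4.0]
review_fake = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(user_fake, review_fake)
print(round(r, 4), round(t_statistic(r, len(user_fake)), 4))
```

For a fixed r the statistic grows with sqrt(n − 2), which is why a weak correlation can still come with a very small P value on a large sample.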
Deviation Display
User Fake Degree Deviation Analysis
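| Rule | Description |
|---|---|
| 1 | If a user's reviews consistently differ greatly from those of other users of the same shops, the user is suspicious. For example, a user who always gives very high ratings to the shops they review, while the other users rate those shops low, is suspicious [10]. |
| 2 | If a user's reviews are consistently very similar to reviews already posted by other users of the same shops, the user is suspicious, because fake users often copy existing reviews in order to post quickly and raise their influence [15]. |
| 3 | If the great majority of a user's reviews are concentrated on one or a few shops and are always positive or always negative, the user is suspicious. Collusion between the user and the shop is likely in this case [18]. |
| 4 | Because the data used in this paper are hotel data, which have their own particularities, a user who posts a large number of reviews within one day is suspicious. |
| 5 | Judging from the review text alone, if a user's reviews always follow a fixed template or are an illogical pile of words, the user is suspicious. |
| 6 | Visit the user's home page, examine the user's data and everyday behavior, and judge subjectively whether the user is suspicious. |
| 7 | When labeling data, weigh all relevant information together; do not make a presumptive judgment from a single aspect. |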
Tagging Rules
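The tagging rules are guidelines for human annotators, but rules 3 and 4 describe quantities that can be counted before manual inspection. The Python below is a rough sketch of two such pre-screening counts. The function names and the sample data are hypothetical, and the share computed here is not claimed to be the paper's definition of URC or USC.

```python
from collections import Counter
from datetime import datetime


def max_reviews_in_one_day(timestamps):
    """Largest number of reviews posted on any single calendar day (rule 4)."""
    per_day = Counter(ts.date() for ts in timestamps)
    return max(per_day.values(), default=0)


def top_shop_share(shop_ids):
    """Share of a user's reviews that went to their most-reviewed shop (rule 3)."""
    if not shop_ids:
        return 0.0
    counts = Counter(shop_ids)
    return counts.most_common(1)[0][1] / len(shop_ids)


# Hypothetical review log for one user: (time posted, shop id).
review_times = [
    datetime(2021, 5, 1, 9, 0),
    datetime(2021, 5, 1, 9, 5),
    datetime(2021, 5, 1, 21, 30),
    datetime(2021, 5, 3, 8, 0),
]
review_shops = ["hotel_a", "hotel_a", "hotel_b", "hotel_a"]

print(max_reviews_in_one_day(review_times))  # 3 reviews on 2021-05-01
print(top_shop_share(review_shops))          # 0.75
```

Counts like these only rank candidates for inspection; rule 7 still requires the annotator to weigh all the evidence together.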
| Model | F1 |
|---|---|
| URS-FDIRM_MEAN | 0.9140 |
| URS-FDIRM_JS | 0.8781 |
| URS-FDIRM_KL | 0.8746 |
| URS-FDIRM_WITHOUT_DEV | 0.8710 |
The Experimental Results
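The three model variants differ in how user deviation is measured: mean deviation, JS divergence or KL divergence. The sketch below shows the textbook form of each measure, applied to a user's ratings versus the ratings of other users. The function names, the natural-log base, the epsilon guard and the sample data are assumptions made for illustration; the paper's exact formulation is not given in this section.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_deviation(user_ratings, reference_means):
    """Average absolute gap between a user's ratings and the shops' mean ratings."""
    if not user_ratings or len(user_ratings) != len(reference_means):
        raise ValueError("need two non-empty sequences of equal length")
    gaps = [abs(u - r) for u, r in zip(user_ratings, reference_means)]
    return sum(gaps) / len(gaps)


# Hypothetical 1-to-5-star rating histograms: one user vs. all other users.
user_dist = normalize([0, 0, 0, 1, 9])
others_dist = normalize([2, 3, 5, 6, 4])
print(kl_divergence(user_dist, others_dist))
print(js_divergence(user_dist, others_dist))
print(mean_deviation([5, 5, 4], [3, 2, 4]))
```

KL divergence is asymmetric and unbounded when the reference distribution has near-zero mass where the user has mass, whereas JS divergence is symmetric and bounded.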
| k | P | R | F1 |
|---|---|---|---|
| 230 | 0.9610 | 0.7957 | 0.8706 |
| 240 | 0.9544 | 0.8244 | 0.8846 |
| 250 | 0.9442 | 0.8495 | 0.8943 |
| 260 | 0.9349 | 0.8746 | 0.9037 |
| 270 | 0.9225 | 0.8961 | 0.9091 |
| 280 | 0.9146 | 0.9211 | 0.9179 |
| 290 | 0.9038 | 0.9427 | 0.9228 |
| 300 | 0.8904 | 0.9606 | 0.9241 |
| 310 | 0.8650 | 0.9642 | 0.9119 |
| 320 | 0.8442 | 0.9713 | 0.9033 |
| 330 | 0.8187 | 0.9713 | 0.8885 |
| 340 | 0.7977 | 0.9749 | 0.8774 |
Experimental Results Under Different k Values
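In the table of results under different k values, precision falls and recall rises as k grows. That pattern is what one would expect if k is the number of highest-ranked users flagged as fake, although this section does not define k explicitly. Under that assumption, the metrics can be sketched as follows; the function names and the toy ranking are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1_at_k(ranked_users, fake_users, k):
    """Flag the k highest-ranked users as fake and score against the labels."""
    flagged = set(ranked_users[:k])
    true_positives = len(flagged & set(fake_users))
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(fake_users) if fake_users else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy example: users sorted by descending fake degree, plus labeled fakes.
ranking = ["u1", "u2", "u3", "u4", "u5"]
labeled_fake = {"u1", "u3", "u6"}
print(precision_recall_f1_at_k(ranking, labeled_fake, k=2))

# Cross-check against the first table row (k = 230): P = 0.9610, R = 0.7957.
print(round(f1_score(0.9610, 0.7957), 4))  # 0.8706
```

The harmonic mean penalizes an imbalance between precision and recall, which is why the tabulated F1 peaks at k = 300 rather than at either end of the range.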
Change of F1 Score
| λ | F1 |
|---|---|
| 0.0 | 0.8710 |
| 0.1 | 0.9140 |
| 0.2 | 0.9104 |
| 0.3 | 0.8423 |
| 0.4 | 0.7706 |
| 0.5 | 0.7061 |
| 0.6 | 0.5914 |
| 0.7 | 0.4839 |
| 0.8 | 0.3441 |
| 0.9 | 0.2222 |
| 1.0 | 0.1792 |
F1 Score Under Different λ
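The F1 at λ = 0.0 equals that of URS-FDIRM_WITHOUT_DEV (0.8710), and the F1 at λ = 0.1 equals that of URS-FDIRM_MEAN (0.9140), which suggests that λ weights the deviation term. The combination rule itself is not given in this section, so the sketch below assumes a simple convex combination purely for illustration. `blended_fake_degree` and `best_lambda` are hypothetical names; only the F1 values come from the table.

```python
def blended_fake_degree(relation_score, deviation_score, lam):
    """ASSUMED form: convex mix of a relationship-based score and a deviation score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * relation_score + lam * deviation_score


def best_lambda(f1_by_lambda):
    """Return the lambda with the highest F1 in a {lambda: F1} mapping."""
    return max(f1_by_lambda, key=f1_by_lambda.get)


# F1 values copied from the table of F1 scores under different lambda.
F1_BY_LAMBDA = {
    0.0: 0.8710, 0.1: 0.9140, 0.2: 0.9104, 0.3: 0.8423,
    0.4: 0.7706, 0.5: 0.7061, 0.6: 0.5914, 0.7: 0.4839,
    0.8: 0.3441, 0.9: 0.2222, 1.0: 0.1792,
}

print(best_lambda(F1_BY_LAMBDA))            # 0.1
print(blended_fake_degree(0.8, 0.2, 0.1))   # hypothetical scores
```

On the tabulated grid a small deviation weight helps, while larger weights steadily reduce F1.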
| Model | P | R | F1 |
|---|---|---|---|
| URS-FDIRM_MEAN | 0.8933 | 0.9606 | 0.9257 |
| LR | 0.8815 | 0.4741 | 0.6131 |
| RF | 0.7205 | 0.1316 | 0.2216 |
| KNN | 0.9392 | 0.7370 | 0.8257 |
| DNN | 1.0000 | 0.7500 | 0.8571 |
| Fsum | 0.2629 | 0.7097 | 0.3837 |
Algorithm Performance
| Indicator | Mean sq | F value | Pr(>F) |
|---|---|---|---|
| UL | 0.2050 | 35.4900 | 0.0000*** |
| UF | 0.3141 | 41.6000 | 0.0000*** |
| UQA | 8.1330 | 405.0000 | 0.0000*** |
| UTS | 0.5282 | 55.3100 | 0.0000*** |
| URB | 0.1340 | 44.9600 | 0.0000*** |
| URN | 0.0451 | 26.0800 | 0.0000*** |
| URF | 0.0000 | 0.0440 | 0.8330 |
| URC | 0.2818 | 64.9800 | 0.0000*** |
| USC | 0.0153 | 30.5500 | 0.0000*** |
| RL | 0.0000 | 0.5230 | 0.4700 |
| RR | 0.0138 | 0.7450 | 0.3880 |
| RPN | 0.0000 | 1.8540 | 0.1730 |
| RS | 0.0002 | 2.6660 | 0.1030 |
| SA | 0.0052 | 32.1700 | 0.0000*** |
| SS | 0.0060 | 32.0200 | 0.0000*** |
| SRN | 0.0444 | 20.7200 | 0.0001*** |
| SUN | 0.0460 | 21.9900 | 0.0000*** |
ANOVA Summary
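The ANOVA summary lists a mean square, an F value and Pr(>F) for each indicator. As a hedged illustration of where such numbers come from, the sketch below computes a one-way ANOVA F statistic for one indicator split into groups. Grouping by labeled fake versus genuine users is an assumption, since the factor is not stated in this section, and the function name and sample values are hypothetical.

```python
def one_way_anova(groups):
    """Return (between-group mean square, within-group mean square, F)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    if ms_within == 0:
        raise ValueError("F is undefined when within-group variance is zero")
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical values of one indicator for genuine vs. fake users.
genuine = [1.0, 2.0, 3.0]
fake = [2.0, 3.0, 4.0]
print(one_way_anova([genuine, fake]))  # (1.5, 1.0, 1.5)
```

A large F means the group means differ by more than the within-group spread would explain; Pr(>F) is then read from the F distribution with (k − 1, n − k) degrees of freedom.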
| Removed indicator | F1 of the new model |
|---|---|
| URF | 0.8961 |
| RL | 0.4086 |
| RR | 0.8602 |
| RPN | 0.8817 |
| RS | 0.8638 |
Algorithm Performance Without Some Indicators
[1] Hu N, Bose I, Koh N S, et al. Manipulation of Online Reviews: An Analysis of Ratings, Readability, and Sentiments[J]. Decision Support Systems, 2012, 52(3): 674-684.
[2] Wu Y Y, Ngai E W T, Wu P K, et al. Fake Online Reviews: Literature Review, Synthesis, and Directions for Future Research[J]. Decision Support Systems, 2020, 132: 113280.
[3] Song Haixia, Yan Xin, Yu Zhengtao, et al. Detection of Fake Reviews Based on Adaptive Clustering[J]. Journal of Nanjing University (Natural Sciences), 2013, 49(4): 433-438. (in Chinese)
[4] Deng Song, Wan Changxuan, Guan Aihao, et al. Deceptive Reviews Detection of Technology Products Based on Behavior and Content[J]. Journal of Chinese Computer Systems, 2015, 36(11): 2498-2503. (in Chinese)
[5] Mukherjee A, Kumar A, Liu B, et al. Spotting Opinion Spammers Using Behavioral Footprints[C]// Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013: 632-640.
[6] Xu Q K, Zhao H. Using Deep Linguistic Features for Finding Deceptive Opinion Spam[C]// Proceedings of the 24th International Conference on Computational Linguistics. ACL, 2012: 1341-1350.
[7] Feng S, Banerjee R, Choi Y. Syntactic Stylometry for Deception Detection[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. ACL, 2012: 171-175.
[8] Goswami K, Park Y, Song C. Impact of Reviewer Social Interaction on Online Consumer Review Fraud Detection[J]. Journal of Big Data, 2017, 4: 15.
[9] Wang X P, Liu K, Zhao J. Handling Cold-Start Problem in Review Spam Detection by Jointly Embedding Texts and Behaviors[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. ACL, 2017: 366-376.
[10] Wang G, Xie S H, Liu B, et al. Identify Online Store Review Spammers via Social Review Graph[J]. ACM Transactions on Intelligent Systems and Technology, 2012, 3(4): 1-21.
[11] Yu Chuanming, Feng Bolin, Zuo Yuheng, et al. An Individual-Group-Merchant Relation Model for Identifying Online Fake Reviews[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 262-272. (in Chinese)
[12] Liu Y C, Pang B. A Unified Framework for Detecting Author Spamicity by Modeling Review Deviation[J]. Expert Systems with Applications, 2018, 112: 148-155.
[13] Shan G H, Zhou L N, Zhang D S. From Conflicts and Confusion to Doubts: Examining Review Inconsistency for Fake Review Detection[J]. Decision Support Systems, 2021, 144: 113513.
[14] Peng Qingxi, Qian Tieyun. Store Review Spam Detection Based on Quantitative Sentiment[J]. Journal of Shandong University (Natural Science), 2013, 48(11): 66-72. (in Chinese)
[15] Cao J X, Xia R Q, Guo Y F, et al. Collusion-Aware Detection of Review Spammers in Location Based Social Networks[J]. World Wide Web, 2019, 22(6): 2921-2951.
[16] Wu Jiafen, Ma Feicheng. Detecting Product Review Spam: A Survey[J]. Data Analysis and Knowledge Discovery, 2019, 3(9): 1-15. (in Chinese)
[17] Yuan Deyu, Zhang Yifan, Gao Jian, et al. Abnormal User Detection Method in Sina Weibo Based on User Feature Extraction[J]. Computer Science, 2020, 47(S1): 364-368. (in Chinese)
[18] Zhang Wenyu, Yue Kun, Zhang Binbin. Detecting E-Commerce Review Spammer Based on D-S Evidence Theory[J]. Journal of Chinese Computer Systems, 2018, 39(11): 2428-2435. (in Chinese)
[19] Shao Zhufeng, Ji Donghong. Spotting Fake Reviewers Based on Sentiment Features and Users’ Relationship[J]. Computer Applications and Software, 2016, 33(5): 158-161. (in Chinese)
[20] Ye J T, Akoglu L. Discovering Opinion Spammer Groups by Network Footprints[C]// Proceedings of the 2015 ACM Conference on Online Social Networks. ACM, 2015: 97.