Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (8): 84-96    DOI: 10.11925/infotech.2096-3467.2021.1245
IMTS: Detecting Fake Reviews with Image and Text Semantics
Shi Yunmei1,2, Yuan Bo1,2, Zhang Le1,2, Lv Xueqiang1
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Abstract  

[Objective] This paper proposes IMTS, a fake review detection method for Chinese e-commerce websites that integrates image information with text semantics, to address the proliferation of fake reviews posted by the “Internet Water Army” (paid posters). [Methods] First, we used a text convolutional neural network (TextCNN) and the pre-trained BERT model to extract features from the review text and obtain the corresponding feature vectors. Second, we concatenated the review text semantics with the output features of the reviewer ID, so that reviewer features strengthen the model’s capture of the overall semantic information. Third, we used a Residual Network (ResNet) to extract visual features from the pictures that users posted with their reviews. Finally, we fused the text features and the visual features to detect fake reviews. [Results] On a self-built multimodal Chinese fake review dataset, IMTS achieved 96.36% accuracy, 96.35% recall, and an F1 value of 96.35%. [Limitations] The dataset is small in scale, and the text processing stage relies on the pre-trained BERT model. [Conclusions] The proposed method effectively improves the overall accuracy of fake review detection.
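
The fusion order described in the abstract (text semantics, then reviewer ID features, then visual features) can be illustrated with a minimal sketch. It assumes plain concatenation followed by a single logistic unit; the abstract does not specify the actual fusion layer, so this is a swapped-in simplification. The vector sizes, the random stand-in features, and the names `concat_features` and `linear_sigmoid` are all hypothetical.

```python
import math
import random


def concat_features(*vectors):
    """Splice (concatenate) several feature vectors into one flat list."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused


def linear_sigmoid(features, weights, bias):
    """Single logistic unit: maps a feature vector to a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical stand-ins for the real encoders (BERT + TextCNN, a reviewer ID
# embedding, ResNet). Real feature vectors would be far larger.
rng = random.Random(0)
text_vec = [rng.uniform(-1, 1) for _ in range(8)]   # text semantic features
id_vec = [rng.uniform(-1, 1) for _ in range(4)]     # reviewer ID features
image_vec = [rng.uniform(-1, 1) for _ in range(6)]  # visual features

# Step 1: splice the text semantics with the reviewer ID features.
text_id_vec = concat_features(text_vec, id_vec)
# Step 2: fuse the result with the visual features.
fused_vec = concat_features(text_id_vec, image_vec)

# Untrained, randomly initialised weights (illustration only, not a model).
weights = [rng.uniform(-0.1, 0.1) for _ in range(len(fused_vec))]
prob_fake = linear_sigmoid(fused_vec, weights, bias=0.0)
print(len(fused_vec), round(prob_fake, 3))
```

In a trained system the weights would be learned jointly with the encoders; here they only show how the fused vector feeds a classifier.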

Key words: False comment; Multimodal; Text; Image; BERT
Received: 31 October 2021      Published: 23 September 2022
ZTFLH:  TP393  
Fund: National Key R&D Program of China (2018YFB1004100); National Natural Science Foundation of China (62171043)
Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X, E-mail: zhangle@bistu.edu.cn

Cite this article:

Shi Yunmei, Yuan Bo, Zhang Le, Lv Xueqiang. IMTS: Detecting Fake Reviews with Image and Text Semantics. Data Analysis and Knowledge Discovery, 2022, 6(8): 84-96.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1245     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I8/84

Flow Chart of Text Semantic Information Extraction
Text Semantic Feature Fusion Model
Flow of Multimodal Fusion
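Fake review type | Characteristics | Examples
Random text with no opinion and no discernible sentiment | Sentences that carry no real meaning and no logic | 大家搜集的哈是孤独一个牙刷难道你就 构建能摧毁阿花股表弟爸爸读发布了豆 阿萨德两个你好像比撒打算大家公司 (incoherent character strings, left untranslated)
Non-review | Pile-up of text and symbols | @)¥()*(&@&@搭配上看懂江湖胡汉三可能就@*¥(&*……&@&%&@#)哦咨询哦家吃饭哈哈斯哈斯哈的
Non-review | Pile-up of symbols only | &¥(!*&¥&!@……!@)())!&@#&#*&!*(@……#&**&%¥*!@)(*)(¥*!)@*#(&*!%@#*%……&!%@#%
Non-review | Pile-up of digits only | 11111111111111111111111111
Off-topic review | Logically coherent but unrelated to the product's attributes | (1) "The seller said I only get the cashback after typing fifteen characters. Just here to earn experience points." (2) "I can't help it, because I have to type fifteen characters." (3) "No need to count, this is exactly fifteen characters."
Advertising review | Large numbers of repeated sentences written to pad the count of positive reviews | (1) "Good review! Good review! Good review! Good review! Good review! Good review!" (2) "Very good! Works well! Very good! Works well!" (3) "Super, super good-looking!!!!! Super, super good-looking!!!!!"
Deceptive review | Reviews that merchants obtain from users through "cashback for positive reviews" schemes: heavily templated, with a fixed writing format and symbol ratio, a single form of sentiment expression, and no real usage experience | (1) "Suits every skin tone! Concealing effect: good! Lasts six hours! It happens to suit my skin type, very moisturizing, suits anyone. Whatever your skin is like, it adapts perfectly. Really, really good, great value for money, will definitely buy again!" (2) "Appearance: quite pretty, very refined, very smooth, no damage. Screen and sound: excellent, no noise. Camera: photos look good and clear, fast response. Very pretty. Speed: fast, very fast. Standby time: not bad, runs OK too, anyway I recommend everyone buy it, very good." (3) "Appearance: I have always liked black, it really looks great. Standby time: acceptable, actually only slightly different from the previous generation. Screen and sound: the sound quality is good, loud, and pretty. Camera: a big improvement, needless to say, I can't put it down. Delivery was fast too!"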
Example of Building Rule
Case of False Comments
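
Several of the categories in the table above are defined by surface patterns (digits only, symbols only, text mixed with symbols, repeated phrases), which lend themselves to simple rules. The sketch below is one possible set of heuristics, not the authors' rules: the function name `classify_non_review`, the label strings, and the thresholds (0.5 for repetition, 0.4 for symbol ratio) are all assumptions made for illustration.

```python
import re
from collections import Counter


def classify_non_review(text, repeat_share=0.5, symbol_share=0.4):
    """Return a coarse label for obviously non-review text, or None.

    Thresholds are hypothetical and would need tuning on labelled data.
    """
    stripped = re.sub(r"\s+", "", text)
    if not stripped:
        return "empty"
    if stripped.isdigit():
        return "digits_only"
    # str.isalnum() is True for letters, digits and CJK characters,
    # so a string with no alphanumeric character is symbols only.
    if not any(ch.isalnum() for ch in stripped):
        return "symbols_only"
    # Split on runs of non-word characters (punctuation) to get phrases.
    segments = [seg for seg in re.split(r"\W+", stripped) if seg]
    if len(segments) >= 3:
        top_count = Counter(segments).most_common(1)[0][1]
        if top_count / len(segments) >= repeat_share:
            return "repetitive"
    symbols = sum(1 for ch in stripped if not ch.isalnum())
    if symbols / len(stripped) >= symbol_share:
        return "text_symbol_mix"
    return None


samples = [
    "11111111111111111111111111",
    "&¥(!*&¥&!@",
    "@)¥()*(&@&@搭配上看懂",
    "好评!好评!好评!好评!",
    "物流很快,包装完好,手机用起来很流畅",
]
for sample in samples:
    print(repr(sample), "->", classify_non_review(sample))
```

Rules like these only catch the crudest cases; deceptive reviews written from templates read as ordinary text and need a learned model.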
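Model input | Model group | Model | Accuracy | Recall | F1 value
Text | Single model | Bi-LSTM | 0.6534 | 0.6525 | 0.6525
Text | Single model | TextCNN | 0.6883 | 0.6693 | 0.6656
Text | Single model | BERT | 0.8626 | 0.8438 | 0.8429
Text | Combined model | BERT+LSTM | 0.8856 | 0.8854 | 0.8853
Text | Combined model | BERT+TextCNN | 0.9093 | 0.9089 | 0.9089
Text | Combined model + ID | BERT+LSTM+ID | 0.8915 | 0.8915 | 0.8915
Text | Combined model + ID | BERT+TextCNN+ID | 0.9373 | 0.9349 | 0.9346
Image | Single model | CNN | 0.8313 | 0.8307 | 0.8303
Image | Single model | VGG | 0.8554 | 0.8490 | 0.8475
Image | Single model | ResNet | 0.8724 | 0.8721 | 0.8724
Text + Image | Combined model | EANN | 0.8499 | 0.8490 | 0.8491
Text + Image | Combined model | BERT+TextCNN+ResNet | 0.9559 | 0.9557 | 0.9557
Text + Image | Combined model + ID | EANN+ID | 0.8992 | 0.8992 | 0.8992
Text + Image | Combined model + ID | IMTS | 0.9636 | 0.9635 | 0.9635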
Identification Results of False Comments
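
For readers comparing the columns of the results table, the sketch below shows how accuracy, recall, and F1 relate to each other in the standard binary setting, with "fake" treated as the positive class. This is an assumption: the page does not say whether the reported scores are binary, macro-averaged, or weighted. The function name `binary_metrics` is hypothetical and the counts are made up; they are not the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only.
scores = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print({name: round(value, 4) for name, value in scores.items()})
```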