IMTS: Detecting Fake Reviews with Image and Text Semantics
Shi Yunmei1,2, Yuan Bo1,2, Zhang Le1,2, Lv Xueqiang1
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2 School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Abstract [Objective] This paper proposes IMTS, a fake review detection method for Chinese e-commerce websites that integrates image information with text semantics, to counter the proliferation of fake reviews posted by the “Internet Water Army” (paid posters). [Methods] First, we used a text convolutional neural network (TextCNN) and the pre-trained BERT model to extract features from the review text and obtain the corresponding feature vectors. Second, we spliced the review text semantics with the output features of the reviewer ID, so that the model captures more of the overall semantic information. Third, we used a Residual Network (ResNet) to extract visual features from the pictures that users posted with their reviews. Finally, we fused the text features and visual features to detect fake reviews. [Results] On a self-built multimodal Chinese fake review dataset, IMTS achieved 96.36% accuracy, 96.35% recall, and a 96.35% F1 score. [Limitations] The dataset is small in scale, and the text processing stage relies on the pre-trained BERT model. [Conclusions] The proposed method can effectively improve the overall accuracy of fake review detection.
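The four steps in the Methods summary can be illustrated with a minimal sketch. It is not the authors' implementation: short hand-written lists stand in for the TextCNN, BERT, reviewer-ID, and ResNet outputs, the fusion operator is assumed to be plain concatenation (the abstract only says the reviewer features are spliced), and the logistic classifier head with its `weights` and `bias` is a hypothetical placeholder, as are the function names `splice` and `fake_probability`.

```python
import math


def splice(*vectors):
    """Concatenate feature vectors end to end (feature-level fusion)."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fake_probability(text_cnn_vec, bert_vec, reviewer_vec, image_vec,
                     weights, bias=0.0):
    """Score one review from already-extracted features.

    The input vectors are stand-ins for the outputs of TextCNN, BERT,
    a reviewer-ID embedding, and ResNet; the linear head is an assumption.
    """
    # Steps 1-2: splice both text views with the reviewer-ID features.
    text_features = splice(text_cnn_vec, bert_vec, reviewer_vec)
    # Steps 3-4: fuse the text features with the visual features.
    fused = splice(text_features, image_vec)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(score)


if __name__ == "__main__":
    # Toy feature vectors; real ones would have hundreds of dimensions.
    prob = fake_probability(
        text_cnn_vec=[0.2, -0.1],
        bert_vec=[0.5, 0.3, -0.4],
        reviewer_vec=[1.0],
        image_vec=[0.7, 0.1],
        weights=[0.0] * 8,  # untrained placeholder weights
    )
    print(prob)  # 0.5 with all-zero (untrained) weights
```

Concatenation keeps each modality's features intact and leaves it to the classifier to weigh them, which is why it is a common baseline for this kind of fusion; the paper itself may use a different operator.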
Received: 31 October 2021
Published: 23 September 2022
Fund: National Key R&D Program of China (2018YFB1004100); National Natural Science Foundation of China (62171043)
Corresponding author: Zhang Le, ORCID: 0000-0002-9620-511X
E-mail: zhangle@bistu.edu.cn