Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (8): 84-96     https://doi.org/10.11925/infotech.2096-3467.2021.1245
Research Paper
IMTS: Detecting Fake Reviews with Image and Text Semantics
Shi Yunmei1,2, Yuan Bo1,2, Zhang Le1,2, Lv Xueqiang1
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2 School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Abstract

[Objective] To counter the flood of fake reviews that paid posters (the "Internet water army") publish on e-commerce websites, this paper presents IMTS, a fake review detection method for Chinese e-commerce reviews that fuses image information with text semantics. [Methods] IMTS uses a text convolutional neural network (TextCNN) and the pre-trained BERT model to extract features from the review text and obtain the corresponding feature vectors. It then adds reviewer features: the review text semantics are concatenated with the output features of the reviewer ID, which strengthens the model's capture of the overall semantic information. A residual network (ResNet) extracts visual features from the pictures that users post with their reviews. Finally, the text features and visual features are fused across modalities to detect fake reviews. [Results] On a self-built multimodal dataset of Chinese fake reviews, IMTS achieved an accuracy of 96.36%, a recall of 96.35%, and an F1 score of 96.35%. [Limitations] Because of limited computing power, the dataset is small. In addition, the text processing stage relies on the pre-trained BERT model, so the time cost is high for large-scale data. [Conclusions] Supplementing review text features with multimodal information and feature fusion is effective for detecting fake reviews and improves overall detection accuracy.

Key words: Fake review    Multimodal    Text    Image    BERT
Received: 2021-10-31      Published: 2022-09-23
CLC number (ZTFLH):  TP393
Funding: *This work is one of the research results of the National Key R&D Program of China (2018YFB1004100) and the National Natural Science Foundation of China (62171043).
Corresponding author: Zhang Le, ORCID: 0000-0002-9620-511X     E-mail: zhangle@bistu.edu.cn
Cite this article:
Shi Yunmei, Yuan Bo, Zhang Le, Lv Xueqiang. IMTS: Detecting Fake Reviews with Image and Text Semantics[J]. Data Analysis and Knowledge Discovery, 2022, 6(8): 84-96.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1245      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I8/84
Fig.1  Text semantic information extraction workflow
Fig.2  Text semantic feature fusion model
Fig.3  Multimodal fusion workflow
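The abstract describes a pipeline in which text features (BERT and TextCNN), reviewer-ID features, and ResNet image features are joined before classification. The following is a minimal sketch of that concatenation-based fusion in plain Python, not the authors' implementation. The feature extractors are replaced by random placeholder vectors, and the vector sizes, the single sigmoid output unit, and every name (`concat`, `placeholder`, `imts_like_forward`) are assumptions made for illustration. Treating BERT and TextCNN as parallel branches follows the abstract's wording and may differ from the model shown in Figs. 1-3.

```python
import math
import random

# Hypothetical feature sizes; the source does not report the real dimensions.
BERT_DIM, TEXTCNN_DIM, ID_DIM, IMAGE_DIM = 8, 6, 4, 10


def concat(*vectors):
    """Feature-level fusion: join several feature vectors end to end."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def imts_like_forward(bert_vec, textcnn_vec, id_vec, image_vec, weights, bias):
    """Return the fused vector and an (untrained) probability that a review is fake."""
    text_semantics = concat(bert_vec, textcnn_vec)  # text branch
    text_with_id = concat(text_semantics, id_vec)   # add reviewer-ID features
    fused = concat(text_with_id, image_vec)         # add visual features
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return fused, sigmoid(score)


rng = random.Random(0)


def placeholder(dim):
    """Stand-in for a real extractor output (BERT, TextCNN, ID embedding, ResNet)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


fused_dim = BERT_DIM + TEXTCNN_DIM + ID_DIM + IMAGE_DIM
weights = placeholder(fused_dim)  # random, untrained classifier weights
fused, p_fake = imts_like_forward(
    placeholder(BERT_DIM), placeholder(TEXTCNN_DIM),
    placeholder(ID_DIM), placeholder(IMAGE_DIM),
    weights, bias=0.0,
)
print(len(fused), 0.0 < p_fake < 1.0)
```

In a real system the placeholder vectors would come from trained encoders and the classifier weights would be learned; only the order of the concatenation steps is taken from the abstract.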
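| Fake review type | Characteristics | Examples |
|---|---|---|
| Random text that contains no opinion and has no discernible sentiment | Sentences that are meaningless and illogical | 大家搜集的哈是孤独一个牙刷难道你就; 构建能摧毁阿花股表弟爸爸读发布了豆; 阿萨德两个你好像比撒打算大家公司 (strings of Chinese characters with no coherent meaning) |
| Non-review | Text mixed with piled-up symbols | @)¥()*(&@&@搭配上看懂江湖胡汉三可能就@*¥(&*……&@&%&@#)哦咨询哦家吃饭哈哈斯哈斯哈的 |
| Non-review | Symbols only | &¥(!*&¥&!@……!@)())!&@#&#*&!*(@……#&**&%¥*!@)(*)(¥*!)@*#(&*!%@#*%……&!%@#% |
| Non-review | Digits only | 11111111111111111111111111 |
| Review unrelated to the current topic | Logically coherent, but unrelated to the product's attributes | 店家说打够十五字才可返现。来混经验 (The seller said I must type fifteen characters to get the cashback. Just here to earn experience points); 我也没办法因为我要打十五个字啊 (I have no choice, because I need to type fifteen characters); 不用数了这是非常标准的十五个字 (No need to count, this is exactly fifteen characters) |
| Advertising review | Heavily repeated phrases written to inflate the number of positive reviews | 好评!好评!好评!好评!好评!好评! (Good review! repeated six times); 非常好!好用!非常好!好用! (Very good! Easy to use! Very good! Easy to use!); 超级超级好看!!!!!超级超级好看!!!!! (Super, super good-looking!!!!! repeated twice) |
| Deceptive review | Reviews that sellers induce users to write through "cashback for positive reviews" schemes: highly templated, with a fixed writing format and proportion of symbols, a single form of sentiment, and no real usage experience | (1) 适合各种肤色! 遮瑕效果 :好!持续六个小时! ,这个恰好适合我的肤质 ,水润好, 适合任何人。不管大家怎么样的皮肤, 都可以完美适应,特别特别好,性价比超高,一定要回购! (Suits every skin tone! Concealing effect: good! Lasts six hours! It happens to suit my skin type, nicely moisturizing, suits anyone. Whatever your skin is like, it adapts perfectly, really really good, great value, will definitely buy again!) (2) 外形外观:挺漂亮的,很精致,很光滑 ,无损坏。 屏幕音效:特别棒, 没有杂音。 拍照效果:拍照效果好看清晰 ,反应快。特别漂亮。 运行速度:快很快。 待机时间:不错 运行也跟ok,反正就是推荐大家购买 很不错。 (Appearance: quite pretty, very refined, very smooth, no damage. Screen and sound: excellent, no noise. Camera: photos look good and clear, responds fast. Very pretty. Speed: fast, very fast. Standby time: good. Runs OK too; anyway, I recommend everyone buy it, very good.) (3) 外形外观:黑色一直都很喜欢 真的非常好看 待机时间:还可以其实和上一代也就差一点点 屏幕音效:音质很好很大声很漂亮拍照效果: 大提升呀不用说的爱不释手。快递也很快! (Appearance: I have always liked black, it really looks great. Standby time: fine, actually only slightly different from the previous generation. Screen and sound: the sound quality is good, loud, and pretty. Camera: a big improvement, needless to say, I can't put it down. Delivery was fast too!) |

Table 1  Examples of construction rules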
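Several of the categories in Table 1 are defined by surface patterns (digits only, symbols only, text mixed with symbols, repeated phrases), so they can be approximated with simple string checks. The sketch below is a hypothetical illustration of such checks, not the authors' labeling procedure: the function name `surface_category`, the label strings, and the 0.3 and 0.5 thresholds are all assumptions chosen for the example.

```python
import re
from collections import Counter


def surface_category(text, symbol_threshold=0.3, repeat_threshold=0.5):
    """Assign a review to a surface-pattern category, or 'unflagged'."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "empty"
    if all(c.isdigit() for c in chars):
        return "digits_only"
    # Count characters that are neither letters (including CJK) nor digits.
    symbols = sum(1 for c in chars if not c.isalnum())
    if symbols == len(chars):
        return "symbols_only"
    # Split on punctuation and whitespace, then look for one dominant repeated segment.
    segments = [s for s in re.split(r"\W+", text) if s]
    if segments:
        top_count = Counter(segments).most_common(1)[0][1]
        if top_count >= 2 and top_count / len(segments) >= repeat_threshold:
            return "repetitive"
    if symbols / len(chars) >= symbol_threshold:
        return "text_and_symbols"
    return "unflagged"


for sample in ["11111111", "&@#*%!", "好评!好评!好评!", "@)*(&@&@搭配上看懂"]:
    print(sample, "->", surface_category(sample))
```

Checks of this kind say nothing about the off-topic or deceptive rows of the table, whose examples are fluent text with ordinary punctuation.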
Fig.4  Examples of fake reviews
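| Model input | Model group | Model | Accuracy | Recall | F1 |
|---|---|---|---|---|---|
| Text | Single model | Bi-LSTM | 0.6534 | 0.6525 | 0.6525 |
| Text | Single model | TextCNN | 0.6883 | 0.6693 | 0.6656 |
| Text | Single model | BERT | 0.8626 | 0.8438 | 0.8429 |
| Text | Combined model | BERT+LSTM | 0.8856 | 0.8854 | 0.8853 |
| Text | Combined model | BERT+TextCNN | 0.9093 | 0.9089 | 0.9089 |
| Text | Combined model + ID | BERT+LSTM+ID | 0.8915 | 0.8915 | 0.8915 |
| Text | Combined model + ID | BERT+TextCNN+ID | 0.9373 | 0.9349 | 0.9346 |
| Image | Single model | CNN | 0.8313 | 0.8307 | 0.8303 |
| Image | Single model | VGG | 0.8554 | 0.8490 | 0.8475 |
| Image | Single model | ResNet | 0.8724 | 0.8721 | 0.8724 |
| Text + image | Combined model | EANN | 0.8499 | 0.8490 | 0.8491 |
| Text + image | Combined model | BERT+TextCNN+ResNet | 0.9559 | 0.9557 | 0.9557 |
| Text + image | Combined model + ID | EANN+ID | 0.8992 | 0.8992 | 0.8992 |
| Text + image | Combined model + ID | IMTS | 0.9636 | 0.9635 | 0.9635 |

Table 2  Fake review detection results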
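For reference, the three metrics reported in Table 2 can be computed from true and predicted labels as sketched below. This page does not state how recall and F1 were averaged over the two classes; macro-averaging is assumed here, and the function name `macro_scores` and the toy labels are illustrative only.

```python
def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Return (accuracy, macro recall, macro F1) for paired label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    # Macro-averaging: unweighted mean of the per-class values (an assumption).
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Toy labels: 1 = fake review, 0 = genuine review.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, recall, f1 = macro_scores(y_true, y_pred)
print(accuracy, recall, f1)  # prints: 0.75 0.75 0.75
```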