New Technology of Library and Information Service  2014, Vol. 30 Issue (10): 93-100     https://doi.org/10.11925/infotech.1003-3513.2014.10.14
Information Analysis and Research
Feature Extraction Method for Detecting Spam in Electronic Commerce
You Guirong1,2, Wu Wei3, Qian Yuntao2
1. Department of Information Management Engineering, Fujian Commercial College, Fuzhou 350012, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
3. Zhejiang Province Key Laboratory of Network System and Information Security, Hangzhou 310006, China
Abstract

[Objective] To address the large amount of spam among product reviews in electronic commerce, this paper proposes a feature extraction method that improves the recognition rate of review spam. [Methods] Following the idea of quantitative evaluation, evaluation sentences in a review are detected with Part-Of-Speech (POS) path matching templates, and a custom dictionary of evaluation words is added to the word segmentation step to improve the detection rate of evaluation sentences. The number of evaluation sentences is the key feature for distinguishing spam and product-irrelevant reviews from regular product reviews; it is combined with features drawn from the review's topic words, sentiment tendency, and text structure. [Results] Experimental results show that spam recognition with these features reaches a precision of 97.96% and an F-measure of 88.48%. [Limitations] The method is intended for Chinese review spam and is not suitable for English product reviews. [Conclusions] The proposed features detect review spam efficiently and accurately. They can also be used to quantify the usefulness of regular product reviews and to rank them, which gives the method broad applicability.
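
To make the key feature concrete, the following is a minimal sketch of counting evaluation sentences by POS template matching. It is not the authors' implementation: the templates, the tag names (n, d, a, v), the dictionary entries, and all function names are hypothetical, and a "POS path" is assumed here to be a contiguous run of tags. The custom dictionary is assumed to act by retagging listed words as adjectives, and the input is assumed to be already segmented and tagged.

```python
from typing import Iterable, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Hypothetical POS path templates (n = noun, d = adverb, a = adjective).
# The abstract does not list the paper's actual templates.
TEMPLATES: List[Tuple[str, ...]] = [("n", "d", "a"), ("n", "a"), ("d", "a")]

# Hypothetical custom evaluation-word dictionary. Assumption: listed words
# are forced to an adjective tag, standing in for a segmenter-level dictionary.
CUSTOM_EVAL_WORDS = {"给力", "赞"}


def apply_custom_dictionary(sentence: Sequence[Token]) -> List[Token]:
    """Retag any word found in the custom dictionary as an adjective."""
    return [(w, "a") if w in CUSTOM_EVAL_WORDS else (w, t) for w, t in sentence]


def matches_template(sentence: Sequence[Token]) -> bool:
    """True if some contiguous run of POS tags equals one of the templates."""
    tags = tuple(t for _, t in sentence)
    return any(
        tags[i:i + len(tpl)] == tpl
        for tpl in TEMPLATES
        for i in range(len(tags) - len(tpl) + 1)
    )


def count_evaluation_sentences(review: Iterable[Sequence[Token]]) -> int:
    """Number of sentences in a review that match a POS path template."""
    return sum(matches_template(apply_custom_dictionary(s)) for s in review)


# A genuine review: two sentences, both evaluative.
genuine = [
    [("屏幕", "n"), ("很", "d"), ("清晰", "a")],  # "the screen is very clear"
    [("物流", "n"), ("给力", "v")],               # slang word mis-tagged as a verb
]
# An advertisement posted as a review: no evaluative structure.
advert = [
    [("加", "v"), ("微信", "n"), ("领", "v"), ("优惠券", "n")],
]

print(count_evaluation_sentences(genuine))  # 2
print(count_evaluation_sentences(advert))   # 0
```

In this toy setup the second genuine sentence is counted only because the dictionary retags the slang word, which illustrates why a custom dictionary can raise the detection rate. A count of zero is a useful signal but would still need to be combined with the other features before classification.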

Key words: Opinion mining; Feature extraction; Review spam
Received: 2014-04-01      Published: 2014-11-28
Chinese Library Classification: TP393
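Funding:

This work is one of the research outcomes of the National Basic Research Program of China (973 Program) project "Cross-Media Computing Theory and Methods for Public Security" (No. 2012CB316400) and of the Open Fund project of the Zhejiang Province Key Laboratory of Network System and Information Security, "Research on Online Transaction Fraud Detection Models" (No. 13-124001-012).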

Corresponding author: You Guirong     E-mail: ygr@fjcc.edu.cn
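Author contributions: You Guirong proposed the research idea and designed the research plan; You Guirong and Wu Wei designed the experimental procedure; Wu Wei collected, preprocessed, and analyzed the experimental data; You Guirong drafted the paper and revised the final version; Qian Yuntao reviewed the paper and made key supplementary revisions to parts of its content.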
Cite this article:
You Guirong, Wu Wei, Qian Yuntao. Feature Extraction Method for Detecting Spam in Electronic Commerce. New Technology of Library and Information Service, 2014, 30(10): 93-100.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.10.14      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I10/93

