New Technology of Library and Information Service, 2014, Vol. 30, Issue (10): 93-100. DOI: 10.11925/infotech.1003-3513.2014.10.14
Feature Extraction Method for Detecting Spam in Electronic Commerce
You Guirong1,2, Wu Wei3, Qian Yuntao2
1. Department of Information Management Engineering, Fujian Commercial College, Fuzhou 350012, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
3. Zhejiang Province Key Laboratory of Network System and Information Security, Hangzhou 310006, China
Abstract  

[Objective] This paper proposes a feature extraction method for detecting review spam in electronic commerce and for improving the rate at which spam is recognized among regular product reviews. [Methods] Following a quantitative-evaluation approach, features are extracted comprehensively from the intrinsic characteristics of reviews, such as the number of evaluation sentences, sentiment tendency, topic words, and text structure. The number of evaluation sentences, detected with Part-of-Speech (POS) path matching templates, is the key feature for distinguishing spam from regular product reviews, and a custom dictionary is imported to improve the recognition rate of evaluation sentences. [Results] Experimental results show that spam recognition reaches a precision of 97.96% and an F-measure of 88.48%. [Limitations] The method is designed mainly to identify Chinese review spam and is not suitable for English product reviews. [Conclusions] The proposed features detect review spam effectively and accurately. They can also be applied to evaluating and ranking regular product reviews and to other related applications.
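The evaluation-sentence feature can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: each review is already segmented and POS-tagged, a "POS path" is treated as a contiguous run of tags, and the custom dictionary simply overrides the tagger's tag for the words it lists. The tag names, the three templates, and all function names are hypothetical illustrations, not the authors' actual templates or code.

```python
from typing import Dict, List, Sequence, Tuple

# A tagged sentence is a list of (word, POS tag) pairs from some Chinese
# segmenter/POS tagger. Tag names here are hypothetical placeholders.
TaggedSentence = List[Tuple[str, str]]

# Hypothetical POS path templates: tag sequences assumed to signal an
# evaluation, e.g. noun + degree adverb + adjective ("quality very good").
TEMPLATES: List[Tuple[str, ...]] = [
    ("n", "d", "a"),  # noun, degree adverb, adjective
    ("n", "a"),       # noun, adjective
    ("v", "d", "a"),  # verb, degree adverb, adjective
]


def apply_custom_dictionary(sentence: TaggedSentence,
                            custom_dict: Dict[str, str]) -> TaggedSentence:
    """Replace the tagger's tag with the custom-dictionary tag where one exists."""
    return [(word, custom_dict.get(word, tag)) for word, tag in sentence]


def contains_path(tags: Sequence[str], template: Sequence[str]) -> bool:
    """Return True if `template` occurs as a contiguous run inside `tags`."""
    n = len(template)
    return any(tuple(tags[i:i + n]) == tuple(template)
               for i in range(len(tags) - n + 1))


def is_evaluation_sentence(sentence: TaggedSentence,
                           templates: List[Tuple[str, ...]] = TEMPLATES) -> bool:
    """A sentence counts as evaluative if any template matches its tag sequence."""
    tags = [tag for _, tag in sentence]
    return any(contains_path(tags, t) for t in templates)


def count_evaluation_sentences(review: List[TaggedSentence],
                               custom_dict: Dict[str, str]) -> int:
    """Count sentences of a review that match at least one POS path template."""
    return sum(is_evaluation_sentence(apply_custom_dictionary(s, custom_dict))
               for s in review)


# Usage with English glosses standing in for segmented Chinese words.
review = [
    [("quality", "n"), ("very", "d"), ("good", "a")],  # matches n-d-a
    [("delivery", "n"), ("speedy", "x")],              # adjective mis-tagged as "x"
    [("contact", "v"), ("seller", "n"), ("QQ", "x")],  # ad-like text, no template
]
custom = {"speedy": "a"}  # hypothetical custom-dictionary entry
print(count_evaluation_sentences(review, custom))  # prints 2
print(count_evaluation_sentences(review, {}))      # prints 1
```

In this sketch the custom dictionary raises the count from 1 to 2 because it corrects a tag that would otherwise break the noun + adjective path, which mirrors, in simplified form, the abstract's claim that the dictionary improves recognition of evaluation sentences. A review with few or no evaluation sentences (such as the advertisement-like third sentence) would then look more spam-like on this feature.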

Key words: Opinion mining; Feature extraction; Review spam
Received: 01 April 2014      Published: 28 November 2014
CLC Number: TP393

Cite this article:

You Guirong, Wu Wei, Qian Yuntao. Feature Extraction Method for Detecting Spam in Electronic Commerce. New Technology of Library and Information Service, 2014, 30(10): 93-100.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.10.14     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I10/93

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn