New Technology of Library and Information Service  2014, Vol. 30 Issue (10): 93-100    DOI: 10.11925/infotech.1003-3513.2014.10.14
Feature Extraction Method for Detecting Spam in Electronic Commerce
You Guirong1,2, Wu Wei3, Qian Yuntao2
1. Department of Information Management Engineering, Fujian Commercial College, Fuzhou 350012, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
3. Zhejiang Province Key Laboratory of Network System and Information Security, Hangzhou 310006, China
Abstract  

[Objective] This paper proposes a feature extraction method for detecting review spam among regular product reviews in electronic commerce and for improving the recognition rate. [Methods] Following the idea of quantitative evaluation, features are extracted from the intrinsic characteristics of reviews: the number of evaluation sentences, sentiment tendency, topic words, and text structure. The number of evaluation sentences, identified with Part-Of-Speech (POS) path matching templates, is the key feature for distinguishing spam from regular product reviews, and a custom dictionary is imported to improve the recognition rate for evaluation sentences. [Results] Experimental results show that spam recognition precision can reach 97.96% and the F-measure 88.48%. [Limitations] The method is designed mainly for identifying Chinese review spam and is not suitable for English product reviews. [Conclusions] The proposed features detect review spam effectively and accurately. They can also be applied to evaluating and ranking regular product reviews and to other related applications.
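
The evaluation-sentence count described in the abstract can be illustrated with a minimal sketch. It assumes reviews arrive already segmented and POS-tagged (for example by a tagger such as ICTCLAS), and it treats a sentence as an evaluation sentence when its tag sequence contains any template as a contiguous run. The templates, the tag set, the custom dictionary entry, and all function names below are hypothetical placeholders chosen for illustration; the abstract does not list the authors' actual templates or dictionary.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag), assumed to come from an external tagger

# Hypothetical POS path templates (n = noun, d = adverb, a = adjective).
# The templates actually used in the paper are not given in the abstract.
TEMPLATES: List[Tuple[str, ...]] = [
    ("n", "a"),
    ("n", "d", "a"),
    ("a", "n"),
]

# Hypothetical custom dictionary: words whose tag from the tagger is overridden.
CUSTOM_DICT: Dict[str, str] = {"给力": "a"}  # slang for "awesome", treated as an adjective


def retag(sentence: Sequence[Token], custom_dict: Dict[str, str]) -> List[Token]:
    """Replace the tag of any word that appears in the custom dictionary."""
    return [(word, custom_dict.get(word, tag)) for word, tag in sentence]


def matches_template(tags: Sequence[str], template: Tuple[str, ...]) -> bool:
    """Return True if the template occurs as a contiguous run in the tag sequence."""
    n = len(template)
    return any(tuple(tags[i:i + n]) == template for i in range(len(tags) - n + 1))


def is_evaluation_sentence(
    sentence: Sequence[Token],
    templates: Sequence[Tuple[str, ...]] = TEMPLATES,
    custom_dict: Dict[str, str] = CUSTOM_DICT,
) -> bool:
    """A sentence counts as an evaluation sentence if any template matches its tags."""
    tags = [tag for _, tag in retag(sentence, custom_dict)]
    return any(matches_template(tags, tpl) for tpl in templates)


def count_evaluation_sentences(review: Sequence[Sequence[Token]]) -> int:
    """Number of evaluation sentences in a review (one candidate feature value)."""
    return sum(is_evaluation_sentence(sentence) for sentence in review)


# A toy review of three pre-tagged sentences.
review = [
    [("屏幕", "n"), ("很", "d"), ("清晰", "a")],   # "screen very clear": matches n-d-a
    [("物流", "n"), ("给力", "v")],                # "delivery awesome": matches only after retagging
    [("加", "v"), ("我", "r"), ("微信", "n")],     # "add me on WeChat": no template matches
]
count = count_evaluation_sentences(review)
print(count)
```

In this toy example the second sentence is recognized only because the custom dictionary retags the slang word as an adjective, which is one plausible reading of how such a dictionary could raise the recognition rate. A review whose count is zero or very low relative to its length would then be a candidate for spam, with the final decision left to a classifier trained on this and the other features.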

Key words: Opinion mining; Feature extraction; Review spam
Received: 01 April 2014      Published: 28 November 2014
CLC Number: TP393

Cite this article:

You Guirong, Wu Wei, Qian Yuntao. Feature Extraction Method for Detecting Spam in Electronic Commerce. New Technology of Library and Information Service, 2014, 30(10): 93-100.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.10.14     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I10/93

