Feature Extraction Method for Detecting Spam in Electronic Commerce
You Guirong1,2, Wu Wei3, Qian Yuntao2
1. Department of Information Management Engineering, Fujian Commercial College, Fuzhou 350012, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
3. Zhejiang Province Key Laboratory of Network System and Information Security, Hangzhou 310006, China
Abstract [Objective] This paper proposes a feature extraction method for detecting spam among regular product reviews in electronic commerce and for improving the recognition rate. [Methods] Based on the idea of quantitative evaluation, features are extracted comprehensively from the intrinsic characteristics of reviews, such as the number of evaluation sentences, sentiment tendency, topic words, and text structure. The number of evaluation sentences, identified with Part-Of-Speech (POS) path matching templates, is the key feature for distinguishing spam from regular product reviews, and a custom dictionary is imported to improve the recognition rate of evaluation sentences. [Results] Experimental results show that the spam recognition precision can reach 97.96% and the F-measure can reach 88.48%. [Limitations] The method is mainly intended for identifying Chinese review spam and is not suitable for English product reviews. [Conclusions] The proposed features detect review spam effectively and accurately. Furthermore, they can also be applied to evaluating and ranking regular product reviews and to other related applications.
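To make the key feature concrete, the following is a minimal Python sketch of counting evaluation sentences by matching POS templates. It is an illustration under stated assumptions, not the authors' implementation: the templates in `TEMPLATES`, the entries in `CUSTOM_DICT`, the tag names, and the treatment of a "POS path" as a contiguous tag sequence are all hypothetical, and the input is assumed to be already segmented and POS-tagged.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag); tagging is assumed to be done upstream

# Hypothetical POS templates (not taken from the paper):
# noun + adverb + adjective, noun + adjective, verb + adverb + adjective.
TEMPLATES: List[Tuple[str, ...]] = [("n", "d", "a"), ("n", "a"), ("v", "d", "a")]

# Hypothetical custom dictionary: overrides the tag of domain-specific words
# that a general-purpose tagger may label incorrectly.
CUSTOM_DICT: Dict[str, str] = {"给力": "a"}


def apply_custom_dict(sentence: Sequence[Token], custom: Dict[str, str]) -> List[Token]:
    """Replace the tag of every word that appears in the custom dictionary."""
    return [(word, custom.get(word, tag)) for word, tag in sentence]


def matches_template(tags: Sequence[str], template: Tuple[str, ...]) -> bool:
    """True if the template occurs as a contiguous run inside the tag sequence."""
    n = len(template)
    return any(tuple(tags[i:i + n]) == template for i in range(len(tags) - n + 1))


def is_evaluation_sentence(sentence: Sequence[Token],
                           templates: Sequence[Tuple[str, ...]] = TEMPLATES,
                           custom: Dict[str, str] = CUSTOM_DICT) -> bool:
    """A sentence counts as an evaluation sentence if any template matches."""
    tags = [tag for _, tag in apply_custom_dict(sentence, custom)]
    return any(matches_template(tags, tpl) for tpl in templates)


def count_evaluation_sentences(review: Sequence[Sequence[Token]],
                               custom: Dict[str, str] = CUSTOM_DICT) -> int:
    """Number of sentences in a review that match at least one template."""
    return sum(is_evaluation_sentence(s, custom=custom) for s in review)


# Toy review of three pre-tagged sentences (made-up data for illustration).
review = [
    [("屏幕", "n"), ("很", "d"), ("清晰", "a")],   # "the screen is very clear"
    [("加", "v"), ("我", "r"), ("微信", "n")],     # "add me on WeChat"
    [("物流", "n"), ("给力", "x")],                # "delivery is great", mis-tagged
]

with_dict = count_evaluation_sentences(review)
without_dict = count_evaluation_sentences(review, custom={})
print(with_dict, without_dict)
```

In this toy input, the third sentence matches a template only after the custom dictionary corrects the tag of 给力, which mirrors the role the abstract gives the dictionary. A review with a count of zero would be a candidate for spam under this feature alone; the other features named in the abstract are not modeled here.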
Received: 01 April 2014
Published: 28 November 2014