Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (12): 1-9     https://doi.org/10.11925/infotech.2096-3467.2017.0618
Research Paper
Examining Product Reviews with Sentiment Analysis and Opinion Mining
(Original Chinese title: Design and Implementation of a Comprehensive Analysis System for E-commerce Reviews: Research and Application of Sentiment Analysis and Opinion Mining)
Guo Bo1, Li Shouguang1, Wang Hao1, Zhang Xiaojun1, Gong Wei1, Yu Zhaojun1, Sun Yu2
1Meizu Telecom Equipment Co., Ltd. (Beijing Branch), Beijing 100872, China
2Computer Science Department, California State Polytechnic University, Pomona 91768, USA
Abstract

[Objective] This study comprehensively analyzes the massive volume of user reviews generated on e-commerce websites to obtain timely feedback on product reputation, so that the results of a company's marketing activities can be assessed quickly and effectively. [Methods] We applied the bag-of-words model, dependency parsing, and machine learning to real-world datasets from two major e-commerce sites, JD and TMall, and implemented automatic sentiment analysis and opinion tag extraction for user reviews. [Results] Sentiment analysis of the reviews reached an accuracy of about 90%. Using an improved double propagation algorithm, we built an automated lexicon construction system that does not depend on a dictionary; its F-measure reached about 71%. [Limitations] The recall of opinion tag extraction needs further improvement. [Conclusions] By acquiring massive e-commerce review data in real time and analyzing it effectively, the system supports rapid analysis and accurate monitoring of product word-of-mouth, and it has good prospects for commercial adoption.

Keywords: User Review; Sentiment Analysis; Opinion Mining; Machine Learning; Tag Extraction
Received: 2017-06-29      Published: 2017-12-29
CLC number (ZTFLH): TP181
Cite this article:
Guo Bo, Li Shouguang, Wang Hao, Zhang Xiaojun, Gong Wei, Yu Zhaojun, Sun Yu. Examining Product Reviews with Sentiment Analysis and Opinion Mining. Data Analysis and Knowledge Discovery, 2017, 1(12): 1-9.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.0618      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I12/1
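The abstract names a bag-of-words model combined with machine learning for review sentiment classification, and the results table lists a negation-word model next to the baseline. The following is a minimal sketch of that idea, not the authors' implementation. Several parts are assumptions: the toy English reviews, the `NEGATIONS` list, and the `NOT_` prefix scheme are hypothetical (this page does not describe how the negation-word model is built), and the real system works on word-segmented Chinese reviews from JD and TMall.

```python
import math
from collections import Counter, defaultdict

# Hypothetical negation list; the paper's own list is not given on this page.
NEGATIONS = {"not", "no", "never"}


def bag_of_words(tokens, mark_negation=True):
    """Count tokens; optionally prefix the token that follows a negation word."""
    features = Counter()
    negate = False
    for tok in tokens:
        if mark_negation and tok in NEGATIONS:
            negate = True
            continue
        features["NOT_" + tok if negate else tok] += 1
        negate = False
    return features


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word, n in doc.items():
                if word in self.vocab:  # words never seen in training are ignored
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical), already tokenized.
train = [
    ("screen is good".split(), "pos"),
    ("camera is great".split(), "pos"),
    ("battery is bad".split(), "neg"),
    ("screen is not good".split(), "neg"),
]
docs = [bag_of_words(tokens) for tokens, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(bag_of_words("battery is not good".split()))
print(prediction)
```

Prefixing the negated token keeps "good" and "not good" as separate features, which a plain bag of words cannot distinguish. The same interface would accept features derived from a dependency parse, which is presumably what the syntactic model adds.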
Figure: Sentiment analysis workflow
Figure: Dependency parsing result
Figure: Double propagation algorithm workflow
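| Step | Dependency relation | Meaning | Example |
| --- | --- | --- | --- |
| Seed opinion word → new feature word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) [The phone's (appearance) is very (beautiful)] |
| | amod(NN, VA) | Modifier | 很(差)的(手机) [A very (bad) (phone)] |
| | amod(NN, JJ) | Modifier | 这个手机有很(漂亮)的(外形) [This phone has a very (beautiful) (appearance)] |
| Seed opinion word → new opinion word | dep(VA, VA) | Dependency | |
| New feature word → new feature word | conj(NN, NN) | Coordination | 手机的(拍照)和(摄像)不错 [The phone's (photo-taking) and (video recording) are good] |
| | compound:nn(NN, NN) | Noun compound | (手机外形)不错 [The (phone appearance) is good] |
| | nmod:assmod(NN, NN) | Noun phrase | (手机)的(外形)很漂亮 [The (phone)'s (appearance) is beautiful] |
| New feature word → new opinion word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) [The phone's (appearance) is very (beautiful)] |
| | amod(NN, VA) | Modifier | 很(差)的(手机) [A very (bad) (phone)] |
| | amod(NN, JJ) | Modifier | 这个手机有很(漂亮)的(外形) [This phone has a very (beautiful) (appearance)] |

Table: Dependency parsing rules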
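The rule table above can be read as a fixed-point expansion: known opinion words pull in feature words, known feature words pull in more opinion and feature words, and the loop repeats until nothing new appears. The sketch below illustrates basic double propagation driven by those rules; it is not the authors' improved variant, whose details are not given on this page. The dependency triples are hand-written for illustration rather than produced by a parser, and the `rel(head, dependent)` reading of the notation is an assumption.

```python
# (relation, head POS, dependent POS) -> which side holds the opinion word.
OPINION_FEATURE_RULES = {
    ("nsubj", "VA", "NN"): "head",
    ("amod", "NN", "VA"): "dep",
    ("amod", "NN", "JJ"): "dep",
}
OPINION_OPINION_RULES = {("dep", "VA", "VA")}
FEATURE_FEATURE_RULES = {
    ("conj", "NN", "NN"),
    ("compound:nn", "NN", "NN"),
    ("nmod:assmod", "NN", "NN"),
}


def double_propagation(dependencies, seed_opinions):
    """Grow opinion and feature lexicons from seed opinion words until stable."""
    opinions = set(seed_opinions)
    features = set()
    changed = True
    while changed:
        changed = False
        for rel, head, head_pos, dep, dep_pos in dependencies:
            key = (rel, head_pos, dep_pos)
            if key in OPINION_FEATURE_RULES:
                if OPINION_FEATURE_RULES[key] == "head":
                    opinion, feature = head, dep
                else:
                    opinion, feature = dep, head
                if opinion in opinions and feature not in features:
                    features.add(feature)  # opinion word -> new feature word
                    changed = True
                elif feature in features and opinion not in opinions:
                    opinions.add(opinion)  # feature word -> new opinion word
                    changed = True
            elif key in OPINION_OPINION_RULES:
                if (head in opinions) != (dep in opinions):
                    opinions.update((head, dep))  # opinion word -> new opinion word
                    changed = True
            elif key in FEATURE_FEATURE_RULES:
                if (head in features) != (dep in features):
                    features.update((head, dep))  # feature word -> new feature word
                    changed = True
    return opinions, features


# Hand-written triples (hypothetical parser output) based on the table's examples.
DEPENDENCIES = [
    ("nsubj", "漂亮", "VA", "外形", "NN"),
    ("nmod:assmod", "外形", "NN", "手机", "NN"),
    ("amod", "手机", "NN", "差", "VA"),
    ("conj", "拍照", "NN", "摄像", "NN"),
    ("nsubj", "不错", "VA", "拍照", "NN"),
    ("nsubj", "不错", "VA", "外形", "NN"),
]

opinions, features = double_propagation(DEPENDENCIES, {"漂亮"})
print(sorted(opinions), sorted(features))
```

Starting from the single seed 漂亮, the loop reaches 拍照 and 摄像 only after 不错 has been learned from 外形, which is why several passes over the data are needed.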
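| Model | Algorithm | Precision (准确率) | Recall | F1 | AUC |
| --- | --- | --- | --- | --- | --- |
| Baseline model | NB | 0.889 | 0.892 | 0.890 | 0.950 |
| Negation-word model | NB | 0.892 | 0.899 | 0.895 | 0.953 |
| Syntactic model | NB | 0.914 | 0.908 | 0.911 | 0.961 |
| Baseline model | SGD | 0.908 | 0.894 | 0.901 | 0.958 |
| Negation-word model | SGD | 0.911 | 0.904 | 0.907 | 0.961 |
| Syntactic model | SGD | 0.917 | 0.919 | 0.918 | 0.967 |
| Baseline model | SVM | 0.902 | 0.902 | 0.902 | 0.959 |
| Negation-word model | SVM | 0.912 | 0.900 | 0.906 | 0.960 |
| Syntactic model | SVM | 0.916 | 0.920 | 0.918 | 0.966 |
| Baseline model | RF | 0.871 | 0.870 | 0.871 | 0.942 |
| Negation-word model | RF | 0.875 | 0.874 | 0.874 | 0.945 |
| Syntactic model | RF | 0.880 | 0.880 | 0.880 | 0.948 |

Table: Review classification results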
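As a quick consistency check on the classification table, the F1 column can be recomputed from the two columns before it (准确率 and 召回率) with the standard harmonic-mean formula F1 = 2PR / (P + R). The sketch below does this for every row. The agreement suggests that 准确率 is used here in the sense of precision, though that reading is an inference from the numbers, not something the page states. The tolerance of 0.001 is an assumption chosen to absorb three-decimal rounding in the published values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, algorithm, precision, recall, reported F1), copied from the table.
ROWS = [
    ("Baseline", "NB", 0.889, 0.892, 0.890),
    ("Negation-word", "NB", 0.892, 0.899, 0.895),
    ("Syntactic", "NB", 0.914, 0.908, 0.911),
    ("Baseline", "SGD", 0.908, 0.894, 0.901),
    ("Negation-word", "SGD", 0.911, 0.904, 0.907),
    ("Syntactic", "SGD", 0.917, 0.919, 0.918),
    ("Baseline", "SVM", 0.902, 0.902, 0.902),
    ("Negation-word", "SVM", 0.912, 0.900, 0.906),
    ("Syntactic", "SVM", 0.916, 0.920, 0.918),
    ("Baseline", "RF", 0.871, 0.870, 0.871),
    ("Negation-word", "RF", 0.875, 0.874, 0.874),
    ("Syntactic", "RF", 0.880, 0.880, 0.880),
]

TOLERANCE = 1e-3  # assumed allowance for rounding in the published figures

mismatches = [
    (model, algo)
    for model, algo, p, r, reported in ROWS
    if abs(f1_score(p, r) - reported) > TOLERANCE
]
print("rows outside tolerance:", mismatches)
```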
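| Algorithm | 50,000 | 100,000 | 150,000 | 200,000 |
| --- | --- | --- | --- | --- |
| NB | 0.23 | 0.45 | 0.59 | 0.98 |
| SGD | 0.22 | 0.39 | 0.57 | 0.75 |
| SVM | 4 | 12 | 17 | 26 |
| RF | 190 | 400 | 640 | 890 |

Table: Comparison of algorithm execution time (seconds)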
Figure: Confusion matrix
Figure: Comparison of positive review rates
Figure: Word cloud results