Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (12): 1-9    DOI: 10.11925/infotech.2096-3467.2017.0618
Original Article
Examining Product Reviews with Sentiment Analysis and Opinion Mining
Guo Bo1, Li Shouguang1, Wang Hao1, Zhang Xiaojun1, Gong Wei1, Yu Zhaojun1, Sun Yu2
1Meizu Telecom Equipment Co., Ltd., Beijing 100872, China
2Computer Science Department, California State Polytechnic University, Pomona 91768, USA
Abstract  

[Objective] This study analyzes the large volume of reviews generated by users of e-commerce websites, with the aim of assessing marketing strategies. [Methods] We applied syntactic parsing, a bag-of-words model, and machine learning techniques to real-world datasets from JD and TMall. The proposed method analyzes sentiment and extracts opinions from the reviews automatically. [Results] The sentiment analysis reached an accuracy of 90%. We built an automatic vocabulary-building mechanism that does not depend on a dictionary. The F-measure of the new system was 71%. [Limitations] The recall of the opinion extraction needs to be improved. [Conclusions] The proposed system can effectively monitor word-of-mouth issues for products sold online, and it could be transferred to many online businesses.

Keywords: User Review; Sentiment Analysis; Opinion Mining; Machine Learning; Tag Extraction
Received: 29 June 2017      Published: 29 December 2017
Chinese Library Classification (ZTFLH): TP181

Cite this article:

Guo Bo, Li Shouguang, Wang Hao, Zhang Xiaojun, Gong Wei, Yu Zhaojun, Sun Yu. Examining Product Reviews with Sentiment Analysis and Opinion Mining. Data Analysis and Knowledge Discovery, 2017, 1(12): 1-9.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0618     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I12/1

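
| Step | Dependency relation | Meaning | Example | English gloss |
|---|---|---|---|---|
| Seed opinion word → new feature word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) | The phone's (appearance) is very (beautiful) |
| | amod(NN, VA) | Modifier | 很(差)的(手机) | A very (bad) (phone) |
| | amod(NN, JJ) | Modifier | 这个手机有很(漂亮)的(外形) | This phone has a very (beautiful) (appearance) |
| Seed opinion word → new opinion word | dep(VA, VA) | Dependency | | |
| New feature word → new feature word | conj(NN, NN) | Coordination | 手机的(拍照)和(摄像)不错 | The phone's (photo taking) and (video recording) are good |
| | compound:nn(NN, NN) | Noun compound | (手机外形)不错 | The (phone appearance) is good |
| | nmod:assmod(NN, NN) | Noun phrase | (手机)的(外形)很漂亮 | The (phone)'s (appearance) is very beautiful |
| New feature word → new opinion word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) | The phone's (appearance) is very (beautiful) |
| | amod(NN, VA) | Modifier | 很(差)的(手机) | A very (bad) (phone) |
| | amod(NN, JJ) | Modifier | 这个手机有很(漂亮)的(外形) | This phone has a very (beautiful) (appearance) |
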
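
The propagation rules in the table above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the reviews have already been parsed into dependency triples written as relation(head, dependent), and the `Dep` record, the `double_propagation` function, and the hand-written triples are hypothetical names and data introduced only for this example. Starting from seed opinion words, the loop applies the rules repeatedly until no new opinion or feature word is found.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency triple, relation(head, dependent), with POS tags."""
    rel: str
    head: str
    head_pos: str
    dep: str
    dep_pos: str


# Patterns linking an opinion word and a feature word.
# The value records which side of the triple holds the opinion word.
OPINION_FEATURE = {
    ("nsubj", "VA", "NN"): "head",
    ("amod", "NN", "VA"): "dep",
    ("amod", "NN", "JJ"): "dep",
}
# Patterns linking two opinion words or two feature words.
OPINION_OPINION = {("dep", "VA", "VA")}
FEATURE_FEATURE = {
    ("conj", "NN", "NN"),
    ("compound:nn", "NN", "NN"),
    ("nmod:assmod", "NN", "NN"),
}


def double_propagation(deps, seed_opinions):
    """Grow opinion and feature word sets until a fixed point is reached."""
    opinions, features = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for d in deps:
            key = (d.rel, d.head_pos, d.dep_pos)
            if key in OPINION_FEATURE:
                if OPINION_FEATURE[key] == "head":
                    op, ft = d.head, d.dep
                else:
                    op, ft = d.dep, d.head
                if op in opinions and ft not in features:
                    features.add(ft)      # opinion word -> new feature word
                    changed = True
                elif ft in features and op not in opinions:
                    opinions.add(op)      # feature word -> new opinion word
                    changed = True
            elif key in OPINION_OPINION:
                if (d.head in opinions) != (d.dep in opinions):
                    opinions.update((d.head, d.dep))
                    changed = True
            elif key in FEATURE_FEATURE:
                if (d.head in features) != (d.dep in features):
                    features.update((d.head, d.dep))
                    changed = True
    return opinions, features


# Hand-written triples modeled on the table's example sentences (assumed parses).
DEPS = [
    Dep("conj", "拍照", "NN", "摄像", "NN"),
    Dep("nsubj", "不错", "VA", "拍照", "NN"),
    Dep("amod", "手机", "NN", "差", "VA"),
    Dep("nmod:assmod", "外形", "NN", "手机", "NN"),
    Dep("nsubj", "漂亮", "VA", "外形", "NN"),
]

opinions, features = double_propagation(DEPS, {"漂亮", "不错"})
print("opinion words:", sorted(opinions))
print("feature words:", sorted(features))
```

Because the loop runs to a fixed point, the order of the triples does not affect the final sets; a real system would also need filtering to keep noisy parses from spreading wrong words.
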
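
| Model | Algorithm | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| Baseline model | NB | 0.889 | 0.892 | 0.890 | 0.950 |
| Negation-word model | NB | 0.892 | 0.899 | 0.895 | 0.953 |
| Syntactic model | NB | 0.914 | 0.908 | 0.911 | 0.961 |
| Baseline model | SGD | 0.908 | 0.894 | 0.901 | 0.958 |
| Negation-word model | SGD | 0.911 | 0.904 | 0.907 | 0.961 |
| Syntactic model | SGD | 0.917 | 0.919 | 0.918 | 0.967 |
| Baseline model | SVM | 0.902 | 0.902 | 0.902 | 0.959 |
| Negation-word model | SVM | 0.912 | 0.900 | 0.906 | 0.960 |
| Syntactic model | SVM | 0.916 | 0.920 | 0.918 | 0.966 |
| Baseline model | RF | 0.871 | 0.870 | 0.871 | 0.942 |
| Negation-word model | RF | 0.875 | 0.874 | 0.874 | 0.945 |
| Syntactic model | RF | 0.880 | 0.880 | 0.880 | 0.948 |
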
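
As a quick consistency check on the table above, the F1 value in each row can be recomputed as the harmonic mean of the precision and recall columns. The sketch below is illustrative only: it assumes the first metric column is precision (the recomputed values agree with that reading), and `f1_score` and `ROWS` are names introduced here, not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, algorithm, precision, recall, reported F1), copied from the table.
ROWS = [
    ("baseline", "NB", 0.889, 0.892, 0.890),
    ("negation", "NB", 0.892, 0.899, 0.895),
    ("syntactic", "NB", 0.914, 0.908, 0.911),
    ("baseline", "SGD", 0.908, 0.894, 0.901),
    ("negation", "SGD", 0.911, 0.904, 0.907),
    ("syntactic", "SGD", 0.917, 0.919, 0.918),
    ("baseline", "SVM", 0.902, 0.902, 0.902),
    ("negation", "SVM", 0.912, 0.900, 0.906),
    ("syntactic", "SVM", 0.916, 0.920, 0.918),
    ("baseline", "RF", 0.871, 0.870, 0.871),
    ("negation", "RF", 0.875, 0.874, 0.874),
    ("syntactic", "RF", 0.880, 0.880, 0.880),
]

# Largest gap between the recomputed and the reported F1 across all rows.
max_gap = max(abs(f1_score(p, r) - reported) for _, _, p, r, reported in ROWS)

for model, algo, p, r, reported in ROWS:
    print(f"{model:10s} {algo:4s} recomputed={f1_score(p, r):.4f} reported={reported:.3f}")
print(f"largest gap: {max_gap:.4f}")
```

Every recomputed value falls within 0.001 of the reported one, which is what rounding to three decimals would produce.
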
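
| Algorithm | 50,000 | 100,000 | 150,000 | 200,000 |
|---|---|---|---|---|
| NB | 0.23 | 0.45 | 0.59 | 0.98 |
| SGD | 0.22 | 0.39 | 0.57 | 0.75 |
| SVM | 4 | 12 | 17 | 26 |
| RF | 190 | 400 | 640 | 890 |
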