Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 12: 1-9. DOI: 10.11925/infotech.2096-3467.2017.0618
Original Article
Examining Product Reviews with Sentiment Analysis and Opinion Mining
Guo Bo¹, Li Shouguang¹, Wang Hao¹, Zhang Xiaojun¹, Gong Wei¹, Yu Zhaojun¹, Sun Yu²
¹Meizu Telecom Equipment Co., Ltd., Beijing 100872, China
²Computer Science Department, California State Polytechnic University, Pomona 91768, USA
Abstract  

[Objective] This study comprehensively analyzes the large volume of reviews generated by users of e-commerce websites, with the aim of assessing marketing strategies. [Methods] We used syntactic parsing, the bag-of-words model, and machine learning techniques to examine real-world datasets from JD and TMall. The proposed method automatically analyzes sentiment and extracts opinions from the reviews. [Results] The accuracy of the sentiment analysis was 90%. We also constructed an automatic vocabulary-building mechanism that does not depend on a predefined dictionary. The F-measure of the new system was 71%. [Limitations] The recall of opinion extraction needs to be improved. [Conclusions] The proposed system can effectively monitor word-of-mouth issues facing products sold online, and it could be transferred to many other online businesses.
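The [Methods] sentence pairs a bag-of-words representation with machine-learning classifiers, and the model comparison on this page includes a negation-word variant. A minimal sketch of that combination follows, assuming pre-segmented reviews: it trains a multinomial Naive Bayes classifier (one of the algorithms compared) on token counts, optionally marking the token that follows a negation word with a `NOT_` prefix. The negation list, the one-token negation scope, the `NOT_` prefix, and the toy reviews are illustrative assumptions, not the authors' features or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical negation list; the paper's actual negation words are not given here.
NEGATIONS = {"不", "没", "没有"}


def bag_of_words(tokens, mark_negation=True):
    """Count tokens; if mark_negation, prefix the token after a negation word with 'NOT_'."""
    feats = Counter()
    negate = False
    for tok in tokens:
        if mark_negation and tok in NEGATIONS:
            negate = True  # assumed scope: only the next token is negated
            continue
        feats["NOT_" + tok if negate else tok] += 1
        negate = False
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over bag-of-words counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of the class
            for w, cnt in feats.items():
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[c][w] + self.alpha) / (self.totals[c] + self.alpha * v)
                score += cnt * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, pre-segmented reviews (space-separated tokens), invented for illustration.
TRAIN = [
    ("手机 很 漂亮", "pos"),
    ("外形 不错", "pos"),
    ("拍照 很 好", "pos"),
    ("手机 很 差", "neg"),
    ("拍照 不 好", "neg"),
    ("外形 不 漂亮", "neg"),
]

model = MultinomialNB().fit(
    [bag_of_words(text.split()) for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

if __name__ == "__main__":
    for review in ["屏幕 不 好", "外形 很 漂亮"]:
        print(review, "->", model.predict(bag_of_words(review.split())))
```

With these toy data, `屏幕 不 好` ("the screen is not good") is classified `neg` and `外形 很 漂亮` ("the appearance is very beautiful") `pos`. Note that `不错` ("not bad", i.e. good) is a single token, so it is not treated as a negation.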

Key words: User Review; Sentiment Analysis; Opinion Mining; Machine Learning; Tag Extraction
Received: 29 June 2017; Published: 29 December 2017
CLC number: TP181

Cite this article:

Guo Bo, Li Shouguang, Wang Hao, Zhang Xiaojun, Gong Wei, Yu Zhaojun, Sun Yu. Examining Product Reviews with Sentiment Analysis and Opinion Mining. Data Analysis and Knowledge Discovery, 2017, 1(12): 1-9.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0618     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I12/1

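| Step | Dependency relation | Meaning | Example |
|---|---|---|---|
| Seed opinion word → new feature word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) ("the phone's (appearance) is very (beautiful)") |
| | amod(NN, VA) | Modification | 很(差)的(手机) ("a very (poor) (phone)") |
| | amod(NN, JJ) | Modification | 这个手机有很(漂亮)的(外形) ("this phone has a very (beautiful) (appearance)") |
| Seed opinion word → new opinion word | dep(VA, VA) | Dependency | |
| New feature word → new feature word | conj(NN, NN) | Coordination | 手机的(拍照)和(摄像)不错 ("the phone's (photo-taking) and (video recording) are good") |
| | compound:nn(NN, NN) | Noun compound | (手机外形)不错 ("the (phone appearance) is good") |
| | nmod:assmod(NN, NN) | Noun phrase | (手机)的(外形)很漂亮 ("the (phone)'s (appearance) is very beautiful") |
| New feature word → new opinion word | nsubj(VA, NN) | Sentence subject | 手机(外形)很(漂亮) |
| | amod(NN, VA) | Modification | 很(差)的(手机) |
| | amod(NN, JJ) | Modification | 这个手机有很(漂亮)的(外形) |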
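The dependency-relation table describes how known opinion words and feature words are used to discover new ones, which is how a vocabulary can be grown from a few seed words without a dictionary (in the style of double propagation). A minimal sketch of that loop follows. The `Dep` edge structure, the hand-written parses of the toy sentences, the seed words, and following `dep(VA, VA)` in both directions are assumptions for illustration; real input would come from a Chinese dependency parser.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """Hypothetical edge rel(gov, dep) with CTB-style POS tags (NN, VA, JJ)."""
    rel: str
    gov: str
    gov_tag: str
    dep: str
    dep_tag: str


# Opinion <-> feature rules: which slot of the edge holds the opinion word.
OPINION_FEATURE = {
    ("nsubj", "VA", "NN"): "gov",  # e.g. 外形 is the subject of 漂亮: opinion = governor
    ("amod", "NN", "VA"): "dep",   # e.g. 差 modifies 手机: opinion = dependent
    ("amod", "NN", "JJ"): "dep",
}
OPINION_OPINION = {("dep", "VA", "VA")}
FEATURE_FEATURE = {("conj", "NN", "NN"), ("compound:nn", "NN", "NN"), ("nmod:assmod", "NN", "NN")}


def propagate(sentences, seed_opinions):
    """Expand the opinion and feature sets until no rule adds a new word."""
    opinions, features = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for sent in sentences:
            for e in sent:
                key = (e.rel, e.gov_tag, e.dep_tag)
                new_ops, new_feats = set(), set()
                if key in OPINION_FEATURE:
                    if OPINION_FEATURE[key] == "gov":
                        op, feat = e.gov, e.dep
                    else:
                        op, feat = e.dep, e.gov
                    if op in opinions:
                        new_feats.add(feat)  # opinion word -> new feature word
                    if feat in features:
                        new_ops.add(op)      # feature word -> new opinion word
                elif key in OPINION_OPINION:
                    # assumption: the dep(VA, VA) link is followed in both directions
                    if e.gov in opinions:
                        new_ops.add(e.dep)
                    if e.dep in opinions:
                        new_ops.add(e.gov)
                elif key in FEATURE_FEATURE:
                    if e.gov in features:
                        new_feats.add(e.dep)
                    if e.dep in features:
                        new_feats.add(e.gov)
                if not (new_ops <= opinions and new_feats <= features):
                    opinions |= new_ops
                    features |= new_feats
                    changed = True
    return opinions, features


# Hand-written parses of toy sentences (not real parser output).
SENTENCES = [
    [Dep("compound:nn", "外形", "NN", "手机", "NN"),  # 手机外形很漂亮
     Dep("nsubj", "漂亮", "VA", "外形", "NN")],
    [Dep("conj", "拍照", "NN", "摄像", "NN"),         # 拍照和摄像不错
     Dep("nsubj", "不错", "VA", "拍照", "NN")],
    [Dep("amod", "屏幕", "NN", "差", "VA")],          # 很差的屏幕
    [Dep("nsubj", "清晰", "VA", "屏幕", "NN")],       # 屏幕很清晰
]

if __name__ == "__main__":
    ops, feats = propagate(SENTENCES, {"漂亮", "不错", "差"})
    print("opinion words:", sorted(ops))
    print("feature words:", sorted(feats))
```

Starting from the seeds 漂亮, 不错 and 差, the loop reaches a fixed point with the feature words 外形, 手机, 拍照, 摄像 and 屏幕, plus the new opinion word 清晰 ("clear"), which is found only because 屏幕 was first discovered as a feature.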
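| Model | Algorithm | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| Baseline model | NB | 0.889 | 0.892 | 0.890 | 0.950 |
| Negation-word model | NB | 0.892 | 0.899 | 0.895 | 0.953 |
| Syntactic model | NB | 0.914 | 0.908 | 0.911 | 0.961 |
| Baseline model | SGD | 0.908 | 0.894 | 0.901 | 0.958 |
| Negation-word model | SGD | 0.911 | 0.904 | 0.907 | 0.961 |
| Syntactic model | SGD | 0.917 | 0.919 | 0.918 | 0.967 |
| Baseline model | SVM | 0.902 | 0.902 | 0.902 | 0.959 |
| Negation-word model | SVM | 0.912 | 0.900 | 0.906 | 0.960 |
| Syntactic model | SVM | 0.916 | 0.920 | 0.918 | 0.966 |
| Baseline model | RF | 0.871 | 0.870 | 0.871 | 0.942 |
| Negation-word model | RF | 0.875 | 0.874 | 0.874 | 0.945 |
| Syntactic model | RF | 0.880 | 0.880 | 0.880 | 0.948 |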
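Each F1 value in the model-comparison table matches the harmonic mean F1 = 2PR/(P + R) of the first two columns to within rounding, which supports reading the first column (准确率) as precision rather than accuracy. The short check below uses only the reported numbers; the `RESULTS` dictionary and helper names are illustrative.

```python
# (model, algorithm) -> (precision, recall, reported F1), copied from the table.
RESULTS = {
    ("Baseline", "NB"): (0.889, 0.892, 0.890),
    ("Negation", "NB"): (0.892, 0.899, 0.895),
    ("Syntactic", "NB"): (0.914, 0.908, 0.911),
    ("Baseline", "SGD"): (0.908, 0.894, 0.901),
    ("Negation", "SGD"): (0.911, 0.904, 0.907),
    ("Syntactic", "SGD"): (0.917, 0.919, 0.918),
    ("Baseline", "SVM"): (0.902, 0.902, 0.902),
    ("Negation", "SVM"): (0.912, 0.900, 0.906),
    ("Syntactic", "SVM"): (0.916, 0.920, 0.918),
    ("Baseline", "RF"): (0.871, 0.870, 0.871),
    ("Negation", "RF"): (0.875, 0.874, 0.874),
    ("Syntactic", "RF"): (0.880, 0.880, 0.880),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(results):
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1(p, r) - reported) for p, r, reported in results.values())


if __name__ == "__main__":
    for (model, algo), (p, r, reported) in RESULTS.items():
        print(f"{model:<9} {algo:<3} recomputed={f1(p, r):.4f} reported={reported:.3f}")
    print(f"largest gap: {max_f1_gap(RESULTS):.4f}")
```

The largest gap is about 0.0005, consistent with rounding of the three-decimal inputs. The same numbers also show that, for every algorithm, the syntactic model has the highest F1 and the baseline the lowest.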
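| Algorithm | 50,000 | 100,000 | 150,000 | 200,000 |
|---|---|---|---|---|
| NB | 0.23 | 0.45 | 0.59 | 0.98 |
| SGD | 0.22 | 0.39 | 0.57 | 0.75 |
| SVM | 4 | 12 | 17 | 26 |
| RF | 190 | 400 | 640 | 890 |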
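The data-size table gives no unit for its values. If they are read as a cost that grows with data size (an assumption; the source does not say what is measured), the least-squares slope of log(value) against log(size) summarizes how fast each algorithm's cost grows: a slope near 1 means roughly linear growth. The sketch below computes that slope from the reported numbers only; `loglog_slope` is an illustrative helper.

```python
import math

SIZES = [50_000, 100_000, 150_000, 200_000]  # column headers (5万 ... 20万)
VALUES = {  # reported values; unit not stated in the source
    "NB": [0.23, 0.45, 0.59, 0.98],
    "SGD": [0.22, 0.39, 0.57, 0.75],
    "SVM": [4, 12, 17, 26],
    "RF": [190, 400, 640, 890],
}


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) on log(x), an empirical growth exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


if __name__ == "__main__":
    for algo, ys in VALUES.items():
        growth = ys[-1] / ys[0]
        print(f"{algo:<3} slope={loglog_slope(SIZES, ys):.2f}  x{growth:.1f} for 4x data")
```

On these numbers NB is close to linear (slope ≈ 0.99), SGD slightly below linear (≈ 0.88), RF ≈ 1.11, and SVM the steepest (≈ 1.32). With only four points per algorithm these exponents are rough indications, not fitted complexity bounds.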