1 China Center of Economics and Statistics Research, Tianjin University of Finance and Economics, Tianjin 300222, China
2 Institute of Polytechnic, Tianjin University of Finance and Economics, Tianjin 300222, China
[Objective] This study aims to build a model that effectively predicts user ratings from online reviews and supports the analysis of consumer behaviours. [Methods] First, we used the Latent Dirichlet Allocation (LDA) model to extract topic features from user reviews, which served as the independent variables, with user ratings as the dependent variable. Then, we built a user rating prediction model based on the eXtreme Gradient Boosting (XGBoost) algorithm. Finally, we introduced sample and attribute perturbations into the proposed model for rating prediction. [Results] We applied the new model to user reviews from a domestic online automobile portal to predict their ratings, and identified the users' automobile preferences. Compared with the Logistic Regression and Random Forest algorithms, the proposed model achieved better precision and efficiency. [Limitations] Data from other fields need to be included to describe user behaviours more comprehensively. [Conclusions] The proposed model can quantify user reviews and predict the corresponding ratings effectively.
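The modelling idea in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the LDA step is taken as already done, so each review is represented by a topic-proportion vector (faked here with synthetic Dirichlet draws from the hypothetical helper `make_toy_reviews`); the loss is squared error; and full XGBoost trees are replaced by depth-1 stumps that use the XGBoost-style second-order split gain and leaf weight `-G / (H + lambda)`. The parameters `subsample` and `colsample` are our illustrative reading of the "sample and attribute perturbations", and all function names are hypothetical. Only the Python standard library is used.

```python
import random


def fit_stump(X, grad, hess, rows, cols, lam):
    """Best depth-1 split on the given rows/columns, scored with the XGBoost-style
    second-order gain (up to the constant 1/2, no gamma penalty).
    Returns (feature, threshold, w_left, w_right)."""
    G = sum(grad[i] for i in rows)
    H = sum(hess[i] for i in rows)
    parent = G * G / (H + lam)
    best = None  # (gain, feature, threshold, w_left, w_right)
    for j in cols:
        order = sorted(rows, key=lambda i: X[i][j])
        gl = hl = 0.0
        for k in range(len(order) - 1):
            gl += grad[order[k]]
            hl += hess[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:  # only split between distinct feature values
                continue
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - parent
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, -gl / (hl + lam), -gr / (hr + lam))
    if best is None or best[0] <= 0:
        # No useful split: one constant leaf (threshold +inf sends every row left).
        w = -G / (H + lam)
        return 0, float("inf"), w, w
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=50, eta=0.3, lam=1.0,
                       subsample=0.8, colsample=0.8, seed=0):
    """Gradient boosting with squared-error loss; each round perturbs the
    training set by subsampling rows (samples) and columns (attributes)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    base = sum(y) / n
    pred = [base] * n
    trees = []
    for _ in range(n_rounds):
        grad = [pred[i] - y[i] for i in range(n)]  # first derivative of 0.5 * (p - y)^2
        hess = [1.0] * n                           # second derivative is constant
        rows = rng.sample(range(n), max(1, int(subsample * n)))  # sample perturbation
        cols = rng.sample(range(d), max(1, int(colsample * d)))  # attribute perturbation
        j, thr, wl, wr = fit_stump(X, grad, hess, rows, cols, lam)
        wl, wr = eta * wl, eta * wr                # shrinkage
        trees.append((j, thr, wl, wr))
        for i in range(n):
            pred[i] += wl if X[i][j] <= thr else wr
    return base, trees


def predict(model, x):
    base, trees = model
    return base + sum(wl if x[j] <= thr else wr for j, thr, wl, wr in trees)


def mse(model, X, y):
    return sum((predict(model, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def make_toy_reviews(n, k=3, seed=42):
    """Synthetic stand-in for LDA output: Dirichlet(1, ..., 1) topic proportions,
    with a rating in [1, 5] driven by the first topic."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        s = sum(g)
        theta = [v / s for v in g]
        X.append(theta)
        y.append(1 + 4 * theta[0])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_reviews(200)
    model = fit_boosted_stumps(X, y, n_rounds=100)
    print("training MSE:", round(mse(model, X, y), 4))
```

Keeping the two perturbations as separate parameters makes it easy to switch either one off (set it to 1.0) and compare against plain boosting on the same topic features.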