Predicting User Ratings with XGBoost Algorithm
Guijun Yang1, Xue Xu1, Fuqiang Zhao2
1 China Center of Economics and Statistics Research, Tianjin University of Finance and Economics, Tianjin 300222, China
2 Institute of Polytechnic, Tianjin University of Finance and Economics, Tianjin 300222, China
Abstract [Objective] This study builds a model to predict the ratings attached to user reviews and to analyse consumer behaviour. [Methods] First, we applied the Latent Dirichlet Allocation (LDA) model to extract topic features from user reviews, using these features as the independent variables and the user ratings as the dependent variable. Then, we built a user rating prediction model based on the eXtreme Gradient Boosting (XGBoost) algorithm. Finally, we added disturbances of samples and attributes to the proposed model for rating prediction. [Results] We applied the new model to user reviews from a domestic online automobile portal to predict their ratings and to identify users' automobile preferences. Compared with the Logistic Regression and Random Forest algorithms, the proposed model achieved better precision and efficiency. [Limitations] Data from other fields are needed to describe users' behaviour more comprehensively. [Conclusions] The proposed model can quantify users' reviews and then predict their ratings effectively.
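The Methods steps can be illustrated with a small, dependency-free sketch. It is not the authors' implementation and does not use the XGBoost library: it swaps in plain gradient boosting with one-split regression stumps under squared loss, and it reads "disturbances of samples and attributes" as random row and column subsampling in each boosting round, which is an assumption. The topic proportions in `X`, the ratings in `y`, and the names `fit_stump`, `boost`, `predict`, `row_frac` and `col_frac` are all hypothetical.

```python
import random

def fit_stump(X, residuals, rows, cols):
    """Best single-feature threshold split (lowest squared error) on the sampled rows."""
    best = None  # (sse, column, threshold, left_mean, right_mean)
    for c in cols:
        for t in sorted({X[i][c] for i in rows}):
            left = [residuals[i] for i in rows if X[i][c] <= t]
            right = [residuals[i] for i in rows if X[i][c] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, c, t, lm, rm)
    return best

def boost(X, y, n_rounds=50, lr=0.3, row_frac=0.8, col_frac=0.67, seed=0):
    """Gradient boosting on squared loss with row and column subsampling."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    base = sum(y) / n              # initial prediction: mean rating
    pred = [base] * n
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual.
        residuals = [y[i] - pred[i] for i in range(n)]
        rows = rng.sample(range(n), max(2, int(row_frac * n)))  # sample disturbance
        cols = rng.sample(range(d), max(1, int(col_frac * d)))  # attribute disturbance
        stump = fit_stump(X, residuals, rows, cols)
        if stump is None:
            continue
        _, c, t, lm, rm = stump
        stumps.append((c, t, lm, rm))
        for i in range(n):
            pred[i] += lr * (lm if X[i][c] <= t else rm)
    return base, lr, stumps

def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return base + sum(lr * (lm if x[c] <= t else rm) for c, t, lm, rm in stumps)

# Hypothetical LDA output: one row per review, one column per topic proportion.
X = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
     [0.1, 0.1, 0.8], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
y = [5, 5, 3, 3, 1, 2, 4, 2]  # hypothetical star ratings

model = boost(X, y)
mean_y = sum(y) / len(y)
mse_base = sum((v - mean_y) ** 2 for v in y) / len(y)
mse_model = sum((predict(model, x) - v) ** 2 for x, v in zip(X, y)) / len(y)
print(f"baseline MSE: {mse_base:.3f}, boosted MSE: {mse_model:.3f}")
```

In practice the topic proportions would come from an LDA fit on the review corpus, and a library implementation would add the regularised objective and second-order gradient information that distinguish XGBoost from this simplified version.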
Received: 13 April 2018
Published: 04 March 2019