Please wait a minute...
Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 118-126    DOI: 10.11925/infotech.2096-3467.2018.0414
Current Issue | Archive | Adv Search |
Predicting User Ratings with XGBoost Algorithm
Guijun Yang1,Xue Xu1(),Fuqiang Zhao2
1China Center of Economics and Statistics Research, Tianjin University of Finance and Economics, Tianjin 300222, China
2Institute of Polytechnic, Tianjin University of Finance and Economics, Tianjin 300222, China
Download: PDF(1339 KB)   HTML ( 6
Export: BibTeX | EndNote (RIS)      

[Objective] This study aims to build a model for effectively predicting ratings of user reviews and analysing consumer behaviours. [Methods] First, we applied the Latent Dirichlet Allocation model to set the topic features from user reviews as independent variable and user ratings as dependent variable. Then, we built a user rating prediction model based on the eXtreme Gradient Boosting algorithm. Finally, we added the disturbances of samples and attributes to the proposed model for rating prediction. [Results] We used the new model to predict user’s comments on a domestic automobile online portal, and identified their preferences of automobile. Compared with the Logical Regression and Random Forest algorithms, the proposed model has better precision and efficiency. [Limitations] We need to include data from other fields to more comprehensively describe user’s behaviours. [Conclusions] The proposed model could quantify user’s reviews and then predict their ratings effectively.

Key wordsRating Prediction      XGBoost Algorithm      LDA      Feature Extraction      User Reviews     
Received: 13 April 2018      Published: 04 March 2019

Cite this article:

Guijun Yang,Xue Xu,Fuqiang Zhao. Predicting User Ratings with XGBoost Algorithm. Data Analysis and Knowledge Discovery, 2019, 3(1): 118-126.

URL:     OR

[1] Koren Y, Bell R, Volinsky C.Matrix Factorization Techniques for Recommender Systems[J]. Computer, 2009, 42(8): 30-37.
[2] Koren Y, Bell R.Advances in Collaborative Filtering[A]// Recommender Systems Handbook[M]. New York: Springer, 2011: 145-186.
[3] 邓晓懿, 金淳, 韩庆平, 等. 基于情境聚类和用户评级的协同过滤推荐模型[J]. 系统工程理论与实践, 2013, 33(11): 2945-2953.
[3] (Deng Xiaoyi, Jin Chun, Han Jim C, et al.Improved Collaborative Filtering Model Based on Context Clustering and User Ranking[J]. Systems Engineering —Theory & Practice, 2013, 33(11): 2945-2953.)
[4] Li X, Xu G, Chen E, et al.Learning User Preferences across Multiple Aspects for Merchant Recommendation[C]// Proceedings of the 2015 IEEE International Conference on Data Mining. IEEE, 2015.
[5] Fan M, Khademi M.Predicting a Business Star in Yelp from Its Reviews Text Alone[OL]. arXiv Preprint, arXiv: 1401.0864.
[6] 张红丽, 刘济郢, 杨斯楠, 等. 基于网络用户评论的评分预测模型研究[J]. 数据分析与知识发现, 2017, 1(8): 48-58.
[6] (Zhang Hongli, Liu Jiying, Yang Sinan, et al.Predicting Online Users’ Ratings with Comments[J]. Data Analysis and Knowledge Discovery, 2017, 1(8): 48-58.)
[7] 高祎璠, 余文喆, 晁平复, 等. 基于评论分析的评分预测与推荐[J]. 华东师范大学学报: 自然科学版, 2015(3): 80-90.
[7] (Gao Yifan, Yu Wenzhe, Chao Pingfu, et al.Analyzing Reviews for Rating Prediction and Item Recommendation[J]. Journal of East China Normal University: Natural Science, 2015(3): 80-90.)
[8] 杨博, 赵鹏飞. 推荐算法综述[J]. 山西大学学报: 自然科学版, 2011, 34(3): 337-350.
[8] (Yang Bo, Zhao Pengfei.Review of the Art of Recommendation Algorithms[J]. Journal of Shanxi University: Natural Science Edition, 2011, 34(3): 337-350.)
[9] Brown I, Mues C.An Experimental Comparison of Classification Algorithms for Imbalanced Credit Scoring Data Sets[J]. Expert Systems with Applications, 2012, 39(3): 3446-3453.
[10] 应维云. 随机森林方法及其在客户流失预测中的应用研究[J]. 管理评论, 2012, 24(2): 140-145.
[10] (Ying Weiyun.The Research on Random Forests and the Application in Customer Churn Prediction[J]. Management Review, 2012, 24(2): 140-145.)
[11] Breiman L.Random Forests[J]. Machine Learning, 2001, 45(1): 5-32.
[12] Chen T, Guestrin C.XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 785-794.
[13] Seyfioğlu M, Demirezen M.A Hierarchical Approach for Sentiment Analysis and Categorization of Turkish Written Customer Relationship Management Data[C]//Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. IEEE, 2017: 361-365.
[14] Athanasiou V, Maragoudakis M.A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources are Not Plentiful: A Case Study for Modern Greek[J]. Algorithms, 2017, 10(1): 34.
[15] Zhang R, Gao Y, Yu W, et al.Review Comment Analysis for Predicting Ratings[A]// Web-Age Information Management[M]. Springer, 2015: 247-259.
[16] Blei D M, Ng A Y, Jordan M I.Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[17] Friedman J H.Greedy Function Approximation: A Gradient Boosting Machine[J]. Annals of Statistics, 2001, 29(5): 1189-1232.
[18] Breiman L I, Friedman J H, Olshen R A, et al.Classification and Regression Trees (CART)[J]. Encyclopedia of Ecology, 1984, 40(3): 582-588.
[1] Lixin Xia,Jieyan Zeng,Chongwu Bi,Guanghui Ye. Identifying Hierarchy Evolution of User Interests with LDA Topic Model[J]. 数据分析与知识发现, 2019, 3(7): 1-13.
[2] Xiaofeng Li,Jing Ma,Chi Li,Hengmin Zhu. Identifying Commodity Names Based on XGBoost Model[J]. 数据分析与知识发现, 2019, 3(7): 34-41.
[3] Peng Guan,Yuefen Wang,Zhu Fu. Analyzing Topic Semantic Evolution with LDA: Case Study of Lithium Ion Batteries[J]. 数据分析与知识发现, 2019, 3(7): 61-72.
[4] Linna Xi,Yongxiang Dou. Examining Reposts of Micro-bloggers with Planned Behavior Theory[J]. 数据分析与知识发现, 2019, 3(2): 13-20.
[5] Jie Zhang,Junbo Zhao,Dongsheng Zhai,Ningning Sun. Patent Technology Analysis of Microalgae Biofuel Industrial Chain Based on Topic Model[J]. 数据分析与知识发现, 2019, 3(2): 52-64.
[6] Junwan Liu,Zhixin Long,Feifei Wang. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction[J]. 数据分析与知识发现, 2019, 3(1): 104-117.
[7] Yue He,Yue Feng,Shupeng Zhao,Yufeng Ma. Recommending Contents Based on Zhihu Q&A Community: Case Study of Logistics Topics[J]. 数据分析与知识发现, 2018, 2(9): 42-49.
[8] Tao Zhang,Haiqun Ma. Clustering Policy Texts Based on LDA Topic Model[J]. 数据分析与知识发现, 2018, 2(9): 59-65.
[9] Yanhua Xu,Yujie Miao,Lin Miao,Xueqiang Lv. Generating HSK Writing Essays with LDA Model[J]. 数据分析与知识发现, 2018, 2(9): 80-87.
[10] Ziming Zeng,Qianwen Yang. Sentiment Analysis for Micro-blogs with LDA and AdaBoost[J]. 数据分析与知识发现, 2018, 2(8): 51-59.
[11] Beibei Pang,Juanqiong Gou,Wenxin Mu. Extracting Topics and Their Relationship from College Student Mentoring[J]. 数据分析与知识发现, 2018, 2(6): 92-101.
[12] Lixin Zhou,Jie Lin. Extracting Product Features with NodeRank Algorithm[J]. 数据分析与知识发现, 2018, 2(4): 90-98.
[13] Li Wang,Lixue Zou,Xiwen Liu. Visualizing Document Correlation Based on LDA Model[J]. 数据分析与知识发现, 2018, 2(3): 98-106.
[14] Jingqi Wang,Rui Li,Huayi Wu. The Evolution of Online Public Opinion Based on Spatial Autocorrelation[J]. 数据分析与知识发现, 2018, 2(2): 64-73.
[15] He Li,Linlin Zhu,Min Yan,Jincheng Liu,Chuang Hong. Identifying Useful Information from Open Innovation Community[J]. 数据分析与知识发现, 2018, 2(12): 12-22.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938