Please wait a minute...
Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 118-126    DOI: 10.11925/infotech.2096-3467.2018.0414
Current Issue | Archive | Adv Search |
Predicting User Ratings with XGBoost Algorithm
Guijun Yang1,Xue Xu1(),Fuqiang Zhao2
1China Center of Economics and Statistics Research, Tianjin University of Finance and Economics, Tianjin 300222, China
2Institute of Polytechnic, Tianjin University of Finance and Economics, Tianjin 300222, China
Download: PDF (1339 KB)   HTML ( 10
Export: BibTeX | EndNote (RIS)      

[Objective] This study aims to build a model for effectively predicting ratings of user reviews and analysing consumer behaviours. [Methods] First, we applied the Latent Dirichlet Allocation model to set the topic features from user reviews as independent variable and user ratings as dependent variable. Then, we built a user rating prediction model based on the eXtreme Gradient Boosting algorithm. Finally, we added the disturbances of samples and attributes to the proposed model for rating prediction. [Results] We used the new model to predict user’s comments on a domestic automobile online portal, and identified their preferences of automobile. Compared with the Logical Regression and Random Forest algorithms, the proposed model has better precision and efficiency. [Limitations] We need to include data from other fields to more comprehensively describe user’s behaviours. [Conclusions] The proposed model could quantify user’s reviews and then predict their ratings effectively.

Key wordsRating Prediction      XGBoost Algorithm      LDA      Feature Extraction      User Reviews     
Received: 13 April 2018      Published: 04 March 2019

Cite this article:

Guijun Yang,Xue Xu,Fuqiang Zhao. Predicting User Ratings with XGBoost Algorithm. Data Analysis and Knowledge Discovery, 2019, 3(1): 118-126.

URL:     OR

[1] Koren Y, Bell R, Volinsky C.Matrix Factorization Techniques for Recommender Systems[J]. Computer, 2009, 42(8): 30-37.
[2] Koren Y, Bell R.Advances in Collaborative Filtering[A]// Recommender Systems Handbook[M]. New York: Springer, 2011: 145-186.
[3] 邓晓懿, 金淳, 韩庆平, 等. 基于情境聚类和用户评级的协同过滤推荐模型[J]. 系统工程理论与实践, 2013, 33(11): 2945-2953.
[3] (Deng Xiaoyi, Jin Chun, Han Jim C, et al.Improved Collaborative Filtering Model Based on Context Clustering and User Ranking[J]. Systems Engineering —Theory & Practice, 2013, 33(11): 2945-2953.)
[4] Li X, Xu G, Chen E, et al.Learning User Preferences across Multiple Aspects for Merchant Recommendation[C]// Proceedings of the 2015 IEEE International Conference on Data Mining. IEEE, 2015.
[5] Fan M, Khademi M.Predicting a Business Star in Yelp from Its Reviews Text Alone[OL]. arXiv Preprint, arXiv: 1401.0864.
[6] 张红丽, 刘济郢, 杨斯楠, 等. 基于网络用户评论的评分预测模型研究[J]. 数据分析与知识发现, 2017, 1(8): 48-58.
[6] (Zhang Hongli, Liu Jiying, Yang Sinan, et al.Predicting Online Users’ Ratings with Comments[J]. Data Analysis and Knowledge Discovery, 2017, 1(8): 48-58.)
[7] 高祎璠, 余文喆, 晁平复, 等. 基于评论分析的评分预测与推荐[J]. 华东师范大学学报: 自然科学版, 2015(3): 80-90.
[7] (Gao Yifan, Yu Wenzhe, Chao Pingfu, et al.Analyzing Reviews for Rating Prediction and Item Recommendation[J]. Journal of East China Normal University: Natural Science, 2015(3): 80-90.)
[8] 杨博, 赵鹏飞. 推荐算法综述[J]. 山西大学学报: 自然科学版, 2011, 34(3): 337-350.
[8] (Yang Bo, Zhao Pengfei.Review of the Art of Recommendation Algorithms[J]. Journal of Shanxi University: Natural Science Edition, 2011, 34(3): 337-350.)
[9] Brown I, Mues C.An Experimental Comparison of Classification Algorithms for Imbalanced Credit Scoring Data Sets[J]. Expert Systems with Applications, 2012, 39(3): 3446-3453.
[10] 应维云. 随机森林方法及其在客户流失预测中的应用研究[J]. 管理评论, 2012, 24(2): 140-145.
[10] (Ying Weiyun.The Research on Random Forests and the Application in Customer Churn Prediction[J]. Management Review, 2012, 24(2): 140-145.)
[11] Breiman L.Random Forests[J]. Machine Learning, 2001, 45(1): 5-32.
[12] Chen T, Guestrin C.XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 785-794.
[13] Seyfioğlu M, Demirezen M.A Hierarchical Approach for Sentiment Analysis and Categorization of Turkish Written Customer Relationship Management Data[C]//Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. IEEE, 2017: 361-365.
[14] Athanasiou V, Maragoudakis M.A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources are Not Plentiful: A Case Study for Modern Greek[J]. Algorithms, 2017, 10(1): 34.
[15] Zhang R, Gao Y, Yu W, et al.Review Comment Analysis for Predicting Ratings[A]// Web-Age Information Management[M]. Springer, 2015: 247-259.
[16] Blei D M, Ng A Y, Jordan M I.Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[17] Friedman J H.Greedy Function Approximation: A Gradient Boosting Machine[J]. Annals of Statistics, 2001, 29(5): 1189-1232.
[18] Breiman L I, Friedman J H, Olshen R A, et al.Classification and Regression Trees (CART)[J]. Encyclopedia of Ecology, 1984, 40(3): 582-588.
[1] Li Yueyan,Wang Hao,Deng Sanhong,Wang Wei. Research Trends of Information Retrieval——Case Study of SIGIR Conference Papers[J]. 数据分析与知识发现, 2021, 5(4): 13-24.
[2] Yi Huifang,Liu Xiwen. Analyzing Patent Technology Topics with IPC Context-Enhanced Context-LDA Model[J]. 数据分析与知识发现, 2021, 5(4): 25-36.
[3] Wang Hongbin,Wang Jianxiong,Zhang Yafei,Yang Heng. Topic Recognition of News Reports with Imbalanced Contents[J]. 数据分析与知识发现, 2021, 5(3): 109-120.
[4] Zheng Xinman, Dong Yu. Constructing Degree Lexicon for STI Policy Texts[J]. 数据分析与知识发现, 2021, 5(10): 81-93.
[5] Wang Wei, Gao Ning, Xu Yuting, Wang Hongwei. Topic Evolution of Online Reviews for Crowdfunding Campaigns[J]. 数据分析与知识发现, 2021, 5(10): 103-123.
[6] Cai Yongming,Liu Lu,Wang Kewei. Identifying Key Users and Topics from Online Learning Community[J]. 数据分析与知识发现, 2020, 4(6): 69-79.
[7] Ye Guanghui,Zeng Jieyan,Hu Jinglan,Bi Chongwu. Analyzing Public Sentiments from the Perspective of City Profiles[J]. 数据分析与知识发现, 2020, 4(4): 15-26.
[8] Pan Youneng,Ni Xiuli. Recommending Online Medical Experts with Labeled-LDA Model[J]. 数据分析与知识发现, 2020, 4(4): 34-43.
[9] Liu Yuwen,Wang Kai. Finding Geographic Locations of Popular Online Topics[J]. 数据分析与知识发现, 2020, 4(2/3): 173-181.
[10] Huang Wei,Zhao Jiangyuan,Yan Lu. Empirical Research on Topic Drift Index for Trending Network Events[J]. 数据分析与知识发现, 2020, 4(11): 92-101.
[11] Cai Jingxuan,Wu Jiang,Wang Chengkun. Predicting Usefulness of Crowd Testing Reports with Deep Learning[J]. 数据分析与知识发现, 2020, 4(11): 102-111.
[12] Ding Yong,Chen Xi,Jiang Cuiqing,Wang Zhao. Predicting Online Ratings with Network Representation Learning and XGBoost[J]. 数据分析与知识发现, 2020, 4(11): 52-62.
[13] Ye Guanghui,Xu Tong,Bi Chongwu,Li Xinyue. Analyzing Evolution of City Tourism Portraits with Multi-Dimensional Features and LDA Model[J]. 数据分析与知识发现, 2020, 4(11): 121-130.
[14] Wang Xiwei,Zhang Liu,Huang Bo,Wei Ya’nan. Constructing Topic Graph for Weibo Users Based on LDA: Case Study of “Egypt Air Disaster”[J]. 数据分析与知识发现, 2020, 4(10): 47-57.
[15] Hui Nie,Huan He. Identifying Implicit Features with Word Embedding[J]. 数据分析与知识发现, 2020, 4(1): 99-110.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938