Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 118-126    DOI: 10.11925/infotech.2096-3467.2018.0414
Research Paper
Predicting User Ratings with XGBoost Algorithm*
Guijun Yang¹, Xue Xu¹, Fuqiang Zhao²
¹ China Center of Economics and Statistics Research, Tianjin University of Finance and Economics, Tianjin 300222, China
² Institute of Polytechnic, Tianjin University of Finance and Economics, Tianjin 300222, China
Abstract

[Objective] This study builds an effective rating prediction model from online user reviews and uses it to uncover characteristics of consumer behaviour. [Methods] Using the Latent Dirichlet Allocation (LDA) model, we quantify each user review as a topic feature vector, which serves as the explanatory variable, with the user rating as the response variable. We then apply the eXtreme Gradient Boosting (XGBoost) algorithm, adding sample perturbation and attribute perturbation to generate multiple models, and combine them in an ensemble to obtain the user rating prediction model. [Results] Rating predictions for user reviews from an automobile portal website show that the model captures users' preferences for automobile products well. Its prediction accuracy is 13.73% and 0.64% higher than that of logistic regression and random forest, respectively, and it is computationally efficient. [Limitations] The study does not incorporate other types of data to characterize user behaviour more comprehensively. [Conclusions] Quantifying user reviews as topic feature vectors and applying the XGBoost algorithm predicts user ratings accurately and efficiently.

Keywords: Rating Prediction; XGBoost Algorithm; LDA Topic Model; Text Feature Extraction; User Reviews
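The ensemble step named in the abstract (sample perturbation plus attribute perturbation, then combining several models) can be illustrated with a minimal sketch. It is not the authors' implementation. To stay dependency-free, a simple nearest-centroid classifier stands in for XGBoost, the toy topic vectors stand in for LDA output, and majority voting is an assumed way of combining the models, since the abstract does not say how they are aggregated. All function names, labels, and data values are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, cols):
    """Stand-in base learner (NOT XGBoost): per-class mean of the chosen columns."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, cols, row):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    def dist(center):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_ensemble(X, y, n_models=15, n_cols=2, seed=0):
    """Train n_models base learners, each on perturbed rows and perturbed columns."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    classes = set(y)
    members = []
    while len(members) < n_models:
        # Sample perturbation: bootstrap the rows (draw with replacement).
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        if {y[i] for i in rows} != classes:
            continue  # skip draws that miss a class entirely
        # Attribute perturbation: each model sees a random subset of topic features.
        cols = rng.sample(range(n_features), n_cols)
        Xb = [X[i] for i in rows]
        yb = [y[i] for i in rows]
        members.append((cols, fit_centroids(Xb, yb, cols)))
    return members


def predict_ensemble(members, row):
    """Combine member predictions by majority vote (an assumed aggregation rule)."""
    votes = Counter(predict_centroids(cent, cols, row) for cols, cent in members)
    return votes.most_common(1)[0][0]


# Hypothetical topic-proportion vectors (3 topics) with made-up rating classes.
X = [
    [0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.80, 0.10, 0.10], [0.60, 0.25, 0.15],
    [0.10, 0.20, 0.70], [0.15, 0.20, 0.65], [0.05, 0.15, 0.80], [0.10, 0.30, 0.60],
]
y = ["high"] * 4 + ["low"] * 4

members = fit_ensemble(X, y)
print(predict_ensemble(members, [0.75, 0.15, 0.10]))
print(predict_ensemble(members, [0.10, 0.20, 0.70]))
```

Swapping the stand-in for a gradient-boosted tree model would leave the structure of `fit_ensemble` unchanged. The row bootstrap and the column subset are the two perturbations that make the ensemble members differ from one another.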
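Received: 2018-04-13
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Design and Analysis of Drop-the-Losers Two-Stage Adaptive Clinical Trials" (Grant No. 11471239), the National Social Science Fund of China Youth Project "Research on Credibility Assessment Methods for Sensitive Information in Social Media" (Grant No. 18CTJ008), and the National Statistical Science Research Program Key Project "Research on Sensitive Information Identification and Emergency Event Prediction in Web Social Networks" (Grant No. 2017LZ05).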
Cite this article:
Guijun Yang, Xue Xu, Fuqiang Zhao. Predicting User Ratings with XGBoost Algorithm[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 118-126. DOI: 10.11925/infotech.2096-3467.2018.0414.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0414