Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 52-62     https://doi.org/10.11925/infotech.2096-3467.2020.0482
Research Article
Predicting Online Ratings with Network Representation Learning and XGBoost*
Ding Yong1,2, Chen Xi1, Jiang Cuiqing1,2, Wang Zhao1,2
1School of Management, Hefei University of Technology, Hefei 230009, China
2Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China
Abstract

[Objective] This paper proposes N2V_XGB, a rating prediction model that combines network representation learning with XGBoost and draws on rich metadata and rating data. [Methods] We extracted similarity weights from the metadata and the rating data and fused them to construct a homogeneous relationship network. We then used network representation learning to extract user and item features automatically, fed these features to XGBoost, and trained it iteratively to obtain the best rating prediction model. [Results] In the experiments, the MAE and RMSE of N2V_XGB were 0.6867 and 0.8737, lower than those of the four main baseline models. [Limitations] N2V_XGB does not make good use of temporal features, so the predicted ratings do not reflect changes over time. [Conclusions] N2V_XGB effectively combines network representation learning with XGBoost, which alleviates data sparsity and improves the accuracy of user rating prediction.

Key words: Network Representation Learning; XGBoost; Rating Prediction; Collaborative Filtering; Node2Vec
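Received: 2020-05-28      Published: 2020-09-27
CLC Number (ZTFLH):  TP391
Funding: *This work is one of the research outcomes of the Humanities and Social Sciences Planning Fund Project of the Ministry of Education, "Research on the Influence Mechanism of Social Media on Enterprise Performance" (15YJA630010), and the Key Program of the National Natural Science Foundation of China, "Research on Micro-Level Credit Evaluation Theory and Methods in the Big Data Environment" (71731005).
Corresponding author: Chen Xi     E-mail: 1181738697@qq.com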
Cite this article:
Ding Yong, Chen Xi, Jiang Cuiqing, Wang Zhao. Predicting Online Ratings with Network Representation Learning and XGBoost[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 52-62.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0482      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/52
Fig.1  Framework of the N2V_XGB model
Table 1  Metadata matrix
Entity (user/item)   Attribute features
                     f1     f2     f3     ...   fn
e1                   f1,1   f1,2   f1,3   ...   f1,n
e2                   f2,1   f2,2   f2,3   ...   f2,n
...                  ...    ...    ...    ...   ...
em                   fm,1   fm,2   fm,3   ...   fm,n
Fig.2  Example of the user homogeneous network GU
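
The abstract describes fusing similarity weights from metadata and rating data to build a homogeneous network such as GU. The sketch below is one way this could look. It assumes cosine similarity for both sources, a convex combination controlled by a hypothetical weight `alpha`, and a hypothetical cut-off `min_weight` for keeping an edge. None of these specifics are stated on this page, and the toy vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_user_network(metadata, ratings, alpha=0.5, min_weight=0.3):
    """Return an undirected weighted graph as {user: {neighbor: weight}}.

    alpha and min_weight are illustrative assumptions, not values from the paper.
    """
    users = sorted(metadata)
    graph = {u: {} for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            meta_sim = cosine(metadata[u], metadata[v])
            rating_sim = cosine(ratings[u], ratings[v])
            weight = alpha * meta_sim + (1 - alpha) * rating_sim
            if weight >= min_weight:
                # Store the edge in both directions to keep the graph symmetric.
                graph[u][v] = weight
                graph[v][u] = weight
    return graph


# Rows of a tiny metadata matrix (as in Table 1) and of a rating matrix.
metadata = {"e1": [1, 0, 1], "e2": [1, 0, 1], "e3": [0, 1, 0]}
ratings = {"e1": [5, 3, 0], "e2": [4, 3, 0], "e3": [0, 0, 5]}
G_U = build_user_network(metadata, ratings)
print(G_U)
```

In this toy input, e1 and e2 share attributes and rating patterns and end up connected, while e3 has no overlap with either and stays isolated.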
Fig.3  Training scheme of Skip-gram
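
Skip-gram learns a node's vector by predicting the nodes that appear near it in a walk sequence. As a rough illustration of the training data this implies, the sketch below turns one walk into (center, context) pairs. The function name `skipgram_pairs` and the `window` parameter are hypothetical, and the paper's actual window size is not given on this page.

```python
def skipgram_pairs(walk, window=2):
    """List (center, context) pairs for every node within `window` steps of the center."""
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window)
        stop = min(len(walk), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


pairs = skipgram_pairs(["u1", "u2", "u3", "u4"], window=1)
print(pairs)
# [('u1', 'u2'), ('u2', 'u1'), ('u2', 'u3'), ('u3', 'u2'), ('u3', 'u4'), ('u4', 'u3')]
```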
Fig.4  Effect of the random-walk strategy parameters p and q on the results
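
The parameters p and q are the return and in-out parameters of the Node2Vec biased random walk. The sketch below shows how they reweight the next step: a move back to the previous node is scaled by 1/p, a move to a node adjacent to the previous node is left unscaled, and any other move is scaled by 1/q. The graph, the function name `node2vec_walk`, the parameter values, and the random seed are all illustrative assumptions and do not come from the paper's experiments.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased walk over graph = {node: {neighbor: weight}}."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so only edge weights matter.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                if n == prev:
                    bias = 1.0 / p   # return to the previous node
                elif n in graph[prev]:
                    bias = 1.0       # stay close to the previous node
                else:
                    bias = 1.0 / q   # move further away
                weights.append(graph[cur][n] * bias)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph = {
    "u1": {"u2": 1.0, "u3": 1.0},
    "u2": {"u1": 1.0, "u3": 1.0},
    "u3": {"u1": 1.0, "u2": 1.0, "u4": 1.0},
    "u4": {"u3": 1.0},
}
walk = node2vec_walk(graph, "u1", length=10, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

A small p makes the walk backtrack more often, and a large q keeps it near its starting neighbourhood, which is the trade-off a parameter study like the one in this figure would explore.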
Fig.5  Effect of the feature-vector dimension d on the results
Table 2  Optimal parameters of the XGBoost algorithm
Model parameter      Value
n_estimators         3000
learning_rate        0.3
max_depth            4
min_child_weight     1
gamma                0.2
subsample            1
colsample_bytree     1
colsample_bylevel    1
reg_lambda           0.9
reg_alpha            0.1
seed                 33
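
For readers who want to reuse the Table 2 settings, the sketch below transcribes them into a Python dictionary. The parameter names match the scikit-learn-style XGBoost wrapper, but the page does not say which interface the authors used, so that mapping is an assumption. The helper `to_wrapper_kwargs` is hypothetical and only renames `seed` to `random_state`, the spelling used by recent releases of that wrapper.

```python
# Values transcribed from Table 2; keys follow the table's own names.
TABLE2_PARAMS = {
    "n_estimators": 3000,
    "learning_rate": 0.3,
    "max_depth": 4,
    "min_child_weight": 1,
    "gamma": 0.2,
    "subsample": 1,
    "colsample_bytree": 1,
    "colsample_bylevel": 1,
    "reg_lambda": 0.9,
    "reg_alpha": 0.1,
    "seed": 33,
}


def to_wrapper_kwargs(params):
    """Copy the dict, renaming `seed` to `random_state` (assumed wrapper spelling)."""
    kwargs = dict(params)
    if "seed" in kwargs:
        kwargs["random_state"] = kwargs.pop("seed")
    return kwargs


kwargs = to_wrapper_kwargs(TABLE2_PARAMS)
print(kwargs)
# With xgboost installed, one possible (untested here) use would be:
#   model = xgboost.XGBRegressor(**kwargs).fit(X_train, y_train)
```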
Table 3  Performance of the N2V_XGB model and the baseline models
Metric   Item-based_CF   SVD_CF   N2V_CF   XGB_CF   N2V_XGB
MAE      1.1283          1.0728   0.8043   0.7065   0.6867
RMSE     1.3918          1.3200   1.0274   0.9112   0.8737
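
MAE is the mean absolute difference between true and predicted ratings, and RMSE is the square root of the mean squared difference, so lower is better for both. The sketch below computes both on made-up toy ratings and then uses the Table 3 figures to express the gap between N2V_XGB and the strongest baseline, XGB_CF, as a relative reduction. The helper names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


# Toy ratings, made up for illustration (not from the paper's data set).
y_true = [4, 3, 5, 2]
y_pred = [3.5, 3.0, 4.0, 3.0]
print(mae(y_true, y_pred))   # 0.625
print(rmse(y_true, y_pred))  # 0.75

# (MAE, RMSE) pairs transcribed from Table 3.
TABLE3 = {"XGB_CF": (0.7065, 0.9112), "N2V_XGB": (0.6867, 0.8737)}
mae_gain = relative_reduction(TABLE3["XGB_CF"][0], TABLE3["N2V_XGB"][0])
rmse_gain = relative_reduction(TABLE3["XGB_CF"][1], TABLE3["N2V_XGB"][1])
print(f"{mae_gain:.1%} lower MAE, {rmse_gain:.1%} lower RMSE")  # roughly 2.8% and 4.1%
```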