Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (5): 64-76     https://doi.org/10.11925/infotech.2096-3467.2021.0842
Research Paper
Mining Uninteresting Items with Visibility of User Time Points and Collaborative Filtering Recommendation Method*
Shi Lei, Li Shuqing
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract

[Objective] This paper addresses the inability of collaborative filtering algorithms based on explicit feedback to handle data sparsity and user selection bias. [Methods] Items that a user has seen but not interacted with are taken to reflect that user's negative preferences. The visibility of an item to a user is measured by combining user activity, item popularity, and time factors. We introduce the concept of pre-use preference and construct a weighted matrix factorization model based on user time-point visibility to identify, among the missing data, the items that users are not interested in, and then fill those entries with low values. [Results] Experiments on two MovieLens datasets show that, after imputation with the data imputation algorithm based on uninteresting item mining and low-value filling (UIMLF), the recommendation accuracy of ItemCF and BiasSVD increases by 2 to 2.5 times on average. [Limitations] Pre-use preferences are modeled solely on the empirical assumption that "seen-but-not-interacted" items reflect users' negative preferences, which may introduce empirical bias. [Conclusions] The proposed method effectively mitigates the impact of data sparsity and user selection bias and makes recommendation results more accurate.

Key words: Collaborative Filtering; Explicit Feedback; Selection Bias; Pre-use Preference; Uninteresting Items
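The low-value filling step described in the abstract can be illustrated with a minimal sketch. It assumes that pre-use preference scores in [0, 1] are already available for every user-item pair (in the paper they come from the weighted matrix factorization model). The function name `fill_uninteresting`, the cutoff `threshold`, and the fill value `low_value` are hypothetical choices for illustration, not the paper's parameters.

```python
import numpy as np

def fill_uninteresting(ratings, pre_use_pref, threshold=0.2, low_value=1.0):
    """Fill missing ratings whose pre-use preference is below a cutoff.

    ratings      : 2-D array, np.nan marks a missing rating.
    pre_use_pref : 2-D array of the same shape, scores in [0, 1] (assumed given).
    threshold    : hypothetical cutoff below which an unrated item is
                   treated as uninteresting.
    low_value    : hypothetical low rating written into those cells.
    """
    ratings = np.asarray(ratings, dtype=float)
    pre_use_pref = np.asarray(pre_use_pref, dtype=float)
    missing = np.isnan(ratings)
    # Only unrated cells with a low pre-use preference are imputed.
    uninteresting = missing & (pre_use_pref < threshold)
    filled = ratings.copy()
    filled[uninteresting] = low_value
    return filled, uninteresting

ratings = np.array([[5.0, np.nan, np.nan],
                    [np.nan, 3.0, np.nan]])
pre_use_pref = np.array([[0.90, 0.10, 0.50],
                         [0.05, 0.80, 0.30]])
filled, mask = fill_uninteresting(ratings, pre_use_pref)
print(filled)
```

Cells that stay missing after this step are the ones a downstream recommender such as ItemCF or BiasSVD would still be free to rank.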
Received: 2021-08-12      Published: 2022-06-21
CLC number (ZTFLH):  TP393
Funding: *Major Natural Science Research Project of Higher Education Institutions of Jiangsu Province (19KJA510011); one of the research outcomes of the Jiangsu Province Postgraduate Research and Practice Innovation Program (KYCX20_1348)
Corresponding author: Li Shuqing, ORCID: 0000-0001-9814-5766     E-mail: leeshuqing@163.com
Cite this article:
Shi Lei, Li Shuqing. Mining Uninteresting Items with Visibility of User Time Points and Collaborative Filtering Recommendation Method. Data Analysis and Knowledge Discovery, 2022, 6(5): 64-76.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0842      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I5/64
Statistic            MovieLens 100k     MovieLens latest
Number of ratings    100,000            100,836
Number of users      943                610
Number of items      1,682              9,742
Sparsity             93.7%              98.3%
Rating time span     1997.09-1998.04    1996.03-2018.09
Table 1  Dataset statistics
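The sparsity figures in Table 1 follow from the other rows if sparsity is taken as the share of empty cells in the user-item rating matrix, which is the usual definition and is assumed here. A short check (the function name `sparsity` is illustrative):

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of user-item cells that have no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

ml_100k = sparsity(100_000, 943, 1_682)
ml_latest = sparsity(100_836, 610, 9_742)
print(f"MovieLens 100k:   {ml_100k:.1%}")
print(f"MovieLens latest: {ml_latest:.1%}")
```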
Rating                          MovieLens 100k    MovieLens latest
Low ratings (1 and 2)           17.48%            13.41%
High ratings (3, 4, and 5)      82.52%            86.59%
Table 2  Rating distribution
Metric     ItemCF    BiasSVD   EMDP+ItemCF   EMDP+BiasSVD   RZF+ItemCF   RZF+BiasSVD
P@5        0.0612    0.0645    0.0533        0.0614         0.1425       0.1402
P@10       0.0574    0.0573    0.0523        0.0534         0.1143       0.1090
P@20       0.0486    0.0459    0.0453        0.0426         0.0863       0.0820
R@5        0.0462    0.0566    0.0485        0.0530         0.1590       0.1662
R@10       0.0938    0.0931    0.0945        0.0919         0.2436       0.2496
R@20       0.1724    0.1594    0.1727        0.1529         0.3543       0.3639
NDCG@5     0.0719    0.0774    0.0630        0.0760         0.1938       0.1893
NDCG@10    0.0838    0.0851    0.0779        0.0846         0.2126       0.2068
NDCG@20    0.1083    0.1048    0.1031        0.1019         0.2462       0.2420
Table 3  Recommendation performance on the MovieLens 100k dataset
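For readers unfamiliar with the metrics in these tables, the following sketch shows one common way to compute P@k, R@k, and NDCG@k for a single user's ranked list. It assumes binary relevance (an item is either in the user's held-out set or not); the paper's exact relevance and gain definitions are not given in this section, so this is an illustration rather than the authors' evaluation code.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["a", "b", "c", "d", "e"]   # hypothetical top-5 list
relevant = {"a", "c", "x"}           # hypothetical held-out items
print(precision_at_k(ranked, relevant, 5),
      recall_at_k(ranked, relevant, 5),
      ndcg_at_k(ranked, relevant, 5))
```

Dataset-level scores such as those reported in the tables are typically the mean of these per-user values.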
Metric     ItemCF    BiasSVD   EMDP+ItemCF   EMDP+BiasSVD   RZF+ItemCF   RZF+BiasSVD
P@5        0.0429    0.0462    0.0259        0.0285         0.0952       0.1059
P@10       0.0346    0.0401    0.0252        0.0250         0.0709       0.0838
P@20       0.0276    0.0326    0.0209        0.0222         0.0540       0.0632
R@5        0.0291    0.0335    0.0221        0.0212         0.0805       0.0932
R@10       0.0435    0.0532    0.0396        0.0342         0.1218       0.1459
R@20       0.0695    0.0887    0.0607        0.0593         0.1694       0.2081
NDCG@5     0.0514    0.0546    0.0355        0.0344         0.1177       0.1321
NDCG@10    0.0521    0.0569    0.0402        0.0361         0.1196       0.1399
NDCG@20    0.0585    0.0658    0.0471        0.0435         0.1328       0.1571
Table 4  Recommendation performance on the MovieLens latest dataset
Fig.1  Rating distribution of EMDP imputation
Fig.2  Effect of parameter ε
Metric     WRMF      eALS      UTV-eALS
P@5        0.4017    0.4184    0.4390
P@10       0.3421    0.3448    0.3587
P@20       0.2720    0.2688    0.2871
R@5        0.1471    0.1494    0.1582
R@10       0.2352    0.2312    0.2404
R@20       0.3502    0.3403    0.3580
NDCG@5     0.4282    0.4468    0.4697
NDCG@10    0.4190    0.4163    0.4345
NDCG@20    0.4179    0.4083    0.4316
Table 5  Pre-use preference modeling results on the MovieLens 100k dataset
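The three models compared here are all weighted matrix factorization variants. As a rough illustration of that family, the sketch below fits a generic weighted factorization with alternating least squares, minimizing the weighted squared error plus L2 regularization. The weight matrix `W` stands in for any per-cell confidence; how UTV-eALS derives its weights from user time-point visibility is not specified in this section, so the weights, function names, and hyperparameters below are all hypothetical.

```python
import numpy as np

def weighted_loss(R, W, P, Q, reg):
    """Weighted squared error plus L2 penalty on both factor matrices."""
    err = R - P @ Q.T
    return float(np.sum(W * err ** 2) + reg * (np.sum(P ** 2) + np.sum(Q ** 2)))

def weighted_als(R, W, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Generic weighted ALS; each row update is an exact ridge solve."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        for u in range(n_users):
            QW = Q * W[u][:, None]            # scale each item row by its weight
            P[u] = np.linalg.solve(QW.T @ Q + ridge, QW.T @ R[u])
        for i in range(n_items):
            PW = P * W[:, i][:, None]         # scale each user row by its weight
            Q[i] = np.linalg.solve(PW.T @ P + ridge, PW.T @ R[:, i])
    return P, Q

# Toy implicit-style data: 1 = interacted, 0 = missing.
R = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
# Hypothetical confidences: full weight on observed cells, 0.1 on missing ones.
W = np.where(R > 0, 1.0, 0.1)

P0, Q0 = weighted_als(R, W, n_iters=0)    # random initialization only
P, Q = weighted_als(R, W, n_iters=20)
loss_before = weighted_loss(R, W, P0, Q0, reg=0.1)
loss_after = weighted_loss(R, W, P, Q, reg=0.1)
print(loss_before, loss_after)
```

A visibility-aware model would replace the uniform 0.1 weight on missing cells with a per-cell value, so that items the user probably saw but skipped count more strongly as negatives.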
Metric     WRMF      eALS      UTV-eALS
P@5        0.3159    0.3195    0.3373
P@10       0.2720    0.2704    0.2814
P@20       0.2206    0.2184    0.2264
R@5        0.0903    0.0932    0.0970
R@10       0.1521    0.1518    0.1544
R@20       0.2302    0.2329    0.2352
NDCG@5     0.3339    0.3392    0.3575
NDCG@10    0.3143    0.3129    0.3287
NDCG@20    0.3098    0.3093    0.3194
Table 6  Pre-use preference modeling results on the MovieLens latest dataset
Pre-use preference    MovieLens 100k    MovieLens latest
[0,0.2)               92.64%            97.24%
[0.2,0.4)             5.40%             2.03%
[0.4,0.6)             1.48%             0.52%
[0.6,0.8)             0.38%             0.15%
[0.8,1]               0.10%             0.06%
Table 7  Distribution of pre-use preferences
Fig.3  Effect of parameter θ
Fig.4  Effect of parameter σ
Imputed rating    P@5 (ItemCF)    P@5 (BiasSVD)
4                 0.0926          0.0868
5                 0.0152          0.0236
Table 8  High-value imputation results
Metric     ItemCF    BiasSVD   UIMLF+ItemCF   UIMLF+BiasSVD   ItemKNN   PureSVD   UTV-eALS
P@5        0.0612    0.0645    0.1883         0.2134          0.1423    0.1865    0.1786
P@10       0.0574    0.0573    0.1456         0.1616          0.1183    0.1440    0.1392
P@20       0.0486    0.0459    0.1084         0.1145          0.0891    0.1048    0.1031
R@5        0.0462    0.0566    0.2040         0.2485          0.1669    0.2155    0.2091
R@10       0.0938    0.0931    0.3164         0.3454          0.2686    0.3156    0.3104
R@20       0.1724    0.1594    0.4601         0.4802          0.3878    0.4407    0.4415
NDCG@5     0.0719    0.0774    0.2467         0.3037          0.1934    0.2584    0.2472
NDCG@10    0.0838    0.0851    0.2678         0.3195          0.2199    0.2767    0.2700
NDCG@20    0.1083    0.1048    0.3119         0.3581          0.2568    0.3146    0.3108
Table 9  Data imputation results on the MovieLens 100k dataset
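As a quick sanity check on the improvement reported in the abstract, the relative gain of the UIMLF-filled models over their baselines can be computed directly from the P@5 row of Table 9. Whether the abstract's average is computed in exactly this way (relative increase, and over which metrics) is an assumption; the helper name `relative_gain` is illustrative.

```python
def relative_gain(baseline, improved):
    """Increase over the baseline, expressed as a multiple of the baseline."""
    return (improved - baseline) / baseline

# P@5 values from Table 9: (baseline, after UIMLF filling).
p_at_5 = {
    "ItemCF": (0.0612, 0.1883),
    "BiasSVD": (0.0645, 0.2134),
}
gains = {name: relative_gain(base, filled)
         for name, (base, filled) in p_at_5.items()}
for name, gain in gains.items():
    print(f"{name}: P@5 increases by {gain:.2f}x the baseline")
```

Both P@5 gains fall between 2 and 2.5 times the baseline value, in line with the range stated in the abstract.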
Metric     ItemCF    BiasSVD   UIMLF+ItemCF   UIMLF+BiasSVD   ItemKNN   PureSVD   UTV-eALS
P@5        0.0429    0.0462    0.1522         0.1685          0.1167    0.1503    0.1448
P@10       0.0346    0.0401    0.1159         0.1251          0.0919    0.1174    0.1144
P@20       0.0276    0.0326    0.0916         0.0942          0.0711    0.0884    0.0876
R@5        0.0291    0.0335    0.1136         0.1317          0.1057    0.1221    0.1189
R@10       0.0435    0.0532    0.1714         0.1915          0.1538    0.1824    0.1797
R@20       0.0695    0.0887    0.2626         0.2821          0.2286    0.2554    0.2680
NDCG@5     0.0514    0.0546    0.1839         0.2045          0.1508    0.1895    0.1820
NDCG@10    0.0521    0.0569    0.1830         0.2028          0.1482    0.1913    0.1833
NDCG@20    0.0585    0.0658    0.2062         0.2248          0.1680    0.2076    0.2054
Table 10  Data imputation results on the MovieLens latest dataset