Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (6): 1-14     https://doi.org/10.11925/infotech.2096-3467.2022.0605
Research Paper
Explicit Rating Filling Strategy Based on Selection Data Bias Elimination and Conditional Generative Adversarial Networks
Shi Lei, Li Shuqing, Jiang Mingfeng, Zhang Zhiwang, Wang Yu
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract

[Objective] This study addresses the data sparsity and user selection bias that are widespread in explicit rating data in recommender systems by proposing a rating data filling model based on uninteresting item injection. [Methods] A general rating data filling model is built on the Conditional Generative Adversarial Network framework. A Denoising Auto-Encoder serves as the generator to capture the nonlinear latent factors behind the interactions and to improve the robustness of the model. To address selection bias, uninteresting items are mined based on user time-point visibility and injected into the model by modifying the mask mechanism, so that the generated data conform to the user's real rating distribution. [Results] Experiments on the MovieLens and Amazon CD datasets show that, after data filling, the recommendation accuracy of ItemCF, BiasSVD, and AutoRec improves more than threefold on average. [Limitations] Data generation relies on rating data, so the method cannot be applied effectively in cold-start scenarios where rating data are extremely sparse. [Conclusions] The proposed model effectively alleviates data sparsity and eliminates selection bias, significantly improving the performance of existing collaborative filtering methods on recommendation tasks.

Key words: Data Sparsity; Selection Bias; Generative Adversarial Networks; Uninteresting Items; Data Filling
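The abstract says uninteresting items are injected by modifying the mask operation. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names, the `low_rating` placeholder value, and the squared-error loss are all assumptions made for illustration; the adversarial training and the denoising autoencoder generator are not reproduced here.

```python
import numpy as np

def build_injection_mask(ratings, uninteresting):
    """Return (mask, injected) for a masked reconstruction step.

    ratings:       (users, items) array where 0 means "not rated".
    uninteresting: boolean array of the same shape marking unrated items
                   assumed to have been visible to the user but skipped.
    """
    observed = ratings > 0
    # Only unrated entries may be treated as uninteresting items.
    injected = uninteresting & ~observed
    # The modified mask covers real ratings plus injected items.
    mask = observed | injected
    return mask, injected

def masked_output(generated, ratings, uninteresting, low_rating=1.0):
    """Mask generator output and compare it with the injected target."""
    mask, injected = build_injection_mask(ratings, uninteresting)
    # Assumption: injected items receive a fixed low rating as their target.
    target = np.where(injected, low_rating, ratings)
    # Entries outside the mask carry no supervision and are zeroed out.
    masked = generated * mask
    loss = np.sum(((masked - target) * mask) ** 2) / mask.sum()
    return masked, target, loss
```

With an unmodified mask (only `observed`), unrated entries contribute nothing to the loss; widening the mask is what lets the assumed low ratings for uninteresting items shape the generated distribution.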
Received: 2022-06-13      Published: 2023-03-21
CLC number (ZTFLH): TP393
Funding: * Major Project of Natural Science Research of Jiangsu Higher Education Institutions (19KJA510011); National Natural Science Foundation of China (61877061)
Corresponding author: Li Shuqing, ORCID: 0000-0001-9814-5766, E-mail: leeshuqing@163.com
Cite this article:
Shi Lei, Li Shuqing, Jiang Mingfeng, Zhang Zhiwang, Wang Yu. Explicit Rating Filling Strategy Based on Selection Data Bias Elimination and Conditional Generative Adversarial Networks. Data Analysis and Knowledge Discovery, 2023, 7(6): 1-14.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0605      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I6/1
Fig.1  Original GAN structure
Fig.2  DAE framework
Fig.3  DAEGAN framework
Fig.4  UIIGAN framework
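| Statistic | MovieLens 100k | MovieLens latest | Amazon CD |
|---|---|---|---|
| Number of ratings | 100,000 | 100,836 | 105,157 |
| Number of users | 943 | 610 | 2,588 |
| Number of items | 1,682 | 9,742 | 2,294 |
| Sparsity | 93.7% | 98.3% | 98.2% |

Table 1  Dataset statistics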
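The sparsity row in Table 1 can be checked from the other three rows. The sketch below assumes the usual definition, one minus the ratio of observed ratings to the size of the user-item matrix; the source does not state the formula, and the function name `sparsity` is chosen here for illustration.

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - num_ratings / (num_users * num_items)

# (ratings, users, items) as listed in Table 1.
datasets = {
    "MovieLens 100k": (100_000, 943, 1_682),
    "MovieLens latest": (100_836, 610, 9_742),
    "Amazon CD": (105_157, 2_588, 2_294),
}

for name, (ratings, users, items) in datasets.items():
    print(f"{name}: {sparsity(ratings, users, items):.1%}")
```

Under that definition the three values round to the percentages reported in the table.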
| Dataset | Metric | ItemCF | BiasSVD | AutoRec | DAEGAN+ItemCF | DAEGAN+BiasSVD | DAEGAN+AutoRec |
|---|---|---|---|---|---|---|---|
| MovieLens 100k | P@5 | 0.0612 | 0.0645 | 0.0559 | 0.0513 | 0.0627 | 0.0478 |
| MovieLens 100k | R@5 | 0.0462 | 0.0566 | 0.0488 | 0.0404 | 0.0490 | 0.0409 |
| MovieLens 100k | NDCG@5 | 0.0719 | 0.0774 | 0.0637 | 0.0602 | 0.0772 | 0.0578 |
| MovieLens latest | P@5 | 0.0429 | 0.0462 | 0.0437 | 0.0359 | 0.0377 | 0.0324 |
| MovieLens latest | R@5 | 0.0291 | 0.0335 | 0.0317 | 0.0212 | 0.0248 | 0.0186 |
| MovieLens latest | NDCG@5 | 0.0514 | 0.0546 | 0.0468 | 0.0455 | 0.0509 | 0.0437 |
| Amazon CD | P@5 | 0.0119 | 0.0122 | 0.0104 | 0.0128 | 0.0149 | 0.0117 |
| Amazon CD | R@5 | 0.0102 | 0.0119 | 0.0098 | 0.0129 | 0.0151 | 0.0106 |
| Amazon CD | NDCG@5 | 0.0138 | 0.0152 | 0.0127 | 0.0164 | 0.0183 | 0.0145 |

Table 2  Data filling results of the DAEGAN model
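The tables report P@k, R@k, and NDCG@k. The page does not give the definitions, so the sketch below assumes the common binary-relevance versions for a single user; the function names and the choice of binary gain are assumptions, and the paper may compute the metrics differently.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-gain NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Per-user scores would then be averaged over all test users to give one value per algorithm and dataset.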
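Fig.5  Distribution of rating data generated by the DAEGAN model
Fig.6  Effect of parameter θ on the error rate (error)
Fig.7  Effect of parameter δ on the data filling performance of the UIIGAN model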
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0612 | 0.0574 | 0.0462 | 0.0928 | 0.0719 | 0.0838 |
| BiasSVD | 0.0645 | 0.0583 | 0.0566 | 0.0931 | 0.0774 | 0.0851 |
| AutoRec | 0.0559 | 0.0513 | 0.0488 | 0.0821 | 0.0637 | 0.0774 |
| PureSVD | 0.1865 | 0.1440 | 0.2155 | 0.3156 | 0.2584 | 0.2767 |
| Zero-Injection | 0.1905 | 0.1438 | 0.2309 | 0.3301 | 0.2765 | 0.2956 |
| UIMLF | 0.2134 | 0.1616 | 0.2485 | 0.3454 | 0.3037 | 0.3195 |
| UIIGAN+ItemCF | 0.2013 | 0.1565 | 0.2140 | 0.3264 | 0.2567 | 0.2778 |
| UIIGAN+BiasSVD | 0.2349 | 0.1842 | 0.2613 | 0.3623 | 0.3141 | 0.3259 |
| UIIGAN+AutoRec | 0.2129 | 0.1683 | 0.2431 | 0.3497 | 0.2932 | 0.3085 |

Table 3  Recommendation accuracy on the MovieLens 100k dataset
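The size of the gain from UIIGAN filling can be read off Table 3 by dividing each filled score by its baseline. The short sketch below does this for P@5 only; the variable names are illustrative, and the paper's reported average may be taken over more metrics and datasets than this single column.

```python
# P@5 on MovieLens 100k, copied from Table 3: (baseline, with UIIGAN filling).
p_at_5 = {
    "ItemCF": (0.0612, 0.2013),
    "BiasSVD": (0.0645, 0.2349),
    "AutoRec": (0.0559, 0.2129),
}

# Ratio of the filled score to the baseline score for each algorithm.
ratios = {name: filled / base for name, (base, filled) in p_at_5.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"mean: {mean_ratio:.2f}x")
```

Each of the three ratios exceeds 3 for this column.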
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0429 | 0.0346 | 0.0291 | 0.0435 | 0.0514 | 0.0521 |
| BiasSVD | 0.0462 | 0.0401 | 0.0335 | 0.0532 | 0.0546 | 0.0569 |
| AutoRec | 0.0437 | 0.0397 | 0.0317 | 0.0487 | 0.0468 | 0.0496 |
| PureSVD | 0.1503 | 0.1174 | 0.1221 | 0.1824 | 0.1795 | 0.1813 |
| Zero-Injection | 0.1511 | 0.1218 | 0.1243 | 0.1907 | 0.1816 | 0.1860 |
| UIMLF | 0.1685 | 0.1251 | 0.1317 | 0.1915 | 0.2045 | 0.2028 |
| UIIGAN+ItemCF | 0.1646 | 0.1261 | 0.1267 | 0.1941 | 0.1927 | 0.1930 |
| UIIGAN+BiasSVD | 0.1858 | 0.1396 | 0.1449 | 0.2044 | 0.2145 | 0.2126 |
| UIIGAN+AutoRec | 0.1667 | 0.1307 | 0.1296 | 0.1979 | 0.1939 | 0.1990 |

Table 4  Recommendation accuracy on the MovieLens latest dataset
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0119 | 0.0092 | 0.0102 | 0.0160 | 0.0138 | 0.0149 |
| BiasSVD | 0.0121 | 0.0096 | 0.0119 | 0.0195 | 0.0152 | 0.0172 |
| AutoRec | 0.0104 | 0.0088 | 0.0098 | 0.0154 | 0.0127 | 0.0141 |
| PureSVD | 0.0857 | 0.0670 | 0.0842 | 0.1285 | 0.0108 | 0.1178 |
| Zero-Injection | 0.0892 | 0.0694 | 0.0877 | 0.1306 | 0.0116 | 0.1221 |
| UIMLF | 0.0986 | 0.0763 | 0.1028 | 0.1584 | 0.1241 | 0.1373 |
| UIIGAN+ItemCF | 0.0962 | 0.0741 | 0.1001 | 0.1552 | 0.1226 | 0.1352 |
| UIIGAN+BiasSVD | 0.1174 | 0.0887 | 0.1254 | 0.1762 | 0.1482 | 0.1663 |
| UIIGAN+AutoRec | 0.0921 | 0.0719 | 0.0925 | 0.1441 | 0.1180 | 0.1257 |

Table 5  Recommendation accuracy on the Amazon CD dataset
Fig.8  Experimental results on the MovieLens 100k dataset
Fig.9  Experimental results on the MovieLens latest dataset
Fig.10  Experimental results on the Amazon CD dataset