Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (6): 1-14    DOI: 10.11925/infotech.2096-3467.2022.0605
Explicit Rating Filling Strategy Based on Selection Data Bias Elimination and Conditional Generative Adversarial Networks
Shi Lei, Li Shuqing, Jiang Mingfeng, Zhang Zhiwang, Wang Yu
College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
Abstract  

[Objective] This study addresses data sparsity and user selection bias in the explicit rating data of recommender systems by proposing a rating data filling model based on uninteresting item injection. [Methods] A general rating data filling model is constructed on the Conditional Generative Adversarial Network framework. A Denoising Auto-Encoder serves as the generator to capture the nonlinear latent factors behind user-item interactions and to improve the robustness of the model. To address selection bias, uninteresting items are identified from the user's time-point visibility and injected into the model by modifying the mask operation, so that the generated data are consistent with the user's real rating distribution. [Results] Experiments on the MovieLens and Amazon datasets show that, after data filling, the recommendation accuracy of ItemCF, BiasSVD, and AutoRec improves by more than three times on average. [Limitations] The data generation method relies on rating data and may be ineffective when rating data are extremely sparse, as in cold-start scenarios. [Conclusions] The proposed model effectively alleviates data sparsity and eliminates selection bias, significantly improving the recommendation performance of existing collaborative filtering methods.

Key words: Data Sparsity; Selection Bias; Generative Adversarial Networks; Uninteresting Items; Data Filling
Received: 13 June 2022      Published: 21 March 2023
ZTFLH:  TP393  
Fund: Natural Science Major Foundation of the Jiangsu Higher Education Institutions of China (19KJA510011); National Natural Science Foundation of China (61877061)
Corresponding Author: Li Shuqing, ORCID: 0000-0001-9814-5766, E-mail: leeshuqing@163.com

Cite this article:

Shi Lei, Li Shuqing, Jiang Mingfeng, Zhang Zhiwang, Wang Yu. Explicit Rating Filling Strategy Based on Selection Data Bias Elimination and Conditional Generative Adversarial Networks. Data Analysis and Knowledge Discovery, 2023, 7(6): 1-14.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0605     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I6/1

GAN Structure
DAE Structure
DAEGAN Structure
UIIGAN Structure
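The abstract says uninteresting items are injected by modifying the mask operation, but this page does not give the formulas. The following is a minimal sketch of one way such a mask could work, assuming a CFGAN-style setup in which the generator output is multiplied element-wise by an indicator vector before it reaches the discriminator. The function names, the `low_rating` value, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def build_mask(ratings, uninteresting):
    """Return a 0/1 mask that keeps rated items AND flagged uninteresting items.

    Assumption: a plain CFGAN-style mask would keep only the rated items;
    here the uninteresting items are added to it.
    """
    observed = ratings > 0
    return (observed | uninteresting).astype(float)


def masked_generator_output(generated, ratings, uninteresting):
    """Zero out generated ratings for items outside the mask."""
    return generated * build_mask(ratings, uninteresting)


def real_target(ratings, uninteresting, low_rating):
    """Real-side vector: observed ratings, plus a low score on uninteresting items.

    `low_rating` is a hypothetical parameter; the page does not state
    which value the authors assign to uninteresting items.
    """
    target = ratings.astype(float).copy()
    target[uninteresting & (ratings == 0)] = low_rating
    return target


# Toy example for one user over five items (0 means "not rated").
ratings = np.array([5, 0, 3, 0, 0])
uninteresting = np.array([False, True, False, False, False])
generated = np.array([4.6, 1.2, 2.9, 3.8, 0.7])

masked = masked_generator_output(generated, ratings, uninteresting)
target = real_target(ratings, uninteresting, low_rating=1.0)
print(masked)
print(target)
```

In this sketch the discriminator would compare `masked` with `target`, so the generator is also penalized for predicting high ratings on items the user saw but chose not to rate.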
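| Statistic | MovieLens 100k | MovieLens latest | Amazon CD |
|---|---|---|---|
| Number of ratings | 100,000 | 100,836 | 105,157 |
| Number of users | 943 | 610 | 2,588 |
| Number of items | 1,682 | 9,742 | 2,294 |
| Sparsity | 93.7% | 98.3% | 98.2% |

Statistics of Datasets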
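The sparsity figures follow from the other three rows if sparsity is taken as the share of empty cells in the user-item rating matrix. That definition is an assumption (the page does not state it), but it reproduces the reported percentages:

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of user-item pairs that carry no rating."""
    return 1.0 - n_ratings / (n_users * n_items)


# (ratings, users, items) as listed in the dataset statistics.
datasets = {
    "MovieLens 100k": (100_000, 943, 1_682),
    "MovieLens latest": (100_836, 610, 9_742),
    "Amazon CD": (105_157, 2_588, 2_294),
}

for name, (n_ratings, n_users, n_items) in datasets.items():
    print(f"{name}: {sparsity(n_ratings, n_users, n_items):.1%}")
```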
| Dataset | Metric | ItemCF | BiasSVD | AutoRec | DAEGAN+ItemCF | DAEGAN+BiasSVD | DAEGAN+AutoRec |
|---|---|---|---|---|---|---|---|
| MovieLens 100k | P@5 | 0.0612 | 0.0645 | 0.0559 | 0.0513 | 0.0627 | 0.0478 |
| MovieLens 100k | R@5 | 0.0462 | 0.0566 | 0.0488 | 0.0404 | 0.0490 | 0.0409 |
| MovieLens 100k | NDCG@5 | 0.0719 | 0.0774 | 0.0637 | 0.0602 | 0.0772 | 0.0578 |
| MovieLens latest | P@5 | 0.0429 | 0.0462 | 0.0437 | 0.0359 | 0.0377 | 0.0324 |
| MovieLens latest | R@5 | 0.0291 | 0.0335 | 0.0317 | 0.0212 | 0.0248 | 0.0186 |
| MovieLens latest | NDCG@5 | 0.0514 | 0.0546 | 0.0468 | 0.0455 | 0.0509 | 0.0437 |
| Amazon CD | P@5 | 0.0119 | 0.0122 | 0.0104 | 0.0128 | 0.0149 | 0.0117 |
| Amazon CD | R@5 | 0.0102 | 0.0119 | 0.0098 | 0.0129 | 0.0151 | 0.0106 |
| Amazon CD | NDCG@5 | 0.0138 | 0.0152 | 0.0127 | 0.0164 | 0.0183 | 0.0145 |

Data Enhancement Experiment Results of DAEGAN
Generated Rating Distribution of DAEGAN
Effect of Parameter θ on Error Rate
Data Filling Effect of Parameter δ in UIIGAN
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0612 | 0.0574 | 0.0462 | 0.0928 | 0.0719 | 0.0838 |
| BiasSVD | 0.0645 | 0.0583 | 0.0566 | 0.0931 | 0.0774 | 0.0851 |
| AutoRec | 0.0559 | 0.0513 | 0.0488 | 0.0821 | 0.0637 | 0.0774 |
| PureSVD | 0.1865 | 0.1440 | 0.2155 | 0.3156 | 0.2584 | 0.2767 |
| Zero-Injection | 0.1905 | 0.1438 | 0.2309 | 0.3301 | 0.2765 | 0.2956 |
| UIMLF | 0.2134 | 0.1616 | 0.2485 | 0.3454 | 0.3037 | 0.3195 |
| UIIGAN+ItemCF | 0.2013 | 0.1565 | 0.2140 | 0.3264 | 0.2567 | 0.2778 |
| UIIGAN+BiasSVD | 0.2349 | 0.1842 | 0.2613 | 0.3623 | 0.3141 | 0.3259 |
| UIIGAN+AutoRec | 0.2129 | 0.1683 | 0.2431 | 0.3497 | 0.2932 | 0.3085 |

Recommendation Accuracy on MovieLens 100k
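The abstract's claim of a more than threefold average improvement can be checked against the P@5 values reported for MovieLens 100k. The sketch below only divides the UIIGAN-filled score by the corresponding baseline score; using P@5 on this one dataset is an illustrative choice, not necessarily the aggregation the authors used.

```python
# P@5 on MovieLens 100k, copied from the accuracy table.
baseline = {"ItemCF": 0.0612, "BiasSVD": 0.0645, "AutoRec": 0.0559}
filled = {"ItemCF": 0.2013, "BiasSVD": 0.2349, "AutoRec": 0.2129}

# Improvement factor per base algorithm after UIIGAN data filling.
ratios = {name: filled[name] / baseline[name] for name in baseline}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"UIIGAN+{name} vs {name}: {ratio:.2f}x")
print(f"mean: {mean_ratio:.2f}x")
```

Each of the three factors comes out above 3, which is consistent with the abstract's summary for this metric and dataset.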
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0429 | 0.0346 | 0.0291 | 0.0435 | 0.0514 | 0.0521 |
| BiasSVD | 0.0462 | 0.0401 | 0.0335 | 0.0532 | 0.0546 | 0.0569 |
| AutoRec | 0.0437 | 0.0397 | 0.0317 | 0.0487 | 0.0468 | 0.0496 |
| PureSVD | 0.1503 | 0.1174 | 0.1221 | 0.1824 | 0.1795 | 0.1813 |
| Zero-Injection | 0.1511 | 0.1218 | 0.1243 | 0.1907 | 0.1816 | 0.1860 |
| UIMLF | 0.1685 | 0.1251 | 0.1317 | 0.1915 | 0.2045 | 0.2028 |
| UIIGAN+ItemCF | 0.1646 | 0.1261 | 0.1267 | 0.1941 | 0.1927 | 0.1930 |
| UIIGAN+BiasSVD | 0.1858 | 0.1396 | 0.1449 | 0.2044 | 0.2145 | 0.2126 |
| UIIGAN+AutoRec | 0.1667 | 0.1307 | 0.1296 | 0.1979 | 0.1939 | 0.1990 |

Recommendation Accuracy on MovieLens latest
| Algorithm | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| ItemCF | 0.0119 | 0.0092 | 0.0102 | 0.0160 | 0.0138 | 0.0149 |
| BiasSVD | 0.0121 | 0.0096 | 0.0119 | 0.0195 | 0.0152 | 0.0172 |
| AutoRec | 0.0104 | 0.0088 | 0.0098 | 0.0154 | 0.0127 | 0.0141 |
| PureSVD | 0.0857 | 0.0670 | 0.0842 | 0.1285 | 0.0108 | 0.1178 |
| Zero-Injection | 0.0892 | 0.0694 | 0.0877 | 0.1306 | 0.0116 | 0.1221 |
| UIMLF | 0.0986 | 0.0763 | 0.1028 | 0.1584 | 0.1241 | 0.1373 |
| UIIGAN+ItemCF | 0.0962 | 0.0741 | 0.1001 | 0.1552 | 0.1226 | 0.1352 |
| UIIGAN+BiasSVD | 0.1174 | 0.0887 | 0.1254 | 0.1762 | 0.1482 | 0.1663 |
| UIIGAN+AutoRec | 0.0921 | 0.0719 | 0.0925 | 0.1441 | 0.1180 | 0.1257 |

Recommendation Accuracy on Amazon CD
Experiment Result on MovieLens 100k
Experiment Result on MovieLens latest
Experiment Result on Amazon CD
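The tables report P@k, R@k, and NDCG@k without defining them on this page. A minimal sketch of the common top-k definitions with binary relevance follows; whether the authors use exactly these formulas (for example, binary rather than graded gains in NDCG) is an assumption, and the item identifiers are made up for illustration.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k list divided by the DCG of an ideal ordering (binary gains)."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: five recommendations, three held-out relevant items.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "z"}

print(precision_at_k(ranked, relevant, 5))
print(recall_at_k(ranked, relevant, 5))
print(ndcg_at_k(ranked, relevant, 5))
```

Under these definitions, scores are typically averaged over all test users to produce one value per algorithm and cutoff.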