With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution for selective forgetting in models, particularly in recommender systems, where historical data contains sensitive user information. Despite recent advances in recommendation unlearning, comprehensively evaluating unlearning methods remains challenging due to the absence of a unified evaluation framework and the neglect of deeper influences, e.g., on fairness. To address these gaps, we propose CURE4Rec, the first comprehensive benchmark for recommendation unlearning evaluation. CURE4Rec covers four aspects, i.e., unlearning Completeness, recommendation Utility, unleaRning efficiency, and recommendation fairnEss, under three data selection strategies, i.e., core data, edge data, and random data. Specifically, we consider the deeper influence of unlearning on recommendation fairness and robustness to data with varying impact levels. We construct multiple datasets with CURE4Rec evaluation and conduct extensive experiments on existing recommendation unlearning methods.
Overview of our CURE4Rec benchmark. CURE4Rec evaluates unlearning methods using data with varying levels of unlearning impact on four aspects, i.e., unlearning completeness, recommendation utility, unlearning efficiency, and recommendation fairness.
We conduct experiments on three real-world datasets widely used in recommendation: ML-100K, ML-1M, and ADM. To avoid extreme sparsity, we filter out users and items with fewer than 5 interactions. We provide summary statistics of the datasets used.
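The 5-interaction filtering above is standard k-core preprocessing, which can be sketched as follows. This is an illustrative implementation, not the paper's codebase; the function name and the (user, item) pair layout are assumptions. Note that dropping sparse users can make previously dense items sparse, so the filter iterates until the interaction set is stable.

```python
from collections import Counter

def k_core_filter(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions
    until the interaction set is stable (k-core filtering).

    interactions: iterable of (user, item) pairs.
    """
    interactions = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in interactions)
        item_counts = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(interactions):  # nothing removed: stable
            return kept
        interactions = kept

# Toy example with k=2: user "u3" has a single interaction and is removed,
# leaving the four interactions among u1, u2, i1, i2.
data = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i1")]
filtered = k_core_filter(data, k=2)
```

With k=5, as in the paper's preprocessing, the same function applies unchanged; the toy uses k=2 only to keep the example small.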
Results on the four evaluation aspects under the three unlearning-set selections, together with an overall overview.
Results under the WMF model in terms of unlearning completeness (MIO), recommendation utility (NDCG and HR), and recommendation fairness (A-IGF) for the approximate recommendation unlearning method SCIF.
Results under the BPR model in terms of unlearning completeness (MIO), recommendation utility (NDCG and HR), and recommendation fairness (A-IGF) for the approximate recommendation unlearning method SCIF.
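The utility metrics reported in the tables above, NDCG and HR, follow standard top-K ranking definitions. A minimal sketch with binary relevance is given below; the exact cutoff K and any metric variants used in the benchmark may differ, so treat this as the textbook formulation rather than the paper's implementation.

```python
import math

def hr_at_k(ranked_items, relevant_items, k=10):
    """Hit Ratio@K: 1.0 if any ground-truth item appears in the top-K list."""
    return float(any(item in relevant_items for item in ranked_items[:k]))

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@K with binary relevance: DCG over the top-K ranking,
    normalized by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(1.0 / math.log2(rank + 2)          # rank is 0-based
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Example: the single ground-truth item is ranked 2nd, so HR@10 = 1.0
# and NDCG@10 = 1/log2(3) ~= 0.631.
ranked = ["i7", "i3", "i9", "i1", "i5"]
relevant = {"i3"}
```

Both metrics are averaged over test users when reported; higher is better for each.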
@article{CURE4Rec2024,
title={CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence},
author={Chaochao Chen and Jiaming Zhang and Yizhao Zhang and Li Zhang and Lingjuan Lyu and Yuyuan Li and Biao Gong and Chenggang Yan},
journal={Advances in Neural Information Processing Systems},
year={2024},
}