
ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems

Pierre Lubitzsch, Maarten de Rijke, Sebastian Schelter
BIFOLD & Technische Universität Berlin, University of Amsterdam
arXiv (2026)
Recommendation Benchmark

📝 Paper Summary

Machine Unlearning Recommender Systems Privacy
ERASE is a large-scale benchmark for machine unlearning in recommender systems that evaluates diverse tasks, real-world unlearning scenarios, and operational efficiency across seven algorithms and nine datasets.
Core Problem
Existing unlearning benchmarks for recommenders focus narrowly on collaborative filtering and on unrealistic 'one-shot' deletion of large data chunks. They ignore sequential deletion requests and other tasks such as session-based recommendation.
Why it matters:
  • Legal regulations (e.g., the GDPR) and security needs (e.g., removing spam) require efficient data deletion, but current methods are often too slow or degrade model utility.
  • Real-world systems face continuous, small-scale deletion requests (e.g., users withdrawing consent), not the massive single-batch deletions simulated in prior benchmarks.
  • Prior benchmarks overlook critical recommendation tasks like Next-Basket and Session-Based Recommendation, which differ significantly from standard Collaborative Filtering.
Concrete Example: A user suffering from addiction requests the removal of all their interactions with alcohol products. Current benchmarks would simulate this by deleting a random 5% chunk of the training data. That approach fails to capture the specific, sensitive nature of the request and the need to process it immediately without full retraining.
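To make the contrast concrete, here is a minimal Python sketch of two ways to build a forget set. The first samples a random fraction of all interactions, as prior benchmarks do. The second selects one user's interactions with a sensitive item category, as the request above would. The `Interaction` record, its `category` field, and both helper functions are hypothetical illustrations, not part of ERASE's released code.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interaction:
    user_id: int
    item_id: int
    category: str  # hypothetical item attribute, e.g. "alcohol"


def random_chunk_forget_set(interactions: List[Interaction], fraction: float = 0.05,
                            seed: int = 0) -> List[Interaction]:
    """Prior-benchmark style: a uniformly random fraction of all training interactions."""
    rng = random.Random(seed)
    k = int(len(interactions) * fraction)
    return rng.sample(interactions, k)


def targeted_forget_set(interactions: List[Interaction], user_id: int,
                        category: str) -> List[Interaction]:
    """Request-driven style: every interaction of one user with items in a sensitive category."""
    return [x for x in interactions if x.user_id == user_id and x.category == category]


data = [
    Interaction(1, 10, "alcohol"),
    Interaction(1, 11, "groceries"),
    Interaction(1, 12, "alcohol"),
    Interaction(2, 10, "alcohol"),
]
print([x.item_id for x in targeted_forget_set(data, user_id=1, category="alcohol")])  # [10, 12]
```

The random chunk ignores who made the request and what it covers. The targeted set contains exactly the rows the request names and leaves user 2's alcohol interaction untouched.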
Key Novelty
ERASE Benchmark
  • Introduces sequential unlearning of small batches to mimic real-time requests (e.g., removing sensitive items user-by-user) rather than single large-batch deletions.
  • Expands evaluation scope beyond Collaborative Filtering to include Session-Based and Next-Basket Recommendation, using 9 diverse datasets.
  • Provides 600 GB of pre-computed artifacts (model checkpoints, logs), so researchers can test new unlearning methods without expensive model pre-training.
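The sequential scenario can be sketched as a loop. It splits a forget set into small requests and applies an unlearning step to each one in arrival order, so every request starts from the model left by the previous one. This is a minimal sketch under assumptions. `batched`, `run_sequential_unlearning`, and the `unlearn` callback are hypothetical names that stand in for whatever unlearning algorithm is under test. The batch size is a free parameter, not a value taken from the benchmark.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one training interaction (type left abstract)
M = TypeVar("M")  # the model state (type left abstract)


def batched(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split a forget set into consecutive small deletion requests."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_sequential_unlearning(model: M, forget_set: Sequence[T], batch_size: int,
                              unlearn: Callable[[M, List[T]], M]) -> Tuple[M, List[float]]:
    """Apply `unlearn` to each request in arrival order and time each call."""
    latencies: List[float] = []
    for request in batched(forget_set, batch_size):
        start = time.perf_counter()
        model = unlearn(model, request)  # each step starts from the previously unlearned model
        latencies.append(time.perf_counter() - start)
    return model, latencies


# Toy stand-in: the "model" is the set of retained interaction ids,
# and unlearning a request simply removes those ids.
final_model, latencies = run_sequential_unlearning(
    frozenset(range(10)), list(range(7)), batch_size=3,
    unlearn=lambda m, req: m - set(req),
)
print(sorted(final_model), len(latencies))  # [7, 8, 9] 3
```

Recording one latency per request, rather than one total, reflects the concern in the summary: in a live system each deletion request must be served promptly, not amortized over a single large batch.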
Architecture
Figure 1: Overview of the ERASE benchmark pipeline, including tasks, unlearning scenarios, algorithms, and evaluation metrics.
Evaluation Highlights
  • Retraining from scratch takes up to 24 hours, while efficient unlearning methods such as SCIF reduce this latency by three or more orders of magnitude.
  • Recommender-specific unlearning methods (SCIF, GIF) consistently outperform general-purpose methods (from the NeurIPS unlearning competition) in stability and utility preservation.
  • General-purpose methods often fail on recurrent and attention-based architectures (GRU4Rec, SASRec), sometimes degrading utility significantly compared to retraining.
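For a sense of scale, the speedup claim reduces to simple arithmetic. A speedup of k orders of magnitude means dividing the retraining time by 10^k. So if a method achieved three orders of magnitude against a 24-hour retraining run, it would have to finish within roughly 86 seconds. The helpers below are illustrative only. Their inputs are hypothetical, not measurements reported in the paper.

```python
import math


def speedup_orders(retrain_seconds: float, unlearn_seconds: float) -> float:
    """Orders of magnitude by which unlearning is faster than retraining (log10 of the ratio)."""
    if retrain_seconds <= 0 or unlearn_seconds <= 0:
        raise ValueError("times must be positive")
    return math.log10(retrain_seconds / unlearn_seconds)


def max_unlearn_seconds(retrain_seconds: float, orders: float) -> float:
    """Longest unlearning time that still achieves the given number of orders of magnitude."""
    return retrain_seconds / 10 ** orders


retrain = 24 * 3600  # the 24-hour upper bound on retraining, in seconds
print(max_unlearn_seconds(retrain, 3))  # 86.4
```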
Breakthrough Assessment
8/10
Significantly advances the field by aligning evaluation with real-world constraints (sequential requests, diverse tasks) and by releasing large pre-computed artifacts that lower the barrier for future research.