
Causal Estimation of User Learning in Personalized Systems

Evan Munro, David Jones, Jennifer Brennan, Roland Nelet, Vahab Mirrokni, Jean Pouget-Abadie
Stanford Graduate School of Business, Google Research
arXiv (2023)
P13N Recommendation Benchmark

πŸ“ Paper Summary

Recommendation personalization · Causal Inference in Online Platforms
The paper demonstrates that standard Cookie-Cookie-Day (CCD) experiments produce biased estimates of user learning in personalized systems, and it proposes new designs, such as CCD-Switch, that intervene on personalization to isolate preference changes.
Core Problem
Standard methods for measuring long-term user learning assume that differences between long-term and short-term treated users are solely due to preference changes. In personalized systems, treatment alters user history, which changes future system recommendations (personalization), confounding the measurement.
Why it matters:
  • Mistaking personalization effects for user learning can lead platforms to launch features that actually harm user experience (spurious positive learning) or reject beneficial ones (spurious negative learning).
  • User learning is a critical proxy for long-term ecosystem health, distinct from immediate short-term metrics.
  • Existing long-term A/B tests cannot disentangle user preference evolution from system adaptation.
Concrete Example: A streaming service highlights award-winning movies (Treatment). A long-term treated user clicks more, so the personalized system learns to show *more* award movies. A short-term treated user, whose history was built under control, sees *fewer* award movies despite receiving the treatment today. A standard comparison of the two attributes the gap to the long-term user liking the service more (user learning), but part or all of it comes from the system serving that user better-matched movies (personalization).
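The example can be made concrete with a small toy model. This is a minimal sketch, not the paper's formal setup: it assumes the outcome is additive in three channels (direct, learning, personalization), and the class name `ToyEffects`, the function names, and all effect magnitudes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEffects:
    # All three magnitudes are invented for illustration only.
    direct: float = 0.5             # effect of receiving the treatment today
    learning: float = 0.25          # preference shift caused by a treated history
    personalization: float = 0.125  # lift from recommendations tuned on a treated history


def engagement(fx, treated_today, treated_history, recs_use_treated_history):
    """Additive toy outcome: each active channel contributes its own effect."""
    return (
        fx.direct * treated_today
        + fx.learning * treated_history
        + fx.personalization * recs_use_treated_history
    )


def standard_ccd_estimate(fx):
    """Long-term treated minus short-term treated; both get the treatment today."""
    # Treated history, and recommendations have adapted to that history.
    long_term = engagement(fx, True, True, True)
    # Control history, and recommendations have adapted to the control history.
    short_term = engagement(fx, True, False, False)
    return long_term - short_term


fx = ToyEffects()
estimate = standard_ccd_estimate(fx)
bias = estimate - fx.learning
print(f"CCD estimate: {estimate}, true learning: {fx.learning}, bias: {bias}")
```

In this toy setting the direct effect cancels because both groups are treated today, and the remaining gap is the learning term plus the personalization term. The comparison therefore overstates learning by exactly the personalization lift, and the bias disappears only if `personalization` is set to zero.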
Key Novelty
Decomposition of Total Causal Effect into User Learning, Personalization, and Direct Effects
  • Introduces 'CCD-Switch', an experimental design where treated users receive recommendations based on the history of a matched control user, breaking the causal loop between treatment history and personalization.
  • Introduces 'CCD-Freeze', where personalization is fixed to pre-experiment user history to serve as a proxy for the control state.
  • Proposes 'Clustered-CCD', where personalization is computed at a group level rather than individual level, reducing the correlation between an individual's treatment and their specific recommendations.
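One way to read the three designs is as different rules for which history the recommender is allowed to see. The sketch below is a hypothetical illustration under that reading, not the paper's implementation: the function name `history_for_recommender`, the scalar "history" signal (for example an award-movie click rate), the matching table, and the cluster averaging rule are all assumptions.

```python
from statistics import fmean


def history_for_recommender(design, user_id, current, pre_experiment,
                            matched_control, cluster_of):
    """Return the history signal the recommender may use for `user_id`.

    current:         user -> history signal accumulated during the experiment
    pre_experiment:  user -> history signal frozen before the experiment began
    matched_control: treated user -> matched control user (hypothetical matching)
    cluster_of:      user -> cluster label (hypothetical grouping)
    """
    if design == "standard":
        # Standard CCD: personalization follows the user's own, treatment-affected history.
        return current[user_id]
    if design == "switch":
        # CCD-Switch: serve recommendations from the matched control user's history.
        return current[matched_control[user_id]]
    if design == "freeze":
        # CCD-Freeze: pin personalization to the pre-experiment history.
        return pre_experiment[user_id]
    if design == "clustered":
        # Clustered-CCD: personalize on a group-level aggregate (here, a plain mean).
        members = [u for u, c in cluster_of.items() if c == cluster_of[user_id]]
        return fmean(current[u] for u in members)
    raise ValueError(f"unknown design: {design!r}")


# Invented example data: one treated user (t1) and two control users (c1, c2).
current = {"t1": 0.75, "c1": 0.25, "c2": 0.5}
pre_experiment = {"t1": 0.375, "c1": 0.25, "c2": 0.5}
matched_control = {"t1": "c1"}
cluster_of = {"t1": "A", "c1": "A", "c2": "B"}

signals = {
    design: history_for_recommender(
        design, "t1", current, pre_experiment, matched_control, cluster_of
    )
    for design in ("standard", "switch", "freeze", "clustered")
}
print(signals)
```

Under the standard design the treated user's recommendations are driven by their own treated history (0.75 here). Each alternative replaces that input with one that depends less on the user's own treatment: the matched control's history, the pre-experiment snapshot, or a cluster-level average.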
Evaluation Highlights
  • Empirical analysis of Google Ads experiments shows a correlation between personalization imbalance and user learning measurement bias, with bias magnitudes ranging from roughly -1.5 to +1.0 relative units depending on the imbalance.
  • Formal proofs demonstrate that standard CCD experiments provide unbiased estimates of user learning *only* when the personalization effect is zero.
  • Shows that Clustered-CCD recovers user learning effects under an additive separability assumption, without requiring extra experimental cohorts.
Breakthrough Assessment
7/10
Identifies a fundamental causal flaw in a standard industry metric (CCD) for personalized systems and provides theoretically grounded experimental designs to fix it. High practical value for large platforms.