Causal Estimation of User Learning in Personalized Systems

📝 Paper Summary

Recommendation personalization, Causal Inference in Online Platforms

The paper demonstrates that standard Cookie-Cookie-Day (CCD) experiments produce biased estimates of user learning in personalized systems and proposes new designs like CCD-Switch that intervene on personalization to isolate preference changes.

Core Problem

Standard methods for measuring long-term user learning assume that differences between long-term and short-term treated users are solely due to preference changes. In personalized systems, treatment alters user history, which changes future system recommendations (personalization), confounding the measurement.

Why it matters:

Mistaking personalization effects for user learning can lead platforms to launch features that actually harm user experience (spurious positive learning) or reject beneficial ones (spurious negative learning).
User learning is a critical proxy for long-term ecosystem health, distinct from immediate short-term metrics.
Existing long-term A/B tests cannot disentangle user preference evolution from system adaptation.

Concrete Example: A streaming service highlights award-winning movies (Treatment). A long-term treated user clicks more, so the personalized system learns to show *more* award movies. A short-term treated user (control history) sees *fewer* award movies despite receiving the treatment today. A standard comparison concludes the long-term user likes the service more (user learning), but the effect is actually due to the system showing better movies (personalization).
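The streaming example can be reproduced with a toy simulation. This is a minimal sketch, not the paper's simulation: the outcome model, the 0.02 daily drift in award-movie share, and all function names are assumptions chosen for illustration. User learning is switched off (`learning_rate=0.0`), so any long-term versus short-term gap comes from personalization alone.

```python
def outcome(pref, award_share, treated_today):
    # Hypothetical outcome model: engagement grows with the user's preference,
    # the share of award movies recommended, and a direct boost from treatment.
    return pref * (0.1 + 0.5 * award_share) + (0.05 if treated_today else 0.0)


def run_cohort(prior_treated_days, learning_rate):
    # Returns (user preference, award-movie share) on the measurement day.
    pref, award_share = 0.5, 0.2
    for _ in range(prior_treated_days):
        award_share = min(0.8, award_share + 0.02)  # personalization path
        pref += learning_rate                       # user learning path
    return pref, award_share


def ccd_estimate(horizon=20, learning_rate=0.0):
    long_pref, long_share = run_cohort(horizon, learning_rate)   # long-term treated
    short_pref, short_share = run_cohort(0, learning_rate)       # treated today only
    # Standard CCD contrast: both cohorts are treated today.
    ccd = outcome(long_pref, long_share, True) - outcome(short_pref, short_share, True)
    # True user learning: change preferences only, hold the system state fixed.
    true_learning = (outcome(long_pref, short_share, True)
                     - outcome(short_pref, short_share, True))
    return ccd, true_learning


if __name__ == "__main__":
    ccd, true_learning = ccd_estimate()
    print(f"CCD estimate: {ccd:.3f}, true user learning: {true_learning:.3f}")
```

Under these assumed parameters the CCD contrast is about 0.10 while the true user learning effect is exactly 0, which is the spurious positive learning described above.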

Key Novelty

Decomposition of Total Causal Effect into User Learning, Personalization, and Direct Effects

Introduces 'CCD-Switch', an experimental design where treated users receive recommendations based on the history of a matched control user, breaking the causal loop between treatment history and personalization.
Introduces 'CCD-Freeze', where personalization is fixed to pre-experiment user history to serve as a proxy for the control state.
Proposes 'Clustered-CCD', where personalization is computed at a group level rather than individual level, reducing the correlation between an individual's treatment and their specific recommendations.
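The three designs differ mainly in which history is fed to the personalization logic. The sketch below illustrates that idea under simplifying assumptions: the personalization signal is reduced to the mean of a per-user click-rate history, and the function name, mode names, and data layout are hypothetical rather than taken from the paper.

```python
from statistics import mean


def personalization_signal(mode, user_id, live, frozen, matches, clusters):
    """Return the history-based signal the recommender would use for user_id.

    live:     user_id -> click-rate history accumulated during the experiment
    frozen:   user_id -> click-rate history captured before the experiment
    matches:  treated user_id -> matched control user_id (CCD-Switch)
    clusters: user_id -> list of user_ids in the same cluster (Clustered-CCD)
    """
    if mode == "standard":   # the user's own, treatment-affected history
        return mean(live[user_id])
    if mode == "switch":     # history of a matched control user
        return mean(live[matches[user_id]])
    if mode == "freeze":     # the user's own pre-experiment history
        return mean(frozen[user_id])
    if mode == "clustered":  # group-level history dilutes the individual's share
        return mean(mean(live[member]) for member in clusters[user_id])
    raise ValueError(f"unknown mode: {mode}")


# Illustrative data: t1 is a long-term treated user, c1 and c2 are controls.
LIVE = {"t1": [0.8, 0.9], "c1": [0.2, 0.3], "c2": [0.1, 0.2]}
FROZEN = {"t1": [0.25, 0.25]}
MATCHES = {"t1": "c1"}
CLUSTERS = {"t1": ["t1", "c1", "c2"]}

if __name__ == "__main__":
    for mode in ("standard", "switch", "freeze", "clustered"):
        print(mode, personalization_signal(mode, "t1", LIVE, FROZEN, MATCHES, CLUSTERS))
```

In this toy data the standard signal for `t1` reflects the treatment-inflated history, while the switch and freeze signals fall back to control-like values and the clustered signal sits in between.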

Evaluation Highlights

Empirical analysis of Google Ads experiments shows a correlation between personalization imbalance and user learning measurement bias, with the measured bias ranging from roughly -1.5 to +1.0 relative units depending on the imbalance.
Formal proofs demonstrate that standard CCD experiments provide unbiased estimates of user learning *only* when the personalization effect is zero.
Identifies that Clustered-CCD recovers user learning effects under an additive separability assumption, without requiring extra experimental cohorts.

Breakthrough Assessment

7/10

Identifies a fundamental causal flaw in a standard industry metric (CCD) for personalized systems and provides theoretically grounded experimental designs to fix it. High practical value for large platforms.

⚙️ Technical Details

Problem Definition

Setting: Potential outcomes framework over time t=1...T with binary intervention W.

Inputs: User history of actions X and treatments W.

Outputs: Estimates of User Learning Effect (τ_U), Personalization Effect (τ_P), and Direct Effect (τ_S).
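One way to write the three-way decomposition is as a telescoping sum. The notation below is an assumption made for illustration and may differ from the paper's formal definitions, in particular in the order in which the user state and system state are switched.

```latex
% Assumed notation (not taken from the paper): Y(w, u, s) is the outcome on the
% measurement day given today's treatment w, user preference state u, and
% system (personalization) state s. Subscript 1 marks the state reached after
% long-term treatment, subscript 0 the state reached under control.
\begin{align*}
\underbrace{Y(1,u_1,s_1) - Y(0,u_0,s_0)}_{\text{total effect}}
  &= \underbrace{Y(1,u_1,s_1) - Y(1,u_0,s_1)}_{\tau_U}
   + \underbrace{Y(1,u_0,s_1) - Y(1,u_0,s_0)}_{\tau_P}
   + \underbrace{Y(1,u_0,s_0) - Y(0,u_0,s_0)}_{\tau_S} \\
\underbrace{Y(1,u_1,s_1) - Y(1,u_0,s_0)}_{\text{standard CCD contrast}}
  &= \tau_U + \tau_P
\end{align*}
```

Read this way, the standard CCD contrast equals the user learning effect only when the personalization term is zero, which matches the condition stated in the summary.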

Pipeline Flow

Assign users to cohorts (Control, Long-term Treatment, Switch/Freeze variants)
Intervene on Personalization Logic (Standard, Matched, Frozen, or Clustered)
Measure Outcomes (Y_it)
Compute Differences between Cohorts
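The final step of the pipeline amounts to differencing cohort means. The following sketch assumes four cohorts with hypothetical names (`control`, `cookie_day`, `long_term`, `switch`) and an assumed mapping from contrasts to effects; the paper's estimators may be defined differently.

```python
from statistics import mean


def cohort_contrasts(outcomes):
    """outcomes: cohort name -> list of per-user outcomes on the measurement day."""
    m = {name: mean(values) for name, values in outcomes.items()}
    return {
        # Long-term treated vs. never treated.
        "total": m["long_term"] - m["control"],
        # Treated today only vs. never treated.
        "direct": m["cookie_day"] - m["control"],
        # Standard CCD contrast: mixes user learning with personalization.
        "ccd_learning": m["long_term"] - m["cookie_day"],
        # Switch cohort is long-term treated but served control-based personalization.
        "switch_learning": m["switch"] - m["cookie_day"],
        # What remains of the CCD contrast once the switch-based estimate is removed.
        "personalization": m["long_term"] - m["switch"],
    }


if __name__ == "__main__":
    # Illustrative numbers only; not data from the paper.
    example = {
        "control": [0.10, 0.10],
        "cookie_day": [0.15, 0.15],
        "long_term": [0.25, 0.25],
        "switch": [0.15, 0.15],
    }
    for name, value in cohort_contrasts(example).items():
        print(f"{name}: {value:.3f}")
```

By construction, `ccd_learning` equals `switch_learning + personalization`, so the gap between the two learning estimates is attributed to the personalization path.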

System Modules

Standard CCD Cohorts (Experimental Design)

Establish baseline total effect and direct effect

Model or implementation: Randomized Experiment

Switch Cohort (CS) (Experimental Design)

Estimate user learning by removing personalization path

Model or implementation: Counterfactual Personalization

Freeze Cohort (CF) (Experimental Design)

Estimate user learning using pre-experiment state as control proxy

Model or implementation: Frozen Personalization

Novel Architectural Elements

Decoupling the treatment assignment from the history used for personalization (Switch/Freeze designs)
Hierarchical personalization assignment (Cluster level) to mitigate interference in causal estimation

Comparison to Prior Work

vs. Long-term A/B Test: A/B tests measure Total Effect but cannot isolate User Learning from Personalization.
vs. Standard CCD: Standard CCD assumes Personalization Effect is zero; CCD-Switch/Freeze explicitly controls for it to remove bias.
vs. Causal Mediation Analysis [not cited in paper]: This work addresses dynamic mediation where the mediator (system state) and exposure (treatment) vary over time, requiring specific experimental interventions rather than just statistical adjustment.

Limitations

CCD-Switch requires finding good matches between treated and control users, which introduces bias if matches are imperfect.
CCD-Freeze relies on pre-experiment history being a valid proxy for control history, which degrades over long experiments.
CCD-Switch and CCD-Freeze degrade user experience by serving suboptimal recommendations to study cohorts, because personalization is based on another user's history or on stale pre-experiment history.
Clustered-CCD relies on an 'Additive Separability' assumption that may not hold in complex systems.
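The matching step that CCD-Switch depends on (the first limitation above) could be approximated by nearest-neighbour matching on pre-experiment features. This is a sketch of one common matching approach, not the procedure used in the paper; the feature vectors and the function name are hypothetical. Returning the match distance makes it possible to flag poor matches, which is where the bias noted above would enter.

```python
import math


def match_controls(treated, controls):
    """Match each treated user to the closest control user.

    treated, controls: user_id -> tuple of pre-experiment features.
    Returns treated user_id -> (matched control user_id, Euclidean distance).
    Matching is with replacement: one control may serve several treated users.
    """
    matches = {}
    for t_id, t_features in treated.items():
        c_id = min(controls, key=lambda c: math.dist(t_features, controls[c]))
        matches[t_id] = (c_id, math.dist(t_features, controls[c_id]))
    return matches


if __name__ == "__main__":
    # Illustrative features: (historical click rate, sessions per week).
    treated = {"t1": (0.2, 5.0), "t2": (0.8, 1.0)}
    controls = {"c1": (0.25, 5.0), "c2": (0.7, 1.0)}
    for t_id, (c_id, dist) in match_controls(treated, controls).items():
        print(f"{t_id} -> {c_id} (distance {dist:.3f})")
```

In practice the features would need scaling before computing distances, and a maximum-distance threshold could be used to drop treated users without an acceptable match.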

Reproducibility

The paper provides mathematical definitions of the designs and estimators. No code or datasets are provided. The empirical data is proprietary (Google Ads).

📊 Experiments & Results

Evaluation Setup

Causal estimation in a personalized movie recommendation simulation (details truncated in text) and real-world analysis of a Google Ads system.

Benchmarks:

Google Ads Recommendation System (Real-world large scale system evaluation)

Metrics:

User Learning Effect (τ_U)
Personalization Effect (τ_P)
Bias (Difference between estimated and true effect)
Statistical methodology: Comparison of cohort means; Non-linear least squares for parameter estimation of learning curves.
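As an illustration of non-linear least squares for a learning curve, the sketch below fits a saturating exponential, effect(t) = a * (1 - exp(-t / tau)). The functional form, parameter names, and grid are assumptions; the summary does not state which curve the paper fits. For each candidate `tau` the best amplitude `a` has a closed form, so the fit reduces to a one-dimensional search over `tau`.

```python
import math


def fit_learning_curve(days, effects, tau_grid=None):
    """Fit effect(t) = a * (1 - exp(-t / tau)) by profiled least squares.

    days:    positive day indices at which the learning effect was measured
    effects: measured learning effect on each of those days
    Returns (a, tau) minimising the sum of squared errors over the grid.
    """
    if tau_grid is None:
        tau_grid = [0.5 * k for k in range(1, 201)]  # 0.5, 1.0, ..., 100.0 days
    best = None
    for tau in tau_grid:
        basis = [1.0 - math.exp(-t / tau) for t in days]
        denom = sum(f * f for f in basis)
        if denom == 0.0:
            continue
        # Closed-form least-squares amplitude for this tau.
        a = sum(y * f for y, f in zip(effects, basis)) / denom
        sse = sum((y - a * f) ** 2 for y, f in zip(effects, basis))
        if best is None or sse < best[0]:
            best = (sse, a, tau)
    if best is None:
        raise ValueError("no usable tau; check that days are positive")
    _, a, tau = best
    return a, tau


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a = 2.0, tau = 10.0.
    days = list(range(1, 41))
    effects = [2.0 * (1.0 - math.exp(-t / 10.0)) for t in days]
    print(fit_learning_curve(days, effects))
```

A grid search keeps the example dependency-free; a gradient-based optimiser would typically replace it when finer resolution in `tau` is needed.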

Key Results

Empirical evidence from Google Ads experiments demonstrates that as the difference in personalization between cohorts increases, the bias in user learning measurement also increases.

Benchmark	Metric	Baseline	This Paper	Δ
Google Ads System	User Learning Measurement Change	0	1.0	+1.0
Google Ads System	User Learning Measurement Change	0	-1.5	-1.5

Experiment Figures

A concrete counter-example showing how CCD reports spurious user learning.

Main Takeaways

Personalization acts as a confounder in standard CCD experiments: it creates a feedback loop where treatment changes history, which changes recommendations, affecting the outcome independently of user learning.
The bias from personalization can be positive (spurious learning) or negative (masking true learning), depending on how the system reacts to treated user history.
Clustered-CCD offers a privacy-preserving alternative that enables user learning estimation under stronger assumptions (additive separability) without requiring complex user matching.

📚 Prerequisite Knowledge

Prerequisites

Causal Inference (Potential Outcomes)
A/B Testing methodologies
Basic understanding of Recommendation Systems

Key Terms

CCD: Cookie-Cookie-Day, an experimental design that compares a long-term treated cohort (Cookie) with a cohort treated only for the current day (Cookie-Day) to measure user learning.

User Learning Effect: The causal effect on outcomes mediated through changes in user preferences over time, holding the system state fixed.

Personalization Effect: The causal effect on outcomes mediated through changes in the system's personalized recommendations based on user history.

System State: The configuration of the platform (e.g., specific recommendations shown) which may depend on user history.

CCD-Switch: A proposed design where a treated cohort's personalization features are swapped with those of a matched control user.

CCD-Freeze: A proposed design where a treated cohort's personalization features are fixed to their pre-experiment values.

Clustered-CCD: A design where personalization is computed based on the history of a cluster of users, rather than the individual, to dilute the effect of individual treatment history.