DiM: Difference-in-Means—the standard estimator used in A/B testing, calculating the simple difference between the average rewards of two groups
IPS: Inverse Propensity Scoring—an offline technique that re-weights data based on the probability of assignment to estimate what would have happened under a different policy
OPE: Off-Policy Evaluation—estimating the performance of a new policy using historical data generated by a different (logging) policy
CUPED: Controlled-experiment Using Pre-Experiment Data—a variance reduction technique for A/B tests that uses pre-experiment data as a covariate to adjust the outcome metric
Doubly Robust: An estimation method combining IPS (weighting) with a reward model (regression); its estimate remains consistent if either the propensity model or the reward model is correctly specified, and it is unbiased when the logging propensities are known exactly
Control Variate: A random variable correlated with the outcome whose expectation is known (zero in this usage); a scaled copy of it is added to or subtracted from an estimator to reduce variance without introducing bias
ATE: Average Treatment Effect—the difference in expected outcomes between two treatments (e.g., Policy A vs. Policy B)
Action-Agnostic: A model or function that depends only on the context (user features) and not on the specific action (treatment) taken
Bessel's Correction: Dividing by n-1 instead of n when computing the sample variance, which corrects the downward bias introduced by estimating the population mean from the same sample
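
As a rough illustration of how several of the terms above fit together, the following minimal Python sketch implements DiM, IPS, a doubly robust estimate, and a CUPED-style adjustment using only the standard library. All function and parameter names are hypothetical and do not come from any particular library. The sketch assumes that the logging propensities (the probability with which the logging policy chose each logged action) are known, and that the target policy simply always plays one fixed action.

```python
from statistics import mean, variance, pvariance


def difference_in_means(control, treatment):
    """DiM: average treatment reward minus average control reward."""
    return mean(treatment) - mean(control)


def ips_value(rewards, actions, propensities, target_action):
    """IPS estimate of the value of always playing `target_action`.

    Assumption: propensities[i] is the known probability with which the
    logging policy chose the logged action actions[i].
    """
    total = 0.0
    for r, a, p in zip(rewards, actions, propensities):
        if a == target_action:
            total += r / p  # re-weight by the inverse propensity
    return total / len(rewards)


def doubly_robust_value(rewards, actions, propensities, predicted, target_action):
    """Doubly robust estimate: reward-model prediction plus an IPS-weighted
    correction on the rows where the logged action matches the target.

    Assumption: predicted[i] is a (hypothetical) reward model's prediction
    for `target_action` in the i-th context.
    """
    total = 0.0
    for r, a, p, q in zip(rewards, actions, propensities, predicted):
        total += q
        if a == target_action:
            total += (r - q) / p
    return total / len(rewards)


def cuped_adjust(outcomes, covariate):
    """CUPED-style adjustment: y - theta * (x - mean(x)).

    The centered covariate acts as a zero-mean control variate, so the
    sample mean of the outcome is unchanged while its variance shrinks.
    Both variance and covariance use n - 1 (Bessel's correction).
    """
    x_bar, y_bar = mean(covariate), mean(outcomes)
    n = len(outcomes)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(covariate, outcomes)) / (n - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, outcomes)]
```

In this sketch, an ATE between two fixed actions could be estimated as the difference of two `ips_value` (or `doubly_robust_value`) calls on the same log. In an actual A/B test, `theta` and the covariate mean for CUPED would normally be computed on data pooled across both groups; the single-sample version here only shows the mechanics. `statistics.variance` divides by n-1, whereas `statistics.pvariance` divides by n.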