Middleman Bias: A form of selection bias where training data only contains samples that passed a previous system's filter (e.g., search engine approval), hiding potential positives that were filtered out
Cross-Encoder: A model that processes query and document simultaneously (full self-attention), offering high accuracy but high computational cost
Bi-Encoder: A model that encodes query and document independently into vectors, allowing fast retrieval via nearest neighbor search but with lower accuracy than cross-encoders
Pearson Correlation Loss: A loss function that maximizes the linear (Pearson) correlation between teacher and student scores; because the correlation is unchanged by shifting or rescaling the scores, it rewards preserving their relative ordering and distribution shape rather than matching absolute values
GMB: Gross Merchandise Bought, the total dollar value of merchandise sold
ROAS: Return on Advertising Spend, the revenue generated for every dollar spent on advertising
MNAR: Missing Not At Random, meaning the probability that a value is missing depends on the unobserved value itself (e.g., relevant items receive no clicks because they were ranked too low to be seen)
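
To make the Pearson Correlation Loss entry concrete, here is a minimal sketch in plain Python. It assumes the loss is written as 1 - r, where r is the Pearson correlation between teacher and student scores; the glossary only says the correlation is maximized, so this exact form, the function name `pearson_correlation_loss`, and the `eps` stabilizer are illustrative assumptions rather than the original formulation.

```python
import math


def pearson_correlation_loss(teacher_scores, student_scores, eps=1e-8):
    """Return 1 - r, where r is the Pearson correlation of the two score lists.

    Assumed formulation: minimizing 1 - r is equivalent to maximizing r.
    `eps` (hypothetical) guards against division by zero for constant scores.
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("score lists must have the same length")
    n = len(teacher_scores)

    # Center both score lists around their means.
    mean_t = sum(teacher_scores) / n
    mean_s = sum(student_scores) / n
    centered_t = [t - mean_t for t in teacher_scores]
    centered_s = [s - mean_s for s in student_scores]

    # Pearson r = covariance divided by the product of the standard deviations;
    # the 1/n factors cancel, so unnormalized sums are sufficient.
    cov = sum(a * b for a, b in zip(centered_t, centered_s))
    norm_t = math.sqrt(sum(a * a for a in centered_t))
    norm_s = math.sqrt(sum(b * b for b in centered_s))
    r = cov / (norm_t * norm_s + eps)
    return 1.0 - r


teacher = [0.1, 0.4, 0.9]
rescaled_student = [2.0 * t + 3.0 for t in teacher]  # same shape, different scale
reversed_student = [-t for t in teacher]             # ordering fully inverted

loss_rescaled = pearson_correlation_loss(teacher, rescaled_student)
loss_reversed = pearson_correlation_loss(teacher, reversed_student)
```

In this sketch the rescaled student incurs a loss close to 0 even though none of its absolute values match the teacher's, while the reversed student incurs a loss close to 2. That is the shift and scale invariance the definition refers to.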