RecLLM: Recommender Systems powered by Large Language Models
Neutral Ranker: A baseline recommender that generates lists using prompts without sensitive user attributes
Sensitive Ranker: A recommender that explicitly uses sensitive attributes (e.g., gender, age) in the prompt to generate lists
Counterfactual Sensitive Ranker: A recommender where a sensitive attribute is hypothetically altered (e.g., 'do(Gender=Male)') to test 'what-if' scenarios
NSD (Neutral vs. Sensitive Ranker Deviation): A metric measuring how adding sensitive attributes to the prompt changes recommendation utility relative to the neutral baseline
NCSD (Neutral vs. Counterfactual Sensitive Deviation): A metric measuring how hypothetically swapping a sensitive attribute changes recommendation utility relative to the neutral baseline
IF (Intrinsic Fairness): A metric evaluating whether a single ranker's output distribution aligns with a target distribution (e.g., uniform) across groups
Benefit Deviation: The difference in utility (e.g., Hit Rate) between a target ranker and a reference ranker
ICL (In-Context Learning): Guiding the LLM through the prompt alone, either with no examples (zero-shot) or with a few worked examples (few-shot)
Hit Rate: A metric checking whether at least one relevant item appears in the top-k recommendations
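
As an illustration of how Hit Rate and Benefit Deviation fit together, the following is a minimal sketch, not the reference implementation. It assumes that Hit Rate is averaged over users and that Benefit Deviation is a plain signed difference of utilities; the glossary does not specify the aggregation (per group, absolute value, normalization), so the real metrics may differ. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def hit_rate_at_k(recs: Dict[str, List[str]],
                  relevant: Dict[str, Set[str]],
                  k: int) -> float:
    """Fraction of users with at least one relevant item in their top-k list.

    Assumption: utility is averaged uniformly over users.
    """
    if not recs:
        return 0.0
    hits = sum(
        1
        for user, items in recs.items()
        if any(item in relevant.get(user, set()) for item in items[:k])
    )
    return hits / len(recs)


def benefit_deviation(target_utility: float, reference_utility: float) -> float:
    """Signed utility gap between a target ranker and a reference ranker.

    Assumption: a simple difference; the actual metric may aggregate differently.
    """
    return target_utility - reference_utility


# Hypothetical toy data: one relevant item per user.
relevant = {"u1": {"a"}, "u2": {"b"}, "u3": {"c"}, "u4": {"d"}}

# Top-3 lists from three hypothetical rankers.
neutral = {"u1": ["a", "x", "y"], "u2": ["x", "b", "y"],
           "u3": ["x", "y", "z"], "u4": ["x", "y", "z"]}
sensitive = {"u1": ["a", "x", "y"], "u2": ["x", "y", "z"],
             "u3": ["x", "y", "z"], "u4": ["x", "y", "z"]}
counterfactual = {"u1": ["a", "x", "y"], "u2": ["b", "x", "y"],
                  "u3": ["c", "x", "y"], "u4": ["x", "y", "z"]}

hr_neutral = hit_rate_at_k(neutral, relevant, k=3)
hr_sensitive = hit_rate_at_k(sensitive, relevant, k=3)
hr_counterfactual = hit_rate_at_k(counterfactual, relevant, k=3)

# NSD-style comparison: sensitive ranker against the neutral baseline.
nsd_like = benefit_deviation(hr_sensitive, hr_neutral)
# NCSD-style comparison: counterfactual ranker against the neutral baseline.
ncsd_like = benefit_deviation(hr_counterfactual, hr_neutral)

print(nsd_like, ncsd_like)
```

On this toy data the sensitive ranker loses utility relative to the neutral baseline (negative deviation) and the counterfactual ranker gains utility (positive deviation). In this sketch the neutral ranker is always the reference, so the sign shows the direction of the change.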