UP5: Unbiased Foundation Model for Fairness-aware Recommendation

📝 Paper Summary

LLM-based Recommendation
Fairness in Recommender Systems

UP5 achieves counterfactually fair recommendations by learning adversarial soft prompts that remove sensitive user attributes from input embeddings while preserving recommendation utility.

Core Problem

Large Language Models (LLMs) used for recommendation inadvertently capture and use sensitive user attributes (e.g., gender, age) even when not explicitly prompted, leading to unfair recommendations.

Why it matters:

LLM-based recommenders implicitly infer sensitive attributes from interaction history, perpetuating societal stereotypes
Users lack control over which personal attributes influence the recommendations they receive
Traditional fairness methods for ID-based recommenders (like altering user embeddings) do not transfer to LLM architectures where user info is textual/token-based

Concrete Example: An elderly user may want movie recommendations based on their taste rather than their age (e.g., modern movies rather than only classics). A standard LLM recommender might infer 'elderly' from the interaction history and stereotype its output, overriding the user's wish that age not influence the recommendations.

Key Novelty

Counterfactually-Fair-Prompt (CFP) via Adversarial Learning

Uses a trainable soft prompt prefix acting as a 'filter' to mask sensitive information from the LLM's internal representations
Optimized via adversarial learning: a discriminator tries to predict the sensitive attribute from the embeddings, while the prompt is trained to fool the discriminator and maximize recommendation accuracy
Introduces a Prompt Mixture (PM) module that combines single-attribute fair prompts to handle multiple sensitive attributes simultaneously without training exponential combinations

Architecture

The architecture of the UP5 framework showing the adversarial training loop.

Evaluation Highlights

Achieves higher Hit@1 than the standard P5 baseline on MovieLens-1M direct recommendation (0.1762 vs. 0.1554) while reducing gender prediction AUC from 0.6865 to 0.5186, close to the 0.5 random-guess level
Maintains fairness (AUC near 0.5) for multiple simultaneous attributes (gender+age) using the Prompt Mixture module without retraining from scratch
Outperforms fairness baselines (like parameter-inefficient fine-tuning methods) in both utility (Hit@K) and fairness metrics (Attribute Prediction AUC)

Breakthrough Assessment

7/10

Effective adaptation of adversarial removal of sensitive attributes to the prompt tuning paradigm. The Prompt Mixture mechanism is a practical solution for combinatorial fairness constraints.

⚙️ Technical Details

Problem Definition

Setting: LLM-based Recommender System (LLM4RS) with sensitive user attributes

Inputs: Natural language prompts containing user interaction history (for sequential) or candidate lists (for direct recommendation)

Outputs: Predicted item ID (recommendation) effectively independent of specified sensitive user attributes
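
To make the input side concrete, the sketch below turns IDs into a text prompt for the two task types. The template wording and the function names `build_sequential_prompt` and `build_direct_prompt` are hypothetical, written in the spirit of P5-style prompts, and are not taken from the paper.

```python
from typing import List


def build_sequential_prompt(user_id: int, history: List[int]) -> str:
    """Sequential task: the prompt carries the user's interaction history."""
    items = ", ".join(f"item_{i}" for i in history)
    return f"User_{user_id} has interacted with {items}. Predict the next item."


def build_direct_prompt(user_id: int, candidates: List[int]) -> str:
    """Direct task: the prompt carries a candidate list to choose from."""
    items = ", ".join(f"item_{i}" for i in candidates)
    return f"Which of the following items should be recommended to User_{user_id}? {items}"


# Hypothetical IDs, for illustration only.
seq_prompt = build_sequential_prompt(23, [101, 7, 58])
dir_prompt = build_direct_prompt(23, [12, 101, 330])
print(seq_prompt)
print(dir_prompt)
```

Neither prompt mentions a sensitive attribute explicitly. The concern raised in this summary is that the model can still infer such attributes from the IDs and history alone.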

Pipeline Flow

Input Construction (User/Item IDs to Text)
Prompt Prepending (CFP / Prompt Mixture)
LLM Backbone (Processing)
Adversarial Discriminator (Training only)

System Modules

Counterfactually-Fair-Prompt (CFP) (Input Processing)

Learnable soft tokens prepended to input to mask sensitive attributes

Model or implementation: Trainable Embedding Vectors
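
A minimal sketch of the prepending step, assuming the soft prompt is a list of trainable vectors concatenated in front of the token embeddings along the sequence axis. It uses plain Python lists to show shapes only. The helper names, the toy embedding size, and the initialization scale are assumptions, and in a real setup these vectors would be updated by an optimizer while the backbone stays frozen.

```python
import random
from typing import List

Vector = List[float]


def init_soft_prompt(length: int, dim: int, seed: int = 0) -> List[Vector]:
    """Create `length` prompt vectors with a small random initialization."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(length)]


def prepend_prompt(prompt: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Concatenate along the sequence axis: [prompt ; tokens]."""
    if prompt and token_embeddings and len(prompt[0]) != len(token_embeddings[0]):
        raise ValueError("prompt and token embeddings must share the embedding dimension")
    return prompt + token_embeddings


dim = 8                                          # toy embedding size (assumption)
prompt = init_soft_prompt(length=100, dim=dim)   # prompt_length = 100, as listed in this summary
tokens = [[0.0] * dim for _ in range(12)]        # stand-in for 12 embedded input tokens
model_input = prepend_prompt(prompt, tokens)
print(len(model_input), len(model_input[0]))
```

The backbone then processes `model_input` in place of the raw token embeddings, so the prompt can influence every later representation without changing any model weight.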

Prompt Mixture (PM) (Input Processing)

Aggregates multiple single-attribute CFPs for multi-attribute fairness

Model or implementation: Attention Layer
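
One way to picture the mixing step is single-head scaled dot-product self-attention over the concatenated single-attribute prompts. The sketch below makes several simplifying assumptions that are not confirmed by this summary: identity query/key/value projections, one head, and an output that keeps the concatenated length. The prompt values are toy numbers.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(seq: List[Vector]) -> List[Vector]:
    """Row-wise softmax of scaled dot products between all prompt vectors."""
    scale = math.sqrt(len(seq[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in seq]) for q in seq]


def mix_prompts(prompts: List[List[Vector]]) -> List[Vector]:
    """Concatenate single-attribute prompts, then let each position attend to all positions."""
    seq = [vec for prompt in prompts for vec in prompt]
    weights = attention_weights(seq)
    dim = len(seq[0])
    return [[sum(w * v[d] for w, v in zip(row, seq)) for d in range(dim)] for row in weights]


gender_prompt = [[1.0, 0.0], [0.5, 0.5]]   # toy prompt: 2 vectors of dimension 2
age_prompt = [[0.0, 1.0], [0.2, 0.8]]
mixed = mix_prompts([gender_prompt, age_prompt])
print(len(mixed), len(mixed[0]))
```

Because only the attention layer is new, adding another attribute means adding one more single-attribute prompt to the list rather than training a prompt for every attribute combination.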

LLM Backbone

Generates recommendation output

Model or implementation: T5-small / OpenLlama

Discriminator

Attempts to predict sensitive attributes from internal embeddings

Model or implementation: Multi-class Classifier (MLP)
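
The forward pass of such a classifier can be sketched as follows. Mean pooling over positions, a single hidden ReLU layer, the layer sizes, and the class name `MLPDiscriminator` are all assumptions for illustration, since the summary only states that the discriminator is a multi-class MLP.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class MLPDiscriminator:
    """Mean-pool the embeddings, apply a hidden ReLU layer, return class probabilities."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w1, self.b1 = init(hidden_dim, in_dim), [0.0] * hidden_dim
        self.w2, self.b2 = init(num_classes, hidden_dim), [0.0] * num_classes

    def predict_proba(self, embeddings: Matrix) -> Vector:
        pooled = [sum(col) / len(embeddings) for col in zip(*embeddings)]
        hidden = [max(0.0, h) for h in linear(pooled, self.w1, self.b1)]
        return softmax(linear(hidden, self.w2, self.b2))


embeddings = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2]]  # 2 positions, 4 dims (toy values)
disc = MLPDiscriminator(in_dim=4, hidden_dim=8, num_classes=2)
probs = disc.predict_proba(embeddings)
print(probs)
```

During training the discriminator is fit to predict the attribute from these embeddings. At evaluation time, its AUC serves as the leakage measure, with 0.5 meaning no better than chance.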

Novel Architectural Elements

Prompt Mixture (PM) module: An attention layer specifically designed to fuse multiple adversarially-trained soft prompts into a single prefix to handle combinatorial fairness constraints

Modeling

Base Model: T5-small (encoder-decoder) and OpenLlama (decoder-only)

Training Method: Adversarial Prompt Tuning

Objective Functions:

Purpose: Ensure recommendation accuracy.

Formally: L_rec, the Negative Log-Likelihood of the correct item ID token.

Purpose: Ensure fairness by confusing the discriminator.

Formally: Maximize L_dis, the Cross-Entropy Loss of the discriminator (predicting sensitive attributes).

Purpose: Total objective.

Formally: The prompt minimizes L_rec - lambda * L_dis, while the discriminator minimizes L_dis (min-max game).
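
As a small worked illustration of the combined objective, the sketch below evaluates L_rec - lambda * L_dis for a single example. The probability values and function names are made up for illustration. It shows only the loss value seen by the prompt, not a full training loop.

```python
import math
from typing import List


def nll(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class (cross-entropy for one example)."""
    return -math.log(probs[target])


def prompt_objective(rec_probs: List[float], item: int,
                     dis_probs: List[float], attribute: int,
                     lam: float = 1.0) -> float:
    """Loss minimized by the prompt: L_rec - lambda * L_dis."""
    l_rec = nll(rec_probs, item)        # recommendation loss on the correct item
    l_dis = nll(dis_probs, attribute)   # discriminator loss on the true attribute
    return l_rec - lam * l_dis


rec_probs = [0.7, 0.2, 0.1]     # toy distribution over 3 items, item 0 is correct
confident_dis = [0.9, 0.1]      # discriminator recovers the attribute (leaky embeddings)
confused_dis = [0.5, 0.5]       # discriminator is at chance (attribute hidden)

loss_leaky = prompt_objective(rec_probs, 0, confident_dis, 0)
loss_fair = prompt_objective(rec_probs, 0, confused_dis, 0)
print(loss_fair < loss_leaky)
```

With the same recommendation quality, the prompt's loss is lower when the discriminator is confused, which is the pressure that pushes sensitive information out of the embeddings. The discriminator is updated in the opposite direction, minimizing its own cross-entropy.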

Adaptation: Soft Prompt Tuning (Model weights frozen)

Trainable Parameters: Soft prompt embeddings + Prompt Mixture attention weights + Discriminator weights

Training Data:

MovieLens-1M: User-movie interactions + Gender/Age/Occupation
Insurance: User-insurance interactions + Gender/Marital/Age/Occupation

Key Hyperparameters:

prompt_length: 100
learning_rate: 1e-3
batch_size: 16
lambda (discriminator weight): 1.0 (tuned range 0.1 to 2.0)
epochs: 20

Compute: Experiments run on NVIDIA A100 GPU

Comparison to Prior Work

vs. P5: UP5 adds adversarial soft prompts to remove sensitive info while P5 uses raw text prompts
vs. Traditional Fair RS (Li et al. 2021): UP5 works on token space/LLMs rather than ID-embedding tables, enabling transfer to foundation models
vs. Adapter-based Fairness (Wu et al. 2022): UP5 uses Prompt Mixture to handle attribute combinations efficiently, whereas adapters require training distinct modules for every combination [not cited in paper, but conceptually related]

Limitations

Adversarial training can be unstable and requires careful tuning of the lambda parameter
Experiments primarily conducted on relatively small backbone models (T5-small)
Depends on the discriminator's ability to probe attributes; if the discriminator is too weak, fairness guarantees might be overestimated

Reproducibility

Code: https://github.com/agiresearch/UP5

Code and data anonymously released at https://github.com/agiresearch/UP5. Implementation details (prompt length, LR, etc.) provided in Appendix.

📊 Experiments & Results

Evaluation Setup

Top-K Recommendation (Direct and Sequential tasks)

Benchmarks:

MovieLens-1M (Movie Recommendation)
Insurance (Insurance Product Recommendation)

Metrics:

Hit@K (K=1, 3, 10) for Utility
NDCG@K (K=1, 3, 10) for Utility
AUC (of attribute classifier) for Fairness (Target = 0.5)
Statistical methodology: Not explicitly reported in the paper
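
For reference, the three metrics can be sketched in a few lines under common conventions: one ground-truth item per test case with a 1-based rank, and a binary attribute for AUC. These conventions are assumptions, and the paper's exact evaluation protocol (for example, how multi-class attributes such as age are handled) may differ.

```python
import math
from typing import List


def hit_at_k(rank: int, k: int) -> float:
    """1.0 if the ground-truth item is ranked within the top k (rank is 1-based)."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank: int, k: int) -> float:
    """NDCG with a single relevant item: 1 / log2(rank + 1) inside the top k, else 0."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


ranks = [1, 3, 12]   # toy ranks of the correct item for three test users
print(sum(hit_at_k(r, 10) for r in ranks) / len(ranks))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

Higher Hit@K and NDCG@K indicate better utility, whereas for the attribute classifier an AUC close to 0.5 is the desired outcome.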

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
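Probing results demonstrate that standard LLMs (P5) implicitly encode sensitive attributes (the value shown is the probing AUC on the P5 backbone; Δ is relative to the 0.5 random-guess level).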
MovieLens-1M	AUC (Gender Prediction)	-	0.6865	+0.1865
Performance on Direct Recommendation (MovieLens-1M) showing UP5 improves utility while ensuring fairness.
MovieLens-1M	Hit@1	0.1554	0.1762	+0.0208
MovieLens-1M	AUC (Gender Fairness)	0.6865	0.5186	-0.1679
Multi-attribute fairness results using Prompt Mixture.
MovieLens-1M	Hit@1	0.1554	0.1654	+0.0100
MovieLens-1M	AUC (Age Fairness)	0.6093	0.5154	-0.0939

Experiment Figures

AUC scores of probing sensitive attributes from P5 backbone on MovieLens and Insurance datasets.

Diagram of the Prompt Mixture (PM) module.

Main Takeaways

Standard LLM-based recommenders (P5) inherently leak sensitive user attributes (gender, age, etc.), allowing for potential unfairness.
CFP successfully mitigates this leakage, reducing attribute prediction AUC to near-random levels (0.5).
Surprisingly, removing sensitive attributes via CFP often improves recommendation performance (Hit@1) compared to the biased baseline, possibly by acting as a regularizer or removing irrelevant biases.
The Prompt Mixture module effectively handles multiple fairness constraints simultaneously without needing to train a new model for every combination of attributes.

📚 Prerequisite Knowledge

Prerequisites

Basics of Prompt Tuning / Soft Prompts
Adversarial Machine Learning (Min-Max optimization)
Counterfactual Fairness definitions

Key Terms

Counterfactual Fairness: A definition of fairness where the recommendation outcome for a user remains the same even if their sensitive attribute (e.g., gender) were flipped to a different value
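
One common way to state this formally is sketched below. The notation is an assumption introduced here, not taken from this summary: X denotes the user's insensitive features, A the sensitive attribute, and L_a the recommendation list produced when the attribute is set to a.

```latex
P\left(L_{a} \mid X = x, A = a\right) \;=\; P\left(L_{a'} \mid X = x, A = a\right)
\quad \text{for all } L \text{ and every value } a' \text{ attainable by } A
```

In words, intervening on the sensitive attribute while holding everything else about the user fixed should leave the distribution over recommended lists unchanged.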

Soft Prompt: Learnable vectors prepended to the input text embeddings that steer the LLM's behavior without modifying the model weights

Adversarial Learning: A training method where two networks compete: a generator (here, the prompt) tries to hide information, while a discriminator tries to uncover it

Prompt Mixture: A proposed attention-based mechanism to combine multiple attribute-specific soft prompts into a single prompt representation

P5: Pretrain, Personalized Prompt, and Predict Paradigm, a unified framework that formulates recommendation tasks as language generation problems