UFO: Unfair-to-Fair Evolving Mitigates Unfairness in LLM-based Recommender Systems via Self-Play Fine-tuning

📝 Paper Summary

LLM-based Recommender Systems (LRS)
Item-side Fairness

UFO mitigates item-side unfairness in LLM recommenders by analyzing how supervised fine-tuning amplifies pre-training bias and correcting it via a self-play game between a judger and a corrector.

Core Problem

LLM-based Recommender Systems (LRSs) exhibit severe item-side unfairness because the Supervised Fine-Tuning (SFT) stage reinforces and amplifies inherent biases from the pre-training stage.

Why it matters:

Current methods like re-weighting or re-ranking only address bias during SFT, ignoring the root cause in pre-training
LRSs exhibit more severe unfairness than traditional models (e.g., SASRec), leading to significant inequality in item exposure for specific groups (e.g., job providers)
Existing fairness constraints often degrade the recommendation performance (utility) of the system

Concrete Example: In an empirical study on ML-1M using Llama-2-7b, the covariance between pre-training bias and SFT bias shift was positive (7.73e-4), meaning the fine-tuning process actively reinforced the model's initial preference for dominant genres rather than correcting it.

Key Novelty

Unfair-to-Fair evOlving (UFO) with Self-Play

Frames fairness alignment as a two-player game: a 'Judger' identifies unfair outputs relative to ideal distributions, and a 'Corrector' adjusts the model to fool the Judger
Identifies that SFT amplifies pre-training bias (positive covariance) rather than just introducing new bias, requiring corrections that address both stages
Uses a geometric mixture policy to interpolate between the current and original model, ensuring fairness improvements do not catastrophically degrade recommendation utility

Evaluation Highlights

Analysis reveals positive covariance (7.73e-4) between pre-training bias and SFT bias shift on ML-1M, indicating that SFT amplifies existing inequities rather than correcting them
7 out of 10 genre groups in Llama-2-7b retained the same bias direction after SFT, which is consistent with the reinforcement hypothesis
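
The two diagnostics above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names are hypothetical, the four-group numbers are made up for the example (they are not results from the paper), and the use of population covariance over centered per-group values is an assumption.

```python
import numpy as np

def bias_covariance(pretrain_bias, sft_shift):
    """Population covariance between per-group pre-training bias and SFT shift.

    Both inputs are centered first; population (1/n) normalization is an assumption.
    """
    p = np.asarray(pretrain_bias, dtype=float)
    d = np.asarray(sft_shift, dtype=float)
    return float(np.mean((p - p.mean()) * (d - d.mean())))

def same_direction_count(pretrain_bias, sft_bias):
    """Number of groups whose bias has the same sign before and after SFT."""
    p = np.asarray(pretrain_bias, dtype=float)
    s = np.asarray(sft_bias, dtype=float)
    return int(np.sum(np.sign(p) == np.sign(s)))

# Hypothetical per-group values for 4 groups (NOT taken from the paper).
pretrain = np.array([0.04, -0.02, 0.01, -0.03])  # bias of the base model
shift = np.array([0.02, -0.01, -0.02, 0.01])     # change introduced by SFT
sft = pretrain + shift                            # bias after SFT

cov = bias_covariance(pretrain, shift)
kept = same_direction_count(pretrain, sft)
print(f"covariance = {cov:.6f}, groups keeping direction = {kept}/{len(pretrain)}")
```

A positive covariance means groups that were over-exposed before SFT tend to be pushed further up by SFT, and under-exposed groups further down.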

Breakthrough Assessment

7/10

Strong analytical contribution identifying 'bias amplification' in LRS. The self-play solution is conceptually novel for this domain. Score limited by lack of visible end-to-end performance metrics in the provided text.

⚙️ Technical Details

Problem Definition

Setting: Sequential Recommendation (Next-item prediction) with Fairness Constraints

Inputs: User interaction sequence s = {i_1, ..., i_L}

Outputs: Next item i from item space I

Pipeline Flow

Judger Role (Identifies unfairness)
Corrector Role (Adjusts distribution)
Geometric Mixture Update (Preserves utility)

System Modules

Judger

Identifies unfair outputs from the current LRS by comparing them to ideal fair results

Model or implementation: LRS (Self-play)

Corrector

Adjusts the LRS to address identified unfairness while preserving recommendation performance

Model or implementation: LRS (Self-play)

Novel Architectural Elements

Self-play loop where the model alternates roles (Judger/Corrector) to iteratively resolve unfairness
Geometric mixture policy that interpolates between the updated and original models in log-probability space, keeping the update close to the original model to preserve utility
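
As a rough illustration of the geometric mixture idea, the sketch below combines two categorical distributions by a weighted average of log-probabilities followed by renormalization. The function name, the mixing weight `alpha`, and the example distributions are assumptions made for illustration; how the paper applies the mixture to the full LRS policy is not specified in this summary.

```python
import numpy as np

def geometric_mixture(p_ref, p_new, alpha):
    """Normalized weighted geometric mean: p proportional to p_ref^(1-alpha) * p_new^alpha.

    Assumes both inputs are strictly positive and sum to 1.
    alpha = 0 returns p_ref; alpha = 1 returns p_new.
    """
    p_ref = np.asarray(p_ref, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    log_mix = (1.0 - alpha) * np.log(p_ref) + alpha * np.log(p_new)
    log_mix -= log_mix.max()  # subtract the max for numerical stability
    mix = np.exp(log_mix)
    return mix / mix.sum()

# Hypothetical distributions over three item groups (illustrative only).
p_original = np.array([0.7, 0.2, 0.1])  # original model
p_updated = np.array([0.4, 0.3, 0.3])   # fairness-corrected model
p_half = geometric_mixture(p_original, p_updated, alpha=0.5)
print(p_half)
```

A smaller `alpha` keeps the result closer to the original model, which is the lever for trading fairness gains against utility loss.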

Modeling

Base Model: Llama-2-7b (used in analysis section)

Training Method: Self-play Fine-tuning (UFO framework)

Objective Functions:

Purpose: Calibrate item-level bias.

Formally: min_q E[-log q_target] (Cross-entropy over calibration weights)

Training Data:

ML-1M dataset (MovieLens 1 Million)

Compute: Not reported in the paper

Comparison to Prior Work

vs. Re-weighting/Re-ranking: UFO addresses pre-training bias and SFT shift jointly via iterative self-play, rather than just one stage
vs. SPPO: UFO operates at the distributional level for group fairness rather than pairwise preference (Bradley-Terry model)
vs. SASRec [not cited in paper]: UFO is an LLM-based approach, whereas SASRec is a traditional ID-based sequential model (UFO aims to fix the fairness deficit LRS has compared to SASRec)

Limitations

Analysis relies on the 'small-bias approximation' for mathematical decomposition
Requires iterative optimization (Judger/Corrector) which may increase training complexity compared to simple re-weighting
Fairness definition is strictly tied to historical exposure ratios, which assumes historical data is the ground truth for 'fairness'

Reproducibility

The paper provides mathematical definitions for fairness and bias decomposition. The dataset (ML-1M) and base model (Llama-2-7b) are public. Code URL is not provided in the text.

📊 Experiments & Results

Evaluation Setup

Sequential recommendation with fairness analysis on bias origins

Benchmarks:

ML-1M (Movie Recommendation)

Metrics:

Item-side Fairness (epsilon)
Covariance (between pre-training bias and SFT shift)

Statistical methodology: Not explicitly reported in the paper
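
The exact definition of the item-side fairness metric (epsilon) is not given in this summary. The sketch below shows one plausible group-level reading, assuming fairness means each group's share of recommendations should match its share of historical interactions. All names, the toy catalogue, and the mean-absolute-gap aggregate are hypothetical.

```python
from collections import Counter

def group_shares(items, item_to_group, groups):
    """Fraction of `items` that falls into each group."""
    counts = Counter(item_to_group[i] for i in items)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total for g in groups}

def exposure_gaps(recommended, history, item_to_group):
    """Per-group gap: recommendation share minus historical share."""
    groups = sorted(set(item_to_group.values()))
    rec = group_shares(recommended, item_to_group, groups)
    hist = group_shares(history, item_to_group, groups)
    return {g: rec[g] - hist[g] for g in groups}

# Toy catalogue and interactions (illustrative only, not ML-1M data).
item_to_group = {"m1": "Action", "m2": "Action", "m3": "Drama", "m4": "Comedy"}
history = ["m1", "m3", "m3", "m4"]      # historical interactions
recommended = ["m1", "m2", "m1", "m3"]  # items the recommender produced

gaps = exposure_gaps(recommended, history, item_to_group)
mean_abs_gap = sum(abs(v) for v in gaps.values()) / len(gaps)
print(gaps, mean_abs_gap)
```

Positive gaps mark over-exposed groups and negative gaps mark under-exposed ones; a perfectly proportional recommender would have all gaps equal to zero.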

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
ML-1M	Covariance (Pre-training bias vs SFT shift)	0	0.000773	+0.000773

Experiment Figures

Visualization of centered biases for 10 genre groups in ML-1M, comparing Pre-training bias (Delta_p) and SFT shift (Delta_delta).

Main Takeaways

Group-level unfairness in LRS is composite: Pre-training establishes an initial bias pattern (often due to language priors), and SFT amplifies it.
The positive covariance between pre-training bias and SFT shift indicates that standard fine-tuning reinforces existing inequities instead of aligning with the target distribution.
Raw outputs of base LLMs (without SFT) exhibit large dispersion and irrelevant text, necessitating calibration to measure inherent bias accurately.

📚 Prerequisite Knowledge

Prerequisites

Supervised Fine-Tuning (SFT) of LLMs
Item-side Fairness (Group Fairness)
Sequential Recommendation
Self-Play mechanisms

Key Terms

LRS: Large Language Model-based Recommender Systems

SFT: Supervised Fine-Tuning—adapting a pre-trained model to a specific task using labeled data

IF: Item-side Fairness—the principle that different item groups should receive exposure proportional to their historical relevance

Self-play: A training strategy where the model plays different roles (e.g., judger and corrector) against itself to improve performance

Geometric mixture: A way of combining two probability distributions (or policies) by taking a weighted average of their log-probabilities and renormalizing, i.e., a normalized weighted geometric mean

Covariance: A statistical measure of whether two variables (here, pre-training bias and SFT shift) tend to deviate from their means in the same direction; a positive value means they move together

SASRec: Self-Attentive Sequential Recommendation—a standard transformer-based baseline for sequential recommendation

Calibration: Adjusting model outputs (logits) so that predicted probabilities match true empirical frequencies