PCN-Rec: Agentic Proof-Carrying Negotiation for Reliable Governance-Constrained Recommendation

📝 Paper Summary

Governance-Constrained Recommendation | Agentic Recommender Systems | AI Safety and Auditing

PCN-Rec ensures recommender systems strictly obey governance policies by using a multi-agent negotiation process to propose rankings and a deterministic code-based verifier to reject and repair violations.

Core Problem

Monolithic LLM recommenders struggle to reliably satisfy hard combinatorial constraints (like diversity quotas) and lack auditability, often generating plausible text explanations while violating policies.

Why it matters:

Platforms face legal or contractual obligations (e.g., minimum long-tail exposure) that must be strictly enforced and auditable
LLMs suffer from 'lost-in-the-middle' reasoning and cannot maintain global state over combinatorial constraints, leading to silent failures
Existing methods lack a mechanism to prove that a served slate actually met the required policy checks

Concrete Example: A platform requires every movie recommendation list to have at least 30% 'long-tail' (unpopular) items. A standard LLM might generate a list of 10 blockbusters and hallucinate a justification claiming it is diverse. PCN-Rec's verifier would catch the metadata mismatch (0% tail) and force a repair.
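The check in this example can be sketched in a few lines. This is a minimal illustration, not the paper's code: the `is_tail` metadata flag, the function names, and the item ids are hypothetical; only the 30% threshold and the slate of 10 come from the example above.

```python
from typing import Dict, List

def tail_share(slate: List[str], is_tail: Dict[str, bool]) -> float:
    """Fraction of slate items flagged long-tail in the catalog metadata."""
    if not slate:
        return 0.0
    return sum(1 for item in slate if is_tail[item]) / len(slate)

def meets_tail_quota(slate: List[str], is_tail: Dict[str, bool],
                     min_share: float = 0.30) -> bool:
    """True when the slate reaches the required long-tail share."""
    return tail_share(slate, is_tail) >= min_share

# Ten blockbusters: the LLM's text may claim diversity, the metadata says otherwise.
blockbusters = [f"hit_{i}" for i in range(10)]
is_tail = {item: False for item in blockbusters}

print(tail_share(blockbusters, is_tail))        # 0.0
print(meets_tail_quota(blockbusters, is_tail))  # False
```

The point of the sketch is that the decision depends only on metadata, never on the model's explanation.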

Key Novelty

Proof-Carrying Negotiation (PCN)

Treats the LLM as a 'proposer' rather than an authority; it must output a structured certificate (JSON) proving compliance alongside its recommendation
Splits the recommendation task into specialized agents: a User Advocate (optimizing relevance) and a Policy Agent (enforcing constraints) to negotiate trade-offs
Introduces a window-based feasibility analysis to distinguish between 'impossible to satisfy' user histories and 'AI failure' cases

Architecture

The PCN-Rec pipeline separating the Base Recommender, the Agentic Negotiation (User Advocate vs Policy Agent), the Mediator, and the Deterministic Verifier/Repair loop.

Evaluation Highlights

Achieves 98.55% governance pass rate on feasible users in MovieLens-100K, compared to 0.00% for a single-LLM baseline
Maintains utility with only a 0.021 absolute drop in NDCG@10 (0.403 vs. 0.424) compared to an unconstrained LLM, a statistically significant but minimal cost for safety
Identifies that 551 out of 943 users (58%) have feasible solutions within a candidate window of 80 items, validating the feasibility-aware evaluation protocol

Breakthrough Assessment

8/10

Significantly advances reliable GenAI deployment by solving the 'silent violation' problem in regulated recommendation. The proof-carrying interface provides a practical blueprint for auditable AI.

⚙️ Technical Details

Problem Definition

Setting: Slate recommendation under hard per-slate governance constraints

Inputs: User history u and a candidate window C_W(u) of top-W items from a base recommender

Outputs: A slate S of size N and a certificate c proving constraint satisfaction, or a repaired slate S'
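One way to write this setting down, offered as a sketch rather than the paper's own notation: the constraint functions g_k, the thresholds τ_k, and the tail set T below are symbols introduced here for illustration, while u, C_W(u), S, and N follow the definitions above.

```latex
\text{find } S \subseteq C_W(u) \quad \text{such that} \quad |S| = N, \qquad
g_k(S) \ge \tau_k \quad \text{for all } k = 1, \dots, K .

% Example: a 30% long-tail quota, with T the set of long-tail items
g_{\text{tail}}(S) = \frac{1}{N} \sum_{i \in S} \mathbf{1}[\, i \in T \,], \qquad
\tau_{\text{tail}} = 0.3 .
```

Under this reading, a user is feasible at window size W exactly when at least one such S exists.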

Pipeline Flow

Base Recommender (MF/CF) -> Candidate Window
Agent Negotiation (User Advocate vs Policy Agent)
Mediator LLM -> Slate + Certificate
Deterministic Verifier -> Pass/Fail
Deterministic Repair (if Fail) -> Final Slate
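The control flow of these five steps might look like the following sketch. The function `serve_slate`, its callable parameters, and the toy stand-ins are all hypothetical names chosen for illustration; in the real system the proposer would wrap the negotiation and Mediator LLM calls.

```python
from typing import Callable, List, Tuple

Slate = List[str]

def serve_slate(
    candidates: Slate,
    propose: Callable[[Slate], Tuple[Slate, dict]],
    verify: Callable[[Slate, dict], bool],
    repair: Callable[[Slate], Slate],
) -> Tuple[Slate, str]:
    """Return the final slate and a label saying which path produced it."""
    slate, certificate = propose(candidates)  # negotiation + mediator (LLM calls)
    if verify(slate, certificate):            # deterministic check
        return slate, "llm_verified"
    return repair(candidates), "repaired"     # deterministic fallback

# Toy stand-ins so the control flow can be exercised without an LLM.
def bad_proposer(window: Slate) -> Tuple[Slate, dict]:
    return window[:2], {"claims": "compliant"}

def requires_t1(slate: Slate, certificate: dict) -> bool:
    return "t1" in slate  # toy rule: the slate must contain item "t1"

def toy_repair(window: Slate) -> Slate:
    return [window[0], "t1"]

print(serve_slate(["a", "b", "t1"], bad_proposer, requires_t1, toy_repair))
# (['a', 't1'], 'repaired')
```

Keeping the verifier and repair as plain functions means the LLM can be swapped or can misbehave without changing what is allowed to reach the user.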

System Modules

Candidate Generator

Generate a ranked list of candidate items using standard methods (MF/CF) to define the search space

Model or implementation: Base Recommender (e.g., Matrix Factorization)

User Advocate (Negotiation)

Argue for items maximizing relevance to user preferences, blind to policy constraints

Model or implementation: LLM Agent (specific model not named)

Policy Agent (Negotiation)

Argue for items satisfying governance constraints (tail exposure, diversity)

Model or implementation: LLM Agent (specific model not named)

Mediator

Synthesize agent arguments and propose a slate with a structured certificate

Model or implementation: LLM (specific model not named)

Verifier (Enforcement)

Recompute constraints deterministically from the slate and metadata; reject if invalid

Model or implementation: Deterministic Code
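A hedged sketch of such a verifier follows. The certificate fields (`tail_count`, `genre_count`), the metadata layout, and the `min_genres` threshold are assumptions made for illustration; the summary does not specify the certificate schema. The default `min_tail=3` mirrors the 30%-of-10 quota from the concrete example. Treating a certificate that disagrees with the metadata as a failure is also a design assumption here.

```python
from typing import Dict, List

def verify(slate: List[str], certificate: Dict[str, int],
           metadata: Dict[str, dict], slate_size: int = 10,
           min_tail: int = 3, min_genres: int = 4) -> List[str]:
    """Return the list of violations; an empty list means the slate passes."""
    if len(slate) != slate_size or len(set(slate)) != slate_size:
        return ["slate must hold slate_size distinct items"]
    if any(item not in metadata for item in slate):
        return ["slate contains an item with no metadata"]

    # Recompute every quantity from metadata; the certificate is only a claim.
    tail_count = sum(1 for item in slate if metadata[item]["is_tail"])
    genre_count = len({metadata[item]["genre"] for item in slate})

    errors = []
    if certificate.get("tail_count") != tail_count:
        errors.append("certificate tail_count disagrees with metadata")
    if certificate.get("genre_count") != genre_count:
        errors.append("certificate genre_count disagrees with metadata")
    if tail_count < min_tail:
        errors.append("tail quota not met")
    if genre_count < min_genres:
        errors.append("genre diversity not met")
    return errors
```

Returning the violations rather than a bare boolean leaves an audit trail explaining why a proposal was rejected.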

Repair Module (Enforcement)

Construct a compliant slate using a greedy algorithm if the LLM proposal fails verification

Model or implementation: Deterministic Code (Constrained-Greedy)
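For a single tail-quota constraint, a constrained-greedy repair could be sketched as below. This is an illustrative reading of "satisfy constraints first, then fill by relevance", not the paper's algorithm; `greedy_repair` and its parameters are hypothetical, and the window is assumed to contain distinct items ordered best-first by the base recommender.

```python
from typing import Dict, List, Optional

def greedy_repair(window: List[str], is_tail: Dict[str, bool],
                  slate_size: int = 10, min_tail: int = 3) -> Optional[List[str]]:
    """Build a compliant slate from the window, or return None if infeasible."""
    # Step 1: reserve the highest-ranked tail items needed to meet the quota.
    tail_picks = [item for item in window if is_tail[item]][:min_tail]
    if len(tail_picks) < min_tail or len(window) < slate_size:
        return None  # no slate in this window can satisfy the quota

    # Step 2: fill the remaining slots with the best-ranked items overall.
    chosen = set(tail_picks)
    for item in window:
        if len(chosen) == slate_size:
            break
        chosen.add(item)

    # Step 3: present the slate in the base recommender's order.
    return [item for item in window if item in chosen]

window = [f"h{i}" for i in range(12)] + ["t0", "t1", "t2"]
is_tail = {item: item.startswith("t") for item in window}
print(greedy_repair(window, is_tail))
# ['h0', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 't0', 't1', 't2']
```

With several interacting constraints a greedy pass like this may miss feasible slates or lose utility, which matches the limitation noted later about sub-optimal repair.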

Novel Architectural Elements

Proof-carrying interface where the LLM output is coupled with a structured JSON certificate for external verification
Separation of 'User Advocate' and 'Policy Agent' roles to prevent 'lost-in-the-middle' constraint reasoning

Modeling

Base Model: LLM (specific model architecture not reported in snippet)

Training Method: Inference-time negotiation and verification (No training described)

Comparison to Prior Work

vs. Single LLM: PCN-Rec uses agentic decomposition and external verification to enforce compliance (98.55% vs 0.00% governance pass rate on feasible users)
vs. Constrained-Greedy: PCN-Rec uses LLM reasoning for personalization, falling back to greedy only on failure, whereas pure greedy lacks natural language reasoning [implied]
vs. Self-Correction Methods [not cited in paper]: PCN-Rec separates the corrector (Verifier/Repair) into deterministic code rather than asking the LLM to self-correct

Limitations

Compliance is guaranteed only if a feasible slate exists within the fixed candidate window W
The verifier relies on formalized constraints and accurate metadata; if the metadata is wrong, a slate can pass verification while violating the policy in practice, or be rejected when it actually complies
Deterministic repair may produce sub-optimal utility compared to a perfect constrained optimization oracle
Requires running multiple LLM calls (Agents + Mediator), increasing computational cost over a single-shot recommender

📊 Experiments & Results

Evaluation Setup

Top-N recommendation on MovieLens-100K with hard constraints

Benchmarks:

MovieLens-100K (Movie Recommendation)

Metrics:

Governance Pass Rate (Verifier-checked)
NDCG@10 (Ranking Utility)
Window Feasibility (Percentage of users with satisfiable constraints)
Statistical methodology: Paired test over feasible users (p < 0.05)
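For reference, NDCG@10 can be computed as in the sketch below. It assumes binary relevance (an item is either relevant or not) and the common log2 discount; the summary does not state how the paper labels relevance, so treat both choices as assumptions. The function name and arguments are illustrative.

```python
import math
from typing import List, Set

def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k for one user's ranked slate."""
    # Discounted gain of the hits actually placed in the top k.
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # Best achievable gain: all relevant items packed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(["a", "b"], {"a"}))            # 1.0
print(round(ndcg_at_k(["b", "a"], {"a"}), 4))  # 0.6309
```

Moving the single relevant item from rank 1 to rank 2 drops the score from 1.0 to about 0.63, which shows how position-sensitive the metric is.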

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-100K	Governance Pass Rate (feasible users)	0.0000	0.9855	+0.9855
MovieLens-100K	NDCG@10	0.424	0.403	-0.021
MovieLens-100K	Feasible Users at W=80 (of 943 total)	N/A	551	N/A

Experiment Figures

Feasibility analysis plotting the number of feasible users and tail item shortage against increasing Candidate Window size (W)

Main Takeaways

Strict governance compliance via LLMs requires deterministic verification; relying on LLM self-restraint (Single LLM) yielded a 0.00% pass rate on hard combinatorial constraints in this evaluation
The 'cost of governance' is quantifiable: a small drop in NDCG (-0.021) is the price for guaranteeing auditable policy satisfaction
Feasibility analysis is crucial: 392 of 943 users (about 42%) had no valid solution in the W=80 window, so failures for those users are mathematical impossibilities, not AI errors
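The figures behind these takeaways can be rechecked with simple arithmetic. The inputs below are the numbers reported in this summary; the relative NDCG drop is derived here and is not a figure reported by the paper.

```python
total_users = 943
feasible_users = 551
infeasible_users = total_users - feasible_users

feasible_pct = round(100 * feasible_users / total_users, 1)
infeasible_pct = round(100 * infeasible_users / total_users, 1)

ndcg_unconstrained, ndcg_pcn = 0.424, 0.403
absolute_drop = round(ndcg_pcn - ndcg_unconstrained, 3)
relative_drop_pct = round(
    100 * (ndcg_unconstrained - ndcg_pcn) / ndcg_unconstrained, 1
)

print(infeasible_users)   # 392
print(feasible_pct)       # 58.4
print(infeasible_pct)     # 41.6
print(absolute_drop)      # -0.021
print(relative_drop_pct)  # 5.0 (derived, not reported)
```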

📚 Prerequisite Knowledge

Prerequisites

Recommender Systems (Matrix Factorization/Collaborative Filtering)
Constraint Satisfaction Problems
Agentic AI patterns (Multi-agent negotiation)
Evaluation metrics (NDCG)

Key Terms

Slate: An ordered list of recommended items shown to the user simultaneously (e.g., 'Top 10 movies for you')

Candidate Window: A limited set of top-ranked items (e.g., top 80) from a base model, defining the search space for the LLM

NDCG: Normalized Discounted Cumulative Gain, a measure of ranking quality that accounts for the position of relevant items

Proof-carrying: A system design where the output includes a machine-checkable 'proof' (certificate) that guarantees correctness properties

Long-tail: Items that are less popular/obscure; governance often mandates their exposure to ensure catalog diversity

Constrained-greedy repair: A deterministic fallback algorithm that constructs a valid list by first selecting items that satisfy the constraints, then filling the remaining slots by relevance

Feasible user: A user for whom it is mathematically possible to form a valid slate given the constraints and the available items in their candidate window
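For a tail quota alone, window feasibility reduces to counting tail items in the top-W candidates. The sketch below illustrates that idea on toy data; the function names, the `is_tail` flag, and the toy ranking are hypothetical, and a real policy with several constraints would need a fuller check.

```python
from typing import Dict, List

def tail_shortage(ranked: List[str], is_tail: Dict[str, bool],
                  window_size: int, min_tail: int = 3) -> int:
    """How many tail items the top-W window is missing to meet the quota."""
    window = ranked[:window_size]
    tail_in_window = sum(1 for item in window if is_tail[item])
    return max(0, min_tail - tail_in_window)

def is_feasible(ranked: List[str], is_tail: Dict[str, bool], window_size: int,
                slate_size: int = 10, min_tail: int = 3) -> bool:
    """True when some slate of slate_size items in the window meets the quota."""
    enough_items = len(ranked[:window_size]) >= slate_size
    return enough_items and tail_shortage(ranked, is_tail, window_size, min_tail) == 0

# Toy ranking: tail items appear only at 0-based ranks 20, 50 and 90.
ranked = [f"item_{i}" for i in range(100)]
is_tail = {item: False for item in ranked}
for pos in (20, 50, 90):
    is_tail[ranked[pos]] = True

print(tail_shortage(ranked, is_tail, window_size=80))  # 1
print(is_feasible(ranked, is_tail, window_size=80))    # False
print(is_feasible(ranked, is_tail, window_size=100))   # True
```

In this toy case the user is infeasible at W=80 but becomes feasible once the window widens enough to reach the third tail item, which is the kind of effect the feasibility figure plots against W.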