Wake Technical Community College,
University of North Carolina at Charlotte
arXiv
(2026)
Recommendation, Agent, Reasoning
Paper Summary
Governance-Constrained Recommendation, Agentic Recommender Systems, AI Safety and Auditing
PCN-Rec ensures recommender systems strictly obey governance policies by using a multi-agent negotiation process to propose rankings and a deterministic code-based verifier to reject and repair violations.
Core Problem
Monolithic LLM recommenders struggle to reliably satisfy hard combinatorial constraints (like diversity quotas) and lack auditability, often generating plausible text explanations while violating policies.
Why it matters:
Platforms face legal or contractual obligations (e.g., minimum long-tail exposure) that must be strictly enforced and auditable
LLMs suffer from 'lost-in-the-middle' reasoning and cannot maintain global state over combinatorial constraints, leading to silent failures
Existing methods lack a mechanism to prove that a served slate actually met the required policy checks
Concrete Example: A platform requires every movie recommendation list to have at least 30% 'long-tail' (unpopular) items. A standard LLM might generate a list of 10 blockbusters and hallucinate a justification claiming it is diverse. PCN-Rec's verifier would catch the metadata mismatch (0% tail) and force a repair.
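The check in this example can be sketched in a few lines. This is a minimal illustration rather than the paper's code: the item IDs, the `is_tail` metadata map, and the function names are hypothetical, and only the 30% threshold and the 10-item list come from the example.

```python
from typing import Dict, List


def tail_share(slate: List[str], is_tail: Dict[str, bool]) -> float:
    """Fraction of slate items that the catalog metadata flags as long-tail."""
    if not slate:
        return 0.0
    return sum(1 for item in slate if is_tail[item]) / len(slate)


def meets_tail_quota(slate: List[str], is_tail: Dict[str, bool],
                     min_share: float = 0.30) -> bool:
    """True when the slate reaches the required long-tail share."""
    return tail_share(slate, is_tail) >= min_share


# Hypothetical catalog: ten blockbusters plus three long-tail titles.
blockbusters = [f"blockbuster_{i}" for i in range(10)]
metadata = {item: False for item in blockbusters}
metadata.update({"indie_a": True, "indie_b": True, "indie_c": True})

# The all-blockbuster slate has 0% tail, whatever the LLM's explanation says.
print(tail_share(blockbusters, metadata))        # 0.0
print(meets_tail_quota(blockbusters, metadata))  # False

# Swapping three blockbusters for tail items reaches 3/10 = 30%.
repaired = blockbusters[:7] + ["indie_a", "indie_b", "indie_c"]
print(meets_tail_quota(repaired, metadata))      # True
```

The point of the sketch is that the decision depends only on the slate and the metadata, never on the text the model wrote about its own list.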
Key Novelty
Proof-Carrying Negotiation (PCN)
Treats the LLM as a 'proposer' rather than an authority; it must output a structured certificate (JSON) proving compliance alongside its recommendation
Splits the recommendation task into specialized agents: a User Advocate (optimizing relevance) and a Policy Agent (enforcing constraints) to negotiate trade-offs
Introduces a window-based feasibility analysis to distinguish between 'impossible to satisfy' user histories and 'AI failure' cases
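To make the proof-carrying idea concrete, here is a hedged sketch of a certificate and a checker. The certificate schema (a JSON object with a `tail_count` field), the function name, and the 3-of-10 quota are assumptions made for illustration; the summary does not specify the paper's actual certificate fields.

```python
import json
from typing import Dict, List


def verify_certificate(slate: List[str], certificate_json: str,
                       is_tail: Dict[str, bool],
                       slate_size: int = 10,
                       min_tail_items: int = 3) -> List[str]:
    """Return a list of violations; an empty list means the slate passes."""
    try:
        cert = json.loads(certificate_json)
    except json.JSONDecodeError:
        return ["certificate is not valid JSON"]
    if not isinstance(cert, dict):
        return ["certificate must be a JSON object"]

    violations = []
    if len(slate) != slate_size or len(set(slate)) != len(slate):
        violations.append("slate must hold exactly N distinct items")

    # Recompute from metadata; the certificate's own numbers are only claims.
    tail_count = sum(1 for item in slate if is_tail.get(item, False))
    if cert.get("tail_count") != tail_count:
        violations.append("claimed tail_count disagrees with metadata")
    if tail_count < min_tail_items:
        violations.append("tail quota not met")
    return violations


# Hypothetical metadata: items m8..m11 are long-tail, m0..m7 are popular.
is_tail = {f"m{i}": i >= 8 for i in range(12)}

honest_slate = [f"m{i}" for i in range(7)] + ["m8", "m9", "m10"]
print(verify_certificate(honest_slate, json.dumps({"tail_count": 3}), is_tail))

# A proposal that claims 3 tail items but contains only 2 (m8 and m9).
overclaimed_slate = [f"m{i}" for i in range(10)]
print(verify_certificate(overclaimed_slate, json.dumps({"tail_count": 3}), is_tail))
```

In this sketch the first call returns an empty list, while the second reports both the mismatch and the unmet quota, which is the 'silent violation' case the certificate interface is meant to expose.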
Architecture
The PCN-Rec pipeline separating the Base Recommender, the Agentic Negotiation (User Advocate vs Policy Agent), the Mediator, and the Deterministic Verifier/Repair loop.
Evaluation Highlights
Achieves 98.55% governance pass rate on feasible users in MovieLens-100K, compared to 0.00% for a single-LLM baseline
Maintains utility with only a 0.021 absolute drop in NDCG@10 (0.403 vs. 0.424) compared to an unconstrained LLM, a statistically significant but minimal cost for safety
Identifies that 551 out of 943 users (58%) have feasible solutions within a candidate window of 80 items, validating the feasibility-aware evaluation protocol
Breakthrough Assessment
8/10
Significantly advances reliable GenAI deployment by solving the 'silent violation' problem in regulated recommendation. The proof-carrying interface provides a practical blueprint for auditable AI.
Technical Details
Problem Definition
Setting: Slate recommendation under hard per-slate governance constraints
Inputs: User history u and a candidate window C_W(u) of top-W items from a base recommender
Outputs: A slate S of size N and a certificate c proving constraint satisfaction, or a repaired slate S'
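One way to write this setting down, assuming a single long-tail quota as the governance constraint: the tail set T and the threshold τ are notation introduced here for illustration, and the paper's full constraint set (which also mentions diversity) may contain further conditions of the same form.

```latex
\begin{aligned}
&\text{find } S \subseteq C_W(u), \qquad |S| = N, \\
&\text{such that } \frac{1}{N} \sum_{i \in S} \mathbb{1}[\, i \in T \,] \;\ge\; \tau .
\end{aligned}
```

Under that single-quota assumption, a user u is feasible at window size W exactly when W ≥ N and |C_W(u) ∩ T| ≥ ⌈τN⌉. With τ = 0.3 and N = 10, the window must contain at least 3 long-tail items.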
Pipeline Flow
Base Recommender (MF/CF) -> Candidate Window
Agent Negotiation (User Advocate vs Policy Agent)
Mediator LLM -> Slate + Certificate
Deterministic Verifier -> Pass/Fail
Deterministic Repair (if Fail) -> Final Slate
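The control flow of these stages might be wired together as follows. This is a sketch under stated assumptions: the LLM agents are replaced by plain-Python stubs, and every name here (`serve_slate`, `careless_mediator`, the tiny window, the 1-of-4 quota) is hypothetical rather than taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Slate = List[str]
Certificate = Dict[str, int]


def serve_slate(
    candidates: List[str],
    negotiate: Callable[[List[str]], Dict[str, List[str]]],
    mediate: Callable[[List[str], Dict[str, List[str]]], Tuple[Slate, Certificate]],
    verify: Callable[[Slate, Certificate], bool],
    repair: Callable[[List[str]], Slate],
) -> Tuple[Slate, str]:
    """Run negotiation, mediation, verification, and repair on failure."""
    arguments = negotiate(candidates)
    slate, certificate = mediate(candidates, arguments)
    if verify(slate, certificate):
        return slate, "mediator_proposal"
    return repair(candidates), "deterministic_repair"


# Stubs standing in for the base recommender output and the LLM agents.
N, MIN_TAIL = 4, 1
window = ["h0", "h1", "h2", "h3", "h4", "t0", "t1"]  # ranked; "t*" = tail


def is_tail(item: str) -> bool:
    return item.startswith("t")


def negotiate(cands: List[str]) -> Dict[str, List[str]]:
    return {"user_advocate": cands[:N],
            "policy_agent": [c for c in cands if is_tail(c)]}


def careless_mediator(cands, arguments):
    # Ignores the policy agent and overclaims compliance in the certificate.
    return arguments["user_advocate"], {"tail_count": 1}


def verify(slate: Slate, certificate: Certificate) -> bool:
    tail_count = sum(1 for item in slate if is_tail(item))
    return (len(slate) == N and tail_count >= MIN_TAIL
            and certificate.get("tail_count") == tail_count)


def repair(cands: List[str]) -> Slate:
    tail = [c for c in cands if is_tail(c)][:MIN_TAIL]
    head = [c for c in cands if not is_tail(c)][:N - MIN_TAIL]
    return head + tail


final_slate, source = serve_slate(window, negotiate, careless_mediator, verify, repair)
print(final_slate, source)  # ['h0', 'h1', 'h2', 't0'] deterministic_repair
```

Returning the source label alongside the slate is a design choice of this sketch: it gives an audit log a record of whether the served slate came from the mediator or from the fallback.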
System Modules
Candidate Generator
Generate a ranked list of candidate items using standard methods (MF/CF) to define the search space
Model or implementation: Base Recommender (e.g., Matrix Factorization)
User Advocate (Negotiation)
Argue for items maximizing relevance to user preferences, blind to policy constraints
Model or implementation: LLM Agent (specific model not named)
Policy Agent (Negotiation)
Argue for items satisfying governance constraints (tail exposure, diversity)
Model or implementation: LLM Agent (specific model not named)
Mediator
Synthesize agent arguments and propose a slate with a structured certificate
Model or implementation: LLM (specific model not named)
Verifier (Enforcement)
Recompute constraints deterministically from the slate and metadata; reject if invalid
Model or implementation: Deterministic Code
Repair Module (Enforcement)
Construct a compliant slate using a greedy algorithm if the LLM proposal fails verification
Model or implementation: Deterministic Code (Constrained-Greedy)
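A constrained-greedy repair of the kind described for the Repair Module could look like the sketch below. It assumes a single tail quota, distinct candidate IDs, and a candidate list already sorted by base-recommender score; the function name and parameters are hypothetical, and the paper's actual repair may handle more constraints.

```python
from typing import Dict, List, Optional


def constrained_greedy_repair(ranked_candidates: List[str],
                              is_tail: Dict[str, bool],
                              slate_size: int = 10,
                              min_tail_items: int = 3) -> Optional[List[str]]:
    """Build a compliant slate, or return None if the window is infeasible.

    Step 1: reserve the highest-ranked tail items needed for the quota.
    Step 2: fill the remaining slots with the best-ranked leftover items.
    The result keeps the base recommender's ordering.
    """
    tail_picks = [i for i in ranked_candidates if is_tail[i]][:min_tail_items]
    if len(tail_picks) < min_tail_items or len(ranked_candidates) < slate_size:
        return None  # no valid slate exists in this window

    chosen = set(tail_picks)
    for item in ranked_candidates:
        if len(chosen) == slate_size:
            break
        chosen.add(item)
    return [item for item in ranked_candidates if item in chosen]


# Hypothetical window: "h*" are popular items, "t*" are long-tail items.
ranked = ["h0", "h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8", "h9",
          "t0", "h10", "t1", "t2"]
tail_flags = {item: item.startswith("t") for item in ranked}

print(constrained_greedy_repair(ranked, tail_flags))
# ['h0', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 't0', 't1', 't2']

# With only two tail items in the window, the quota cannot be met.
print(constrained_greedy_repair(ranked[:13], tail_flags))  # None
```

Returning `None` for an infeasible window keeps the repair honest: it signals that no slate in the window can satisfy the policy, rather than serving a non-compliant one.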
Novel Architectural Elements
Proof-carrying interface where the LLM output is coupled with a structured JSON certificate for external verification
Separation of 'User Advocate' and 'Policy Agent' roles to prevent 'lost-in-the-middle' constraint reasoning
Modeling
Base Model: LLM (specific model architecture not reported in snippet)
Training Method: Inference-time negotiation and verification (No training described)
Comparison to Prior Work
vs. Single LLM: PCN-Rec uses agentic decomposition and external verification to guarantee compliance (98.55% vs 0% pass rate)
vs. Constrained-Greedy: PCN-Rec uses LLM reasoning for personalization, falling back to greedy only on failure, whereas pure greedy lacks natural language reasoning [implied]
vs. Self-Correction Methods [not cited in paper]: PCN-Rec separates the corrector (Verifier/Repair) into deterministic code rather than asking the LLM to self-correct
Limitations
Compliance is guaranteed only if a feasible slate exists within the fixed candidate window W
The verifier relies on formalized constraints and accurate metadata; incorrect metadata leads to false results
Deterministic repair may produce sub-optimal utility compared to a perfect constrained optimization oracle
Requires running multiple LLM calls (Agents + Mediator), increasing computational cost over a single-shot recommender
Experiments & Results
Evaluation Setup
Top-N recommendation on MovieLens-100K with hard constraints
Benchmarks:
MovieLens-100K (Movie Recommendation)
Metrics:
Governance Pass Rate (Verifier-checked)
NDCG@10 (Ranking Utility)
Window Feasibility (Percentage of users with satisfiable constraints)
Statistical methodology: Paired test over feasible users (p < 0.05)
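For readers unfamiliar with the utility metric, the following is a standard binary-relevance NDCG@k. Whether the paper uses binary or graded relevance, and how it builds the held-out set, is not stated in this summary, so treat the details as assumptions.

```python
import math
from typing import List, Set


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking over DCG of an ideal one."""
    # A hit at 0-based position p contributes 1 / log2(p + 2).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# The single relevant item ranked first gives a perfect score.
print(ndcg_at_k(["a", "b", "c"], {"a"}))  # 1.0

# The same item ranked second is discounted by 1 / log2(3).
print(round(ndcg_at_k(["a", "b", "c"], {"b"}), 4))  # 0.6309
```

Because hits lower in the slate are discounted, swapping a relevant head item for a tail item to satisfy a quota can lower NDCG, which is the 'cost of governance' the evaluation measures.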
Key Results
Benchmark | Metric | Baseline | This Paper | Δ
MovieLens-100K | Governance Pass Rate | 0.000 | 0.985 | +0.985
MovieLens-100K | NDCG@10 | 0.424 | 0.403 | -0.021
MovieLens-100K | Feasible Users Count (W=80) | 943 | 551 | N/A
Experiment Figures
Feasibility analysis plotting the number of feasible users and tail item shortage against increasing Candidate Window size (W)
Main Takeaways
Strict governance compliance via LLMs requires deterministic verification; relying on LLM self-restraint (Single LLM) yielded a 0.00% pass rate on feasible users for hard combinatorial constraints
The 'cost of governance' is quantifiable: a small drop in NDCG (-0.021) is the price for guaranteeing auditable policy satisfaction
Feasibility analysis is crucial: 392 of 943 users (about 42%) had no valid solution within the W=80 window, so failures for those users are mathematical impossibilities, not AI errors
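The figures behind these takeaways can be re-derived from the reported numbers. The snippet below is only arithmetic on values quoted in this summary; the relative NDCG drop is a derived quantity, not a number reported by the paper.

```python
total_users = 943
feasible_users = 551  # users with a valid slate inside the W=80 window

infeasible_users = total_users - feasible_users
feasible_pct = round(100 * feasible_users / total_users, 1)
infeasible_pct = round(100 * infeasible_users / total_users, 1)

ndcg_unconstrained = 0.424
ndcg_pcn_rec = 0.403
absolute_drop = round(ndcg_unconstrained - ndcg_pcn_rec, 3)
relative_drop_pct = round(100 * absolute_drop / ndcg_unconstrained, 1)

print(infeasible_users)    # 392
print(feasible_pct)        # 58.4
print(infeasible_pct)      # 41.6
print(absolute_drop)       # 0.021
print(relative_drop_pct)   # 5.0
```

So roughly 58% of users are feasible and 42% are not, and the 0.021 absolute NDCG@10 gap corresponds to about a 5% relative reduction from the unconstrained baseline.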
Prerequisite Knowledge
Prerequisites
Recommender Systems (Matrix Factorization/Collaborative Filtering)
Constraint Satisfaction Problems
Agentic AI patterns (Multi-agent negotiation)
Evaluation metrics (NDCG)
Key Terms
Slate: An ordered list of recommended items shown to the user simultaneously (e.g., 'Top 10 movies for you')
Candidate Window: A limited set of top-ranked items (e.g., top 80) from a base model, defining the search space for the LLM
NDCG: Normalized Discounted Cumulative Gain, a measure of ranking quality that accounts for the position of relevant items
Proof-carrying: A system design where the output includes a machine-checkable 'proof' (certificate) that guarantees correctness properties
Long-tail: Items that are less popular/obscure; governance often mandates their exposure to ensure catalog diversity
Constrained-greedy repair: A deterministic fallback algorithm that constructs a valid list by selecting items that satisfy constraints first, then optimizing for relevance
Feasible user: A user for whom it is mathematically possible to form a valid slate given the constraints and the available items in their candidate window
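The 'feasible user' definition can be turned into a simple window check. The sketch assumes a single tail quota (3 of 10 items) and a hypothetical 100-item ranking; `tail_shortage` and `is_feasible` are illustrative names, and the paper's feasibility test may cover additional constraints.

```python
from typing import Dict, List


def tail_shortage(ranked: List[str], is_tail: Dict[str, bool],
                  window: int, min_tail_items: int = 3) -> int:
    """How many tail items the top-`window` candidates are missing."""
    tail_available = sum(1 for item in ranked[:window] if is_tail[item])
    return max(0, min_tail_items - tail_available)


def is_feasible(ranked: List[str], is_tail: Dict[str, bool], window: int,
                slate_size: int = 10, min_tail_items: int = 3) -> bool:
    """A user is feasible if the window can supply a full, compliant slate."""
    enough_items = min(window, len(ranked)) >= slate_size
    return enough_items and tail_shortage(ranked, is_tail, window,
                                          min_tail_items) == 0


# Hypothetical user: tail items sit at 0-based ranks 25, 60, and 90.
ranking = [f"i{r}" for r in range(100)]
tail_map = {f"i{r}": r in {25, 60, 90} for r in range(100)}

for w in (20, 40, 80, 100):
    print(w, tail_shortage(ranking, tail_map, w), is_feasible(ranking, tail_map, w))
# 20 3 False
# 40 2 False
# 80 1 False
# 100 0 True
```

For this made-up user, no slate drawn from the top 80 candidates can satisfy the quota, so a failure at W=80 reflects the window rather than the recommender; widening the window to 100 makes the user feasible.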