LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation Systems

Guilin Zhang, Kai Zhao, Jeffrey Friedman, Xu Chu
Workday
arXiv (2026)
Recommendation Agent RL

📝 Paper Summary

Multi-Objective Recommendation · Constrained Optimization · LLM Agents
DualAgent-Rec enforces hard business constraints in recommendation lists by using an LLM to dynamically allocate resources between an exploitation agent that respects the constraints and an exploration agent that ignores them.
Core Problem
Real-world recommendation systems must satisfy hard business constraints (e.g., fairness, seller coverage), but existing methods treat these as soft penalties, leading to unacceptable violations in production.
Why it matters:
  • Business rules like 'every list must have at least one new product' are non-negotiable in deployment, yet standard Multi-Objective Optimization (MOO) algorithms often fail them.
  • Existing approaches either filter out infeasible candidates strictly, which hurts diversity and accuracy, or let infeasible solutions dominate the results.
  • LLMs are currently used for item scoring or user simulation, but their potential as high-level optimization managers remains untapped.
Concrete Example: An e-commerce platform requires every recommendation list to include items from multiple sellers to ensure market fairness. A standard MOO model might generate a highly accurate list from a single popular seller, satisfying the accuracy objective but violating the hard coverage constraint, making the list undeployable.
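The difference between a soft penalty and a hard constraint in this example can be made concrete with a small sketch. Everything here is illustrative: the function names, the `min_sellers` threshold, the penalty `weight`, and the toy item-to-seller map are assumptions, not details taken from the paper.

```python
from typing import Dict, List


def seller_coverage_violation(
    rec_list: List[str], seller_of: Dict[str, str], min_sellers: int = 2
) -> int:
    """Return how many distinct sellers the list is short of (0 means feasible)."""
    distinct_sellers = {seller_of[item] for item in rec_list}
    return max(0, min_sellers - len(distinct_sellers))


def soft_penalty_score(accuracy: float, violation: int, weight: float = 0.1) -> float:
    """Soft-penalty view: a violation only lowers the score."""
    return accuracy - weight * violation


def is_deployable(
    rec_list: List[str], seller_of: Dict[str, str], min_sellers: int = 2
) -> bool:
    """Hard-constraint view: any violation makes the list undeployable."""
    return seller_coverage_violation(rec_list, seller_of, min_sellers) == 0


# Toy catalogue (hypothetical): three items from seller S1, one from S2.
seller_of = {"a1": "S1", "a2": "S1", "a3": "S1", "b1": "S2"}
single_seller = ["a1", "a2", "a3"]  # accurate but covers one seller only
mixed = ["a1", "a2", "b1"]          # covers two sellers

# Under a soft penalty the single-seller list can still outrank the mixed one
# (0.95 - 0.1 = 0.85 versus 0.80), even though it cannot be deployed.
single_score = soft_penalty_score(0.95, seller_coverage_violation(single_seller, seller_of))
mixed_score = soft_penalty_score(0.80, seller_coverage_violation(mixed, seller_of))
```

With these made-up numbers the single-seller list wins on penalized score while `is_deployable` rejects it, which is the failure mode described above.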
Key Novelty
LLM-Coordinated Dual-Agent Evolutionary Optimization
  • Separates optimization into two agents: an Exploitation Agent that strictly follows rules to refine feasible solutions, and an Exploration Agent that ignores rules to find diverse, high-potential candidates.
  • Uses an LLM as a high-level manager that monitors progress and dynamically adjusts the population size (resources) of each agent, rather than using fixed schedules.
  • Employs adaptive constraint relaxation that starts loose to allow exploration and gradually tightens to ensure 100% feasibility at the end.
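A minimal sketch of the last two ideas, under stated assumptions: the summary does not give the relaxation schedule or the coordinator's prompt, so the linear schedule, the `eps0` starting tolerance, and the rule-based `stub_coordinator` (a stand-in for the LLM call) are all hypothetical.

```python
from typing import Tuple


def relaxed_threshold(gen: int, max_gen: int, eps0: float) -> float:
    """Allowed constraint violation at generation `gen`.

    Assumed linear schedule: starts at eps0 and reaches 0 at max_gen,
    so only fully feasible solutions pass at the end.
    """
    if max_gen <= 0:
        raise ValueError("max_gen must be positive")
    frac = min(max(gen / max_gen, 0.0), 1.0)
    return eps0 * (1.0 - frac)


def passes(violation: float, eps: float) -> bool:
    """A candidate counts as feasible if its violation is within the tolerance."""
    return violation <= eps


def stub_coordinator(feasible_ratio: float, hv_gain: float) -> float:
    """Hypothetical stand-in for the LLM coordinator.

    Returns the share of the population given to the exploitation agent.
    The thresholds below are invented for illustration.
    """
    share = 0.5
    if hv_gain < 0.01:        # progress has stalled: fund more exploration
        share -= 0.2
    if feasible_ratio < 0.5:  # too few feasible solutions: fund exploitation
        share += 0.2
    return min(max(share, 0.1), 0.9)


def reallocate(total: int, exploit_share: float, min_each: int = 1) -> Tuple[int, int]:
    """Split `total` individuals into (exploitation, exploration) population sizes."""
    if total < 2 * min_each:
        raise ValueError("total is too small to give each agent min_each individuals")
    n_exploit = round(total * exploit_share)
    n_exploit = min(max(n_exploit, min_each), total - min_each)
    return n_exploit, total - n_exploit
```

In a real system the stub would be replaced by a call to an LLM that receives the same progress signals and returns the allocation; the clamping in `reallocate` keeps either agent from being starved whatever the coordinator returns.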
Architecture
Figure 1: The DualAgent-Rec framework, illustrating the interaction between the Exploitation Agent, Exploration Agent, and the LLM Coordinator.
Evaluation Highlights
  • Achieves 100% constraint satisfaction rate (CSR) on Amazon Reviews 2023, ensuring all deployed lists meet hard business rules.
  • Improves Pareto Hypervolume (HV) by 4–6% over strong baselines, indicating better trade-offs between accuracy and diversity.
  • Maintains competitive accuracy–diversity balance while eliminating the constraint violations common in prior soft-penalty approaches.
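For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. It assumes two objectives that are both maximized (for example accuracy and diversity) and a reference point at the origin; the paper's exact objective set and reference point are not stated in this summary.

```python
from typing import Iterable, List, Sequence, Tuple


def constraint_satisfaction_rate(violations: Sequence[float]) -> float:
    """Fraction of recommendation lists with zero constraint violation."""
    if not violations:
        raise ValueError("need at least one list")
    return sum(1 for v in violations if v == 0) / len(violations)


def hypervolume_2d(
    points: Iterable[Tuple[float, float]], ref: Tuple[float, float] = (0.0, 0.0)
) -> float:
    """Area dominated by `points` relative to `ref`, both objectives maximized."""
    # Only points that beat the reference in both objectives contribute.
    kept: List[Tuple[float, float]] = [
        p for p in points if p[0] > ref[0] and p[1] > ref[1]
    ]
    # Sweep from the largest first objective; each point adds a strip of
    # height (y - best_y) if it improves on the best second objective so far.
    kept.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in kept:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, e.g. 0.05 for +5%."""
    return (new - old) / old


# Toy Pareto front (made-up values): (1, 1) is dominated and adds no area.
front = [(4.0, 1.0), (2.0, 2.0), (1.0, 4.0), (1.0, 1.0)]
hv = hypervolume_2d(front)
```

A larger hypervolume means the front covers more of the objective space, which is why it is read as a better accuracy-diversity trade-off.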
Breakthrough Assessment
7/10
Strong practical contribution for production systems requiring hard constraints. Novel use of LLMs as optimization orchestrators rather than just content processors.