
UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization

Ofir Marom
Independent Researcher
arXiv (2026)
Recommendation Reasoning P13N

📝 Paper Summary

Prompt Engineering Multi-Objective Optimization Recommendation Systems
UtilityMax replaces ambiguous natural language prompts with formal mathematical influence diagrams, constraining LLMs to explicitly calculate and maximize expected utility across conflicting objectives.
Core Problem
Natural language prompts are inherently ambiguous when specifying multiple competing objectives (e.g., maximize profit vs. minimize risk), so the LLM must decide for itself how to trade them off.
Why it matters:
  • Ambiguity in natural language leads to inconsistent performance in complex tasks where precise trade-offs are required.
  • Existing prompt optimization methods (such as OPRO) require expensive scoring functions or labeled data, which are often unavailable in zero-shot settings.
Concrete Example: A trading agent instructed to 'maximise profit subject to a medium level of risk' can fail because 'medium' has no precise meaning. The LLM might pursue profit too aggressively, whereas a formal utility function with explicit risk tolerances removes this ambiguity.
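To show how a number can stand in for "medium risk", here is a minimal sketch. It uses a textbook mean-variance utility, U = E[r] - (λ/2)·Var[r], where the risk-aversion parameter λ fixes the tolerance explicitly. This utility, the `Strategy` class, and all figures are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    expected_return: float  # hypothetical mean return, e.g. 0.08 = 8%
    variance: float         # hypothetical variance of that return


def mean_variance_utility(s: Strategy, risk_aversion: float) -> float:
    """U = E[r] - (lambda / 2) * Var[r]; lambda states the risk tolerance as a number."""
    return s.expected_return - 0.5 * risk_aversion * s.variance


def choose(strategies: list[Strategy], risk_aversion: float) -> Strategy:
    """Pick the strategy with the highest utility under the given risk aversion."""
    return max(strategies, key=lambda s: mean_variance_utility(s, risk_aversion))


# Hypothetical candidate strategies (toy numbers, not market data).
strategies = [
    Strategy("aggressive", expected_return=0.15, variance=0.09),
    Strategy("balanced", expected_return=0.08, variance=0.02),
    Strategy("defensive", expected_return=0.03, variance=0.002),
]
```

With these toy numbers, `choose(strategies, 0.5)` selects "aggressive", `choose(strategies, 4.0)` selects "balanced", and `choose(strategies, 20.0)` selects "defensive". Once λ is fixed, "medium" is no longer open to interpretation.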
Key Novelty
Formal Influence Diagram Prompting
  • Recasts the task as an influence diagram, a directed acyclic graph (DAG) in which the LLM's answer is the decision node and each objective is a chance node.
  • Defines a formal multiplicative utility function over the chance nodes.
  • Instructs the LLM to explicitly estimate the conditional probability of each chance node given a candidate answer and select the answer that maximizes expected utility.
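The expected-utility step can be sketched in Python, under stated assumptions. The sketch assumes binary chance nodes (objective met or missed), a hypothetical per-node utility table, and conditional independence of the nodes given the answer. Under that independence assumption, the expected value of a product of per-node utilities equals the product of per-node expected utilities. The node names, utility values, and probabilities are hypothetical stand-ins for what the LLM would estimate; they are not the paper's actual prompt or utility function.

```python
from math import prod

# Hypothetical per-node utilities: (value if objective met, value if missed).
NODE_UTILITY = {
    "relevant": (1.0, 0.0),  # missing relevance zeroes out the whole product
    "novel": (1.0, 0.5),     # a non-novel item keeps half of its utility
}


def expected_utility(probs: dict[str, float]) -> float:
    """E[U | a] for U = prod_i u_i(X_i), assuming the X_i are conditionally
    independent given the answer a, so E[prod] = prod of per-node expectations."""
    return prod(
        p * NODE_UTILITY[node][0] + (1 - p) * NODE_UTILITY[node][1]
        for node, p in probs.items()
    )


def best_answer(candidates: dict[str, dict[str, float]]) -> str:
    """Return the candidate answer (decision) with maximal expected utility."""
    return max(candidates, key=lambda a: expected_utility(candidates[a]))


# Stand-ins for P(chance node = met | candidate answer) as estimated by the LLM.
candidates = {
    "Movie A": {"relevant": 0.9, "novel": 0.2},
    "Movie B": {"relevant": 0.7, "novel": 0.9},
    "Movie C": {"relevant": 0.3, "novel": 1.0},
}
```

Here `best_answer(candidates)` returns "Movie B", with an expected utility of 0.7 × 0.95 = 0.665. "Movie A" scores 0.9 × 0.6 = 0.54 and "Movie C" scores 0.3. Because the utility is multiplicative, a candidate that is likely to miss any single objective is penalized heavily, which a weighted sum would not do.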
Evaluation Highlights
  • +16.5% improvement in NDCG@10 on MovieLens 1M using Claude Sonnet 4.6 compared to a standard natural language baseline.
  • Consistent performance gains across three frontier models (Claude Sonnet 4.6, GPT-5.4, Gemini 2.5 Pro) compared to both 'Basic' and 'Harsh' natural language prompts.
  • Statistically significant improvement (p<0.01) over baselines across all models according to Wilcoxon signed-rank tests.
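For readers unfamiliar with the headline metric, here is a minimal sketch of NDCG@10 using the common exponential-gain variant, (2^rel - 1) / log2(rank + 1). The summary does not state which NDCG variant or relevance grading the paper uses, so both choices here are assumptions.

```python
from math import log2


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Sum of (2^rel - 1) / log2(rank + 1) over the top-k positions (rank starts at 1)."""
    return sum(
        (2 ** rel - 1) / log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_items: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """ranked_items: the model's ordered recommendations.
    relevance: item -> graded relevance; items not listed count as 0."""
    gains = [relevance.get(item, 0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

A ranking that matches the ideal order scores 1.0. For example, `ndcg_at_k(["a", "b"], {"a": 3, "b": 1})` equals 1.0, while swapping the two items lowers the score to about 0.71. Per-user NDCG values like these are the kind of paired scores that a Wilcoxon signed-rank test compares between two prompting methods.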
Breakthrough Assessment
7/10
A clever, rigorous approach to prompt engineering that moves away from 'prompt alchemy' toward formal specification. Although it is evaluated only on a single recommendation task, the framework should in principle apply to any multi-objective problem.