AOP is a framework that automates the orchestration and optimization of LLM pipelines for complex queries by assembling predefined semantic operators into interactive, self-reflecting execution workflows.
Core Problem
Current data lakes and static LLM pipelines (like basic RAG) struggle with complex queries requiring multi-hop retrieval, logical reasoning, and analytics across heterogeneous data because they lack dynamic planning and interactive adjustment.
Why it matters:
Manual pipeline orchestration is brittle, costly, and requires significant human expertise to design effective workflows
Static pipelines cannot adapt to intermediate failures (e.g., retrieving irrelevant documents), leading to error propagation and incorrect final answers
Existing systems fail to effectively link and analyze heterogeneous data types (structured tables, unstructured documents) simultaneously for complex reasoning
Concrete Example: For the query 'Who are the members of the men's team table tennis champion team at the 2024 Olympic Games?', a standard RAG pipeline might fail because the query requires multi-hop retrieval: first finding the winning team in news articles (unstructured), then looking up its roster in a table (structured). AOP handles this by linking the 'champion team' concept to a specific roster lookup.
Key Novelty
Automated Semantic Operator Orchestration
Decomposes complex queries into chains of standard 'semantic operators' (e.g., Semantic Filter, Retrieve, Aggregate) rather than bespoke code
Uses an LLM-based planner to generate initial operator chains, rewrites them into Directed Acyclic Graphs (DAGs) for parallelism, and optimizes them using a cost model
Executes pipelines interactively, allowing the system to inspect intermediate results and dynamically adjust the plan (e.g., pruning paths if retrieval fails)
Architecture
The AOP architecture, illustrating the flow from Query Interface to Planner, Optimizer, Executor, and finally the Answer.
Evaluation Highlights
+45% accuracy improvement on a challenging subset of the CRAG benchmark compared to directly asking the LLM
Reduces execution latency by running independent operators in parallel within DAG-structured pipelines
Demonstrates effective handling of heterogeneous data by linking structured tables and unstructured text via semantic operators
Breakthrough Assessment
8/10
Significantly advances Agentic RAG by formalizing 'semantic operators' similar to database algebra, enabling systematic optimization and interactive execution for complex, multi-modal queries.
Technical Details
Problem Definition
Setting: Complex query answering over heterogeneous data lakes (structured, semi-structured, and unstructured data)
Inputs: Natural language query
Outputs: Final answer (text or structured data) derived from multi-step reasoning
Pipeline Flow
Query Interface (receives NL query)
Planner/Optimizer (generates operator chains → rewrites them into DAGs → selects the best plan via a cost model)
Executor (runs operators interactively, adjusting based on results)
Context Manager (summarizes intermediate state)
System Modules
Planner (Orchestration)
Orchestrates execution pipelines by selecting appropriate semantic operators for the input query
Model or implementation: LLM-based (specific model not reported in the paper)
Pipeline Rewriter (Orchestration)
Refines chain pipelines into DAG structures to enable parallel execution
Model or implementation: LLM-based
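The Pipeline Rewriter itself is LLM-based, but the effect of turning a chain into a DAG can be illustrated with topological layering: once dependencies are explicit, operators with no path between them land in the same layer and can run concurrently. This sketch uses Python's standard-library `graphlib`; the operator names and dependency sets are hypothetical.

```python
from graphlib import TopologicalSorter


def dag_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group operators into layers; operators in one layer never depend on each other."""
    sorter = TopologicalSorter(deps)  # deps maps each operator to its predecessors
    sorter.prepare()  # raises graphlib.CycleError if the pipeline is not acyclic
    layers: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all operators whose inputs are available
        layers.append(ready)
        sorter.done(*ready)
    return layers


def chain_deps(ops: list[str]) -> dict[str, set[str]]:
    """A linear chain: every operator waits for the one before it."""
    return {op: ({ops[i - 1]} if i > 0 else set()) for i, op in enumerate(ops)}


chain = ["retrieve_news", "filter_news", "retrieve_table", "join", "answer"]
# Hypothetical rewrite: the table retrieval does not need the news results.
dag = {
    "retrieve_news": set(),
    "filter_news": {"retrieve_news"},
    "retrieve_table": set(),
    "join": {"filter_news", "retrieve_table"},
    "answer": {"join"},
}
print(len(dag_layers(chain_deps(chain))))  # 5 sequential layers
print(dag_layers(dag))
# [['retrieve_news', 'retrieve_table'], ['filter_news'], ['join'], ['answer']]
```

In this toy case the rewrite shortens the critical path from five steps to four; the saving grows with the number of independent retrieval branches.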
Cost-Based Optimizer
Selects the most efficient pipeline by estimating computational costs and data cardinality
Model or implementation: Mathematical cost model fitted on sample workloads
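The paper does not give the cost model's functional form. As an assumption, the sketch below uses a simple linear model (a fixed per-call cost plus a per-item cost proportional to input cardinality) and propagates cardinality through each operator's selectivity. The `OpStats` type and all parameter values are hypothetical stand-ins for parameters fitted on sample workloads.

```python
from dataclasses import dataclass


@dataclass
class OpStats:
    name: str
    fixed_cost: float     # per-call overhead, hypothetical units
    per_item_cost: float  # marginal cost per input item
    selectivity: float    # output cardinality / input cardinality


def plan_cost(plan: list[OpStats], input_card: float) -> float:
    """Sum operator costs while propagating estimated cardinality down the plan."""
    total, card = 0.0, input_card
    for op in plan:
        total += op.fixed_cost + op.per_item_cost * card
        card *= op.selectivity  # cardinality flowing into the next operator
    return total


def choose_plan(plans: dict[str, list[OpStats]], input_card: float) -> str:
    """Return the name of the cheapest candidate plan."""
    return min(plans, key=lambda name: plan_cost(plans[name], input_card))


keyword_filter = OpStats("keyword_filter", fixed_cost=1.0, per_item_cost=0.01, selectivity=0.1)
llm_filter = OpStats("llm_filter", fixed_cost=5.0, per_item_cost=1.0, selectivity=0.5)
plans = {
    "llm_first": [llm_filter, keyword_filter],
    "keyword_first": [keyword_filter, llm_filter],
}
for name, plan in plans.items():
    print(name, plan_cost(plan, input_card=1000))
print(choose_plan(plans, input_card=1000))  # keyword_first
```

Running the cheap, selective filter first wins because it shrinks the input to the expensive LLM-based filter. This reordering is only valid if both orders yield the same answer, which the optimizer would have to guarantee.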
Pipeline Executor
Executes the pipeline layer-by-layer with interactive adjustment
Model or implementation: Hybrid (LLM calls + Pre-programmed functions)
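A minimal sketch of layer-by-layer execution, assuming operators are callables that read earlier results from a dictionary. Operators within a layer run concurrently in a thread pool. The reflection step is a hypothetical rule (prune any operator that returned nothing, and skip its dependents) standing in for the LLM-driven self-reflection the paper describes; the operator names are also hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Op = Callable[[dict[str, Any]], Any]


def execute_layers(layers: list[list[str]], ops: dict[str, Op],
                   deps: dict[str, set[str]]) -> dict[str, Any]:
    """Run each layer in parallel, then prune failed paths before the next layer."""
    results: dict[str, Any] = {}
    pruned: set[str] = set()
    with ThreadPoolExecutor() as pool:
        for layer in layers:
            # Skip operators whose inputs were pruned in an earlier layer.
            runnable = [n for n in layer if not (deps.get(n, set()) & pruned)]
            pruned.update(n for n in layer if n not in runnable)
            snapshot = dict(results)  # read-only view of earlier layers for this layer
            futures = {n: pool.submit(ops[n], snapshot) for n in runnable}
            for name, fut in futures.items():
                results[name] = fut.result()
            # Reflection step (toy rule): an empty result marks a failed path.
            pruned.update(n for n in runnable if not results[n])
    return {n: v for n, v in results.items() if n not in pruned}


deps = {"news": set(), "table": set(), "from_news": {"news"}, "from_table": {"table"}}
layers = [["news", "table"], ["from_news", "from_table"]]
ops: dict[str, Op] = {
    "news": lambda r: ["Team A"],
    "table": lambda r: [],                  # simulated retrieval failure
    "from_news": lambda r: r["news"][0],
    "from_table": lambda r: r["table"][0],  # would raise IndexError if it ran
}
print(execute_layers(layers, ops, deps))  # {'news': ['Team A'], 'from_news': 'Team A'}
```

Pruning before the next layer is what stops a failed retrieval from propagating: `from_table` never runs, so its error never reaches the final answer.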
Context Manager
Condenses intermediate information to fit within LLM context limits
Model or implementation: LLM-based (Summarize/Explain operators)
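A sketch of the condensing logic, assuming a budget measured in whitespace-separated words as a rough proxy for tokens. `truncate_summary` is a hypothetical placeholder for an LLM-backed Summarize operator, and summarizing the oldest entries first is likewise an assumed policy.

```python
from typing import Callable


def word_count(text: str) -> int:
    """Rough proxy for token count."""
    return len(text.split())


def truncate_summary(text: str, max_words: int = 8) -> str:
    """Placeholder for an LLM Summarize operator: keep only the first few words."""
    words = text.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")


def condense(entries: list[str], budget: int,
             summarize: Callable[[str], str] = truncate_summary) -> list[str]:
    """Summarize the oldest entries first until the total word count fits the budget."""
    out = list(entries)
    for i in range(len(out)):
        if sum(word_count(e) for e in out) <= budget:
            break
        out[i] = summarize(out[i])
    return out


history = [" ".join(["a"] * 20), "b b b"]
print(condense(history, budget=15))  # ['a a a a a a a a ...', 'b b b']
```

If every entry has been summarized and the total still exceeds the budget, this sketch returns the result anyway; a real context manager would need a further fallback, such as dropping the least relevant entries.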
Novel Architectural Elements
Use of 22 predefined 'Semantic Operators' (Retrieve, Scan, Filter, etc.) as standardized building blocks for LLM agents
Layer-wise interactive execution engine that pauses to self-reflect and prune/adjust the pipeline graph at runtime
Hybrid physical implementation of operators: some are LLM-based prompts, others are pre-programmed functions (e.g., BM25 search)
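On the pre-programmed side, here is a self-contained sketch of Okapi BM25 scoring of the kind a Retrieve operator might wrap. The defaults `k1 = 1.5` and `b = 0.75`, the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`, and lowercase whitespace tokenization are common choices, not details specified in the paper.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def retrieve_bm25(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the top-k documents by BM25 score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]


docs = ["table tennis men's team final", "swimming relay results", "table tennis women's singles"]
query = "men's team table tennis"
print(retrieve_bm25(query, docs, k=1))  # ["table tennis men's team final"]
```

Deterministic functions like this are cheap and predictable, which is presumably why they suit operators that do not need an LLM's judgment.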
Modeling
Base Model: Not reported in the paper
Training Method: Process Reward Model (PRM) training to fine-tune LLM planning
Training Data:
Recorded sequences of chosen pipelines, intermediate results, and final outcomes serve as training data
Compute: Not reported in the paper
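The paper does not describe how step-level rewards are derived from the recorded sequences. One common approach, shown here purely as an assumption, is Monte Carlo labeling: each plan prefix is scored by the fraction of recorded runs sharing that prefix that ended in a correct answer. The `Step` and `Trajectory` record types and the sample runs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    operator: str             # e.g. "Retrieve", "Filter"
    intermediate_result: str  # recorded output of this step


@dataclass
class Trajectory:
    query: str
    steps: list[Step]
    correct: bool             # whether the final answer was judged correct


def prefix_values(trajs: list[Trajectory]) -> dict[tuple[str, tuple[str, ...]], float]:
    """Map (query, operator prefix) to the success rate of runs sharing that prefix."""
    outcomes: dict[tuple[str, tuple[str, ...]], list[float]] = {}
    for t in trajs:
        for i in range(1, len(t.steps) + 1):
            key = (t.query, tuple(s.operator for s in t.steps[:i]))
            outcomes.setdefault(key, []).append(1.0 if t.correct else 0.0)
    return {k: sum(v) / len(v) for k, v in outcomes.items()}


runs = [
    Trajectory("q1", [Step("Retrieve", "docs"), Step("Filter", "2 docs"), Step("Answer", "A")], True),
    Trajectory("q1", [Step("Retrieve", "docs"), Step("Summarize", "summary"), Step("Answer", "B")], False),
]
labels = prefix_values(runs)
print(labels[("q1", ("Retrieve",))])           # 0.5
print(labels[("q1", ("Retrieve", "Filter"))])  # 1.0
```

Pairs like these could serve as regression targets for a reward model that scores partial plans, letting the planner prefer the `Filter` branch before the final answer is known.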
Comparison to Prior Work
vs. RAG: AOP uses multi-step dynamic planning with logic operators (Filter, Aggregate) rather than a fixed retrieval step
vs. NL2SQL: AOP supports unstructured and semi-structured data via semantic operators, not just structured tables
vs. LangChain/AutoGPT [not cited in paper]: AOP introduces database-style optimization (cost models, cardinality estimation) and DAG parallelism to agentic workflows
Limitations
Cardinality estimation for unstructured data is challenging and relies on sampling, which may be inaccurate
Cost model parameters require fitting on sample workloads, which may not generalize to all query types
The paper does not specify the base LLM used for experiments, making exact reproduction difficult
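To illustrate why sampling-based cardinality estimates can be off, here is a sketch that estimates the output cardinality of a semantic filter by evaluating the predicate on a uniform random sample and scaling up. In practice the predicate would be an LLM call, which is exactly why it cannot be run on every item; here it is a hypothetical keyword check. The estimate's standard error shrinks roughly as 1/sqrt(sample size), so small samples over rare predicates are the risky case.

```python
import random
from typing import Callable, Sequence


def estimate_cardinality(items: Sequence[str], predicate: Callable[[str], bool],
                         sample_size: int = 50, seed: int = 0) -> float:
    """Estimate how many items pass `predicate` from a uniform random sample."""
    if not items:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(list(items), min(sample_size, len(items)))  # without replacement
    selectivity = sum(predicate(x) for x in sample) / len(sample)
    return selectivity * len(items)


def about_tennis(doc: str) -> bool:
    """Stand-in for an LLM-judged Semantic Filter predicate."""
    return "tennis" in doc


docs = [f"doc {i}: {'table tennis' if i % 4 == 0 else 'swimming'}" for i in range(1000)]
print(estimate_cardinality(docs, about_tennis, sample_size=1000))  # 250.0 (full scan)
print(estimate_cardinality(docs, about_tennis, sample_size=20))    # noisy estimate
```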
Reproducibility
Code availability is not provided. The paper lists the 22 semantic operators and their descriptions, but specific prompts, cost model parameters, and trained model weights are not released.
Experiments & Results
Evaluation Setup
Open-domain Question Answering on the CRAG dataset
Key Terms
Semantic Operators: Predefined functional units (e.g., Retrieve, Filter, Summarize) used as building blocks for LLM pipelines, similar to SQL operators but for semantic tasks
DAG: Directed Acyclic Graph, a structure used here to represent execution pipelines where independent operators can run in parallel
Prefetching: A technique where the system proactively retrieves potentially useful information for future steps during idle time to reduce latency from retrieval failures
Process Reward Model (PRM): A model that assigns rewards to intermediate steps in a reasoning process, used to fine-tune the LLM's planning capabilities
Schema Linking: The process of identifying and connecting relevant data elements (like table columns or document sections) to the terms in a natural language query
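Schema linking can be sketched with simple lexical matching: tokenize the query and each column name, apply a crude plural-stripping rule, and keep columns whose tokens overlap. This is an illustrative assumption only; the table and column names are hypothetical, and a practical system would more likely use embeddings or an LLM to match phrases like 'champion team' to a roster table.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with a crude plural strip ('members' -> 'member')."""
    out = set()
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        out.add(tok[:-1] if tok.endswith("s") and len(tok) > 3 else tok)
    return out


def link_schema(query: str, schema: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose name tokens overlap the query's tokens."""
    q = tokens(query)
    return [(table, col) for table, cols in schema.items() for col in cols
            if tokens(col) & q]


schema = {
    "olympic_rosters": ["team", "member_name", "event", "year"],
    "medal_table": ["country", "medal"],
}
query = "Who are the members of the men's team table tennis champion team at the 2024 Olympic Games?"
print(link_schema(query, schema))
# [('olympic_rosters', 'team'), ('olympic_rosters', 'member_name')]
```

Note that the lexical approach misses `event` even though 'table tennis' is an event; closing that kind of gap is why semantic linking matters for heterogeneous data.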