
Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse

Jacopo Tagliabue, Ciro Greco
Bauplan
arXiv (2025)
Agent Reasoning Benchmark

📝 Paper Summary

Infrastructure for Agents · Agentic Data Engineering
The paper demonstrates that programmable lakehouses with Git-like branching and declarative environments enable untrusted AI agents to safely repair production data pipelines without human-in-the-loop bottlenecks.
Core Problem
Lakehouses run sensitive workloads that resist automation because they lack safe abstractions for untrusted agents to modify production data without risking corruption or security breaches.
Why it matters:
  • Data engineers spend significant time fixing broken pipelines, a high-stakes task that is currently hard to automate safely
  • Current systems lack unified interfaces, requiring agents to navigate heterogeneous tools (SQL editors, Terraform, Docker) rather than a single code-based API
  • Allowing autonomous agents to write to production storage poses severe trust and correctness risks unless writes are sandboxed and verified
Concrete Example: A data pipeline fails due to a NumPy 2.0 / pandas 2.0 dependency mismatch. An agent attempting to fix this in a traditional setup might accidentally corrupt production tables or introduce malicious code while trying to patch the environment.
Key Novelty
The Programmable, Branch-Based Agentic Lakehouse
  • Treats the entire data lifecycle (pipelines, environments, infrastructure) as code accessible via APIs, creating a unified interface for agents
  • Uses 'Git-for-Data' semantics (branch-then-merge) to let agents repair pipelines on isolated data copies (branches), preventing dirty reads in production
  • Implements a 'proof-carrying' protocol where agents must satisfy a semantic correctness check (verifier function) before their branch is merged
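The branch-then-merge flow with a verifier gate can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the paper's implementation or Bauplan's API: `ToyLakehouse`, its methods, the `daily_totals` table, and the `verifier` rule are all hypothetical names introduced here.

```python
from copy import deepcopy
from typing import Callable, Dict, List

Table = List[dict]
Catalog = Dict[str, Table]


class ToyLakehouse:
    """In-memory stand-in for a branch-based lakehouse (hypothetical API)."""

    def __init__(self, main: Catalog) -> None:
        self.branches: Dict[str, Catalog] = {"main": deepcopy(main)}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A deep copy is enough for a toy; a real system would likely avoid
        # physically copying data.
        self.branches[name] = deepcopy(self.branches[source])

    def write(self, branch: str, table: str, rows: Table) -> None:
        # Agents never write to main directly; they only touch their branch.
        if branch == "main":
            raise PermissionError("agents may not write to main directly")
        self.branches[branch][table] = rows

    def merge(self, branch: str, verifier: Callable[[Catalog], bool]) -> bool:
        # The merge is gated: the branch replaces main only if the verifier
        # accepts the candidate state.
        candidate = self.branches[branch]
        if not verifier(candidate):
            return False
        self.branches["main"] = deepcopy(candidate)
        return True


def verifier(catalog: Catalog) -> bool:
    # Hypothetical semantic check: the output table is non-empty and no row
    # has a missing total.
    rows = catalog.get("daily_totals", [])
    return bool(rows) and all(r.get("total") is not None for r in rows)


lake = ToyLakehouse({"daily_totals": [{"day": "2025-01-01", "total": 10}]})

# A faulty repair: the agent writes a row with a missing total.
lake.create_branch("agent/bad")
lake.write("agent/bad", "daily_totals", [{"day": "2025-01-02", "total": None}])
bad_merged = lake.merge("agent/bad", verifier)
main_after_bad = deepcopy(lake.branches["main"])

# A correct repair: the verifier accepts it and main is updated.
lake.create_branch("agent/good")
lake.write("agent/good", "daily_totals", [{"day": "2025-01-02", "total": 12}])
good_merged = lake.merge("agent/good", verifier)
```

In this sketch the rejected branch never reaches `main`, so the agent can be untrusted as long as the verifier and the merge gate are trusted.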
Architecture
Figure 3: The agentic loop workflow interacting with the programmable lakehouse
Evaluation Highlights
  • Demonstrates fully autonomous repair of a broken pipeline (caused by NumPy/pandas version mismatch) using Sonnet 4.5 via a ReAct loop
  • Validates safety: failed agent attempts (e.g., by GPT-5-mini) caused no production data corruption, because every write stayed on an isolated branch
  • Shows feasibility of 'proof-carrying' workflow where a verifier function automatically gates the merge of agent-generated data into production
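The repair loop described in these highlights can be approximated as propose, run, verify, retry, with each attempt working on a fresh isolated copy. The sketch below is an assumption-laden toy: the agent is a scripted stub rather than an LLM, `run_pipeline`, `repair_loop`, and `verify` are hypothetical names, and the compatibility rule (matching major versions) is invented for the example and says nothing about real NumPy/pandas compatibility.

```python
from copy import deepcopy
from typing import Callable, Dict, Optional, Tuple

Env = Dict[str, str]
State = Dict[str, object]


def run_pipeline(env: Env) -> Dict[str, int]:
    # Toy rule, invented for this sketch: the run fails unless both
    # packages are pinned to the same major version.
    numpy_major = env["numpy"].split(".")[0]
    pandas_major = env["pandas"].split(".")[0]
    if numpy_major != pandas_major:
        raise RuntimeError(f"incompatible pins: {env}")
    return {"rows_written": 3}


def verify(state: State) -> bool:
    # Hypothetical verifier: the pipeline produced at least one row.
    output = state.get("output")
    return isinstance(output, dict) and output.get("rows_written", 0) > 0


def repair_loop(
    production: State,
    propose: Callable[[Optional[str]], Env],
    max_attempts: int = 3,
) -> Tuple[State, Optional[int]]:
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        branch = deepcopy(production)       # fresh isolated copy per attempt
        branch["env"] = propose(feedback)   # agent acts on the last observation
        try:
            branch["output"] = run_pipeline(branch["env"])
        except RuntimeError as err:
            feedback = str(err)             # the error becomes the next observation
            continue
        if verify(branch):
            return branch, attempt          # only a verified branch is returned
        feedback = "verifier rejected output"
    return production, None                 # all attempts failed: production unchanged


def scripted_agent() -> Callable[[Optional[str]], Env]:
    # Stand-in for an LLM: a fixed sequence of proposed environments.
    proposals = iter([
        {"numpy": "2.0.0", "pandas": "1.5.3"},
        {"numpy": "2.0.0", "pandas": "2.2.0"},
    ])
    return lambda feedback: next(proposals)


production: State = {"env": {"numpy": "2.0.0", "pandas": "1.5.3"}, "output": None}
snapshot = deepcopy(production)

repaired, attempts = repair_loop(production, scripted_agent())
failed, failed_attempts = repair_loop(
    production, lambda feedback: {"numpy": "2.0.0", "pandas": "1.5.3"}
)
```

Here the first proposal fails and is discarded, the second passes the verifier, and an agent that never finds a fix leaves `production` exactly as it was.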
Breakthrough Assessment
7/10
Strong conceptual contribution defining safety abstractions for data agents. The prototype is a feasibility demonstration rather than a large-scale benchmark, but the architectural insights on 'Git-for-Data' for agents are significant.