programmable lakehouse: A data architecture in which every layer (data, infrastructure, pipelines) is exposed and managed through code and APIs rather than through GUIs or disparate tools
Git-for-Data: Applying version control concepts (commits, branches, merges) to large-scale data tables, allowing isolated experimentation and atomic updates
proof-carrying code: A safety concept where untrusted code is accompanied by a formal proof or evidence (here, passing a verifier function) that it satisfies safety properties
MCP: Model Context Protocol, a standard for exposing server-side tools and data context to LLM agents
ReAct: Reason+Act, a paradigm where agents interleave reasoning steps with tool execution steps to solve complex tasks
copy-on-write: A storage optimization where data is copied only when modified, allowing efficient branching without duplicating the entire dataset initially
DAG: Directed Acyclic Graph, a representation of data pipelines where nodes are transformations and edges are dependencies
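
To make the Git-for-Data, copy-on-write, and verifier-gated merge entries concrete, here is a minimal Python sketch. Everything in it is hypothetical and exists only for illustration: the Catalog class, its create_branch, write, and merge methods, and the no_null_ids verifier are assumptions, not the API of any real lakehouse. Tables are modeled as in-memory lists of row dicts, and the merge is a simple overwrite of the target branch rather than a true three-way merge.

```python
from typing import Callable, Dict, List

Table = List[dict]  # toy model: a table is a list of row dicts


class Catalog:
    """Hypothetical catalog: each branch maps table names to snapshots."""

    def __init__(self) -> None:
        self.branches: Dict[str, Dict[str, Table]] = {"main": {}}

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copy only the name -> snapshot mapping; the row data stays shared.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch: str, table: str, rows: Table) -> None:
        # Copy-on-write: this branch now points at a new snapshot,
        # while every other branch keeps the old one.
        self.branches[branch][table] = list(rows)

    def merge(self, source: str, target: str,
              verifier: Callable[[Dict[str, Table]], bool]) -> bool:
        # The verifier plays the role of the "proof": no pass, no merge.
        candidate = self.branches[source]
        if not verifier(candidate):
            return False
        # Replace the target mapping in a single assignment
        # (atomic only within this single-threaded toy model).
        self.branches[target] = dict(candidate)
        return True


def no_null_ids(state: Dict[str, Table]) -> bool:
    """Example safety property: every row in every table has a non-null id."""
    return all(row.get("id") is not None
               for rows in state.values() for row in rows)


catalog = Catalog()
catalog.write("main", "users", [{"id": 1}])
catalog.create_branch("experiment")

# Right after branching, both branches share the same snapshot object.
shared = catalog.branches["experiment"]["users"] is catalog.branches["main"]["users"]

# A bad write stays isolated on the branch and is rejected at merge time.
catalog.write("experiment", "users", [{"id": 1}, {"id": None}])
rejected = catalog.merge("experiment", "main", no_null_ids)

# A corrected write passes the verifier and is merged.
catalog.write("experiment", "users", [{"id": 1}, {"id": 2}])
accepted = catalog.merge("experiment", "main", no_null_ids)
```

In this sketch, creating a branch costs one dictionary copy regardless of table size, and main is never exposed to the invalid rows because the verifier runs before the branch pointer is swapped.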