GUI: Graphical User Interface—visual components like buttons and windows that users interact with
CLI: Command Line Interface—text-based interface for executing commands
POMDP: Partially Observable Markov Decision Process—a mathematical framework for modeling decision-making where the agent cannot see the full state of the world
a11y tree: Accessibility Tree—a structured text representation of UI elements (buttons, inputs) and their properties, used by screen readers and agents
VLM: Vision Language Model—an AI model that can process both image and text inputs
DAG: Directed Acyclic Graph—a graph whose edges have a direction and never form a cycle; in workflow tools like Airflow, it models tasks and their dependencies so that execution flows in one direction without loops
Set-of-Mark: A prompting technique where visible UI elements on a screenshot are overlaid with numbered bounding boxes so the model can refer to a specific element by its number rather than by raw coordinates
dbt: data build tool—a framework for transforming data in warehouses using SQL
Airbyte: An open-source data integration platform for moving data from sources to destinations
Airflow: A platform to programmatically author, schedule, and monitor workflows