Inducing Programmatic Skills for Agentic Tasks

📝 Paper Summary

Web agents, Online skill learning, Tool creation

ASI enables web agents to self-improve by converting successful interaction traces into verified, executable Python functions that are added directly to the agent's action space.

Core Problem

Current adaptive web agents represent learned skills as text descriptions in memory, which are verbose, unverifiable, and prone to misinterpretation by the agent.

Why it matters:

Textual skills cannot be rigorously verified, so incorrect or hallucinated guidelines accumulate over time
Solving complex web tasks with primitive actions (click, scroll) is inefficient; agents need high-level abstractions to reduce trajectory length
Offline learning from demonstrations suffers from distribution shift when deployed on real, dynamic websites

Concrete Example: In a shopping task, a text-based agent might induce a vague skill like 'search for game accessories' that mixes searching and adding to a wishlist. ASI induces a precise, reusable Python function `search_product(name)` that only performs the search, verified by execution.
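
To make the contrast concrete, here is a minimal sketch of what a program skill of this kind might look like. The `FakeBrowser` class, its `click`/`fill`/`press` methods, and the element ids are hypothetical stand-ins for the environment's primitive actions, not the paper's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the web environment; it only records actions."""
    log: list = field(default_factory=list)

    def click(self, element_id: str) -> None:
        self.log.append(("click", element_id))

    def fill(self, element_id: str, text: str) -> None:
        self.log.append(("fill", element_id, text))

    def press(self, element_id: str, key: str) -> None:
        self.log.append(("press", element_id, key))


def search_product(browser: FakeBrowser, name: str) -> None:
    """Illustrative induced skill: run a product search and nothing else."""
    browser.click("search-box")          # element ids are assumptions
    browser.fill("search-box", name)
    browser.press("search-box", "Enter")


browser = FakeBrowser()
search_product(browser, "game accessories")
print(len(browser.log))  # number of primitive actions behind one skill call
```

The skill stops after submitting the search, so a separate skill (or primitive actions) would handle adding an item to a wishlist.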

Key Novelty

Agent Skill Induction (ASI)

Represents skills as executable Python programs rather than text, allowing the agent to abstract primitive actions into high-level function calls
Implements a verification loop where induced skills are tested against the environment using rewritten trajectory prefixes before acceptance
Integrates verified skills directly into the agent's action space (as new tools) rather than just appending them to the context window/memory

Architecture

Contrast between AWM (Baseline) and ASI (Proposed) architectures. Top: AWM adds text skills to Memory. Bottom: ASI adds program skills to the Action Space.

Evaluation Highlights

+23.5% success rate improvement on WebArena compared to a static non-adaptive baseline
+11.3% success rate over AWM (state-of-the-art adaptive agent using text skills), driven by the correctness of verified programs
Reduces average steps to solution by 10.7–15.3%, validating that programmatic skills enable more efficient planning

Breakthrough Assessment

8/10

Significant improvement over SOTA by shifting from text-based to program-based skill learning. The verification mechanism addresses the critical reliability issue in self-improving agents.

⚙️ Technical Details

Problem Definition

Setting: Online adaptive web navigation where an agent learns from a sequence of natural language queries without ground-truth rewards

Inputs: Sequence of natural language queries Q = {q1, q2, ...}

Outputs: Action trajectories τ consisting of primitive actions or induced skill calls

Pipeline Flow

Exploration: Agent attempts task with current action space
Filtration: Evaluator judges trajectory success
Induction: LLM converts successful trace into Python function
Verification: Agent tests function by executing it in environment
Deployment: Verified function added to Action Space
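
The five stages above can be read as one loop over the query stream. The sketch below is an assumption-laden outline, not the authors' code: the function `run_online_loop` and its callable parameters (`explore`, `is_success`, `induce`, `verify`) are hypothetical names, and the toy lambdas stand in for the LLM policy, evaluator, and inducer.

```python
from typing import Callable, Dict, List, Optional

Trajectory = List[str]          # each entry is one action, written as text
Skill = Callable[..., None]


def run_online_loop(
    queries: List[str],
    explore: Callable[[str, Dict[str, Skill]], Trajectory],
    is_success: Callable[[str, Trajectory], bool],
    induce: Callable[[str, Trajectory], Optional[Dict[str, Skill]]],
    verify: Callable[[str, Dict[str, Skill]], bool],
) -> Dict[str, Skill]:
    """Process queries in order, growing the skill set after verified successes."""
    action_space: Dict[str, Skill] = {}
    for query in queries:
        trajectory = explore(query, action_space)   # Exploration
        if not is_success(query, trajectory):       # Filtration
            continue
        candidates = induce(query, trajectory)      # Induction
        if not candidates:
            continue
        if verify(query, candidates):               # Verification
            action_space.update(candidates)         # Deployment
    return action_space


def noop_skill() -> None:
    """Placeholder skill body for the toy run."""


# Toy stand-ins so the loop can be exercised without a browser or an LLM.
skills = run_online_loop(
    queries=["find a mouse", "find a keyboard"],
    explore=lambda q, space: ["click(search)", f"type({q})"],
    is_success=lambda q, traj: "mouse" in q,
    induce=lambda q, traj: {"search_product": noop_skill},
    verify=lambda q, cands: True,
)
print(sorted(skills))
```

Because failed episodes are skipped at the filtration step, only trajectories judged successful ever reach induction.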

System Modules

Agent Policy

Generates actions based on observation and current skill library

Model or implementation: claude-3.5-sonnet

Induction Module (Skill Creation)

Abstracts successful action traces into reusable Python functions

Model or implementation: claude-3.5-sonnet

Verification Module (Skill Creation)

Validates candidate skills by attempting to solve the original task using the new skill

Model or implementation: claude-3.5-sonnet (evaluator)
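
One way to picture this check is to execute a rewritten trajectory that calls the candidate skill and accept the skill only if the run finishes without errors and the task is still solved. The sketch below assumes a toy page state and hypothetical names (`verify_skill`, `fill`, `submit`, `broken_skill`); the summarized paper uses an LLM evaluator on a real environment rather than the exact-match test shown here.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, ...]   # (action_name, *arguments)


def verify_skill(
    rewritten: List[Action],
    primitives: Dict[str, Callable[..., None]],
    skills: Dict[str, Callable[..., None]],
    task_solved: Callable[[], bool],
) -> bool:
    """Run the rewritten trajectory; accept only if a skill ran and the task holds."""
    used_skill = False
    try:
        for name, *args in rewritten:
            if name in skills:
                skills[name](*args)
                used_skill = True
            else:
                primitives[name](*args)
    except Exception:
        return False                     # a crashing skill is rejected
    return used_skill and task_solved()


# Toy page state standing in for the browser environment.
state = {"query": None, "submitted": False}


def fill(text: str) -> None:
    state["query"] = text


def submit() -> None:
    state["submitted"] = True


def search_product(name: str) -> None:
    fill(name)
    submit()


def broken_skill(name: str) -> None:
    raise RuntimeError("element not found")


primitives = {"fill": fill, "submit": submit}
solved = lambda: state["submitted"] and state["query"] == "mouse"

ok = verify_skill([("search_product", "mouse")], primitives,
                  {"search_product": search_product}, solved)
state.update(query=None, submitted=False)    # reset the toy page
bad = verify_skill([("broken_skill", "mouse")], primitives,
                   {"broken_skill": broken_skill}, solved)
print(ok, bad)
```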

Novel Architectural Elements

Dynamic expansion of the Action Space: Skills are added as callable tools rather than passive memory retrieval
Programmatic Verification Loop: A dedicated feedback loop that executes code to validate induced skills before adoption
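
The difference between a callable tool and a memory entry can be sketched with a small registry. The `ActionSpace` class and its methods are hypothetical, chosen only to illustrate that a verified skill becomes directly invocable alongside the primitive actions.

```python
from typing import Callable, Dict


class ActionSpace:
    """Toy action space: primitive actions plus skills added at run time."""

    def __init__(self, primitives: Dict[str, Callable[..., object]]) -> None:
        self._actions = dict(primitives)

    def add_skill(self, name: str, fn: Callable[..., object]) -> None:
        if name in self._actions:
            raise ValueError(f"action {name!r} already exists")
        self._actions[name] = fn

    def describe(self) -> str:
        """Names of all callable actions, e.g. for listing in the agent prompt."""
        return "\n".join(sorted(self._actions))

    def call(self, name: str, *args: object) -> object:
        return self._actions[name](*args)


space = ActionSpace({
    "click": lambda el: f"click {el}",
    "fill": lambda el, text: f"fill {el} {text}",
})

# A verified skill is registered as a new action that composes primitives.
space.add_skill(
    "search_product",
    lambda name: [
        space.call("fill", "search-box", name),
        space.call("click", "search-button"),
    ],
)
print(space.describe())
```

Rejecting duplicate names is a design choice made for this sketch; a real system might version or overwrite skills instead.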

Modeling

Base Model: claude-3.5-sonnet

Training Method: In-context learning / Online adaptation (Non-parametric updates)

Adaptation: None (Frozen model)

Compute: Not reported in the paper

Comparison to Prior Work

vs. AWM: ASI uses executable code instead of text, enabling verification and composition
vs. Vanilla Agent: ASI adapts online by growing its action space
vs. Voyager: ASI applies programmatic skills to the web domain with a specific rewriting-based verification mechanism [not cited in paper]

Limitations

Some induced skills are incompatible with alternative website designs, which limits transferability
Success depends on the base LLM's coding capability (requires strong models like Claude 3.5 Sonnet)
Verification adds computational overhead during the online learning phase

Reproducibility

Code: https://github.com/zorazrw/agent-skill-induction

📊 Experiments & Results

Evaluation Setup

Online web navigation tasks where agents learn from a stream of queries

Benchmarks:

WebArena: web navigation (e-commerce, forums, etc.)

Metrics:

Success Rate (SR)
Average Steps (Efficiency)
Statistical methodology: Not explicitly reported in the paper

Key Results

Main performance comparison on WebArena showing ASI superiority over static and text-skill baselines.
Benchmark	Metric	Baseline	This Paper	Δ
WebArena	Success Rate	19.3	42.8	+23.5
WebArena	Success Rate	31.5	42.8	+11.3
WebArena	Average Steps	22.8	16.2	-6.6
WebArena	Average Steps	18.1	16.2	-1.9
Ablation study on the format (Text vs Program) and verification of skills.
Benchmark	Metric	Baseline	This Paper	Δ
WebArena (Shopping)	Success Rate	40.0	42.6	+2.6
WebArena (Shopping)	Success Rate	33.2	37.4	+4.2
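
The Δ column is the "This Paper" value minus the "Baseline" value. The short check below recomputes it for each row; the numbers are copied from the table as listed and no new data is introduced.

```python
import math

# (benchmark, metric, baseline, this_paper, reported_delta), copied from the table
rows = [
    ("WebArena", "Success Rate", 19.3, 42.8, 23.5),
    ("WebArena", "Success Rate", 31.5, 42.8, 11.3),
    ("WebArena", "Average Steps", 22.8, 16.2, -6.6),
    ("WebArena", "Average Steps", 18.1, 16.2, -1.9),
    ("WebArena (Shopping)", "Success Rate", 40.0, 42.6, 2.6),
    ("WebArena (Shopping)", "Success Rate", 33.2, 37.4, 4.2),
]

# Collect any row whose reported delta differs from this_paper - baseline.
mismatches = [
    row for row in rows
    if not math.isclose(row[3] - row[2], row[4], abs_tol=1e-6)
]
print(mismatches)
```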

Experiment Figures

The Skill Induction and Verification process. Left: Input episode. Middle: Induced program code. Right: Rewritten trajectory for verification.

Main Takeaways

Programmatic skills allow for concrete verification, filtering out low-quality or hallucinated abstractions that plague text-based methods
Integrating skills into the action space (as tools) is more effective than providing them as context in memory
ASI improves efficiency by compressing multiple primitive actions (clicks/scrolls) into single high-level function calls
The method generalizes to scaled-up tasks (longer horizons) and can transfer common skills across different websites

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based agents (policy, memory, tools)
Familiarity with web browser environments (DOM, accessibility tree)
Basic knowledge of in-context learning vs. parameter updates

Key Terms

ASI: Agent Skill Induction—the proposed method that learns skills as executable programs

AWM: Agent Workflow Memory—a baseline method that learns skills as text descriptions stored in memory

WebArena: A benchmark for evaluating web agents on realistic tasks like e-commerce and software development

DOM: Document Object Model—the structural representation of a webpage that agents interact with

primitive actions: Low-level browser operations like 'click', 'type', or 'scroll' provided by the environment

action space: The set of all valid operations an agent can perform; ASI expands this dynamically with new programs

accessibility tree: A simplified version of the DOM used by assistive technologies and web agents to perceive page content