
Inducing Programmatic Skills for Agentic Tasks

Z. Wang, Apurva Gandhi, Graham Neubig, Daniel Fried
Carnegie Mellon University
arXiv.org (2025)
Agent Memory Benchmark

📝 Paper Summary

Web agents, Online skill learning, Tool creation
ASI (Agent Skill Induction) enables web agents to self-improve by converting successful interaction traces into verified, executable Python functions that are added directly to the agent's action space.
Core Problem
Current adaptive web agents represent learned skills as text descriptions in memory, which are verbose, unverifiable, and prone to misinterpretation by the agent.
Why it matters:
  • Textual skills cannot be rigorously verified, so incorrect or hallucinated guidelines accumulate in memory
  • Solving complex web tasks with primitive actions (click, scroll) is inefficient; agents need high-level abstractions to reduce trajectory length
  • Offline learning from demonstrations suffers from distribution shift when deployed on real, dynamic websites
Concrete Example: In a shopping task, a text-based agent might induce a vague skill like 'search for game accessories' that mixes searching and adding to a wishlist. ASI induces a precise, reusable Python function `search_product(name)` that only performs the search, verified by execution.
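A minimal sketch of what such a program skill could look like, assuming a toy page object with `click` and `fill` primitives. The `MockPage` class, the element IDs, and the extra `page` argument are hypothetical and are not taken from the paper; only the name `search_product` comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class MockPage:
    """Toy stand-in for a browser page (hypothetical); records each primitive action."""
    log: list = field(default_factory=list)

    def click(self, element_id: str) -> None:
        self.log.append(("click", element_id))

    def fill(self, element_id: str, text: str) -> None:
        self.log.append(("fill", element_id, text))


def search_product(page: MockPage, name: str) -> None:
    """Skill that only searches: no wishlist or cart side effects."""
    page.click("search-box")          # element IDs are made up for illustration
    page.fill("search-box", name)
    page.click("search-button")


page = MockPage()
search_product(page, "game accessories")  # one high-level call
print(len(page.log))                      # number of primitive actions it expands to
```

The single call replaces three primitive steps in the agent's trajectory, and because the function body is explicit, it is clear that nothing beyond searching happens.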
Key Novelty
Agent Skill Induction (ASI)
  • Represents skills as executable Python programs rather than text, allowing the agent to abstract primitive actions into high-level function calls
  • Implements a verification loop where induced skills are tested against the environment using rewritten trajectory prefixes before acceptance
  • Integrates verified skills directly into the agent's action space (as new tools) rather than just appending them to the context window/memory
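The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `ToyEnv`, `verify_and_add`, the dictionary-based action space, and the use of a simple success flag as the acceptance test are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class ToyEnv:
    """Hypothetical environment: the task succeeds once a non-empty query is submitted."""
    def __init__(self) -> None:
        self.query = None
        self.searched = False

    def task_succeeded(self) -> bool:
        return self.searched


# Primitive actions (assumed names) operate directly on the environment.
def fill(env: ToyEnv, text: str) -> None:
    env.query = text


def click_search(env: ToyEnv) -> None:
    env.searched = env.query is not None


def verify_and_add(skill: Callable, name: str,
                   rewritten_trajectory: List[Tuple[str, tuple]],
                   make_env: Callable[[], ToyEnv],
                   action_space: Dict[str, Callable]) -> bool:
    """Replay a trajectory rewritten to call the skill; keep the skill only if it works."""
    env = make_env()
    candidate_space = {**action_space, name: skill}
    try:
        for action_name, args in rewritten_trajectory:
            candidate_space[action_name](env, *args)
    except Exception:
        return False              # the skill crashed during execution: reject
    if not env.task_succeeded():
        return False              # the skill ran but did not reproduce the outcome: reject
    action_space[name] = skill    # verified: expose it as a new callable action
    return True


def search_product(env: ToyEnv, name: str) -> None:
    fill(env, name)
    click_search(env)


def broken_skill(env: ToyEnv, name: str) -> None:
    fill(env, name)               # never submits the search


action_space = {"fill": fill, "click_search": click_search}
ok = verify_and_add(search_product, "search_product",
                    [("search_product", ("mouse",))], ToyEnv, action_space)
bad = verify_and_add(broken_skill, "broken_skill",
                     [("broken_skill", ("mouse",))], ToyEnv, action_space)
print(ok, bad, sorted(action_space))
```

Only the skill that passes replay ends up in `action_space`, which is the sense in which verification filters what the agent can later call.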
Architecture
Figure 1: Contrast between the AWM (baseline) and ASI (proposed) architectures. Top: AWM adds text skills to memory. Bottom: ASI adds program skills to the action space.
Evaluation Highlights
  • Improves success rate on WebArena by 23.5% over a static, non-adaptive baseline
  • Improves success rate by 11.3% over AWM (the state-of-the-art adaptive agent using text skills), a gain attributed to the correctness of verified programs
  • Reduces the average number of steps to solution by 10.7–15.3%, supporting the claim that programmatic skills enable more efficient planning
Breakthrough Assessment
8/10
Significant improvement over the state of the art, achieved by shifting from text-based to program-based skill learning. The verification mechanism addresses a critical reliability issue in self-improving agents.