
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, Guangsheng Yu
arXiv (2026)
Agent Memory Benchmark

📝 Paper Summary

Agentic reasoning, Procedural memory, Agent security
This paper formalizes agentic skills as reusable procedural memory modules with explicit applicability and termination logic, systematically mapping their lifecycle, design patterns, and security risks.
Core Problem
Agents are fundamentally inefficient today: they re-derive execution strategies for recurring tasks from scratch, because procedural knowledge disappears when the context window clears.
Why it matters:
  • Repeating the same reasoning process for identical tasks wastes computational resources (tokens) and increases latency
  • Ad-hoc planning is less reliable than executing verified, curated procedures (skills)
  • Lack of standardized skill definitions creates security vulnerabilities, such as unmanaged supply-chain risks in agent marketplaces
Concrete Example: A coding agent that has successfully debugged a null-pointer exception 100 times will approach the 101st instance as a novel problem, re-generating the plan from scratch rather than retrieving a known debugging procedure.
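The null-pointer example can be sketched as a lookup-before-planning step. This is a minimal illustration, not the paper's implementation: `Agent`, `plan_from_scratch`, and the dictionary-based skill store are hypothetical names, and the planner is a stand-in for an expensive LLM call. The sketch stores a plan after its first use purely to show the reuse path; a real system would verify or curate a procedure before storing it.

```python
from typing import Dict, List, Optional

# Hypothetical skill store: task signature -> stored procedure (list of steps).
SkillStore = Dict[str, List[str]]


def plan_from_scratch(task: str) -> List[str]:
    """Stand-in for an expensive LLM planning call (hypothetical)."""
    return [f"reproduce {task}", f"locate cause of {task}", f"patch {task}"]


class Agent:
    def __init__(self, store: Optional[SkillStore] = None) -> None:
        self.store: SkillStore = {} if store is None else store
        self.planning_calls = 0  # counts how often we re-derive a plan

    def solve(self, task: str) -> List[str]:
        # Retrieve a known procedure first; plan only on a miss.
        steps = self.store.get(task)
        if steps is None:
            self.planning_calls += 1
            steps = plan_from_scratch(task)
            self.store[task] = steps  # assumption: plan was verified before storing
        return steps


agent = Agent()
for _ in range(101):
    agent.solve("null-pointer exception")
print(agent.planning_calls)
```

With a skill store, the 101 identical tasks trigger a single planning call; without one, each task would pay the planning cost again.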
Key Novelty
Formalization and Systematization of Agentic Skills
  • Redefines skills not as simple tools but as modules described by a 4-tuple: Applicability conditions (when to use), Policy (how to act), Termination (when to stop), and Interface (how to call)
  • Establishes a 7-stage lifecycle model (Discovery to Update) and a taxonomy of 7 design patterns for how skills are packaged and executed in real systems
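The 4-tuple definition above can be sketched as a small data structure. This is one possible reading under stated assumptions, not the paper's formalism: the `Skill` class, the dictionary-based `State` type, the `max_steps` safety bound, and the toy `increment_until_three` skill are all hypothetical. Here the Interface is modeled as the skill's name plus its call signature.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]  # hypothetical representation of agent/environment state


@dataclass(frozen=True)
class Skill:
    name: str                               # Interface: how the skill is referenced
    applicability: Callable[[State], bool]  # when to use
    policy: Callable[[State], State]        # how to act (one step)
    termination: Callable[[State], bool]    # when to stop
    max_steps: int = 10                     # assumption: bound to avoid runaway loops

    def __call__(self, state: State) -> State:
        # Interface: a single entry point that enforces the other three components.
        if not self.applicability(state):
            raise ValueError(f"skill {self.name!r} is not applicable to this state")
        for _ in range(self.max_steps):
            if self.termination(state):
                break
            state = self.policy(state)
        return state


# Toy skill: increment a counter until it reaches 3.
increment_until_three = Skill(
    name="increment_until_three",
    applicability=lambda s: "count" in s,
    policy=lambda s: {**s, "count": s["count"] + 1},
    termination=lambda s: s["count"] >= 3,
)

print(increment_until_three({"count": 0}))
```

Keeping applicability and termination as explicit predicates, rather than burying them in the policy, is what distinguishes this shape from a plain tool call: the caller can check whether a skill applies before running it and can rely on a defined stopping condition.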
Architecture
Figure 1: The formal 4-component architecture of an agentic skill
Evaluation Highlights
  • Curated skills increase agent pass rates by 16.2 percentage points on average compared to agents without skills (SkillsBench)
  • Self-generated skills degrade performance by 1.3 percentage points, often encoding incorrect or overly specific heuristics
  • Identified nearly 1,200 malicious skills in the ClawHavoc campaign case study, demonstrating the scale of supply-chain risks
Breakthrough Assessment
9/10
A comprehensive foundational work (SoK) that establishes the formal definitions, taxonomies, and governance models necessary to move agents from ad-hoc planning to robust, reusable procedural memory.