My Essentials for Understanding Agent Skills

What I needed to get through my head about Skills to actually use them

· Updated 2026-03-04

What is a Skill?

A skill defines (repeatable) workflows an agent should follow.

By workflow, we mean any combination of steps & procedures, templates, checks & validations, refinement loops, tool calls, and deterministic scripts.

In their simplest form, they are just instructions.

Why not stuff everything into system prompt/AGENTS.md?

You absolutely can. But you risk attention dilution, instructions being 'lost in the middle', and wasted token cost.

Skills are context engineering via progressive disclosure.

In other words, show context only when it's relevant to the user's query - it's system prompt context management. Think of skills as prompt microservices, as opposed to a single prompt monolith.

Structure of a Skill

Sample skill directory - note that every skill is packaged as a directory, and the directory name should ideally match the name attribute in the SKILL.md frontmatter.

my-skill/
├── SKILL.md
├── scripts/
│   ├── validate.py
│   └── process.sh
├── references/
│   ├── api-guide.md
│   └── troubleshooting.md
├── assets/
│   ├── logo.ttf
│   └── base-template.html
└── templates/
    └── report.md

There is also nothing special about the structure of the supporting files - a skill only requires a single SKILL.md file, and the directory layout is simply a convention for standardization. The important thing to realize is that these extra files matter for skill design because the skill body should be <500 lines; anything that can go into another file and be read progressively, should.

In the example skill layout above, the supporting files exist so that the body content of SKILL.md can reference them to be used as needed, again segmenting out context so it is only loaded when the model needs it. There is no penalty for many files with lots of content - the penalty is only paid as each file is read. You add something like:

Before writing queries, consult references/api-guide.md

in the SKILL.md file to reference all of these supporting docs. This is another implementation of progressive disclosure.

The SKILL.md

Frontmatter critical fields:

  • name - descriptive, kebab case
  • description - what it does and when to use it; trigger conditions, key capabilities

Note: it seems like one of the most important things you can do during testing to improve skill usage is refining the name (and thus the skill folder name) and description - a verbose, descriptive skill name and description are key to getting the skill called correctly. Instead of naming a skill validate, you write validate-csv-schema-before-any-db-insert.
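As an illustration (the name and description here are hypothetical, not from any official example), the frontmatter for such a skill might look like:

```yaml
---
name: validate-csv-schema-before-any-db-insert
description: Validates CSV column names and types against the target table
  schema before any database insert. Use whenever the user asks to load,
  import, or insert CSV data into a database.
---
```

The description does double duty as both a capability summary and a trigger condition, which is why refining it pays off.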

Skills usage in practice

Each skill's frontmatter metadata (name and description) is loaded at startup (say ~50 tokens each, depending on description length; there is a hard 1,024-character cap on descriptions), along with any MCP or tool definitions. When a request is relevant, the model reads the body of the matching skills and follows their instructions. Without skills, the LLM will still call MCP and tools (as it has access to their descriptions), but crucially it may lack sufficient direction or context.
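A minimal sketch of this loading pattern, assuming a harness-side skills directory (the file parsing here is illustrative, not any official API):

```python
import re
from pathlib import Path

def load_frontmatter(skill_md: str) -> dict:
    """Parse simple key: value fields from SKILL.md frontmatter."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def startup_metadata(skills_dir: Path) -> list[dict]:
    """At startup, only name + description enter the system prompt."""
    return [
        load_frontmatter((d / "SKILL.md").read_text())
        for d in sorted(skills_dir.iterdir())
        if (d / "SKILL.md").exists()
    ]

def read_body(skills_dir: Path, name: str) -> str:
    """The full body is read only once the model deems the skill relevant."""
    text = (skills_dir / name / "SKILL.md").read_text()
    return re.sub(r"^---\n.*?\n---\n", "", text, flags=re.DOTALL)
```

The asymmetry is the whole point: metadata is always resident, bodies are pay-per-read.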

Difference between Skill scripts and MCP/tools

I could not wrap my head fully around the difference between MCP/tools and the scripts folder.

When using the model APIs [1], here is what I have realized:

  • Anything in the scripts folder can be run on the server side (meaning on the model provider's infrastructure). For Claude, this is their code execution environment
  • MCP and other tool calls still run on the client side (i.e. in the agent harness) when the model returns a tool_call in its response.

This is also why the only script types allowed are Python and bash - these are what the model providers support in their code execution environments. Crucially, the containers running these scripts on the provider's servers have no network access, so there is no pip install via the API, though you can pre-configure dependencies. There is also no persistence by default, but you can pass the container ID back to reference objects from a prior run. Also note that only the output of a script is sent to the model as additional context - the script itself is not.
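To make that last point concrete, here is a hypothetical scripts/validate.py (the schema and file name are made up for illustration); only what it prints to stdout flows back to the model as context:

```python
#!/usr/bin/env python3
"""Hypothetical scripts/validate.py: check CSV headers before an insert.

The model never sees this source - only the printed result is returned
to it after the script runs in the code execution environment.
"""
import csv
import sys

EXPECTED = ["id", "name", "email"]  # illustrative schema

def validate(path: str) -> str:
    """Return a one-line verdict the model can act on."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    missing = [col for col in EXPECTED if col not in header]
    if missing:
        return f"INVALID: missing columns {missing}"
    return "VALID: all expected columns present"

if __name__ == "__main__":
    print(validate(sys.argv[1]))
```

A deterministic check like this is exactly what you want out of a script: the model gets a terse verdict instead of having to reason over raw file contents.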

Relationship between Skill & MCP

Right to the point - since skills are fancy instructions, all they do is provide context for when and how to use tools (which could be MCP, local bash scripts, or Python functions). There is no more formal relationship between skills and MCP than that. The client-side agent harness still calls the tools based on the model's response.

I have no idea why this was also such a hard concept for me to grasp, but such is life.

Just to reiterate, in case this wasn't clear: skills are just instructions the model reads before calling (MCP) tools so that it has the most context about how and what to do.

Skills are not guarantees

The frustrating thing to remember is that skills provide no determinism or certainty that they will be followed (or even called); they are intents, not guarantees. In fact, longstanding issues confirm what I have read anecdotally online - skills are hard to get working consistently. This will improve with more RL for skills in the future, but it is slightly worrisome that they seem to be positioned as answers to complex workflows when they are still called non-deterministically.

Footnotes

  1. When using the API, scripts run server-side in Anthropic's sandbox; in Claude Code, however, they run on your local machine. I am more interested in building agents using the API, so I am biased towards that framing.