
MCP, Skills, and Friends

MCP servers, skills, hooks, and subagents — the tools that extend an agent harness beyond hardcoded tool schemas.

· Part 3 of 3 in agent harness

As stated earlier, there are some issues with the approach above (injecting tool schemas directly into every request):

  • Tool schema bloat: stuffing dozens of tool definitions into every request is expensive and eats context. It can also confuse the agent about which tool to pick
  • Integration complexity: what if your tool requires complex adaptors for real systems (databases, APIs, auth, caching)? e.g. how do you pass auth credentials?
  • Workflow reuse: what if you want a reusable playbook ("anytime you mention time off, check the latest policy for the user's country") instead of a tool? That info is only relevant in specific circumstances; in every other request, it unnecessarily bloats the context

Several new approaches were introduced to combat some of these shortcomings.

Model Context Protocol (MCP)

Tools with state and secrets access, managed by the harness rather than hardcoded into the API call

A server exists that exposes a list of functions (tools) it has. You can create the server yourself, or use one provided by a 3rd-party provider. It's a server because:

  • it responds to requests like "what tools do you have?"
  • it can execute functions independently (it's another process/server)
  • it can be stateful (connections, caches, indexes, auth sessions)

You tell the harness about these servers. The harness then sends tools/list requests (before the LLM call) and tools/call requests (after the LLM responds with a tool_use) to the MCP server to fetch what it needs.

Detailed JSON-RPC
# MCP is JSON-RPC 2.0. The two key methods are:
#   - tools/list   (discover tools)
#   - tools/call   (invoke a tool)

──────────────────────────────────────────────────────────────────────────────
1) DISCOVER TOOLS: tools/list
──────────────────────────────────────────────────────────────────────────────

CLIENT → MCP SERVER  (JSON-RPC request)
{
  "jsonrpc": "2.0",
  "id": "req_1",
  "method": "tools/list",
  "params": {}
}

MCP SERVER → CLIENT  (JSON-RPC response)
{
  "jsonrpc": "2.0",
  "id": "req_1",
  "result": {
    "tools": [
      {
        "name": "search_issues",
        "description": "Search issues in TrackerX",
        "inputSchema": {
          "type": "object",
          "properties": {
            "query": { "type": "string" },
            "limit": { "type": "integer", "default": 10 }
          },
          "required": ["query"]
        }
      },
      {
        "name": "get_issue",
        "description": "Fetch a single issue by id",
        "inputSchema": {
          "type": "object",
          "properties": {
            "id": { "type": "string" }
          },
          "required": ["id"]
        }
      }
    ]
  }
}

# (Client then maps MCP inputSchema → model input_schema and includes these tools
#  in the LLM request as normal tools.)

──────────────────────────────────────────────────────────────────────────────
2) CALL A TOOL: tools/call
──────────────────────────────────────────────────────────────────────────────

CLIENT → MCP SERVER  (JSON-RPC request)
{
  "jsonrpc": "2.0",
  "id": "req_2",
  "method": "tools/call",
  "params": {
    "name": "search_issues",
    "arguments": {
      "query": "auth bug",
      "limit": 5
    }
  }
}

MCP SERVER → CLIENT  (JSON-RPC response)
{
  "jsonrpc": "2.0",
  "id": "req_2",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Found 2 issues:\n- AUTH-123: Login fails on Safari\n- AUTH-987: Token refresh loop"
      }
    ],
    "isError": false
  }
}

# (Client then wraps that content into the LLM's tool_result shape and sends back
#  as part of the conversation history.)

──────────────────────────────────────────────────────────────────────────────
3) (OPTIONAL) ERROR EXAMPLE
──────────────────────────────────────────────────────────────────────────────

CLIENT → MCP SERVER
{
  "jsonrpc": "2.0",
  "id": "req_3",
  "method": "tools/call",
  "params": {
    "name": "get_issue",
    "arguments": { "id": "" }
  }
}

MCP SERVER → CLIENT
{
  "jsonrpc": "2.0",
  "id": "req_3",
  "result": {
    "content": [
      { "type": "text", "text": "id is required" }
    ],
    "isError": true
  }
}

Once the MCP client (Claude Code) has called tools/list, the tools it sends to the LLM contain the same info as tools defined 'normally': a name, a description, and an input schema.

"tools": [
  { "name": "foo", "description": "...", "input_schema": { ... } },
  { "name": "bar", "description": "...", "input_schema": { ... } }
]

The difference is where those tool definitions live:

  • Without MCP: you (the harness author) hardcode tool schemas into every API request.
  • With MCP: Claude Code fetches tool schemas from the server (once / on change), and then decides how much of that to load into the model context (sketched below).
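
To make this concrete, here is a minimal sketch of the harness side of one loop iteration. mcp_request and llm_call are hypothetical helpers standing in for your JSON-RPC transport and model API; the field mapping follows the JSON examples above.

# hypothetical harness glue: MCP discovery -> LLM tools -> MCP dispatch

def mcp_tools_for_llm(server):
    result = mcp_request(server, method="tools/list", params={})
    # map MCP's inputSchema to the model API's input_schema field
    return [
        {
            "name": t["name"],
            "description": t["description"],
            "input_schema": t["inputSchema"],
        }
        for t in result["tools"]
    ]

def run_turn(server, messages):
    response = llm_call(messages, tools=mcp_tools_for_llm(server))
    for block in response["content"]:
        if block["type"] == "tool_use":
            result = mcp_request(server, method="tools/call", params={
                "name": block["name"],
                "arguments": block["input"],
            })
            # wrap MCP content back into the model's tool_result shape
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": result["content"],
                    "is_error": result.get("isError", False),
                }],
            })
    return messages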

Does your server have to be 'running' in the background somewhere? No. While there is an HTTP transport for always-on remote servers, you can also have a stdio server that the harness launches and calls itself. See below for an example.

A Python `stdio` server
# pseudocode: MCP stdio server (JSON-RPC 2.0)
# - reads one JSON object per line from stdin
# - writes one JSON object per line to stdout
# - exposes tools/list and tools/call

import sys
import json
import os

# ---------------------------
# helpers
# ---------------------------

def read_json_line():
    line = sys.stdin.readline()
    if not line:
        return None
    return json.loads(line)

def write_json(obj):
    sys.stdout.write(json.dumps(obj) + "\n")
    sys.stdout.flush()

def reply_ok(req_id, result_obj):
    write_json({
        "jsonrpc": "2.0",
        "id": req_id,
        "result": result_obj
    })

def reply_err(req_id, code, message, data=None):
    err = {"code": code, "message": message}
    if data is not None:
        err["data"] = data
    write_json({
        "jsonrpc": "2.0",
        "id": req_id,
        "error": err
    })

# ---------------------------
# server state / secrets
# ---------------------------

DB_URL   = os.getenv("DB_URL")            # server-only secret
API_KEY  = os.getenv("VENDOR_API_KEY")    # server-only secret

db = db_connect(DB_URL)                   # pseudocode
http = http_client()                      # pseudocode

TOOLS = [
    {
        "name": "get_customer_by_id",
        "description": "Fetch a customer record by internal id",
        "inputSchema": {
            "type": "object",
            "properties": {"customer_id": {"type": "integer"}},
            "required": ["customer_id"]
        }
    },
    {
        "name": "vendor_search",
        "description": "Search vendor tickets by query",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer", "default": 5}
            },
            "required": ["query"]
        }
    }
]

# ---------------------------
# tool implementations
# ---------------------------

def tool_get_customer_by_id(args):
    cid = args["customer_id"]
    row = db.query_one(
        "SELECT id, name, email FROM customers WHERE id = ?",
        [cid]
    )
    return {"content": [{"type": "text", "text": json.dumps(row)}], "isError": False}

def tool_vendor_search(args):
    q = args["query"]
    limit = args.get("limit", 5)

    resp = http.get(
        "https://api.vendor.com/tickets/search",
        params={"q": q, "limit": limit},
        headers={"Authorization": "Bearer " + API_KEY, "Accept": "application/json"},
        timeout_ms=8000
    )

    if resp.status != 200:
        return {"content": [{"type": "text", "text": resp.body_text}], "isError": True}

    return {"content": [{"type": "text", "text": json.dumps(resp.json())}], "isError": False}

TOOL_HANDLERS = {
    "get_customer_by_id": tool_get_customer_by_id,
    "vendor_search": tool_vendor_search,
}

# ---------------------------
# JSON-RPC loop
# ---------------------------

def main():
    while True:
        msg = read_json_line()
        if msg is None:
            break  # stdin closed

        req_id = msg.get("id")
        method = msg.get("method")

        # notifications have no id; ignore or handle separately
        if req_id is None:
            continue
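
        if method == "initialize":
            # real MCP clients send this handshake before tools/list;
            # a minimal reply (shape per the MCP spec; version string illustrative)
            reply_ok(req_id, {
                "protocolVersion": "2024-11-05",
                "capabilities": {"tools": {}},
                "serverInfo": {"name": "myserver", "version": "0.1"}
            })
            continue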

        if method == "tools/list":
            reply_ok(req_id, {"tools": TOOLS})
            continue

        if method == "tools/call":
            params = msg.get("params") or {}
            name = params.get("name")
            args = params.get("arguments") or {}

            handler = TOOL_HANDLERS.get(name)
            if not handler:
                reply_ok(req_id, {"content": [{"type": "text", "text": "Unknown tool"}], "isError": True})
                continue

            try:
                result = handler(args)
                reply_ok(req_id, result)
            except Exception:
                # don't leak secrets; return a sanitized error
                reply_ok(req_id, {"content": [{"type": "text", "text": "Tool execution failed"}], "isError": True})
            continue

        reply_err(req_id, -32601, "Method not found")

if __name__ == "__main__":
    main()
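
Once the pseudocode pieces (db_connect, http_client) are filled in, you can smoke-test the server from a shell by piping a request into it, no harness required:

echo '{"jsonrpc":"2.0","id":"req_1","method":"tools/list","params":{}}' | python mcp_server.py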

You would then add the server via:

claude mcp add \
  --transport stdio \
  --env DB_URL=$DB_URL \
  --env VENDOR_API_KEY=$VENDOR_API_KEY \
  myserver -- python mcp_server.py

Notice how we set secrets and credentials when adding the server definition. These are read by the process running the server, never by the LLM.

Further examples of custom servers that use these credentials are below.

Sample DB Auth & 3rd Party API keys
# PSEUDOCODE: MCP SERVER (stdio JSON-RPC) that:
#   - authenticates to a DATABASE (server holds creds)
#   - exposes safe tools (no raw SQL from the model)
#   - calls a 3rd-party API using an API key (server holds key)
#
# Notes:
# - In MCP, the MODEL never receives your DB password / API key.
# - The MCP server runs with secrets (env/config) and returns only results.

──────────────────────────────────────────────────────────────────────────────
A) CUSTOM MCP SERVER WITH DATABASE AUTH
──────────────────────────────────────────────────────────────────────────────

# startup
DB_HOST = env("DB_HOST")
DB_PORT = env("DB_PORT")
DB_NAME = env("DB_NAME")
DB_USER = env("DB_USER")
DB_PASS = env("DB_PASS")

db = db_connect(
  host=DB_HOST, port=DB_PORT, database=DB_NAME,
  user=DB_USER, password=DB_PASS,
  pool_size=10
)

TOOLS = [
  {
    "name": "get_customer_by_id",
    "description": "Fetch a customer record by internal id",
    "inputSchema": {
      "type": "object",
      "properties": {"customer_id": {"type": "integer"}},
      "required": ["customer_id"]
    }
  },
  {
    "name": "search_orders",
    "description": "Search orders by email (exact match) with limit",
    "inputSchema": {
      "type": "object",
      "properties": {
        "email": {"type": "string"},
        "limit": {"type": "integer", "default": 20}
      },
      "required": ["email"]
    }
  }
]

# JSON-RPC loop (stdio)
while msg := read_json_line(stdin):
  if msg.method == "tools/list":
    reply(msg.id, { "tools": TOOLS })

  elif msg.method == "tools/call":
    tool = msg.params.name
    args = msg.params.arguments

    if tool == "get_customer_by_id":
      # IMPORTANT: parameterized query, fixed SQL (no model SQL)
      row = db.query_one(
        "SELECT id, name, email, created_at FROM customers WHERE id = ?",
        [args.customer_id]
      )
      reply(msg.id, {
        "content": [{ "type": "text", "text": json(row) }],
        "isError": false
      })

    elif tool == "search_orders":
      rows = db.query_all(
        "SELECT id, total, status, created_at FROM orders WHERE email = ? ORDER BY created_at DESC LIMIT ?",
        [args.email, args.limit]
      )
      reply(msg.id, {
        "content": [{ "type": "text", "text": json(rows) }],
        "isError": false
      })

    else:
      reply(msg.id, {
        "content": [{ "type": "text", "text": "Unknown tool" }],
        "isError": true
      })

──────────────────────────────────────────────────────────────────────────────
B) MCP SERVER CALLING A 3RD-PARTY SERVICE WITH AN API KEY
──────────────────────────────────────────────────────────────────────────────

API_BASE = "https://api.vendor.com"
API_KEY  = env("VENDOR_API_KEY")          # stored only on server
# (the env var itself can be injected via Claude Code config; see section C)

TOOLS += [
  {
    "name": "vendor_search",
    "description": "Search Vendor tickets by query",
    "inputSchema": {
      "type": "object",
      "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "default": 5}
      },
      "required": ["query"]
    }
  }
]

# inside tools/call handler
if tool == "vendor_search":
  q = args.query
  limit = args.limit

  resp = http_get(
    url = API_BASE + "/tickets/search",
    params = { "q": q, "limit": limit },
    headers = {
      "Authorization": "Bearer " + API_KEY,   # or "x-api-key": API_KEY
      "Accept": "application/json"
    },
    timeout_ms = 8000
  )

  if resp.status != 200:
    reply(msg.id, { "content":[{"type":"text","text": resp.body_text}], "isError": true })
  else:
    data = resp.json()
    reply(msg.id, { "content":[{"type":"text","text": json(data)}], "isError": false })

──────────────────────────────────────────────────────────────────────────────
C) HOW SECRETS GET INTO THE SERVER (Claude Code side)
──────────────────────────────────────────────────────────────────────────────

# Local stdio server: pass secrets via env
claude mcp add --transport stdio --env DB_USER=... --env DB_PASS=... --env VENDOR_API_KEY=... myserver -- python mcp_server.py

# Remote HTTP server: pass API key via headers (client → server)
claude mcp add --transport http myremote https://mcp.vendor.com
# and configure headers (conceptually):
headers = { "Authorization": "Bearer <token>" }

# Either way:
# - Claude model sees tool schemas + names
# - Server sees secrets
# - Results returned are data-only (never echo secrets)

What the heck is stdio?

Notice above we set the transport arg to stdio. It is one of two options:

  • stdio: server runs locally as a long-lived subprocess; JSON-RPC over stdin/stdout. Best for local/community wrappers + keeping secrets on your machine. State lives for the session (caches, open DB conns, in-memory indexes). Common pattern: npx … / python … stdio servers you run yourself.

  • http: server runs remotely as a service; JSON-RPC over HTTP. Best for shared/team services + centralized integrations. State can be durable and shared across clients (memory/Redis/DB), auth via headers/OAuth/tokens. Vendor-hosted MCP servers are often HTTP (they run the service for you), although you could run your own local server on another port.

When MCP tool lists get too big: Tool Search

If you attach a bunch of MCP servers, you're back to the original problem: "dozens/hundreds of tool descriptions sitting in context, even when idle".

Harnesses deal with this by deferring MCP tools and loading them on-demand via tool search once they'd consume "too much" of the context window. The default auto-trigger is when MCP tool descriptions exceed 10% of your context window, and it's configurable (e.g. in Claude Code, ENABLE_TOOL_SEARCH=auto:<N>).
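
A conceptual sketch of the idea (not Claude Code's actual internals; estimate_tokens and the exact threshold handling are assumptions):

# sketch: defer MCP tool schemas behind a search meta-tool once they get too big
# estimate_tokens() is a hypothetical token counter

def build_llm_tools(mcp_tools, context_window, threshold=0.10):
    schema_tokens = sum(estimate_tokens(t) for t in mcp_tools)
    if schema_tokens <= threshold * context_window:
        return mcp_tools  # small enough: include every schema up front

    # too big: expose only a search tool; the model asks for what it needs,
    # and the harness loads matching schemas on demand
    return [{
        "name": "tool_search",
        "description": "Search available (deferred) tools by keyword",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]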

Skills

A repeatable playbook read only when necessary; conditional prompt injection that may leverage tools

What if you want a repeatable playbook without paying the cost of carrying it around in every prompt?

Let's stick with the example referenced above: the user has inquired about time-off eligibility. With a skill, you can inject relevant context depending on the request. So we can pass the model info about the policy, procedures, etc. without having done so at the start, when we didn't yet know the user's intent.

Sample Time Off `SKILL.md`
---
name: time-off
description: Determine time off / leave eligibility and next steps
---
# Skill: Time Off / Leave

When the user asks about time off, leave, vacation policy, EI, mat leave, etc:

1) Identify jurisdiction and employment type
- Country/province/state
- employee vs contractor
- union / public sector if relevant
- start date / tenure, if relevant

2) Ask only the minimum clarifying questions needed
- If jurisdiction is unknown: ask for it first
- If employer policy vs legal minimum is ambiguous: ask which they mean

3) Separate sources explicitly
- Legal minimums (statutory)
- Employer policy (contract/handbook)
- Practical advice (how people usually handle it)

4) Output format
- Short answer (1–2 sentences)
- Then bullets: "What's legally true", "What's policy-dependent", "What I'd do next"
- If high uncertainty: say exactly what detail is missing

5) Tool use guidance
- If there's a company handbook file locally, read it first
- Otherwise, use web search for statutory rules for the stated jurisdiction
- Do not speculate when dates/thresholds matter; ask or search

Notice the name and description frontmatter keys in the SKILL.md above. These are the only things the harness loads into its context when starting up.
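
Roughly how that works (a sketch of the progressive-disclosure idea; the skills directory path and parsing details are assumptions, not Claude Code internals):

from pathlib import Path

def load_skill_index(skills_dir="~/.claude/skills"):
    index = []
    for skill_md in Path(skills_dir).expanduser().glob("*/SKILL.md"):
        # frontmatter = text between the two leading '---' markers
        _, frontmatter, _body = skill_md.read_text().split("---", 2)
        meta = {}
        for line in frontmatter.strip().splitlines():
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
        index.append(meta)  # only name + description reach the system prompt
    return index

# the full SKILL.md body is read into context only if/when the skill triggers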

Skills add structure and restraint — and are referenced only when that topic comes up.

Structure of a Skill

A skill can be a single SKILL.md file on its own, or it can have many elements.

my-skill/
├── SKILL.md            # required entrypoint
├── template.md         # optional: a fill-in template (can be any filename)
├── examples/           # optional: sample outputs / expected format
│   └── sample.md
├── reference.md        # optional: deeper docs / API notes (any filename)
└── scripts/            # optional: executable helpers (bash/python/etc.)
    └── validate.sh

Other than the SKILL.md, nothing else is required; none of the other file and folder names matter either - this is just a convention for organization. The key is to mention these other files and scripts (and when and how to use them) in the SKILL.md so the skill reliably uses them.
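
For example, hypothetical lines you could append to the time-off SKILL.md above (the filenames match the tree; the wording is illustrative):

6) Resources
- For the final answer format, fill in template.md
- For statutory edge cases, read reference.md before answering
- After drafting, run scripts/validate.sh <answer-file> and fix anything it flags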

Skill Nuances

Unlike tools (including MCP tools), skills are not inherited by subagents from the parent conversation by default. You must explicitly configure a subagent to have access to them.

Skills are not "stateful" either, which makes sense given that a skill just injects context depending on the request; it's exactly as stateful as any other message sent to the LLM in the conversation.

Finally, skills are suggestions: there is no way to actually enforce that a model uses a skill. Note that when using skills, CLAUDE.md becomes information about information - an index of the tools, skills, and essential info the model needs on every request.

Hooks are enforcement. Skills are persuasion.

Skills can suggest bash. Tools can run bash. Hooks can enforce bash safety.

Skills in Practice

As I was writing this, an interesting post discussed skills in practice, where a common theme was that skills are not used unless explicitly asked for. Some notable comments:

found success in treating skills more like re-usable semi-deterministic functions and less like fingers-crossed prompts for random edge-cases

and this one from Soerensen:

The observation about agents not using skills without being explicitly asked resonates. In practice, I've found success treating skills as explicit "workflows" rather than background context.

The pattern that works: skills that represent complete, self-contained sequences - "do X, then Y, then Z, then verify" - with clear trigger conditions. The agent recognizes these as distinct modes of operation rather than optional reference material.

What doesn't work: skills as general guidelines or "best practices" documents. These get lost in context or ignored entirely because the agent has no clear signal for when to apply them.

The mental model shift: think of skills less like documentation and more like subroutines you'd explicitly invoke. If you wouldn't write a function for it, it probably shouldn't be a skill.

Even Vercel thinks that skills have their place - they work better for vertical, action-specific workflows that users explicitly trigger. For general guidelines, Vercel and others have found that adding a skill's instructions to the general AGENTS.md works better.

You can also add a tool call (or even a pre-model-call step that selects relevant skills) to load a skill if you find it's not being used.

Subagents

A 'clean' call to an LLM, where (presumably) only a subset of the conversation context is passed in order to achieve some specific goal.

Mentioned earlier - in Claude Code, the tool is called Task. When the harness sees it, it runs a subagent. You rely on the parent model to inject all of the relevant context into the subagent.
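
For intuition, the parent model invokes it like any other tool. Roughly (the input fields below sketch Claude Code's Task tool from memory and may not match it exactly):

{
  "type": "tool_use",
  "id": "toolu_01",
  "name": "Task",
  "input": {
    "description": "Audit auth flow",
    "prompt": "Review src/auth/* for token-refresh bugs. Report findings only.",
    "subagent_type": "code-reviewer"
  }
}

This seems like it saves tokens in an agentic loop, but there are some issues I have found with subagents that are articulated nicely in this blog by Mario Zechner.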

The first issue is that they are black boxes - you can't really see what the model passed to them or what is happening within (context transfer between agents is usually poor). Mario also states that using a subagent mid-session for context gathering is a sign you didn't plan ahead. Instead, he argues, you should create an artefact ahead of time that the single agent can use, since agents are bad at knowing which context is relevant to send to other agents (Mario likewise argues that using subagents to implement various features in parallel is an anti-pattern).[^1]

There are exceptions - something like a 'code review' subagent genuinely has its merits.

[^1]: Note that this could all change - the blog I mentioned is ancient (Nov 2025) and there are rumblings that swarm workflows will be the big thing in 2026, which would suggest context transfer between agents will be fixed. Mario's stance: "Spawning multiple sub-agents to implement various features in parallel is an anti-pattern in my book and doesn't work."

Hooks

Note: the below does not apply to harnesses like Codex that do not support hooks. Tools like OpenCode use different event names, but the principle still applies.

Things to run when a harness event is triggered (and in CC, everything is an event)

The list of events is here. All we need to remember is that whenever the harness (CC) takes an action, there is an event type onto which we can hook an action of our own:

Event               Description                                                       Matcher support
PreToolUse          Runs before a tool call is executed                               Yes (tool name)
PostToolUse         Runs after a tool completes successfully                          Yes (tool name)
PostToolUseFailure  Runs after a tool call fails                                      Yes (tool name)
PermissionRequest   Runs when a permission dialog is about to be shown                Yes (tool name)
UserPromptSubmit    Runs when the user submits a prompt, before Claude processes it   No
Stop                Runs when the main Claude Code agent finishes responding          No
SubagentStop        Runs when a subagent (Task tool) finishes responding              No
Notification        Runs when Claude Code sends a notification                        Yes (notification type)
PreCompact          Runs before a compact operation                                   Yes (manual or auto)
SessionStart        Runs when a new session starts or an existing one resumes         Yes (startup, resume, clear, compact)
SessionEnd          Runs when a session ends                                          No

There are 'matchers' with which you can filter a hook further, so that not every PreToolUse event runs it - only those where the tool is Bash, Write, or Edit, for example.

Some common PreToolUse / PostToolUse matchers

  • Bash — Shell commands
  • Read — File reading
  • Write — File writing
  • Edit — File editing
  • MultiEdit — Multi-file editing
  • Glob — File pattern matching
  • Grep — Content search
  • Task — Subagent tasks
  • WebFetch / WebSearch — Web operations
  • Notebook.* — Notebook operations (regex)

And some Notification matchers:

  • permission_prompt — Permission requests
  • idle_prompt — Waiting for user input (60+ seconds idle)
  • auth_success — Authentication success
  • elicitation_dialog — MCP tool elicitation input needed
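
Matchers can also be regexes, so one hook entry can cover several tools. A hypothetical config (protect-paths.sh is an illustrative script name, not a real one):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/protect-paths.sh"
          }
        ]
      }
    ]
  }
}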

Docs for CC hooks here

Honestly, other than understanding the events and knowing that you can do 'anything' in response to one, I just ask the LLM itself to create a hook. The event system seems to me to be the one real differentiator for CC vs other tools.

Sample Hooks

1. UserPromptSubmit — force skill activation

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/skill-router.sh"
          }
        ]
      }
    ]
  }
}
The referenced .claude/hooks/skill-router.sh:

#!/bin/bash
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt')

if echo "$PROMPT" | grep -qiE '(test|spec|coverage)'; then
  SKILL="testing-patterns"
elif echo "$PROMPT" | grep -qiE '(api|endpoint|route)'; then
  SKILL="backend-guidelines"
else
  exit 0
fi

echo "MANDATORY: Use Skill($SKILL) BEFORE responding. Do NOT skip this step."
exit 0

Notice how echo writes to stdout; for a UserPromptSubmit hook, the harness injects that output into the conversation, which Claude sees as if someone had typed it.

2. PreToolUse — block dangerous commands

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.command' | grep -qiE 'rm -rf|drop table|force push' && echo 'Blocked: dangerous command' >&2 && exit 2 || exit 0"
          }
        ]
      }
    ]
  }
}

3. SessionStart — inject project context

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo \"Recent changes:\n$(cd \"$CLAUDE_PROJECT_DIR\" && git log --oneline -5 2>/dev/null)\""
          }
        ]
      }
    ]
  }
}

4. Notification — play a sound when Claude needs input

{
  "hooks": {
    "Notification": [
      {
        "matcher": "idle_prompt",
        "hooks": [
          {
            "type": "command",
            "command": "afplay /System/Library/Sounds/Glass.aiff"
          }
        ]
      }
    ]
  }
}

Skills vs Hooks vs MCP vs Subagents

		Must it run regardless of what the model wants?
                              │
                    ┌─────────┴─────────┐
                   YES                  NO
                    │                    │
                    ▼                    ▼
                 HOOKS          Needs state, secrets,
            (enforce, block,    or external connections?
             log, validate)              │
                  ¹              ┌───────┴───────┐
                                YES              NO
                                │                 │
                                ▼                 ▼
                           MCP SERVER ¹    Reusable across
                          (remote or        requests?
                           local+env)            │
                                         ┌───────┴───────┐
                                        YES               NO
                                         │                 │
                                         ▼                 ▼
                                 Reducible to a        SUBAGENT
                                 function call?      (isolated task,
                                         │            fresh context)
                                ┌────────┴────────┐
                               YES                NO
                                │                  │
                                ▼                  ▼
                           MCP SERVER ¹         SKILL
                          (local stdio,      (playbook /
                           callable           recipe /
                           function)          procedure) ¹,²

¹ These compose: a Hook can block an MCP call
  (PreToolUse), a Hook can force a Skill to load
  (UserPromptSubmit → skill-router.sh), and a
  Skill can suggest calling an MCP tool.
  They layer rather than compete.

² Skills can contain helper scripts (e.g. scripts/validate.sh)
  but these aren't standalone tools — the model runs them
  through bash as part of following the skill's instructions.
  If the script could stand alone as input → output with its
  own schema, it probably belongs as an MCP tool instead.