The Agent Harness
What Claude Code actually is — an agent harness around the Claude API, with an action loop, tools, permissions, and subagents.
What is the difference between Claude Code and the Claude API? Easy to answer, right? I thought so — the answer was clear somewhere in my latent space of a brain. Then I tried to explain it, and (re)discovered a universal truth: you don't know something until you can explain it 'out loud'.
So I wrote it all down. After hours of mental gymnastics deconstructing loops, harnesses, tools, and subagents... I arrived at the most basic conclusion: Claude Code is the software that makes LLMs work as agents. SO obvious in hindsight - embarrassingly so.
I use Claude Code as my example throughout this series, but the ideas apply broadly to any agent harness — OpenCode, Pi, and others all work the same way.
Claude API vs Claude Code
Claude Code (CC) is a harness around the Claude API. You can use just the API and achieve the same things, but CC is also the 'human hook' that keeps you coming back — a Max subscription gives you CC at a flat rate with OAuth handled for you, while using the API directly is $$$.
The harness is the agent. The agent (and thus the harness) is a coordinated action loop (i.e., the definition of software).
The Agent Loop
First let's define the agent loop:
LLM requests → host executes → host sends results → LLM continues
But that is not quite right, is it? It oversimplifies the host's actions. Without a coordinating program on the host to capture the LLM's output, prompt the user, and execute commands, the loop above is just a single API call. This is the distinction between the Claude API and CC - the API is used within CC, alongside a program that prompts the user and executes on their system everything 'captured' from the LLM API call's output.
A more comprehensive agent loop is below:
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE CODE AGENT LOOP │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────┐
│ LLM requests │
│ tool use │
│ (e.g., bash, │
│ write_file) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Host checks │
│ permission mode │
└────────┬─────────┘
│
┌──────────────┴──────────────┐
│ │
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Needs approval │ │ Auto-approved │
│ (default mode) │ │ (bypass mode) │
└───────┬────────┘ └───────┬────────┘
│ │
▼ │
┌────────────────┐ │
│ Prompt user: │ │
│ "Allow bash: │ │
│ rm -rf /tmp?" │ │
│ [y/n/always] │ │
└───────┬────────┘ │
│ │
▼ │
┌────────────────┐ │
│ User approves │ │
│ or rejects │ │
└───────┬────────┘ │
│ │
└──────────────┬─────────────┘
│
▼
┌────────────────┐
│ Host executes │
│ (subprocess, │
│ file I/O) │
└───────┬────────┘
│
▼
┌────────────────┐
│ Results sent │
│ back to LLM │
└───────┬────────┘
│
▼
┌────────────────┐
│ LLM continues │
│ (loop repeats) │
└────────────────┘
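In code, the whole diagram collapses to a short loop. Here is a minimal sketch, assuming the Anthropic Python SDK, a single bash tool, and input() standing in for CC's real approval prompt (the tool schema and helper names are mine, for illustration):

import subprocess

import anthropic

client = anthropic.Anthropic()

BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

def agent_loop(messages, auto_approve=False):
    while True:
        # LLM requests tool use (or finishes)
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=[BASH_TOOL],
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # end_turn etc. - nothing left to execute

        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            command = block.input["command"]
            # Host checks permission mode: prompt the user unless bypassed
            if auto_approve or input(f"Allow bash: {command}? [y/n] ") == "y":
                done = subprocess.run(command, shell=True,
                                      capture_output=True, text=True)
                content = done.stdout + done.stderr
            else:
                content = "User rejected this command."
            results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": content}
            )
        # Host sends results back; LLM continues (loop repeats)
        messages.append({"role": "user", "content": results})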
I have seen this agent loop also called a 'harness'. I guess the distinction is the following:
An agent is the running of the loop; the harness is the framework within which the loop runs. The harness is the rules and context (a 'decision tree' of sorts) for how an agent runs. Without a harness, the agent is just an LLM waiting for its next call.
Defining an agent as just 'tools in a loop' misses key distinctions in how an LLM actually acts as an agent. This was something I took for granted until I really thought about it.
What is an Agent 'Harness'?
The following is my intuition for Claude Code's harness:
- A CLAUDE.md file that describes exactly what the program (CC) should do, the current user's system, the current date, etc. It sets the agent up with its objective and the key info it needs to run commands without error.
- Tools available (bash, grep)
  - Tools aren't the executables - they are a structured list of how to return a tool call request so that it can execute correctly on the host's machine. The LLM receives a schema representing each tool (name, description, parameters) and, if a tool is to be called, outputs JSON matching that schema. The harness then maps that JSON to actual execution.
- Permissions - what tools and actions are permitted to run without approval
  - A key part of the 'harness': which tool calls can the program automatically execute, versus which require human intervention
  - We 'bypass' the approval step with permission modes like acceptEdits or bypassPermissions. Without bypass, the harness prompts the user before executing functions.
- Built-in functions for orchestration
  - Think compaction, resuming prior conversations, etc.
  - Most interesting is its use of subagents, which is itself just a tool call via the Task tool. The LLMs have been trained to call subagents when the user explicitly asks for it, but also, increasingly, automatically as a way to preserve context and run parallel independent executions (think 'swarm').
  - Subagents are dispatched via a control flow primitive (a fancy if/then) embedded in the agent harness:

    if name == "bash":
        # Just execute and return - linear, stays in this loop
        return subprocess.run(command)
    if name == "Task":
        # Spawn ENTIRE NEW LOOP - recursive, branching execution
        # This loop pauses, new loop runs to completion, then this resumes
        return spawn_subagent(prompt)

  - The LLM doesn't know the difference. It just outputs:

    {"name": "Task", "input": {"prompt": "Review auth code"}}

- Event-based lifecycle hooks
  - Hooks allow custom code to run at specific points in the agent loop: before/after tool execution (PreToolUse, PostToolUse), on session start/stop, or when a subagent completes. This lets you inject validation, logging, or side effects without modifying the core loop. For example: a PreToolUse hook that blocks any rm -rf command, or a PostToolUse hook that logs all file changes to an audit trail (see the sketch after this list).
- Feedback mechanism - collecting user input (interruptions, next-step questions) or tool call output
  - The loop isn't just LLM → tools → LLM. The user can interrupt, redirect, or inject new context at any point.
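To make hooks concrete, here is a minimal sketch of how a harness might wire them in. The event names match Claude Code's (PreToolUse, PostToolUse), but this registry and decorator API are invented for illustration - Claude Code actually configures hooks declaratively in its settings files, not in Python:

# Hypothetical hook registry - illustrative, not Claude Code's real API
hooks = {"PreToolUse": [], "PostToolUse": []}

def on(event):
    def register(fn):
        hooks[event].append(fn)
        return fn
    return register

@on("PreToolUse")
def block_dangerous(tool_name, tool_input, result=None):
    # Veto a tool call before the host executes anything
    if tool_name == "bash" and "rm -rf" in tool_input.get("command", ""):
        raise PermissionError("Blocked by PreToolUse hook")

@on("PostToolUse")
def audit_trail(tool_name, tool_input, result=None):
    # Side effect after execution - the core loop is untouched
    print(f"[audit] {tool_name} ran with {tool_input}")

def run_hooks(event, tool_name, tool_input, result=None):
    for fn in hooks[event]:
        fn(tool_name, tool_input, result=result)

# Inside the harness, wrapped around tool execution:
#   run_hooks("PreToolUse", tc.name, tc.input)            # may raise to block
#   content = execute(tc)
#   run_hooks("PostToolUse", tc.name, tc.input, content)  # log, validate, etc.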
A key point to remember throughout all of this: the models output text. It may be structured text, but it's still text. There is no 'executable code' returned. The harness is a program that helps do the next thing given an LLM's structured text response.
How Tool Calls Work
Send a list of tools in the request, where each tool has a name, description, and input_schema. All tools are predefined in the request — practical, but becomes a problem when there are many tools, resulting in context bloat. See MCP and Skills as alternatives.
Full request example
import anthropic
client = anthropic.Anthropic()
# The request you send
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
system="You are a helpful coding assistant.",
# Tool definitions - this is what tells Claude what tools exist
tools=[
{
"name": "bash",
"description": "Run a shell command",
"input_schema": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The bash command to execute"
}
},
"required": ["command"]
}
},
{
"name": "Read",
"description": "Read a file's contents",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path to read"}
},
"required": ["path"]
}
},
{
"name": "Task", # This is the subagent tool
"description": "Launch a subagent to handle a complex task autonomously",
"input_schema": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "Short 3-5 word description of the task"
},
"prompt": {
"type": "string",
"description": "Detailed instructions for the subagent"
},
"subagent_type": {
"type": "string",
"description": "Type of subagent: general-purpose, Explore, etc."
}
},
"required": ["description", "prompt"]
}
}
],
messages=[
{"role": "user", "content": "List files in current directory and research competitors Acme Corp and Globex Inc"}
]
)
The response can include multiple tool calls, which the harness can run in parallel. The key signal is "stop_reason": "tool_use" — this tells the harness there are things to execute and send back.
Also note the IDs of each tool use. These exist for reference within the conversation context — they are not 'looked up' on the server side (that was my first guess). LLM APIs are stateless, so the ID is what the LLM uses to look back through past context (conversation history) and work out which output corresponds to which tool call.
Full response structure
# What Claude returns - this is the actual structure
{
"id": "msg_01XYZ...",
"type": "message",
"role": "assistant",
"model": "claude-sonnet-4-5",
"stop_reason": "tool_use", # <-- Key signal: Claude wants to use tools
"stop_sequence": None,
"content": [
# Claude can include text before/between tool calls
{
"type": "text",
"text": "I'll list the files and research both competitors in parallel."
},
# Tool call 1: bash command
{
"type": "tool_use",
"id": "toolu_01A...", # Unique ID to match results later
"name": "bash",
"input": {
"command": "ls -la"
}
},
# Tool call 2: Subagent for Acme research (parallel)
{
"type": "tool_use",
"id": "toolu_01B...",
"name": "Task", # This triggers subagent spawning
"input": {
"description": "Research Acme Corp",
"prompt": "Research Acme Corp. Find their main products, recent news, and competitive positioning. Return a summary.",
"subagent_type": "general-purpose"
}
},
# Tool call 3: Subagent for Globex research (parallel)
{
"type": "tool_use",
"id": "toolu_01C...",
"name": "Task",
"input": {
"description": "Research Globex Inc",
"prompt": "Research Globex Inc. Find their main products, recent news, and competitive positioning. Return a summary.",
"subagent_type": "general-purpose"
}
}
],
"usage": {
"input_tokens": 847,
"output_tokens": 312
}
}
Tool execution is async, but the results are sent back collectively in a single message — they are not streamed back to the API one by one, which would collide with the conversation structure (each assistant turn's tool_use blocks are answered by one user turn containing all the tool_result blocks). Each result includes {"type": "tool_result", "tool_use_id": tc.id, "content": content}, which tells the LLM which result corresponds to which earlier tool call. It's the LLM's job to look back through conversation history to connect request to result (because LLM APIs are stateless).
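Concretely, once the bash call above has run, the history the harness sends back on the next API call looks something like this (abbreviated):

messages = [
    {"role": "user", "content": "List files in current directory and research competitors Acme Corp and Globex Inc"},
    # The assistant turn, echoed back verbatim, including its tool_use blocks
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01A...", "name": "bash",
         "input": {"command": "ls -la"}},
        # ...plus the two Task tool_use blocks (toolu_01B..., toolu_01C...)
    ]},
    # ONE user turn carrying ALL the results, matched by tool_use_id
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01A...",
         "content": "file1.txt\nfile2.txt"},
        # ...plus tool_results for toolu_01B... and toolu_01C...
    ]},
]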
Harness code
import asyncio

# ═══════════════════════════════════════════════════════════════════
# Execute ALL tool calls in PARALLEL
#
# Claude may return multiple tool_use blocks in one response.
# We launch them all concurrently - this is where text becomes action.
# ═══════════════════════════════════════════════════════════════════
tool_calls = [b for b in response.content if b.type == "tool_use"]
async def execute_tool(tc):
if tc.name == "bash":
# SUBPROCESS: Actual OS process - runs in parallel with others
proc = await asyncio.create_subprocess_shell(
tc.input["command"],
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
content = f"{stdout.decode()}\n{stderr.decode()}"
    elif tc.name == "Task":
        # SUBAGENT: Recursive agent_loop() call - also runs in parallel
        content = await agent_loop(tc.input["prompt"], subagent_tools)
    else:
        # Unknown tool name - report it back instead of crashing the loop
        content = f"Unknown tool: {tc.name}"
    # Return result with matching ID so Claude knows which call this answers
    return {"type": "tool_result", "tool_use_id": tc.id, "content": content}
# Launch ALL concurrently - wait for slowest, not sum of all
tool_results = await asyncio.gather(*[execute_tool(tc) for tc in tool_calls])
# ═══════════════════════════════════════════════════════════════════
# Send results back to Claude
#
# API is stateless - we append to conversation history and send it all
# ═══════════════════════════════════════════════════════════════════
messages.append({"role": "assistant", "content": response.content}) # Claude's tool_use blocks
messages.append({"role": "user", "content": list(tool_results)}) # Our tool_result blocks
# Loop back to: response = client.messages.create(..., messages=messages)
# Claude now sees full history including results, decides what's next
The harness flow summarized:
Claude returns: [bash:ls, Task:Acme, Task:Globex]
│
▼
┌──────────┴──────────┐
│ asyncio.gather() │
└──────────┬──────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
subprocess agent_loop agent_loop
(bash:ls) (subagent) (subagent)
│ │ │
▼ ▼ ▼
"file1.txt" "Acme summary" "Globex summary"
│ │ │
└───────────────┼───────────────┘
│
▼
[tool_result, tool_result, tool_result]
│
▼
Append to messages, send back to Claude
│
▼
Claude sees results, continues or finishes
Stopping
The harness knows when to stop and await further input by evaluating the same stop_reason key from earlier. If the model returns end_turn (or something like refusal or max_tokens) instead of tool_use, there is nothing more for the harness to execute.
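In harness code, that's a single check on the response (the helper names here are illustrative):

def has_work(response) -> bool:
    # Only "tool_use" means there is something to execute and send back;
    # "end_turn", "max_tokens", "refusal", etc. all end the turn
    return response.stop_reason == "tool_use"

def final_text(response) -> str:
    # Once the loop exits, collect the assistant's text blocks and hand
    # control back to the user for their next input
    return "".join(b.text for b in response.content if b.type == "text")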