Structured Output

Models output text. It may be structured text, but it's still text. There is no 'executable code' returned. The harness is a program to help do the next thing given an LLM's structured text response.

The critical signal is stop_reason. When the model returns "stop_reason": "tool_use", the harness knows there are tool calls to execute. When it returns "end_turn", the harness waits for the user. One field determines whether the loop continues or pauses.

The other subtle insight is tool use IDs. Each tool call gets a unique ID like toolu_01A.... My first assumption was these get looked up server-side — they don't. LLM APIs are stateless. The ID exists so the model can look back through conversation history and match which result corresponds to which request. It's a self-referencing bookmark in a growing document.

This is why structured output is the linchpin of the whole system. Without it, you have a chatbot. With it, you have an agent — text that triggers actions that produce text that triggers more actions. The entire harness described in the previous post is built on top of this one capability: the model's ability to return structured intent instead of just prose.