Structured Output
Why structured output is the linchpin — one field determines whether you have a chatbot or an agent.
I need to take a moment to write this interlude as an appreciation for the importance of the humble structured output. It dawned on me during this whole investigation that the whole agent harness (and anything revolving around the concept of agent) relies exclusively on the structured output the model generates. This (once again) is incredibly obvious, but for some reason in my head there was not nearly enough appreciation for the significance of this development in the history of generative AI.

It's always text
Models output text. It may be structured text, but it's still text. There is no 'executable code' returned. The harness is a program to help do the next thing given an LLM's structured text response.
The entire internet has been consumed and 'understood' by these LLMs, but the limiting factor to making them useful was to reduce their output to a small set of defined fields, for which we can then anchor infinite number of actions onto. Conceptually its hard to convey - it seems all of the world's written knowledge is being passed through a sieve, a filter - only the output of this filter is meaningfully useful to integrate within traditional software.
Tool calling & Structured Output
The critical signal is stop_reason. When the model returns "stop_reason": "tool_use", the harness knows there are tool calls to execute. When it returns "end_turn", the harness waits for the user. One field determines whether the loop continues or pauses. Its this standardization that allows us to build software off the back of LLM output.
The other subtle insight is tool use IDs. Each tool call gets a unique ID like toolu_01A.... My first assumption was these get looked up server-side — they don't. LLM APIs are stateless. The ID exists so the model can look back through conversation history and match which result corresponds to which request. It's a self-referencing bookmark in a growing document.
This is why structured output is the linchpin of the whole system. Without it, you have a chatbot. With it, you have an agent — text that triggers actions that produce text that triggers more actions. The entire harness described in the previous post is built on top of this one capability: the model's ability to return structured intent instead of just prose.