
OpenAI's Data Agent

OpenAI describes how they built a Data Agent to answer complex questions across all of their data.

Some takeaways:

  • Context is everything: most of the work seems to be about optimizing the context the model is given
    • Use RAG to inject relevant context that has been cached/precomputed on a schedule (they run an offline pipeline that aggregates/normalizes things like schema/lineage + query history, human notes, and code-derived table semantics, embeds it, and retrieves only the relevant chunks at question time). The agent can still do live inspection when needed.
  • Guide the goal, not the path: overly prescriptive prompts made results worse
  • Fewer tools > more tools: overlapping tool use confuses agents
  • Evals matter: They run continuous evals with curated questions and “golden” SQL / expected results, and grade the agent on both the SQL it generates and whether the resulting outputs match expectations (i.e., regression tests for analytics).
  • Save learnings: feed learnings back into context (similar to updating AGENTS.md)
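The precompute-then-retrieve pattern from the first takeaway can be sketched as below. Everything here is an illustrative assumption, not OpenAI's implementation: the table docs are made up, and the bag-of-words "embedding" stands in for a real embedding model.

```python
import math
import re
from collections import Counter

# Offline step (run on a schedule): aggregate per-table context into chunks.
# These docs are illustrative placeholders, not real schemas.
CHUNKS = [
    "table: orders  columns: order_id, user_id, total_usd  note: totals exclude tax",
    "table: users   columns: user_id, signup_date, country  lineage: from raw_signups",
    "table: events  columns: event_id, user_id, ts  note: 30-day retention only",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]  # precomputed offline

def retrieve(question: str, k: int = 1) -> list[str]:
    """Online step: inject only the most relevant chunks into the agent's context."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda c: cosine(q, c[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("which table has order totals?"))  # the orders table doc ranks first
```

The point is the split: the expensive aggregation/embedding happens offline on a schedule, and question time only pays for a similarity lookup.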
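The evals takeaway amounts to regression tests for analytics. A hedged sketch, grading both the generated SQL text and the rows it returns, with sqlite3 and crude whitespace normalization standing in for whatever harness OpenAI actually runs:

```python
import re
import sqlite3

def normalize_sql(sql: str) -> str:
    """Crude normalization: lowercase and collapse whitespace before comparing."""
    return re.sub(r"\s+", " ", sql.strip().lower())

def run(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    return sorted(conn.execute(sql).fetchall())

def grade(conn, generated_sql, golden_sql, expected_rows):
    """Grade on two axes: does the SQL match the golden query, and do the
    actual results match the expected results (the regression check)."""
    return {
        "sql_match": normalize_sql(generated_sql) == normalize_sql(golden_sql),
        "result_match": run(conn, generated_sql) == sorted(expected_rows),
    }

# Tiny in-memory fixture in place of a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total_usd REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 15.0)])

report = grade(
    conn,
    generated_sql="SELECT COUNT(*), SUM(total_usd) FROM orders",
    golden_sql="select count(*),  sum(total_usd) from orders",
    expected_rows=[(2, 25.0)],
)
print(report)
```

Grading results as well as SQL matters because two differently written queries can be equally correct, and a textually close query can still return the wrong numbers.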

My takeaway is that they used role-specific agents for planning and for 'common sense' checks (e.g. a SQL query returning no results when results were expected doesn't make sense). This is not explicitly stated in their posts, but it fits the 'vibe' I am getting of 2026 being the year of a single orchestrator agent coordinating very specific agents.
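My reading above can be sketched as a small orchestration loop. To be clear, this is my speculation, not OpenAI's stated design: the roles, the canned planner/executor, and the empty-result sanity check are all illustrative assumptions.

```python
# Speculative sketch: an orchestrator wiring role-specific agents together.
# Each "agent" here is a plain function standing in for an LLM call.

def planner(question: str) -> str:
    # A real planning agent would generate SQL from the question + retrieved
    # context; this one just emits a canned query for illustration.
    return "SELECT COUNT(*) FROM orders WHERE total_usd > 0"

def executor(sql: str) -> list[tuple]:
    # Stand-in for actually running the SQL against a warehouse.
    return [(42,)]

def sanity_checker(sql: str, rows: list[tuple]) -> bool:
    # 'Common sense' gate: an empty result for an aggregate query is suspicious
    # and should trigger a replan rather than being returned as the answer.
    return len(rows) > 0

def orchestrate(question: str) -> list[tuple]:
    sql = planner(question)
    rows = executor(sql)
    if not sanity_checker(sql, rows):
        raise RuntimeError("empty result; replan or inspect the live schema")
    return rows

print(orchestrate("how many paid orders are there?"))
```

The design choice worth noting is that the checker is a separate role from the planner, so a bad answer gets caught by a fresh pair of eyes instead of the agent that produced it.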