OpenAI internal data agent
By Raphael Steiman
OpenAI just pulled back the curtain on their internal data agent.
600 petabytes. Six layers of context. Schema metadata, expert annotations, code definitions, RAG, memory systems, golden query evals. Every layer solving a real problem. Every layer buying more reliability.
This is one of the most capable AI teams in the world, and they still needed all of that just to get reliable answers from their own business system data.
That alone should tell you something.
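To make the "stack of context layers" pattern concrete, here is a minimal sketch in Python. Every layer name and its contents below are hypothetical illustrations of the six layer types mentioned above, not details of OpenAI's actual system; the point is only how such layers compound into a single prompt for a text-to-SQL agent.

```python
# Hypothetical sketch: stacking context layers for a text-to-SQL agent.
# Layer names follow the six types named in the post; contents are invented.

CONTEXT_LAYERS = [
    ("schema_metadata", "table revenue(day DATE, amount NUMERIC) -- daily bookings"),
    ("expert_annotations", "amount is net of refunds; data reliable from 2024-01-01"),
    ("code_definitions", "arr = monthly_amount * 12  -- annual run rate"),
    ("retrieved_docs", "RAG hit: 'Revenue is recognized on booking date.'"),
    ("memory", "User previously asked about EU-only figures."),
    ("golden_examples", "Q: total revenue last month? -> SELECT SUM(amount) FROM ..."),
]

def build_prompt(question: str) -> str:
    """Concatenate every context layer, then the user question."""
    sections = [f"## {name}\n{content}" for name, content in CONTEXT_LAYERS]
    sections.append(f"## question\n{question}")
    return "\n\n".join(sections)

prompt = build_prompt("What was EU revenue in March?")
```

Even this toy version shows the dynamic the post describes: each layer patches a failure mode of the ones before it, and the prompt keeps growing.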
The honest takeaway isn’t “text-to-SQL is hard.” It’s deeper than that.
Data complexity is a problem created by how we’ve organized data itself. We keep adding more layers to manage more layers. Each one works until it doesn’t. Then we build something more sophisticated to fix it. The pattern keeps repeating.
A line often attributed to Einstein puts it simply: you cannot solve a problem from the same level of thinking that created it.
What would it look like to step up a level?
Not more context layers on top of fragmented systems. Actual harmonization. A single model where data is natively ready to be reasoned with. Where the infrastructure becomes invisible.
OpenAI is showing the world what’s possible with enough engineering firepower. But the harder question is whether that’s the only path forward.
Sometimes the answer isn’t better engineering. It’s dissolving the problem entirely.
