The eval layer is disappearing. Not because evals became less important — because they became too important to leave outside the platform.
OpenAI's acquisition of Promptfoo — an open-source eval framework used by thousands of teams — isn't a product feature announcement. It's a declaration about where safety and evaluation belong: inside the same stack that builds and deploys agents, not bolted on after the fact. GitHub's Agentic Workflows security architecture makes the same move from the CI/CD angle: threat models and authorization controls baked into where agents execute, not layered on top.
Both moves point at the same structural shift. The deployment surface is becoming the product.
From Model Quality to Deployment Surface
For three years, the implicit assumption was: better model, better agent. That's still partially true at the frontier, but it's no longer the differentiating variable. Models commoditized, faster than the benchmarks predicted and faster than the pricing could sustain.
The question now is which runtime your security team trusts to run agents at scale.
Operational trust means provenance, control, recovery, and auditability. None of those are model properties. They're platform properties. And they're exactly what's getting absorbed.
When OpenAI acquires Promptfoo, it's not buying an eval tool. It's buying the interface between "does this agent behave correctly" and "does this agent behave correctly in my specific deployment context." That's where compliance teams and legal review actually live. The model doesn't reach that far.
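The gap between those two questions can be made concrete. Here is a minimal sketch, not Promptfoo's actual API: every name in it (`DeploymentContext`, `check_generic`, `check_in_context`) is hypothetical, and the rules are stand-ins for whatever a specific org's compliance review actually requires.

```python
import re
from dataclasses import dataclass, field

@dataclass
class DeploymentContext:
    """Org-specific constraints an external eval tool rarely sees (hypothetical)."""
    allowed_tools: set = field(default_factory=set)
    banned_patterns: list = field(default_factory=list)  # e.g. PII regexes

def check_generic(output: str) -> bool:
    """Model-level check: is the answer non-empty? 'Behaves correctly' in the abstract."""
    return bool(output.strip())

def check_in_context(output: str, tool_calls: list, ctx: DeploymentContext) -> list:
    """Platform-level check: does the same behavior pass THIS deployment's rules?"""
    failures = []
    for call in tool_calls:
        if call not in ctx.allowed_tools:
            failures.append(f"unauthorized tool: {call}")
    for pattern in ctx.banned_patterns:
        if re.search(pattern, output):
            failures.append(f"banned content matched: {pattern}")
    return failures

ctx = DeploymentContext(
    allowed_tools={"search_docs"},
    banned_patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],  # US SSN shape, as an example policy
)
out = "Customer SSN is 123-45-6789."
print(check_generic(out))                          # passes the generic check
print(check_in_context(out, ["send_email"], ctx))  # fails both deployment rules
```

The same output clears the model-level bar and fails the deployment-level one. That second check is the thing being absorbed into the platform.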
The Security Architecture Inversion
GitHub's post on agentic workflow security makes the core claim explicit: the threat model for agents running in CI/CD isn't about jailbreaks in isolation. It's about execution context — what credentials the agent has, what surfaces it can touch, what happens when it acts on a hallucinated file path.
The answer is least-privilege execution, scoped credentials, audit trails — infrastructure controls, not model controls. Security thinking that previously lived outside the platform (third-party audits, external eval frameworks, ad-hoc red-teaming) is becoming native to where agents run. The external audit layer isn't being improved. It's being disintermediated.
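The shape of those controls is easy to sketch. The following is an illustration of the pattern, not any platform's real API; `ScopedRunner` and `AuditEvent` are invented names, and the grant model is deliberately simplified to one credential per tool.

```python
import time
from dataclasses import dataclass

@dataclass
class AuditEvent:
    ts: float
    tool: str
    allowed: bool
    detail: str

class ScopedRunner:
    """Executes agent tool calls only within an explicit grant set,
    recording every attempt, allowed or not, in an audit trail."""

    def __init__(self, grants: dict):
        # grants: tool name -> the single scoped credential that tool may use
        self.grants = grants
        self.audit_log = []

    def run(self, tool, fn, *args):
        allowed = tool in self.grants
        self.audit_log.append(AuditEvent(time.time(), tool, allowed,
                                         "granted" if allowed else "denied: no grant"))
        if not allowed:
            raise PermissionError(f"{tool}: no credential granted")
        # Pass only the scoped credential, never ambient secrets.
        return fn(self.grants[tool], *args)

runner = ScopedRunner(grants={"read_repo": "token-readonly"})
print(runner.run("read_repo", lambda cred, path: f"read {path} with {cred}", "README.md"))

try:
    # The agent attempts a capability it was never granted, e.g. after
    # hallucinating that it can push. The runner blocks it and logs it.
    runner.run("push_branch", lambda cred: None)
except PermissionError as e:
    print("blocked:", e)

print(len(runner.audit_log))  # both attempts recorded, denial included
```

The point of the sketch is where the enforcement lives: the deny happens in the runner, before the tool ever executes, and the audit trail exists whether or not the agent behaved. None of that depends on the model.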
What This Means for Platform Builders
Platforms that integrate eval, provenance, and execution controls natively will win. The eval-as-external-tool model is structurally disadvantaged: it creates integration surface, version drift, and accountability gaps that enterprise buyers eventually can't ignore.
If you're evaluating agent platforms today, the right question is which platform makes it easiest to satisfy your security team, your legal counsel, and your customers' compliance requirements. Benchmark scores don't answer that. Platform architecture does.
The deployment surface is the moat. These acquisitions are making it official.