The Sovereign Tier: Why Cheap AGI Will Not Save Your Margins

AI is splitting into a commodity tier for routing and a sovereign tier for synthesis. Architect for infinite cheap frontier access and your margins are exposed.

The popular story is that models keep getting smaller, cheaper, and faster, that AGI is democratizing, and that everyone will soon run frontier intelligence on a laptop. That story is half true, and the false half is the one most teams are betting their architecture on.

The inversion teams keep mispricing

The bottom of the market really is commoditizing. The top is not following the same path.

While small models fall toward negligible per-call cost, the absolute frontier appears to be concentrating rather than spreading. One signal often cited here is reporting about a large Anthropic model said to run at roughly 10 trillion parameters. Treat that specific figure as an unconfirmed leak; the argument does not rest on it. The directional point is that training and serving the genuine frontier remains capital-intensive and concentrated among a few labs.

That picture is contested, and fairly so. Open-weight models have closed much of the gap on the leading closed systems, and a crowded mid-tier now does work that used to require the frontier. Frontier API prices have also fallen over time, not risen. So the split is not a clean monopoly at the top. The claim is narrower: the highest end stays scarce and expensive to produce even as everything below it gets cheaper.

The market is splitting into two ends with a wide, competitive middle in between, rather than collapsing into one cheap layer.

Two tiers, two economics

The commodity tier covers smaller hosted models and lightweight open or edge models. They reach you through standard APIs at retail pricing, or run locally. Their job is volume work: filtering, classification, extraction, routing, and routine reasoning. Per-call cost is small and falling.

The sovereign tier is shaped differently. These are frontier-class systems, often gated behind enterprise contracts and, plausibly, priced for the largest buyers and most regulated environments. Calling them sovereign is a stretch if you expect literal state control; the practical reality is expensive, gated, supply-constrained access. Their job is strategic synthesis and the highest-leverage reasoning, the calls where being right is worth far more than being cheap.

The mistake is treating these as the same curve at different points. One layer is deflating fast. The other behaves more like a controlled-supply good, with a competitive mid-tier sitting between them and blurring the line.

Why "infinite cheap AGI" is a margin trap

If your core loop depends on calling the absolute frontier to solve basic orchestration problems, you have taken on two risks.

First, margin exposure. When your unit economics assume frontier intelligence at commodity prices, you are exposed to any repricing of the top tier. Frontier prices have trended down historically, but that trend is not a contract. A pricing change, a capacity squeeze, or a shift in access terms flows straight through to your P&L, and you do not control it.

Second, access risk. Top-tier capacity can be rationed, contracted away, or restricted. An architecture that cannot function without it is one policy change from breaking.

Concretely: a support product that routes every ticket through a frontier model to decide intent is paying premium rates for a routine decision. A research tool that calls the top model on every paragraph instead of only on the final synthesis is spending its most expensive input on its least valuable steps.
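The exposure in these two examples can be made concrete with a little arithmetic. The sketch below is a hypothetical model, not measured data: the `Workload` fields, the prices, and the revenue figure are all illustrative assumptions chosen only to show the shape of the effect. It compares a design that sends every request to the frontier with one that sends 5 percent there, as the frontier price is scaled by a multiplier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical unit economics for one request. All values are illustrative."""
    revenue_per_request: float  # what the customer pays per request
    commodity_cost: float       # assumed cost of one commodity-tier call
    frontier_cost: float        # assumed cost of one frontier-tier call
    frontier_share: float       # fraction of requests sent to the frontier (0..1)


def gross_margin(w: Workload, frontier_price_multiplier: float = 1.0) -> float:
    """Gross margin per request after scaling the frontier price by a multiplier."""
    frontier = w.frontier_cost * frontier_price_multiplier
    blended_cost = (
        w.frontier_share * frontier
        + (1.0 - w.frontier_share) * w.commodity_cost
    )
    return (w.revenue_per_request - blended_cost) / w.revenue_per_request


# Assumed numbers, picked for easy arithmetic rather than realism.
ALL_FRONTIER = Workload(0.10, 0.001, 0.04, frontier_share=1.0)
MOSTLY_COMMODITY = Workload(0.10, 0.001, 0.04, frontier_share=0.05)

if __name__ == "__main__":
    for multiplier in (1.0, 1.5, 2.0):
        print(
            f"frontier price x{multiplier}: "
            f"all-frontier margin {gross_margin(ALL_FRONTIER, multiplier):.1%}, "
            f"mostly-commodity margin {gross_margin(MOSTLY_COMMODITY, multiplier):.1%}"
        )
```

Under these assumed numbers, doubling the frontier price takes the all-frontier design from roughly a 60 percent gross margin to roughly 20 percent, while the mostly-commodity design moves from about 97 percent to about 95 percent. The absolute figures are invented; the point is how differently the same repricing lands on the two designs.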

The architecture that survives

The defensible split of work looks roughly 95/5.

  • Send about 95 percent of work to local, edge, commodity, or mid-tier models: routing, filtering, retrieval, classification, and routine reasoning. This is the default.
  • Reserve roughly 5 percent for top-tier calls, used strictly for the highest-leverage strategic synthesis where a better answer is worth a large multiple of the cost.

Three principles make that real:

  1. Edge-first orchestration. Treat local, commodity, and mid-tier models, coordinated through standard agent and tool protocols, as the backbone. The frontier is not your message bus.
  2. Reserve the frontier. Route only genuinely strategic decisions to top-tier reasoning. If a cheaper model can do it acceptably, it should.
  3. Budget it as scarcity. Model top-tier usage as a constrained line item with a hard ceiling, not as an elastic utility you can scale linearly.
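The three principles above can be sketched as a small router. This is a minimal illustration under stated assumptions, not a reference implementation: `Tier`, `FrontierBudget`, `Task`, and `route` are hypothetical names, and the sketch assumes the caller can label a task as strategic and estimate what a frontier call would cost. No real model API is called.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    COMMODITY = "commodity"
    FRONTIER = "frontier"


@dataclass
class FrontierBudget:
    """Hard ceiling on frontier spend for one budget period (hypothetical units)."""
    ceiling: float
    spent: float = 0.0

    def try_reserve(self, cost: float) -> bool:
        """Reserve spend only if it fits under the ceiling; never exceed it."""
        if self.spent + cost > self.ceiling:
            return False
        self.spent += cost
        return True


@dataclass(frozen=True)
class Task:
    name: str
    strategic: bool            # assumption: the caller labels high-leverage work
    est_frontier_cost: float   # assumption: the caller can estimate the call's cost


def route(task: Task, budget: FrontierBudget) -> Tier:
    """Default to the commodity tier; escalate only strategic tasks that fit the budget."""
    if task.strategic and budget.try_reserve(task.est_frontier_cost):
        return Tier.FRONTIER
    return Tier.COMMODITY


if __name__ == "__main__":
    budget = FrontierBudget(ceiling=1.0)
    tasks = [
        Task("classify ticket intent", strategic=False, est_frontier_cost=0.5),
        Task("final research synthesis", strategic=True, est_frontier_cost=0.75),
        Task("second synthesis", strategic=True, est_frontier_cost=0.5),
    ]
    for task in tasks:
        print(f"{task.name}: {route(task, budget).value}")
```

One design choice worth noting: when the ceiling is reached, this sketch falls back to the commodity tier instead of failing. A real system might queue, defer, or ask for approval instead; what matters is that the ceiling is enforced in code rather than discovered on the invoice.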

Treat the 95/5 figure as a starting allocation to tune, not a measured optimum. The exact ratio depends on your workload; the discipline of pushing work down by default is what matters.

The takeaway

The winners will be the teams that push almost everything down to cheaper intelligence and spend frontier access like the scarce input it is. Cheap capability at the bottom is real. Counting on cheap capability at the very top is how your margins get exposed.