The Middle Tier Is the Target

Google's Gemma 4 release isn't generosity. It's a targeted attack on the profit pool where GPT-4o-mini and Claude Haiku live.


Google released Gemma 4 under Apache 2.0 in April 2026. The headlines called it a gift to developers. That is the wrong reading.

Gemma 4 is not about generosity. It is a direct attack on the profit pool of two specific products: GPT-4o-mini and Claude Haiku. Everything else in the narrative is cover.

Here is the structure of the attack. The API business has three tiers. The top tier (flagship models like Claude Opus, GPT-5, Gemini Pro) is low volume, high margin. Customers pay frontier prices for work where accuracy matters more than cost. The bottom tier (self-hosted 7B models) is near-free but narrow. The middle tier is where the volume money lives: summarization, classification, extraction, routing, the boring-but-constant workloads that run millions of calls per day across every SaaS product on the market. That middle is where Haiku and 4o-mini print money.

Gemma 4 sits precisely on that tier. Close enough in capability to replace both for most non-reasoning tasks (per Google's own benchmarks). Apache 2.0, so the enterprise legal objection evaporates. Small enough to run on a single accelerator, which is the part Meta got wrong with Llama 4 at ~400B. Meta built a model the typical company cannot serve. Google built one the typical company can.
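The "single accelerator" point is just weight-memory arithmetic, and it is worth making explicit. A sketch below, with illustrative sizes: the ~400B figure is Llama 4's rough scale from above, while the 27B is a stand-in for a Gemma-class model (an assumption for illustration, not a confirmed Gemma 4 size).

```python
# Weight memory alone is params x bytes-per-param. If it does not fit in
# one accelerator's VRAM, you are into multi-GPU serving and a different
# cost class entirely. Sizes here are illustrative assumptions.

def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB at a given precision."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

VRAM_GB = 80  # a single high-end accelerator, roughly

for name, params in [("Gemma-class 27B", 27), ("Llama-4-class ~400B", 400)]:
    for bits in (16, 4):
        gb = weight_gb(params, bits)
        fits = "fits one GPU" if gb <= VRAM_GB else "needs a cluster"
        print(f"{name} @ {bits}-bit: {gb:.0f} GB -> {fits}")
```

Even at 4-bit quantization, a ~400B model's weights alone overflow a single 80 GB accelerator; a ~27B model fits with room for KV cache. That gap is the difference between "run it on a box you already rent" and "stand up an inference cluster."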

The dual strategy becomes legible once you stop thinking about Google as a single business. Closed Gemini keeps the enterprise margin. Open Gemma keeps competitors out of the volume business entirely. Google does not need Gemma to make money directly. It needs Gemma to prevent OpenAI and Anthropic from making money at scale. That is enough.

This is the same play Android ran against iOS, with a small rewrite. Android did not have to beat iOS at the premium tier. It had to make the middle of the phone market unprofitable for anyone not running Google's stack. It worked. Nearly two decades later, everyone's middleware is Google's.

For the model labs charging for the middle tier, the arithmetic changes fast. If a developer can run a frontier-adjacent model for the cost of compute alone, the willingness to pay 10x for a managed API collapses for every task that is not reasoning-heavy. You can already see the pricing pressure on mid-tier APIs. You will see more.
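The collapse in willingness to pay is visible in a back-of-envelope comparison. All numbers below are illustrative assumptions, not quoted prices: swap in your own blended API rate, GPU rental cost, and batched throughput.

```python
# Managed-API spend vs self-hosted compute spend for a bulk workload.
# Every number here is an illustrative assumption, not a quoted price.

def api_cost(tokens_per_month: float, price_per_mtok: float) -> float:
    """Monthly spend on a managed API at a blended $/1M-token rate."""
    return tokens_per_month / 1_000_000 * price_per_mtok

def self_host_cost(tokens_per_month: float, gpu_hourly: float,
                   tokens_per_second: float) -> float:
    """Monthly compute spend to serve the same tokens on a rented GPU."""
    gpu_hours = tokens_per_month / tokens_per_second / 3600
    return gpu_hours * gpu_hourly

tokens = 2_000_000_000                     # 2B tokens/month: a busy SaaS feature
api = api_cost(tokens, 1.00)               # assume $1.00 per 1M tokens blended
own = self_host_cost(tokens, 2.00, 5000)   # assume $2/hr GPU, 5k tok/s batched

print(f"API:       ${api:,.0f}/mo")
print(f"Self-host: ${own:,.0f}/mo ({api / own:.1f}x cheaper)")
```

The exact multiple depends entirely on utilization: self-hosting only wins when the GPU stays busy, which is precisely the profile of the middle-tier workloads described above.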

For operators, the read is simpler than the debate over open-versus-closed suggests.

If your workload is high-volume classification, extraction, routing, or bulk summarization, you should already be benchmarking Gemma 4 against whatever mid-tier API you pay for today. The cost delta is the entire story. The capability delta, for these tasks, is small enough to ignore.
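That benchmark does not need to be elaborate. A minimal harness like the sketch below, run over a labeled sample of your real traffic, answers the question. The two model callables are stubs: in practice you would wire one to your current mid-tier API client and one to a local Gemma 4 endpoint. Names and per-call costs are illustrative assumptions.

```python
# Minimal harness: run two models over (text, label) pairs, compare
# accuracy and spend. The model callables below are stubs standing in
# for a managed API and a self-hosted Gemma 4; costs are illustrative.

from typing import Callable, List, Tuple

def benchmark(model: Callable[[str], str],
              dataset: List[Tuple[str, str]],
              cost_per_call: float) -> dict:
    """Score a model on labeled examples and tally per-call cost."""
    hits = sum(1 for text, label in dataset if model(text) == label)
    return {"accuracy": hits / len(dataset),
            "total_cost": cost_per_call * len(dataset)}

def paid_api_stub(text: str) -> str:       # stand-in for your mid-tier API
    return "positive" if "great" in text else "negative"

def gemma_stub(text: str) -> str:          # stand-in for local Gemma 4
    return "positive" if "great" in text else "negative"

dataset = [("great product", "positive"), ("terrible support", "negative")]

api_result = benchmark(paid_api_stub, dataset, cost_per_call=0.0004)
local_result = benchmark(gemma_stub, dataset, cost_per_call=0.00002)
print("API:  ", api_result)
print("Local:", local_result)
```

If accuracy is within noise on your own labels and the cost column differs by an order of magnitude, the decision makes itself, which is the point of the paragraph above.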

If your workload is reasoning-heavy (multi-step agent work, code generation, research synthesis), skip Gemma 4 for now. That tier is not under attack yet. The frontier labs still have a real product there.

The interesting question is what Anthropic and OpenAI do next. The obvious move is to push Haiku and 4o-mini down in price, and that is already happening. The less obvious move is to stop defending the middle tier at all, collapse the product catalog to frontier-only, and cede volume to whoever wants it. I expect at least one of them to do this within 18 months. The margin profile at the middle tier after Gemma 4 is not a business. It is a liability.

Google did not release Gemma 4 for the ecosystem. It released Gemma 4 to burn down a neighborhood its competitors live in. The neighborhood happens to be where the volume money lives.

When someone open-sources a frontier-adjacent model, the question is never "why the generosity?" The question is "whose margin just died?"