The disappearing AI middle class

OpenAI raised GPT-5.5 pricing to $5 per million input tokens and $30 per million output tokens, while DeepSeek launched V4-Pro at $1.74/$3.48 and V4-Flash at $0.14/$0.28, widening the AI cost gap sharply. The article argues the market is splitting into two clusters: premium closed models and very cheap open-weight infrastructure, with the middle tier thinning out. That could influence enterprise routing decisions, self-hosting economics, and AI infrastructure demand, including non-Nvidia hardware.

Analysis

The market is repricing AI from a single continuum into a bifurcated stack: premium integrated systems versus commoditized open infrastructure. That matters because the winners are no longer just the model labs; the real power shifts to whoever owns routing, orchestration, and workflow control. Once the middle tier disappears, every production agent becomes an allocation problem, which increases the strategic value of harness software and the optionality of self-hosting for cost-sensitive workloads.

For NVDA, the risk is not a near-term demand cliff but a gradual erosion of pricing power at the margin as inference workload growth migrates to lower-cost, non-hyperscaler paths and, eventually, non-Nvidia hardware. The first-order impact is limited because frontier training still remains compute intensive, but the second-order effect is more important: if open-weight models become “good enough” at a much lower marginal cost, enterprise buyers can arbitrage away some premium GPU-backed inference demand. That creates a longer-duration multiple risk even if unit demand keeps rising.

The deeper implication is that the next competitive battleground is not model quality alone but system design. If agent stacks can split planning, verification, and bulk execution across different models, then the expensive frontier model becomes a coordinator rather than the default workhorse. That compresses paid usage for premium APIs while expanding TAM for tooling vendors that can dynamically route by task, latency, and confidence; this is where the monetization will migrate over the next 6-12 months.

Consensus may be underestimating how quickly procurement teams will operationalize the price gap. The change is large enough that even a modest share of token volume shifting to open weights can force enterprise vendors to defend share with bundles, credits, or custom contracts, which is a margin-negative response. The contrarian risk is that this is less a broad deflationary shock than a segmentation event: premium models may keep growing while the middle simply gets hollowed out, meaning the cleanest short is not AI exposure broadly, but the specific names whose moat depends on being the default mid-tier choice.

AllMind

AllMind

The disappearing AI middle class

Analysis

AllMind AI Terminal

Market Sentiment

Key Decisions for Investors