Market Impact: 0.25

The Artificial Intelligence (AI) Trade Is Splitting in Two. Here's How to Pick the Right Side in 2026.

NVDA, MSFT, META, GOOGL, AMD, AVGO, NFLX

Artificial Intelligence, Technology & Innovation, Company Fundamentals, Antitrust & Competition, Product Launches

Nvidia controls more than 90% of the AI training GPU market, but training spending is cyclical, while inference usage is growing and widely viewed as a steadier, recurring revenue stream. Grand View Research forecasts the global AI market growing at a 30.6% CAGR from 2026 to 2033. Broadcom and custom ASICs are gaining traction for inference workloads at hyperscalers, and Nvidia inked a $20 billion non‑exclusive licensing deal with Groq to develop a language processing unit (LPU) to compete in inference. The shift implies potential upside for inference-focused chipmakers such as Broadcom even as Nvidia remains dominant in training.
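As a back-of-the-envelope check on what a 30.6% CAGR implies, compounding over the seven years from 2026 to 2033 multiplies the market roughly 6.5x. The base-year market size below is an arbitrary placeholder, not a figure from the forecast:

```python
# Illustrative compounding of Grand View Research's 30.6% CAGR forecast.
# The 2026 base value is a hypothetical placeholder, not a sourced figure.
cagr = 0.306
years = 2033 - 2026  # seven compounding periods

growth_multiple = (1 + cagr) ** years
print(f"Growth multiple over {years} years: {growth_multiple:.1f}x")

base_2026 = 100.0  # hypothetical base market size (arbitrary units)
projected_2033 = base_2026 * growth_multiple
print(f"A market of 100 units in 2026 would reach {projected_2033:.0f} units by 2033")
```

The point of the exercise is scale: a ~30% CAGR sustained for seven years implies a market several times its current size, which is the backdrop for the inference-versus-training positioning debate.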

Analysis

The shift toward inference-optimized silicon creates a predictable, amortizable revenue stream for firms that can bundle software, orchestration, and low-latency hardware into a single offering. Hyperscalers that internalize NRE (we estimate typical NREs of $100–300M per custom ASIC) can drive unit-cost advantages that compound over 3–5 year deployment cycles, compressing the TAM available to one-size-fits-all GPU vendors and changing where gross margin accrues in the stack.

A subtle supply-chain reallocation is underway: inference ASICs tend to trade expensive HBM stacks for larger on-die SRAM and more advanced packaging, shifting value toward foundries and packaging specialists and away from discrete memory suppliers. That reallocation also raises the bar for OS/SDK compatibility: vendors who control both runtime software and silicon capture an outsized share of total margin when customers value integration and SLAs.

Key downside catalysts are software-level efficiency gains (quantization, sparsity, adapter layers) that could reduce inference FLOP demand by 20–40% over the next 12–36 months, and execution risk at small ASIC startups that fail to hit yield or software parity. Conversely, broad adoption of real-time, multimodal AI in consumer and enterprise apps could double inference load within 3–5 years, favoring integrated silicon-plus-software players and making short-duration hardware bets unusually binary.
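The NRE-amortization argument can be made concrete with a simple break-even sketch. All inputs below (per-unit build cost, GPU price, fleet sizes) are hypothetical illustrations; only the $100–300M NRE range echoes the estimate above:

```python
# Break-even sketch for internalizing ASIC NRE versus buying merchant GPUs.
# All dollar figures are hypothetical; only the NRE range ($100-300M)
# echoes the estimate in the analysis above.
def asic_unit_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Effective per-unit cost once NRE is amortized across the fleet."""
    return unit_cost + nre / volume

nre = 200e6             # midpoint of the assumed $100-300M NRE range
asic_build_cost = 8_000  # hypothetical marginal cost per custom ASIC
gpu_price = 25_000       # hypothetical merchant GPU price

# Break-even fleet size: the volume at which amortized ASIC cost
# equals the merchant GPU price.
break_even = nre / (gpu_price - asic_build_cost)
print(f"Break-even fleet size: {break_even:,.0f} units")

for volume in (5_000, 20_000, 100_000):
    cost = asic_unit_cost(nre, asic_build_cost, volume)
    print(f"{volume:>7,} units -> effective per-unit cost ${cost:,.0f}")
```

Under these placeholder numbers the ASIC route only pays off past roughly ten thousand units, but beyond that the per-unit advantage widens with scale, which is the mechanism behind the "unit-cost advantages that compound over 3–5 year deployment cycles" claim.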

AllMind AI Terminal

AI-powered research, real-time alerts, and portfolio analytics for institutional investors.