Google developing inference AI chips to rival Nvidia

Google is developing new AI inference chips with Marvell and plans to unveil a new TPU generation at Google Cloud Next, signaling a deeper push into AI hardware. The move targets the fastest-growing part of AI compute demand and could strengthen Google's competitive position versus Nvidia, though supply constraints remain a risk. Meta and Anthropic are already expanding TPU usage, underscoring rising external demand for Google's chip ecosystem.

Analysis

This is less a one-off product announcement than an attempt by Google to reprice the economics of inference compute before the rest of the market fully recognizes the shift. If training was the prestige spend, inference is the recurring-revenue layer: every query, agent call, and retrieval step compounds silicon demand, which is why a more specialized architecture matters. The key second-order effect is that Google is trying to convert model distribution plus cloud access into a vertically integrated stack that can undercut GPU economics on steady-state workloads, not just win benchmark headlines.

For Nvidia, the near-term risk is not share loss in training but margin pressure, as customers increasingly benchmark total cost per token rather than raw performance. The most exposed demand sits in the lowest-complexity inference workloads, where hyperscalers can amortize custom silicon fastest; high-end training and frontier workloads remain protected. The bigger strategic risk is that if Google can prove acceptable latency and throughput while bundling TPU capacity with Cloud, it raises switching costs for enterprise AI spend and compresses the TAM for merchant silicon over the next 12-24 months.

Marvell is a quieter beneficiary than the headline suggests: even if Google captures the architecture economics, the semiconductor content is still being outsourced, which should reinforce Marvell’s role as a structural custom-ASIC enabler across hyperscalers. Broadcom benefits too, because the move validates the custom-chip procurement model broadly. The same trend also creates a subtle supply-chain bottleneck: packaging, advanced interconnect, and foundry allocation, rather than chip design, become the real constraint. If TPU demand is already being rationed to elite customers, the most likely failure mode is not weak interest but an inability to scale shipments fast enough, which would delay any revenue inflection until 2026 or later.
