Market Impact: 0.55

Nvidia Is The Only AI Model Maker That Can Afford To Give It Away

NVDA, GOOGL, GOOG, META, MSFT, IBM
Artificial Intelligence, Technology & Innovation, Product Launches, Company Fundamentals, Antitrust & Competition

Nvidia unveiled Nemotron 3, an open-source, hybrid Mamba-Transformer mixture-of-experts (MoE) model family that emphasizes memory-efficient reasoning, multi-token prediction and a context window of up to 1 million tokens. The lineup includes Nemotron 3 Nano (30B parameters, 3B active, to fit a single L40S GPU), Super (100B/10B active) and Ultra (500B/50B active); Super and Ultra use a latent MoE that enables ~4x more experts and are pretrained in NVFP4 4-bit precision on a ~25 trillion-token corpus. The release underscores Nvidia’s strategy of using its uniquely profitable GPU business and AI Enterprise software ($4,500 per GPU per year, versus $35,000–$45,000 for the GPU hardware) to subsidize open models, expand its software and datacenter stack, and undercut increasingly closed-model competitors (OpenAI, Anthropic, Google and a pivoting Meta), potentially accelerating its move toward full-stack, vertically integrated AI utility economics. Nvidia also highlighted its large open-source footprint (650 models and 250 datasets contributed in 2025, part of billions of open-source downloads and millions of Hugging Face models) and benchmarked Nemotron 3 Nano as substantially faster than Nemotron 2 in inference throughput. Together these signal material competitive and ecosystem implications if Nvidia pairs free, open models with paid support and enterprise software.

Analysis

Nvidia this week unveiled Nemotron 3, a hybrid Mamba-Transformer mixture-of-experts (MoE) family that includes Nemotron 3 Nano (30B parameters, 3B activated, to fit a single L40S GPU), Super (100B/10B activated) and Ultra (500B/50B activated). Nemotron 3 adds a latent MoE that Nvidia says enables roughly 4x more experts at the same inference performance, and it supports multi-token speculative prediction and a context window of up to 1 million tokens. Super and Ultra are pretrained in NVFP4 4-bit precision on a ~25 trillion-token corpus.
Benchmarks cited in the article show Nemotron 3 Nano materially outperforming Nemotron 2 on token throughput, with the MoE activation design and the Mamba layers driving significant memory and inference-efficiency gains.
The release reinforces Nvidia’s strategy of subsidizing open models through its highly profitable hardware business and AI Enterprise software (quoted at $4,500 per GPU per year versus $35,000–$45,000 for a Blackwell GPU), and builds on Nvidia’s open-source footprint (650 models and 250 datasets contributed in 2025, ~350 million open-source downloads and 2.8 million models on Hugging Face). The article positions this as a route to full-stack vertical integration, with Nvidia acting like an “AI utility”, which could expand software and support revenue if customers adopt Nemotron models with paid enterprise services. Nvidia’s visible open-source contributions and the technical advantages claimed for Nemotron 3 create a credible path to wider ecosystem lock-in and higher attach rates for expensive accelerators.
The article leaves key commercial and execution questions open: whether Nvidia will broadly release the 25T-token dataset, how it will price and package model support, and how quickly enterprise customers will migrate to Nemotron 3-based stacks.
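As a back-of-the-envelope aid to the parameter counts quoted above, the sketch below computes the share of parameters active per token for each model, plus a rough weight-storage figure at a few precisions. The function names and the 4-, 8- and 16-bit widths are illustrative assumptions, not details from the article (which mentions NVFP4 only for Super/Ultra pretraining), and the estimate ignores KV cache, activations and runtime overhead.

```python
# Parameter counts in billions, as quoted in the article: (total, active per token).
MODELS = {
    "Nano": (30, 3),
    "Super": (100, 10),
    "Ultra": (500, 50),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that are active for a given token."""
    return active_b / total_b


def weight_memory_gb(params_b: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes).

    Hypothetical helper: counts weights only, ignoring KV cache,
    activations and any framework overhead.
    """
    return params_b * bits_per_param / 8


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        # Bit widths below are assumptions chosen for illustration.
        sizes = ", ".join(
            f"{bits}-bit: {weight_memory_gb(total_b, bits):.0f} GB"
            for bits in (4, 8, 16)
        )
        print(f"{name}: {share:.0%} active per token; all weights at {sizes}")
```

In all three models one tenth of the parameters is active per token, which is where the per-token compute savings come from. The full set of weights typically still has to be held in memory, so storage precision matters when fitting a model on a single GPU.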
Competitive dynamics matter: the piece notes that other large model vendors are moving toward closed models (OpenAI, Anthropic, Google and possibly Meta), so Nvidia’s open approach could undercut closed-model economics if paired with low-cost support. Watch whether the technical lead converts into identifiable software and subscription revenue, and for any shifts in GPU demand or average selling prices (ASPs) that would validate the strategic thesis.
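To put the quoted prices in proportion, the sketch below expresses cumulative AI Enterprise subscription spend as a fraction of the GPU purchase price. The dollar figures come from the article; the function name and the multi-year horizons are hypothetical, since the article gives no service life or discounting assumptions.

```python
# Figures quoted in the article (US dollars).
AI_ENTERPRISE_FEE_PER_GPU_YEAR = 4_500
GPU_PRICE_LOW, GPU_PRICE_HIGH = 35_000, 45_000


def software_to_hardware_ratio(gpu_price: float, years: int = 1) -> float:
    """Cumulative subscription spend as a fraction of the GPU purchase price."""
    return AI_ENTERPRISE_FEE_PER_GPU_YEAR * years / gpu_price


if __name__ == "__main__":
    # The multi-year horizons are hypothetical; the article states no service life.
    for years in (1, 3, 5):
        low = software_to_hardware_ratio(GPU_PRICE_HIGH, years)
        high = software_to_hardware_ratio(GPU_PRICE_LOW, years)
        print(f"{years} yr: {low:.0%} to {high:.0%} of GPU price")
```

At the quoted prices, one year of AI Enterprise comes to roughly 10% to 13% of the GPU's price, so the software attach rate and the number of years a subscription is retained are the variables that determine how much recurring revenue accompanies each accelerator sold.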