Nvidia Is The Only AI Model Maker That Can Afford To Give It Away

Nvidia unveiled Nemotron 3, an open-source, hybrid Mamba-Transformer mixture-of-experts (MoE) family that emphasizes memory-efficient reasoning, multi-token prediction and a context window of up to 1 million tokens. The lineup includes Nemotron 3 Nano (30B parameters, 3B active to fit a single L40S GPU), Super (100B/10B active) and Ultra (500B/50B active). Super and Ultra use a latent MoE that Nvidia says enables roughly 4x more experts, and both are pretrained in NVFP4 4-bit precision on a ~25 trillion-token corpus. The release underscores Nvidia’s strategy of leveraging its uniquely profitable GPU business and AI Enterprise software ($4,500 per GPU per year versus $35,000–$45,000 for the GPU hardware) to subsidize open models, expand its software and datacenter stack, and undercut increasingly closed-model competitors (OpenAI, Anthropic, Google and a pivoting Meta), potentially accelerating Nvidia’s move toward full-stack, vertically integrated AI utility economics. Nvidia also highlighted its large open-source footprint (650 models and 250 datasets contributed in 2025, alongside ~350 million open-source downloads and 2.8 million models on Hugging Face) and benchmarked Nemotron 3 Nano as substantially faster than Nemotron 2 in inference throughput. The competitive and ecosystem implications could be material if Nvidia pairs free, open models with paid support and enterprise software.

Analysis

Nvidia this week unveiled Nemotron 3, a hybrid Mamba-Transformer mixture-of-experts (MoE) family that includes Nemotron 3 Nano (30B parameters, 3B activated to fit a single L40S GPU), Super (100B/10B activated) and Ultra (500B/50B activated). Nemotron 3 adds a latent MoE that Nvidia says enables roughly 4x more experts at the same inference performance, and it supports multi-token speculative prediction and a context window of up to 1 million tokens. Super and Ultra are pretrained in NVFP4 4-bit precision on a ~25 trillion-token corpus. Benchmarks cited in the article show Nemotron 3 Nano materially outperforming Nemotron 2 on token throughput, with the MoE activation design and Mamba layers driving significant memory and inference-efficiency gains.

The release reinforces Nvidia’s strategy of subsidizing open models through its highly profitable hardware business and AI Enterprise software (quoted at $4,500 per GPU per year versus $35,000–$45,000 for a Blackwell GPU). It also builds on Nvidia’s open-source footprint: 650 models and 250 datasets contributed in 2025, ~350 million open-source downloads and 2.8 million models on Hugging Face. The article positions this as a route to full-stack, vertical integration, with Nvidia acting like an “AI utility”, which could expand software and support revenue if customers adopt Nemotron models with paid enterprise services. Nvidia’s visible open-source contributions and the technical advantages claimed for Nemotron 3 create a credible pathway to wider ecosystem lock-in and higher attach rates for expensive accelerators.

The article leaves key commercial and execution questions open: whether Nvidia will broadly open the 25T-token dataset, how it will price or package model support, and how quickly enterprise customers will migrate to Nemotron 3-based stacks.

Competitive dynamics matter. The piece notes that other large model vendors are moving toward closed models (OpenAI, Anthropic, Google and possibly Meta), so Nvidia’s open approach could undercut closed-model economics if paired with low-cost support. Watch whether Nvidia converts its technical lead into identifiable software and subscription revenue, and whether shifts in GPU demand or average selling prices (ASPs) validate the strategic thesis.
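
As a rough check on the figures quoted above, the short Python sketch below works out two ratios: the share of each model's parameters that is active, and the AI Enterprise fee as a share of the quoted GPU price. The function and variable names are illustrative assumptions, not anything Nvidia publishes, and the inputs are only the numbers cited in the article.

```python
# Figures as quoted in the article; all names here are hypothetical.
# Each entry is (total parameters, active parameters), in billions.
MODELS = {
    "Nano": (30, 3),
    "Super": (100, 10),
    "Ultra": (500, 50),
}

AI_ENTERPRISE_FEE_PER_GPU_YEAR = 4_500   # USD, quoted subscription price
GPU_PRICE_RANGE = (35_000, 45_000)       # USD, quoted Blackwell GPU price range


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that is active, per the quoted sizes."""
    return active_b / total_b


def software_share_of_hardware(annual_fee: float, gpu_price: float) -> float:
    """Annual software fee expressed as a fraction of the GPU purchase price."""
    return annual_fee / gpu_price


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        print(f"{name}: {active_fraction(total_b, active_b):.0%} of parameters active")

    low_price, high_price = GPU_PRICE_RANGE
    low_share = software_share_of_hardware(AI_ENTERPRISE_FEE_PER_GPU_YEAR, high_price)
    high_share = software_share_of_hardware(AI_ENTERPRISE_FEE_PER_GPU_YEAR, low_price)
    print(f"Annual software fee: {low_share:.1%} to {high_share:.1%} of GPU price")
```

On the quoted numbers, every model in the family activates 10% of its parameters, and one year of AI Enterprise costs roughly 10% to 13% of the GPU it runs on. That is the sense in which hardware margins, rather than software fees, carry the cost of giving the models away.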
