Nvidia bets on open infrastructure for the agentic AI era with Nemotron 3

Nvidia unveiled the Nemotron 3 family: Nano (30B parameters, 3B active, 1M-token context) is available now, while Super (~100B, up to 10B active) and Ultra (500B, up to 50B active) are due in H1 2026. The family is built on a hybrid latent mixture-of-experts architecture that Nvidia says delivers ~4x token throughput versus Nemotron 2 and cuts reasoning-token generation by up to 60%. The company is releasing open weights, roughly 3 trillion tokens of training/RL data and the NeMo RL, Gym and Evaluator libraries, and is positioning Nemotron 3 as an enterprise infrastructure layer for building domain-specific, multi-agent systems that can be deployed on-prem or in the cloud. Third-party benchmarks highlight Nano's efficiency and accuracy, and third-party API pricing (e.g., DeepInfra at $0.06 per million input tokens) undercuts some closed offerings. Claude and GPT-4o still outperform on specialized tasks, however, so Nvidia's competitive edge is openness, cost and integration rather than raw benchmark supremacy.

Analysis

Nvidia unveiled the Nemotron 3 family in three sizes. Nano (30 billion parameters, 3 billion active per token, 1-million-token context) is available now; Super (~100B, up to 10B active) and Ultra (500B, up to 50B active) are expected in H1 2026. The company is releasing open weights and about 3 trillion tokens of pretraining, post-training and RL data, and is open-sourcing NeMo Gym, NeMo RL and NeMo Evaluator on GitHub and Hugging Face. That positions Nemotron 3 as an enterprise infrastructure layer rather than a hosted API product.

Nvidia attributes the performance gains to a hybrid latent mixture-of-experts architecture that combines Mamba-2 layers, sparse transformers and MoE routing. It claims ~4x token throughput versus Nemotron 2 and up to a 60% reduction in reasoning-token generation, and third-party benchmarking cites Nano as the most efficient model in its class. Third-party inference pricing (DeepInfra) starts at $0.06 per million input tokens, which Nvidia frames as significantly cheaper than GPT-4o and which lowers cost assumptions for multi-agent deployments.

The strategic differentiator is openness and deployment flexibility for enterprises with on-prem or hybrid needs, while cloud distribution via AWS Bedrock, Google Cloud, Microsoft Foundry and CoreWeave broadens go-to-market. Near-term risks include capability gaps versus Claude and GPT-4o on specialized tasks and the wait until H1 2026 for Super and Ultra. Commercial impact will therefore depend on actual enterprise adoption, partner integrations and safety/verification traction (via NeMo Gym).
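To make the sparsity and pricing figures concrete, the following minimal Python sketch computes the share of parameters active per token for each model and the input-token bill for a multi-agent workload at the quoted DeepInfra rate. The parameter counts and the $0.06 price come from the article; the workload (10 agents, 1,000 calls each, 20,000 input tokens per call) is a purely hypothetical assumption, and the function names are illustrative. Output-token charges are not quoted in the article and are left out.

```python
# Parameter counts in billions, as quoted in the article.
# Super and Ultra figures are upper bounds ("up to ... active").
MODELS = {
    "Nano": (30, 3),
    "Super": (100, 10),
    "Ultra": (500, 50),
}

# DeepInfra input-token price cited in the article (USD per million tokens).
INPUT_PRICE_PER_MILLION_USD = 0.06


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that are active for each token."""
    return active_b / total_b


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Input-token cost only; output-token pricing is not covered here."""
    return input_tokens / 1_000_000 * price_per_million


# Hypothetical workload, chosen only for illustration (not from the article).
agents = 10
calls_per_agent = 1_000
input_tokens_per_call = 20_000
total_input_tokens = agents * calls_per_agent * input_tokens_per_call

if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {active_b}B of {total_b}B parameters active ({share:.0%})")
    cost = input_cost_usd(total_input_tokens)
    print(f"{total_input_tokens:,} input tokens cost about ${cost:.2f}")
```

Under these assumptions, all three models activate at most about a tenth of their parameters per token, and 200 million input tokens cost roughly $12. Real deployments would also need to budget for output tokens and for any self-hosting hardware.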

