Market Impact: 0.55

Nvidia debuts Nemotron 3 with hybrid MoE and Mamba-Transformer to drive efficient agentic AI

NVDA, ACN, CRWD, ORCL, PLTR, NOW, ZM, AMZN
Artificial Intelligence, Technology & Innovation, Product Launches

Nvidia unveiled Nemotron 3, a three-model family (Nano 30B, Super 100B, Ultra ~500B) built on a hybrid Mamba-Transformer mixture-of-experts architecture. The company says the design delivers up to 4× token throughput versus Nemotron 2 Nano and can cut reasoning token generation, and with it inference costs, by up to 60%. The release introduces a 1 million-token context window, a “latent MoE” expert-sharing design and 4-bit NVFP4 training intended to enable large-model training on existing infrastructure. It already counts enterprise and cloud adopters such as Accenture, CrowdStrike, Oracle Cloud, Palantir, ServiceNow, Siemens and major consultancies. By publishing papers, large permissive post-training datasets and a NeMo Gym reinforcement-learning lab for stress-testing agents, Nvidia is pushing for greater openness while positioning itself to capture the enterprise multi-agent/autonomous AI stack, potentially lowering deployment costs, accelerating adoption and reinforcing demand for its hardware and software ecosystem.

Analysis

Nvidia announced Nemotron 3, a three-model family (Nano 30B, Super 100B, Ultra ~500B) built on a hybrid Mamba-Transformer mixture-of-experts architecture with a 1 million-token context window, a “latent MoE” expert-sharing design and 4-bit NVFP4 training intended to enable large-model training on existing infrastructure. The company cited up to 4× higher token throughput versus Nemotron 2 Nano and up to a 60% reduction in reasoning token generation, and said benchmark testing from Artificial Analysis ranks Nemotron models highly among peers. Nvidia will publish papers, large permissive post-training datasets and a NeMo Gym reinforcement-learning lab for stress-testing agents, and named early adopters including Accenture, CrowdStrike, Oracle Cloud, Palantir, ServiceNow, Siemens, Zoom and several consultancies. These moves could materially lower inference costs and accelerate enterprise multi-agent/autonomous AI adoption, strengthening demand for Nvidia’s hardware and software stack. However, the claims currently rest on company releases and select third-party benchmarks. Comparable architectures are already used by competitors (AI21 Labs), and cloud providers are launching similar testing tools (AWS Nova Forge), so independent validation and adoption milestones are worth monitoring.
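
How far an "up to 60%" cut in reasoning tokens moves the inference bill depends on how much of a workload's output is reasoning in the first place. The sketch below works through that arithmetic in Python. The token counts and the per-token price are hypothetical placeholders, not figures from Nvidia or from this article, and the function names are illustrative only.

```python
def output_token_cost(reasoning_tokens, answer_tokens, price_per_million_usd):
    """Cost of generated tokens, assuming one flat price per output token."""
    total_tokens = reasoning_tokens + answer_tokens
    return total_tokens * price_per_million_usd / 1_000_000


def cost_after_reasoning_cut(reasoning_tokens, answer_tokens,
                             price_per_million_usd, reduction):
    """Same cost model after shrinking only the reasoning tokens.

    `reduction` is a fraction, e.g. 0.60 for the claimed upper bound.
    """
    reduced_reasoning = reasoning_tokens * (1 - reduction)
    return output_token_cost(reduced_reasoning, answer_tokens,
                             price_per_million_usd)


# Hypothetical workload: 8,000 reasoning tokens and 2,000 answer tokens
# per request, priced at a made-up $1.00 per million output tokens.
baseline = output_token_cost(8_000, 2_000, 1.00)
reduced = cost_after_reasoning_cut(8_000, 2_000, 1.00, 0.60)
savings = 1 - reduced / baseline

print(f"baseline: ${baseline:.4f}  reduced: ${reduced:.4f}  "
      f"savings: {savings:.0%}")
```

Under these assumed numbers, reasoning makes up 80% of output, so a 60% cut in reasoning tokens lowers output-token cost by about 48%, not 60%. Workloads with a smaller reasoning share would see proportionally less, which is one reason the headline figure is best read as an upper bound.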