DeepSeek V4 launches with major efficiency gains, using just 27% of the FLOPs per generated token and 10% of the KV-cache memory at a 1M-token context window, while introducing a 1.6T-parameter Pro model and a 284B-parameter Flash model. NVIDIA says its Blackwell GPUs already support DeepSeek V4 on day 0 with NVFP4, and preliminary performance claims reach nearly 3,500 TPS per GPU on GB300/Blackwell Ultra. The update is positive for NVIDIA’s AI infrastructure narrative and highlights broader ecosystem support for large-model inference.
This is less about a single model release and more about NVIDIA trying to lock in a de facto hardware standard for the next wave of frontier inference. If NVFP4 becomes the default path for long-context, sparse-activation workloads, the economic moat shifts from raw FLOPs to the full co-optimization stack: kernels, scheduling, memory-traffic reduction, and deployment tooling. That favors NVIDIA’s software-adjacent monetization, because customers chasing lower inference cost will optimize around the platform that is already “one release ahead,” which can prolong gross-margin resilience even as GPU competition intensifies.

The second-order winner is the ecosystem around high-throughput inference, not just the GPU vendor. Model-serving operators, cloud providers, and enterprises with agentic workflows should see a sharper step-down in cost per token, which expands the addressable set of workloads that can be profitably run on-prem or in private cloud. That is a negative for pure-play API providers with weaker infrastructure economics, because cheaper self-hosted inference compresses price points and raises the bar for differentiated software margins over the next 6-18 months.

The contrarian point is that the market may already be treating this as incremental upside for NVDA when the bigger effect is defense of share and pricing, not a near-term reacceleration in unit demand. The adoption curve for FP4-optimized stacks will be gated by software readiness, validation cycles, and customers’ willingness to recompile workloads, so the earnings impact is likely back-end loaded. A bigger risk is that domestic accelerator ecosystems use the same quantization narrative to narrow NVIDIA’s edge in China faster than investors expect, creating a split market in which NVIDIA wins globally but loses some strategic optionality in the most policy-constrained geography.
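The cost-per-token step-down argued above can be sketched with back-of-envelope arithmetic. In the sketch below, only the ~3,500 TPS per-GPU throughput figure comes from the article; the hourly GPU cost and the baseline throughput are illustrative assumptions, not reported numbers.

```python
# Back-of-envelope serving economics: dollars per million output tokens
# for a single GPU, given an hourly GPU cost and sustained throughput.
# Hypothetical inputs are marked as assumptions in the comments below.

def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Serving cost in dollars per 1M output tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Assumed $10/hr GPU and a 1,000 TPS baseline (illustrative only),
# versus the ~3,500 TPS NVFP4 claim cited in the article.
baseline = cost_per_million_tokens(gpu_cost_per_hour=10.0, tokens_per_second=1000)
optimized = cost_per_million_tokens(gpu_cost_per_hour=10.0, tokens_per_second=3500)

print(f"baseline:  ${baseline:.2f} per 1M tokens")
print(f"optimized: ${optimized:.2f} per 1M tokens")
print(f"reduction: {1 - optimized / baseline:.0%}")
```

Under these assumed inputs, a 3.5x throughput gain cuts cost per token by roughly 70%, which is the mechanism behind the "more workloads become profitable to self-host" claim; the absolute dollar figures depend entirely on the assumed GPU price and baseline.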
Overall Sentiment: mildly positive
Sentiment Score: 0.35