Sorry, but DeepSeek didn’t really train its flagship model for $294,000

Recent reports that DeepSeek's R1 AI model was trained for a mere $294,000 are misleading, because that figure covers only the reinforcement learning phase. The underlying DeepSeek V3 base model required an additional 2.79 million GPU hours, which brings the estimated total training cost to roughly $5.87 million. The correction suggests that the compute cost of developing advanced AI models at Chinese firms such as DeepSeek is comparable to that of Western counterparts like Meta's Llama 4. This challenges the perception that Chinese developers build such models far more cheaply, and it gives a more realistic view of the investment the AI sector requires.

Analysis

The widely circulated claim that Chinese AI firm DeepSeek trained its R1 model for only $294,000 is a significant misreading of the facts. That figure covers only the post-training reinforcement learning (RL) phase, which is a small fraction of the total effort. R1 was built on top of the DeepSeek V3 base model, and training V3 required 2.79 million GPU hours on 2,048 Nvidia H800 GPUs, at an estimated cost of $5.58 million. The combined cost of developing R1 is therefore closer to $5.87 million, roughly 20 times the headline figure.

That level of spending puts DeepSeek in the same range as Western counterparts such as Meta's Llama 4 models, whose training required between 2.38 million (Maverick) and 5 million (Scout) GPU hours. DeepSeek V3 is a larger model than Llama 4 Maverick, yet it was trained on significantly fewer tokens (14.8 trillion, versus roughly 22 trillion for Maverick and 40 trillion for Scout). This reflects different trade-offs between model size, compute hours, and data volume.

These cost estimates are also conservative. They assume a GPU rental rate of $2 per GPU-hour and do not account for the far higher capital cost of purchasing the hardware (estimated at upward of $51 million). They also exclude R&D, data acquisition, and other operating expenses. The analysis undercuts the narrative of hyper-efficient, low-cost Chinese AI development. Building state-of-the-art foundation models remains a capital-intensive endeavor worldwide, dependent on hardware supply chains dominated by firms such as Nvidia and exposed to the associated geopolitical pressures.
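The cost arithmetic is simple enough to check directly. What follows is a back-of-envelope sketch, not DeepSeek's or the original reporting's own method. It assumes a flat rate of $2 per GPU-hour for all 2.79 million V3 GPU hours and takes the reported $294,000 RL cost as given. It also treats $51 million as the hardware purchase price, even though that figure is only a lower-bound estimate. The constant and function names are illustrative.

```python
"""Back-of-envelope training-cost arithmetic for DeepSeek V3 + R1.

All inputs are the rounded figures quoted in the text; the flat $2/GPU-hour
rental rate is an assumption, not a measured cost.
"""

RENTAL_RATE_USD_PER_GPU_HOUR = 2.0   # assumed flat H800 rental rate
V3_GPU_HOURS = 2_790_000             # DeepSeek V3 base-model training (rounded)
R1_RL_COST_USD = 294_000             # reported cost of R1's RL phase only
HARDWARE_PURCHASE_USD = 51_000_000   # lower-bound estimate of buying the cluster


def rental_cost(gpu_hours: float, rate: float = RENTAL_RATE_USD_PER_GPU_HOUR) -> float:
    """Cost of renting `gpu_hours` of GPU time at a flat hourly rate."""
    return gpu_hours * rate


v3_cost = rental_cost(V3_GPU_HOURS)        # base model at rental prices
total_cost = v3_cost + R1_RL_COST_USD      # base model + R1 RL phase

print(f"V3 base model:       ${v3_cost / 1e6:.2f}M")                        # $5.58M
print(f"V3 + R1 RL phase:    ${total_cost / 1e6:.2f}M")                     # $5.87M
print(f"vs. headline figure: {total_cost / R1_RL_COST_USD:.1f}x")           # 20.0x
print(f"hardware vs. rental: {HARDWARE_PURCHASE_USD / v3_cost:.1f}x")       # 9.1x
```

The last ratio shows how far a rental-based estimate can sit below the up-front cost of owning the cluster. Purchased hardware, however, can be reused well beyond a single training run.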

AllMind
