DeepSeek’s distilled new R1 AI model can run on a single GPU

DeepSeek has released DeepSeek-R1-0528-Qwen3-8B, a smaller, distilled version of its R1 reasoning AI model built on Alibaba's Qwen3-8B. The model reportedly outperforms Google's Gemini 2.5 Flash on the AIME 2025 math benchmark and nearly matches Microsoft's Phi 4 reasoning model on HMMT. It needs significantly less computational power than the full-sized R1 and is available under a permissive MIT license for academic research and industrial development, which could lower the barrier to entry for AI model deployment.

Analysis

DeepSeek's DeepSeek-R1-0528-Qwen3-8B, a distilled version of its R1 reasoning AI model, is a notable development in efficient model deployment. Built on Alibaba's Qwen3-8B, the smaller model reportedly outperforms Google's Gemini 2.5 Flash on the AIME 2025 mathematics benchmark and nearly matches Microsoft's Phi 4 reasoning model on the HMMT math skills test.

The main advantage of distilled models is their much lower computational demand. The underlying Qwen3-8B requires a single GPU with 40GB-80GB of RAM (e.g., an Nvidia H100), whereas the full-sized new R1 needs around a dozen 80GB GPUs. DeepSeek produced the smaller model by fine-tuning Qwen3-8B on text generated by the larger R1.

DeepSeek-R1-0528-Qwen3-8B is intended for both academic research on reasoning models and industrial development focused on small-scale applications. It is available under a permissive MIT license, which allows commercial use without restriction, and it is already accessible through APIs from hosts such as LM Studio. The launch underscores a trend toward more specialized, accessible AI tools and intensifies competition among AI developers.
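The distillation step described above (fine-tuning a small model on text generated by a larger one) begins with collecting the larger model's outputs as training data. The Python sketch below illustrates that data-collection step only, under stated assumptions: `SFTExample`, `build_distillation_set`, `to_jsonl`, and the `fake_teacher` stand-in are hypothetical names chosen for illustration, and nothing here reflects DeepSeek's actual pipeline or data format.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SFTExample:
    """One supervised fine-tuning record: a prompt and the teacher's answer."""
    prompt: str
    completion: str


def build_distillation_set(
    prompts: Iterable[str],
    teacher_generate: Callable[[str], str],
) -> list[SFTExample]:
    """Collect teacher outputs to serve as training targets for the student."""
    examples = []
    for prompt in prompts:
        completion = teacher_generate(prompt).strip()
        if completion:  # skip empty generations
            examples.append(SFTExample(prompt, completion))
    return examples


def to_jsonl(examples: Iterable[SFTExample]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps({"prompt": ex.prompt, "completion": ex.completion})
        for ex in examples
    )


def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in for the large teacher model; a real setup
    # would call the full-sized model here instead.
    return f"<think>working through: {prompt}</think> final answer"


dataset = build_distillation_set(["What is 2 + 2?", "Factor 15."], fake_teacher)
print(to_jsonl(dataset))
```

The resulting prompt and completion pairs would then be fed to an ordinary supervised fine-tuning run on the smaller model. That training step is omitted here because the article does not describe it.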
