Market Impact: 0.55

DeepSeek may have used Google’s Gemini to train its latest model

GOOGL, GOOG, MSFT
Artificial Intelligence | Technology & Innovation | Patents & Intellectual Property | Antitrust & Competition

Chinese AI lab DeepSeek faces renewed accusations that it trained its R1 reasoning model on data extracted from competitor models, specifically Google's Gemini. The claims rest on similarities in language and internal model traces, and they follow earlier allegations that DeepSeek trained on OpenAI's ChatGPT data. Distillation, or training on outputs from other models, is not uncommon, but using OpenAI's outputs this way violates OpenAI's terms of service. AI companies are increasingly implementing security measures such as ID verification and trace summarization to prevent such practices.

Analysis

Chinese AI lab DeepSeek is facing renewed allegations about its model training practices, with AI researchers speculating that its R1 reasoning model may have been trained on outputs from Google's Gemini. The claim rests on observations by developer Sam Paech, who noted that DeepSeek's R1-0528 model favors words and expressions similar to those of Google's Gemini 2.5 Pro, and by another developer who found that its model traces 'read like Gemini traces.'

These accusations follow earlier incidents. In December, DeepSeek's V3 model reportedly identified itself as OpenAI's ChatGPT, and OpenAI later told the Financial Times it had found evidence linking DeepSeek to distillation, a technique of training an AI model on data extracted from larger models, which violates OpenAI's terms of service when applied to its outputs. Bloomberg further reported that Microsoft, an OpenAI collaborator, detected significant data exfiltration via OpenAI developer accounts in late 2024, which it suspected was linked to DeepSeek.

The proliferation of AI-generated content on the web makes it difficult to filter AI outputs from training datasets, so such similarities are not conclusive on their own. Even so, AI researcher Nathan Lambert considers it plausible that DeepSeek would use synthetic data from leading models, viewing it as 'effectively more compute' given potential GPU constraints and available capital. In response to these intellectual property concerns, major AI companies such as OpenAI, Google, and Anthropic are intensifying security measures, including ID verification for API access (the list of countries OpenAI supports excludes China) and summarizing model traces to protect their competitive advantages. The overall sentiment from the provided signals is mildly negative, reflecting concerns about IP integrity and fair competition within the rapidly evolving AI landscape.

Market Sentiment

Overall Sentiment

mildly negative

Sentiment Score

-0.35

Ticker Sentiment

GOOG: -0.20
GOOGL: -0.20
MSFT: -0.20

Key Decisions for Investors

  • Investors in AI leaders like Google (GOOGL) and Microsoft (MSFT, via its OpenAI investment) should monitor the increasing risk of intellectual property misuse and the associated costs of implementing defensive security measures, which could impact competitive moats and R&D expenditure.
  • Consider the potential for heightened regulatory scrutiny or industry-wide standards on AI model training data provenance, which may affect companies like DeepSeek and create compliance burdens or opportunities for firms specializing in ethical AI development.
  • Evaluate the long-term valuation implications for AI companies if distillation practices become widespread or difficult to control, potentially eroding the differentiation of proprietary models and intensifying competition based on access to scaled compute rather than unique training data.