Market Impact: 0.25

New details on Apple-Google AI deal revealed, including Gemini changes: report

AAPL, GOOGL, GOOG
Artificial Intelligence, Technology & Innovation, Product Launches, Patents & Intellectual Property, Company Fundamentals, Antitrust & Competition

Apple gained 'complete access' to Google's Gemini models under the partnership, allowing it to perform model distillation to produce smaller, on-device models that approximate Gemini performance with much lower compute. The agreement gives Apple broad freedom to modify and run Gemini in its own data centers while its in-house Foundation Models team continues work; Apple plans to unveil major Siri AI updates at WWDC in June (contextual memory and proactive features).
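The report describes distillation only at a high level. As an illustration of the general technique (not Apple's actual pipeline), a small "student" model is typically trained to match a "teacher" model's temperature-softened output distribution; a minimal sketch of the standard distillation loss, with all function names chosen for this example:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing the teacher's relative preferences.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student
    # distributions, scaled by T^2 (the conventional formulation).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

A student whose logits already match the teacher's incurs zero loss; minimizing this loss over training data is what lets a much smaller model approximate the larger one's behavior at a fraction of the compute.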

Analysis

This deal should be evaluated as a change in where value is captured, not merely in who supplies the models. If Apple can shift latency-sensitive assistant workloads into faster, cheaper on-device execution paths, that mechanically increases per-device engagement and lowers Apple's marginal cost of delivering services, a lever that could translate into low-single-digit percentage points of incremental Services margin over 12–36 months if monetization follows through (search referrals, Maps, in-app conversion).

A non-linear second-order effect lands in the cloud compute market: any durable transfer of steady-state inference cycles away from hyperscale cloud endpoints depresses the growth runway for GPU-serving revenue tied to large-model inference. Even a 10–20% reallocation of routine assistant inference across Apple's installed base would imply low-double-digit downside to incremental cloud AI serving demand for vendors, and to the portion of Google's cloud revenue priced on usage, with most of the impact materializing over the next 12–24 months.

Competitive dynamics will push responses that reshape the playing field: rivals will either accelerate their own edge strategies or double down on server-side differentiation (multimodal capability, rapid fine-tuning). That creates a two-track market in which premium device-integrated experiences and high-throughput cloud inference coexist but compete for developer mindshare, talent, and capital, favoring integrated incumbents with silicon-plus-services stacks and leaving standalone inference middleware and mid-tier GPU vendors vulnerable.

Key risks and timing: the thesis hinges on execution (product quality, battery and thermal tradeoffs) and on regulatory scrutiny of exclusivity or preferential access; either could slow adoption by 6–18 months. Near-term catalysts are the WWDC product reveal and subsequent Services KPI commentary; reversals would come from poor UX, safety or accuracy problems, or a swift competitive cloud response that preserves server-side demand.