Market Impact: 0.7

Cutting-edge AI models ‘collapse’ in face of complex problems, Apple study finds

GOOGL, GOOG, AAPL
Artificial Intelligence, Technology & Innovation, Patents & Intellectual Property
Apple researchers have identified "fundamental limitations" in large reasoning models (LRMs), an advanced form of AI, reporting a "complete accuracy collapse" when the models face highly complex problems, according to a newly published paper. The study, which tested models from OpenAI, Google, Anthropic, and DeepSeek, found that LRMs reduce their reasoning effort as problems approach the complexity at which accuracy collapses, indicating a potential barrier to generalizable reasoning and raising questions about the industry's pursuit of artificial general intelligence (AGI). Experts such as Andrew Rogoyski suggest the findings signal that the industry may have reached a "cul-de-sac" with its current approach to AI development.

Analysis

Apple's recent research paper reveals 'fundamental limitations' in advanced Large Reasoning Models (LRMs), including those from prominent AI developers like OpenAI, Google, Anthropic, and DeepSeek, challenging the current trajectory of AI development. The study documented a 'complete accuracy collapse' when LRMs faced highly complex problems, such as the Tower of Hanoi puzzle, and observed a counterintuitive reduction in 'reasoning effort' as models neared this failure point, a finding Apple researchers deemed 'particularly concerning.' This behavior, which occurred even when models were provided with a solution algorithm, points to what the paper terms a 'fundamental scaling limitation' and 'fundamental barriers to generalisable reasoning.'

These findings, described by academic Gary Marcus as 'pretty devastating,' cast doubt on the prevailing assumption that scaling current Large Language Models (LLMs) offers a direct path to Artificial General Intelligence (AGI), with experts like Andrew Rogoyski suggesting the industry might be in a 'potential cul-de-sac.'

The research also highlighted inefficiencies at lower complexity levels: models wasted computing power on simpler tasks and, on moderately complex problems, tended to explore incorrect solutions before arriving at correct ones. On high-complexity challenges they failed completely.
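
For readers unfamiliar with the Tower of Hanoi puzzle cited in the study, the sketch below may help show why adding disks makes the problem harder so quickly. It uses the standard textbook recursive solution, which is not necessarily the algorithm the researchers supplied to the models; the function name `hanoi_moves` and the peg labels are illustrative assumptions.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the (disk, from_peg, to_peg) moves that solve an n-disk puzzle.

    Textbook recursion: move the n-1 smaller disks out of the way,
    move the largest disk, then move the n-1 smaller disks back on top.
    """
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )


# The shortest solution needs 2**n - 1 moves, so each extra disk
# roughly doubles the length of the answer a model has to produce.
for disks in (3, 5, 10):
    print(disks, "disks:", len(hanoi_moves(disks)), "moves")
# 3 disks: 7 moves
# 5 disks: 31 moves
# 10 disks: 1023 moves
```

Because the required number of moves doubles with every added disk, a model must carry out an exponentially longer sequence of correct steps as the puzzle grows, which gives a concrete sense of what "highly complex" means for this task.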