Apple researchers have released a paper questioning the reliability of AI reasoning models from companies such as OpenAI, Google, and Anthropic. The paper finds that the models' accuracy declines significantly as problem complexity increases, ending in a 'complete collapse' in performance. The study, which tested the models on puzzles such as the Tower of Hanoi, also found that the models reduce their reasoning effort as problems become more complex, even when they are given the correct algorithm. The research surfaces as Apple faces scrutiny for trailing competitors in AI development; its Apple Intelligence service, which uses ChatGPT, has received a lukewarm reception.
Apple researchers have published a paper highlighting significant limitations in leading artificial intelligence reasoning models from major competitors including OpenAI, DeepSeek, Anthropic, and Alphabet’s Google. The study used controllable puzzle environments such as the Tower of Hanoi rather than standard benchmarks. It found that the models' accuracy declines progressively as problem complexity increases, ending in a "complete collapse," that is, zero accuracy. The research also indicated that the models tend to reduce their reasoning effort, measured in inference-time tokens, as task complexity rises. This phenomenon, described as a "quitter's mentality" or "laziness," was reportedly most pronounced in OpenAI's o3-mini variants and less severe in Anthropic’s Claude 3.7 Sonnet. The reduction in effort occurred even when the models were operating well below their generation length limits. Crucially, performance did not improve even when the models were given the correct problem-solving algorithm. The research surfaces as Apple is perceived to be lagging in the AI race; its recently launched Apple Intelligence service, which incorporates ChatGPT, has received lackluster reviews. The paper circulated ahead of an Apple developers conference. That timing, together with a generally "defensive" tone and "moderately negative" sentiment surrounding the news, suggests Apple may be attempting to reframe the AI narrative or to differentiate its own forthcoming AI strategy.
Overall Sentiment: moderately negative
Sentiment Score: -0.50
Ticker Sentiment