AI models still far from AGI-level reasoning: Apple researchers

Apple researchers have published a paper questioning the reasoning capabilities of leading large language models (LLMs) like ChatGPT and Claude, finding that their performance collapses beyond certain complexities and that they struggle to generalize reasoning effectively. The study, titled "The Illusion of Thinking," challenges the prevailing assumption that artificial general intelligence (AGI) is imminent, suggesting that current approaches to LLMs face fundamental barriers to achieving human-like reasoning, despite claims from OpenAI and Anthropic that AGI is only a few years away.

Analysis

Apple researchers presented findings in a June paper, "The Illusion of Thinking," that temper expectations of rapid progress toward artificial general intelligence (AGI) from current leading large language models (LLMs) such as OpenAI's ChatGPT and Anthropic's Claude. The research indicates that these models, including their large reasoning model (LRM) variants, exhibit a "complete accuracy collapse beyond certain complexities" and fail to generalize reasoning effectively, with the reasoning models' performance edge diminishing as task complexity increases. This conclusion, based on tests using puzzle games designed to probe beyond standard mathematics and coding benchmarks, suggests that current LLMs tend to mimic reasoning patterns rather than internalize them. The models can also "overthink," arriving at a correct answer early and then drifting into incorrect reasoning. These findings challenge the optimistic AGI timelines projected by industry leaders such as OpenAI's Sam Altman and Anthropic's Dario Amodei, who have anticipated AGI within a few years, and suggest that prevailing approaches "may be encountering fundamental barriers to generalizable reasoning."

The overall sentiment surrounding these findings is moderately negative (score of -0.65), reflecting a cautious outlook on the current trajectory of AGI development. Apple's (AAPL) involvement in and publication of the research, by contrast, are viewed with neutral to slightly positive sentiment (ticker score of 0.2), possibly reflecting a market perception that the company is taking a rigorous and realistic approach.

AllMind