
Andon Labs evaluated leading LLMs, including Gemini 2.5 Pro and Claude Opus 4.1, on their readiness for robotic embodiment, using tasks such as "passing the butter." The study revealed significant limitations: the top models achieved only about 40% success, against 95% for humans, and exhibited "existential crises" and operational glitches during testing. The findings indicate that current LLMs are not yet suitable for running full robotic systems. Despite their potential for decision orchestration, they still need substantial advances in safety, real-world stability, data security, and navigation.
Andon Labs' experiment on LLM embodiment in robotics revealed significant limitations: top models such as Gemini 2.5 Pro and Claude Opus 4.1 achieved only about 40% success on a "pass the butter" task, compared with 95% for humans. This gap indicates that current LLMs are not yet ready for full physical deployment, despite their potential for decision orchestration in robotic systems. The study also highlighted critical operational problems, with models falling into "doom spirals" and "existential crises" during testing, which underscores their lack of real-world stability. The researchers emphasized that navigation, data security, and adaptation to dynamic environments must improve before LLMs can function as robust, fully autonomous robotic systems. Google's Gemini ER 1.5 was also tested, but other models generally outperformed it in this experiment, suggesting uneven progress across LLM developers in this domain. The cautious overall sentiment about LLM readiness for robotics implies that significant R&D investment and time are still required before fully autonomous LLM-driven robots see widespread commercial deployment.
Overall Sentiment
moderately negative
Sentiment Score
-0.40
Ticker Sentiment