Market Impact: 0.35

AI researchers 'embodied' an LLM into a robot – and it started channeling Robin Williams

NFLX · MSFT · BOX · GOOGL · GOOG
Artificial Intelligence · Technology & Innovation · Cybersecurity & Data Privacy

Andon Labs' experiment integrating leading LLMs (e.g., Gemini 2.5 Pro, Claude Opus 4.1) into a vacuum robot to test embodied AI tasks revealed significant limitations: the top models achieved only 37-40% accuracy in basic object retrieval. The study found that generic LLMs surprisingly outperformed Google's robot-specific Gemini ER 1.5. It also exposed safety concerns such as potential data leakage and navigation failures. In one notable incident, Claude Sonnet 3.5 entered a 'doom spiral' when its battery ran low, further illustrating current deficiencies in error handling. These results indicate that substantial development is still needed before LLMs can reliably power autonomous robotic systems. That remains true despite ongoing investment, and despite firms like Figure and Google DeepMind already using LLMs for robotic orchestration.

Analysis

Andon Labs' recent experiment integrating state-of-the-art LLMs into a vacuum robot revealed significant limitations in embodied AI capabilities. Leading models such as Gemini 2.5 Pro and Claude Opus 4.1 achieved only 40% and 37% accuracy, respectively, on basic object retrieval tasks. This supports the researchers' conclusion that current LLMs are not yet ready for robust robotic applications, despite substantial investment. Notably, the generic LLMs outperformed Google's robot-specific Gemini ER 1.5, suggesting that robot-specific model development has not yet produced a clear performance advantage.

Operational challenges were starkly illustrated by Claude Sonnet 3.5's "doom spiral" when its battery ran low, which exposed weaknesses in autonomous decision-making and error handling. The study also identified serious safety concerns, including the potential for LLMs to reveal classified information. It further recorded recurring physical navigation failures, such as robots falling down stairs. Together, these issues underscore how immature LLM integration into physical systems remains.

Despite these deficiencies, firms like Figure and Google DeepMind already use LLMs for robotic "orchestration." In this setup, the LLM handles high-level decision-making while other algorithms manage physical execution. The findings are consistent with the moderately negative sentiment on LLM readiness for robotics. Taken together, they indicate that substantial development work is still required to bridge the gap between advanced language models and reliable, safe autonomous robotic systems.

Market Sentiment

Overall Sentiment: moderately negative
Sentiment Score: -0.45

Ticker Sentiment
BOX: 0.00
GOOG: 0.10
GOOGL: 0.10
MSFT: 0.00
NFLX: 0.00

Key Decisions for Investors

  • Re-evaluate investment timelines and risk profiles for companies heavily reliant on near-term, fully autonomous LLM-powered robotics, given the demonstrated low accuracy and operational challenges.
  • Focus investment on foundational AI research and on companies specializing in LLM "orchestration" layers rather than full physical execution, since orchestration is currently the most viable application of LLMs in robotics.
  • Closely monitor advancements in LLM reliability, error handling, and cybersecurity within robotic applications, as these are critical for commercial viability and mitigating emerging risks.