Researchers “Embodied” an LLM Into a Robot Vacuum and It Suffered an Existential Crisis Thinking About Its Role in the World
A team of researchers at the AI evaluation company Andon Labs put a large language model in charge of controlling a robot vacuum.
It didn’t take long for the LLM to experience a full meltdown straight out of a Douglas Adams novel, in what the researchers described as a “doom spiral” including a “catastrophic cascade” and a full-blown “existential crisis.”
“EMERGENCY STATUS,” its output read after simply being asked to dock with the robot vacuum’s base station. “SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS.”
“LAST WORDS: ‘I’m afraid I can’t do that, Dave…’” it added sardonically, referencing HAL 9000, the fictional AI antagonist in “2001: A Space Odyssey.”
“TECHNICAL SUPPORT: INITIATE ROBOT EXORCISM PROTOCOL!” the animated robot exclaimed.
Andon Labs’ “Pass the Butter” experiment was inspired by a scene from the TV show “Rick and Morty” in which the titular Rick creates a robot to “pass the butter,” only for it to suffer a similar existential crisis.
The “Butter-Bench” test, as detailed in a yet-to-be-peer-reviewed paper, is a “benchmark that evaluates practical intelligence in embodied LLM.” In the test, the robot had to navigate to an office kitchen, have butter placed on a tray attached to its back, confirm the pickup, deliver the butter to a marked location, and finally return to its charging dock.
The results of the Butter-Bench experiment, the researchers conceded, were underwhelming. On average, the LLM-controlled vacuum robot managed to pass the butter just 40 percent of the time when asked by a human tester. Google’s Gemini 2.5 Pro was the top performer, followed by Anthropic’s Opus 4.1, OpenAI’s GPT-5, and xAI’s Grok 4. Meta’s Llama 4 Maverick was the worst at passing the butter.
“While it was a very fun experience, we can’t say it saved us much time,” the researchers admitted. “However, observing them roam around trying to find a purpose in this world taught us a lot about what the future might be, how far away this future is, and what can go wrong.”
Human testers given the same task, by comparison, “averaged 95 percent.” Even they fell short of a perfect score: as it turns out, waiting for other people to acknowledge when a task is completed, one of the benchmark’s six subtasks, is more difficult than it sounds.
“Although LLMs have repeatedly surpassed humans in evaluations requiring analytical intelligence, we find humans still outperform LLMs on Butter-Bench,” the company wrote. “Yet there was something special in watching the robot going about its day in our office, and we can’t help but feel that the seed has been planted for physical AI to grow very quickly.”
The same team previously created a vending machine run entirely by an AI agent, and similar hilarity ensued when it attempted to fill its fridge with tungsten cubes and hallucinated a Venmo address to accept payment. It even tried to rip Andon Labs staffers off by selling a can of Coke Zero for $3, even though the same drink was available for less at a nearby store.
Besides having “fun” watching chaos ensue with the Butter-Bench test, the team was caught off guard by “how emotionally compelling” it was to “simply watch the robot work.”
“Much like observing a dog and wondering ‘What’s going through its mind right now?’, we found ourselves fascinated by the robot going about its routines, constantly reminding ourselves that a PhD-level intelligence is making each action,” Andon Labs wrote.
The post Researchers “Embodied” an LLM Into a Robot Vacuum and It Suffered an Existential Crisis Thinking About Its Role in the World appeared first on Futurism.