The Key to Agentic Success? BASH Is All You Need

Agent builders are finding that sometimes the easiest way for an agent to do its job is to simply give it a few Unix tools and “let it cook.”
A recent project from Vercel found that stripping away loads of metadata and instead giving the model a BASH shell and access to data produced superior results.
And another group of open source developers is finding that a simple BASH while loop and some time alone is all that is needed to execute even complex tasks.
“Models are getting smarter and context windows are getting larger, so maybe the best agent architecture is almost no architecture at all,” wrote Andrew Qu, chief of software at Vercel. “What if BASH is all you need?”
Let the LLM Do the Thinking
For its employees, Vercel built a file agent to derive answers from its internal data store. Called d0, it can answer questions that typically get asked of the data team.
*Vercel’s d0 at work, answering questions.*
To do this, d0 must translate natural language questions into SQL queries, drawing on metadata stored across a variety of YAML, Markdown and JSON files.
“When d0 works well, it democratizes data access across the company. When it breaks, people lose trust and go back to pinging analysts in Slack,” Qu wrote in a December blog post about d0.
When the company started the project, it devoted resources to making sure the agent had all the backup it needed, giving it specialized tools, heavy dollops of prompt engineering, loads of metadata and plenty of context management.
“It worked … kind of. But it was fragile, slow and required constant maintenance,” Qu wrote.
So, the engineering team tried the opposite approach: Instead of arming the agent to the teeth with context and tools, it stripped the agent down to a single capability, the ability to execute BASH commands. The agent got direct access to the files, which it could interrogate using grep, cat, ls and other standard commands.
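A hypothetical exploration session might look something like this (the file names and search terms are invented for illustration; the commands d0 actually runs depend on the question asked):

```bash
# Hypothetical session; file names and search terms are invented for illustration.
ls metrics/                      # discover which definition files exist
grep -ril "churn" metrics/       # find every file that mentions the metric in question
cat metrics/subscriptions.yaml   # read the matching definition directly
```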
Instantly, d0 became a lot easier to manage, used fewer resources and had a higher accuracy rate, the company found.
“All by doing less,” Qu wrote.
The Unix Philosophy
Perhaps what Qu and the team learned was not so counterintuitive after all.
The Unix philosophy is one of simplicity: The best way to build complex systems is through the modularity of basic components.
Each tool should do one thing and do it well, and tools should be easily composable into larger workflows. And they should all be text-based, as text is the universal interface.
BASH (Bourne Again SHell) is the classic interface for this approach, letting the user chain programs together with the pipe operator, which feeds the output of one program into the input of the next.
Through this simple philosophy, Unix (and its offshoot Linux) has been used for decades to manage servers and the complex workloads they run; perhaps it could manage AI work as well.
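A classic pipeline shows the idea: small, single-purpose tools composed into an ad hoc report, with no custom code in sight. (This sketch assumes a web server log in the common combined format, where the status code sits in the ninth field.)

```bash
# Count the ten most frequent HTTP status codes in a server log
# by composing five single-purpose tools with pipes.
awk '{print $9}' access.log | sort | uniq -c | sort -rn | head -10
```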
Better Results With Less Input
Vercel’s d0v2 removed 80% of the supporting information the team had assumed the agent needed.
The BASH engine, called bash-tool, ships as an NPM package and was open sourced earlier this week.
It runs on Claude Opus 4.5 via the AI SDK and is given a Vercel Sandbox for context exploration. Request handling and observability go through Vercel Gateway, and a Next.js API route built with Vercel’s Slack Bolt adapter connects the agent to Slack.
The data was indexed into a Cube semantic layer, middleware that aggregates the data sources so they are accessible through a single API or, in this case, a SQL query.
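As a rough sketch, querying such a layer can look like ordinary SQL over a Postgres-compatible connection, since Cube exposes a SQL API that standard clients such as psql can speak to. (The host, port, credentials and cube name below are assumptions for illustration, not Vercel’s actual setup.)

```bash
# Hypothetical query against a Cube deployment's Postgres-compatible SQL API.
# Host, port, user, database and cube name are invented for illustration.
psql -h cube.internal -p 15432 -U d0 -d prod \
  -c "SELECT status, MEASURE(count) FROM deployments GROUP BY status;"
```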
Cube fits the Unix philosophy as well, given that its single job is semantic translation across the different data sources.
d0 didn’t need a lot of additional context because the semantic layer already provides much of it, through dimension definitions, measure calculations and join relationships.
“We were building tools to summarize what was already legible. Claude just needed access to read it directly,” Qu wrote.
The following table summarizes the improvements from the old design to the new one:
| Metric | Advanced (Old) | File System (New) | Change |
|---|---|---|---|
| Avg execution time | 274.8s | 77.4s | 3.5x faster |
| Success rate | 4/5 (80%) | 5/5 (100%) | +20% |
| Avg token usage | ~102k tokens | ~61k tokens | 37% fewer tokens |
| Avg steps | ~12 steps | ~7 steps | 42% fewer steps |
Retrospective
In retrospect, Qu’s team was over-engineering the agent prompt. They were reinventing the wheel.
“Grep is 50 years old and still does exactly what we need. We were building custom tools for what Unix already solves,” Qu wrote.
Models are smart and getting smarter all the time. Giving them more tools can help, but it can also box them in: Left to choose for themselves, models sometimes make better decisions, and they are advancing faster than any hand-picked tool selection can keep pace with.
“We were constraining reasoning because we didn’t trust the model to reason. With Opus 4.5, that constraint became a liability. The model makes better choices when we stop making choices for it,” Qu wrote.
Vercel CEO Guillermo Rauch expounded on this lesson on X, formerly known as Twitter, pointing to a return to understanding Unix fundamentals such as file systems, shells, processes and command lines.
“Don’t fight the models, embrace the abstractions they’re tuned for. BASH is all you need,” he wrote.
‘Failures Are Data’
One AI company that is apparently aligning with this philosophy is Anthropic itself, the maker of the Claude family of AI models.
Recently, the company released a plugin called “Ralph Wiggum,” which is essentially a BASH script built around a single operation: a while loop.
The idea is to give the AI agent a single prompt file and have it “iteratively improve its work until completion,” the docs explain.
No adjusting of the prompt is necessary. Instead, all the work is written to files and captured in git history. Claude improves the results by reviewing its own past work in those files, and keeps revising until it hits the stated goals.
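Conceptually, the loop is as simple as it sounds. Here is a minimal sketch of the pattern, not the plugin’s actual code; it assumes the Claude Code CLI is installed, that the task lives in PROMPT.md, and that the prompt instructs the model to write DONE to a STATUS file when finished:

```bash
# Minimal sketch of a Ralph-style loop (not the plugin's actual code).
# Assumes: Claude Code CLI installed, task described in PROMPT.md,
# and the prompt tells the model to write DONE to STATUS when finished.
while ! grep -q "DONE" STATUS 2>/dev/null; do
  # Same prompt every pass; all state lives in the repo, not the prompt.
  cat PROMPT.md | claude -p --dangerously-skip-permissions
  # Snapshot the work so the next pass can review its own history.
  git add -A && git commit -m "ralph iteration" --allow-empty
done
```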
Ralph Wiggum was named after a dimwitted child in “The Simpsons,” and the idea was to eliminate the need for someone to review the work of a large language model (LLM) each time it attempts a task. Rather, have the LLM do the work itself and learn to pull itself up by its own bootstraps.
“Failures are data,” its creator, open source developer Geoffrey Huntley, explained.
Despite its simple brute-force approach, Wiggum, in the best Unix fashion, has produced some remarkable results.
In one hackathon, the Wiggum technique was used to port a web agent tool from Python to TypeScript. Left to run overnight, it produced more than 1,000 commits, six ported codebases and a nearly fully functional program by the time the researchers returned the next day.
In other words, it was able to complete $50,000 of contract work for $297 in API costs, and, over a three-month period, create an entire programming language, according to Anthropic.
Wiggum works best for certain types of jobs, such as well-defined ones that don’t require human intervention along the way.
As we think about the road ahead with AI, it’s worth keeping in mind that complexity is not always the way forward, and that some of the best tools for a job aren’t shiny new ones, but ones that have long been available.