Ai2 makes building custom coding agents easier and cheaper

The Allen Institute for AI (Ai2) is launching a new family of open coding agents today that, as standalone models, outperform similar-sized models on standard benchmarks. But what makes this project stand out is that Ai2 is also open sourcing a collection of tools that lets anyone fine-tune the models on their own private codebases, documentation, and other materials for significantly better performance on domain-specific tasks.
“Over the past year, coding agents have transformed how developers write, test, and maintain software. These systems can debug, refactor, and even submit pull requests — fundamentally changing what software development looks like,” Ai2 writes in today’s announcement. “Yet despite this progress, most coding agents share the same constraints: They’re closed, expensive to train, and difficult to study or adapt to private codebases.”
The cost of doing so? $400 to replicate Ai2’s results and just over $2,000 for the best performance. Comparable approaches, Ai2 notes, can cost up to 11 times more. To train its SERA (Soft-verified Efficient Repository Agents) models, the first in Ai2’s Open Coding Agents collection, the team used just two Nvidia H100 GPUs.
Two models, full recipes included
Ai2 is launching two models under the SERA moniker: SERA-32B, which bests comparable models like Qwen3-Coder and Mistral’s Devstral Small 2, and the smaller SERA-8B.
SWE-Bench Verified is a benchmark that tests whether AI coding agents can resolve real-world GitHub issues drawn from a subset of popular Python repositories. SERA-8B solves only 29.4% of SWE-Bench Verified problems, but that’s still well above what similar-sized open models achieve; SERA-32B solves 55% of them.
Ai2 is releasing the models, the code, all generated agent data, and the full recipe so any team can generate its own data.
One of the interesting results of the team’s research was that the smaller fine-tuned model would often match, and at times exceed, the performance of its larger “teacher” coding agent. A 32B model, when fine-tuned on a company’s codebase, often outperforms the 100B-parameter teacher it was distilled from.
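To make that distillation step concrete, here is a minimal sketch of the kind of supervised fine-tune the released recipe enables, written with Hugging Face’s trl library. The model ID, file name, and hyperparameters are placeholders, not Ai2’s published configuration.

```python
# A minimal sketch of fine-tuning a smaller "student" model on agent
# trajectories distilled from a larger teacher, using Hugging Face's trl.
# The model ID, file name, and hyperparameters below are placeholders,
# not Ai2's published recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# trajectories.jsonl: one JSON object per line with a "text" field holding
# a rendered agent trajectory generated against your private repositories.
dataset = load_dataset("json", data_files="trajectories.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",  # placeholder student; any causal LM works
    train_dataset=dataset,
    args=SFTConfig(output_dir="sera-custom", num_train_epochs=2),
)
trainer.train()
```

Because the trajectories come from the teacher agent working inside your own repositories, the student absorbs project-specific conventions, which is how a fine-tuned 32B student can end up beating its much larger teacher on that codebase.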
How Ai2 cut training costs
At the core of Ai2’s efforts to keep the models both performant and affordable are two innovations, the team explains.
The first is soft-verified generation (SVG). As Ai2 notes, the traditional approach to creating synthetic training data for these kinds of models is to generate pairs of incorrect code and its corrected version, with each pair fully verified by tests. Counterintuitively, Ai2 found that training sets containing solutions that are only partially correct still produced models that generate fully correct code.
Building the traditional set of “hard-verified” incorrect/corrected code pairs requires thorough, compute-intensive test execution. Ai2’s results show that rigor isn’t necessary.
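As a rough illustration of the soft-verification idea, the sketch below keeps a generated trajectory if its patch passes at least half of the tests it touches, rather than demanding a perfect score. The `Trajectory` fields, the `keep_soft_verified` name, and the 0.5 threshold are assumptions for illustration, not Ai2’s actual pipeline.

```python
# Illustrative sketch of soft verification: accept partially correct
# trajectories instead of only fully passing ones. Field names and the
# threshold are assumptions, not Ai2's implementation.
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt: str        # bug-injection prompt given to the agent
    patch: str         # the agent's proposed fix
    tests_passed: int  # tests passing after the patch is applied
    tests_total: int   # tests exercised for this function

def keep_soft_verified(trajs: list[Trajectory], min_pass: float = 0.5) -> list[Trajectory]:
    """Keep trajectories whose patch passes at least `min_pass` of its tests.

    Hard verification would demand tests_passed == tests_total for every
    pair; soft verification tolerates partially correct fixes, which cuts
    the amount of test execution needed to build the training set.
    """
    return [
        t for t in trajs
        if t.tests_total and t.tests_passed / t.tests_total >= min_pass
    ]
```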
The second innovation is how Ai2 diversified the training dataset: the team created a taxonomy of 51 bug patterns, and its tools then generate bug-injection prompts for each function in a repository, yielding what the team calls “thousands of varied agentic trajectories at low cost.”
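Here is a hypothetical sketch of how such a taxonomy fans out into many prompts: cross every function in a repository with every bug pattern. The three patterns and the prompt wording below are invented for illustration; Ai2’s taxonomy spans 51 patterns and its own phrasing.

```python
# Hypothetical sketch: generate one bug-injection prompt per
# (function, bug pattern) pair. The patterns and wording are invented;
# Ai2's taxonomy has 51 patterns.
import ast
from itertools import product
from pathlib import Path

BUG_PATTERNS = [
    "off-by-one error in a loop bound",
    "inverted boolean condition",
    "missing None check before attribute access",
]

def function_names(repo: Path) -> list[str]:
    """Collect the name of every function defined in the repo's Python files."""
    names: list[str] = []
    for file in repo.rglob("*.py"):
        tree = ast.parse(file.read_text(encoding="utf-8"))
        names += [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return names

def bug_prompts(repo: Path) -> list[str]:
    """Build one bug-injection prompt per (function, bug pattern) pair."""
    return [
        f"Introduce a {pattern} into `{fn}`, then repair it and run the tests."
        for fn, pattern in product(function_names(repo), BUG_PATTERNS)
    ]
```

Crossing every function with every pattern is what turns even a modest repository into thousands of distinct agentic trajectories.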
As it turns out, training on realistic developer workflows matters more than perfectly verified code pairs.
“We believe bringing the cost of replicating strong coding agents down to a few hundred dollars will unlock research that simply wasn’t possible before,” the Ai2 team writes. “Instead of being limited to a handful of well-funded labs, agentic coding can become a widely accessible direction for small teams, students, and independent developers.”
