How far can we push AI autonomy in code generation?
Birgitta Böckeler reports on a series of
experiments we did to explore how far Generative AI can currently
be pushed toward autonomously developing high-quality, up-to-date software
without human intervention. As a test case, we created an agentic workflow
to build a simple Spring Boot application end to end. We found that the
workflow could ultimately generate these simple applications, but still
observed significant issues in the results, especially as we increased the
complexity. The model would generate features we hadn’t asked for, make
shifting assumptions about gaps in the requirements, and declare success
even when tests were failing. We concluded that while many of our
strategies, such as reusable prompts or a reference application, are
valuable for enhancing AI-assisted workflows, a human in the loop to
supervise generation remains essential.
