QCon chat: Is agentic AI killing continuous integration?

In the age of AI, will we still need continuous integration (CI) at all?
One panelist in a QCon AI conference panel on AI and engineering asked this deliberately provocative question: Will AI kill CI?
While many at the event quickly dismissed the notion that AI could go so far as to actually eliminate CI, the question resonated in the halls of the conference, held within the scholarly confines of the New York Academy of Medicine on Manhattan’s Upper East Side. It turned out to be one of the most hotly discussed topics at the event.
And many people agreed that the software development lifecycle will have to change in the era of AI.
Daniel Doubrovkine, who has worked in engineering positions at Shopify, AWS and Artsy.Net and recently took on a VP role at Microsoft, initially floated the question of whether AI would kill CI altogether during a panel.
He had recently visited operations at Meta and was surprised at how few tests the company actually runs before pushing new code to production; instead, developers run many tests locally on their laptops (“shift left”) before pushing code.
“I think AI gives us a new opportunity to rethink how we work,” he said, noting it also gives us a chance to get rid of unnecessary tasks that have built up along the way.
The pull request (PR) is the heart of a CI system, kicking off a whole series of tests on the software before it is merged into production.
But “There’s no fundamental law of the universe that says that a PR review or a code review has to happen before the code is deployed,” agreed Michael Webster, principal engineer for CI/CD service provider CircleCI, in his own talk. “There are a lot of compliance tools that say that has to happen, and those are important. But this is not a fundamental fact of software delivery.”
It doesn’t have to be this way — CircleCI’s Michael Webster (Google Gemini recreation of Webster’s slide.)
AI is breaking the software delivery lifecycle
We think of the development lifecycle as a linear series of discrete steps. “You push your code. You build, then you test, then you deploy,” Webster said.
“That model doesn’t hold up with AI,” he said.
Webster’s own QCon talk was about how AI and agentic systems are changing the software delivery lifecycle. CircleCI is a CI/CD provider, processing over a billion customer jobs annually.
CircleCI’s Michael Webster
From what CircleCI is seeing within its own customer base, the software industry is on the cusp of using a lot of headless agents, which can take on long-running tasks on a schedule or be activated via webhooks.
Headless agents do well at mechanical translations, once given a solid set of rules to work from. A well-structured repository is key.
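For illustration, here is a minimal sketch of how a webhook-activated headless agent might be wired up; the endpoint path and the run_agent_task function are hypothetical placeholders, not CircleCI’s actual agent interface.

```python
# Minimal sketch of a webhook-triggered "headless" agent entry point.
# The /agent-webhook path and run_agent_task are illustrative assumptions,
# not any real product's API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_agent_task(repo: str, ref: str) -> None:
    """Placeholder for a long-running, rule-driven agent job
    (e.g., a mechanical refactor across a well-structured repo)."""
    print(f"agent started on {repo}@{ref}")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/agent-webhook":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Kick off the agent with whatever context the webhook carries.
        run_agent_task(event.get("repo", "unknown"), event.get("ref", "main"))
        self.send_response(202)  # accepted: the agent runs asynchronously
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), WebhookHandler).serve_forever()
```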
One project at CircleCI where agents helped was bringing dark mode to the company’s own software. The design team specified the attributes required, and the agent did the laborious duty of going through all the user-facing components to make the changes.
“All in all, we’ve seen that this pairing of domain expertise plus AI is a really powerful organizational attribute, because it allows more people to contribute,” Webster said.
By Webster’s estimate, based on the GitHub Archive dataset on Google BigQuery, GitHub is now seeing hundreds of thousands of agent-related activities per week. What are they doing? Pull requests.
But an AI-fueled project can create an immense amount of code, which creates its own bottleneck.
“You have AIs pushing as much code as they are writing,” Webster said. CircleCI is also seeing this behavior with its own customers.
The problems around pull requests
On average, a code reviewer can inspect about 500 lines of code in an hour. When an agentic service can produce 1,500 lines of code every 10 minutes, there is bound to be a traffic jam.
Presentation at QCon.
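As a back-of-envelope check on that mismatch, using only the two figures cited above, the arithmetic looks roughly like this:

```python
# Back-of-envelope: how fast the review backlog grows at the rates cited above.
reviewer_loc_per_hour = 500          # one reviewer's throughput
agent_loc_per_hour = 1_500 * 6       # 1,500 lines every 10 minutes = 9,000/hour

backlog_growth = agent_loc_per_hour - reviewer_loc_per_hour
print(f"Unreviewed code grows by {backlog_growth} lines per hour")              # 8,500
print(f"Reviewers needed to keep up: {agent_loc_per_hour / reviewer_loc_per_hour:.0f}")  # 18
```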
Beyond the numbers problem, pull requests are “inefficient generally,” Webster said. By many accounts, the median time a PR spends in review ranges from 14 hours down to about three, the latter only when a single engineer relentlessly pushes one PR through.
Reviewing PRs also takes reviewers out of their flow, and the feedback they provide would have been more useful earlier in the development cycle.
Persistent technical debt accumulation is also a problem with this tsunami of PRs.
Headless agents working autonomously can work quickly, but also sloppily. The most recent DORA survey reports found the same pattern: increased velocity, but less stability.
In one paper, a group of researchers found that adopting an AI service, such as Cursor, can provide a temporary gain in code development, though the project’s velocity will soon be hampered by “static analysis warnings and code complexity.”
And by his own calculations, Webster estimated that any gains from AI-generated code stop mattering once AI becomes 75% faster overall than human coders; beyond that point, delivery is limited by the rest of the process.
“If you’re not able to correspondingly speed up your delivery compared to AI, it’s all going to be washed out by all of the delays in the process,” Webster said.
In other words, “the reality is, even if you did have AI going as fast as you wanted to, you as an organization, and the objective that you’re trying to achieve, couldn’t go faster even if you wanted to.”
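A rough way to see the point, with a toy model of our own rather than Webster’s actual math: treat delivery time as coding time plus fixed process delays, and the overall speedup flattens out no matter how fast the coding gets.

```python
# Toy model (not Webster's exact calculation): end-to-end delivery time is
# coding time plus process overhead (reviews, queues, pipelines). As coding
# gets faster, the fixed overhead dominates and total speedup flattens out.
def delivery_hours(coding_hours: float, process_hours: float, ai_speedup: float) -> float:
    return coding_hours / ai_speedup + process_hours

baseline = delivery_hours(coding_hours=8, process_hours=16, ai_speedup=1)
for speedup in (1.75, 4, 10, 100):
    t = delivery_hours(8, 16, speedup)
    print(f"{speedup:>6.2f}x faster coding -> {baseline / t:.2f}x faster delivery")
```

With the illustrative 8 hours of coding and 16 hours of process overhead assumed here, even 100x faster coding yields less than a 1.5x faster delivery.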
There are things you can do that will help, such as optimizing pipelines, rewriting scripts, parallelizing tests and improving code reviews.
Agentic activity on the part of CircleCI customers.
AI-generated code requires more nimble testing
But perhaps the best answer is to rethink the testing and validation process to let agents do as much of the work as possible.
“If you have a way to validate the AI, you can let it run as fast as possible,” Webster said.
Develop a set of tests such that, if the code passes them, it can go to production. As others have pointed out, failure is a data set that AI itself can use to fine-tune its own process.
Thorough unit tests are good for this, though they are limited in scalability (to about 10x the human-driven workload, Webster estimated).
A better approach is test impact analysis, which speeds testing through incremental validation: a dependency graph highlights which tests are affected by a change, and only those are run. CircleCI applied it to its own monolithic user interface application and cut test time from 30 minutes down to 1.5 minutes.
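As a rough illustration of the idea (the dependency graph and file names below are invented, not CircleCI’s implementation), test impact analysis boils down to selecting only the tests whose dependency closure touches a changed file:

```python
# Minimal sketch of test impact analysis: given a file-level dependency graph,
# run only the tests whose dependency closure touches a changed file.
from collections import deque

# test or source file -> files it depends on (illustrative data)
deps = {
    "tests/test_billing.py": ["src/billing.py"],
    "tests/test_ui_theme.py": ["src/theme.py", "src/components.py"],
    "src/billing.py": ["src/db.py"],
    "src/components.py": ["src/theme.py"],
}

def impacted_tests(changed_files: set[str]) -> set[str]:
    selected = set()
    tests = ((t, d) for t, d in deps.items() if t.startswith("tests/"))
    for test, roots in tests:
        queue, seen = deque(roots), set(roots)
        while queue:
            f = queue.popleft()
            if f in changed_files:
                selected.add(test)
                break
            for dep in deps.get(f, []):
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
    return selected

print(impacted_tests({"src/theme.py"}))   # only the UI theme test needs to run
```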
“What this means is we can take an AI agent, have it work as fast as we’re willing to spend money on the tokens, and give it a tool to run only the tests that it needs to run on the changes that it needs,” Webster said.
Such an operation can be easily run from within a container or a laptop.
The principle of selective attention can also apply to code review. “Not all code has the same level of risk,” Webster said. “Here is where you can prune back review to just the changes that matter.”
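A minimal sketch of what such risk-based pruning could look like follows; the paths and threshold are illustrative assumptions, not anything CircleCI described.

```python
# Illustrative risk heuristic: flag only the changed files that warrant a human
# review; everything else rides through on automated checks alone.
HIGH_RISK_PATHS = ("src/auth/", "src/payments/", "migrations/")

def needs_human_review(path: str, lines_changed: int) -> bool:
    return path.startswith(HIGH_RISK_PATHS) or lines_changed > 400

changes = {"src/auth/session.py": 12, "src/theme.py": 35, "src/report.py": 650}
to_review = [p for p, n in changes.items() if needs_human_review(p, n)]
print(to_review)   # ['src/auth/session.py', 'src/report.py']
```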
CircleCI has built its own agent, called Chunk, which customers can run to streamline their own testing processes.
Future build systems will be less linear
Future engineers will be worrying less about the code and more about supporting the AI in its relentless pursuit of generating more code, Webster predicted. So tasks like fixing flaky tests will become the first priority, and can be automated as well.
Instead of this linear pipeline, we will need to build systems where all the required tests take place somewhere in the process.
“Instead of having a linear Yes/No, we combine these things into a single gate, where all we do is keep track of what has occurred,” Webster said. If a test passes, the code should be moved to production. “Everything else besides that is us concerned about other things.”
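One way to picture such a gate (an illustrative sketch, not Webster’s design): checks report their results in any order, and the change promotes as soon as every required check has passed.

```python
# Illustrative sketch of a non-linear "single gate": checks report results in
# any order, and the change promotes once every required check has passed.
REQUIRED_CHECKS = {"unit_tests", "impact_tests", "security_scan", "risk_review"}

class ReleaseGate:
    def __init__(self):
        self.results: dict[str, bool] = {}

    def record(self, check: str, passed: bool) -> None:
        self.results[check] = passed
        if self.ready():
            print("all required checks passed -> promote to production")

    def ready(self) -> bool:
        return all(self.results.get(c) for c in REQUIRED_CHECKS)

gate = ReleaseGate()
gate.record("impact_tests", True)       # checks can arrive in any order
gate.record("security_scan", True)
gate.record("unit_tests", True)
gate.record("risk_review", True)        # the last result in triggers promotion
```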
With AI, “more effort and energy is likely going to be spent in this testing and evaluation, and less on thinking about the specific designs of low-level details of our services.”
Full access to these QCon AI talks, and others, can now be purchased through a video-only pass.
