What is an agent loop? How AI loops help dev and QA

By qtrl Team · Engineering

Watch a developer debug something they don't fully understand. They make a change, run it, look at what happened, and change something else based on what they saw. Nobody writes out all the steps in advance. The next move depends on the result of the last one. That back-and-forth, repeated until the thing works, is a loop. It's also, more or less, the whole idea behind AI agents.

"Agent loop" gets used a lot and explained rarely. It's worth slowing down on, because once you see it, a lot of what's happening in AI for product development and QA stops looking like magic and starts looking like one simple pattern running fast.

What an agent loop actually is

An agent loop is the cycle an AI agent runs to get something done: it observes the current state, decides on the next action, takes that action, then checks the result and goes again. A plain language model answers once and stops. An agent keeps going, feeding each result back in as the starting point for the next decision, until it reaches the goal or hits a stop condition.

Figure: The agent loop, which repeats until done. Observe (read the current state), Decide (pick the next action), Act (click, type, call a tool), Check (did it do what it should?). Every agentic test run is this loop, spinning fast against a real app.
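To make the cycle concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular agent framework does it: `run_agent_loop`, `LoopResult`, and the toy counter environment are made-up names, and a plain function stands in for the model at the decide step.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    done: bool
    steps: int
    history: list = field(default_factory=list)


def run_agent_loop(observe, decide, act, check, max_steps=10):
    """Run observe -> decide -> act -> check until check passes or steps run out."""
    history = []
    for step in range(1, max_steps + 1):
        state = observe()                # read the current state
        action = decide(state, history)  # pick the next action from what was seen
        act(action)                      # take the action
        history.append((state, action))  # feed the result into the next decision
        if check():                      # did it reach the goal?
            return LoopResult(True, step, history)
    # Stop condition: the step limit was reached without meeting the goal.
    return LoopResult(False, max_steps, history)


# Toy environment (hypothetical): the "agent" must raise a counter to a target.
counter = {"value": 0}
TARGET = 3


def act(action):
    if action == "increment":
        counter["value"] += 1


result = run_agent_loop(
    observe=lambda: counter["value"],
    decide=lambda state, history: "increment" if state < TARGET else "wait",
    act=act,
    check=lambda: counter["value"] == TARGET,
)
print(result.done, result.steps)
```

In a real agent the `decide` function would be a model call and `act` would drive a browser or a tool. The shape of the loop stays the same.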

The shape isn't new. Military strategist John Boyd described the OODA loop (observe, orient, decide, act) decades ago, and control systems have run feedback loops far longer than that. What's new is the thing sitting in the middle making the decision. Put a capable model at the decide step, give it tools it can call at the act step, and the loop can handle messy, open-ended work that you'd normally have to script by hand.

The research that kicked this off is worth knowing by name. The ReAct paper (Yao et al., 2022) showed that interleaving reasoning with actions (think a little, do a little, observe, repeat) beat approaches that tried to reason everything out up front. Anthropic's write-up on building effective agents makes the same point from an engineering angle: an agent is a model running in a loop, using tools and adjusting based on what comes back. Strip away the framing and that's the core of it.

The loop in product development

Product development already runs on a loop, even without any AI in it. You build something, ship it, watch how it behaves, and use what you learn to decide what to build next. The faster that loop turns, the faster you find out whether you were right. Years of DORA research on engineering performance keep landing on the same theme: the teams that ship reliably are the ones with short, tight feedback loops, not the ones with the most process.

The slow part of that loop is usually the gap between writing code and knowing if it works. You make a change, and then you wait. Wait for a build, wait for someone to click through it, wait for a test run that may or may not still be relevant. An agent loop compresses that gap. Instead of waiting for a person to go exercise the new flow, an agent observes the running app, tries the path, and reports back what it found, in the time it takes to get a coffee.

This matters more now that so much code is written by AI in the first place. When a team ships features faster than anyone can read them line by line, the build-test-fix loop is the only thing standing between "it compiled" and "it works." We've written about how AI coding tools quietly broke a lot of test suites; the fix is a faster, smarter verification loop, not slower shipping.

The loop in QA

Agentic testing is the agent loop pointed at a browser. The agent observes the page, decides what a real user would do next, clicks or types, then checks whether the result matches the intent. No pre-written selector for every step. The next action comes from what's actually on the screen right now. That's the difference between agentic testing and a recorded script, and it's the same difference autonomous testing is built on.

The loop is also why self-healing tests work at all. When a button moves or a class gets renamed, a scripted test fails on the spot because it was following exact directions. An agent in a loop observes the new layout, notices the button it wanted is now somewhere else, and adjusts. It's not repairing a broken script. It just never depended on the old position to begin with.
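The difference between following exact directions and observing the page can be sketched in a few lines of Python. Everything here is hypothetical: the page snapshots, the element fields, and both lookup functions are invented for illustration, and a simple keyword match stands in for the model's judgment about which element fits the intent.

```python
# Two snapshots of the same page, before and after a redesign.
# The element structure is made up for this example.
PAGE_V1 = [
    {"selector": "#checkout-btn", "role": "button", "label": "Checkout"},
    {"selector": "#cart-link", "role": "link", "label": "Cart"},
]
PAGE_V2 = [
    {"selector": "#cart-link", "role": "link", "label": "Cart"},
    {"selector": ".btn-primary-v2", "role": "button", "label": "Proceed to checkout"},
]


def scripted_find(page, selector):
    """Follow exact directions: fail as soon as the selector is gone."""
    for element in page:
        if element["selector"] == selector:
            return element
    raise LookupError(f"selector not found: {selector}")


def observed_find(page, role, intent):
    """Pick an element from the current page that fits the intent.

    A keyword match stands in for the model's decision here.
    """
    candidates = [
        e for e in page
        if e["role"] == role and intent.lower() in e["label"].lower()
    ]
    if not candidates:
        raise LookupError(f"nothing on the page matches intent: {intent}")
    return candidates[0]
```

Against `PAGE_V1` both lookups find the button. Against `PAGE_V2` the scripted lookup raises, while the observed lookup returns the renamed, relocated button, because it never depended on the old selector.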

And when the thing under test is itself an AI feature, the loop is the only honest way in. A chatbot or recommendation engine won't give the same answer twice, so a fixed assertion misses the point. You need an agent that can hold a real conversation, judge whether each response was reasonable, and probe the edges. That's the heart of our playbook for testing AI agents.
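Here is a rough sketch of what judging a response looks like when an exact-match assertion won't do. It assumes a made-up return policy and a fake chatbot that rephrases the same answer. In practice the judge would likely be a model, so the string rules below are only a stand-in.

```python
import random


def fake_chatbot(question, rng):
    """Stand-in for an AI feature: same facts, different wording each time.

    The question is ignored in this toy version.
    """
    templates = [
        "You can return items within 30 days of delivery.",
        "Returns are accepted for 30 days after your order arrives.",
        "We accept returns up to 30 days from delivery.",
    ]
    return rng.choice(templates)


def judge_reply(reply):
    """Check properties of the reply rather than its exact text.

    Returns a list of problems; an empty list means the reply looks reasonable.
    """
    problems = []
    if "30 days" not in reply:
        problems.append("missing the return window")
    if "guarantee" in reply.lower():
        problems.append("makes a promise the policy does not")
    return problems


rng = random.Random(0)
replies = [fake_chatbot("What is your return policy?", rng) for _ in range(5)]
verdicts = [judge_reply(r) for r in replies]
```

Every reply passes despite the wording changing between runs. A fixed `assert reply == "..."` would fail on most of them for the wrong reason.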

Why a loop beats a straight line

A scripted test is a straight line: step one, step two, step three, end. It's fast, cheap, and perfect right up until the app changes underneath it. Then it breaks, and someone spends an afternoon fixing directions instead of finding bugs. The line has no way to react, because reacting was never part of it.

A loop reacts by design. It decides each step from the live state, so a moved button or a renamed field is just a different observation, not a failure. That flexibility is the whole value, and it's also exactly what makes a loop riskier than a straight line. A line can only do what you wrote. A loop can do things you didn't think of, which is wonderful when it finds a real bug and a problem when it wanders off.

The loop wrapped around the loop

Here's the part teams skip. The fast inner loop (observe, decide, act, check) runs in seconds. Wrapped around it is a slower human loop: an agent proposes work, a person reviews it, and only approved results count. The inner loop gives you speed. The outer loop gives you trust. You need both.

How much of the inner loop you let run without a human watching is a dial, not a switch. An agent can start read-only, earn its way up to suggesting tests, then to running them for review, and only later to running approved suites on its own.

Figure: Autonomy levels, from lower autonomy to higher autonomy with heavier governance.
Observe: read-only exploration. Full access to read, no ability to change state.
Advise: suggests tests and gaps. Proposes only, a human acts on it.
Act with approval: generates and runs, pending review. Acts inside a gate, approval before it counts.
Act autonomously: runs approved suites on its own. Scoped autonomy, every action logged.
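One way to picture the dial in code is as an ordered set of levels with a gate in front of every operation. This is a sketch under assumptions: the operation names, the `authorize` function, and the rule that new tests stay pending until someone signs off are all illustrative choices, not a description of any specific product.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    OBSERVE = 1            # read-only exploration
    ADVISE = 2             # suggests tests and gaps
    ACT_WITH_APPROVAL = 3  # generates and runs, pending review
    ACT_AUTONOMOUSLY = 4   # runs approved suites on its own


# Minimum level required for each kind of operation (names are illustrative).
REQUIRED_LEVEL = {
    "read_page": Autonomy.OBSERVE,
    "suggest_test": Autonomy.ADVISE,
    "run_new_test": Autonomy.ACT_WITH_APPROVAL,
    "run_approved_suite": Autonomy.ACT_AUTONOMOUSLY,
}


def authorize(level, operation, approved_by=None):
    """Return (allowed, reason) for an operation at a given autonomy level."""
    required = REQUIRED_LEVEL.get(operation)
    if required is None:
        return False, "unknown operation"
    if level < required:
        return False, f"requires {required.name}"
    # New tests may run, but the result only counts once a person signs off.
    if operation == "run_new_test" and approved_by is None:
        return True, "allowed, pending review"
    return True, "allowed"
```

Turning the dial up is then a one-line change to the agent's level, and everything the agent tries still passes through the same gate.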

Gartner has put a number on how often these programs fail: it predicts more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, or inadequate risk controls. Handing agents full autonomy on day one is a direct route to the last of those. The loop is powerful. Letting it run unsupervised before it's earned trust is how the program gets pulled. We went deeper on the proportional approach in our piece on governance for testing agents.

Where loops go wrong

Three failure modes show up often enough to plan for. The first is the runaway loop: an agent that keeps trying, burning tokens and time, because nothing told it when to stop. Every loop needs a stop condition, a step budget, and a timeout. The second is the confident wrong answer, where the agent decides it succeeded when it didn't. That's why the check step has to be real verification against the goal, not the agent grading its own homework.
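The first two failure modes map onto a small amount of guard code. The sketch below is one possible arrangement, with assumed names (`run_guarded`, the status strings) and an assumed policy of stopping and flagging the run when the agent claims success but verification disagrees.

```python
import time


def run_guarded(step, verify, max_steps=20, timeout_s=30.0, clock=time.monotonic):
    """Run `step` until `verify` passes, the step budget is spent, or time runs out.

    `step(n)` returns the agent's own claim of success. `verify()` is a
    separate check against the goal, so the agent does not grade its own work.
    """
    started = clock()
    for n in range(1, max_steps + 1):
        if clock() - started > timeout_s:
            return {"status": "timeout", "steps": n - 1}
        claimed_done = step(n)
        if claimed_done:
            if verify():
                return {"status": "verified", "steps": n}
            # Confident wrong answer: stop and flag it rather than trust the claim.
            return {"status": "claimed_but_unverified", "steps": n}
    return {"status": "budget_exhausted", "steps": max_steps}


# A runaway agent that never finishes is cut off by the step budget.
runaway = run_guarded(step=lambda n: False, verify=lambda: False, max_steps=5)

# An agent that declares victory too early is caught by the independent check.
overconfident = run_guarded(step=lambda n: True, verify=lambda: False)
```

The `clock` parameter is there only so the timeout can be exercised in a test without actually waiting.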

The third is scope. A loop with broad access can reach things it shouldn't, and "it seemed fine" is not a security posture. Keep secrets out of the agent's reach, scope it to one environment, and log every action. This goes double when the agent is testing AI-generated code, which fails security tests far more often than people expect.
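Scope can be enforced the same way, with a gate every action has to pass through. This is a simplified sketch: the host name, the list of secret field names, and `guarded_action` are hypothetical, and refusing payloads with secret-looking fields is only one rough approximation of keeping secrets out of reach.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "staging.example.test"  # hypothetical single environment
SECRET_KEYS = {"password", "api_key", "token"}

audit_log = []


def guarded_action(kind, url, payload=None):
    """Log every attempted action and refuse anything outside the allowed scope."""
    payload = payload or {}
    host = urlparse(url).hostname
    leaked = sorted(SECRET_KEYS & set(payload))
    if host != ALLOWED_HOST:
        decision = "blocked: out-of-scope host"
    elif leaked:
        decision = "blocked: payload carries secret fields"
    else:
        decision = "allowed"
    # Record field names only, never values, so the log itself cannot leak secrets.
    audit_log.append(
        {"kind": kind, "host": host, "fields": sorted(payload), "decision": decision}
    )
    return decision == "allowed"


ok = guarded_action("click", "https://staging.example.test/cart")
blocked = guarded_action("click", "https://prod.example.test/admin")
```

Blocked attempts are logged too, since an agent repeatedly reaching for production is exactly what a reviewer would want to see.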

Agent loops: FAQ

Is an agent loop the same as an agentic workflow?

Close, but not identical. A workflow is a fixed set of steps an LLM moves through. A loop lets the agent decide the next step from the result of the last one, so the path isn't fixed in advance. Most real systems mix the two: a workflow with a loop inside it where the path can't be known up front.

Doesn't looping make it slow and expensive?

It can, which is why stop conditions and step budgets matter. A well-scoped loop runs a handful of iterations and stops. The cost to watch isn't the loop itself. It's a loop with no exit condition to tell it when good enough is good enough.

Where should a team start?

Point a read-only agent loop at a flow you already understand and read what it reports for a week. You'll learn quickly whether you trust its observe and check steps, and that tells you whether it's ready to act. Starting low costs almost nothing if the answer turns out to be no.


qtrl is the agent loop built for QA, with the human loop wrapped around it. Agents explore and test inside the rules you set, results flow through a review-and-approve gate before they count, secrets stay out of reach, and every action lands in an audit trail. If you're comparing options, here's how the agentic testing tools stack up. See the loop run on your own app.
