AGENTIC ENGINEERING

Building Durable Loops with Conductor, Part 1: Why Agentic Loops, and Why Now?

Nick Lotz

Content Engineer

Last updated: July 1, 2026

June 30, 2026

12 min read


In some respects, an AI agent is a loop: it asks a model what to do next, takes an action, and repeats until the task is done. The pattern predates large language models, but LLMs made it the default behavior in agentic software. The ReAct formulation of interleaved reasoning and acting, published in 2022, is a loop, and so is every self-correcting variant that followed it.

Loops are easy to write and just as easy to underestimate. A while statement can behave perfectly in isolation, then stop behaving well the moment the loop has to outlive a single short session: a slow tool call, an approval that takes a day, or a redeploy that restarts the process several times. When the process holding the loop dies, the loop's state dies with it.

This series is about building a more robust approach. A durable loop keeps its iteration state in a runtime instead of in your process memory. We'll start by discussing how we got here: why agentic systems are loops, why the naive loop fails as tasks get longer, and what specifically makes a loop durable rather than merely restartable.

We'll then ground these concepts in a small working example on Conductor, the orchestration engine powered by Orkes. Later, we'll build the patterns production agents need on top of a durable loop.

A loop is an agent's control structure

A modern agent can be thought of as a control loop with a model inside it. On each pass, the model reads the current state and proposes an action, something executes that action, and the result is appended to the state. Then the loop decides whether to go again.

This structure has been implemented in several different ways. ReAct runs a single loop that alternates a reasoning step with a tool call. Reflexion wraps an outer loop around it: after an attempt, the agent writes a critique of its own trajectory and runs the inner loop again with that reflection in context. Plan-and-execute agents split the work in two: one pass to draft a plan, and a loop to carry it out and revise it.

Look across these systems and a common architecture shows up underneath the variations: memory, a step that decides on an action, and execution, iterated until a goal is met.

Figure: The agent loop (read memory, decide, execute, repeat). ReAct, Reflexion, and plan-and-execute all run the same loop: read the memory, decide on an action, execute it, and repeat until the goal is met.

Because the loop is the control structure, its reliability is the agent's reliability. Anything the loop forgets, the agent forgets. Any pass that silently runs twice, the agent does twice. The question of how durable your agent is reduces to the question of how durable its loop is.
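The control structure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not any framework's implementation: `run_agent`, `fake_decide`, and `fake_execute` are hypothetical names, and the model and tool are replaced by stand-in functions so the sketch runs without an API key.

```python
from typing import Callable


def run_agent(decide: Callable[[list], dict],
              execute: Callable[[dict], str],
              goal: str,
              max_steps: int = 10) -> list:
    """Read memory, decide, execute, append the result, repeat."""
    memory = [{"role": "goal", "content": goal}]
    for _ in range(max_steps):               # ceiling on the number of passes
        action = decide(memory)              # the model proposes the next action
        if action.get("done"):               # the loop decides whether to go again
            break
        result = execute(action)             # something carries the action out
        memory.append({"action": action, "result": result})
    return memory


# Stand-ins for a model and a tool (assumptions for illustration only).
def fake_decide(memory):
    steps_taken = len(memory) - 1
    if steps_taken >= 3:
        return {"done": True}
    return {"tool": "search", "arg": steps_taken}


def fake_execute(action):
    return f"result for {action['arg']}"


trace = run_agent(fake_decide, fake_execute, "demo goal")
print(len(trace))   # the goal entry plus three executed steps
```

Everything the agent knows lives in `memory`, a local variable. That is the part the rest of this post moves out of the process.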

Why naive loops break at length

The case against the in-process loop starts as an arithmetic problem. Suppose each step of an agent succeeds, end to end, 99% of the time, independently of the others. A 20-step task then succeeds about 82% of the time, because 0.99 to the 20th power is roughly 0.82.

A 100-step run, the size of a serious multi-agent task once sub-agent calls are counted, drops to about 37%. Past roughly 69 steps at that rate, a run is more likely to hit a failed step than to finish cleanly.
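The arithmetic is easy to check. The sketch below assumes steps fail independently, which real agents only approximate; the function names are illustrative.

```python
import math


def run_success_rate(step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return step_success ** steps


def break_even_steps(step_success: float) -> int:
    """Smallest step count at which the run's success rate is at or below 50%."""
    return math.ceil(math.log(0.5) / math.log(step_success))


for n in (20, 100):
    print(n, round(run_success_rate(0.99, n), 2))
print(break_even_steps(0.99))
```

Changing the per-step rate shows how sensitive long runs are to it: small improvements per step matter far more at 100 steps than at 20.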

Length also exposes failure modes that have nothing to do with the model's judgment! An in-process loop loses its state when the process dies mid-iteration. It repeats side effects when a restart replays a step whose effect already landed, so an agent that crashes after charging a card and before recording the charge will charge again on retry.

It also keeps no durable record of what the agent saw and decided, so a run that does something strange at three in the morning is unauditable after the fact. And it offers no way to meaningfully observe or intervene: once the loop is running, a human or a supervising agent cannot tell it to stop or reconsider.

A while loop has one response to any of this: start over from the top. That pays again for every token already spent, and it is unsafe wherever an earlier step already had an effect.

Underneath sits a distinction the rest of this series depends on: durability is not continuity. Checkpointing your state to a database gives you durability, meaning your data survives. It does not give you continuity, meaning knowledge of which step was in flight when the process died and whether its effect already landed.

Continuity is what lets a system guarantee that a step runs effectively once, however many times the process dies between attempting it and recording it. A durable loop needs both. It has to persist the agent's state and the position of the loop itself.
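One way to picture the difference is a journal that records when a step starts as well as when it commits. The sketch below is a hypothetical, minimal design (the `StepJournal` class and its event names are assumptions, not part of Conductor): the committed payloads give durability, and the unmatched STARTED entry gives continuity.

```python
import json
import os
import tempfile


class StepJournal:
    """Append-only log of step events, synced to disk on every write."""

    def __init__(self, path: str):
        self.path = path

    def record(self, step: int, event: str, payload=None) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"step": step, "event": event, "payload": payload}) + "\n")
            f.flush()
            os.fsync(f.fileno())          # the event must outlive the process

    def recover(self):
        """Return (committed results by step, the step that started but never committed)."""
        committed, started = {}, set()
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    e = json.loads(line)
                    if e["event"] == "STARTED":
                        started.add(e["step"])
                    elif e["event"] == "COMMITTED":
                        committed[e["step"]] = e["payload"]
        in_flight = sorted(started - set(committed))
        return committed, (in_flight[0] if in_flight else None)


path = os.path.join(tempfile.mkdtemp(), "journal.log")
journal = StepJournal(path)
journal.record(1, "STARTED")
journal.record(1, "COMMITTED", {"draft": "v1"})
journal.record(2, "STARTED")              # the process "dies" here, before committing

# A new process reads the journal back.
committed, in_flight = StepJournal(path).recover()
print(committed, in_flight)
```

Knowing that step 2 was in flight is only half of continuity. Whether its effect already landed still depends on the step being safe to run again.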

What makes a loop durable

Durable execution means that a computation's progress is persisted as it runs, so after a failure it resumes from the last committed point instead of restarting.

The idea predates agents by decades. Its lineage runs from the Sagas model of long-lived transactions, through event-sourced systems that reconstruct state by replaying a log, into the workflow engines that now carry agents. What they share is a commitment to treating each step's completion as a durable fact rather than a line of code that happened to run.

A loop is durable when four things hold:

The loop counter lives in the runtime. Which iteration you are on is server state, not a variable in your process. A restart reads it back instead of resetting to zero.
Each iteration is checkpointed. A pass's result is committed before the next pass starts, so the boundary between iterations is also a recovery point.
Iteration boundaries are idempotent. A step retried after a crash has the same effect as running it once, which is what makes it safe to re-run the step that was in flight when the process died.
Termination is explicit. The loop has both a condition and a ceiling, so it ends on purpose instead of running until it exhausts a budget or a context window.

Figure: Four properties of a durable loop. A loop is durable when the runtime owns the counter, commits each iteration, keeps steps idempotent, and terminates on purpose.
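To see the four properties together, here is a toy loop in which a SQLite file plays the role of the runtime. It is a sketch under stated assumptions, not how Conductor is implemented: `open_store`, `run_loop`, and `refine` are hypothetical names, the ceiling of six is arbitrary, and the step is a pure function so only the store write needs to be idempotent.

```python
import os
import sqlite3
import tempfile

MAX_ITERATIONS = 6          # ceiling (assumed value for this sketch)


def open_store(path):
    """Open the store that stands in for the runtime."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS passes "
               "(iteration INTEGER PRIMARY KEY, draft TEXT, done INTEGER)")
    db.commit()
    return db


def run_loop(db, step, crash_at=None):
    while True:
        # 1. The counter lives in the store, so a restart reads it back.
        row = db.execute("SELECT iteration, draft, done FROM passes "
                         "ORDER BY iteration DESC LIMIT 1").fetchone()
        iteration, draft, done = row if row else (0, "", 0)
        # 4. Termination is explicit: a condition and a ceiling.
        if done or iteration >= MAX_ITERATIONS:
            return iteration, draft
        current = iteration + 1
        if crash_at == current:
            raise RuntimeError("simulated crash mid-pass")
        new_draft, is_done = step(draft, current)
        # 3. Keyed by iteration, so re-running a pass overwrites instead of duplicating.
        db.execute("INSERT OR REPLACE INTO passes VALUES (?, ?, ?)",
                   (current, new_draft, int(is_done)))
        db.commit()          # 2. Checkpoint before the next pass starts.


def refine(draft, n):
    """Hypothetical step: append a marker and finish on pass 4."""
    return draft + f"[pass {n}]", n >= 4


path = os.path.join(tempfile.mkdtemp(), "loop.db")
db = open_store(path)
try:
    run_loop(db, refine, crash_at=3)      # dies while pass 3 is in flight
except RuntimeError:
    db.close()

db = open_store(path)                     # a fresh "process" reopens the store
result = run_loop(db, refine)
print(result)
db.close()
```

The second call resumes at pass 3 because passes 1 and 2 are already rows in the store. Nothing in the Python process carried that knowledge across the crash.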

For example, Conductor models this with a loop task called DO_WHILE. The tasks inside it run once per iteration, and a loopCondition decides whether to repeat. The loop's state is owned by the Conductor server, not by the workers that do the work, which is the property that makes everything later in this post possible.

From a while-loop to a saga

A durable loop is one point on a spectrum. At one end is the in-process while loop: maximally flexible, and gone the instant the process dies. At the other is the late-bound saga, where the execution graph does not exist when the run starts and the model synthesizes it edge by edge, committing each edge to a ledger before it runs.

A durable loop sits between them. Its structure is fixed (a body and a condition), but its state and position belong to the runtime.

This is a feature, not a limitation. A loop whose structure is known ahead of time is easy to reason about, easy to bound, and easy to recover, which is exactly what you want for the parts of an agent that repeat a well-defined step until a condition holds. Where the plan itself has to be invented as the agent goes, you reach for a saga. Where the agent repeats a known step, a durable loop is the smaller and sturdier tool.

Building one with Conductor

Let's make it concrete. Start by launching a local Conductor instance via the official Docker container:

bash

docker run --init -p 8080:8080 -p 5000:5000 conductoross/conductor:latest

A workflow definition is ultimately a JSON document. This one runs a step called agent_think until the step reports done, or until it hits a ceiling of six iterations:

json

{
  "name": "durable_agent_loop",
  "version": 1,
  "tasks": [
    {
      "name": "agent_loop",
      "taskReferenceName": "agent_loop",
      "type": "DO_WHILE",
      "loopCondition": "if ($.agent_loop['iteration'] < 6 && $.think['done'] != true) { true; } else { false; }",
      "loopOver": [
        {
          "name": "agent_think",
          "taskReferenceName": "think",
          "type": "SIMPLE",
          "inputParameters": {
            "goal": "${workflow.input.goal}",
            "draft": "${think.output.draft}",
            "iteration": "${agent_loop.output.iteration}"
          }
        }
      ]
    }
  ]
}

Two references show the durability aspect. $.agent_loop['iteration'] is the loop counter, maintained by the server. ${think.output.draft} feeds the previous pass's output into the next pass, which means the running state lives in the workflow execution rather than in any worker.
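For readers who find the loopCondition expression dense, the following Python mirrors its logic and simulates do-while semantics, where the body runs first and the condition is checked afterwards. It assumes the iteration counter reads 1 after the first pass; `should_repeat` and `count_passes` are illustrative names, not Conductor APIs.

```python
from typing import Optional

MAX_ITERATIONS = 6


def should_repeat(iteration: int, done) -> bool:
    """Mirror of the loopCondition: repeat while under the ceiling and not done."""
    return iteration < MAX_ITERATIONS and done is not True


def count_passes(done_after: Optional[int]) -> int:
    """Run the body, then check the condition, as a do-while loop does."""
    iteration = 0
    while True:
        iteration += 1                       # one pass of the loop body
        done = done_after is not None and iteration >= done_after
        if not should_repeat(iteration, done):
            return iteration


print(count_passes(4), count_passes(None))
```

A step that reports done on its fourth pass stops the loop at four, and a step that never reports done stops at the ceiling of six.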

Figure: The DO_WHILE loop in the Conductor UI. The DO_WHILE task is the loop. Conductor owns the iteration count and the per-pass state; the worker only does one pass at a time.

Putting an agent in the loop

The step itself is an ordinary worker: a process that polls Conductor for agent_think work, calls a model, and reports the result. Here it runs one self-refinement pass with gpt-5.5 using the critique-and-improve move that Reflexion formalizes.

python

import json
import time

import requests
from openai import OpenAI

client = OpenAI()                       # reads OPENAI_API_KEY
SERVER = "http://localhost:8080/api"
POLL_INTERVAL = 1.0                     # seconds to wait when the queue is empty
SYSTEM = (
    "You refine a definition over successive passes. Make one improvement per "
    "pass and reply as JSON {draft, critique, done}. Always set done=false "
    "before pass 4; from pass 4 on, set done=true only when the draft is "
    "accurate, concrete, names the failure it prevents, and is 45 words or fewer."
)

def refine(goal, draft, n):
    """Run one critique-and-improve pass and return the parsed JSON reply."""
    resp = client.chat.completions.create(
        model="gpt-5.5",
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": json.dumps(
                      {"goal": goal, "current_draft": draft, "pass": n})}])
    return json.loads(resp.choices[0].message.content)

while True:                             # a worker is just a process that polls
    r = requests.get(f"{SERVER}/tasks/poll/agent_think", timeout=30)
    if r.status_code == 204 or not r.text.strip():
        time.sleep(POLL_INTERVAL)       # nothing queued right now; avoid a busy loop
        continue
    task = r.json()
    if not task or not task.get("taskId"):
        time.sleep(POLL_INTERVAL)
        continue
    d = task["inputData"]
    out = refine(d["goal"], d.get("draft") or "", d.get("iteration") or 0)
    # Report the result; the server commits it and evaluates the loopCondition.
    requests.post(f"{SERVER}/tasks", timeout=30, json={
        "taskId": task["taskId"],
        "workflowInstanceId": task["workflowInstanceId"],
        "status": "COMPLETED", "outputData": out}).raise_for_status()

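Register the task definition and the workflow, then start a run. The second command assumes the workflow JSON above is saved as durable_agent_loop.json:

bash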

curl -X POST localhost:8080/api/metadata/taskdefs -H 'Content-Type: application/json' \
  -d '[{"name":"agent_think","responseTimeoutSeconds":20,"retryCount":3}]'
curl -X POST localhost:8080/api/metadata/workflow -H 'Content-Type: application/json' \
  -d @durable_agent_loop.json
curl -X POST localhost:8080/api/workflow/durable_agent_loop -H 'Content-Type: application/json' \
  -d '{"goal":"Define durable execution for a senior engineer."}'

Notice there is no agent loop in the worker. It handles one pass and returns; Conductor decides whether to schedule another. On this run the agent took four passes, about sixteen seconds, before it judged the draft good enough and set done.

Figure: Four gpt-5.5 passes on the timeline, each one iteration of the same DO_WHILE loop, recorded by the server as they ran.

It converged on:

Durable execution is a runtime model that persists execution history, checkpoints, and recorded external I/O results so, after crashes, restarts, or redeploys, it can deterministically resume from the last committed step instead of re-running completed work, preventing duplicate side effects.
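One hardening step worth considering for the worker in this section: it passes the model's JSON straight through as task output, while the loopCondition compares done against the boolean true. A reply with a missing draft, or with done as the string "true", would likely carry the loop on to its ceiling instead of stopping it. The sketch below is one possible guard; `normalize_reply` is a hypothetical helper, not part of any SDK.

```python
def normalize_reply(reply) -> dict:
    """Coerce a model reply into the {draft, critique, done} shape the loop expects."""
    if not isinstance(reply, dict):
        raise ValueError("model reply is not a JSON object")
    draft = reply.get("draft")
    if not isinstance(draft, str) or not draft.strip():
        raise ValueError("model reply has no usable draft")
    done = reply.get("done")
    if isinstance(done, str):                # e.g. "true" instead of true
        done = done.strip().lower() == "true"
    return {
        "draft": draft.strip(),
        "critique": str(reply.get("critique") or ""),
        "done": done is True,                # anything other than a real True means "keep going"
    }


print(normalize_reply({"draft": " A definition. ", "done": "true"}))
```

In the worker, this would wrap the call to refine, for example `out = normalize_reply(refine(...))`. Raising on a bad reply leaves the task uncompleted, so the task's retry settings decide what happens next.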

Surviving a crash

Here is the part the in-process loop cannot do. We start the loop, let two passes commit, and kill -9 the worker while the third pass is mid-flight, in the middle of its model call. The process is gone. Nothing is running.

The loop does not go with it. The interrupted pass sits as IN_PROGRESS on the server until its response timeout elapses; the server then marks it TIMED_OUT and re-queues it.

We start a fresh worker. It polls, picks up the re-queued pass, finishes it, and the loop runs to completion. The first two passes are never repeated, because the server already committed them.

Figure: The interrupted pass times out and re-runs on a new worker. Pass think__3 timed out when its worker was killed, then ran again on a fresh worker and completed. The loop finished without losing the earlier passes.

This is the continuity we mentioned earlier, not just durability. The server knew which iteration was in flight, knew it had not committed, and handed it to another worker. The two committed passes were not lost and not repeated, so the loop resumed at the third pass instead of restarting from the top.
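The sequence above can be modeled in a few lines. This is a toy simulation of response-timeout handling, written only to make the state transitions explicit; `ToyServer` and `Attempt` are hypothetical and say nothing about how Conductor implements the behavior internally. Time is passed in as a number so the example runs instantly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attempt:
    ref: str                          # e.g. "think__3"
    status: str = "SCHEDULED"         # SCHEDULED -> IN_PROGRESS -> COMPLETED or TIMED_OUT
    deadline: Optional[float] = None
    output: dict = field(default_factory=dict)


class ToyServer:
    """Toy model of a server that owns task state and enforces a response timeout."""

    def __init__(self, response_timeout: float):
        self.response_timeout = response_timeout
        self.attempts = []

    def schedule(self, ref: str) -> None:
        self.attempts.append(Attempt(ref))

    def poll(self, now: float) -> Optional[Attempt]:
        for a in self.attempts:
            if a.status == "SCHEDULED":
                a.status, a.deadline = "IN_PROGRESS", now + self.response_timeout
                return a
        return None

    def complete(self, attempt: Attempt, output: dict) -> None:
        attempt.status, attempt.output = "COMPLETED", output

    def sweep(self, now: float) -> None:
        for a in list(self.attempts):
            if a.status == "IN_PROGRESS" and now > a.deadline:
                a.status = "TIMED_OUT"
                self.schedule(a.ref)  # re-queue the same pass for another worker


server = ToyServer(response_timeout=20)
server.schedule("think__3")
first = server.poll(now=0)            # worker A takes the pass, then is killed
server.sweep(now=21)                  # the deadline has passed with no response
retry = server.poll(now=22)           # worker B picks up the re-queued pass
server.complete(retry, {"draft": "v3"})
history = [(a.ref, a.status) for a in server.attempts]
print(history)
```

The worker never appears in this model as anything but a caller of poll and complete, which is why killing it loses nothing the server had already recorded.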

Designing loops that converge

A durable loop removes the penalty for running long, which makes it easy to write loops that never stop. Two guards matter. First, give every loop a ceiling, so a model that never declares itself finished still terminates. Second, make each step idempotent, so the re-queue after a crash does not double an effect. Refining a draft was safe to retry because it has no external side effect; a step that charges a card or sends an email needs an idempotency key to be equally safe.
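As a rough illustration of the idempotency-key guard, the sketch below derives a key from identifiers that stay the same across retries of one pass and sends it with the side effect. `PaymentGateway` is a stand-in for an external API that deduplicates on such a key, and the workflow ID, task name, and key format are all assumptions for the example.

```python
import hashlib


class PaymentGateway:
    """Stand-in for an external API that deduplicates on an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, key: str, amount_cents: int) -> dict:
        if key in self.charges:               # a repeated key returns the original charge
            return self.charges[key]
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount_cents}
        self.charges[key] = charge
        return charge


def idempotency_key(workflow_id: str, task_ref: str, iteration: int) -> str:
    """Stable across retries of one pass, different for every other pass."""
    raw = f"{workflow_id}:{task_ref}:{iteration}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


gateway = PaymentGateway()
key = idempotency_key("wf-123", "charge_card", 3)
first = gateway.charge(key, 500)              # the worker crashes before reporting this
second = gateway.charge(key, 500)             # the re-queued pass retries the charge
print(first == second, len(gateway.charges))
```

The key must not include anything that changes between attempts, such as a retry count or a timestamp, or the retry will look like a new charge.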

There is also a subtlety in how state crosses a crash: ${think.output.draft} resolves to the most recent task instance, so a pass retried after a timeout reads that timed-out instance's empty output rather than the last good draft. When a loop must carry accumulated state through a mid-pass failure, hold it in a workflow variable or reference the last committed output explicitly.
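One way to reference the last committed output explicitly is to walk the pass history backwards and skip attempts that never completed. The sketch below works on a hand-written list shaped like the task records used in this post (status and outputData); the sample values are illustrative, not captured from a real run, and `last_committed_draft` is a hypothetical helper.

```python
def last_committed_draft(attempts: list) -> str:
    """Return the draft from the most recent COMPLETED attempt, or "" if none exists."""
    for attempt in reversed(attempts):
        output = attempt.get("outputData") or {}
        if attempt.get("status") == "COMPLETED" and output.get("draft"):
            return output["draft"]
    return ""


# Hand-written example history: two committed passes, then a timed-out third pass.
attempts = [
    {"status": "COMPLETED", "outputData": {"draft": "v1", "done": False}},
    {"status": "COMPLETED", "outputData": {"draft": "v2", "done": False}},
    {"status": "TIMED_OUT", "outputData": {}},
]
print(last_committed_draft(attempts))
```

A worker that receives an empty draft on a retried pass could fall back to a lookup like this, so the retry continues from the last good draft.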

Where we are going next

Part 1 established the core idea: move the loop out of your process and into a runtime that owns its state, and the agent stops being something that dies when its process does. The next part builds on the same DO_WHILE foundation with the patterns production agents actually need: pausing a loop for human approval and resuming it on a signal, fanning a loop out across parallel sub-agents, and changing a loop's definition without breaking the runs already in flight.
