Your AI agent has been running for nine minutes. It has called three external APIs, spent €0.42 on LLM tokens, and is halfway through generating a customer report. Then the container restarts. Everything is lost — the intermediate state, the API calls that cannot safely be replayed, the hard-won reasoning tokens. Your agent starts over, and the customer waits.

This is the quiet failure mode eating engineering teams alive throughout 2026. It is also why durable execution — infrastructure that distributed systems engineers have quietly relied on for a decade — has suddenly become the most talked-about layer in production AI architectures.

TL;DR

  • AI agents are long-running, non-deterministic, and expensive to restart — making them a poor fit for traditional stateless serverless patterns.
  • Durable execution engines (Temporal, Restate, Inngest, Cloudflare Workflows, AWS Durable Functions, Vercel Workflow DevKit) persist every step of a workflow so failures replay without re-executing completed work.
  • In 2026 this has crossed the chasm: AWS, Cloudflare, and Vercel all shipped first-party durable execution products in late 2025, and Gartner predicts 40% of enterprise apps will include task-specific AI agents by year-end.
  • The architectural shift is real: stop writing retry loops and cron jobs, start modelling agents as deterministic workflows wrapping non-deterministic activities.
  • Choosing between options comes down to deployment model, language support, and whether you want to own the control plane — the wrong call here will quietly bleed reliability for months.

Why traditional architectures are breaking

Most teams that started shipping AI agents in 2024 and 2025 built them the way they built everything else: an HTTP endpoint, a queue, maybe a Lambda, a Redis cache for intermediate state, and a scattering of try/except blocks around the expensive LLM calls. This works beautifully for the first three weeks, right up until an agent workflow needs to do something that takes longer than a Lambda timeout, or survive a deploy, or resume after a flaky third-party API has a bad afternoon.

The problem is not new. It drove Uber to build Cadence, which became Temporal; it drove Azure Durable Functions for .NET in 2018; and every payments company has solved it with some homegrown mess of state machines, outbox tables and weekend pages. AI agents are simply the domain that finally made it impossible to keep pretending otherwise, because agents combine every failure mode durable execution was designed to fix:

  • Long-running processes — some agent workflows run for hours or days, crossing any reasonable request timeout.
  • Expensive, non-idempotent side effects — replaying a payment, a customer email, or a €0.30 LLM call is not free, and not always safe.
  • Probabilistic behaviour — the LLM call that worked a second ago might return something different on retry, which breaks naive replay logic.
  • Human-in-the-loop pauses — the workflow needs to wait for an approval, possibly for days, without holding a connection open.
  • Tool calls that fail in new and creative ways — external APIs rate-limit, go down, or silently change behaviour.

A stateless architecture with hand-rolled retries cannot cope with all of these at once. Something has to remember where you were, what you have already done, and what must not be done twice.

What durable execution actually is

The core idea, stripped of branding, is straightforward. You write your workflow as ordinary code — TypeScript, Python, Go, Java — but the runtime records every step to a persistent log. If anything crashes, the runtime replays your code from the beginning, skipping any step that was already recorded, until it reaches the point of failure and resumes from there. State is not held in memory; it is held in an event history that can be reconstructed on any machine.

This has two consequences that are not obvious until you have used it:

First, you get to write linear, readable code. What would have been a sprawling mess of queues, cron jobs, status tables and retry handlers becomes a single function with await statements. The infrastructure handles persistence, retries, timeouts, and resumption invisibly.

Second, the boundary between deterministic workflow code and non-deterministic activities becomes load-bearing. Your workflow function has to be replayable, so it cannot call Date.now() or Math.random() directly. Side effects — API calls, LLM prompts, database writes — are wrapped in activities, which record their result the first time they run and return the recorded value on replay. This discipline is what makes the whole model work, and it is also what trips up teams new to the pattern.
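The record-and-replay loop is small enough to sketch. The `Runner` and `step` names below are illustrative, not any real SDK's API; a production engine would persist the history to durable storage rather than an in-memory Map, but the mechanics are the same.

```typescript
// Toy replay engine: completed steps are looked up in a recorded
// history; only unrecorded steps actually execute their side effect.
type History = Map<string, unknown>;

class Runner {
  constructor(private history: History) {}

  // Run `fn` the first time this step is seen; on replay, return
  // the recorded result without re-executing the side effect.
  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.history.has(name)) {
      return this.history.get(name) as T; // replay: skip the work
    }
    const result = await fn();
    this.history.set(name, result); // a real engine persists this durably
    return result;
  }
}

// The "workflow": plain linear code, every side effect behind a step.
async function greetWorkflow(run: Runner): Promise<string> {
  const user = await run.step("fetch-user", async () => "Ada");
  return run.step("llm-call", async () => `Hello, ${user}!`);
}
```

Run the workflow, throw the process away, and run it again against the same history: the second run executes zero side effects and still produces the same return value. That is the whole trick.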

The 2026 landscape: who ships what

Until late 2025, durable execution was the preserve of Temporal and a small cluster of alternatives. That changed quickly. Here is the shape of the market as it stands this spring:

  • Temporal — still the most mature option, with SDKs for Go, Java, TypeScript, Python, .NET, PHP and Ruby. Self-hostable or managed via Temporal Cloud. Best fit for teams that want full control and are comfortable operating a stateful cluster. Its OpenAI Agents SDK integration, shipped in late 2025, is genuinely impressive.
  • Restate — a newer entrant written in Rust, shipped as a single binary, with a strong focus on low operational overhead. An excellent choice for teams that want durable execution without running a database cluster alongside it.
  • Inngest — TypeScript-native, serverless-first, with a developer experience that feels closer to async/await than to Temporal’s explicit activity model. Priced per step executed. Strong fit for Next.js and Vercel-hosted stacks.
  • Cloudflare Workflows — went GA in late 2025. Runs on Workers, pairs naturally with Durable Objects and R2. The right answer if you are already committed to the Cloudflare platform.
  • AWS Durable Functions — Amazon’s answer, tightly coupled to Lambda and Step Functions. Makes sense if you are already deep in the AWS ecosystem and do not want to introduce a new vendor.
  • Vercel Workflow DevKit — the newest arrival, aimed squarely at developers building AI features inside Next.js apps. Early, but the DX is the best in class for small teams.

The choice between them is rarely technical in the abstract — they will all reliably persist your workflow state. The real questions are: which language do your engineers want to write workflows in, who do you want to pay for the control plane, and how painful is it to migrate off your chosen provider in two years’ time?

A concrete example: the research agent that does not lose its mind

Consider a common agent shape: a customer asks a question, the agent searches your knowledge base, calls three external APIs for context, asks an LLM to synthesise a response, generates a PDF, emails it, and logs the interaction to your CRM. Naively implemented as a single HTTP handler, any failure during the five-to-thirty-second run means redoing all of it — including the expensive LLM call and the non-idempotent CRM write.

Modelled as a durable workflow, each step becomes an activity. The LLM call records its output the first time it runs; on replay it returns the recorded token stream instantly and at zero cost. The CRM write is idempotency-keyed. If the container dies halfway through PDF generation, the workflow resumes on a different machine, skips completed steps, and picks up where it left off. The customer sees one response, not two, not zero. Cost per interaction drops because expensive steps never run twice. This is the difference between an AI feature that quietly works and one that silently double-charges your customers once a week.
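Under the same assumptions as before, a hypothetical `step` helper that records each activity result in a durable history, the agent might be shaped like this (all names and payloads are illustrative, not a specific platform's API):

```typescript
type History = Map<string, unknown>;

async function step<T>(history: History, name: string, fn: () => Promise<T>): Promise<T> {
  if (history.has(name)) return history.get(name) as T; // replayed at zero cost
  const result = await fn();
  history.set(name, result);
  return result;
}

let llmCalls = 0; // tracks that the expensive call runs exactly once

async function researchAgent(history: History, question: string): Promise<string> {
  const docs = await step(history, "search-kb", async () => ["doc-1", "doc-2"]);
  const answer = await step(history, "llm-synthesise", async () => {
    llmCalls += 1; // the expensive, non-deterministic part
    return `Answer to "${question}" drawn from ${docs.length} docs`;
  });
  const pdf = await step(history, "render-pdf", async () => `pdf-of:${answer}`);
  // The CRM write carries an idempotency key derived from the
  // workflow run, so a retry cannot create a duplicate record.
  await step(history, "crm-log", async () => ({ idempotencyKey: "run-1:crm-log", pdf }));
  return answer;
}
```

If the process dies after `llm-synthesise` but before `render-pdf`, the restarted run replays the first two steps from history and only executes the remaining two. The LLM is billed once, the CRM sees one record.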

Where teams go wrong

Having helped several clients migrate from hand-rolled agent loops to durable execution over the past year, we keep seeing the same three patterns recur:

  • Treating the workflow as a place to do work. Workflows orchestrate; activities act. Putting an HTTP call directly in your workflow function will silently break replay and produce bizarre bugs months later.
  • Ignoring versioning from day one. Workflows that are mid-flight when you deploy new code must still replay correctly against their old history. Every durable execution platform has a versioning story; none of them are automatic.
  • Underestimating observability needs. A paused workflow waiting three days for a human approval looks a lot like a bug unless your dashboards know the difference. Invest in the tracing integration early.
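The versioning pitfall is usually handled with a patch marker recorded into the history. A minimal sketch, assuming the runtime supplies a boolean replay flag; the `patched` helper here is hypothetical, loosely modelled on the patching mechanisms these platforms expose:

```typescript
type History = Map<string, unknown>;

// Returns true when the new code path should run. A fresh run records
// a marker into the history; a replaying run only takes the new branch
// if the marker is already there, so in-flight workflows started on old
// code keep replaying their old path instead of diverging.
function patched(history: History, patchId: string, replaying: boolean): boolean {
  const key = `patch:${patchId}`;
  if (replaying) return history.has(key);
  history.set(key, true);
  return true;
}
```

Inside a workflow this gates a branch: `if (patched(history, "new-pdf-renderer", replaying)) { /* new step */ } else { /* old step */ }`. Once no old-code histories remain in flight, the old branch can be deleted.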

The broader shift

What makes this moment genuinely interesting is not just that AI agents need durable execution; it is that durable execution is becoming the default pattern for any multi-step business process. Order fulfilment, onboarding flows, data pipelines, scheduled reports: all benefit from the same replay-and-resume model. AI agents were the forcing function, but the architectural shift is wider. Teams that learn this pattern once will reach for it constantly; teams that do not will keep paying the tax in 3am pages, duplicate charges, and half-finished reports.

Getting this right

Durable execution is not hard to learn, but it is genuinely a different way of thinking about long-running code. The wrong framework choice, or a workflow with unclear activity boundaries, will calcify into infrastructure you regret. It is the kind of decision that rewards having someone in the room who has shipped this at least once before.

At REPTILEHAUS we have been building AI agents and automation workflows for clients since the field was called “RPA”. If you are about to commit to a durable execution platform for a production agent — or you are already bleeding reliability from a hand-rolled one — get in touch. An hour of conversation at this stage tends to save six months of rework later.
