For years, building fault-tolerant workflows meant bolting on an external orchestration platform such as Temporal, Restate, or Inngest, or running a hand-rolled queue-and-worker setup. Each added infrastructure, operational overhead, and another failure domain to worry about. This week, Microsoft quietly open-sourced pg_durable, a PostgreSQL extension that challenges the assumption that durability needs a separate platform. Durable execution, it turns out, can live inside the database you are already running.

TL;DR

  • Microsoft’s pg_durable is a PostgreSQL extension that brings durable, fault-tolerant workflow execution directly into the database — no external orchestrators required.
  • Workflows are defined in a SQL-native DSL using operators like ~> (sequence) and |=> (fan-out), with automatic checkpointing between steps.
  • If PostgreSQL crashes or restarts, execution resumes from the last checkpoint rather than restarting from scratch.
  • Best suited for data-centric workflows like embedding pipelines, ETL, scheduled maintenance, and API enrichment — not sub-millisecond request handling.
  • Part of a broader trend of PostgreSQL absorbing capabilities that previously required separate infrastructure, simplifying architectures for small and mid-sized teams.

What pg_durable Actually Does

At its core, pg_durable lets you define multi-step workflows as composable SQL operations. As the runtime executes each step, it persists state to PostgreSQL itself. If the database crashes, restarts, or a step fails, execution resumes from the last durable checkpoint instead of making you reconstruct state by hand.
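The checkpoint-and-resume idea can be sketched in a few lines. This is not pg_durable's implementation or API: the sketch uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the `checkpoints` table, the `run_workflow` function, and the step names are all hypothetical. It shows only the general pattern of committing a step's effects together with a record that the step finished, so that a rerun skips completed work.

```python
import sqlite3

def run_workflow(conn, workflow_id, steps):
    """Run steps in order, skipping any that an earlier run already completed."""
    # Hypothetical checkpoint table: one row per finished step.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "workflow_id TEXT, step_name TEXT, "
        "PRIMARY KEY (workflow_id, step_name))"
    )
    executed = []
    for name, action in steps:
        done = conn.execute(
            "SELECT 1 FROM checkpoints WHERE workflow_id = ? AND step_name = ?",
            (workflow_id, name),
        ).fetchone()
        if done:
            continue  # checkpoint exists, so this step is not repeated
        # The step's changes and its checkpoint commit in one transaction;
        # if the step raises, both are rolled back.
        with conn:
            action(conn)
            conn.execute(
                "INSERT INTO checkpoints VALUES (?, ?)", (workflow_id, name)
            )
        executed.append(name)
    return executed

# Demo data: three unprocessed documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, processed INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO documents (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

attempts = {"mark": 0, "publish": 0}

def mark_processed(conn):
    attempts["mark"] += 1
    conn.execute("UPDATE documents SET processed = 1")

def publish(conn):
    attempts["publish"] += 1
    if attempts["publish"] == 1:
        raise RuntimeError("simulated crash")  # fails on the first attempt only

steps = [("mark_processed", mark_processed), ("publish", publish)]

try:
    run_workflow(conn, "wf-1", steps)
except RuntimeError:
    pass  # the first run dies partway through

# The second run skips the checkpointed step and retries only the failed one.
resumed = run_workflow(conn, "wf-1", steps)
```

After the second call, `resumed` contains only the step that had not been checkpointed, and `mark_processed` has run exactly once. A real durable runtime has to handle much more (timers, retries, concurrent workers), but the core guarantee rests on the same idea.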

The extension is built with pgrx (a framework for writing PostgreSQL extensions in Rust) and ships with two lower-level Rust libraries: duroxide for orchestration (deterministic replay, checkpoints, sub-orchestrations, timers) and duroxide-pg for state persistence.

Here is a minimal example that processes unprocessed documents in a durable fan-out:

SELECT df.start(
  'SELECT id FROM documents WHERE processed = false LIMIT 100' |=> 'batch'
  ~> 'UPDATE documents SET processed = true WHERE id = ANY()'
);

The |=> operator fans out the query results, and ~> sequences the next step. If PostgreSQL goes down mid-batch, the workflow resumes from the last checkpoint instead of starting over.

Why This Matters for Architecture

Every external orchestration platform you add to your stack is another service to deploy, monitor, scale, and secure. For teams running Temporal or similar platforms, the trade-off makes sense when workflows span multiple services and involve complex human-in-the-loop interactions. But a surprisingly large number of “workflows” in production are fundamentally data-centric operations: ETL pipelines, embedding generation, scheduled maintenance, API enrichment runs.

For these use cases, the data already lives in PostgreSQL. The business logic is expressible in SQL. The only reason you reached for an external orchestrator was durability — the guarantee that a multi-step process would not silently fail halfway through and leave your data in an inconsistent state.

pg_durable eliminates that architectural detour. One fewer service. One fewer deployment. One fewer thing to page about at 3 AM.

The PostgreSQL-as-a-Platform Trend

This is not happening in isolation. PostgreSQL has been steadily absorbing capabilities that once required separate infrastructure:

  • pgvector turned PostgreSQL into a vector database, reducing the need for Pinecone or Weaviate in many AI pipelines.
  • pg_cron brought job scheduling into the database, replacing external cron services.
  • LISTEN/NOTIFY provides lightweight pub/sub without a message broker.
  • Logical replication covers some CDC patterns that would otherwise call for Debezium or similar tooling.
  • And now pg_durable adds fault-tolerant workflow orchestration.

For small-to-mid-sized teams, this trend is enormously valuable. Every service you can consolidate into your existing PostgreSQL instance is operational complexity you do not have to carry. The “do you really need a separate service for that?” question now has a credible “no” answer for an expanding list of capabilities.

Where pg_durable Fits — and Where It Does Not

Microsoft is refreshingly honest about the boundaries. pg_durable is well-suited for:

  • Vector embedding pipelines — chunk documents, call an embedding API, upsert into pgvector.
  • Data ingestion and transformation — stage, deduplicate, transform, and publish large batches.
  • Scheduled maintenance — detect bloat, notify, wait for approval, then execute.
  • Fan-out aggregation — run independent queries in parallel, then join the results.
  • External API enrichment — classification, webhook-style calls, and data augmentation from SQL.

It is explicitly not for:

  • Simple single-statement operations already expressible as INSERT...SELECT.
  • Sub-millisecond synchronous request handling.
  • Workflows requiring arbitrary application code outside SQL.
  • Environments where you cannot install extensions or background workers (looking at you, managed database tiers that restrict extensions).

That last point is worth highlighting. If your PostgreSQL instance is on a managed provider that does not support custom extensions, pg_durable is off the table — at least until Azure Database for PostgreSQL adds it, which seems likely given that Microsoft built it.

Security Model Worth Noting

The extension takes security seriously. No PUBLIC privileges are granted by default. Administrators explicitly grant access using df.grant_usage('role'), and row-level security ensures each database user can only manage their own workflow instances. The background worker runs as a configurable superuser role to manage all users’ workflows while respecting RLS boundaries.

For teams already invested in PostgreSQL’s role-based access control, this fits naturally into existing security models rather than introducing a separate authentication layer.

What This Means for Your Stack Decisions

If you are evaluating orchestration options for data-centric workflows, pg_durable deserves a place on your shortlist — with caveats. It is currently in preview, so production adoption should be cautious. But the architectural direction is sound: reduce moving parts, leverage the database you are already operating, and keep workflows close to the data they act on.

For teams already running Temporal or Inngest for complex, multi-service orchestration with human approvals and long-running sagas, pg_durable is not a replacement. It is a complement — handling the data-layer workflows that never needed a full orchestration platform in the first place.

The practical question for every development team: how many of your current “workflows” are really just multi-step SQL operations with a durability requirement? If the answer is “more than we would like to admit,” pg_durable might let you delete some infrastructure.

How We Think About This at REPTILEHAUS

At REPTILEHAUS, we are constantly evaluating where architectural complexity is earning its keep and where it is just inertia. For our clients building SaaS platforms and AI-powered applications, PostgreSQL is almost always already in the stack. Tools like pg_durable represent exactly the kind of consolidation that reduces operational burden without sacrificing reliability.

If you are wrestling with workflow orchestration decisions, or looking to simplify an over-engineered stack, get in touch. We help teams build architectures that are as simple as they can be — and no simpler.
