The Cloudflare Blog

How we built saga rollbacks for Cloudflare Workflows

Vaishnav Kavitha — Thu, 25 Jun 2026 13:00:00 GMT

Cloudflare Workflows allows you to build durable, multi-step applications with built-in retries and state persistence across long-running processes. When a Workflow executes, each step can call external systems, retry failures, and persist state across restarts. But if one step fails, it may leave earlier work from completed steps in an inconsistent or partial state.

Today we’re shipping saga rollbacks for Workflows, which let you declare a step's rollback logic within the step itself, to run if the Workflow fails.

For example, consider a workflow for transferring funds between accounts at two different banks:

1. Debit from account at Bank A
2. Credit to account at Bank B
3. Send email confirmation to both account owners

What happens if Step 2, the credit to account at Bank B, fails? Once the debit succeeds at Bank A, the transaction is committed and the money has left its system. As the orchestrator of the transaction, you cannot simply “undo” the operation in Bank A's system. Instead, the money must be credited back to the account at Bank A through a new operation that semantically reverses the first one.

This pairing of an operation and its compensation logic is called the saga pattern.

Before today, developers had to implement their own compensation logic to track what succeeded, what failed, and what actions should be taken upon failure, outside of the steps’ direct definitions. Now, you can define compensation logic for each step.do() as an argument within the steps themselves, maintaining your workflow’s durability for the rollback as well.

// track what completed so we know what to undo
let debitA;
let creditB;
try {
  debitA = await step.do("debit-bank-a", () => bankA.debit(from, amount));
  creditB = await step.do("credit-bank-b", () => bankB.credit(to, amount));
  await step.do("notify", () => notifyBoth(from, to, amount));
} catch (error) {
  // unwind in reverse. each undo is its own durable step,
  // must be idempotent, and must keep going if one fails.
  if (creditB) {
    try {
      await step.do("reverse-credit-b", () => bankB.debit(to, amount, creditB.id));
    } catch (e) {
      await alertOnCall("reverse-credit-b failed", e);
    }
  }
  if (debitA) {
    try {
      await step.do("refund-debit-a", () => bankA.credit(from, amount, debitA.id));
    } catch (e) {
      await alertOnCall("refund-debit-a failed", e);
    }
  }
  throw error;
}

Without rollbacks

// each step ships with its own undo. add a step,
// add its rollback right here. no growing catch
// block, no manual ordering, no replay logic.
await step.do("debit-bank-a", () => bankA.debit(from, amount), {
  rollback: async ({ output }) => bankA.credit(from, amount, output.id),
});
await step.do("credit-bank-b", () => bankB.credit(to, amount), {
  rollback: async ({ output }) => bankB.debit(to, amount, output.id),
});
await step.do("notify", () => notifyBoth(from, to, amount));

With rollbacks

Try it out

To use rollbacks, just pass an options object containing a rollback function as the last argument to step.do().

const debit = await step.do(
  "debit-account-a",
  async () => {
    return await bankA.debit({
      accountId: fromAccountId,
      amount,
      idempotencyKey: `${transferId}:debit-account-a`,
    });
  },
  {
    rollback: async () => {
      await bankA.credit({
        accountId: fromAccountId,
        amount,
        idempotencyKey: `${transferId}:rollback-debit-account-a`,
      });
    },
  }
);

// The idempotency keys make both the forward operations and rollback operations safe to retry without duplicating the transfer

const credit = await step.do(
  "credit-account-b",
  async () => {
    return await bankB.credit({
      accountId: toAccountId,
      amount,
      idempotencyKey: `${transferId}:credit-account-b`,
    });
  },
  {
    rollback: async ({ output }) => {
      if (output === undefined) {
        return;
      }

      await bankB.debit({
        accountId: toAccountId,
        amount,
        idempotencyKey: `${transferId}:rollback-credit-account-b`,
      });
    },
  }
);


// If we fail here, we may want to revert all previous payments. Users should not have to wrap their code in complex try-catch logic just to revert two small payments (see below)

await step.do("send-confirmation", async () => {
  await sendTransferConfirmation({ ... });
});

Rollback functions should be idempotent, just like regular Workflow steps. If you refund a charge, use the payment provider's idempotency key. If you release inventory, make the release safe to call more than once.

If the Workflow fails, the rollback handlers execute in reverse step-start order. It sounds simple: run the undo steps when something fails. In practice, there are a few details that make the API and execution model important.

1. The failed step may still need rollback. A failed step.do() can still be rollback-eligible if it registered a rollback handler.

Rollback does not start if user code catches a step error and the Workflow continues. If the Workflow later fails for another reason, rollback can still run for previously registered handlers, which execute in reverse step-start order.

Why? The step may have partially interacted with an external system before failing. For example, a payment provider may capture a charge, but the step may fail before returning the chargeId to Workflows. That is why rollback handlers receive output, but must handle output === undefined.

2. Rollback only starts when the Workflow fails. Adding a rollback handler does not mean every step error triggers rollback. If user code catches an error and continues, the Workflow continues. Rollback starts when the Workflow itself is about to fail terminally.

When rollback starts, Workflows finds eligible step.do() calls, runs their rollback handlers, then records the final Workflow failure.

3. Ordering has to be predictable. For sequential Workflows, rollback order feels obvious:

1. Reserve inventory.
2. Charge card.
3. Create shipment.
4. If shipment fails, refund the card and release the inventory.

Parallel steps make this more subtle. Completion order can differ from start order, so Workflows uses reverse step-start order instead of reverse completion order.

The practical rules are:

Any started or completed steps with rollback handlers are eligible.
The failing step.do() is also eligible if it registered a rollback handler.
Handlers run in reverse step-start order, not completion order.
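
The practical rules above can be sketched as a small pure function over persisted step records. This is only a toy model: the StepRecord shape and its field names are assumptions made for illustration, not the storage schema Workflows uses.

```typescript
// Toy model of a persisted step record. Field names are illustrative
// assumptions, not the Workflows storage schema.
type StepStatus = "running" | "completed" | "failed";

interface StepRecord {
  name: string;
  startSeq: number; // order in which step.do() was called
  status: StepStatus;
  hasRollback: boolean; // whether a rollback handler was registered
}

// Any step that started and registered a handler is eligible, whatever its
// status. Eligible steps unwind in reverse start order, not completion order.
function rollbackOrder(history: StepRecord[]): string[] {
  return history
    .filter((s) => s.hasRollback)
    .sort((a, b) => b.startSeq - a.startSeq)
    .map((s) => s.name);
}

const history: StepRecord[] = [
  { name: "reserve-inventory", startSeq: 1, status: "completed", hasRollback: true },
  { name: "charge-card", startSeq: 2, status: "completed", hasRollback: true },
  { name: "send-receipt", startSeq: 3, status: "completed", hasRollback: false },
  { name: "create-shipment", startSeq: 4, status: "failed", hasRollback: true },
];

const order = rollbackOrder(history);
// order is ["create-shipment", "charge-card", "reserve-inventory"]
```

The failed create-shipment step comes first because it started last and registered a handler, while send-receipt is skipped because it has nothing to compensate.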

How we designed the API

Once we had the expected behavior in mind, we had to add this new pattern into the Workflows API. Rollbacks went through a few iterations before we landed on rollback options.

Why not a fluent or builder API?

The first approach was a fluent form: step.do(...).rollback(...). It reads well. The forward action and the compensation sit next to each other, and the call site looks like ordinary JavaScript chaining.

The problem is that step.do() already has an important meaning: it starts a durable step and returns a Promise for the step output. In Workers, promise-like values are especially meaningful because Workers RPC supports promise pipelining, a pattern inherited from systems like Cap'n Proto.

Promise pipelining lets code call a method on a future value before that value has fully returned to the caller. For example:

const session = api.authenticate(apiKey);
const name = await session.whoami();

Here, session is not the real session object yet. It is more like a handle to the session that will exist soon. When you call session.whoami(), Workers can send that call to the remote side early and say: “once authentication creates the session, call whoami() on it.”

That saves a round trip. The caller does not need to wait for authenticate() to fully finish before asking for whoami().

We considered a fluent API:

step.do("charge-card", chargeCard).rollback(refundCharge);

To a reader, that can look like “call .rollback() on the result of charge-card.” But rollback is not part of the step’s output. It is part of the step.do() options, registered before the step starts, so Workflows knows how to compensate the step if a later step fails.

A fluent API also makes step timing harder to reason about. Today, step.do() starts the step when it is called, so developers can start a step, do other work, and await the first step later:

const first = step.do("first", () => serviceA.call());

await step.do("second", () => serviceB.call());

await first;

With today’s execution model, first starts immediately, before second. A fluent API would complicate that. Workflows would need to wait and see whether .rollback() gets attached before it knows the full step definition. That could delay when the step is sent to the engine.

In the earlier example, first could start at await first instead of at step.do("first", ...), after second has already completed.

That makes concurrent Workflows harder to reason about: step timing would depend on when the returned Promise is consumed, not just where step.do() is called.
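
To make the timing difference concrete, here is a small simulation that needs no Workflows runtime. The eagerStep and deferredStep helpers are hypothetical stand-ins: the first starts its work when called, as step.do() does today, and the second is a thenable that starts only when it is awaited, which is roughly what a fluent API might be forced to do.

```typescript
// Eager: work begins when the function is called, like step.do() today.
function eagerStep(name: string, log: string[]): Promise<string> {
  log.push(name);
  return Promise.resolve(name);
}

// Deferred: work begins only when the result is consumed. This models a
// hypothetical fluent API that waits to see whether .rollback() is chained.
function deferredStep(name: string, log: string[]): PromiseLike<string> {
  return {
    then<R1 = string, R2 = never>(
      onFulfilled?: ((value: string) => R1 | PromiseLike<R1>) | null,
      onRejected?: ((reason: any) => R2 | PromiseLike<R2>) | null,
    ): PromiseLike<R1 | R2> {
      log.push(name); // the step only "starts" once something awaits it
      return Promise.resolve(name).then(onFulfilled, onRejected);
    },
  };
}

// Same call pattern as the first/second example above; returns start order.
async function startOrder(
  step: (name: string, log: string[]) => PromiseLike<string>,
): Promise<string[]> {
  const log: string[] = [];
  const first = step("first", log); // started here only if the step is eager
  await step("second", log);
  await first;
  return log;
}
```

With these definitions, startOrder(eagerStep) resolves to ["first", "second"], while startOrder(deferredStep) resolves to ["second", "first"]: the same source code produces a different start order once starting is tied to consumption.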

We also considered a builder-style API:

const charge = await step
	.saga("charge")
	.do(() => chargeCard())
	.rollback(() => refundCharge())
	.run();

A builder API avoids the Promise ambiguity. It also gives us an obvious place for future step-level options, and makes it clear that the forward action and rollback action belong to the same saga step.

But it adds ceremony. Every step needs a final .run(), forgetting .run() would be easy and hard to spot without tooling, and simple one-step cases start to look like configuration chains. It also introduces a new step.saga() builder, breaking from the existing step.* pattern. Most importantly, it makes step.do() feel like an older API rather than the primary Workflows primitive. The goal of rollback was to extend step.do(), not replace it.

Rollback as step metadata

step.do(..., { rollback })

Ultimately, we chose the explicit form where rollback is metadata on the step.

This way, each rollback is defined within the forward step itself. Each handler receives the error that caused the rollback to start, the step context, and the output, which is either the persisted value returned by the forward step (which can be undefined) or undefined if the step failed before persisting a value.

Rollbacks emit lifecycle events, so you can tell whether compensation started, which rollback handler failed, and whether rollback completed successfully.

Crucially, the original Workflow failure remains separate: rollback is what Workflows does after the failure, not the reason the Workflow failed.

Just as you can define custom retry and timeout behavior in the step configuration via WorkflowStepConfig, you add rollback-specific values in rollbackConfig.

{
  rollback: async ({ output }) => {
    await bankA.credit({ accountId: fromAccountId, amount, transferId: `${transferId}-reversal` });
  },
  rollbackConfig: {
    retries: { limit: 10, delay: '30 seconds', backoff: 'exponential' },
    timeout: '2 minutes',
  },
}

This matches the lifecycle-event mental model we wanted. A step.do() already describes a durable unit of work that Workflows records, retries, and later shows in logs. Rollback is another lifecycle behavior for that same unit of work. It should travel with the step definition, not live in a separate wrapper or builder.

The step still starts when step.do() normally starts.
The returned promise still represents the step output.
Concurrent Workflow code keeps the same execution model.
Retry and timeout options for rollback live next to the rollback handler.
Existing step.do() calls keep working exactly as they do today.

This shape is slightly more explicit than the fluent API, but that explicitness is useful. The operation and its compensation are still in one place, and the API does not introduce a new step builder or a new kind of promise. Developers who already understand step.do() only need to learn one additional options object.

This is less magical, but it is simpler to adopt, and clearer to understand.

How it works under the hood

Rollback feels like a small API addition, but it changes what Workflows needs to record about each step.

A regular step.do() already has a durable record. Workflows records that the step started, whether it completed, what it returned, and whether it should be skipped instead of repeated if the Workflow resumes later.

Rollbacks add one more thing to that record: whether the step registered compensation logic.

This means Workflows has two pieces of information to bring together if the Workflow fails.

The first is durable step history. The Workflow engine stores data to know what ran, what completed, what output was saved, and whether rollback was registered.

The second is the rollback handler itself, which is the function written to compensate for that step. Workflows does not save the text of that function as data. Instead, it keeps a callable reference to the handler while the Workflow is running.

In Workers RPC, this kind of callable reference is called a stub. A stub lets one part of the system call code that is running somewhere else. Stubs also have lifetimes such that they can be disposed when a call or execution context ends. If you need to keep a stub past that point, Workers RPC provides a dup() method, which creates another handle to the same target.

For rollback, that model is useful. The durable step history records what needs compensation. The rollback stub gives Workflows a way to invoke the compensation code. And because rollback handlers may need to outlive the immediate step.do() call that registered them, Workflows keeps its own callable reference to the handler for the rollback phase.

In the common case, when a Workflow enters rollback in the same engine lifetime, Workflows already has the rollback stubs it needs. It can use the durable step history to find eligible steps, then invoke the rollback stubs that were registered during forward execution.

This gets more subtle when Workflows has to recover after a restart.

If the engine is evicted, crashes, or restarts while rollback is needed, Workflows still has the durable step history, but it may no longer have the in-memory rollback stubs. To recover, Workflows uses replay: a recovery mode where it can re-run the Workflow code without re-executing completed forward step bodies.

When replay reaches a completed step.do(), Workflows reads the persisted result instead of running the step body again. For rollback recovery, Workflows only needs to rebuild handlers for steps that had rollback attached and are eligible for rollback. As those step.do() calls are encountered, their rollback options can register the callable stubs again.

That lets Workflows recover the rollback handlers it needs without duplicating the original external side effects.
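
A toy model can illustrate the idea. ToyEngine below is a hypothetical, synchronous stand-in and not the Workflows engine: it keeps step outputs in a map that survives a simulated restart, and keeps rollback handlers in memory only, so they have to be registered again during replay.

```typescript
// Toy replay model. Names are illustrative, not Workflows internals.
type Rollback = (output: unknown) => void;

class ToyEngine {
  // In-memory only: lost on restart, rebuilt as replay reaches each step.
  readonly handlers = new Map<string, Rollback>();
  bodyRuns = 0;

  // `persisted` stands in for durable step history that survives restarts.
  constructor(readonly persisted: Map<string, unknown> = new Map()) {}

  do<T>(
    name: string,
    body: () => T,
    opts?: { rollback?: (output: T | undefined) => void },
  ): T {
    // Registering the handler happens on every pass, including replay.
    if (opts?.rollback) this.handlers.set(name, opts.rollback as Rollback);
    // Replay: a completed step returns its saved output and skips the body.
    if (this.persisted.has(name)) return this.persisted.get(name) as T;
    this.bodyRuns++;
    const out = body();
    this.persisted.set(name, out);
    return out;
  }
}

function workflow(engine: ToyEngine): void {
  engine.do("debit", () => ({ id: "d-1" }), { rollback: () => {} });
  engine.do("credit", () => ({ id: "c-1" }), { rollback: () => {} });
}

const beforeRestart = new ToyEngine();
workflow(beforeRestart); // forward run: both step bodies execute

const afterRestart = new ToyEngine(beforeRestart.persisted); // simulated restart
workflow(afterRestart); // replay: bodies skipped, handlers registered again
```

After the replay pass, afterRestart.bodyRuns is 0 and afterRestart.handlers holds both handlers, so compensation can run without repeating the debit or the credit.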

With those pieces in place, rollback can work whether the handler is still available in memory or has to be rebuilt during recovery.

When the workflow is about to fail, Workflows does not ask your application to reconstruct what happened. It already has the step history. It can look at the persisted record and answer the important questions:

Which steps started?
Which steps finished?
Which failed step may still need cleanup?
Which steps registered rollback handlers?
What output should each rollback handler receive?
What order should compensation run in?

Then Workflows invokes each rollback stub with a rollback context: the original error, the step context, and the step output, if one was persisted.

The ordering detail matters. In normal JavaScript, especially with Promise.all(), completion order is not always the same as start order. If step A starts first and step B starts second, step B might finish first. For rollback, Workflows uses the persisted start order as the stable source of truth, then unwinds it in reverse.

Rollback handlers also run through Workflows' normal step machinery. That means compensation gets the same operational properties you expect from Workflows: retries, timeouts, lifecycle events, logs, and a final recorded outcome. If a rollback handler keeps failing after its configured retries, Workflows records the rollback outcome as failed, stops running the remaining rollback handlers, and the Workflow instance ultimately ends in the Errored state.
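
The stop-on-exhaustion behavior can be sketched as a simple loop. This is a simplified, synchronous model under stated assumptions: runRollbacks, Compensation, and RollbackResult are hypothetical names, handlers are already in reverse step-start order, and retryLimit is read as the number of retries after the first attempt.

```typescript
interface Compensation {
  name: string;
  handler: () => void; // throws on failure
}

interface RollbackResult {
  outcome: "complete" | "failed";
  ran: string[]; // handlers that succeeded
  failedAt?: string;
}

// Runs handlers one at a time. A handler that still fails after its retries
// stops the sequence, and the remaining handlers are not run.
function runRollbacks(ordered: Compensation[], retryLimit: number): RollbackResult {
  const ran: string[] = [];
  for (const { name, handler } of ordered) {
    let succeeded = false;
    // One initial attempt plus up to retryLimit retries.
    for (let attempt = 0; attempt <= retryLimit && !succeeded; attempt++) {
      try {
        handler();
        succeeded = true;
      } catch {
        // A real engine would wait (delay and backoff) before the next attempt.
      }
    }
    if (!succeeded) return { outcome: "failed", ran, failedAt: name };
    ran.push(name);
  }
  return { outcome: "complete", ran };
}

let refundAttempts = 0;
const result = runRollbacks(
  [
    { name: "cancel-shipment", handler: () => {} },
    {
      name: "refund-card",
      handler: () => {
        refundAttempts++;
        throw new Error("provider down");
      },
    },
    { name: "release-inventory", handler: () => {} },
  ],
  2,
);
```

Here refund-card is attempted three times and never succeeds, so result.outcome is "failed", result.failedAt is "refund-card", and release-inventory never runs.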

This is the main difference between saga rollbacks and a catch block. A catch block only knows what is still in memory at its exact point in your JavaScript execution. Workflows rollback uses persisted step history to decide what already happened, invokes the stubs it already has in the common case, and safely rebuilds missing stubs during recovery when it needs to.

That is also why the API puts rollback on step.do() itself. Rollback is not a separate global error handler — it is metadata attached to the durable unit of work Workflows already understands.

What’s next

Our first iteration of rollbacks includes:

Explicit per-step rollback handlers for step.do()
Sequential rollback execution
Retry and timeout configuration for compensation

Next, we want to explore:

Rollback support for waitForEvent
Support for parallel rollback execution
Rollback support for Python Workflows

When a multi-step application fails halfway through, the hardest part is often not knowing that it failed. It is knowing what already happened, and what needs to happen next.

Saga rollbacks let you put that answer directly beside each step. If you are building multi-step applications with Workflows, try saga rollbacks and tell us what compensation patterns you want next. Get started with the Workflows documentation and share feedback in the Cloudflare Community.

Rearchitecting the Workflows control plane for the agentic era

Luís Duarte — Wed, 15 Apr 2026 13:00:00 GMT

When we originally built Workflows, our durable execution engine for multi-step applications, it was designed for a world in which workflows were triggered by human actions, like a user signing up or placing an order. For use cases like onboarding flows, workflows only had to support one instance per person — and people can only click so fast.

Over time, what we’ve actually seen is a quantitative shift in the workload and access pattern: fewer human-triggered workflows, and more agent-triggered workflows, created at machine speed.

As agents become persistent and autonomous infrastructure, operating on behalf of users for hours or days, they need a durable, asynchronous execution engine for the work they are doing. Workflows provides exactly that: every step is independently retryable, the workflow can pause for human-in-the-loop approval, and each instance survives failures without losing progress.

Moreover, workflows themselves are being used to implement agent loops and serve as the durable harnesses that manage and keep agents alive. Our Agents SDK integration accelerated this, making it easy for agents to spawn workflow instances and get real-time progress back. A single agent session can now kick off dozens of workflows, and many agents running concurrently means thousands of instances created in seconds. With Project Think now available, we anticipate that velocity will only increase.

To help developers scale their agents and applications on Workflows, we are excited to announce that we now support:

50,000 concurrent instances (number of workflow executions running in parallel), originally 4,500
300 instances/second created per account, previously 100
2 million queued instances (meaning instances that have been created or awoken and are waiting for a concurrency slot) per workflow, up from 1 million

We redesigned the Workflows control plane from usage data and first principles to support these increases. For V1 of the control plane, a single Durable Object (DO) could serve as the central registry and coordinator of an entire account. For V2, we built two new components to help horizontally scale the system and alleviate the bottlenecks that V1 introduced, before migrating all customers — with live traffic — seamlessly onto the new version.

V1: initial architecture of Workflows

As described in our public beta blog post, we built Workflows entirely on our own developer platform. Fundamentally, a workflow is a series of durable steps, each independently retryable, that can execute tasks, wait for external events, or sleep until a predetermined time.

export class MyWorkflow extends WorkflowEntrypoint {

  async run(event, step) {
    const data = await step.do("fetch-data", async () => {
      return fetchFromAPI();
    });

    const approval = await step.waitForEvent("approval", {
      type: "approval",
      timeout: "24 hours",
    });

    await step.do("process-and-save", async () => {
      return store(transform(data));
    });
  }
}

To trigger each instance, execute its logic, and store its metadata, we leverage SQLite-backed Durable Objects, which are a simple but powerful primitive for coordination and storage within a distributed system.

In the control plane, some Durable Objects — like the Engine, which executes the actual workflow instance, including its step, retry, and sleep logic — are spun up at a ratio of 1:1 per instance. On the other hand, the Account is an account-level Durable Object that manages all workflows and workflow instances for that account.

To learn more about the V1 control plane, refer to our Workflows announcement blog post.

After we launched Workflows into beta, we were thrilled to see customers quickly scaling their use of the product, but we also realized that having a single Durable Object to store all that account-level information introduced a bottleneck. Many customers needed to create and execute hundreds or even thousands of Workflow instances per minute, which could quickly overwhelm the Account in our original architecture. The original rate limits — 4,500 concurrency slots and 100 instance creations per 10 seconds — were a result of this limitation.

On the V1 control plane, these limits were a hard cap. Any and all operations depending on Account, including create, update, and list, had to go through that single DO. Users with high concurrency workloads could have thousands of instances starting and ending at any given moment, building up to thousands of requests per second to Account. To solve for this, we rearchitected the workflow control plane such that it horizontally scales to higher concurrency and creation rate limits.

V2: horizontal scale for higher throughput

For the new version, we rethought every single operation from the ground up with the goal of optimizing for high-volume workflows. Ultimately, Workflows should scale to support whatever developers need – whether that is thousands of instances created per second or millions of instances running at a time. We also wanted to ensure that V2 allowed for flexible limits, which we can toggle and continue increasing, rather than the hard cap which V1 limits imposed. After many design iterations, we settled on the following pillars for our new architecture:

The source of truth for the existence of a given instance should be its Engine and nothing else.
- In the V1 control plane architecture, we lacked a check before queuing the instance as to whether its Engine actually existed. This allowed for a bad state where an instance may have been queued without its corresponding Engine having spun up.
Instance lifecycle and liveness mechanisms must be horizontally scalable per-workflow and distributed throughout many regions.
The new Account singleton should only store the minimum necessary metadata and have an invariant maximum amount of concurrent requests.

There are two new, critical components in the V2 control plane which allowed us to improve the scalability of Workflows: SousChef and Gatekeeper. The first component, SousChef, is a “second in command” to the Account. Recall that previously, the Account managed the metadata and lifecycle for all of the instances across all of the workflows within a given account. SousChef was introduced to keep track of metadata and lifecycle on a subset of instances in a given workflow. Within an account, a distribution of SousChefs can then report back to Account in a more efficient and manageable way. (An added benefit of this design: not only did we already have per-account isolation, but we also inadvertently gained “per-workflow” isolation within the same account, since each SousChef only takes care of one specific workflow).

The second component, Gatekeeper, is a mechanism to distribute concurrency “slots” (derived from concurrency limits) across all SousChefs within the account. It acts as a leasing system. When an instance is created, it is randomly assigned to one of the SousChefs within that account. Then the SousChef makes a request to Account to trigger that instance. Either a slot is granted, or the instance is queued. Once the slot is granted, the SousChef triggers execution of the instance and assumes responsibility that the instance never gets stuck.

Gatekeeper was needed to make sure that Engines never overloaded their Account (a pressing risk on V1) so every communication between SousChefs and their Account happens on a periodic cycle, once per second — each cycle will also batch all slot requests, ensuring that only one JSRPC call is made. This ensures the instance creation rate can never overload or influence the most important component, Account (as an aside: if the SousChef count is too high, we rate-limit calls or spread across different SousChefs throughout different time periods). Also, this periodic property allows us to preserve fairness on older instances and to ensure max-min fairness through the many SousChefs, allowing them all to progress. For example, if an instance wakes up, it should be prioritized for a slot over a newly created instance, but each SousChef ensures that its own instances do not get stuck.
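
The post does not describe how slots are divided in each cycle. As one way to picture max-min fairness, here is the textbook water-filling allocation applied to a single cycle's batched requests. The function name, the integer slot model, and the example numbers are all assumptions for illustration.

```typescript
// Max-min fair split of `capacity` slots across SousChefs, where demands[i]
// is the number of slots SousChef i asked for in this cycle's batch.
function maxMinFair(capacity: number, demands: number[]): number[] {
  const grants = demands.map(() => 0);
  let remaining = capacity;
  // Indices of SousChefs that still want more slots.
  let active = demands.map((_, i) => i).filter((i) => demands[i] > 0);

  while (remaining > 0 && active.length > 0) {
    // Offer every active SousChef an equal share of what is left.
    const share = Math.max(1, Math.floor(remaining / active.length));
    for (const i of active) {
      if (remaining === 0) break;
      const give = Math.min(share, demands[i] - grants[i], remaining);
      grants[i] += give;
      remaining -= give;
    }
    // Satisfied SousChefs drop out; their unused share is redistributed.
    active = active.filter((i) => grants[i] < demands[i]);
  }
  return grants;
}

// 10 free slots, three SousChefs asking for 2, 8, and 8.
const grants = maxMinFair(10, [2, 8, 8]);
// grants is [2, 4, 4]
```

The small request is fully served and the two large ones split the remainder evenly, so no SousChef is starved by a busier neighbor. When capacity exceeds total demand, every request is granted in full.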

This architecture is more distributed, and therefore, more scalable. Now, when an instance is created, the request path is:

1. Check control plane version
2. Check if a cached version of the workflow and version details is available in that location
   - If not, check Account to get workflow name, unique ID, and version, and cache that information
3. Store only necessary metadata (instance payload, creation date) onto its own Engine

So, how does Engine tell the control plane that it now exists? That happens in the background after instance metadata is set. As background operations on a Durable Object can fail, due to eviction or server failure, we also set an “alarm” on Engine in the creation hot-path. That way, if the background task does not finish, the alarm ensures that the instance will begin.

A Durable Object alarm allows a Durable Object instance to be awakened at a fine-grained time in the future with an at-least-once execution model, with automatic retries built in. We extensively use this combination of background “tasks” and alarms to remove operations off the hot-path while still ensuring that everything will happen as planned. That’s how we keep critical operations like creating an instance fast without ever compromising on reliability.
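
The combination can be modeled without the Durable Objects runtime. ToyInstance below is a hypothetical, synchronous stand-in: create() plays the role of the creation hot path, the register callback plays the role of the background task that tells the control plane the instance exists, and alarm() plays the role of the alarm handler. The one-second delay is an arbitrary assumption.

```typescript
// Toy model of "background task plus alarm". Not the Durable Objects API.
class ToyInstance {
  registered = false; // does the control plane know this instance exists?
  alarmAt: number | undefined; // pending alarm time, if any

  // Creation hot path: arm the alarm first, then attempt the background task.
  create(now: number, register: () => void): void {
    this.alarmAt = now + 1000;
    try {
      register();
      this.registered = true;
    } catch {
      // Background work was lost (for example, eviction). The alarm stays armed.
    }
  }

  // Alarm handler: delivery is at-least-once, so it must be safe to repeat.
  alarm(register: () => void): void {
    if (!this.registered) {
      register();
      this.registered = true;
    }
    this.alarmAt = undefined;
  }
}

let registerCalls = 0;
const flakyRegister = (): void => {
  registerCalls++;
  if (registerCalls === 1) throw new Error("evicted mid-task");
};

const instance = new ToyInstance();
instance.create(0, flakyRegister); // background attempt fails
const registeredAfterCreate = instance.registered;
instance.alarm(flakyRegister); // the alarm finishes the registration
instance.alarm(flakyRegister); // a duplicate delivery is a no-op
```

The hot path never waits on the registration succeeding, yet the instance still ends up registered exactly once: registeredAfterCreate is false, instance.registered is true, and registerCalls is 2.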

Other than unlocking scale, this version of the control plane means that:

Instance listing is faster and consistent with cursor pagination;
Any operation on an instance does exactly one network hop (as it can go directly to its Engine, ensuring that eyeball request latency is as small as we can manage);
We can verify that more concurrent instances are behaving correctly (running on time) and correct those that are not, so that Engines are never late to continue execution.

V1 → V2 migration

Now that we had a new version of the Workflows control plane that can handle a higher volume of user load, we needed to do the “boring” part: migrating our customers and instances to the new system. At Cloudflare’s scale, this becomes a problem in and of itself, so the “boring” part becomes the biggest challenge. Well before its one-year mark, Workflows had already racked up millions of instances and thousands of customers. Also, some tech debt on V1’s control plane meant that a queued instance might not have its own Engine Durable Object created yet, complicating matters further.

Such a migration is tricky because customers might have instances running at any given moment; we needed a way to add the SousChef and Gatekeeper components into older accounts without causing any disruption or downtime.

We ultimately decided that we would migrate existing Accounts (which we’ll refer to as AccountOlds) to behave like SousChefs. By persisting the Account DOs, we maintained the instance metadata, and simply converted the DO into a SousChef “DO”:

import { DurableObject } from "cloudflare:workers";
// You might be wondering what's this SousChef class? This is the SousChef DO class!
import { SousChef } from "@repo/souschef";

class AccountOld extends DurableObject {
  constructor(state: DurableObjectState, env: Env) {
    super(state, env);
    // We added the following snippet to the end of our AccountOld DO's
    // constructor. This ensures that if we want, we can use any primitive
    // that is available on SousChef DO
    if (this.currentVersion === ControlPlaneVersions.SOUS_CHEFS) {
      const sousChef = new SousChef(this.ctx, this.env);
      this.sousChef = sousChef;
      // A constructor cannot use await, so blockConcurrencyWhile() holds
      // incoming requests until the async setup has finished
      this.ctx.blockConcurrencyWhile(() => sousChef.setup());
    }
  }

  async updateInstance(params: UpdateInstanceParams) {
    if (this.currentVersion === ControlPlaneVersions.SOUS_CHEFS) {
      assert(this.sousChef !== undefined, 'SousChef must exist on v2');
      return this.sousChef.updateInstance(params);
    }

    // old logic remains the same
  }

  @RequiresVersion(ControlPlaneVersions.V1)
  async getMetadata() {
    // this method can only be run if 
    // this.currentVersion === ControlPlaneVersions.V1
  }
}

We can instantiate the SousChef class within the AccountOld because the SQL tables that track instance metadata are the same on SousChef and AccountOld DOs. As such, we could just decide which version of the code to use. If this hadn’t been the case, we would have been forced to migrate the metadata of millions of instances, which would have made the migration more difficult and longer running for each account. So, how did the migration work?

First, we prepared AccountOld DOs to be switched to behave as SousChefs (which meant creating a release with a version of the snippet above). Then, we enabled control plane V2 per account, which triggered the next three steps roughly at the same time:

All new instance creation requests are now routed to the new SousChefs, which are created when they receive their first request, so new instances never go to AccountOld again;
AccountOld DOs start migrating themselves to behave like SousChefs;
The new Account DO is spun up with the corresponding metadata.

After all accounts were migrated to the new control plane version, we were able to sunset AccountOld DOs as their instance retention periods expired. Once all instances on all accounts on AccountOlds were migrated, we could spin down those DOs permanently. The migration was completed with no downtime in a process that truly felt like changing a car’s wheels while driving.

Try it out

If you are new to Workflows, try our Get Started guide or build your first durable agent with Workflows.

If your use case requires higher limits than our new defaults — a concurrency limit of 50,000 slots and account-level creation rate limit of 300 instances per second, 100 per workflow — reach out via your account team or the Workers Limit Request Form. You can also reach out with feedback, feature requests, or just to share how you are using Workflows on our Discord server.

How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams

André Venceslau — Fri, 27 Mar 2026 13:00:00 GMT

Cloudflare Workflows is a durable execution engine that lets you chain steps, retry on failure, and persist state across long-running processes. Developers use Workflows to power background agents, manage data pipelines, build human-in-the-loop approval systems, and more.

Last month, we announced that every workflow deployed to Cloudflare now has a complete visual diagram in the dashboard.

We built this because being able to visualize your applications is more important now than ever before. Coding agents are writing code that you may or may not be reading. However, the shape of what gets built still matters: how the steps connect, where they branch, and what's actually happening.

If you've seen diagrams from visual workflow builders before, those are usually working from something declarative: JSON configs, YAML, drag-and-drop. However, Cloudflare Workflows are just code. They can include Promises, Promise.all, loops, conditionals, and/or be nested in functions or classes. This dynamic execution model makes rendering a diagram a bit more complicated.

We use Abstract Syntax Trees (ASTs) to statically derive the graph, tracking Promise and await relationships to understand what runs in parallel, what blocks, and how the pieces connect.

Keep reading to learn how we built these diagrams, or deploy your first workflow and see the diagram for yourself.

Here’s an example of a diagram generated from Cloudflare Workflows code:

Dynamic workflow execution

Generally, workflow engines can execute according to either dynamic or sequential (static) execution order. Sequential execution might seem like the more intuitive solution: trigger workflow → step A → step B → step C, where step B starts executing immediately after the engine completes step A, and so forth.

Cloudflare Workflows follow the dynamic execution model. Since workflows are just code, the steps execute as the runtime encounters them. When the runtime discovers a step, that step gets passed over to the workflow engine, which manages its execution. The steps are not inherently sequential unless awaited — the engine executes all unawaited steps in parallel. This way, you can write your workflow code as flow control without additional wrappers or directives. Here’s how the handoff works:

An engine, which is a “supervisor” Durable Object for that instance, spins up. The engine is responsible for the logic of the actual workflow execution.
The engine triggers a user Worker via dynamic dispatch, passing control over to the Workers runtime.
When the runtime encounters a step.do, it passes the execution back to the engine.
The engine executes the step, persists the result (or throws an error, if applicable) and triggers the user Worker again.
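
To make the awaited versus unawaited distinction concrete, here is a minimal sketch. It uses a toy stand-in for step.do that only logs when a step starts and ends; it is not the Workflows engine, and makeStep, sequential, and parallel are hypothetical names chosen for illustration.

```typescript
type Log = string[];

// Toy stand-in for a workflow step object (an assumption, not the real engine):
// it records when each step starts and ends.
function makeStep(log: Log) {
  return {
    async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
      log.push(`start ${name}`);
      const result = await fn();
      log.push(`end ${name}`);
      return result;
    },
  };
}

// Each step is awaited as it is declared, so step b starts only after a ends.
async function sequential(): Promise<Log> {
  const log: Log = [];
  const step = makeStep(log);
  await step.do("a", async () => "a");
  await step.do("b", async () => "b");
  return log;
}

// Neither step is awaited when declared, so both start before either ends.
async function parallel(): Promise<Log> {
  const log: Log = [];
  const step = makeStep(log);
  const a = step.do("a", async () => "a");
  const b = step.do("b", async () => "b");
  await Promise.all([a, b]);
  return log;
}

sequential().then((log) => console.log("awaited:  ", log.join(", ")));
// awaited:   start a, end a, start b, end b
parallel().then((log) => console.log("unawaited:", log.join(", ")));
// unawaited: start a, start b, end a, end b
```

The only difference between the two functions is where the await sits, which is exactly the information a static diagram has to recover from the source.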

With this architecture, the engine does not inherently “know” the order of the steps that it is executing — but for a diagram, the order of steps becomes crucial information. The challenge here lies in getting the vast majority of workflows translated accurately into a diagnostically helpful graph; with the diagrams in beta, we will continue to iterate and improve on these representations.

Parsing the code

Fetching the script at deploy time, instead of run time, allows us to parse the workflow in its entirety to statically generate the diagram.

Taking a step back, here is the life of a workflow deployment:

To create the diagram, we fetch the script after it has been bundled by the internal configuration service which deploys Workers (step 2 under Workflow deployment). Then, we use a parser to create an abstract syntax tree (AST) representing the workflow, and our internal service generates and traverses an intermediate graph with all WorkflowEntrypoints and calls to workflow steps. We render the diagram based on the final result on our API.

When a Worker is deployed, the configuration service bundles the code (using esbuild by default) and minifies it unless specified otherwise. This presents another challenge: while Workflows in TypeScript follow an intuitive pattern, their minified JavaScript (JS) can be dense and hard to read. The minified output also differs depending on the bundler.

Here’s an example of Workflow code that shows agents executing in parallel:

const summaryPromise = step.do(
  `summary agent (loop ${loop})`,
  async () => {
    return runAgentPrompt(
      this.env,
      SUMMARY_SYSTEM,
      buildReviewPrompt(
        'Summarize this text in 5 bullet points.',
        draft,
        input.context
      )
    );
  }
);
const correctnessPromise = step.do(
  `correctness agent (loop ${loop})`,
  async () => {
    return runAgentPrompt(
      this.env,
      CORRECTNESS_SYSTEM,
      buildReviewPrompt(
        'List correctness issues and suggested fixes.',
        draft,
        input.context
      )
    );
  }
);
const clarityPromise = step.do(
  `clarity agent (loop ${loop})`,
  async () => {
    return runAgentPrompt(
      this.env,
      CLARITY_SYSTEM,
      buildReviewPrompt(
        'List clarity issues and suggested fixes.',
        draft,
        input.context
      )
    );
  }
);

Bundling with rspack, a snippet of the minified code looks like this:

class pe extends e{async run(e,t){de("workflow.run.start",{instanceId:e.instanceId});const r=await t.do("validate payload",async()=>{if(!e.payload.r2Key)throw new Error("r2Key is required");if(!e.payload.telegramChatId)throw new Error("telegramChatId is required");return{r2Key:e.payload.r2Key,telegramChatId:e.payload.telegramChatId,context:e.payload.context?.trim()}}),s=await t.do("load source document from r2",async()=>{const e=await this.env.REVIEW_DOCUMENTS.get(r.r2Key);if(!e)throw new Error(`R2 object not found: ${r.r2Key}`);const t=(await e.text()).trim();if(!t)throw new Error("R2 object is empty");return t}),n=Number(this.env.MAX_REVIEW_LOOPS??"5"),o=this.env.RESPONSE_TIMEOUT??"7 days",a=async(s,i,c)=>{if(s>n)return le("workflow.loop.max_reached",{instanceId:e.instanceId,maxLoops:n}),await t.do("notify max loop reached",async()=>{await se(this.env,r.telegramChatId,`Review stopped after ${n} loops for ${e.instanceId}. Start again if you still need revisions.`)}),{approved:!1,loops:n,finalText:i};const h=t.do(`summary agent (loop ${s})`,async()=>te(this.env,"You summarize documents. Keep the output short, concrete, and factual.",ue("Summarize this text in 5 bullet points.",i,r.context)))...

Or, bundling with vite, here is a minified snippet:

class ht extends pe {
  async run(e, r) {
    b("workflow.run.start", { instanceId: e.instanceId });
    const s = await r.do("validate payload", async () => {
      if (!e.payload.r2Key)
        throw new Error("r2Key is required");
      if (!e.payload.telegramChatId)
        throw new Error("telegramChatId is required");
      return {
        r2Key: e.payload.r2Key,
        telegramChatId: e.payload.telegramChatId,
        context: e.payload.context?.trim()
      };
    }), n = await r.do(
      "load source document from r2",
      async () => {
        const i = await this.env.REVIEW_DOCUMENTS.get(s.r2Key);
        if (!i)
          throw new Error(`R2 object not found: ${s.r2Key}`);
        const c = (await i.text()).trim();
        if (!c)
          throw new Error("R2 object is empty");
        return c;
      }
    ), o = Number(this.env.MAX_REVIEW_LOOPS ?? "5"), l = this.env.RESPONSE_TIMEOUT ?? "7 days", a = async (i, c, u) => {
      if (i > o)
        return H("workflow.loop.max_reached", {
          instanceId: e.instanceId,
          maxLoops: o
        }), await r.do("notify max loop reached", async () => {
          await J(
            this.env,
            s.telegramChatId,
            `Review stopped after ${o} loops for ${e.instanceId}. Start again if you still need revisions.`
          );
        }), {
          approved: !1,
          loops: o,
          finalText: c
        };
      const h = r.do(
        `summary agent (loop ${i})`,
        async () => _(
          this.env,
          et,
          K(
            "Summarize this text in 5 bullet points.",
            c,
            s.context
          )
        )
      )...

Minified code can get pretty gnarly — and depending on the bundler, it can get gnarly in a bunch of different directions.

We needed a way to parse the various forms of minified code quickly and precisely. We decided oxc-parser from the JavaScript Oxidation Compiler (OXC) was perfect for the job. We first tested this idea by having a container running Rust. Every script ID was sent to a Cloudflare Queue, after which messages were popped and sent to the container to process. Once we confirmed this approach worked, we moved to a Worker written in Rust. Workers supports running Rust via WebAssembly, and the package was small enough to make this straightforward.

The Rust Worker is responsible for first converting the minified JS into AST node types, then converting the AST node types into the graphical version of the workflow that is rendered on the dashboard. To do this, we generate a graph of pre-defined node types for each workflow and translate into our graph representation through a series of node mappings.
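
As a rough illustration of the first half of that job, the sketch below walks a small hand-built, ESTree-style AST and collects the names of step.do calls. It is written in TypeScript rather than Rust, it does not use oxc-parser, and AstNode, walk, stepName, and collectStepNames are hypothetical helpers, not part of our service. The key assumption is that the step object is the second parameter of run, whatever the minifier renamed it to (t or r in the snippets above).

```typescript
// Minimal ESTree-style node: a `type` tag plus arbitrary fields.
type AstNode = { type: string; [key: string]: unknown };

// Depth-first walk over anything that looks like an AST node.
function walk(value: unknown, visit: (node: AstNode) => void): void {
  if (Array.isArray(value)) {
    for (const item of value) walk(item, visit);
    return;
  }
  if (value === null || typeof value !== "object") return;
  const node = value as AstNode;
  if (typeof node.type === "string") visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

// Turn the first argument of a step call into a display name.
function stepName(arg: AstNode | undefined): string {
  if (!arg) return "<missing>";
  const literal = arg.value;
  if (arg.type === "Literal" && typeof literal === "string") return literal;
  if (arg.type === "TemplateLiteral") {
    const quasis = arg.quasis as Array<{ value: { cooked: string } }>;
    // Interpolated expressions are not known statically, so elide them.
    return quasis.map((q) => q.value.cooked).join("${...}");
  }
  return "<dynamic>";
}

// Collect the names of `<stepParam>.do(...)` calls inside a `run` method.
function collectStepNames(runMethod: AstNode): string[] {
  const fn = runMethod.value as AstNode;
  const params = fn.params as AstNode[];
  const stepParam = params[1] ? params[1].name : undefined;
  const names: string[] = [];
  walk(fn.body, (node) => {
    if (node.type !== "CallExpression") return;
    const callee = node.callee as AstNode;
    if (callee.type !== "MemberExpression") return;
    const object = callee.object as AstNode;
    const property = callee.property as AstNode;
    if (typeof stepParam === "string" && object.name === stepParam && property.name === "do") {
      names.push(stepName((node.arguments as AstNode[])[0]));
    }
  });
  return names;
}

const id = (name: string): AstNode => ({ type: "Identifier", name });
const doCall = (receiver: string, nameArg: AstNode): AstNode => ({
  type: "CallExpression",
  callee: { type: "MemberExpression", object: id(receiver), property: id("do") },
  arguments: [nameArg], // step callbacks omitted for brevity
});

// Hand-built AST for:
//   async run(e, t) { await t.do("validate payload"); t.do(`summary agent (loop ${s})`); }
const runMethod: AstNode = {
  type: "MethodDefinition",
  key: id("run"),
  value: {
    type: "FunctionExpression",
    params: [id("e"), id("t")],
    body: {
      type: "BlockStatement",
      body: [
        {
          type: "ExpressionStatement",
          expression: {
            type: "AwaitExpression",
            argument: doCall("t", { type: "Literal", value: "validate payload" }),
          },
        },
        {
          type: "ExpressionStatement",
          expression: doCall("t", {
            type: "TemplateLiteral",
            quasis: [
              { type: "TemplateElement", value: { cooked: "summary agent (loop " } },
              { type: "TemplateElement", value: { cooked: ")" } },
            ],
            expressions: [id("s")],
          }),
        },
      ],
    },
  },
};

const names = collectStepNames(runMethod);
console.log(names);
// [ 'validate payload', 'summary agent (loop ${...})' ]
```

Matching on the parameter position instead of the identifier name is what lets the same pass work on rspack and vite output alike, since string literals survive minification while variable names do not.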

Rendering the diagram

There were two challenges to rendering a diagram version of the workflow: how to track step and function relationships correctly, and how to define the workflow node types as simply as possible while covering all the surface area.

To guarantee that step and function relationships are tracked correctly, we needed to collect both the function and step names. As we discussed earlier, the engine only has information about the steps, but a step may be dependent on a function, or vice versa. For example, developers might wrap steps in functions or define functions as steps. They could also call steps within a function that come from different modules or rename steps.

Although the library passes the initial hurdle by giving us the AST, we still have to decide how to parse it, and some code patterns require additional creativity. Take functions: within a WorkflowEntrypoint, there can be functions that call steps directly, indirectly, or not at all. Consider functionA, which contains console.log(await functionB(), await functionC()), where functionB calls a step.do() and functionC does not. In that case, both functionA and functionB should be included on the workflow diagram; however, functionC should not. To catch all functions which include direct and indirect step calls, we create a subgraph for each function and check whether it contains a step call itself or whether it calls another function which might. Those subgraphs are represented by a function node, which contains all of its relevant nodes. If a function node is a leaf of the graph, meaning it has no direct or indirect workflow steps within it, it is trimmed from the final output.
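
The functionA, functionB, functionC case can be sketched as a small reachability check. This is a simplified illustration, not our implementation: FunctionInfo and functionsWithSteps are hypothetical, and the per-function summary is assumed to have been produced by an earlier AST pass.

```typescript
// Hypothetical per-function summary, assumed to come from an earlier AST pass.
interface FunctionInfo {
  callsStepDirectly: boolean; // the body contains a step call such as step.do()
  callees: string[]; // names of other functions it calls
}

// Return the functions that reach a step call directly or indirectly.
function functionsWithSteps(fns: Record<string, FunctionInfo>): Set<string> {
  const keep = new Set<string>();
  for (const name of Object.keys(fns)) {
    if (fns[name].callsStepDirectly) keep.add(name);
  }
  // Propagate to callers until nothing changes; this also terminates on recursion.
  let changed = true;
  while (changed) {
    changed = false;
    for (const name of Object.keys(fns)) {
      if (!keep.has(name) && fns[name].callees.some((callee) => keep.has(callee))) {
        keep.add(name);
        changed = true;
      }
    }
  }
  return keep;
}

const example: Record<string, FunctionInfo> = {
  functionA: { callsStepDirectly: false, callees: ["functionB", "functionC"] },
  functionB: { callsStepDirectly: true, callees: [] },
  functionC: { callsStepDirectly: false, callees: [] },
};

const kept = Array.from(functionsWithSteps(example)).sort();
console.log(kept);
// [ 'functionA', 'functionB' ]
```

Anything not in the returned set, functionC here, corresponds to a function node that would be trimmed from the diagram.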

We check for other patterns as well, including a list of static steps from which we can infer the workflow diagram or variables, defined in up to ten different ways. If your script contains multiple workflows, we follow a similar pattern to the subgraphs created for functions, abstracted one level higher.

For every AST node type, we had to consider every way they could be used inside of a workflow: loops, branches, promises, parallels, awaits, arrow functions… the list goes on. Even within these paths, there are dozens of possibilities. Consider just a few of the possible ways to loop:

// for...of
for (const item of items) {
	await step.do(`process ${item}`, async () => item);
}
// while
while (shouldContinue) {
	await step.do('poll', async () => getStatus());
}
// map
await Promise.all(
	items.map((item) => step.do(`map ${item}`, async () => item)),
);
// forEach (returns undefined, so the outer await does not wait for the callbacks)
await items.forEach(async (item) => {
	await step.do(`each ${item}`, async () => item);
});

And beyond looping, how to handle branching:

// switch / case
switch (action.type) {
	case 'create':
		await step.do('handle create', async () => {});
		break;
	default:
		await step.do('handle unknown', async () => {});
		break;
}

// if / else if / else
if (status === 'pending') {
	await step.do('pending path', async () => {});
} else if (status === 'active') {
	await step.do('active path', async () => {});
} else {
	await step.do('fallback path', async () => {});
}

// ternary operator
await (cond
	? step.do('ternary true branch', async () => {})
	: step.do('ternary false branch', async () => {}));

// nullish coalescing with step on RHS
const myStepResult =
	variableThatCanBeNullUndefined ??
	(await step.do('nullish fallback step', async () => 'default'));

// try/catch with finally
try {
	await step.do('try step', async () => {});
} catch (_e) {
	await step.do('catch step', async () => {});
} finally {
	await step.do('finally step', async () => {});
}

Our goal was to create a concise API that communicated what developers need to know without overcomplicating it. But converting a workflow into a diagram meant accounting for every possible pattern (whether or not it follows best practices) and edge case. As we discussed earlier, steps are not explicitly sequential by default. If a workflow does not use await or Promise.all(), we assume that the steps will execute in the order in which they are encountered. But if a workflow includes await, Promise, or Promise.all(), we need a way to track those relationships.

We decided on tracking execution order, where each node has a starts: and resolves: field. The starts and resolves indices tell us when a promise starts executing and when it resolves, relative to the first promise that started without being immediately awaited. This correlates to vertical positioning in the diagram UI (i.e., all steps with starts:1 will be aligned with one another). If steps are awaited when they are declared, then starts and resolves will be undefined, and the workflow will execute in the order of the steps’ appearance to the runtime.

While parsing, when we encounter an unawaited Promise or Promise.all(), each affected node is marked with an entry number, surfaced in the starts field. If we encounter an await on that promise, the entry number is incremented by one and saved as the exit number (which is the value in resolves). This allows us to know which promises run at the same time and when they’ll complete in relation to each other.
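
The counter described above can be sketched as follows. This is a deliberately simplified model with hypothetical names (OrderTracker, start, awaitAll): it only handles explicit awaits and does not try to reproduce every case the real parser covers, such as a Promise.all() that is returned rather than awaited.

```typescript
interface TrackedNode {
  name: string;
  starts?: number;
  resolves?: number;
}

// Simplified entry/exit numbering (an assumption, not the production logic).
class OrderTracker {
  private counter = 1;
  private nodes = new Map<string, TrackedNode>();

  // An unawaited step was encountered: record the current entry number.
  start(name: string): void {
    this.nodes.set(name, { name, starts: this.counter });
  }

  // An await on previously started steps was encountered:
  // bump the counter and save it as their exit number.
  awaitAll(names: string[]): void {
    this.counter += 1;
    for (const name of names) {
      const node = this.nodes.get(name);
      if (node) node.resolves = this.counter;
    }
  }

  result(): TrackedNode[] {
    return Array.from(this.nodes.values());
  }
}

// Models: const e = step.do("task e", ...); const f = step.do("task f", ...);
//         await Promise.all([e, f]);
const tracker = new OrderTracker();
tracker.start("task e");
tracker.start("task f");
tracker.awaitAll(["task e", "task f"]);

const tracked = tracker.result();
console.log(tracked);
// [ { name: 'task e', starts: 1, resolves: 2 },
//   { name: 'task f', starts: 1, resolves: 2 } ]
```

Both steps share starts: 1, so they would be drawn side by side, and both share resolves: 2, so whatever follows the await is placed after them.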

export class ImplicitParallelWorkflow extends WorkflowEntrypoint {
 async run(event: WorkflowEvent, step: WorkflowStep) {
   const branchA = async () => {
     const a = step.do("task a", async () => "a"); //starts 1
     const b = step.do("task b", async () => "b"); //starts 1
     const c = await step.waitForEvent("task c", { type: "my-event", timeout: "1 hour" }); //starts 1 resolves 2
     await step.do("task d", async () => JSON.stringify(c)); //starts 2 resolves 3
     return Promise.all([a, b]); //resolves 3
   };

   const branchB = async () => {
     const e = step.do("task e", async () => "e"); //starts 1
     const f = step.do("task f", async () => "f"); //starts 1
     return Promise.all([e, f]); //resolves 2
   };

   await Promise.all([branchA(), branchB()]);

   await step.sleep("final sleep", 1000);
 }
}

You can see the steps’ alignment in the diagram:

After accounting for all of those patterns, we settled on the following list of node types:

| StepSleep
| StepDo
| StepWaitForEvent
| StepSleepUntil
| LoopNode
| ParallelNode
| TryNode
| BlockNode
| IfNode
| SwitchNode
| StartNode
| FunctionCall
| FunctionDef
| BreakNode;

Here are a few samples of API output for different behaviors:

function call:

{
  "functions": {
    "runLoop": {
      "name": "runLoop",
      "nodes": []
    }
  }
}

if condition branching to step.do:

{
  "type": "if",
  "branches": [
    {
      "condition": "loop > maxLoops",
      "nodes": [
        {
          "type": "step_do",
          "name": "notify max loop reached",
          "config": {
            "retries": {
              "limit": 5,
              "delay": 1000,
              "backoff": "exponential"
            },
            "timeout": 10000
          },
          "nodes": []
        }
      ]
    }
  ]
}

parallel with step.do and waitForEvent:

{
  "type": "parallel",
  "kind": "all",
  "nodes": [
    {
      "type": "step_do",
      "name": "correctness agent (loop ${...})",
      "config": {
        "retries": {
          "limit": 5,
          "delay": 1000,
          "backoff": "exponential"
        },
        "timeout": 10000
      },
      "nodes": [],
      "starts": 1
    },
...
    {
      "type": "step_wait_for_event",
      "name": "wait for user response (loop ${...})",
      "options": {
        "event_type": "user-response",
        "timeout": "unknown"
      },
      "starts": 3,
      "resolves": 4
    }
  ]
}
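
To show how a consumer might read this output, here is a minimal sketch that types a few of the node shapes and prints an indented outline. The interfaces are inferred only from the samples above, so fields such as config and options are left out, and DiagramNode and outline are hypothetical names, not part of the API.

```typescript
// Shapes inferred from the sample output above; this is not the full schema.
interface StepDo {
  type: "step_do";
  name: string;
  nodes: DiagramNode[];
  starts?: number;
  resolves?: number;
}
interface StepWaitForEvent {
  type: "step_wait_for_event";
  name: string;
  starts?: number;
  resolves?: number;
}
interface IfNode {
  type: "if";
  branches: Array<{ condition: string; nodes: DiagramNode[] }>;
}
interface ParallelNode {
  type: "parallel";
  kind: string;
  nodes: DiagramNode[];
}
type DiagramNode = StepDo | StepWaitForEvent | IfNode | ParallelNode;

// Render a node and its children as indented text lines.
function outline(node: DiagramNode, depth: number = 0): string[] {
  const pad = "  ".repeat(depth);
  const lines: string[] = [];
  const addChildren = (children: DiagramNode[]): void => {
    for (const child of children) lines.push(...outline(child, depth + 1));
  };
  switch (node.type) {
    case "step_do":
      lines.push(`${pad}step_do: ${node.name}`);
      addChildren(node.nodes);
      break;
    case "step_wait_for_event":
      lines.push(`${pad}wait_for_event: ${node.name}`);
      break;
    case "if":
      for (const branch of node.branches) {
        lines.push(`${pad}if (${branch.condition})`);
        addChildren(branch.nodes);
      }
      break;
    case "parallel":
      lines.push(`${pad}parallel (${node.kind})`);
      addChildren(node.nodes);
      break;
  }
  return lines;
}

// The "if" sample from above, with the config field dropped.
const sample: DiagramNode = {
  type: "if",
  branches: [
    {
      condition: "loop > maxLoops",
      nodes: [{ type: "step_do", name: "notify max loop reached", nodes: [] }],
    },
  ],
};

const outlineLines = outline(sample);
console.log(outlineLines.join("\n"));
// if (loop > maxLoops)
//   step_do: notify max loop reached
```

Using the type field as a discriminant keeps each case in the switch narrowed to the right shape, which is convenient when more node types are added.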

What’s next

Ultimately, the goal of these Workflow diagrams is to serve as a full-service debugging tool. That means you’ll be able to:

Trace an execution through the graph in real time
Discover errors, wait for human-in-the-loop approvals, and skip steps for testing
Access visualizations in local development

Check out the diagrams on your Workflow overview pages. If you have any feature requests or notice any bugs, share your feedback directly with the Cloudflare team by joining the Cloudflare Developers community on Discord.