Skip to content

fix: allow async cancellation#9705

Merged
dmadisetti merged 5 commits into
mainfrom
dm/nonblocking-runner
Jun 12, 2026
Merged

fix: allow async cancellation#9705
dmadisetti merged 5 commits into
mainfrom
dm/nonblocking-runner

Conversation

@dmadisetti

@dmadisetti dmadisetti commented May 27, 2026

Copy link
Copy Markdown
Member

📝 Summary

Closes #2684

Promotes scheduler to long lived singleton on kernel context for look-up of running cells. This in turn allows task cancellations on async cells, which were previously un-cancelable. Example of cancelation:

image
@vercel

vercel Bot commented May 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Jun 11, 2026 9:00pm

Request Review

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 12 files

Architecture diagram
sequenceDiagram
    participant Client as Browser Client
    participant Kernel as Kernel (main thread)
    participant SIGINT as Signal Handler
    participant Context as KernelRuntimeContext
    participant Scheduler as SequentialScheduler
    participant CellRunner as CellRunner
    participant Evaluator as CellEvaluator
    participant Task as asyncio Task

    Note over Client,Task: Cell Execution with Cancellation Support

    Client->>Kernel: Request cell execution
    Kernel->>Scheduler: Create SequentialScheduler(cells_to_run, graph)
    Scheduler->>Scheduler: Initialize _active={}, _interrupted=False

    Note over Scheduler,CellRunner: async with Scheduler publishes on context

    Scheduler->>Context: __aenter__: set _active_scheduler=self
    Context-->>Scheduler: OK

    loop For each cell batch
        Scheduler->>Scheduler: batch(cell_ids) - yield one cell at a time
        Scheduler->>CellRunner: Run cell_id
        CellRunner->>CellRunner: Check self.cancelled(cell_id)
        alt Cell is cancelled
            CellRunner->>CellRunner: Mark cell as "cancelled"
        else Cell not cancelled
            alt Cell is coroutine (async)
                CellRunner->>Task: asyncio.ensure_future(evaluate(cell, glbls))
                CellRunner->>Scheduler: register_task(cell_id, task)
                Note over CellRunner,Task: Task registered so SIGINT can find it
                CellRunner->>Task: await task
                alt Cell completes normally
                    Task-->>CellRunner: RunResult
                else Cell cancelled externally
                    Task-->>CellRunner: asyncio.CancelledError
                end
                CellRunner->>Scheduler: unregister_task(cell_id)
            else Cell is sync
                CellRunner->>Evaluator: evaluate(cell, glbls)
                Evaluator-->>CellRunner: RunResult
            end
            CellRunner->>CellRunner: _finalize_run_result()
            Note over CellRunner: CancelledError → MarimoInterrupt
            CellRunner-->>Scheduler: Completed cell
        end
    end

    Scheduler->>Context: __aexit__: clear _active_scheduler
    Context-->>Scheduler: OK

    Note over Client,Task: SIGINT/Cancellation Flow

    Client->>Kernel: Send interrupt signal
    Kernel->>SIGINT: SIGINT received
    SIGINT->>SIGINT: construct_interrupt_handler()
    SIGINT->>Context: safe_get_context()
    Context-->>SIGINT: KernelRuntimeContext
    SIGINT->>Context: Check execution_context
    alt No execution context
        SIGINT->>SIGINT: return (no-op)
    else Execution context exists
        SIGINT->>SIGINT: Broadcast InterruptedNotification()
        SIGINT->>Context: Check duckdb_connection
        alt DuckDB connection present
            SIGINT->>SIGINT: interrupt duckdb connection
        end
        SIGINT->>Context: Check active_scheduler
        alt Scheduler has active tasks
            SIGINT->>Scheduler: cancel_all()
            Scheduler->>Scheduler: Set _interrupted=true
            Scheduler->>Task: call_soon_threadsafe(task.cancel())
            Task-->>Scheduler: Task cancelled
            SIGINT->>SIGINT: return (no MarimoInterrupt raise)
        else Scheduler exists but no active tasks
            SIGINT->>Scheduler: cancel_all()
            SIGINT->>SIGINT: raise MarimoInterrupt
        else No scheduler
            SIGINT->>SIGINT: raise MarimoInterrupt
        end
    end
Loading

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread marimo/_runtime/runner/cell_runner.py Outdated
Comment thread marimo/_runtime/runner/scheduler.py
- Type Scheduler.start_task as AbstractAsyncContextManager so
  SequentialScheduler structurally satisfies the Protocol (mypy).
- Replace the 'fail loudly on re-entrant run_all' guard with a
  save/restore stack on _active_scheduler. A nested run_all on the same
  context (code_mode ctx.run_cell from inside a running cell) suspends
  the outer run, completes the inner, and restores the outer. Fixes the
  scratchpad cascade regression while keeping SIGINT routed to the
  innermost run.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses #2684 by making async cell execution cancellable via SIGINT: the running scheduler is now published on the KernelRuntimeContext so the interrupt handler can locate it and cancel in-flight async tasks instead of relying on raising into asyncio internals.

Changes:

  • Promote the per-run scheduler to a context-visible “active scheduler” (async context manager) so SIGINT can cancel queued cells and running async tasks.
  • Update interrupt handling to resolve the currently-installed runtime context at signal time and route cancellation appropriately (cancel-only for active async tasks; cancel+raise for sync/between-cells cases).
  • Refactor runner/executor responsibilities (move interruptible async evaluation into the runner) and expand test coverage for SIGINT/cancellation behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/_runtime/test_scheduler.py Updates scheduler batching assertions to match the new iterable-yielding batch API.
tests/_runtime/test_interrupt_handlers.py Adds regression tests for SIGINT behavior with/without schedulers and active async tasks; updates mocking to use safe_get_context.
tests/_runtime/test_executor_evaluator.py Adds focused tests for task cancellation semantics and scheduler publication on the kernel context.
tests/_runtime/runner/test_hooks.py Verifies post-execution status mapping: MarimoInterrupt -> interrupted (with precedence rules).
tests/_runtime/runner/test_cell_runner.py Adds regression test ensuring run_all swallows SIGINT raises so finish hooks run and see remaining queue.
marimo/_runtime/runtime.py Updates subprocess signal installation to use the new zero-arg construct_interrupt_handler().
marimo/_runtime/runner/scheduler.py Extends scheduler with async-task tracking, cancel_all, start_task, and context publication (async with scheduler).
marimo/_runtime/runner/hooks_post_execution.py Fixes run-result status classification to key off raised MarimoInterrupt (vs broadcast payload).
marimo/_runtime/runner/cell_runner.py Routes coroutine cells through scheduler-tracked tasks; wraps whole run in scheduler async context; preserves queue state on interrupts.
marimo/_runtime/handlers.py Refactors SIGINT handler to look up current context at signal time and cancel async tasks without raising.
marimo/_runtime/executor/evaluator.py Removes prior SIGINT-capture mechanism (_cancel_on_sigint) now that cancellation is scheduler-driven.
marimo/_runtime/context/kernel_context.py Adds _active_scheduler and active_scheduler accessor on KernelRuntimeContext.
marimo/_pyodide/pyodide_session.py Updates SIGINT handler installation to call construct_interrupt_handler() without a captured context.
Comment thread marimo/_runtime/runner/scheduler.py Outdated
Comment thread marimo/_runtime/handlers.py Outdated
Comment thread marimo/_runtime/runner/cell_runner.py
- scheduler.py: remove unused LOGGER / _loggers import.
- cell_runner.py: rebuild the run queue once via requeue() instead of
  removing each filtered cell from the deque (deque.remove is O(n), so
  the prescan was O(n^2) for large notebooks). Preserves the
  SIGINT-mid-prescan semantics (unprocessed tail stays queued).
- handlers.py: execution_context is a plain attribute set by
  with_cell_id, not a per-task ContextVar; reword the comment.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_runtime/runner/cell_runner.py">

<violation number="1" location="marimo/_runtime/runner/cell_runner.py:798">
P1: Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

# tail (including the cell we were about to classify) so
# on_finish_hooks report the correct remaining set.
if interrupted_at is not None:
runnable.extend(snapshot[interrupted_at:])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At marimo/_runtime/runner/cell_runner.py, line 798:

<comment>Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.</comment>

<file context>
@@ -772,21 +778,25 @@ async def _dispatch_runnable(
+        # tail (including the cell we were about to classify) so
+        # on_finish_hooks report the correct remaining set.
+        if interrupted_at is not None:
+            runnable.extend(snapshot[interrupted_at:])
+        self._scheduler.requeue(runnable)
 
</file context>
Comment thread marimo/_runtime/runner/scheduler.py Outdated
Comment thread marimo/_runtime/runner/scheduler.py
Co-authored-by: Kiran Gadhave <14944083+kirangadhave@users.noreply.github.com>

@kirangadhave kirangadhave left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@dmadisetti dmadisetti merged commit b83701f into main Jun 12, 2026
42 of 43 checks passed
@dmadisetti dmadisetti deleted the dm/nonblocking-runner branch June 12, 2026 20:35
dmadisetti added a commit that referenced this pull request Jun 24, 2026
CachedLifecycle plugs cell caching into the executor's setup/teardown
protocol, behind the runtime.cell_caching config flag:

- setup: hash the cell via cache_attempt_from_hash; on hit, restore
  defs and Skip the body with the cached output. UI-element-defining
  cells fall through to live execution (a restored UIElement carries a
  stale object id and the frontend would target a dead element).
- pre-flight: refs carrying the __marimo_unhashable__ protocol marker
  (duck-typed; no import of the stub toolkit) invalidate the
  producer's manifest and requeue producer + consumer through
  MarimoCancelCellError -> Scheduler.requeue_for_rerun.
- teardown: backfill the cache on successful miss; raised bodies and
  save failures never break the teardown chain.
- pin_modules and the backing loader are config-driven; the loader
  resolves through PERSISTENT_LOADERS so per-def lazy improvements
  arrive transparently.

Sequenced after #9705 (scheduler requeue protocol) and the stub
serialization toolkit (MarimoCancelCellError definitions); two
end-to-end tripwire tests skip until the per-def lazy store lands.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

3 participants