fix: allow async cancellation#9705
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
2 issues found across 12 files
Architecture diagram
sequenceDiagram
participant Client as Browser Client
participant Kernel as Kernel (main thread)
participant SIGINT as Signal Handler
participant Context as KernelRuntimeContext
participant Scheduler as SequentialScheduler
participant CellRunner as CellRunner
participant Evaluator as CellEvaluator
participant Task as asyncio Task
Note over Client,Task: Cell Execution with Cancellation Support
Client->>Kernel: Request cell execution
Kernel->>Scheduler: Create SequentialScheduler(cells_to_run, graph)
Scheduler->>Scheduler: Initialize _active={}, _interrupted=False
Note over Scheduler,CellRunner: async with Scheduler publishes on context
Scheduler->>Context: __aenter__: set _active_scheduler=self
Context-->>Scheduler: OK
loop For each cell batch
Scheduler->>Scheduler: batch(cell_ids) - yield one cell at a time
Scheduler->>CellRunner: Run cell_id
CellRunner->>CellRunner: Check self.cancelled(cell_id)
alt Cell is cancelled
CellRunner->>CellRunner: Mark cell as "cancelled"
else Cell not cancelled
alt Cell is coroutine (async)
CellRunner->>Task: asyncio.ensure_future(evaluate(cell, glbls))
CellRunner->>Scheduler: register_task(cell_id, task)
Note over CellRunner,Task: Task registered so SIGINT can find it
CellRunner->>Task: await task
alt Cell completes normally
Task-->>CellRunner: RunResult
else Cell cancelled externally
Task-->>CellRunner: asyncio.CancelledError
end
CellRunner->>Scheduler: unregister_task(cell_id)
else Cell is sync
CellRunner->>Evaluator: evaluate(cell, glbls)
Evaluator-->>CellRunner: RunResult
end
CellRunner->>CellRunner: _finalize_run_result()
Note over CellRunner: CancelledError → MarimoInterrupt
CellRunner-->>Scheduler: Completed cell
end
end
Scheduler->>Context: __aexit__: clear _active_scheduler
Context-->>Scheduler: OK
Note over Client,Task: SIGINT/Cancellation Flow
Client->>Kernel: Send interrupt signal
Kernel->>SIGINT: SIGINT received
SIGINT->>SIGINT: construct_interrupt_handler()
SIGINT->>Context: safe_get_context()
Context-->>SIGINT: KernelRuntimeContext
SIGINT->>Context: Check execution_context
alt No execution context
SIGINT->>SIGINT: return (no-op)
else Execution context exists
SIGINT->>SIGINT: Broadcast InterruptedNotification()
SIGINT->>Context: Check duckdb_connection
alt DuckDB connection present
SIGINT->>SIGINT: interrupt duckdb connection
end
SIGINT->>Context: Check active_scheduler
alt Scheduler has active tasks
SIGINT->>Scheduler: cancel_all()
Scheduler->>Scheduler: Set _interrupted=true
Scheduler->>Task: call_soon_threadsafe(task.cancel())
Task-->>Scheduler: Task cancelled
SIGINT->>SIGINT: return (no MarimoInterrupt raise)
else Scheduler exists but no active tasks
SIGINT->>Scheduler: cancel_all()
SIGINT->>SIGINT: raise MarimoInterrupt
else No scheduler
SIGINT->>SIGINT: raise MarimoInterrupt
end
end
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
eda4d02 to
50fa1b4
Compare
for more information, see https://pre-commit.ci
3fd8873 to
5b29262
Compare
- Type Scheduler.start_task as AbstractAsyncContextManager so SequentialScheduler structurally satisfies the Protocol (mypy). - Replace the 'fail loudly on re-entrant run_all' guard with a save/restore stack on _active_scheduler. A nested run_all on the same context (code_mode ctx.run_cell from inside a running cell) suspends the outer run, completes the inner, and restores the outer. Fixes the scratchpad cascade regression while keeping SIGINT routed to the innermost run.
5b29262 to
b0fca7f
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses #2684 by making async cell execution cancellable via SIGINT: the running scheduler is now published on the KernelRuntimeContext so the interrupt handler can locate it and cancel in-flight async tasks instead of relying on raising into asyncio internals.
Changes:
- Promote the per-run scheduler to a context-visible “active scheduler” (async context manager) so SIGINT can cancel queued cells and running async tasks.
- Update interrupt handling to resolve the currently-installed runtime context at signal time and route cancellation appropriately (cancel-only for active async tasks; cancel+raise for sync/between-cells cases).
- Refactor runner/executor responsibilities (move interruptible async evaluation into the runner) and expand test coverage for SIGINT/cancellation behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_runtime/test_scheduler.py | Updates scheduler batching assertions to match the new iterable-yielding batch API. |
| tests/_runtime/test_interrupt_handlers.py | Adds regression tests for SIGINT behavior with/without schedulers and active async tasks; updates mocking to use safe_get_context. |
| tests/_runtime/test_executor_evaluator.py | Adds focused tests for task cancellation semantics and scheduler publication on the kernel context. |
| tests/_runtime/runner/test_hooks.py | Verifies post-execution status mapping: MarimoInterrupt -> interrupted (with precedence rules). |
| tests/_runtime/runner/test_cell_runner.py | Adds regression test ensuring run_all swallows SIGINT raises so finish hooks run and see remaining queue. |
| marimo/_runtime/runtime.py | Updates subprocess signal installation to use the new zero-arg construct_interrupt_handler(). |
| marimo/_runtime/runner/scheduler.py | Extends scheduler with async-task tracking, cancel_all, start_task, and context publication (async with scheduler). |
| marimo/_runtime/runner/hooks_post_execution.py | Fixes run-result status classification to key off raised MarimoInterrupt (vs broadcast payload). |
| marimo/_runtime/runner/cell_runner.py | Routes coroutine cells through scheduler-tracked tasks; wraps whole run in scheduler async context; preserves queue state on interrupts. |
| marimo/_runtime/handlers.py | Refactors SIGINT handler to look up current context at signal time and cancel async tasks without raising. |
| marimo/_runtime/executor/evaluator.py | Removes prior SIGINT-capture mechanism (_cancel_on_sigint) now that cancellation is scheduler-driven. |
| marimo/_runtime/context/kernel_context.py | Adds _active_scheduler and active_scheduler accessor on KernelRuntimeContext. |
| marimo/_pyodide/pyodide_session.py | Updates SIGINT handler installation to call construct_interrupt_handler() without a captured context. |
- scheduler.py: remove unused LOGGER / _loggers import. - cell_runner.py: rebuild the run queue once via requeue() instead of removing each filtered cell from the deque (deque.remove is O(n), so the prescan was O(n^2) for large notebooks). Preserves the SIGINT-mid-prescan semantics (unprocessed tail stays queued). - handlers.py: execution_context is a plain attribute set by with_cell_id, not a per-task ContextVar; reword the comment.
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_runtime/runner/cell_runner.py">
<violation number="1" location="marimo/_runtime/runner/cell_runner.py:798">
P1: Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| # tail (including the cell we were about to classify) so | ||
| # on_finish_hooks report the correct remaining set. | ||
| if interrupted_at is not None: | ||
| runnable.extend(snapshot[interrupted_at:]) |
There was a problem hiding this comment.
P1: Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At marimo/_runtime/runner/cell_runner.py, line 798:
<comment>Mid-prescan SIGINTs re-add already-cancelled cells to the queue, so they get interruption handling as well as cancellation handling.</comment>
<file context>
@@ -772,21 +778,25 @@ async def _dispatch_runnable(
+ # tail (including the cell we were about to classify) so
+ # on_finish_hooks report the correct remaining set.
+ if interrupted_at is not None:
+ runnable.extend(snapshot[interrupted_at:])
+ self._scheduler.requeue(runnable)
</file context>
Co-authored-by: Kiran Gadhave <14944083+kirangadhave@users.noreply.github.com>
CachedLifecycle plugs cell caching into the executor's setup/teardown protocol, behind the runtime.cell_caching config flag: - setup: hash the cell via cache_attempt_from_hash; on hit, restore defs and Skip the body with the cached output. UI-element-defining cells fall through to live execution (a restored UIElement carries a stale object id and the frontend would target a dead element). - pre-flight: refs carrying the __marimo_unhashable__ protocol marker (duck-typed; no import of the stub toolkit) invalidate the producer's manifest and requeue producer + consumer through MarimoCancelCellError -> Scheduler.requeue_for_rerun. - teardown: backfill the cache on successful miss; raised bodies and save failures never break the teardown chain. - pin_modules and the backing loader are config-driven; the loader resolves through PERSISTENT_LOADERS so per-def lazy improvements arrive transparently. Sequenced after #9705 (scheduler requeue protocol) and the stub serialization toolkit (MarimoCancelCellError definitions); two end-to-end tripwire tests skip until the per-def lazy store lands. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
📝 Summary
Closes #2684
Promotes scheduler to long lived singleton on kernel context for look-up of running cells. This in turn allows task cancellations on async cells, which were previously un-cancelable. Example of cancelation: