Skip to content

Swallow sandbox container races in the stream consumer#552

Merged
0xallam merged 2 commits into
mainfrom
runtime/swallow-sandbox-races
Jun 9, 2026
Merged

Swallow sandbox container races in the stream consumer#552
0xallam merged 2 commits into
mainfrom
runtime/swallow-sandbox-races

Conversation

@0xallam

@0xallam 0xallam commented Jun 9, 2026

Copy link
Copy Markdown
Member

When the TUI quit path SIGKILL's the sandbox container (PR #548), the agent's in-flight SDK calls can race the cleanup and surface as either:

  • agents.sandbox.errors.ExecTransportError — from exec calls into the now-gone container
  • docker.errors.NotFound — from container.reload() / inspect_container / any container lookup

Both are expected during teardown. Catch them in _run_cycle's stream consumer alongside the existing LiteLLM shutdown-race catch.

Also moves the raise stream.run_loop_exception line inside the try, so the stored exception path is covered too (the SDK stashes the exception on the stream object rather than re-raising it through stream_events()).

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR addresses teardown races introduced by PR #548's SIGKILL sandbox cleanup by catching ExecTransportError and docker.errors.NotFound in _run_cycle's stream consumer, guarded by a new coordinator.is_shutting_down flag that is set at the start of _fire_sandbox_cleanup.

  • Adds is_shutting_down to AgentCoordinator and mark_shutting_down() call in _fire_sandbox_cleanup, so swallowing is correctly scoped to TUI-initiated teardown rather than unconditional.
  • Moves raise stream.run_loop_exception inside the try block so stored exceptions (stashed on the stream object by the SDK) are covered by all three except clauses.
  • The new except clause includes exc_info=True, preserving tracebacks for post-mortem debugging of teardown noise.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to teardown races and the swallowing is properly gated on the shutdown flag.

Both concerns raised in earlier review rounds have been addressed: the coordinator.is_shutting_down guard ensures genuine container crashes outside teardown are still surfaced, and exc_info=True preserves tracebacks. The flag is set before the SIGKILL fires, the exception paths are correctly covered, and non-TUI shutdown paths are unaffected.

No files require special attention.

Important Files Changed

Filename Overview
strix/core/agents.py Adds is_shutting_down flag and mark_shutting_down() method to AgentCoordinator to gate sandbox-error swallowing in the stream consumer.
strix/core/execution.py Imports ExecTransportError and docker_errors; moves raise stream.run_loop_exception inside the try so stored exceptions are covered; adds a gated except (ExecTransportError, docker_errors.NotFound) clause with exc_info=True logging.
strix/interface/tui/app.py Calls coordinator.mark_shutting_down() at the top of _fire_sandbox_cleanup() so the flag is set before sandbox containers are killed.

Reviews (3): Last reviewed commit: "Gate sandbox-error swallow on coordinato..." | Re-trigger Greptile

Comment thread strix/core/execution.py Outdated
Comment thread strix/core/execution.py Outdated
The unconditional except clause would swallow genuine container
failures (OOM, eviction, daemon hiccup) outside of teardown and
report partial streams as successful runs.

AgentCoordinator gains an is_shutting_down flag plus a
mark_shutting_down() method. The TUI quit path flips it before
scheduling the cleanup coroutine. The stream consumer only swallows
ExecTransportError / docker.errors.NotFound when the flag is set;
otherwise it re-raises so the existing error path surfaces. The
warning log adds exc_info=True for diagnosis.
@0xallam 0xallam merged commit 6202131 into main Jun 9, 2026
2 checks passed
@0xallam 0xallam deleted the runtime/swallow-sandbox-races branch June 9, 2026 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant