Refactor boulder and ralph-loop state to per-plan/per-session storage for multi-agent concurrency#2233
Draft
mathew-cf wants to merge 1 commit intocode-yeongyu:devfrom
Draft
Conversation
Contributor
|
All contributors have signed the CLA. Thank you! ✅ |
Author
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
3 issues found across 22 files
Confidence score: 2/5
- High-impact concurrency risk:
setSessionIDinsrc/hooks/ralph-loop/loop-state-controller.tscan hijack another agent’s active state whenstateDiris undefined, overwriting its session ID. - Clearing per-plan state in
src/hooks/start-work/start-work-hook.tscan destroy shared state for other concurrent agents when switching plans, which is a likely regression in multi-agent workflows. - Overall risk is high because multiple severity 7–8 issues can cause cross-agent state corruption or loss, directly affecting user workflows.
- Pay close attention to
src/hooks/ralph-loop/loop-state-controller.ts,src/hooks/start-work/start-work-hook.ts,src/features/boulder-state/legacy-migration.ts- concurrent state isolation and migration safety.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/features/boulder-state/legacy-migration.ts">
<violation number="1" location="src/features/boulder-state/legacy-migration.ts:24">
P1: Legacy state migration incorrectly skipped if per-plan storage directory exists, causing potential loss of continuation state for older plans</violation>
</file>
<file name="src/hooks/ralph-loop/loop-state-controller.ts">
<violation number="1" location="src/hooks/ralph-loop/loop-state-controller.ts:90">
P1: State hijacking vulnerability in setSessionID: When stateDir is undefined, findAnyActiveRalphLoopState(directory) may return the active state of another concurrent agent. The function then overwrites that state's session_id, writes it to the new session's file, and deletes the original agent's state file, completely breaking multi-agent isolation.</violation>
</file>
<file name="src/hooks/start-work/start-work-hook.ts">
<violation number="1" location="src/hooks/start-work/start-work-hook.ts:107">
P1: Clearing the entire plan state file when switching plans destroys shared state for other concurrent agents. Instead of `clearBoulderStateForPlan(ctx.directory, existingState.plan_name)`, the code should remove only the current session's ID from the old plan's state to preserve progress for other agents.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
f186cfb to
1281c48
Compare
Author
|
@mathew-cf I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
7 issues found across 30 files
Confidence score: 2/5
- High-risk changes:
src/hooks/atlas/event-handler.tscan drop continuation injection for subagent sessions whenfindBoulderStateBySessionreturns null, which is a concrete user-impacting regression. - Multiple concurrency issues (temp file collisions in
src/features/boulder-state/atomic-file-ops.ts, missing lock insrc/features/boulder-state/per-plan-storage.ts, TOCTOU update insrc/hooks/start-work/start-work-hook.ts) increase the likelihood of state corruption under parallel agents. - Several
src/hooks/ralph-loop/loop-state-controller.tsmethods behave inconsistently whensessionIDis omitted, which can cause active loop state to be missed or mutated in the wrong location. - Pay close attention to
src/hooks/atlas/event-handler.ts,src/features/boulder-state/atomic-file-ops.ts,src/features/boulder-state/per-plan-storage.ts,src/hooks/start-work/start-work-hook.ts,src/hooks/ralph-loop/loop-state-controller.ts- regression risk and concurrency hazards.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/hooks/atlas/event-handler.ts">
<violation number="1" location="src/hooks/atlas/event-handler.ts:45">
P1: Background tasks will lose continuation injection because `findBoulderStateBySession` returns null for subagent sessions not in session_ids, causing early exit at line 83</violation>
</file>
<file name="src/hooks/start-work/start-work-hook.ts">
<violation number="1" location="src/hooks/start-work/start-work-hook.ts:160">
P2: TOCTOU race condition: When updating worktree_path, the code spreads stale `existingState` which may have been modified by concurrent agents. Use atomic update pattern like `appendSessionIdForPlan` or re-read state inside the lock before writing.</violation>
</file>
<file name="src/features/boulder-state/atomic-file-ops.ts">
<violation number="1" location="src/features/boulder-state/atomic-file-ops.ts:13">
P1: Race condition in temp file naming: `Date.now()` provides only millisecond precision, causing collisions when multiple agents write to the same file path within the same millisecond. Use `randomUUID()` (already imported) to ensure unique temp file paths.</violation>
</file>
<file name="src/hooks/ralph-loop/storage.ts">
<violation number="1" location="src/hooks/ralph-loop/storage.ts:119">
P2: Duplicate logic in `findAnyActiveRalphLoopState` and `findSecondActiveRalphLoopState` functions. Both functions implement identical directory traversal, readdirSync filtering, and state parsing logic. The only difference is an additional session_id check in the second function. These should be refactored into a single shared utility with an optional `excludeSessionId` parameter to reduce maintenance burden and follow DRY principles.</violation>
</file>
<file name="src/hooks/ralph-loop/loop-state-controller.ts">
<violation number="1" location="src/hooks/ralph-loop/loop-state-controller.ts:76">
P1: `getState()` does not fall back to checking legacy singleton state when `sessionID` is omitted, returning `null` even if a legacy loop is active. The `setSessionID()` method explicitly tries `readState(directory)` (legacy singleton) before falling back to `findAnyActiveRalphLoopState()`, but `getState()` skips the legacy check. This inconsistency may cause callers to erroneously start duplicate loops or fail to resume existing ones in legacy format.</violation>
<violation number="2" location="src/hooks/ralph-loop/loop-state-controller.ts:79">
P1: clear() and incrementIteration() fail to target the active per-session state when sessionID is omitted. They pass undefined to storage functions which default to the legacy singleton path, while getState() correctly uses findAnyActiveRalphLoopState(). This inconsistency means callers omitting sessionID will silently modify the wrong file.</violation>
</file>
<file name="src/features/boulder-state/per-plan-storage.ts">
<violation number="1" location="src/features/boulder-state/per-plan-storage.ts:89">
P1: Missing file lock in `clearBoulderStateForPlan` introduces race conditions with concurrent operations</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
… for multi-agent concurrency
Migrate boulder state from a singleton boulder.json to per-plan files
under .sisyphus/boulder/{plan-name}.json, and ralph-loop state to
per-session files under .sisyphus/ralph-loop/{session-id}.json. This
prevents concurrent agents from overwriting each other's state.
Add atomic file writes, JSON parser, and legacy migration utilities.
Update atlas, prometheus-md-only, start-work hooks and continuation
state to use session-aware lookups.
1281c48 to
986b157
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
boulder.jsonto per-plan files under.sisyphus/boulder/{plan-name}.json, preventing concurrent agents from overwriting each other's state.sisyphus/ralph-loop/{session-id}.jsonfor the same multi-agent isolation.sisyphus/boulder.jsonand.sisyphus/ralph-loop.jsonChanges
New files
src/features/boulder-state/per-plan-storage.ts— per-plan read/write/list/clear operationssrc/features/boulder-state/atomic-file-ops.ts— write-to-temp-then-rename for crash safetysrc/features/boulder-state/boulder-json-parser.ts— extracted JSON parsing with validationsrc/features/boulder-state/legacy-migration.ts— auto-migrates singleton boulder.json → per-planModified files
src/features/boulder-state/storage.ts— legacy-compatible wrappers delegating to per-plan storagesrc/hooks/ralph-loop/storage.ts— per-session state withreadStateForSession,findAnyActiveRalphLoopState,migrateLegacyRalphLoopStatesrc/cli/run/continuation-state.ts— session-aware boulder and ralph-loop lookupssrc/hooks/atlas/— updated to usefindBoulderStateBySessionsrc/hooks/prometheus-md-only/— updated boulder state importssrc/hooks/ralph-loop/— updated loop controller, event handler, and hook to use per-session storagesrc/hooks/start-work/— updated to per-plan boulder writesMotivation
When multiple agents (e.g., two Atlas sessions) run concurrently against the same project, they share a single
boulder.jsonandralph-loop.json. This causes state corruption — one agent's writes clobber the other's. Per-plan and per-session file isolation eliminates the race condition entirely.Checklist
bun run typecheckpassesbun run buildsucceedspackage.jsonSummary by cubic
Refactored boulder and ralph-loop state to per-plan and per-session files for safe multi-agent concurrency. Added atomic writes with file locking and legacy migrations; hooks now use session-aware helpers across the stack.
New Features
Refactors
Written for commit 986b157. Summary will update on new commits.