Add workspace close TERM/KILL fallback with reentrancy guards#692
Add workspace close TERM/KILL fallback with reentrancy guards#692trydis wants to merge 5 commits intomanaflow-ai:mainfrom
Conversation
…ow-ai#629) When closing a workspace whose terminal processes don't exit within 0.6s, fall back to SIGTERM → SIGKILL escalation per panel before removing the workspace from the tab list. This prevents orphaned shell processes when closing workspaces with long-running commands. Teardown flow (closeAllPanelsForWorkspaceClose): - For each panel: closePanel → wait 0.6s → SIGTERM process groups → wait 1s → SIGKILL remaining → wait 1s → retry closePanel - Returns WorkspaceCloseResult with per-panel success/failure tracking - Gate workspace removal on full teardown (allClosed == true) Main-thread responsiveness: - All waits use RunLoop drain (10ms ticks) instead of blocking calls so the UI stays responsive during teardown - ttyProcessInfos uses process.isRunning + RunLoop spin instead of the blocking waitUntilExit() Reentrancy protection: - Widen isWorkspaceClosingTransaction to internal - Guard all mutating socket handlers (V1 and V2) against mid-teardown workspace mutations — surface.create/close/split/move, workspace.close, pane.create all return invalid_state if the workspace is being closed - Extract v2ResolveActiveWorkspace helper combining workspace resolution with the closing-transaction guard Refactoring: - Extract teardownWorkspacePanels helper in TabManager (eliminates duplicated teardown + logging block in closeWorkspace and closeWorkspaceIfRunningProcess) - Inline processGroupsForPIDs at its sole call site — the caller already knows the target TTY, no need to scan all workspace TTYs - Collapse identical isDetaching / isWorkspaceClosingTransaction branches in didCloseTab - Remove unused force parameter from closeAllPanelsForWorkspaceClose Tests: - WorkspaceCloseResultTests: unit tests for allClosed computed property covering success, partial failure, and edge cases - test_close_workspace_kills_processes.py: integration test verifying processes are terminated when closing a workspace via socket
|
@trydis is attempting to deploy a commit to the Manaflow Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughAdds deterministic workspace-close teardown: Workspace now coordinates panel and terminal-process shutdown; TabManager requires successful teardown before removing workspaces; TerminalController guards against mid-close mutations; unit and integration tests validate teardown and process termination. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant TabMgr as TabManager
participant Workspace
participant Panel as Panel/TTY
participant Process as OS Process
User->>TabMgr: closeWorkspace(workspace)
activate TabMgr
TabMgr->>Workspace: teardownWorkspacePanels(workspace)
activate Workspace
loop per panel
Workspace->>Panel: requestClose()
activate Panel
Panel->>Process: send termination signal (SIGTERM/SIGKILL)
Process-->>Panel: exit or timeout
deactivate Panel
end
alt all panels closed
Workspace-->>TabMgr: WorkspaceCloseResult(allClosed: true)
else partial/failed
Workspace-->>TabMgr: WorkspaceCloseResult(allClosed: false, failedPanelIds: ...)
end
deactivate Workspace
alt allClosed true
TabMgr->>TabMgr: remove workspace from tabs
TabMgr-->>User: success (true)
else
TabMgr-->>User: failure (false)
end
deactivate TabMgr
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
Sources/TabManager.swift (1)
976-997: Implementation looks correct; consider checking workspace membership earlier for robustness.The teardown is called at line 979 before verifying the workspace exists in
tabsat line 985. IfcloseWorkspaceis ever called with a workspace not in this TabManager (e.g., a stale reference or double-close), teardown signals (SIGTERM/SIGKILL) would fire but the method returnsfalse.In practice this is unlikely since callers resolve workspaces from
tabs, and teardown on an already-closed workspace should be idempotent. The current logic achieves the PR objective of preventing orphaned processes.♻️ Optional: Move existence check before teardown
`@discardableResult` func closeWorkspace(_ workspace: Workspace) -> Bool { guard tabs.count > 1 else { return false } + guard let index = tabs.firstIndex(where: { $0.id == workspace.id }) else { + return false + } guard teardownWorkspacePanels(workspace) else { return false } sentryBreadcrumb("workspace.close", data: ["tabCount": tabs.count - 1]) AppDelegate.shared?.notificationStore?.clearNotifications(forTabId: workspace.id) unwireClosedBrowserTracking(for: workspace) - guard let index = tabs.firstIndex(where: { $0.id == workspace.id }) else { - return false - } tabs.remove(at: index)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Sources/TabManager.swift` around lines 976 - 997, The closeWorkspace(_:) implementation calls teardownWorkspacePanels(workspace) before verifying the workspace is actually present in tabs; to avoid invoking teardown on a non-member (stale/double-close) move the membership check up front: find the index via tabs.firstIndex(where: { $0.id == workspace.id }) and guard that it exists, then perform teardownWorkspacePanels(workspace), sentryBreadcrumb(...), AppDelegate.shared?.notificationStore?.clearNotifications(forTabId:), unwireClosedBrowserTracking(for:), remove the tab from tabs, and update selectedTabId as currently implemented; keep function name closeWorkspace, the teardown call teardownWorkspacePanels, and the selectedTabId logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Sources/TerminalController.swift`:
- Around line 2465-2474: The call to tabManager.closeWorkspace(...) currently
discards its Bool return and always returns success; update both call sites
where closeWorkspace is invoked (the site that builds result with
v2ResolveWindowId/v2OrNull/v2Ref and the other similar handler) to check the
Bool return value and propagate a failure response when it returns false—i.e.,
if tabManager.closeWorkspace(ws) is false, set result to an error (or return
early) instead of returning .ok; keep the existing success path intact when true
and still populate "window_id"/"window_ref"/"workspace_id"/"workspace_ref" using
v2ResolveWindowId, v2OrNull and v2Ref.
In `@Sources/Workspace.swift`:
- Around line 2215-2218: The fallback signaling via
forceTerminateTerminalProcessesForWorkspaceClose(panelId:) may not reliably
indicate success (e.g., missing TTY), so always attempt the forced close to
ensure workspace panel teardown; change the code around
forceTerminateTerminalProcessesForWorkspaceClose(panelId:) and
closePanel(panelId, force:) to unconditionally call closePanel(panelId, force:
true) after the fallback attempt (remove the conditional check on
fallbackSignaled) so the forced close is retried regardless of the fallback
result.
- Around line 2267-2273: The code is incorrectly comparing process group IDs to
the app's PID (appPid); obtain the app's process group ID via getpgrp() (store
as appPgid) and use that when filtering targetProcessGroups (and any other
filters that exclude the current process group), e.g., replace comparisons
against appPid with appPgid while keeping the same type casting as used for
processInfos.map(\.processGroupId) so the app's own process group is correctly
excluded before signaling (affecting trackedPids/targetProcessGroups logic).
In `@tests/test_close_workspace_kills_processes.py`:
- Around line 80-83: The test currently constructs predictable temp paths with
Path(f"/tmp/cmux_wsclose_pid_{os.getpid()}_{index}.txt"), which is insecure in
CI; replace this with a secure atomic temp-file creation using
tempfile.mkstemp() to obtain (fd, name), immediately close the fd (os.close(fd))
and then wrap the returned name in Path for pid_file before appending to
pid_files and passing it to _start_probe_process; ensure tracked_pids handling
remains unchanged and remove any reliance on the hardcoded /tmp filename
pattern.
---
Nitpick comments:
In `@Sources/TabManager.swift`:
- Around line 976-997: The closeWorkspace(_:) implementation calls
teardownWorkspacePanels(workspace) before verifying the workspace is actually
present in tabs; to avoid invoking teardown on a non-member (stale/double-close)
move the membership check up front: find the index via tabs.firstIndex(where: {
$0.id == workspace.id }) and guard that it exists, then perform
teardownWorkspacePanels(workspace), sentryBreadcrumb(...),
AppDelegate.shared?.notificationStore?.clearNotifications(forTabId:),
unwireClosedBrowserTracking(for:), remove the tab from tabs, and update
selectedTabId as currently implemented; keep function name closeWorkspace, the
teardown call teardownWorkspacePanels, and the selectedTabId logic unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
GhosttyTabs.xcodeproj/project.pbxprojSources/TabManager.swiftSources/TerminalController.swiftSources/Workspace.swiftcmuxTests/CmuxWebViewKeyEquivalentTests.swiftcmuxTests/WorkspaceCloseResultTests.swifttests/test_close_workspace_kills_processes.py
Greptile SummaryImplements deterministic workspace teardown with SIGTERM/SIGKILL escalation to prevent orphaned shell processes. Uses non-blocking RunLoop waits to maintain UI responsiveness during the multi-second cleanup flow. Adds comprehensive reentrancy guards across all socket handlers (V1/V2) to block mutations during teardown. Key changes:
Issues found:
Test coverage:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[TabManager.closeWorkspace] --> B{Check tabs.count > 1}
B -->|false| Z[Return false]
B -->|true| C[teardownWorkspacePanels]
C --> D[workspace.closeAllPanelsForWorkspaceClose]
D --> E[activeWorkspaceCloseTransactions++]
E --> F{For each panel}
F --> G[closePanel with force=true]
G --> H{waitForPanelClose 0.6s}
H -->|closed| M[closedCount++]
H -->|timeout| I[forceTerminateTerminalProcesses]
I --> J[Get TTY process list via ps -t]
J --> K[SIGTERM → wait 1s → SIGKILL → wait 1s]
K --> L[Retry closePanel]
L --> N{waitForPanelClose 0.6s}
N -->|closed| M
N -->|timeout| O[failedPanelIds.append]
M --> P{More panels?}
O --> P
P -->|yes| F
P -->|no| Q{allClosed?}
Q -->|true| R[Remove workspace from tabs]
Q -->|false| S[Leave workspace visible]
R --> T[Return true]
S --> Z
style I fill:#ffe6e6
style K fill:#ffe6e6
style Q fill:#fff4e6
style S fill:#ffe6e6
Last reviewed commit: 0f21d5d |
v2WorkspaceClose now returns close_failed error when closeWorkspace returns false (e.g. last workspace). v1 closeWorkspace gates success on the return value instead of always setting true.
Previously closePanel(force:) was only called if forceTerminateTerminalProcessesForWorkspaceClose returned true. When signal delivery fails (e.g. missing TTY), the retry was skipped and we still waited for panel close — missing a chance to force teardown.
Use getpgrp() to obtain the app's process group ID and filter against it when building targetProcessGroups for SIGTERM and SIGKILL, correcting a bug where appPid (a PID) was compared against process group IDs. Also adds the missing app-group exclusion to the SIGKILL refresh filter.
Replace predictable /tmp paths with tempfile.mkstemp() to prevent symlink attacks in CI environments.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
Sources/TerminalController.swift (1)
10803-10809:⚠️ Potential issue | 🟡 MinorReturn precise v1 close-workspace errors instead of a generic not-found.
closeWorkspace(_:)now uses the Bool return, but Line 10808 still maps all failures to"ERROR: Tab not found", including"workspace is closing"and teardown"close_failed"states. This hides actionable failures from callers.💡 Proposed fix
private func closeWorkspace(_ tabId: String) -> String { guard let tabManager = tabManager else { return "ERROR: TabManager not available" } guard let uuid = UUID(uuidString: tabId) else { return "ERROR: Invalid tab ID" } var success = false + var errorMessage = "ERROR: Tab not found" DispatchQueue.main.sync { - if let tab = tabManager.tabs.first(where: { $0.id == uuid }), - !tab.isWorkspaceClosingTransaction { - success = tabManager.closeWorkspace(tab) - } + guard let tab = tabManager.tabs.first(where: { $0.id == uuid }) else { return } + guard !tab.isWorkspaceClosingTransaction else { + errorMessage = "ERROR: Workspace is closing" + return + } + guard tabManager.closeWorkspace(tab) else { + errorMessage = "ERROR: Failed to close workspace" + return + } + success = true } - return success ? "OK" : "ERROR: Tab not found" + return success ? "OK" : errorMessage }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Sources/TerminalController.swift` around lines 10803 - 10809, The handler currently converts any failure from tabManager.closeWorkspace(tab) into the generic "ERROR: Tab not found"; instead, have the code surface the specific failure from closeWorkspace(_:) by changing the call site to capture its detailed result (e.g., an enum or error message returned by closeWorkspace) rather than just a Bool and map those cases to distinct v1 error strings (e.g., "ERROR: workspace is closing", "ERROR: close_failed", or "ERROR: tab not found"). Locate the use of tabManager.tabs.first(where: { $0.id == uuid }) and tabManager.closeWorkspace(tab) and replace the Bool plumbing with the specific result value from closeWorkspace(_:), then switch on that result to return the appropriate v1 error string instead of the single "ERROR: Tab not found".
🧹 Nitpick comments (1)
tests/test_close_workspace_kills_processes.py (1)
80-80: Rename unused loop variable to_.The
indexvariable is defined but never used in the loop body.🔧 Suggested fix
- for index, (_, surface_id, _) in enumerate(surfaces[:2]): + for _, (_, surface_id, _) in enumerate(surfaces[:2]):Alternatively, remove
enumerate()entirely since the index isn't needed:- for index, (_, surface_id, _) in enumerate(surfaces[:2]): + for _, surface_id, _ in surfaces[:2]:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_close_workspace_kills_processes.py` at line 80, The loop defines an unused variable `index` in the line `for index, (_, surface_id, _) in enumerate(surfaces[:2]):`; change it to use a throwaway variable (e.g., `for _, (_, surface_id, _) in enumerate(surfaces[:2]):`) or remove `enumerate()` entirely (e.g., `for (_, surface_id, _) in surfaces[:2]:`) so the unused `index` is eliminated; update the loop in tests/test_close_workspace_kills_processes.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@Sources/TerminalController.swift`:
- Around line 10803-10809: The handler currently converts any failure from
tabManager.closeWorkspace(tab) into the generic "ERROR: Tab not found"; instead,
have the code surface the specific failure from closeWorkspace(_:) by changing
the call site to capture its detailed result (e.g., an enum or error message
returned by closeWorkspace) rather than just a Bool and map those cases to
distinct v1 error strings (e.g., "ERROR: workspace is closing", "ERROR:
close_failed", or "ERROR: tab not found"). Locate the use of
tabManager.tabs.first(where: { $0.id == uuid }) and
tabManager.closeWorkspace(tab) and replace the Bool plumbing with the specific
result value from closeWorkspace(_:), then switch on that result to return the
appropriate v1 error string instead of the single "ERROR: Tab not found".
---
Nitpick comments:
In `@tests/test_close_workspace_kills_processes.py`:
- Line 80: The loop defines an unused variable `index` in the line `for index,
(_, surface_id, _) in enumerate(surfaces[:2]):`; change it to use a throwaway
variable (e.g., `for _, (_, surface_id, _) in enumerate(surfaces[:2]):`) or
remove `enumerate()` entirely (e.g., `for (_, surface_id, _) in surfaces[:2]:`)
so the unused `index` is eliminated; update the loop in
tests/test_close_workspace_kills_processes.py accordingly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
Sources/TerminalController.swiftSources/Workspace.swifttests/test_close_workspace_kills_processes.py

When closing a workspace whose terminal processes don't exit within 0.6s, fall back to SIGTERM → SIGKILL escalation per panel before removing the workspace from the tab list. This prevents orphaned shell processes when closing workspaces with long-running commands.
Teardown flow (closeAllPanelsForWorkspaceClose):
Main-thread responsiveness:
Reentrancy protection:
Refactoring:
Tests:
Fixes #629
Summary by CodeRabbit
Bug Fixes
Tests