feat(logs): add retry from context menu and detail sidebar for failed runs#4181

Open
waleedlatif1 wants to merge 3 commits into staging from feat/retry-failed-log-v2

Conversation

@waleedlatif1
Collaborator

Summary

  • Persist workflowInput in execution log data so it can be recovered for retry
  • Add "Retry" to the right-click context menu on failed log rows
  • Add retry button in the log detail sidebar header for failed runs
  • Add useRetryExecution mutation hook using streaming to avoid blocking on long workflows
  • Fall back to executionState.blockStates for old logs that don't have workflowInput

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)
@vercel

vercel Bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 15, 2026 6:43pm

@cursor

cursor Bot commented Apr 15, 2026

PR Summary

Medium Risk
Adds a new workflow execution path from the logs UI and persists additional execution data (workflowInput) into stored logs, which could impact log payload size and retry correctness for legacy executions.

Overview
Enables retrying failed workflow executions directly from the logs UI via a new Retry action in the log row context menu and a retry button in the log details sidebar header (with pending/disabled states).

Adds a useRetryExecution mutation that POSTs to /api/workflows/{id}/execute using streaming and cancels after the first chunk to avoid blocking, and wires it into logs.tsx with toasts and a detail-log fetch to obtain the original input.
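
As a rough illustration of the trigger-and-cancel pattern described above, the sketch below POSTs with stream: true, reads a single chunk, and then cancels the reader. The function name `triggerRetry`, the `FetchLike` and `ResponseLike` shapes, and the injected `fetchImpl` parameter are assumptions made for this example and are not taken from the PR's code.

```typescript
// Hypothetical sketch of the trigger-and-cancel pattern; not the PR's actual hook.
interface StreamReader {
  read(): Promise<unknown>;
  cancel(): Promise<void>;
}

interface ResponseLike {
  ok: boolean;
  status: number;
  body: { getReader(): StreamReader } | null;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

async function triggerRetry(
  workflowId: string,
  input: unknown,
  fetchImpl: FetchLike,
): Promise<{ started: boolean }> {
  const res = await fetchImpl(`/api/workflows/${workflowId}/execute`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, triggerType: "manual", stream: true }),
  });
  if (!res.ok) {
    throw new Error(`Retry request failed with status ${res.status}`);
  }

  const reader = res.body?.getReader();
  if (reader) {
    // Wait for the first chunk as a signal that the run has started,
    // then stop consuming so the UI is not tied to the whole run.
    await reader.read();
    await reader.cancel();
  }
  return { started: true };
}
```

In a browser the global fetch should fit the `FetchLike` shape, so a call could look like `triggerRetry(workflowId, input, fetch)`. Whether the run keeps going after the reader is cancelled depends on the server; the PR states that execution continues server-side.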

Persists executionData.workflowInput into execution logs (types + logger) and introduces extractRetryInput to prefer this field while falling back to reconstructing input from legacy executionState.blockStates.
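
The prefer-then-fall-back logic can be sketched roughly as follows. The `BlockState` and `ExecutionData` shapes, the function name `extractRetryInputSketch`, and the idea that the legacy starter block's `output` holds the original input are all assumptions for illustration; the real types and logic in the PR may differ.

```typescript
// Assumed shapes; the real log types may differ.
interface BlockState {
  executed?: boolean;
  executionTime?: number;
  output?: unknown;
}

interface ExecutionData {
  workflowInput?: unknown;
  executionState?: { blockStates?: Record<string, BlockState> };
}

function extractRetryInputSketch(data: ExecutionData | undefined): unknown {
  if (!data) return undefined;

  // Newer logs: the input was persisted directly, so prefer it.
  if (data.workflowInput !== undefined) return data.workflowInput;

  // Legacy logs: look for a block state that was never executed and took
  // no time (the heuristic named in the review), and assume its output
  // carries the original input.
  const states = Object.values(data.executionState?.blockStates ?? {});
  const starter = states.find(
    (s) => s.executed === false && s.executionTime === 0,
  );
  return starter?.output;
}
```

Returning undefined when neither source is available leaves it to the caller to decide whether a retry with empty input is acceptable.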

Reviewed by Cursor Bugbot for commit 2591102. Configure here.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR adds a retry action for failed workflow runs via the log row context menu and the detail sidebar, persisting workflowInput at execution completion so the original payload can be recovered for retry. The useRetryExecution mutation uses a streaming trigger-and-cancel pattern to avoid blocking the UI on long-running workflows. The previous review concerns (fallback heuristic, isPending guard) have been addressed.

  • workflowInput stored without redaction (logger.ts lines 375–383): traceSpans and finalOutput are passed through redactApiKeys() before persisting, but workflowInput is not. Credentials embedded in webhook payloads, API-trigger bodies, or manual inputs will be stored in plaintext in the executionData column and surfaced to all log-view users.

Confidence Score: 4/5

Safe to merge after addressing the unredacted workflowInput storage — the UI and hook changes are clean and the previous review issues are resolved.

One P1 security finding: workflowInput bypasses redactApiKeys() while traceSpans and finalOutput are both redacted. All other findings are P2 or lower. Prior review concerns (fallback heuristic, isPending guard) are confirmed resolved.

apps/sim/lib/logs/execution/logger.ts — unredacted workflowInput persistence

Security Review

  • Unredacted credential storage (lib/logs/execution/logger.ts): workflowInput is written to the executionData JSON column without passing through redactApiKeys(). Webhook payloads, API-trigger bodies, and manual inputs containing tokens, passwords, or API keys are stored in plaintext and returned to any user with log-view access. traceSpans and finalOutput are both passed through redactApiKeys() before storage, so this field is protected inconsistently with them.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/logs/execution/logger.ts | Persists workflowInput to the database without passing it through redactApiKeys(), while traceSpans and finalOutput are redacted. Also uses any for workflowInput parameter type. |
| apps/sim/hooks/queries/logs.ts | Adds useRetryExecution mutation that fires POST /execute with stream: true, reads one chunk, then cancels to trigger server-side execution without blocking; cache invalidation and isPending guard are correct. |
| apps/sim/app/workspace/[workspaceId]/logs/utils.ts | Adds extractRetryInput with correct fallback heuristic (executed === false && executionTime === 0) for old logs; logic is clean and the previous reviewer concerns are addressed. |
| apps/sim/app/workspace/[workspaceId]/logs/logs.tsx | Wires retry logic through retryLog, passes isRetryPending to both LogRowContextMenu and LogDetails; useCallback deps are correctly suppressed per project guidelines for stable TanStack Query refs. |
| apps/sim/app/workspace/[workspaceId]/logs/components/log-row-context-menu/log-row-context-menu.tsx | Context menu correctly gates the Retry item behind isRetryable and disables it while isRetryPending; shows "Retrying…" label during in-flight mutation. |
| apps/sim/app/workspace/[workspaceId]/logs/components/log-details/log-details.tsx | Retry button added to sidebar header, correctly gated on log.status === 'failed' and disabled while isRetryPending. |
| apps/sim/lib/logs/types.ts | Adds workflowInput?: unknown to WorkflowExecutionLog.executionData; correctly typed and aligned with the completeWorkflowExecution interface update. |
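
The gating described for the context menu and sidebar button can be expressed as one small pure function. This is only a sketch: `retryActionState` and `RetryUiState` are hypothetical names, and the actual components may derive these values inline.

```typescript
// Hypothetical helper summarizing the gating rules from the table above.
interface RetryUiState {
  visible: boolean;  // only failed runs offer a retry
  disabled: boolean; // blocked while a retry mutation is in flight
  label: string;
}

function retryActionState(status: string, isRetryPending: boolean): RetryUiState {
  const visible = status === "failed";
  return {
    visible,
    disabled: !visible || isRetryPending,
    label: isRetryPending ? "Retrying…" : "Retry",
  };
}
```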

Sequence Diagram

sequenceDiagram
    participant U as User
    participant CM as ContextMenu / Sidebar
    participant RL as retryLog()
    participant QC as QueryClient
    participant RE as useRetryExecution
    participant API as /api/workflows/:id/execute

    U->>CM: Right-click failed log / click Retry
    CM->>RL: retryLog(log)
    RL->>QC: fetchQuery(logKeys.detail(logId))
    QC->>API: GET /api/logs/:logId
    API-->>QC: WorkflowLog (with workflowInput)
    QC-->>RL: detailLog
    RL->>RL: extractRetryInput(detailLog)
    note over RL: prefers workflowInput field,<br/>falls back to blockStates heuristic
    RL->>RE: mutateAsync({ workflowId, input })
    RE->>API: POST /execute {input, triggerType:"manual", stream:true}
    API-->>RE: stream (first chunk)
    RE->>RE: reader.read() then reader.cancel()
    note over RE: execution continues server-side
    RE-->>RL: { started: true }
    RL-->>U: toast.success("Retry started")
    RE->>QC: invalidate logs + details + stats

Comments Outside Diff (1)

  1. apps/sim/lib/logs/execution/logger.ts, lines 375-384

    P1 (security): workflowInput persisted without API-key redaction

    traceSpans and finalOutput both pass through redactApiKeys() before reaching buildCompletedExecutionData, but workflowInput is forwarded as-is. Any credentials embedded in a workflow's input — OAuth tokens from webhook payloads, API keys passed via the API trigger, passwords in manual inputs — will be stored in plaintext in the executionData JSON column and surfaced through the log detail endpoint to all users with log-view access.

    Since the unredacted value is needed for retry fidelity, one option is to store two fields: workflowInput (redacted, for display) and workflowInputRaw (encrypted or access-controlled, for retry only). Alternatively, if raw storage is intentional, the log-detail API response should explicitly filter workflowInput before sending it to clients, and this should be documented.
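
A minimal sketch of the two-field option follows. It does not use the project's redactApiKeys(); `redactSecretsSketch`, the `SENSITIVE_KEY` pattern, and `buildStoredInput` are hypothetical stand-ins, and the encryption or access control that workflowInputRaw would need is deliberately left out.

```typescript
// Hypothetical key-name pattern; a real redactor would likely be more thorough.
const SENSITIVE_KEY = /(api[-_]?key|token|password|secret|authorization)/i;

// Recursively copy a JSON-like value, masking values under sensitive-looking keys.
function redactSecretsSketch(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redactSecretsSketch);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactSecretsSketch(inner);
    }
    return out;
  }
  return value;
}

// Store a display-safe copy next to the raw value used only for retry.
// The raw field would still need encryption or access control (not shown).
function buildStoredInput(raw: unknown): {
  workflowInput: unknown;
  workflowInputRaw: unknown;
} {
  return { workflowInput: redactSecretsSketch(raw), workflowInputRaw: raw };
}
```

The trade-off is extra storage and a second field to keep in sync, in exchange for a log-detail response that never has to carry the raw payload.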

Reviews (3): Last reviewed commit: "fix(logs): store workflowInput unredacte..." | Re-trigger Greptile

Comment thread apps/sim/app/workspace/[workspaceId]/logs/utils.ts
Comment thread apps/sim/app/workspace/[workspaceId]/logs/logs.tsx
Comment thread apps/sim/app/workspace/[workspaceId]/logs/utils.ts
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/logs/execution/logger.ts Outdated
workflowInput is internal execution data used for replay, same as
executionState which is also stored unredacted. Redacting at storage
time corrupts the data for retry use cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2591102. Configure here.

waleedlatif1 added a commit that referenced this pull request Apr 18, 2026
Brings PR #4181 inline: persists workflowInput on successful runs,
adds useRetryExecution mutation (streaming read-one-chunk-and-cancel),
Retry entrypoints in the row context menu and the detail sidebar, and
extractRetryInput with fallback to starter block state for older logs.
Also surfaces the captured input in a new "Workflow Input" section
above Workflow Output in the detail Overview tab, guarded so older
logs without the field don't render an empty block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
waleedlatif1 added a commit that referenced this pull request Apr 24, 2026
waleedlatif1 added a commit that referenced this pull request Apr 28, 2026
waleedlatif1 added a commit that referenced this pull request Apr 29, 2026
waleedlatif1 added a commit that referenced this pull request Apr 29, 2026
…ons, and execution improvements (#4292)

* improvement(trace-spans): rewrite trace span pipeline with per-iteration enrichment

Unify tool calls under span.children, capture dual-clock timing, and
surface per-iteration model content (assistant text, thinking, tool
calls, finish reason, tokens, cost, ttft, provider, errors) across all
12 LLM providers. UI renders the new fields on model child spans; old
logs degrade gracefully since every field is optional.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* improvement(logs): add Trace tab with two-pane tree+detail view

- Wrap log-details drawer in Overview | Trace tabs; Overview unchanged
- New TraceView with hierarchical tree on the left and detail pane on the right
- Keyboard nav, span filter, expand/collapse all
- Bump min drawer width 400->600 and clamp persisted widths on rehydrate

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(logs): retry failed runs + show workflow input in detail

Brings PR #4181 inline: persists workflowInput on successful runs,
adds useRetryExecution mutation (streaming read-one-chunk-and-cancel),
Retry entrypoints in the row context menu and the detail sidebar, and
extractRetryInput with fallback to starter block state for older logs.
Also surfaces the captured input in a new "Workflow Input" section
above Workflow Output in the detail Overview tab, guarded so older
logs without the field don't render an empty block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ui): use inverted popover scheme for usage-control popovers

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): trace view chevron padding, section state leak, and tab-scoped keyboard nav

- Pad tree rows from panel edge so the root chevron isn't visually clipped.
- Key DetailCodeSection by label so collapse state belongs to the section
  purpose, preventing isOpen from leaking across span changes when positional
  slots happened to align.
- Ignore log-to-log arrow-key nav while the Trace tab is active so TraceView
  owns span navigation; filter inputs keep native caret movement.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): align trace tree rows to 14px content grid

Chevron at depth 0 and the timeline bars now sit on the same 14px left/right
grid as the trace view's header strip and the rest of the log details panel,
removing the stagger where bars extended further left than chevrons and the
chevron appeared cramped against the panel edge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): restore scroll in log-details panel

Overview tab's scroll container (SModalTabsContent) was wrapped around a
non-overflow inner div that held the scrollAreaRef, so the scroll-reset on
log change targeted a non-scrolling element. Collapse the wrapper into the
Tabs.Content element itself and move the ref there. Add min-h-0 to the
Trace detail pane wrapper so its scrolling child can shrink inside the
horizontal-flex row.

* fix(logs): hide inactive Overview tab panel

Tailwind's `.flex` utility overrides the UA `[hidden]` rule, so applying
`flex` to SModalTabsContent caused the inactive Overview panel to still
participate in the Tabs flex column and push the Trace view down. Keep
SModalTabsContent as a plain overflow container (no `flex` class) with
the scroll ref on it, and restore the inner flex-col wrapper for the
Overview content so it still stacks with gap spacing.

* fix(logs): trace view padding, section cutoff, keyboard visibility

- Tree pane now has top padding so the first row has breathing room
  under the header strip instead of sitting flush against the border.
- DetailCodeSection dropped its wrapper `overflow-hidden`. Per CSS, a
  flex item with `overflow: hidden` resolves `min-height: auto` to `0`,
  so when Input and Output were both expanded the flex algorithm
  shrank each section below its content, cutting off rows. Without the
  clip, sections size to content and the surrounding pane's
  `overflow-y-auto` takes over.
- Selected span row now scrolls into view on selection change, so
  arrow-key navigation always keeps the active row visible in the
  tree pane.

* fix(logs): inline Workflow State row and lift search dropdown z-index

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(logs): use emcn Button for View Snapshot action

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* minor improvements

* fix(logs): trace view resizable split, bar visibility, provider icons, cleanup

- Resizable tree/detail split in trace view (default 360px, drag to resize)
- Resizable right panel in preview snapshot (280–600px)
- Fix Gantt bar invisibility for late-run spans (clamp offsetPct to 100-MIN_BAR_PCT)
- Propagate model+provider to child model spans in span-factory for correct icons
- Fix icon contrast on light provider backgrounds (luminance-based color class)
- Replace custom status badges with emcn Badge component
- Lighten jump-to-error button to ghost variant
- Remove double X button in modal snapshot (showBlockCloseButton prop)
- Fix emcn subpath imports → barrel in trace-view, log-details, execution-snapshot
- Fix hover: → hover-hover: on resize handles
- Add body style cleanup on resize unmount
- Fix React Query key factory naming (stats/stat convention)
- Remove unnecessary useCallback/useMemo in preview and execution-snapshot
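
To illustrate the offsetPct clamp mentioned in the list above: a span that starts at the very end of the run would otherwise get an offset of 100% and render off the edge. The sketch below is not the project's code; the value of `MIN_BAR_PCT`, the function name `barGeometry`, and its parameters are assumptions.

```typescript
// Assumed minimum visible bar width, in percent of the timeline.
const MIN_BAR_PCT = 1;

function barGeometry(
  spanStart: number,
  spanEnd: number,
  traceStart: number,
  traceEnd: number,
): { offsetPct: number; widthPct: number } {
  // Guard against a zero-length trace to avoid dividing by zero.
  const total = Math.max(traceEnd - traceStart, 1);

  // Clamp the offset so there is always room for a minimum-width bar.
  const rawOffset = ((spanStart - traceStart) / total) * 100;
  const offsetPct = Math.min(Math.max(rawOffset, 0), 100 - MIN_BAR_PCT);

  // Enforce the minimum width, but never extend past the right edge.
  const rawWidth = ((spanEnd - spanStart) / total) * 100;
  const widthPct = Math.min(Math.max(rawWidth, MIN_BAR_PCT), 100 - offsetPct);

  return { offsetPct, widthPct };
}
```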

* fix(ui): scroll guard, credentials UX, design token fixes, input padding

- logs: only scroll-into-view on keyboard nav, not on click selection
- resource: stable scrollbar gutter, wider first column
- credentials: toast success/error feedback, remove useMemo for personalEnvData,
  allow editing conflict rows, fix disabled state visibility, use --text-error token
- integrations: use --text-error token for error state
- input: increase right padding (px-2 → pl-2 pr-3)

* chore(skills): add /ship command to claude, cursor, and agents

* fix(input): add scroll-pr-1 to keep caret visible when text overflows

* fix(logs): address PR review — iteration name guard, cost race, mothership retry

* improvement(logs): cleanup pass — remove anti-patterns, fix design tokens, simplify state

* fix(trace-spans): extend final model segment by position not by stale constant name

* fix(modal): restore sidebar-width padding on non-workflow pages

* fix(secrets): eliminate slow save by parallelizing DB ops and fixing stuck button

Sequential per-variable, per-workspace DB round-trips in syncPersonalEnvCredentialsForUser caused O(W×K) latency (800–1600ms for 10 workspaces). Replaced with parallel workspace processing and batched upserts. Also parallelized secret decryption in the GET handler.

On the client, removed the changeToken bug that left the Save button permanently disabled after a failed save, split the shared hasSavedRef into two independent flags to eliminate ordering races, and moved ref updates to after mutation success so optimistic state can never get stuck.
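
The difference between the sequential and parallel shapes can be sketched as below. This is not the body of syncPersonalEnvCredentialsForUser; `syncSequential`, `syncParallel`, and the two callback types are hypothetical, and the example only shows how W×K awaited round-trips collapse into W concurrent batched calls.

```typescript
type UpsertOne = (workspaceId: string, key: string) => Promise<void>;
type UpsertBatch = (workspaceId: string, keys: string[]) => Promise<void>;

// Before: one awaited round-trip per (workspace, variable) pair.
async function syncSequential(
  workspaceIds: string[],
  keys: string[],
  upsertOne: UpsertOne,
): Promise<void> {
  for (const workspaceId of workspaceIds) {
    for (const key of keys) {
      await upsertOne(workspaceId, key);
    }
  }
}

// After: one batched upsert per workspace, with all workspaces in flight at once.
async function syncParallel(
  workspaceIds: string[],
  keys: string[],
  upsertBatch: UpsertBatch,
): Promise<void> {
  await Promise.all(
    workspaceIds.map((workspaceId) => upsertBatch(workspaceId, keys)),
  );
}
```

Promise.all rejects as soon as any batch fails, so a caller that needs per-workspace error reporting might prefer Promise.allSettled.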

* updated sap block

* fix(sap): remove slash from S4HANA name, set white bgColor, regenerate docs

* fix(logs): prevent log-row arrow navigation when trace tab is active

* fix(logs): aggregate cost onto workflow root span; stabilize onActiveTabChange callback

* improvement(logs): fix Gantt time bounds to walk full span tree; cleanup effects, memos, callbacks, React Query mutations

* fix(logs): reset detail panel tab to overview on log switch

* chore(logs): remove extraneous comments

* fix(logs): restore useEffect for async setActiveWorkflow and useMemo on rowProps

- resource-content.tsx: revert render-time setActiveWorkflow call to useEffect; the store action is async and performs network ops, calling it during render violates React purity
- logs-list.tsx: restore useMemo on rowProps to prevent virtualized list rows from re-rendering on every parent render

* fix(queries): forward AbortSignal in mothership-admin query functions

All queryFn callbacks must forward signal for request cancellation per project React Query standards.
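
As a sketch of what forwarding the signal looks like: TanStack Query passes a context object containing `signal` to each queryFn, and handing that signal to the underlying request lets an unmounted or superseded query cancel it. The factory name `makeAdminQueryFn`, the injected `Fetcher` type, and the example URL in the usage note are assumptions for illustration only.

```typescript
// Minimal request function shape; the global fetch should fit it in a browser.
type Fetcher = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds a queryFn-shaped callback that forwards the signal it receives.
function makeAdminQueryFn(url: string, fetcher: Fetcher) {
  return async ({ signal }: { signal: AbortSignal }): Promise<unknown> => {
    const res = await fetcher(url, { signal });
    if (!res.ok) {
      throw new Error(`Request failed with status ${res.status}`);
    }
    return res.json();
  };
}
```

A query definition could then use something like `queryFn: makeAdminQueryFn("/api/example", fetch)`, where the path is a placeholder rather than a real route.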

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
