Fix spinner clear_transient race condition during rapid reruns#13849
Fix spinner clear_transient race condition during rapid reruns#13849
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ PR preview is ready!
|
There was a problem hiding this comment.
Pull request overview
Fixes a race condition in st.spinner where clear_transient could be sent even when no transient was ever created, which could lead to stale-anchor behavior and incorrect element removal during rapid reruns.
Changes:
- Track whether
create_transient()actually ran and only enqueueclear_transient()when a transient was created. - Add a regression unit test to ensure no transient-related messages are sent when the spinner exits before its delay timer fires.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
lib/streamlit/elements/spinner.py |
Adds transient_created tracking so clear_transient() is only sent when needed, preventing unnecessary transient node creation. |
lib/tests/streamlit/spinner_test.py |
Adds a regression test asserting that fast-exiting spinners do not enqueue transient messages. |
| """Test that no transient message is sent when spinner exits before timer fires. | ||
|
|
||
| This is a regression test for a race condition where rapid reruns would cause | ||
| unnecessary clear_transient messages to be sent, leading to stale element removal. | ||
| The spinner has a 0.5s delay before showing, so if the context exits before that, | ||
| no transient should be created or cleared. | ||
| """ | ||
| with st.spinner("quick spinner"): | ||
| # Exit immediately, before the 0.5s timer fires | ||
| time.sleep(0.1) |
There was a problem hiding this comment.
The test and docstring hard-code the spinner delay as "0.5s" and rely on time.sleep(0.1). Since the real delay is defined by DELAY_SECS in streamlit.elements.spinner, consider referencing that constant (and avoiding mentioning a specific numeric value in the docstring) so the test remains correct if the delay is ever adjusted.
SummaryThis PR fixes a race condition where UI elements (like buttons) could disappear after rapidly clicking buttons that trigger full reruns. The root cause was that the spinner's Production code change: 1 file ( Test changes: 25 test files updated to reflect the corrected behavior. Code QualityThe production fix is clean, minimal, and well-placed: display_message = True
transient_created = False
display_message_lock = threading.Lock()
timer: threading.Timer | None = None
try:
def set_message() -> None:
nonlocal transient_created
with display_message_lock:
if display_message:
# Ignore the DeltaGenerator conveniences because Transients are special
enqueue_message(create_transient())
transient_created = True
timer = threading.Timer(DELAY_SECS, set_message)
add_script_run_ctx(timer)
timer.start()
# Yield control back to the context.
yield
finally:
if timer:
timer.cancel()
with display_message_lock:
display_message = False
# Only send the clear message if we actually created a transient.
# This prevents unnecessary clear messages during rapid reruns
# where the timer is cancelled before firing.
if transient_created:
enqueue_message(clear_transient())Thread safety analysis: The
The comments are clear and explain the "why" effectively. Test CoverageNew regression test ( Updated existing tests: The PR comprehensively updates ~20
This is the right approach — these tests should not have been coupled to spinner internals. Updated spinner-specific cache tests ( No E2E test was added, which is acceptable here — the race condition is timing-dependent and an E2E test would be inherently flaky. The unit test covers the core mechanism reliably. Backwards CompatibilityNo breaking changes. This is purely an internal optimization that reduces unnecessary messages from the server to the client. No public API signatures, protobuf definitions, or observable user-facing behavior (other than the bug fix) is changed. The spinner still displays correctly when the timer fires. Security & RiskNo security concerns. The change is a minor logic guard in a single function. Risk of regression is very low:
AccessibilityNo frontend changes in this PR. The fix is entirely backend (Python message-passing logic). No accessibility implications. RecommendationsNo blocking issues. Minor suggestions:
VerdictAPPROVED: Clean, minimal, thread-safe fix for a real race condition bug with comprehensive test updates. The approach is sound, well-documented, and low-risk. This is an automated AI review using |
51696a5 to
3cc7750
Compare
SummaryThis PR fixes a frontend race condition (issue #13634) where the first widget above a Only two files are changed:
Reviewers agree unanimously that the fix is clean, minimal, well-placed, and ready to merge. Code QualityBoth reviewers confirm the implementation is concise and follows existing visitor patterns. The early-return guard at lines 146-152 checks three narrow conditions:
This correctly short-circuits before the existing logic that would recursively visit the anchor (which may carry a stale Agreement: Both reviewers note the logic is consistent with how Additional analysis (opus-4.6-thinking): The early return bypasses the recursive Test CoverageBoth reviewers agree the new unit test ("restores anchor for transient cleared in current run") correctly validates the core bug scenario and is valuable as a regression test. Existing tests for other Agreement on residual gap: Both reviewers note there is no E2E test reproducing the real-world Non-blocking suggestions (opus-4.6-thinking):
Backwards CompatibilityNo breaking changes. Both reviewers agree the behavior change is intentionally scoped to transient nodes that are from the current script run, have no transient children, and still have an anchor. The early return has strictly narrower conditions than the subsequent code path, preserving existing behavior for all other cases. Security & RiskNo security concerns identified by either reviewer. Both agree regression risk is low because the condition is narrow and test-backed, though render-tree sequencing issues are historically subtle. AccessibilityNo accessibility-impacting changes. No DOM or ARIA changes in this PR. Recommendations
VerdictAPPROVED: Both reviewers approve unanimously. The fix is focused, minimal, consistent with existing render-tree patterns, and sufficiently validated at unit level. No backwards compatibility, security, or accessibility concerns. Merge is recommended, with a suggestion to strengthen integration/E2E coverage as a follow-up. Consolidated review by 📋 Review by `gpt-5.3-codex-high`SummaryNo blocking issues found. The PR introduces a targeted fix in Code QualityThe implementation is concise and follows existing visitor patterns in the render-tree layer.
Test CoverageUnit coverage is improved and directionally good:
Residual gap:
Backwards CompatibilityNo breaking API or schema changes. Behavior change is intentionally scoped to transient nodes that:
This is backward compatible for consumers and aligns with expected UI persistence semantics during rerun transitions. Security & RiskNo security concerns identified. Regression risk appears low because the condition is narrow and test-backed, though render-tree sequencing issues are historically subtle, so the lack of e2e coverage remains the main residual risk. AccessibilityNo accessibility-impacting UI/DOM changes in this PR. No a11y concerns identified. Recommendations
VerdictAPPROVED: The fix is focused, consistent with render-tree behavior, and sufficiently validated at unit level; merge is reasonable with a recommendation to strengthen integration coverage. This is an automated AI review by 📋 Review by `opus-4.6-thinking`SummaryThis PR fixes a frontend race condition (issue #13634) where the first widget above a Only two files are changed:
Code QualityThe fix is clean, minimal, and well-placed. The early-return guard at lines 146-152 of
This correctly short-circuits before the existing logic that would recursively visit the anchor (which may have a stale The logic is consistent with how One minor observation: the early return bypasses the recursive
Test CoverageThe new test case ("restores anchor for transient cleared in current run") correctly validates the core bug scenario: a Minor suggestions (non-blocking):
Backwards CompatibilityNo breaking changes. The fix only adds an early-return for a specific condition that was previously handled incorrectly (anchor being dropped). Existing behavior for all other cases is preserved — the early return has strictly narrower conditions than the subsequent code path. Security & Risk
AccessibilityNo accessibility impact. This is a render-tree data structure fix with no DOM or ARIA changes. Recommendations
VerdictAPPROVED: Clean, minimal, and well-targeted fix for a legitimate race condition. The early-return logic is correct, consistent with existing patterns (e.g., This is an automated AI review by |
## Describe your changes Fixes a regression introduced in #13849 where stale elements from previous script runs could persist after spinner completion. **Root cause:** When a transient node (spinner) captured an element from a previous run as its anchor, and the transient was cleared, the anchor was restored directly without checking if it was stale. **The fix:** Change `return node.anchor` to `return node.anchor.accept(this)` so the anchor is also checked for staleness before being restored. ## GitHub Issue Link (if applicable) Closes #14249 ## Testing Plan - [x] Unit tests: Added test case for stale anchor scenario - [x] Manual testing: Verified fix using reproduction steps from issue <!-- agent-metrics-start --> <details> <summary>Agent metrics</summary> | Type | Name | Count | |------|------|------:| | subagent | fixing-pr | 1 | </details> <!-- agent-metrics-end -->
Describe your changes
Fixes a frontend stale-node edge case where the first widget above a
@st.fragmentcould disappear and be replaced bydiv.stEmptyafter a cache miss.Root cause: During stale-node cleanup, a
TransientNodethat had already been explicitly cleared in the current run (transientNodes.length === 0) could still hold a valid anchor element from a previous run. The existing stale filtering path treated that anchor as stale and removed it.Fix: In
ClearStaleNodeVisitor.visitTransientNode, return the anchor directly when the transient was cleared in the current run and has no transient children. This preserves the anchored widget instead of incorrectly dropping it.GitHub Issue Link (if applicable)
Closes #13634
Testing Plan
ClearStaleNodeVisitor.test.ts:restores anchor for transient cleared in current runyarn vitest run lib/src/render-tree/visitors/ClearStaleNodeVisitor.test.ts -t "restores anchor for transient cleared in current run"Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.