perf(autoreload): skip stdlib/site-packages on per-cell check #9629

Merged
mscolnick merged 5 commits into main from fix/autoreload-per-cell-overhead
May 21, 2026

@mscolnick

Contributor

This pull request was authored by a coding agent.

Fixes #9628.

With auto_reload set to lazy or autorun, every cell run called ModuleReloader.check(sys.modules, reload=True), which iterates over all of sys.modules and calls os.stat on each entry. With ~1000 modules loaded (typical), that adds 16–80 ms per cell. Compounded across the dozen or so cells that re-run on a UI interaction, it becomes a lag of more than 1 s.
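To get a feel for the cost described above, here is a minimal sketch that stats the file behind every entry in sys.modules and times the pass. The helper name `stat_all_modules` is hypothetical and this is not marimo's implementation; it only mirrors the "iterate and os.stat" pattern.

```python
import os
import sys
import time


def stat_all_modules(modules):
    """Stat the file behind every module that has one; return how many were stat-ed."""
    count = 0
    for module in list(modules.values()):
        try:
            path = getattr(module, "__file__", None)
        except Exception:
            continue  # some lazy module proxies raise on attribute access
        if not path:
            continue  # built-in and namespace modules have no file to stat
        try:
            os.stat(path)
        except (OSError, TypeError, ValueError):
            continue  # file vanished or path is unusable; ignored in this sketch
        count += 1
    return count


start = time.perf_counter()
stat_count = stat_all_modules(sys.modules)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"stat-ed {stat_count} of {len(sys.modules)} modules in {elapsed_ms:.2f} ms")
```

The absolute timing depends heavily on the filesystem and on how many modules are imported, so numbers from this sketch will not match the figures quoted in the issue.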

This change adds an opt-in skip_non_user_modules=True flag on ModuleReloader.check. When set, stdlib and site-packages module names are recorded in a persistent skip set (classified by sysconfig prefixes) and short-circuited on subsequent calls.
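The classification step could look roughly like the sketch below. The function names `non_user_roots` and `is_user_module` are illustrative, the set of sysconfig keys is a simplification, and treating modules without a `__file__` as non-user code is an assumption made here, not something the PR text confirms.

```python
import os
import sysconfig
import tempfile
import types


def non_user_roots():
    """Directories treated as non-user code: stdlib and site-packages."""
    paths = sysconfig.get_paths()
    roots = set()
    for key in ("stdlib", "platstdlib", "purelib", "platlib"):
        value = paths.get(key)
        if value:
            roots.add(os.path.realpath(value))
    return tuple(sorted(roots))


def is_user_module(module, roots):
    """Return True if the module's file lives outside every non-user root."""
    path = getattr(module, "__file__", None)
    if not path:
        return False  # assumption: modules without a file are not user code
    real = os.path.realpath(path)
    return not any(
        real == root or real.startswith(root + os.sep) for root in roots
    )


roots = non_user_roots()

# A fake module whose file sits inside the stdlib directory.
stdlib_like = types.ModuleType("fake_stdlib")
stdlib_like.__file__ = os.path.join(sysconfig.get_paths()["stdlib"], "fake_stdlib.py")

# A fake module whose file sits in a temp directory, like a user source tree.
user_like = types.ModuleType("my_project")
user_like.__file__ = os.path.join(tempfile.gettempdir(), "my_project.py")

print(is_user_module(stdlib_like, roots), is_user_module(user_like, roots))
```

Comparing on a path-separator boundary avoids misclassifying a sibling directory that merely shares a prefix with a root.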

AutoreloadManager.cell_scope (the hot per-cell path) opts in. The background ModuleWatcher keeps the default behavior and continues to scan every module on its 1s loop, so edits inside an installed package are still detected — just at watcher latency rather than cell-entry latency. Editable installs (pip install -e ., uv add --editable) have __file__ outside site-packages, so they are correctly classified as user code and reload with no latency change.
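As a rough illustration of the split between the two callers, the sketch below uses a toy reloader that only records how it was called. `ToyReloader`, `cell_scope`, and `watcher_tick` are stand-ins written for this example; the real AutoreloadManager and ModuleWatcher are more involved.

```python
import contextlib
import sys


class ToyReloader:
    """Stand-in for a module reloader; records the arguments of each check."""

    def __init__(self):
        self.calls = []

    def check(self, modules, reload, skip_non_user_modules=False):
        self.calls.append((len(modules), reload, skip_non_user_modules))
        return set()  # a real reloader would return the modified modules


@contextlib.contextmanager
def cell_scope(reloader):
    """Hot path: check before the cell runs, then register modules it imported."""
    snapshot = set(sys.modules)
    reloader.check(dict(sys.modules), reload=True, skip_non_user_modules=True)
    yield
    new_modules = {
        name: sys.modules[name] for name in set(sys.modules) - snapshot
    }
    reloader.check(new_modules, reload=False, skip_non_user_modules=True)


def watcher_tick(reloader):
    """Background path: full scan with the default (no skipping) behavior."""
    return reloader.check(dict(sys.modules), reload=False)


reloader = ToyReloader()
with cell_scope(reloader):
    import colorsys  # stands in for an import performed by the cell

watcher_tick(reloader)
for call in reloader.calls:
    print(call)
```

Only the per-cell calls pass `skip_non_user_modules=True`; the watcher call leaves the flag at its default, which is what keeps edits inside installed packages detectable.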

Benchmark

Driving ModuleReloader.check() directly, 200 iterations post-warmup. Issue-shaped workload: ~2.5k modules (heavy stdlib + numpy/pandas/etc.) + 5 user files in a tmp dir.

| path | median | p95 |
| --- | --- | --- |
| before | 4.88 ms | 6.15 ms |
| after | 0.91 ms | 1.01 ms |

~4 ms saved per cell run, 5.4× median speedup.
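A harness along these lines would produce median and p95 figures of the kind reported above. It is a sketch, not the script used for the PR: the `benchmark` helper, the warmup count of 20, and the placeholder workload are all assumptions.

```python
import statistics
import time


def benchmark(fn, iterations=200, warmup=20):
    """Return (median_ms, p95_ms) for calling fn repeatedly after a warmup."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    median = statistics.median(samples)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return median, p95


# Placeholder workload; the PR drives ModuleReloader.check() here instead.
median_ms, p95_ms = benchmark(lambda: sum(range(1000)))
print(f"median {median_ms:.4f} ms, p95 {p95_ms:.4f} ms")
```

Reporting the median next to p95 shows both the typical cost and the tail, which matters when a dozen cells run back to back.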

Scale curve (median µs, varying user-module count):

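| user mods | sys.modules | before (µs) | after (µs) | speedup |
| --- | --- | --- | --- | --- |
| 0 | 2514 | 5037 | 873 | 5.8× |
| 5 | 2519 | 5245 | 802 | 6.5× |
| 25 | 2539 | 6082 | 1693 | 3.6× |
| 100 | 2614 | 8342 | 4421 | 1.9× |
| 500 | 3014 | 12489 | 8398 | 1.5× |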

The win narrows as user-code grows, by design: the optimization only filters out non-user-code.
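The arithmetic behind that observation can be checked directly from the scale-curve figures. The snippet below recomputes the speedup ratio and the absolute time saved per row; the variable names are just for illustration.

```python
# (user modules, before in microseconds, after in microseconds), from the scale curve
rows = [
    (0, 5037, 873),
    (5, 5245, 802),
    (25, 6082, 1693),
    (100, 8342, 4421),
    (500, 12489, 8398),
]

speedups = [round(before / after, 1) for _, before, after in rows]
saved_us = [before - after for _, before, after in rows]

for (user_mods, _, _), speedup, saved in zip(rows, speedups, saved_us):
    print(f"{user_mods:>4} user modules: {speedup}x faster, {saved} µs saved")
```

The absolute saving stays at roughly 3.9 to 4.4 ms in every row, because the number of skipped non-user modules is about the same. Only the ratio shrinks, as the remaining work on user modules grows.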

Every cell run with auto_reload enabled was stat-ing every entry in
sys.modules (often 1000+), adding 16-80ms of overhead per cell.

Add an opt-in skip_non_user_modules flag on ModuleReloader.check that
caches stdlib/site-packages module names in a persistent skip set.
AutoreloadManager.cell_scope opts in; the background ModuleWatcher
keeps the default full scan so edits inside installed packages remain
detectable at watcher latency.

Fixes #9628

@mscolnick mscolnick added the enhancement New feature or request label May 20, 2026
@mscolnick mscolnick requested review from akshayka and dmadisetti May 20, 2026 18:40
@mscolnick mscolnick marked this pull request as ready for review May 20, 2026 18:43
Copilot AI review requested due to automatic review settings May 20, 2026 18:43

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

3 issues found across 3 files

Architecture diagram
```mermaid
sequenceDiagram
    participant UI as User/Client
    participant AM as AutoreloadManager
    participant MR as ModuleReloader
    participant MW as ModuleWatcher (Background)
    participant sysmod as sys.modules dict
    participant FS as Filesystem (os.stat)

    Note over AM,FS: Per-cell execution path (hot path)

    UI->>AM: Execute cell (lazy/auto reload)
    AM->>AM: snapshot = set(sys.modules)
    AM->>MR: check(modules=sys.modules, reload=True, skip_non_user_modules=True)

    Note over MR: Skip cache populated lazily
    MR->>MR: _non_user_roots from sysconfig (stdlib, purelib, platlib, base_prefix)

    loop For each module in sys.modules
        alt Module name in _skip set
            MR->>MR: continue (skip entirely)
        else Module not classified yet
            MR->>MR: _is_user_module(module)
            alt __file__ starts with non_user_root
                MR->>MR: skip.add(modname), continue
            else User module (editable install / source tree)
                MR->>FS: os.stat(module.__file__)
                FS-->>MR: mtime
                MR->>MR: Compare with cached mtime
            end
        end
    end

    alt Stale modules found
        MR->>MR: Reload stale modules
        MR-->>AM: Set of modified modules
    else No stale modules
        MR-->>AM: Empty set (fast path)
    end

    AM->>AM: Execute cell yield
    AM->>AM: new_modules = sys.modules - snapshot
    AM->>MR: check(new_modules, reload=False, skip_non_user_modules=True)

    Note over AM: Cell execution complete

    Note over MW,FS: Background watcher path (1s loop)

    loop Every ~1 second
        MW->>MR: check(modules=sys.modules, reload=False)
        Note over MR: Default behavior - scans ALL modules

        loop For each module
            alt User module (not in site-packages)
                MR->>FS: os.stat(n), compare
            else Stdlib / site-packages
                MR->>FS: os.stat(n), compare
            end
        end

        alt Modified modules detected
            MR-->>MW: Set of updated module names
            MW->>MW: Trigger reload callback (if auto_reload=autorun)
        end
    end

    Note over AM,FS: New: skip_non_user_modules flag
    Note over AM: User code changes detected immediately
    Note over MW: Site-package changes detected at watcher latency
```

Comment thread marimo/_runtime/reload/manager.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated

Copilot AI left a comment

Contributor

Pull request overview

This PR improves autoreload performance by avoiding per-cell os.stat scans over the full sys.modules set when runtime.auto_reload is enabled, addressing the cell execution latency regression reported in #9628.

Changes:

  • Added skip_non_user_modules option to ModuleReloader.check() and a persistent skip cache for stdlib/site-packages modules.
  • Updated AutoreloadManager.cell_scope() to use the skip behavior on the hot per-cell path.
  • Added targeted tests for user vs non-user module classification and skip-cache behavior.
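The skip-cache behavior summarized in these bullets can be illustrated with a toy checker that counts how often it classifies a module. `SkippingChecker` and the fake paths below are invented for this sketch and do not reproduce the PR's ModuleReloader or its tests.

```python
import types


class SkippingChecker:
    """Toy checker with a persistent skip set for non-user modules."""

    def __init__(self, non_user_roots):
        self.roots = tuple(non_user_roots)
        self.skip = set()    # names classified as non-user, kept across calls
        self.classified = 0  # number of classification checks performed

    def _is_user(self, module):
        return not any(module.__file__.startswith(root) for root in self.roots)

    def check(self, modules, skip_non_user_modules=False):
        examined = []
        for name, module in modules.items():
            if skip_non_user_modules:
                if name in self.skip:
                    continue  # short-circuit: already known to be non-user
                self.classified += 1
                if not self._is_user(module):
                    self.skip.add(name)
                    continue
            examined.append(name)  # a real checker would os.stat the file here
        return examined


def fake_module(name, path):
    """Build a module object with a chosen __file__, without touching disk."""
    module = types.ModuleType(name)
    module.__file__ = path
    return module


modules = {
    "numpy": fake_module("numpy", "/fake/site-packages/numpy/__init__.py"),
    "json": fake_module("json", "/fake/stdlib/json/__init__.py"),
    "my_pkg": fake_module("my_pkg", "/home/me/project/my_pkg.py"),
}
checker = SkippingChecker(["/fake/site-packages/", "/fake/stdlib/"])

first = checker.check(modules, skip_non_user_modules=True)   # hot path, cold cache
second = checker.check(modules, skip_non_user_modules=True)  # hot path, warm cache
full = checker.check(modules)                                # watcher-style full scan
print(first, second, full, sorted(checker.skip), checker.classified)
```

In this sketch the skip logic only runs when the flag is set, so the default call still examines every module regardless of what the cache holds.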

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| marimo/_runtime/reload/autoreload.py | Introduces non-user root detection, user-module classification, and a persistent skip cache used by ModuleReloader.check(). |
| marimo/_runtime/reload/manager.py | Opts the per-cell autoreload path into skipping non-user modules to reduce per-cell overhead. |
| tests/_runtime/reload/test_autoreload.py | Adds regression tests for skip-cache population and behavior differences between watcher vs hot path. |

Comment thread marimo/_runtime/reload/manager.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/manager.py Outdated

@akshayka akshayka left a comment

Contributor

Overall LGTM, code style comments

Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/autoreload.py Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 3 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py Outdated
akshayka previously approved these changes May 20, 2026

@dmadisetti dmadisetti left a comment

Member

I think the continue logic should be tied to skip_non_user_modules

Comment thread marimo/_runtime/reload/autoreload.py
Comment thread marimo/_runtime/reload/autoreload.py Outdated
source tree, so they are correctly classified as user code.
"""
f = safe_getattr(module, "__file__", None)
if not f:

Member

False positive on C libraries? Unsure, but I think so. Maybe that's fine.
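For context on this question: modules compiled into the interpreter have no `__file__` at all, while C extension modules shipped as shared libraries do have one, with a platform-specific suffix. The sketch below only demonstrates that distinction; the `describe` helper is hypothetical, and how the PR's `if not f:` branch classifies such modules is not shown in the excerpt.

```python
import importlib.machinery
import sys
import types


def describe(module):
    """Label a module as 'no-file', 'extension', or 'source' based on __file__."""
    path = getattr(module, "__file__", None)
    if not path:
        return "no-file"  # e.g. built-in modules compiled into the interpreter
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"  # compiled C extension loaded from a shared library
    return "source"


# A fake pure-Python module, built without touching the filesystem.
source_like = types.ModuleType("pkg.mod")
source_like.__file__ = "pkg/mod.py"

print(describe(sys), describe(source_like))
```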

Comment thread marimo/_runtime/reload/autoreload.py Outdated
@mscolnick

Contributor Author

Thanks @dmadisetti, I had that but removed it following the comments. I will add skip_non_user_modules back on just the hot path.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 2 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py Outdated

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread marimo/_runtime/reload/autoreload.py Outdated
Comment thread marimo/_runtime/reload/manager.py

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 2 files (changes from recent commits).

Comment thread marimo/_runtime/reload/autoreload.py
@mscolnick mscolnick merged commit 5700859 into main May 21, 2026
56 of 62 checks passed
@mscolnick mscolnick deleted the fix/autoreload-per-cell-overhead branch May 21, 2026 14:19
@github-actions

Contributor

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.7-dev71
