feat(save): stub serialization toolkit (class/lazy/module stubs)#9896

Merged
dmadisetti merged 3 commits into main from dm/save-stubs
Jun 24, 2026

Conversation

@dmadisetti dmadisetti commented Jun 15, 2026

Member

Expands the current stubbing mechanism to cover more kinds of Python objects (classes, lambdas, modules, and module imports).

Introduces:

  • UnhashableStub
  • ClassStub

Additionally:

  • Handles module imports explicitly to prevent their serialization to disk
  • Replaces pickled PyTorch tensor objects with a .pt loader
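
To make the module-import handling concrete, here is a minimal sketch, assuming the idea is to persist only a module's name and re-import it on restore. `ModuleNameStub` and `MissingModulePlaceholder` are hypothetical names for illustration and are not the PR's `ModuleStub`/`MissingModule` implementation.

```python
import importlib
import json
from types import ModuleType
from typing import Any


class MissingModulePlaceholder:
    """Hypothetical stand-in returned when a cached module cannot be imported."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, attr: str) -> Any:
        # Only reached for attributes that are not set on the instance,
        # so the failure surfaces lazily, at first real use.
        raise ModuleNotFoundError(
            f"module {self._name!r} was cached by name but is not importable"
        )


class ModuleNameStub:
    """Hypothetical stub: keeps the module name, never the module object."""

    def __init__(self, module: ModuleType) -> None:
        self.name = module.__name__

    def load(self) -> Any:
        try:
            return importlib.import_module(self.name)
        except ImportError:
            return MissingModulePlaceholder(self.name)


# An installed module round-trips to the very same module object.
restored = ModuleNameStub(json).load()

# A module that cannot be imported yields a placeholder instead of failing.
missing = ModuleNameStub(ModuleType("no_such_module_for_stub_demo")).load()
```

Because only the name is stored, nothing from the module itself needs to be written to disk; whether the real stub defers the failure in the same way is an assumption here.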

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

5 issues found and verified against the latest diff

Comment thread marimo/_save/stubs/lazy_stub.py Outdated
Comment thread marimo/_save/stubs/module_stub.py Outdated
Comment thread marimo/_save/stubs/class_stub.py
Comment thread marimo/_save/cache.py Outdated
Comment thread marimo/_save/stubs/class_stub.py
@dmadisetti dmadisetti marked this pull request as ready for review June 22, 2026 20:14
Copilot AI review requested due to automatic review settings June 22, 2026 20:14

Copilot AI left a comment

Contributor

Pull request overview

This PR expands marimo’s cache “stubbing/serialization toolkit” to better persist and restore tricky Python objects (modules, cell-defined classes) and adds infrastructure for resolving __main__ pickle references and serializing torch tensors via .pt.

Changes:

  • Added ClassStub (source-based class serialization) and MissingModule/ModuleStub improvements (name-based module restoration with lazy placeholder on missing modules).
  • Extended lazy-cache manifest schema to support import_ref (inline importable references) and class_def (inline class source), plus added torch .pt blob support.
  • Introduced CellNamespaceUnpickler + pickle_load_with_namespace helper and threaded an optional glbls namespace parameter through loader APIs; added tests covering these behaviors.
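
As a rough illustration of the source-based class serialization mentioned in the first bullet, the following is a minimal sketch rather than the PR's `ClassStub`. `ClassSourceStub` and its `load` method are hypothetical, and the sketch assumes the class source has already been captured as a string.

```python
import textwrap
from typing import Any


class ClassSourceStub:
    """Hypothetical stub that stores a class as source text."""

    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.source = textwrap.dedent(source)

    def load(self, glbls: dict[str, Any]) -> type:
        # Re-execute the class definition inside the target namespace so the
        # restored class sees the same globals as the code that will use it.
        code = compile(self.source, f"<stub {self.name}>", "exec")
        exec(code, glbls)
        return glbls[self.name]


source = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)
'''

stub = ClassSourceStub(name="Point", source=source)
scope: dict[str, Any] = {}
Point = stub.load(scope)
result = Point(3, -4).norm1()
```

Only the name and the source string need to be persisted in this sketch; how the real stub captures source and registers filenames is not shown here.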

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/_save/stubs/test_unhashable_stub.py | Adds unit tests for UnhashableStub semantics and Cache.restore handling. |
| tests/_save/stubs/test_module_stub.py | Adds tests for module-name-based restore and missing-module placeholder behavior. |
| tests/_save/stubs/test_class_stub.py | Adds tests for capturing/loading class source and filename/linecache behaviors. |
| tests/_save/loaders/test_unpickler.py | Tests the new CellNamespaceUnpickler behavior for __main__ refs. |
| tests/_save/loaders/test_loader.py | Tests inline storage of importable references (no blob files). |
| tests/_save/loaders/test_class_roundtrip.py | End-to-end tests for class stubbing across pickle/lazy loaders. |
| tests/_save/loaders/mocks.py | Updates loader mock signature to accept glbls. |
| marimo/_save/stubs/module_stub.py | Implements MissingModule placeholder and improves ModuleStub.load() error handling. |
| marimo/_save/stubs/lazy_stub.py | Adds manifest fields, torch .pt serializer/deserializer, and UnhashableStub. |
| marimo/_save/stubs/class_stub.py | Implements ClassStub source capture + exec-based restore. |
| marimo/_save/stubs/__init__.py | Exports ClassStub. |
| marimo/_save/loaders/unpickler.py | Adds CellNamespaceUnpickler and pickle_load_with_namespace. |
| marimo/_save/loaders/memory.py | Extends load_cache() signature with optional glbls. |
| marimo/_save/loaders/loader.py | Threads optional glbls through cache_attempt()/load_cache() interface. |
| marimo/_save/loaders/lazy.py | Adds import_ref/class_def manifest support and adjusts blob writing logic. |
| marimo/_save/hash.py | Avoids KeyError when pinned module isn’t present in sys.modules; passes scope as glbls. |
| marimo/_save/cache.py | Restores ClassStub into scope; preserves UnhashableStub; captures classes/functions selectively. |
| marimo/_runtime/exceptions.py | Adds MarimoCancelCellError and MarimoUnhashableCacheError for cache tripwire flow. |
Comments suppressed due to low confidence (1)

marimo/_save/loaders/lazy.py:205

  • LazyLoader.load_cache() accepts glbls but immediately discards it. Combined with restore_cache() deserializing .pickle blobs eagerly via pickle.loads, this means cached objects that reference __main__ (e.g. instances of cell-defined classes) still cannot be restored from lazy caches, and the new CellNamespaceUnpickler/pickle_load_with_namespace helper is never used. To support class-instance restoration, you’ll need to either (a) thread glbls through to pickle deserialization (using pickle_load_with_namespace) and ensure ClassStubs are materialized into glbls before unpickling dependent blobs, or (b) defer .pickle blob deserialization to Cache.restore(scope) once stubs have been loaded into scope.
    def load_cache(
        self,
        key: HashKey,
        glbls: dict[str, Any] | None = None,
    ) -> Cache | None:
        del glbls
        try:
            blob: bytes | None = self.store.get(str(self.build_path(key)))
            if not blob:
                return None
            return self.restore_cache(key, blob)
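
Option (a) above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's `CellNamespaceUnpickler`: `NamespaceUnpickler`, `load_with_namespace`, and `CellPoint` are hypothetical names, and the class is assumed to already be materialized in the namespace before the dependent blob is unpickled.

```python
import io
import pickle
import sys
from typing import Any


class NamespaceUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that resolves __main__ references from a dict."""

    def __init__(self, file: io.BytesIO, namespace: dict[str, Any]) -> None:
        super().__init__(file)
        self._namespace = namespace

    def find_class(self, module: str, name: str) -> Any:
        # Prefer the supplied namespace for names pickled as __main__.<name>.
        if module == "__main__" and name in self._namespace:
            return self._namespace[name]
        return super().find_class(module, name)


def load_with_namespace(data: bytes, namespace: dict[str, Any]) -> Any:
    return NamespaceUnpickler(io.BytesIO(data), namespace).load()


# Define a class as if it lived in a cell whose module name is __main__.
cell_scope: dict[str, Any] = {"__name__": "__main__"}
exec(
    "class CellPoint:\n"
    "    def __init__(self, x):\n"
    "        self.x = x\n",
    cell_scope,
)
cell_cls = cell_scope["CellPoint"]

# pickle only writes classes it can look up by name, so expose the class on
# the real __main__ module just long enough to produce the bytes.
main_module = sys.modules["__main__"]
setattr(main_module, "CellPoint", cell_cls)
try:
    blob = pickle.dumps(cell_cls(5))
finally:
    delattr(main_module, "CellPoint")

# The class is gone from __main__, but the namespace still resolves it.
restored = load_with_namespace(blob, cell_scope)
```

The ordering matters: the class has to be in the namespace before the blob is loaded, which is why the comment asks for ClassStubs to be materialized into glbls first.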
Comment thread marimo/_save/loaders/lazy.py Outdated
Comment thread marimo/_save/stubs/lazy_stub.py
Comment on lines +308 to +325
"""Marker + tripwire for a def that could not be serialized for caching.

Written to the cache as a placeholder when per-def pickling fails (e.g.
a lambda, a closure over an unpicklable object). The marker is placed
in scope as-is by `Cache.restore` (no `.load()` call). It is
harmless when the consumer cell never touches it; any meaningful access
(call) raises `MarimoUnhashableCacheError` carrying
`variables=[var_name]` so the runner can identify the defining cell,
invalidate its manifest, and re-queue.

Detection happens at use-site, not at pre-execution. Bodies that don't
touch the stub run normally; closure-captured stubs surface through
whichever access the user code performs.

UnhashableStub is created on-demand by the loader and is not
registered in CUSTOM_STUBS — `get_type()` raises since no specific
value type maps to it.
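
The use-site tripwire this docstring describes can be illustrated with a small sketch. `TripwireStub` and `UnhashableCacheError` are hypothetical analogues of `UnhashableStub` and `MarimoUnhashableCacheError`; only the call path is modeled, and the runner's invalidate-and-requeue step is assumed rather than shown.

```python
from typing import Any


class UnhashableCacheError(Exception):
    """Hypothetical error carrying the names of the unrestorable variables."""

    def __init__(self, variables: list[str]) -> None:
        super().__init__(f"cached value for {variables} could not be restored")
        self.variables = variables


class TripwireStub:
    """Hypothetical marker placed in scope as-is; raises only when called."""

    def __init__(self, var_name: str) -> None:
        self.var_name = var_name

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        raise UnhashableCacheError(variables=[self.var_name])


# A restored scope where one definition could not be serialized.
scope: dict[str, Any] = {
    "transform": TripwireStub("transform"),
    "data": [1, 2, 3],
}

# Code that never touches the stub runs normally.
total = sum(scope["data"])

# Calling the stub trips the wire and names the variable to recompute.
flagged: list[str] = []
try:
    scope["transform"](scope["data"])
except UnhashableCacheError as err:
    flagged = err.variables
```

Raising at the call site rather than at restore time means a consumer that ignores the variable pays nothing for the failed serialization.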

Member Author

Put in place for a follow-up.

@dmadisetti dmadisetti requested a review from mscolnick June 24, 2026 16:54
@dmadisetti dmadisetti changed the title feat(save): stub serialization toolkit — class/lazy/module stubs Jun 24, 2026
@dmadisetti dmadisetti requested a review from kirangadhave June 24, 2026 16:55

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 9 files (changes from recent commits).

@@ -17,6 +17,8 @@
if TYPE_CHECKING:

Contributor

P1: Removed runtime micropip fallback for missing pyarrow in pyodide during Arrow cache load. DependencyManager.pyarrow.require() only validates presence and raises ModuleNotFoundError; it does not install. In pyodide/browser environments where pyarrow may not be pre-installed, Arrow cache deserialization will now crash instead of self-healing via micropip.install("pyarrow").
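
The difference between validating a dependency and self-healing can be sketched as follows. This is a generic illustration, not marimo's `DependencyManager`: `require_or_install` and `fake_installer` are hypothetical, and the installer is a synchronous stand-in (installation via micropip in Pyodide is asynchronous, so a real fallback would need to await it).

```python
import importlib
import sys
import types
from typing import Callable, Optional


def require_or_install(
    name: str,
    installer: Optional[Callable[[str], None]] = None,
) -> types.ModuleType:
    """Import `name`; if it is missing and an installer is given, retry once."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        if installer is None:
            # Validation-only behavior: surface the missing dependency.
            raise
    installer(name)
    importlib.invalidate_caches()
    return importlib.import_module(name)


def fake_installer(name: str) -> None:
    # Stand-in for a real package install: register an empty module.
    sys.modules[name] = types.ModuleType(name)


# Without an installer the call fails, as a plain presence check would.
try:
    require_or_install("hypothetical_arrow_backend")
    failed_without_installer = False
except ModuleNotFoundError:
    failed_without_installer = True

# With an installer the same call recovers.
module = require_or_install("hypothetical_arrow_backend", fake_installer)
```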

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At marimo/_save/stubs/lazy_stub.py, line 150:

<file context>
@@ -147,21 +147,7 @@ def _arrow_load(data: bytes, type_hint: str | None = None) -> Any:
-            DependencyManager.pyarrow.require(
-                "to load cached Arrow IPC blobs."
-            )
+    DependencyManager.pyarrow.require("to load cached Arrow IPC blobs.")
     import pyarrow as pa
 
</file context>
    def load_cache(
        self,
        key: HashKey,
        glbls: dict[str, Any] | None = None,

Contributor

It looks like all implementations just delete glbls. Its use might come in a follow-up PR, so it's fine to just add a TODO to remove this afterwards.

@mscolnick mscolnick left a comment

Contributor

Potentially dead code from glbls being added to load_cache, but that can be handled in a follow-up.

@dmadisetti dmadisetti merged commit 6c7da04 into main Jun 24, 2026
45 checks passed
@dmadisetti dmadisetti deleted the dm/save-stubs branch June 24, 2026 17:09
@github-actions

Contributor

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.11-dev22

Labels

enhancement New feature or request

3 participants