feat(save): stub serialization toolkit (class/lazy/module stubs)#9896
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
5 issues found and verified against the latest diff
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
c2b8ae6 to
7c9f48e
Compare
There was a problem hiding this comment.
Pull request overview
This PR expands marimo’s cache “stubbing/serialization toolkit” to better persist and restore tricky Python objects (modules, cell-defined classes) and adds infrastructure for resolving __main__ pickle references and serializing torch tensors via .pt.
Changes:
- Added
ClassStub(source-based class serialization) andMissingModule/ModuleStubimprovements (name-based module restoration with lazy placeholder on missing modules). - Extended lazy-cache manifest schema to support
import_ref(inline importable references) andclass_def(inline class source), plus added torch.ptblob support. - Introduced
CellNamespaceUnpickler+pickle_load_with_namespacehelper and threaded an optionalglblsnamespace parameter through loader APIs; added tests covering these behaviors.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_save/stubs/test_unhashable_stub.py | Adds unit tests for UnhashableStub semantics and Cache.restore handling. |
| tests/_save/stubs/test_module_stub.py | Adds tests for module-name-based restore and missing-module placeholder behavior. |
| tests/_save/stubs/test_class_stub.py | Adds tests for capturing/loading class source and filename/linecache behaviors. |
| tests/_save/loaders/test_unpickler.py | Tests the new CellNamespaceUnpickler behavior for __main__ refs. |
| tests/_save/loaders/test_loader.py | Tests inline storage of importable references (no blob files). |
| tests/_save/loaders/test_class_roundtrip.py | End-to-end tests for class stubbing across pickle/lazy loaders. |
| tests/_save/loaders/mocks.py | Updates loader mock signature to accept glbls. |
| marimo/_save/stubs/module_stub.py | Implements MissingModule placeholder and improves ModuleStub.load() error handling. |
| marimo/_save/stubs/lazy_stub.py | Adds manifest fields, torch .pt serializer/deserializer, and UnhashableStub. |
| marimo/_save/stubs/class_stub.py | Implements ClassStub source capture + exec-based restore. |
| marimo/_save/stubs/init.py | Exports ClassStub. |
| marimo/_save/loaders/unpickler.py | Adds CellNamespaceUnpickler and pickle_load_with_namespace. |
| marimo/_save/loaders/memory.py | Extends load_cache() signature with optional glbls. |
| marimo/_save/loaders/loader.py | Threads optional glbls through cache_attempt()/load_cache() interface. |
| marimo/_save/loaders/lazy.py | Adds import_ref/class_def manifest support and adjusts blob writing logic. |
| marimo/_save/hash.py | Avoids KeyError when pinned module isn’t present in sys.modules; passes scope as glbls. |
| marimo/_save/cache.py | Restores ClassStub into scope; preserves UnhashableStub; captures classes/functions selectively. |
| marimo/_runtime/exceptions.py | Adds MarimoCancelCellError and MarimoUnhashableCacheError for cache tripwire flow. |
Comments suppressed due to low confidence (1)
marimo/_save/loaders/lazy.py:205
LazyLoader.load_cache()acceptsglblsbut immediately discards it. Combined withrestore_cache()deserializing.pickleblobs eagerly viapickle.loads, this means cached objects that reference__main__(e.g. instances of cell-defined classes) still cannot be restored from lazy caches, and the newCellNamespaceUnpickler/pickle_load_with_namespacehelper is never used. To support class-instance restoration, you’ll need to either (a) threadglblsthrough to pickle deserialization (usingpickle_load_with_namespace) and ensureClassStubs are materialized intoglblsbefore unpickling dependent blobs, or (b) defer.pickleblob deserialization toCache.restore(scope)once stubs have been loaded intoscope.
def load_cache(
self,
key: HashKey,
glbls: dict[str, Any] | None = None,
) -> Cache | None:
del glbls
try:
blob: bytes | None = self.store.get(str(self.build_path(key)))
if not blob:
return None
return self.restore_cache(key, blob)
| """Marker + tripwire for a def that could not be serialized for caching. | ||
|
|
||
| Written to the cache as a placeholder when per-def pickling fails (e.g. | ||
| a lambda, a closure over an unpicklable object). The marker is placed | ||
| in scope as-is by `Cache.restore` (no `.load()` call). It is | ||
| harmless when the consumer cell never touches it; any meaningful access | ||
| (call) raises `MarimoUnhashableCacheError` carrying | ||
| `variables=[var_name]` so the runner can identify the defining cell, | ||
| invalidate its manifest, and re-queue. | ||
|
|
||
| Detection happens at use-site, not at pre-execution. Bodies that don't | ||
| touch the stub run normally; closure-captured stubs surface through | ||
| whichever access the user code performs. | ||
|
|
||
| UnhashableStub is created on-demand by the loader and is not | ||
| registered in CUSTOM_STUBS — `get_type()` raises since no specific | ||
| value type maps to it. | ||
|
|
There was a problem hiding this comment.
Put in place for following up.
There was a problem hiding this comment.
1 issue found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_save/stubs/lazy_stub.py">
<violation number="1">
P1: Removed runtime `micropip` fallback for missing `pyarrow` in pyodide during Arrow cache load. `DependencyManager.pyarrow.require()` only validates presence and raises `ModuleNotFoundError`; it does not install. In pyodide/browser environments where pyarrow may not be pre-installed, Arrow cache deserialization will now crash instead of self-healing via `micropip.install("pyarrow")`.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| @@ -17,6 +17,8 @@ | |||
| if TYPE_CHECKING: | |||
There was a problem hiding this comment.
P1: Removed runtime micropip fallback for missing pyarrow in pyodide during Arrow cache load. DependencyManager.pyarrow.require() only validates presence and raises ModuleNotFoundError; it does not install. In pyodide/browser environments where pyarrow may not be pre-installed, Arrow cache deserialization will now crash instead of self-healing via micropip.install("pyarrow").
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At marimo/_save/stubs/lazy_stub.py, line 150:
<comment>Removed runtime `micropip` fallback for missing `pyarrow` in pyodide during Arrow cache load. `DependencyManager.pyarrow.require()` only validates presence and raises `ModuleNotFoundError`; it does not install. In pyodide/browser environments where pyarrow may not be pre-installed, Arrow cache deserialization will now crash instead of self-healing via `micropip.install("pyarrow")`.</comment>
<file context>
@@ -147,21 +147,7 @@ def _arrow_load(data: bytes, type_hint: str | None = None) -> Any:
- DependencyManager.pyarrow.require(
- "to load cached Arrow IPC blobs."
- )
+ DependencyManager.pyarrow.require("to load cached Arrow IPC blobs.")
import pyarrow as pa
</file context>
| def load_cache( | ||
| self, | ||
| key: HashKey, | ||
| glbls: dict[str, Any] | None = None, |
There was a problem hiding this comment.
it looks like all implementations just delete glbls. it might be in a followup PR, so fine if you want to just make a todo to remove this afterwards.
mscolnick
left a comment
There was a problem hiding this comment.
potentially dead from glbls added to load_cache, but that can be handled in a followup
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.11-dev22 |
Expands the current stubbing mechanism for larger coverage for various python instances (classes, lambdas, modules, and module imports)
Introduces:
Additionally:
.ptloader