perf(table): lazy download-size RPC + first-page extrapolation #9691
| "columns": self._get_column_types(), | ||
| "dataframe-name": dataframe_name, | ||
| "total": rows, | ||
| "size-bytes": self._get_json_size_bytes(self._manager), |
yup. This used to json stringify the entire table
Pull request overview
This PR improves table/dataframe rendering performance by removing the eager full-table JSON size computation from initial render and replacing it with a lazy get_size_bytes RPC that estimates serialized size by extrapolating from a small head sample (used only when a host sets downloadSizeLimitAtom).
Changes:
- Add `TableManager.estimate_size_bytes()` and new `get_size_bytes` RPCs for `mo.ui.table` and `mo.ui.dataframe`.
- Remove eager `size-bytes` plumbing from initial component args / search responses and fetch size lazily on the frontend when a download policy is active.
- Update frontend export UX to reflect the “checking download size” state and add/adjust tests accordingly.
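The lazy gating described in the changes above can be sketched roughly as follows. This is a Python illustration only, not the frontend code from this PR: `resolve_size_bytes`, `exceeds_limit`, and their parameters are hypothetical names standing in for the download-policy check, and the real logic lives in the TypeScript plugins.

```python
from typing import Callable, Optional


def resolve_size_bytes(
    download_size_limit: Optional[int],
    show_download: bool,
    get_size_bytes: Callable[[], int],
) -> Optional[int]:
    """Fetch the size estimate only when a host policy actually needs it.

    All names here are hypothetical; they mirror the idea of a size-limit
    policy plus a lazy RPC, not marimo's real API.
    """
    # No download button or no host policy: skip the RPC entirely.
    if not show_download or download_size_limit is None:
        return None
    return get_size_bytes()


def exceeds_limit(size_bytes: Optional[int], limit: Optional[int]) -> bool:
    """Return True only when both values are known and the size is over the limit."""
    if size_bytes is None or limit is None:
        return False
    return size_bytes > limit
```

With this shape, a host that never sets a limit pays nothing for the size check, which is the behavior the PR describes for standalone marimo.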
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| marimo/_plugins/ui/_impl/tables/table_manager.py | Adds sample-based size estimation helper used by both widgets |
| marimo/_plugins/ui/_impl/table.py | Adds get_size_bytes RPC; removes eager size computation from args/search |
| marimo/_plugins/ui/_impl/dataframes/dataframe.py | Adds get_size_bytes RPC; removes eager size plumbing |
| frontend/src/plugins/impl/DataTablePlugin.tsx | Adds lazy size fetch tied to download policy; threads size into table UI |
| frontend/src/plugins/impl/data-frames/DataFramePlugin.tsx | Wires get_size_bytes through dataframe plugin into table UI |
| frontend/src/components/data-table/export-actions.tsx | Disables export while size is being checked; improves tooltip messaging |
| frontend/src/components/data-table/TableTopBar.tsx | Passes loading state into export menu |
| frontend/src/components/data-table/data-table.tsx | Plumbs sizeBytesIsLoading through to top bar/export menu |
| frontend/src/stories/dataframe.stories.tsx | Updates story to provide get_size_bytes |
| frontend/src/plugins/impl/tests/DataTablePlugin.test.tsx | Mocks get_size_bytes in tests |
| tests/_plugins/ui/_impl/test_table.py | Replaces eager size tests with RPC estimation tests |
| tests/_plugins/ui/_impl/dataframes/test_dataframe.py | Adds dataframe RPC estimation test |
3 issues found across 12 files
| # Rows used when estimating the JSON-serialized size of the rendered data. | ||
| # Bigger samples are more precise but cost more on the kernel control loop. | ||
| SIZE_ESTIMATE_SAMPLE_ROWS = 100 |
Could still be large depending on the entry. The logic looks reasonable, but I wonder if it would still be better to just use narwhals: https://narwhals-dev.github.io/narwhals/api-reference/dataframe/?h=bytes#narwhals.dataframe.DataFrame.estimated_size
`estimated_size` gives the size on the heap. We want the size of the JSON string that is encoded in the data URI. That's the part that crashes vscode.
For example, null values add almost nothing to the estimated size (it uses pyarrow), but once they are stringified to None they start adding up in the final size.
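A small sketch of the point about nulls, assuming rows are plain Python dicts and using the standard-library `json` module (which writes `None` as `null`). The `json_bytes` helper is hypothetical and only measures the serialized side; it makes no claim about the heap figure.

```python
import json


def json_bytes(rows):
    """Length in bytes of the ASCII-escaped JSON encoding of `rows` (hypothetical helper)."""
    return len(json.dumps(rows, ensure_ascii=True).encode("ascii"))


n = 1000
# A column that is entirely null still costs bytes once it is written as text:
# each row serializes to '{"value": null}' plus a ', ' separator.
all_null = [{"value": None} for _ in range(n)]
serialized = json_bytes(all_null)
per_row = serialized / n
```

Here every null cell contributes its key, the literal `null`, and punctuation to the string, so the serialized size grows linearly with the row count regardless of how compactly the nulls are stored in memory.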
Seems like a good proxy then
Yeah, I hope this goes away in the future once we implement a virtual file system in marimo-lsp and stream the files. I have a ticket to track the work, but it's low priority for now.
Drop format_mapping from estimate_size_bytes. Downloads serialize raw values, so applying formatters made the estimate diverge from actual download size and could mis-classify against the frontend size limit. Hardcode strict_json=True and ensure_ascii=True so the estimate models the JSON download path (upper-bound proxy across formats), and round up the extrapolated size with math.ceil to keep the gate conservative. Also skip the get_size_bytes RPC and its loading state when showDownload=false, and forward sizeBytesIsLoading through DataTableComponent so the 'Checking download size' tooltip surfaces.
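The estimate described in this commit message might look roughly like the sketch below. It is a minimal illustration, assuming the table is a list of dict rows and using the standard-library `json` module; it is not the actual `TableManager.estimate_size_bytes` implementation, and marimo's own sanitization and `strict_json` handling are not modeled here. Only the sample size of 100 is taken from the diff.

```python
import json
import math

# Sample size taken from the diff in this PR; everything else is illustrative.
SIZE_ESTIMATE_SAMPLE_ROWS = 100


def estimate_size_bytes(rows, sample_rows=SIZE_ESTIMATE_SAMPLE_ROWS):
    """Estimate the JSON-serialized size of `rows` from a head sample."""
    total = len(rows)
    if total == 0:
        return 0
    sample = rows[:sample_rows]
    # ensure_ascii=True escapes non-ASCII characters, so one char == one byte.
    sample_bytes = len(json.dumps(sample, ensure_ascii=True))
    if total <= len(sample):
        # The sample is the whole table, so the measurement is exact.
        return sample_bytes
    # Scale by total_rows / sample_size and round up to stay conservative.
    return math.ceil(sample_bytes * total / len(sample))
```

Rounding up with `math.ceil` means a table that sits right at the limit is treated as over it rather than under it. The estimate is only as good as the head sample: if later rows are much larger than the first ones, the extrapolation will undershoot.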
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.9-dev8
Summary
Stop blocking the polars/pandas DataFrame render on a full-table JSON serialization. The size measurement is needed only when a host sets `downloadSizeLimitAtom`, and even then it is now cheap.

Two layered fixes:
- Lazy `get_size_bytes` RPC on `mo.ui.table` and `mo.ui.dataframe`. Standalone marimo never calls it; hosts call it only when their policy atom is set (vscode sets it currently).
- First-page extrapolation: the estimate serializes a small head sample and scales it by `total_rows / sample_size`. Returns in single-digit ms even on million-row frames. Lives as `TableManager.estimate_size_bytes` so both UI widgets share one implementation.

For the regression's 1M-row repro, `table(df)._mime_()` drops from ~6.3 s to ~ms-scale, matching polars' own `_repr_html_()`. The new RPC also resolves in ~ms because it only sanitizes 100 rows.

Fixes #9688