perf(table): lazy download-size RPC + first-page extrapolation #9691

Merged: kirangadhave merged 4 commits into main from kg/table-size-rpc on May 27, 2026

Conversation

kirangadhave commented May 26, 2026

Member

Summary

Stop blocking the polars/pandas DataFrame render on a full-table JSON serialization. The size measurement is needed only when a host sets downloadSizeLimitAtom, and even then it is now cheap.

Two layered fixes:

  • New get_size_bytes RPC on mo.ui.table and mo.ui.dataframe. Standalone marimo never calls it; hosts call it only when their policy atom is set (vscode sets it currently).
  • The RPC serializes a 100-row head sample and scales by total_rows / sample_size. Returns in single-digit ms even on million-row frames. Lives as TableManager.estimate_size_bytes so both UI widgets share one implementation.

For the regression's 1M-row repro, table(df)._mime_() drops from ~6.3 s to ~ms-scale, matching polars' own _repr_html_(). The new RPC also resolves in ~ms because it only sanitizes 100 rows.
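
The head-sample extrapolation described above can be sketched roughly as follows. This is a minimal illustration on a plain list of dicts, not marimo's actual `TableManager.estimate_size_bytes`; the function name `estimate_json_size_bytes` and the row data are assumptions made for the example.

```python
import json
import math

SAMPLE_ROWS = 100  # mirrors the 100-row head sample mentioned in the summary


def estimate_json_size_bytes(rows, sample_rows=SAMPLE_ROWS):
    """Estimate the JSON-serialized size of `rows` from a head sample.

    Hypothetical helper: serializes only the first `sample_rows` rows and
    scales the byte count by total_rows / sample_size.
    """
    total = len(rows)
    if total == 0:
        return 0
    sample = rows[:sample_rows]
    sample_bytes = len(json.dumps(sample, ensure_ascii=True).encode("utf-8"))
    if total <= len(sample):
        # The sample is the whole table, so the figure is exact.
        return sample_bytes
    return math.ceil(sample_bytes * total / len(sample))


# Illustrative data only: 100k small rows.
rows = [{"id": i, "name": f"row-{i}"} for i in range(100_000)]
estimate = estimate_json_size_bytes(rows)
actual = len(json.dumps(rows).encode("utf-8"))
print(estimate, actual)
```

In this toy data the head rows carry shorter ids than the rest of the table, so the estimate comes in below the actual size. A head sample is cheap, but it is only as representative as the first rows are.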

Fixes #9688


@kirangadhave kirangadhave requested a review from dmadisetti May 26, 2026 19:22
@kirangadhave kirangadhave marked this pull request as ready for review May 26, 2026 19:22
Copilot AI review requested due to automatic review settings May 26, 2026 19:22
@kirangadhave kirangadhave added the bug Something isn't working label May 26, 2026
"columns": self._get_column_types(),
"dataframe-name": dataframe_name,
"total": rows,
"size-bytes": self._get_json_size_bytes(self._manager),

Member

Was this the blocking bit?

Member Author

Yup. This used to JSON-stringify the entire table.

Copilot AI left a comment

Contributor

Pull request overview

This PR improves table/dataframe rendering performance by removing the eager full-table JSON size computation from initial render and replacing it with a lazy get_size_bytes RPC that estimates serialized size by extrapolating from a small head sample (used only when a host sets downloadSizeLimitAtom).

Changes:

  • Add TableManager.estimate_size_bytes() and new get_size_bytes RPCs for mo.ui.table and mo.ui.dataframe.
  • Remove eager size-bytes plumbing from initial component args / search responses and fetch size lazily on the frontend when a download policy is active.
  • Update frontend export UX to reflect the “checking download size” state and add/adjust tests accordingly.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| marimo/_plugins/ui/_impl/tables/table_manager.py | Adds sample-based size estimation helper used by both widgets |
| marimo/_plugins/ui/_impl/table.py | Adds get_size_bytes RPC; removes eager size computation from args/search |
| marimo/_plugins/ui/_impl/dataframes/dataframe.py | Adds get_size_bytes RPC; removes eager size plumbing |
| frontend/src/plugins/impl/DataTablePlugin.tsx | Adds lazy size fetch tied to download policy; threads size into table UI |
| frontend/src/plugins/impl/data-frames/DataFramePlugin.tsx | Wires get_size_bytes through dataframe plugin into table UI |
| frontend/src/components/data-table/export-actions.tsx | Disables export while size is being checked; improves tooltip messaging |
| frontend/src/components/data-table/TableTopBar.tsx | Passes loading state into export menu |
| frontend/src/components/data-table/data-table.tsx | Plumbs sizeBytesIsLoading through to top bar/export menu |
| frontend/src/stories/dataframe.stories.tsx | Updates story to provide get_size_bytes |
| frontend/src/plugins/impl/tests/DataTablePlugin.test.tsx | Mocks get_size_bytes in tests |
| tests/_plugins/ui/_impl/test_table.py | Replaces eager size tests with RPC estimation tests |
| tests/_plugins/ui/_impl/dataframes/test_dataframe.py | Adds dataframe RPC estimation test |

Comment thread marimo/_plugins/ui/_impl/tables/table_manager.py Outdated
Comment thread marimo/_plugins/ui/_impl/tables/table_manager.py Outdated
Comment thread tests/_plugins/ui/_impl/test_table.py
Comment thread tests/_plugins/ui/_impl/dataframes/test_dataframe.py
Comment thread frontend/src/plugins/impl/DataTablePlugin.tsx

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

3 issues found across 12 files

Comment thread frontend/src/plugins/impl/DataTablePlugin.tsx Outdated
Comment thread marimo/_plugins/ui/_impl/dataframes/dataframe.py Outdated
Comment thread frontend/src/components/data-table/TableTopBar.tsx

# Rows used when estimating the JSON-serialized size of the rendered data.
# Bigger samples are more precise but cost more on the kernel control loop.
SIZE_ESTIMATE_SAMPLE_ROWS = 100

Member

Could still be large depending on the entry. Logic looks reasonable, but wonder if it would still be better to just use narwhals: https://narwhals-dev.github.io/narwhals/api-reference/dataframe/?h=bytes#narwhals.dataframe.DataFrame.estimated_size

@kirangadhave kirangadhave May 26, 2026

Member Author

estimated_size reports the size on the heap. We want the size of the JSON string that gets encoded into the data URI, since that is the part that crashes vscode.

For example, null values add almost nothing to the estimated_size calculation (it uses pyarrow), but once they are stringified to None they start adding up.
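
To make the heap-versus-JSON gap concrete, here is a small stdlib-only sketch for an all-null column. It assumes an Arrow-style layout in which validity is tracked at one bit per value; the real in-memory footprint depends on the backend and dtype, so treat the bitmap figure as a rough lower-end illustration, not a measurement of pyarrow or narwhals.

```python
import json
import math

n = 100_000  # illustrative row count

# Assumption: one validity bit per value, as in Arrow-style columnar layouts.
validity_bitmap_bytes = math.ceil(n / 8)

# JSON spells out every missing value as a literal token plus a separator.
json_bytes = len(json.dumps([None] * n).encode("utf-8"))

print(validity_bitmap_bytes, json_bytes)
```

Under this assumption the serialized form is dozens of times larger than the validity bitmap, which is why a heap-size estimate is a poor proxy for the size of the download payload.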

Member

Seems like a good proxy then

Member Author

Yeah, I hope this goes away once we implement a virtual file system in marimo-lsp and stream the files. I have a ticket to track the work, but it is low priority for now.

Drop format_mapping from estimate_size_bytes. Downloads serialize raw values, so applying formatters made the estimate diverge from actual download size and could mis-classify against the frontend size limit. Hardcode strict_json=True and ensure_ascii=True so the estimate models the JSON download path (upper-bound proxy across formats), and round up the extrapolated size with math.ceil to keep the gate conservative.
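
A short sketch of why `ensure_ascii=True` and `math.ceil` both push the estimate toward the conservative side, using only the standard `json` module. It does not model marimo's `strict_json` flag, and the sample values and byte counts are made up for illustration.

```python
import json
import math

values = ["café", "日本語", "🚀"]

# ensure_ascii=True escapes every non-ASCII character as \uXXXX sequences,
# which take at least as many bytes as the raw UTF-8 encoding.
escaped = json.dumps(values, ensure_ascii=True)
raw = json.dumps(values, ensure_ascii=False)
escaped_bytes = len(escaped.encode("utf-8"))
raw_bytes = len(raw.encode("utf-8"))

# Rounding up keeps a fractional extrapolation from slipping under a limit.
sample_bytes, sample_rows, total_rows = 1001, 100, 333  # hypothetical numbers
conservative = math.ceil(sample_bytes * total_rows / sample_rows)
truncated = int(sample_bytes * total_rows / sample_rows)

print(escaped_bytes, raw_bytes, conservative, truncated)
```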

Also skip the get_size_bytes RPC and its loading state when showDownload=false, and forward sizeBytesIsLoading through DataTableComponent so the 'Checking download size' tooltip surfaces.
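
The gating described here lives in the TypeScript frontend; the following Python sketch only models the decision logic. The function names, the state labels, and the strict `>` comparison against the limit are all assumptions for illustration, not the actual frontend code.

```python
from typing import Optional


def should_fetch_size(show_download: bool, size_limit_bytes: Optional[int]) -> bool:
    """Call the size RPC only when downloads are shown and a host limit is set."""
    return show_download and size_limit_bytes is not None


def export_state(
    show_download: bool,
    size_limit_bytes: Optional[int],
    size_bytes: Optional[int],
) -> str:
    """Return a hypothetical UI state for the export control."""
    if not show_download:
        return "hidden"
    if size_limit_bytes is None:
        return "enabled"  # no host policy, so no size check is needed
    if size_bytes is None:
        return "checking"  # RPC still in flight: 'Checking download size'
    # Assumption: exceeding the limit disables the download.
    return "disabled" if size_bytes > size_limit_bytes else "enabled"


print(should_fetch_size(False, 1_000_000))
print(export_state(True, 1_000_000, None))
```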
@kirangadhave kirangadhave merged commit b65bfb4 into main May 27, 2026
44 checks passed
@kirangadhave kirangadhave deleted the kg/table-size-rpc branch May 27, 2026 01:13
@github-actions

Contributor

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.9-dev8

Labels

bug Something isn't working

3 participants