Skip to content

fix(datasets): avoid exponential blow-up of nested struct sample values#9506

Merged
mscolnick merged 1 commit into
mainfrom
kg/polars-nested-struct-perf
May 11, 2026
Merged

fix(datasets): avoid exponential blow-up of nested struct sample values#9506
mscolnick merged 1 commit into
mainfrom
kg/polars-nested-struct-perf

Conversation

@kirangadhave

@kirangadhave kirangadhave commented May 11, 2026

Copy link
Copy Markdown
Member

Summary

NarwhalsTableManager.get_sample_values recursively re-stringified nested list/dict cells, causing each ancestor level to re-escape the children's repr. For deeply nested polars Struct/List columns this scaled ~8× per depth and produced multi-GB strings that hung the browser when the dataframe was registered as a dataset.

Replace the recursion with one json.dumps pass, preserving the Enum.name-at-any-depth contract via a default= callback. Scalar paths are unchanged.

Fixes #9378.

Test plan

  • New tests in tests/_plugins/ui/_impl/tables/test_narwhals.py cover: bounded time/size at nesting depth 8, JSON-shaped output, non-JSON leaf (datetime) embedded in a struct.
  • uv run --group test pytest tests/_plugins/ui/_impl/tables/ — 468 passed.
  • make py-check clean.
  • Manual: register a polars df with depth-8 nested struct via a top-level variable; browser no longer hangs and SQL column hover shows a readable JSON preview.
NarwhalsTableManager.get_sample_values recursively re-stringified nested list/dict cells, causing each ancestor level to re-escape the children's repr. For deeply nested polars Struct/List columns this scaled ~8x per depth and produced multi-GB strings that hung the browser. Replace the recursion with one json.dumps pass, preserving Enum.name handling via a default callback.
@vercel

vercel Bot commented May 11, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 11, 2026 8:17pm

Request Review

@kirangadhave kirangadhave added the bug Something isn't working label May 11, 2026
@kirangadhave kirangadhave requested a review from mscolnick May 11, 2026 20:17
@kirangadhave kirangadhave marked this pull request as ready for review May 11, 2026 20:17

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant UI as Browser/UI
    participant TM as get_sample_values()
    participant json as json.dumps()
    participant Polars as Polars DataFrame
    
    Note over UI,Polars: NEW: Efficient nested struct sampling
    
    UI->>TM: Request sample values for column
    TM->>Polars: Access column data (first 3 rows)
    Polars-->>TM: Return raw values (may contain deep nested structs)
    
    loop For each sampled value
        alt Value is list or dict (nested struct)
            TM->>json: json.dumps(value, default=_json_default)
            Note over TM,json: Single pass serialization<br/>avoids recursive str() calls
            alt Enum encountered in nested structure
                json->>json: _json_default(o) → o.name
                json-->>TM: Encoded JSON string
            else Non-serializable leaf (e.g., datetime)
                json->>json: _json_default(o) → str(o)
                json-->>TM: Encoded JSON string
            else TypeError/ValueError
                TM->>TM: Fallback to str(value)
            end
        else Value is int/float
            TM->>TM: Return as numeric
        else Value is Enum
            TM->>TM: Return .name
        else Other scalar (string, bytes, etc.)
            TM->>TM: Return str(value)
        end
    end
    
    TM-->>UI: Return list[str|int|float] (bounded size, JSON-shaped output)
Loading
@kirangadhave

Copy link
Copy Markdown
Member Author
@cubic-dev-ai

cubic-dev-ai Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

@cubic-dev-ai

@kirangadhave I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant UI as Browser/UI
    participant TM as get_sample_values()
    participant json as json.dumps()
    participant Polars as Polars DataFrame
    
    Note over UI,Polars: NEW: Efficient nested struct sampling
    
    UI->>TM: Request sample values for column
    TM->>Polars: Access column data (first 3 rows)
    Polars-->>TM: Return raw values (may contain deep nested structs)
    
    loop For each sampled value
        alt Value is list or dict (nested struct)
            TM->>json: json.dumps(value, default=_json_default)
            Note over TM,json: Single pass serialization<br/>avoids recursive str() calls
            alt Enum encountered in nested structure
                json->>json: _json_default(o) → o.name
                json-->>TM: Encoded JSON string
            else Non-serializable leaf (e.g., datetime)
                json->>json: _json_default(o) → str(o)
                json-->>TM: Encoded JSON string
            else TypeError/ValueError
                TM->>TM: Fallback to str(value)
            end
        else Value is int/float
            TM->>TM: Return as numeric
        else Value is Enum
            TM->>TM: Return .name
        else Other scalar (string, bytes, etc.)
            TM->>TM: Return str(value)
        end
    end
    
    TM-->>UI: Return list[str|int|float] (bounded size, JSON-shaped output)
Loading
@mscolnick mscolnick merged commit 0eecccd into main May 11, 2026
50 of 51 checks passed
@mscolnick mscolnick deleted the kg/polars-nested-struct-perf branch May 11, 2026 20:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes pathological sample-value serialization for nested list/dict (e.g., Polars Struct/List) columns in NarwhalsTableManager.get_sample_values, preventing exponential string growth that can hang the browser during dataset registration.

Changes:

  • Replace recursive nested list/dict stringification with a single json.dumps(...) pass (with a default= hook to preserve Enum.name at any depth and str(...) fallback).
  • Add regression tests covering bounded runtime/size for deep nesting, JSON-shaped output, and non-JSON leaf handling (e.g., datetime) embedded in a struct.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
marimo/_plugins/ui/_impl/tables/narwhals_table.py Updates nested sample serialization to avoid exponential re-escaping by using json.dumps with an Enum-aware default serializer.
tests/_plugins/ui/_impl/tables/test_narwhals.py Adds regression tests for deep nested struct sampling behavior, including size/time bounds and JSON parsing expectations.
Comment thread tests/_plugins/ui/_impl/tables/test_narwhals.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

3 participants