
feat(sql): support nested schemas for data sources (#9837) #9845

Merged
Light2Dark merged 11 commits into marimo-team:main from psavalle:psavalle/nested-schemas on Jun 22, 2026


@psavalle

Contributor

📝 Summary

This addresses part of #9837:

  • Support nested schemas in the data models and in the UI. Since a Database does not reference tables directly, nesting Schemas seems to be the best fit.

  • Update the pyiceberg engine to correctly fetch nested schemas.
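To make the nesting concrete, here is a minimal TypeScript sketch of the shape described above: a database holds only schemas, and each schema may hold tables plus child schemas. The `DataTable`/`Schema`/`Database` interfaces and the `qualifiedTableNames` helper are hypothetical simplifications for illustration, not marimo's actual definitions.

```typescript
// Hypothetical, simplified shapes for illustration only; marimo's real
// types carry more fields (columns, engine metadata, resolution flags).
interface DataTable {
  name: string;
}

interface Schema {
  name: string; // "" for a schemaless bucket (tables at the namespace root)
  tables: DataTable[];
  schemas: Schema[]; // child namespaces; always empty for flat engines
}

interface Database {
  name: string;
  schemas: Schema[]; // a database only references schemas, never tables
}

// Walk the schema tree depth-first and build dotted, fully-qualified names.
// Empty schema names are skipped so the result never contains "db..table".
function qualifiedTableNames(db: Database): string[] {
  const names: string[] = [];
  const walk = (schema: Schema, parentPath: string[]): void => {
    const path = schema.name === "" ? parentPath : [...parentPath, schema.name];
    for (const table of schema.tables) {
      names.push([db.name, ...path, table.name].join("."));
    }
    for (const child of schema.schemas) {
      walk(child, path);
    }
  };
  for (const schema of db.schemas) {
    walk(schema, []);
  }
  return names;
}

// An Iceberg-style catalog: top-level namespace "top" with "nested.deep" below it.
const top: Database = {
  name: "top",
  schemas: [
    { name: "", tables: [{ name: "table1" }], schemas: [] },
    {
      name: "nested",
      tables: [{ name: "table4" }],
      schemas: [{ name: "deep", tables: [{ name: "t" }], schemas: [] }],
    },
  ],
};

const names = qualifiedTableNames(top);
// names: ["top.table1", "top.nested.table4", "top.nested.deep.t"]
```

Flat engines such as DuckDB would leave every child `schemas` array empty, so the same walk yields the familiar `db.schema.table` names.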

This does not yet fix #9837 for other engines (such as Ibis + Spark); I'd suggest tackling that as a follow-up.

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.
@vercel

vercel Bot commented Jun 10, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marimo-docs | Ready | Preview, Comment | Jun 18, 2026 3:14pm |


@github-actions github-actions Bot added the bash-focus (Area to focus on during release bug bash) label Jun 10, 2026
@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@psavalle

Contributor Author

I have read the CLA Document and I hereby sign the CLA

@psavalle

Contributor Author

@Light2Dark would you be able to take a look? Thank you!

@mscolnick

Contributor
@cubic-dev-ai

cubic-dev-ai Bot commented Jun 10, 2026

Contributor


@mscolnick I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


2 issues found across 20 files

Architecture diagram
```mermaid
sequenceDiagram
    participant Client as DataSources UI
    participant FrontendStore as Jotai Store (DataSourceState)
    participant Backend as Python Backend (Kernel)
    participant EngineCatalog as EngineCatalog (PyIceberg)
    participant DataModel as Data Models (Schema/Database)

    Note over Client,DataModel: Nested Schema Discovery (Iceberg flow)

    Client->>FrontendStore: Mount DataSources panel
    FrontendStore->>Backend: list-sql-connections

    Backend-->>FrontendStore: DataSourceConnection (databases)

    Note over Client,FrontendStore: User expands "top" database

    Client->>FrontendStore: Expand database "top"
    alt schema_list not resolved
        FrontendStore->>Backend: list-sql-schemas (database="top", schemaPath=[])
        Backend->>EngineCatalog: get_schemas(database="top", include_tables=False)
        EngineCatalog->>EngineCatalog: List top-level sub-namespaces
        alt engine supports nested schemas (PyIceberg)
            EngineCatalog-->>Backend: Schemas: ["", "nested"] (tables deferred)
        else flat engine (DuckDB)
            EngineCatalog-->>Backend: Schemas: ["public", ...]
        end
        Backend-->>FrontendStore: SQLSchemaListPreviewNotification (schemas=[...])
        FrontendStore->>FrontendStore: updateSchemaList(schemaPath)
    end
    FrontendStore-->>Client: Render schemas (SchemaNode)

    Note over Client,DataModel: User expands nested schema "nested"

    Client->>FrontendStore: Expand nested schema "nested"
    alt child schemas not resolved
        FrontendStore->>Backend: list-sql-schemas (database="top", schemaPath=["nested"])
        Backend->>EngineCatalog: get_child_schemas(database="top", schemaPath=["nested"])
        EngineCatalog-->>Backend: Schemas: ["deep"] (deferred)
        Backend-->>FrontendStore: SQLSchemaListPreviewNotification (schemas=[...])
        FrontendStore->>FrontendStore: updateSchemaList(schemaPath=["nested"])
    end
    FrontendStore-->>Client: Render child schemas recursively

    Note over Client,DataModel: Table preview from nested namespace

    Client->>FrontendStore: Request table details for "top.nested.table4"
    FrontendStore->>Backend: preview-sql-table (database="top", schema="", schemaPath=["nested"], tableName="table4")
    Backend->>EngineCatalog: get_table_details(table="table4", database="top.nested")
    alt engine supports_nested_schemas
        EngineCatalog->>EngineCatalog: Fold database + schema_path: "top" + ["nested"] = "top.nested"
    end
    EngineCatalog-->>Backend: DataTable with columns
    Backend-->>FrontendStore: SQLTablePreviewNotification
    FrontendStore-->>Client: Render table preview

    Note over Client,DataModel: Filter empty schemas (UI helper)

    Client->>Client: filterEmptySchemas(recursive)
    alt schema has no tables and no visible child schemas
        Client->>Client: Remove schema from tree
    else schema has resolved-empty children
        Client->>Client: Prune empty children, keep parent if contains tables
    end
    Client-->>Client: Render filtered tree

    Note over Client,DataModel: allTablesAtom enumerates nested tables

    Client->>FrontendStore: Read allTablesAtom
    FrontendStore->>FrontendStore: Walk database.schemas recursively
    loop For each schema (including child schemas)
        FrontendStore->>FrontendStore: Build qualified name: db.schema_path.table
        alt schema_path non-empty
            FrontendStore->>FrontendStore: segments = [...schemaPath, child.name]
        else schemaless schema
            FrontendStore->>FrontendStore: segments = []
        end
        FrontendStore->>FrontendStore: Add to table map with qualified name
    end
    FrontendStore-->>Client: Map of all discoverable tables

    Note over Backend,DataModel: Backend helpers for nested path updates

    Backend->>DataModel: updateSchemaAtPath(db.schemas, schemaPath, update)
    loop For each segment in schemaPath
        DataModel->>DataModel: Descend into matching Schema.schemas
    end
    DataModel-->>Backend: Updated Schema with children
```
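The "fold" step in the table-preview flow can be illustrated with a short TypeScript sketch: for engines that support nested schemas, the top-level database and the schema path are joined into one dotted namespace (`"top"` + `["nested"]` becomes `"top.nested"`). `foldTableDatabase` is a hypothetical name; the PR implements this rule in the Python runtime (tested as `_table_database` folding), and this sketch only restates the rule shown in the diagram.

```typescript
// Hypothetical stand-in for the backend's folding step (the real logic is
// Python in marimo's runtime); shown here only to illustrate the rule.
function foldTableDatabase(
  database: string,
  schemaPath: readonly string[],
  supportsNestedSchemas: boolean,
): string {
  // Flat engines, or tables directly under the top-level namespace,
  // keep the database name unchanged.
  if (!supportsNestedSchemas || schemaPath.length === 0) {
    return database;
  }
  // Nested engines address a namespace by joining every level with ".".
  return [database, ...schemaPath].join(".");
}

const nested = foldTableDatabase("top", ["nested"], true); // "top.nested"
const deeper = foldTableDatabase("top", ["nested", "deep"], true); // "top.nested.deep"
const atRoot = foldTableDatabase("top", [], true); // "top"
const flat = foldTableDatabase("memory", ["ignored"], false); // "memory"
```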

Reply with feedback, questions, or to request a fix.


Comment thread marimo/_sql/engines/types.py Outdated
Comment thread frontend/src/core/datasets/__tests__/data-source.test.ts Outdated
Comment thread marimo/_messaging/notification.py Outdated
Comment thread marimo/_data/models.py Outdated
Comment thread marimo/_runtime/callbacks/datasets.py Outdated

Copilot AI left a comment

Contributor


Pull request overview

This PR adds end-to-end support for hierarchical (nested) schemas/namespaces in SQL data source browsing, with a focus on Iceberg catalogs (pyiceberg) and the UI’s datasource tree expansion model.

Changes:

  • Extend shared data models + OpenAPI/request/notification payloads to carry schema_path and allow nested Schema trees.
  • Implement nested-namespace discovery in PyIcebergEngine (top-level namespaces as Database, sub-namespaces as recursive Schemas) with lazy expansion via get_child_schemas.
  • Update runtime handlers and frontend state/UI to request, store, render, and update nested schema/table nodes; add/adjust tests across backend + frontend.
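Updating a node at a nested path, as the reducer utilities must do when a lazily fetched schema or table list arrives, amounts to a path-directed, copy-on-write descent. The sketch below assumes a simplified `Schema` shape; `updateSchemaAtPath` borrows the helper name from cubic's architecture diagram, but the body is an illustrative guess rather than the code in `data-source-connections.ts`.

```typescript
// Hypothetical, simplified schema node; real nodes carry more metadata.
interface Schema {
  name: string;
  tables: string[];
  schemas: Schema[];
}

// Return a new schema list in which the node addressed by `path` (schema
// names, outermost first) is replaced by `update(node)`. Branches off the
// path are reused as-is, so unchanged subtrees keep reference equality.
function updateSchemaAtPath(
  schemas: readonly Schema[],
  path: readonly string[],
  update: (schema: Schema) => Schema,
): Schema[] {
  if (path.length === 0) {
    // An empty path addresses no node; return a shallow copy.
    return [...schemas];
  }
  const [head, ...rest] = path;
  return schemas.map((schema) => {
    if (schema.name !== head) {
      return schema;
    }
    if (rest.length === 0) {
      return update(schema);
    }
    return { ...schema, schemas: updateSchemaAtPath(schema.schemas, rest, update) };
  });
}

const before: Schema[] = [
  { name: "", tables: ["table1"], schemas: [] },
  { name: "nested", tables: [], schemas: [{ name: "deep", tables: [], schemas: [] }] },
];

// Simulate a lazy fetch that resolved the tables of "nested.deep".
const after = updateSchemaAtPath(before, ["nested", "deep"], (schema) => ({
  ...schema,
  tables: ["t"],
}));
// after[1].schemas[0].tables is ["t"]; after[0] === before[0]; `before` is untouched.
```

Reusing untouched branches would let React/Jotai-style consumers skip re-rendering subtrees whose object references did not change.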

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `tests/_sql/test_pyiceberg.py` | Adds nested-namespace fixtures and coverage for eager/lazy schema + table discovery. |
| `tests/_sql/test_ibis.py` | Asserts flat backends don’t support nested schemas. |
| `tests/_sql/test_connection_utils.py` | Adds tests for nested schema path lookup and in-place connection updates. |
| `tests/_server/test_sql_request_models.py` | Verifies request decoding preserves `schemaPath` and defaults it for older clients. |
| `tests/_runtime/test_runtime_datasets.py` | Tests nested schema path routing/echo + `_table_database` folding behavior. |
| `packages/openapi/src/api.ts` | Updates generated TS API types to include `schemaPath` and nested `Schema`. |
| `packages/openapi/api.yaml` | Updates OpenAPI spec for `schemaPath` and nested `Schema` fields. |
| `marimo/_sql/engines/types.py` | Adds `supports_nested_schemas` and default `get_child_schemas` API. |
| `marimo/_sql/engines/pyiceberg.py` | Implements nested namespace discovery + lazy expansion for Iceberg catalogs. |
| `marimo/_sql/connection_utils.py` | Supports updating nested schemas/tables in connection trees via `schema_path`. |
| `marimo/_server/models/models.py` | Ensures `schema_path` is preserved in `as_command()` conversions. |
| `marimo/_runtime/commands.py` | Adds `schema_path` to relevant runtime commands with safe defaults. |
| `marimo/_runtime/callbacks/datasets.py` | Routes schema expansion to `get_child_schemas` and folds nested paths for table calls. |
| `marimo/_messaging/notification.py` | Adds `schema_path` to SQL metadata structures used in notifications. |
| `marimo/_data/models.py` | Extends `Schema` to include nested child schemas + resolution flags. |
| `frontend/src/core/datasets/data-source-connections.ts` | Updates state reducer utilities to update schemas/tables at nested paths; recursively enumerates nested tables. |
| `frontend/src/core/datasets/__tests__/data-source.test.ts` | Adds reducer/allTables tests for nested namespace behavior. |
| `frontend/src/components/datasources/datasources.tsx` | Renders nested schemas recursively with lazy fetch on expansion; updates indentation + request payloads. |
| `frontend/src/components/datasources/components.tsx` | Extends Empty/Loading components to accept inline styles for nested indentation. |
| `frontend/src/components/datasources/__tests__/filter-empty.test.ts` | Adds recursive “hide empty” behavior tests for nested schemas. |
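The recursive "hide empty" behavior listed for `filter-empty.test.ts` has to account for nesting: a parent should disappear only when it has no tables and every child was itself pruned. Below is a sketch that assumes a hypothetical `resolved` flag (the real model has "resolution flags" whose exact names are not shown here); schemas that have not been fetched yet are kept, since they may still contain tables.

```typescript
interface Schema {
  name: string;
  resolved: boolean; // hypothetical: tables and child schemas have been fetched
  tables: string[];
  schemas: Schema[];
}

// Recursively drop schemas that are known to be empty. Children are
// filtered first, so a parent whose children were all pruned (and which
// has no tables of its own) is pruned as well.
function filterEmptySchemas(schemas: readonly Schema[]): Schema[] {
  const visible: Schema[] = [];
  for (const schema of schemas) {
    const children = filterEmptySchemas(schema.schemas);
    const knownEmpty =
      schema.resolved && schema.tables.length === 0 && children.length === 0;
    if (!knownEmpty) {
      visible.push({ ...schema, schemas: children });
    }
  }
  return visible;
}

const tree: Schema[] = [
  { name: "empty", resolved: true, tables: [], schemas: [] },
  { name: "lazy", resolved: false, tables: [], schemas: [] }, // not fetched: kept
  {
    name: "parent",
    resolved: true,
    tables: [],
    schemas: [
      { name: "hollow", resolved: true, tables: [], schemas: [] },
      { name: "leaf", resolved: true, tables: ["t"], schemas: [] },
    ],
  },
];

const shown = filterEmptySchemas(tree);
// Top level keeps "lazy" and "parent"; "parent" keeps only "leaf".
```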
Comment thread marimo/_messaging/notification.py
Comment thread marimo/_messaging/notification.py Outdated
Comment thread frontend/src/components/datasources/datasources.tsx
Comment thread frontend/src/components/datasources/datasources.tsx Outdated
Comment thread marimo/_sql/engines/pyiceberg.py
@psavalle

Contributor Author

Thanks for taking a look @Light2Dark, I've made another pass.

@psavalle psavalle marked this pull request as ready for review June 11, 2026 07:08
@codecov

codecov Bot commented Jun 11, 2026


Bundle Report

Changes will increase total bundle size by 8.92kB (0.04%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| marimo-esm | 25.34MB | 8.92kB (0.04%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| `assets/cells-*.js` | 1.61kB | 718.53kB | 0.23% |
| `assets/JsonOutput-*.js` | 101 bytes | 571.86kB | 0.02% |
| `assets/index-*.js` | -7 bytes | 432.99kB | -0.0% |
| `assets/index-*.css` | -204 bytes | 367.98kB | -0.06% |
| `assets/edit-*.js` | 2 bytes | 328.23kB | 0.0% |
| `assets/add-*.js` | 8 bytes | 202.23kB | 0.0% |
| `assets/cell-*.js` | 1.38kB | 186.09kB | 0.75% |
| `assets/file-*.js` | 3.28kB | 51.59kB | 6.8% ⚠️ |
| `assets/panels-*.js` | 4 bytes | 45.84kB | 0.01% |
| `assets/session-*.js` | 2.29kB | 28.37kB | 8.79% ⚠️ |
| `assets/state-*.js` | 129 bytes | 2.98kB | 4.52% |
| `assets/useNotebookActions-*.js` | 3 bytes | 22.9kB | 0.01% |
| `assets/cell-*.css` | 310 bytes | 3.58kB | 9.47% ⚠️ |

Files in assets/cells-*.js:

  • ./src/core/datasets/data-source-connections.ts → Total Size: 7.79kB

  • ./src/components/datasources/utils.ts → Total Size: 3.68kB

Files in assets/JsonOutput-*.js:

  • ./src/components/datasources/components.tsx → Total Size: 4.79kB

Files in assets/session-*.js:

  • ./src/components/datasources/datasources.tsx → Total Size: 42.76kB
Comment thread frontend/src/components/datasources/datasources.tsx Outdated
Comment thread frontend/src/components/datasources/datasources.tsx Outdated
Light2Dark previously approved these changes Jun 11, 2026

@Light2Dark Light2Dark left a comment

Member


This looks good, thank you. I will do a pass to clean up some of the code and data structures.

@Light2Dark Light2Dark merged commit 06b152a into marimo-team:main Jun 22, 2026
30 of 44 checks passed
@github-actions

Contributor

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.11-dev2

Light2Dark added a commit that referenced this pull request Jun 22, 2026
…ws (#9954)

**This pull request was authored by a coding agent.**

## Summary

Follow-up to #9845 (which superseded the closed #9874). No API or
wire-protocol change.

Column previews built the fully-qualified table name by hand:

```ts
fullyQualifiedTableName: sqlTableContext
  ? `${sqlTableContext.database}.${sqlTableContext.schema}.${table.name}`
  : table.name,
```

This is incorrect for the data model that #9845 already shipped:

- **Schemaless engines** (e.g. ClickHouse, where `schema === ""`)
produce a double dot — `db..table`.
- **Nested namespaces** are ignored: `schemaPath` is dropped, so the
wrong name is sent to the backend.

The fix routes the name through the existing `tableUniqueId` helper
(already used by the datasources tree in `datasources.tsx`), which uses
`schemaPath` when present and filters out empty segments. There is no
behavior change for flat engines.
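The difference between the old string template and a segment-based name is easiest to see side by side. `buggyName` reproduces the template quoted above; `qualifiedName` is a hypothetical stand-in for `tableUniqueId` (its real signature is not shown in this PR) that prefers `schemaPath` when present and drops empty segments. The `SQLTableContext` interface is likewise a simplified assumption. The cases cover no context, a flat engine, a schemaless engine, and a nested namespace.

```typescript
// Simplified, hypothetical context shape.
interface SQLTableContext {
  database: string;
  schema: string; // "" for schemaless engines such as ClickHouse
  schemaPath?: string[]; // present for nested namespaces (e.g. Iceberg)
}

// The old hand-built template, reproduced for comparison.
function buggyName(ctx: SQLTableContext | undefined, table: string): string {
  return ctx ? `${ctx.database}.${ctx.schema}.${table}` : table;
}

// Hypothetical stand-in for tableUniqueId: use schemaPath when present,
// otherwise the single schema, then filter out empty segments.
function qualifiedName(ctx: SQLTableContext | undefined, table: string): string {
  if (!ctx) {
    return table;
  }
  const schemaSegments =
    ctx.schemaPath && ctx.schemaPath.length > 0 ? ctx.schemaPath : [ctx.schema];
  return [ctx.database, ...schemaSegments, table].filter((s) => s !== "").join(".");
}

const flatCtx: SQLTableContext = { database: "db", schema: "public" };
const schemalessCtx: SQLTableContext = { database: "db", schema: "" };
const nestedCtx: SQLTableContext = { database: "top", schema: "", schemaPath: ["nested"] };

const cases = {
  noContext: qualifiedName(undefined, "t"), // "t"
  flat: qualifiedName(flatCtx, "t"), // "db.public.t" (same as before)
  schemaless: qualifiedName(schemalessCtx, "t"), // "db.t"
  nested: qualifiedName(nestedCtx, "table4"), // "top.nested.table4"
};
const oldSchemaless = buggyName(schemalessCtx, "t"); // "db..t"
const oldNested = buggyName(nestedCtx, "table4"); // "top..table4"
```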

## Testing

- Adds `column-preview.test.tsx`, which renders `DatasetColumnPreview`
and asserts the `previewDatasetColumn` payload for the no-context, flat,
schemaless, and nested-namespace cases. Verified the schemaless and
nested cases **fail** against the old hand-built string and pass with
the fix, so the regression is guarded at the call site (the existing
`tableUniqueId` unit tests only cover the helper logic).

## Pre-Review Checklist

- [x] Any AI generated code has been reviewed line-by-line by the human
PR author, who stands by it.

## Merge Checklist

- [x] I have read the contributor guidelines.
- [x] Tests have been added for the changes made.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Light2Dark added a commit that referenced this pull request Jun 23, 2026
…s tree (#9955)

## Summary

Follow-up to #9845 (which superseded the closed #9874). No API or
wire-protocol change — purely a frontend UX improvement on the existing
`Database.schemas` model.

Previously, typing any text into the data sources search box ran
`setIsExpanded(hasSearch)` on **every** database and schema, expanding
the entire tree regardless of whether a branch had any matching
tables.

Now a database/schema row auto-expands during search **only when its
already-loaded subtree contains a table whose name matches the query**.
Crucially, deferred (not-yet-fetched) tables and child schemas are
treated as non-matching, so searching never triggers a lazy catalog
fetch — important for large/remote connections (Iceberg, etc.) where
unconditional expansion would cause a fetch storm. Expansion is
re-evaluated on each keystroke.

This is a reimplementation, against the merged `Database.schemas` model,
of the matched-subtree search behavior from the closed #9874.

## Changes

- New helpers in `datasources/utils.ts`:
  - `schemaSubtreeMatchesSearch(schema, query)`: recurses over *resolved*
    tables/child schemas only.
  - `shouldExpandDatabaseForSearch(database, query)`: false when the
    schema list is deferred.
- `datasources.tsx`: `DatabaseItem` and `SchemaNode` now drive
auto-expansion via these helpers, tracking `prevSearchValue` so
expansion follows the query. Removes the now-unused `hasSearch` plumbing
through the tree.
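A minimal sketch of the two helpers' contract, assuming `null` marks a deferred (unfetched) list and that matching is a case-insensitive substring test. Both assumptions are for illustration only, and the bodies are not the code from `datasources/utils.ts`. The property that matters is that deferred data never counts as a match, so searching cannot trigger a lazy fetch.

```typescript
// Hypothetical, simplified shapes: `null` marks a deferred (not yet
// fetched) list. marimo's real model may represent deferral differently.
interface Schema {
  name: string;
  tables: string[] | null;
  schemas: Schema[] | null;
}

interface Database {
  name: string;
  schemas: Schema[] | null;
}

// True when any already-loaded table in this schema or its loaded
// descendants matches the query. Deferred lists are treated as empty.
function schemaSubtreeMatchesSearch(schema: Schema, query: string): boolean {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return false;
  }
  const tables: string[] = schema.tables ?? [];
  if (tables.some((t) => t.toLowerCase().includes(needle))) {
    return true;
  }
  const children: Schema[] = schema.schemas ?? [];
  return children.some((child) => schemaSubtreeMatchesSearch(child, needle));
}

// A database auto-expands only if its schema list is loaded and some
// schema subtree matches.
function shouldExpandDatabaseForSearch(database: Database, query: string): boolean {
  if (database.schemas === null) {
    return false;
  }
  return database.schemas.some((schema) => schemaSubtreeMatchesSearch(schema, query));
}

const db: Database = {
  name: "top",
  schemas: [
    { name: "", tables: ["orders"], schemas: [] },
    { name: "nested", tables: null, schemas: null }, // never expanded yet
  ],
};

const matchesLoaded = shouldExpandDatabaseForSearch(db, "ORD"); // true
const matchesDeferred = shouldExpandDatabaseForSearch(db, "table4"); // false
const deferredDb = shouldExpandDatabaseForSearch({ name: "remote", schemas: null }, "ord"); // false
```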


https://github.com/user-attachments/assets/c58cb882-aa88-4b74-9508-ebf8bb6c1dee

## Testing

- New unit tests for both helpers, including the deferred-bucket cases
(search must not match unfetched data).
- `pnpm test src/components/datasources/` — 55 tests pass.

## Pre-Review Checklist

- [x] Any AI generated code has been reviewed line-by-line by the human
PR author, who stands by it.
- [x] Video or media evidence is provided for any visual changes
(optional).

## Merge Checklist

- [x] I have read the contributor guidelines.
- [x] Tests have been added for the changes made.


---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Labels

bash-focus (Area to focus on during release bug bash), enhancement (New feature or request)

4 participants