feat(sql): support nested schemas for data sources (#9837)#9845
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
|
@Light2Dark would you be able to take a look? Thank you! |
|
@mscolnick I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
2 issues found across 20 files
Architecture diagram
sequenceDiagram
participant Client as DataSources UI
participant FrontendStore as Jotai Store (DataSourceState)
participant Backend as Python Backend (Kernel)
participant EngineCatalog as EngineCatalog (PyIceberg)
participant DataModel as Data Models (Schema/Database)
Note over Client,DataModel: Nested Schema Discovery (Iceberg flow)
Client->>FrontendStore: Mount DataSources panel
FrontendStore->>Backend: list-sql-connections
Backend-->>FrontendStore: DataSourceConnection (databases)
Note over Client,FrontendStore: User expands "top" database
Client->>FrontendStore: Expand database "top"
alt schema_list not resolved
FrontendStore->>Backend: list-sql-schemas (database="top", schemaPath=[])
Backend->>EngineCatalog: get_schemas(database="top", include_tables=False)
EngineCatalog->>EngineCatalog: List top-level sub-namespaces
alt engine supports nested schemas (PyIceberg)
EngineCatalog-->>Backend: Schemas: ["", "nested"] (tables deferred)
else flat engine (DuckDB)
EngineCatalog-->>Backend: Schemas: ["public", ...]
end
Backend-->>FrontendStore: SQLSchemaListPreviewNotification (schemas=[...])
FrontendStore->>FrontendStore: updateSchemaList(schemaPath)
end
FrontendStore-->>Client: Render schemas (SchemaNode)
Note over Client,DataModel: User expands nested schema "nested"
Client->>FrontendStore: Expand nested schema "nested"
alt child schemas not resolved
FrontendStore->>Backend: list-sql-schemas (database="top", schemaPath=["nested"])
Backend->>EngineCatalog: get_child_schemas(database="top", schemaPath=["nested"])
EngineCatalog-->>Backend: Schemas: ["deep"] (deferred)
Backend-->>FrontendStore: SQLSchemaListPreviewNotification (schemas=[...])
FrontendStore->>FrontendStore: updateSchemaList(schemaPath=["nested"])
end
FrontendStore-->>Client: Render child schemas recursively
Note over Client,DataModel: Table preview from nested namespace
Client->>FrontendStore: Request table details for "top.nested.table4"
FrontendStore->>Backend: preview-sql-table (database="top", schema="", schemaPath=["nested"], tableName="table4")
Backend->>EngineCatalog: get_table_details(table="table4", database="top.nested")
alt engine supports_nested_schemas
EngineCatalog->>EngineCatalog: Fold database + schema_path: "top" + ["nested"] = "top.nested"
end
EngineCatalog-->>Backend: DataTable with columns
Backend-->>FrontendStore: SQLTablePreviewNotification
FrontendStore-->>Client: Render table preview
Note over Client,DataModel: Filter empty schemas (UI helper)
Client->>Client: filterEmptySchemas(recursive)
alt schema has no tables and no visible child schemas
Client->>Client: Remove schema from tree
else schema has resolved-empty children
Client->>Client: Prune empty children, keep parent if contains tables
end
Client-->>Client: Render filtered tree
Note over Client,DataModel: allTablesAtom enumerates nested tables
Client->>FrontendStore: Read allTablesAtom
FrontendStore->>FrontendStore: Walk database.schemas recursively
loop For each schema (including child schemas)
FrontendStore->>FrontendStore: Build qualified name: db.schema_path.table
alt schema_path non-empty
FrontendStore->>FrontendStore: segments = [...schemaPath, child.name]
else schemaless schema
FrontendStore->>FrontendStore: segments = []
end
FrontendStore->>FrontendStore: Add to table map with qualified name
end
FrontendStore-->>Client: Map of all discoverable tables
Note over Backend,DataModel: Backend helpers for nested path updates
Backend->>DataModel: updateSchemaAtPath(db.schemas, schemaPath, update)
loop For each segment in schemaPath
DataModel->>DataModel: Descend into matching Schema.schemas
end
DataModel-->>Backend: Updated Schema with children
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
There was a problem hiding this comment.
Pull request overview
This PR adds end-to-end support for hierarchical (nested) schemas/namespaces in SQL data source browsing, with a focus on Iceberg catalogs (pyiceberg) and the UI’s datasource tree expansion model.
Changes:
- Extend shared data models + OpenAPI/request/notification payloads to carry
schema_pathand allow nestedSchematrees. - Implement nested-namespace discovery in
PyIcebergEngine(top-level namespaces asDatabase, sub-namespaces as recursiveSchemas) with lazy expansion viaget_child_schemas. - Update runtime handlers and frontend state/UI to request, store, render, and update nested schema/table nodes; add/adjust tests across backend + frontend.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_sql/test_pyiceberg.py | Adds nested-namespace fixtures and coverage for eager/lazy schema + table discovery. |
| tests/_sql/test_ibis.py | Asserts flat backends don’t support nested schemas. |
| tests/_sql/test_connection_utils.py | Adds tests for nested schema path lookup and in-place connection updates. |
| tests/_server/test_sql_request_models.py | Verifies request decoding preserves schemaPath and defaults it for older clients. |
| tests/_runtime/test_runtime_datasets.py | Tests nested schema path routing/echo + _table_database folding behavior. |
| packages/openapi/src/api.ts | Updates generated TS API types to include schemaPath and nested Schema. |
| packages/openapi/api.yaml | Updates OpenAPI spec for schemaPath and nested Schema fields. |
| marimo/_sql/engines/types.py | Adds supports_nested_schemas and default get_child_schemas API. |
| marimo/_sql/engines/pyiceberg.py | Implements nested namespace discovery + lazy expansion for Iceberg catalogs. |
| marimo/_sql/connection_utils.py | Supports updating nested schemas/tables in connection trees via schema_path. |
| marimo/_server/models/models.py | Ensures schema_path is preserved in as_command() conversions. |
| marimo/_runtime/commands.py | Adds schema_path to relevant runtime commands with safe defaults. |
| marimo/_runtime/callbacks/datasets.py | Routes schema expansion to get_child_schemas and folds nested paths for table calls. |
| marimo/_messaging/notification.py | Adds schema_path to SQL metadata structures used in notifications. |
| marimo/_data/models.py | Extends Schema to include nested child schemas + resolution flags. |
| frontend/src/core/datasets/data-source-connections.ts | Updates state reducer utilities to update schemas/tables at nested paths; recursively enumerates nested tables. |
| frontend/src/core/datasets/tests/data-source.test.ts | Adds reducer/allTables tests for nested namespace behavior. |
| frontend/src/components/datasources/datasources.tsx | Renders nested schemas recursively with lazy fetch on expansion; updates indentation + request payloads. |
| frontend/src/components/datasources/components.tsx | Extends Empty/Loading components to accept inline styles for nested indentation. |
| frontend/src/components/datasources/tests/filter-empty.test.ts | Adds recursive “hide empty” behavior tests for nested schemas. |
|
Thanks for taking a look @Light2Dark, I've made another pass. |
Bundle ReportChanges will increase total bundle size by 8.92kB (0.04%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: marimo-esmAssets Changed:
Files in
Files in
Files in
|
Light2Dark
left a comment
There was a problem hiding this comment.
This looks good, thank you. I will do a pass to clean up some code/data structure.
2417f54 to
26a28c2
Compare
26a28c2 to
7a44881
Compare
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.11-dev2 |
…ws (#9954) **This pull request was authored by a coding agent.** ## Summary Follow-up to #9845 (which superseded the closed #9874). No API or wire-protocol change. Column previews built the fully-qualified table name by hand: ```ts fullyQualifiedTableName: sqlTableContext ? `${sqlTableContext.database}.${sqlTableContext.schema}.${table.name}` : table.name, ``` This is incorrect on the data model #9845 already shipped: - **Schemaless engines** (e.g. ClickHouse, where `schema === ""`) produce a double dot — `db..table`. - **Nested namespaces** are ignored: `schemaPath` is dropped, so the wrong name is sent to the backend. The fix routes the name through the existing `tableUniqueId` helper (already used by the datasources tree in `datasources.tsx`), which uses `schemaPath` when present and filters out empty segments. There is no behavior change for flat engines. ## Testing - Adds `column-preview.test.tsx`, which renders `DatasetColumnPreview` and asserts the `previewDatasetColumn` payload for the no-context, flat, schemaless, and nested-namespace cases. Verified the schemaless and nested cases **fail** against the old hand-built string and pass with the fix, so the regression is guarded at the call site (the existing `tableUniqueId` unit tests only cover the helper logic). ## Pre-Review Checklist - [x] Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it. ## Merge Checklist - [x] I have read the contributor guidelines. - [x] Tests have been added for the changes made. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…s tree (#9955) ## Summary Follow-up to #9845 (which superseded the closed #9874). No API or wire-protocol change — purely a frontend UX improvement on the existing `Database.schemas` model. Previously, typing any text into the data sources search box ran `setIsExpanded(hasSearch)` on **every** database and schema, expanding the entire tree open regardless of whether a branch had any matching tables. Now a database/schema row auto-expands during search **only when its already-loaded subtree contains a table whose name matches the query**. Crucially, deferred (not-yet-fetched) tables and child schemas are treated as non-matching, so searching never triggers a lazy catalog fetch — important for large/remote connections (Iceberg, etc.) where unconditional expansion would cause a fetch storm. Expansion is re-evaluated on each keystroke. This is a reimplementation, against the merged `Database.schemas` model, of the matched-subtree search behavior from the closed #9874. ## Changes - New helpers in `datasources/utils.ts`: - `schemaSubtreeMatchesSearch(schema, query)` — recurses over *resolved* tables/child schemas only. - `shouldExpandDatabaseForSearch(database, query)` — false when the schema list is deferred. - `datasources.tsx`: `DatabaseItem` and `SchemaNode` now drive auto-expansion via these helpers, tracking `prevSearchValue` so expansion follows the query. Removes the now-unused `hasSearch` plumbing through the tree. https://github.com/user-attachments/assets/c58cb882-aa88-4b74-9508-ebf8bb6c1dee ## Testing - New unit tests for both helpers, including the deferred-bucket cases (search must not match unfetched data). - `pnpm test src/components/datasources/` — 55 tests pass. ## Pre-Review Checklist - [x] Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it. - [x] Video or media evidence is provided for any visual changes (optional). ## Merge Checklist - [x] I have read the contributor guidelines. - [x] Tests have been added for the changes made. <!-- This is an auto-generated description by cubic. --> <a href="https://cubic.dev/pr/marimo-team/marimo/pull/9955?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. --> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
📝 Summary
This addresses part of #9837:
Support nested
Schemas in the data models and in the UI. SinceDatabasehas no tables referenced directly, nesting schemas seems to be the best fit.Update the
pyicebergengine to correctly fetch nested schemas.This does not yet fix #9837 for other engines (like Ibis+Spark), I'd suggest to tackle that as a follow up.
📋 Pre-Review Checklist
✅ Merge Checklist