gate schema discovery behind "auto" for remote engines#9784
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a performance issue in the datasources panel for remote SQL backends (notably Snowflake) by ensuring schema discovery is no longer unconditionally enabled for the Ibis and ADBC engines. It aligns discovery behavior across engines by centralizing the default inference config and the "auto" discovery resolution heuristic, matching the existing intent of the SQLAlchemy engine.
Changes:
- Switch Ibis and ADBC engines from
auto_discover_schemas=Trueto a shareddefault_inference_config()where schemas/tables are gated by"auto". - Centralize
"auto"resolution and cheap-dialect detection viaEngineCatalog._resolve_should_auto_discover()/_is_cheap_discovery()andis_cheap_dialect(). - Add regression tests asserting SQLAlchemy/Ibis/ADBC engines use the shared default inference config (and that schemas default to
"auto").
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
marimo/_sql/utils.py |
Adds is_cheap_dialect() helper to centralize the “cheap discovery” dialect heuristic. |
marimo/_sql/engines/types.py |
Introduces default_inference_config() and centralizes "auto" discovery resolution in the base engine catalog type. |
marimo/_sql/engines/sqlalchemy.py |
Switches SQLAlchemy engine to use default_inference_config() and removes duplicated "auto" resolution logic. |
marimo/_sql/engines/ibis.py |
Fixes Ibis schema discovery default by using default_inference_config() (prevents eager schema scanning on remote backends). |
marimo/_sql/engines/adbc.py |
Fixes ADBC schema discovery default by using default_inference_config(); uses shared cheap-dialect helper. |
marimo/_sql/engines/redshift.py |
Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution. |
marimo/_sql/engines/pyiceberg.py |
Removes duplicated "auto" resolver override; relies on base resolver + _is_cheap_discovery() override. |
marimo/_sql/engines/clickhouse.py |
Updates engine override to implement _is_cheap_discovery() instead of duplicating "auto" resolution. |
tests/_sql/test_sqlalchemy.py |
Adds test asserting SQLAlchemy engine uses the shared default inference config. |
tests/_sql/test_ibis.py |
Adds test asserting Ibis engine uses the shared default inference config and schemas default to "auto". |
tests/_sql/test_adbc.py |
Adds test asserting ADBC engine uses the shared default inference config and schemas default to "auto". |
There was a problem hiding this comment.
4 issues found across 11 files
Architecture diagram
sequenceDiagram
participant UI as Datasources Panel
participant Engine as SQL Engine (base)
participant Adbc as ADBC Engine
participant Ibis as Ibis Engine
participant SA as SQLAlchemy Engine
participant ClickHouse as ClickHouse Engine
participant Redshift as Redshift Engine
participant PyIceberg as PyIceberg Engine
participant Types as types.py
participant Utils as utils.py
Note over UI,PyIceberg: Schema/Table Discovery Flow
UI->>Engine: request datasource discovery
Engine->>Engine: inference_config
Note over Engine: Default: auto_discover_schemas="auto"
alt Engine is ADBC or Ibis or SQLAlchemy
Engine->>Types: call default_inference_config()
Types-->>Engine: InferenceConfig with auto_discover_schemas="auto"
else Engine is ClickHouse or Redshift
Engine->>Engine: hardcoded override (not using default)
Note over Engine: _is_cheap_discovery() returns False
else Engine is PyIceberg
Engine->>Engine: hardcoded override
Note over Engine: _is_cheap_discovery() returns True
end
Engine->>Engine: _resolve_should_auto_discover("auto")
alt Resolution by dialect heuristic
Engine->>Utils: is_cheap_dialect(dialect)
Utils->>Utils: check CHEAP_DISCOVERY_DATABASES set
alt dialect in cheap set (e.g., sqlite, duckdb)
Utils-->>Engine: return True
Engine-->>UI: auto-discover schemas
else dialect not in cheap set (e.g., snowflake, bigquery)
Utils-->>Engine: return False
Engine-->>UI: skip auto-discovery
end
else Override by engine policy
ClickHouse-->>ClickHouse: return False (always expensive)
Redshift-->>Redshift: return False (opt out)
PyIceberg-->>PyIceberg: return True (metadata always fast)
end
Note over UI,PyIceberg: Dialect-based gating unified via base class
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| LOGGER = _loggers.marimo_logger() | ||
|
|
||
| CHEAP_DISCOVERY_DATABASES = ["duckdb", "sqlite", "mysql", "postgresql"] | ||
| CHEAP_DISCOVERY_DATABASES = [ |
There was a problem hiding this comment.
could this be a set? not sure where it is used elsewhere
9169e52 to
b3de311
Compare
The datasources panel scanned every catalog/schema on remote backends such as Snowflake because the ibis and adbc engines hardcoded auto_discover_schemas=True. Switch them to "auto" so discovery is gated by the cheap-dialect heuristic, matching the sqlalchemy engine. Also unify the shared default discovery config (default_inference_config) and the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery) across engines to prevent this kind of drift in the future. Co-authored-by: Cursor <cursoragent@cursor.com>
- Resolve include_tables in ClickHouse server get_databases so "auto" no longer scans every database's tables (it stays truthy otherwise). - Add mariadb to the cheap-dialect allowlist so MariaDB connections still auto-discover (SQLAlchemy reports dialect.name == "mariadb"). - Fix grammar in default_inference_config docstring. Co-authored-by: Cursor <cursoragent@cursor.com>
b3de311 to
8ea6870
Compare
📝 Summary
Closes #9775
The datasources panel scanned every catalog/schema on remote backends such as Snowflake because the ibis and adbc engines hardcoded auto_discover_schemas=True. Switch them to "auto" so discovery is gated by the cheap-dialect heuristic, matching the sqlalchemy engine.
Also unify the shared default discovery config (default_inference_config) and the "auto" resolution (_resolve_should_auto_discover / _is_cheap_discovery) across engines to prevent this kind of drift in the future.
📋 Pre-Review Checklist
✅ Merge Checklist