perf: avoid unnecessary duckdb calls in stats and getting db names #8250

Merged
mscolnick merged 1 commit into main from ms/perf-avoid-uncessary-duckdb-calls
Feb 10, 2026

Conversation

@mscolnick (Contributor) commented Feb 10, 2026

  • Eliminate the redundant COUNT(*) query in get_column_preview_for_duckdb by reusing stats.total.
  • Skip the unnecessary _get_duckdb_database_names query in the duckdb_columns() fallback path, which avoids an extra SELECT * FROM duckdb_databases() query.
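
The first change can be illustrated with a minimal sketch. It does not reproduce the marimo code: `ColumnStats`, `RecordingConnection`, and both `total_rows_*` helpers are hypothetical names, and the fake connection only records SQL strings rather than talking to DuckDB. It shows that reusing an already computed total removes one query per preview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical stand-in for the stats object; only `total` is used here."""

    total: Optional[int] = None


class RecordingConnection:
    """Fake connection that records each SQL string instead of running it."""

    def __init__(self, row_count):
        self.row_count = row_count
        self.queries = []

    def count_rows(self, table):
        self.queries.append(f"SELECT COUNT(*) FROM {table}")
        return self.row_count


def total_rows_before(conn, table, stats):
    # Old shape: a second round trip for a value the stats already carry.
    return conn.count_rows(table)


def total_rows_after(conn, table, stats):
    # New shape: reuse stats.total; the caller decides what to do with None.
    return stats.total


conn = RecordingConnection(row_count=3)
stats = ColumnStats(total=3)

rows_before = total_rows_before(conn, "my_table", stats)
queries_before = len(conn.queries)  # one COUNT(*) was recorded

rows_after = total_rows_after(conn, "my_table", stats)
queries_after = len(conn.queries) - queries_before  # nothing new recorded
```

When `stats.total` is `None`, the sketch returns `None` rather than falling back to a query, so the caller has to handle that case explicitly.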

@mscolnick mscolnick added the bug Something isn't working label Feb 10, 2026
@mscolnick mscolnick changed the title perf: avoid uncessary duckdb calls Feb 10, 2026
@mscolnick mscolnick requested a review from Copilot February 10, 2026 01:23

Copilot AI left a comment

Pull request overview

This PR improves DuckDB-related performance by eliminating redundant catalog/stat queries when generating column previews and listing databases, while adding tests to lock in the new behavior.

Changes:

  • Reuse stats.total in get_column_preview_for_duckdb to avoid an extra COUNT(*) query.
  • Add a backfill_empty_databases switch to skip duckdb_databases() calls in the duckdb_columns() (agg-query) fallback path.
  • Update/add tests to cover stats.total is None and backfill behavior, and standardize DuckDB tests on @pytest.mark.requires("duckdb") in test_get_datasets.py.
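
A rough sketch of how a switch like backfill_empty_databases could work is given here. The real signature is not shown in this thread, so `group_columns_by_database`, `fake_list_database_names`, the row shape, and the default value of the flag are all assumptions made for illustration. The idea is that grouping catalog rows already yields every database that has tables, and the extra lookup is only needed to surface databases with none.

```python
from collections import defaultdict


def group_columns_by_database(
    column_rows, list_database_names, backfill_empty_databases=True
):
    """Group (database, table, column) rows; optionally add table-less databases."""
    grouped = defaultdict(lambda: defaultdict(list))
    for database, table, column in column_rows:
        grouped[database][table].append(column)

    result = {db: dict(tables) for db, tables in grouped.items()}
    if backfill_empty_databases:
        # Extra catalog lookup: only needed for databases that have no tables.
        for name in list_database_names():
            result.setdefault(name, {})
    return result


calls = []


def fake_list_database_names():
    # Hypothetical stand-in for the duckdb_databases() lookup; records each call.
    calls.append("SELECT * FROM duckdb_databases()")
    return ["memory", "empty_db"]


rows = [("memory", "t1", "id"), ("memory", "t1", "name")]

with_backfill = group_columns_by_database(rows, fake_list_database_names)
without_backfill = group_columns_by_database(
    rows, fake_list_database_names, backfill_empty_databases=False
)
```

With the flag off, the lookup callable is never invoked, which is the kind of behavior the added tests are described as checking.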

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| marimo/_data/preview_column.py | Removes redundant row-count query and threads a ChartBuilder into chart-spec generation. |
| marimo/_data/get_datasets.py | Adds backfill_empty_databases flag to avoid extra duckdb_databases() calls in the agg-query path; hoists constants. |
| tests/_data/test_preview_column.py | Adds coverage for stats.total=None to ensure no crash/no chart generation. |
| tests/_data/test_get_datasets.py | Converts DuckDB skips to @pytest.mark.requires("duckdb") and adds tests verifying backfill behavior and avoiding _get_duckdb_database_names calls. |

def has_updates_to_datasource(query: str) -> bool:
    import duckdb

Copilot AI Feb 10, 2026

This import of module duckdb is redundant, as it was previously imported on line 21.
This import of module marimo._sql.engines.duckdb is redundant, as it was previously imported on line 21.

@mscolnick (Contributor, Author) replied:

you're drunk

@mscolnick mscolnick merged commit fbadc98 into main Feb 10, 2026
53 of 55 checks passed
@mscolnick mscolnick deleted the ms/perf-avoid-uncessary-duckdb-calls branch February 10, 2026 16:29
Labels

bug Something isn't working

3 participants