Skip to content

fix: handle sqlglot OptimizeError for DuckDB unnest with JSON access#8083

Merged
dmadisetti merged 3 commits intomarimo-team:mainfrom
bxff:main
Feb 2, 2026
Merged

fix: handle sqlglot OptimizeError for DuckDB unnest with JSON access#8083
dmadisetti merged 3 commits intomarimo-team:mainfrom
bxff:main

Conversation

@bxff
Copy link
Contributor

@bxff bxff commented Feb 1, 2026

📝 Summary

Closes #8080

🔍 Description of Changes

Problem: Marimo's SQL static analysis fails with sqlglot.errors.OptimizeError: Alias already used when parsing valid DuckDB syntax using unnest with JSON access (e.g., SELECT unnest([1,2,3])->>'$' FROM (SELECT 1)).

Root Cause: The build_scope() function from sqlglot's optimizer raises an OptimizeError when SQL contains duplicate aliases. This occurs with:

  • Multiple unnest expressions with JSON access that generate identical internal aliases
  • Cross-joined subqueries with the same column alias (e.g., SELECT * FROM (SELECT 1 as x), (SELECT 2 as x))

This is valid DuckDB syntax but not handled by sqlglot's scope optimizer.

Solution: Wrap the build_scope() call in find_sql_refs() with a try-except block. When OptimizeError is caught, fall back to extracting table references directly using expression.find_all(exp.Table) without scope analysis.

Changes:

  • marimo/_ast/sql_visitor.py: Added error handling for OptimizeError with fallback to direct table extraction
  • tests/_ast/test_sql_visitor.py: Added 3 test cases covering the regression and fallback behavior

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Tests have been added for the changes made.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Pull request title is a good summary of the changes - it will be used in the release notes.
@bxff bxff requested a review from dmadisetti as a code owner February 1, 2026 01:18
@vercel
Copy link

vercel bot commented Feb 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Feb 2, 2026 5:50pm

Request Review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@bxff
Copy link
Contributor Author

bxff commented Feb 1, 2026

I have read the CLA Document and I hereby sign the CLA

@dmadisetti dmadisetti added the bug Something isn't working label Feb 1, 2026
dmadisetti
dmadisetti previously approved these changes Feb 1, 2026
Copy link
Collaborator

@dmadisetti dmadisetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if isinstance(source, exp.Table):
if ref := get_ref_from_table(source):
refs.add(ref)
except OptimizeError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Limitation of the fall back is that it's limited to a single scope. So if there are multiple statements, this will only catch the first one.

I almost wonder if we should log? but seems fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! A couple of clarifications:

Multiple statements are handled correctly: the try/except is inside the for expression in expression_list: loop, so each statement is processed independently. If one fails, others still use build_scope(). I added test_multiple_statements_with_optimize_error to verify this.

Added debug logging: good call. Added a LOGGER.debug() when the fallback triggers.

Re: single scope limitation: you're right that find_all(exp.Table) doesn't understand CTEs like build_scope() does, so we may occasionally over-report dependencies (e.g., including the CTE name itself). Trade-off seems acceptable to avoid the crash, and we won't under-report real table dependencies.

Let me know if you'd like any changes!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the debugging if we're not dropping anything.

Multiple statements are handled correctly

I guess I was trying to determine why don't we just run this instead of trying build_scope ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the debug logging.

Regarding why not use find_all everywhere - build_scope properly understands SQL scopes, which matters for CTEs:

WITH my_data AS (SELECT * FROM real_table)
SELECT * FROM my_data
  • build_scope -> {real_table} (correct - my_data is a CTE, not an external table)
  • find_all -> {my_data, real_table} (over-reports - includes the CTE name)

In marimo's reactive system, this over-reporting could trigger unnecessary cell re-executions if a CTE name happens to match another defined table. So we prefer build_scope's precision when it works, and only fall back to find_all for these edge cases where it fails.

- Add LOGGER.debug when build_scope falls back due to OptimizeError
- Add test_multiple_statements_with_optimize_error to verify each
  statement is processed independently

Follow-up to marimo-team#8080
…rence

build_scope correctly handles CTEs (won't report CTE names as table refs),
which is why we prefer it over find_all when possible.
Copy link
Collaborator

@dmadisetti dmadisetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thanks @bxff

@dmadisetti dmadisetti merged commit c374384 into marimo-team:main Feb 2, 2026
36 of 40 checks passed
@github-actions
Copy link

github-actions bot commented Feb 2, 2026

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.19.8-dev8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

2 participants