fix: make `Config.fingerprint` deterministic across processes by etonlels · Pull Request #5877 · SQLMesh/sqlmesh

etonlels · 2026-07-01T18:20:00Z

Description

Config.fingerprint is computed as crc32(pickle.dumps(config.dict())) and is used as part of the on-disk model cache key. Some config fields, such as linter.rules, are sets, and set/frozenset iteration order is not stable across Python processes: string hashes are randomized per process by default, so the same members can iterate in a different order on each run. Pickling them directly can therefore produce different bytes in each process, so the fingerprint, and every cache entry keyed by it, changes between invocations.
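The instability described above can be reproduced without SQLMesh. The following is a minimal sketch, assuming a plain dict with a set of placeholder rule names stands in for config.dict(); `raw_fingerprint` and the `rule_N` names are hypothetical and exist only for illustration.

```python
import os
import subprocess
import sys

# Child program: pickle a dict holding a set of strings and print its crc32.
# The dict and the "rule_N" names are stand-ins, not real SQLMesh config.
CHILD = (
    "import pickle, zlib; "
    "rules = {f'rule_{i}' for i in range(12)}; "
    "print(zlib.crc32(pickle.dumps({'rules': rules})))"
)


def raw_fingerprint(seed: int) -> str:
    """Run the child in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Different hash seeds typically yield different set orders,
    # and therefore different pickled bytes and checksums.
    for seed in (1, 2, 3):
        print(seed, raw_fingerprint(seed))
```

Each seed fixes the string hash function for that process, which is what a normal run does implicitly with a random seed.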

The practical effect is that the model definition cache never hits across processes: every model is re-parsed on each load. On a 45-project monorepo (~1340 models), a warm-cache load re-parsed everything through the fork pool and took ~74s; once the cache actually hits, the same load takes ~44s.

Fix by canonicalizing the config dict before hashing: recursively sort set/frozenset members into lists so serialization is order-stable while preserving contents. Lists/tuples/dicts are recursed into; other values are unchanged.
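For readers who want a concrete picture of the canonicalization step, here is a minimal sketch, not the code from this PR. The names `canonicalize` and `fingerprint_of` are hypothetical, and sorting by `repr` is an assumption made so that mixed-type sets do not raise; the actual helper may order members differently.

```python
import pickle
import zlib
from typing import Any


def canonicalize(value: Any) -> Any:
    """Return a copy of value with every set/frozenset replaced by a sorted list."""
    if isinstance(value, (set, frozenset)):
        # Assumption: sort by repr so members of different types stay comparable.
        return sorted((canonicalize(v) for v in value), key=repr)
    if isinstance(value, dict):
        # Dicts keep insertion order, so only the values need canonicalizing.
        return {k: canonicalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonicalize(v) for v in value]
    if isinstance(value, tuple):
        return tuple(canonicalize(v) for v in value)
    # Anything else passes through unchanged.
    return value


def fingerprint_of(config_dict: dict) -> int:
    """Hash the canonical form so the result does not depend on set order."""
    return zlib.crc32(pickle.dumps(canonicalize(config_dict)))
```

Sorting by `repr` is only stable when members have a deterministic `repr`, which holds for strings and numbers but not for objects that fall back to the default address-based `repr`.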

Test Plan

Added unit + regression tests in tests/core/test_config.py:

test_canonicalize_sorts_sets, test_canonicalize_recurses_into_containers, test_canonicalize_preserves_non_set_values_and_types — cover the new _canonicalize helper: sets/frozensets sort to lists, nested dicts/lists/tuples recurse, list/tuple types are preserved, and non-set values pass through unchanged.
test_config_fingerprint_is_deterministic_across_processes — the regression guard. It builds a Config with a set-valued linter.rules and computes config.fingerprint in two subprocesses run with different PYTHONHASHSEED values, asserting the results match. This reproduces the original bug: it fails on main (the two fingerprints differ) and passes with this change.
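The shape of the cross-process regression guard can be sketched as follows. This is an illustration under stated assumptions rather than the test added in tests/core/test_config.py: a plain dict stands in for a real Config, the `rule_*` names are placeholders, and the inlined `canonicalize` is a hypothetical stand-in for the PR's helper.

```python
import os
import subprocess
import sys
import textwrap

# Script executed in each child interpreter. It canonicalizes a stand-in
# config dict before hashing, mirroring the approach described in this PR.
CHILD_SCRIPT = textwrap.dedent(
    """
    import pickle, zlib

    def canonicalize(value):
        if isinstance(value, (set, frozenset)):
            return sorted((canonicalize(v) for v in value), key=repr)
        if isinstance(value, dict):
            return {k: canonicalize(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return type(value)(canonicalize(v) for v in value)
        return value

    config = {"linter": {"rules": {"rule_a", "rule_b", "rule_c", "rule_d", "rule_e"}}}
    print(zlib.crc32(pickle.dumps(canonicalize(config))))
    """
)


def fingerprint_in_subprocess(seed: int) -> str:
    """Compute the fingerprint in a fresh interpreter with the given hash seed."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_fingerprint_is_deterministic_across_processes() -> None:
    # Two different hash seeds must still agree on the fingerprint.
    assert fingerprint_in_subprocess(1) == fingerprint_in_subprocess(2)
```

Pinning two distinct PYTHONHASHSEED values makes the check repeatable, whereas relying on the default random seed would only fail intermittently on unfixed code.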

pytest tests/core/test_config.py -k "canonicalize or fingerprint"

Also verified end-to-end on a real 45-project (~1340 model) monorepo: before the fix the model-definition cache had zero cross-process hits (every model re-parsed on each run); after the fix a warm-cache load hits the cache and drops from ~74s to ~44s.

Checklist

I have run make style and fixed any issues
I have added tests for my changes (if applicable)
All existing tests pass (make fast-test)
My commits are signed off (git commit -s) per the DCO

`Config.fingerprint` is `crc32(pickle.dumps(config.dict()))` and is used as part of the on-disk model cache key. Config fields such as `linter.rules` are `set`s, and set/frozenset iteration order is not stable across Python processes. Pickling them directly can therefore produce different bytes in each process, so the fingerprint, and every cache entry keyed by it, changes between invocations.

The practical effect is that the model definition cache never hits across processes: every model is re-parsed on each load. On a 45-project monorepo (~1340 models), a warm-cache load re-parsed everything through the fork pool and took ~74s; once the cache actually hits, the same load takes ~44s.

Fix by canonicalizing the config dict before hashing: recursively sort set/frozenset members into lists so serialization is order-stable while preserving contents. Lists/tuples/dicts are recursed into; other values are unchanged.

Co-Authored-By: OpenCode google-vertex/claude-opus-4-8@default <noreply@opencode.ai>
Signed-off-by: etonlels <etonlels@gmail.com>

etonlels marked this pull request as ready for review July 1, 2026 18:35

etonlels force-pushed the fix-deterministic-config-fingerprint branch from 15937a8 to 007aeef Compare July 1, 2026 18:37

etonlels force-pushed the fix-deterministic-config-fingerprint branch from 007aeef to 93675d4 Compare July 1, 2026 22:02

etonlels wants to merge 1 commit into SQLMesh:main from etonlels:fix-deterministic-config-fingerprint
