fix: make Config.fingerprint deterministic across processes#5877

Open
etonlels wants to merge 1 commit into SQLMesh:main from etonlels:fix-deterministic-config-fingerprint

@etonlels etonlels commented Jul 1, 2026

Contributor

Description

Config.fingerprint is crc32(pickle.dumps(config.dict())) and is used as part of the on-disk model cache key. Config fields such as linter.rules are sets, and set/frozenset iteration order is not stable across Python processes. Pickling them directly therefore produces different bytes each run, so the fingerprint — and every cache entry keyed by it — changes on every invocation.

The practical effect is that the model definition cache never hits across processes: every model is re-parsed on each load. On a 45-project monorepo (~1340 models), a nominally warm-cache load re-parsed everything through the fork pool and took ~74s; once the cache actually hits, the same load takes ~44s.

Fix by canonicalizing the config dict before hashing: recursively sort set/frozenset members into lists so serialization is order-stable while preserving contents. Lists/tuples/dicts are recursed into; other values are unchanged.
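
The canonicalization described above could look roughly like the sketch below. It is an illustration rather than the code in this PR: the function names `canonicalize` and `fingerprint`, the use of `repr` as the sort key (chosen here so that mixed-type set members do not raise `TypeError`), and the rule names in the usage example are all assumptions.

```python
import pickle
import zlib
from typing import Any


def canonicalize(value: Any) -> Any:
    """Return a copy of `value` with every set/frozenset replaced by a sorted list."""
    if isinstance(value, (set, frozenset)):
        # Canonicalize members first, then sort them. Sorting by repr() is an
        # assumption of this sketch; it keeps mixed-type members sortable.
        return sorted((canonicalize(v) for v in value), key=repr)
    if isinstance(value, dict):
        # Dict insertion order is preserved; only the values are recursed into.
        return {k: canonicalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonicalize(v) for v in value]
    if isinstance(value, tuple):
        return tuple(canonicalize(v) for v in value)
    # Scalars and any other objects pass through unchanged.
    return value


def fingerprint(config_dict: dict) -> int:
    """crc32 of the pickled, canonicalized dict (stand-in for Config.fingerprint)."""
    return zlib.crc32(pickle.dumps(canonicalize(config_dict)))


# Hypothetical rule names, used only for illustration.
a = {"linter": {"rules": {"rule_b", "rule_a"}}}
b = {"linter": {"rules": {"rule_a", "rule_b"}}}
assert fingerprint(a) == fingerprint(b)
```

Because the set is turned into a sorted list before pickling, the serialized bytes no longer depend on the order in which the interpreter happens to iterate the set.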

Test Plan

Added unit + regression tests in tests/core/test_config.py:

  • test_canonicalize_sorts_sets, test_canonicalize_recurses_into_containers, test_canonicalize_preserves_non_set_values_and_types — cover the new _canonicalize helper: sets/frozensets sort to lists, nested dicts/lists/tuples recurse, list/tuple types are preserved, and non-set values pass through unchanged.
  • test_config_fingerprint_is_deterministic_across_processes — the regression guard. It builds a Config with a set-valued linter.rules and computes config.fingerprint in two subprocesses run with different PYTHONHASHSEED values, asserting the results match. This reproduces the original bug: it fails on main (the two fingerprints differ) and passes with this change.
pytest tests/core/test_config.py -k "canonicalize or fingerprint"
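
For readers who want to see the shape of the cross-process check, a minimal sketch follows. It is not the test added in this PR: it uses a plain dict in place of a real `Config`, inlines a simplified canonicalization helper, and the names `SCRIPT`, `fingerprint_with_seed`, and the rule names are hypothetical.

```python
import os
import subprocess
import sys

# Child-process script: canonicalize a set-valued dict, then print its crc32.
# The dict is a stand-in for Config.dict(); the real test builds a Config.
SCRIPT = r'''
import pickle, zlib

def canonicalize(value):
    if isinstance(value, (set, frozenset)):
        return sorted((canonicalize(v) for v in value), key=repr)
    if isinstance(value, dict):
        return {k: canonicalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(canonicalize(v) for v in value)
    return value

config = {"linter": {"rules": {"rule_a", "rule_b", "rule_c", "rule_d"}}}
print(zlib.crc32(pickle.dumps(canonicalize(config))))
'''


def fingerprint_with_seed(seed: str) -> str:
    """Run SCRIPT in a fresh interpreter with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_fingerprint_is_deterministic_across_processes() -> None:
    # Different hash seeds change set iteration order for strings, so this
    # comparison only holds if the set is canonicalized before pickling.
    assert fingerprint_with_seed("1") == fingerprint_with_seed("2")
```

Pinning `PYTHONHASHSEED` to two different values makes the check deliberate about varying the hash seed instead of relying on whatever seed each subprocess happens to pick.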

Also verified end-to-end on a real 45-project (~1340 model) monorepo: before the fix the model-definition cache had zero cross-process hits (every model re-parsed on each run); after the fix a warm-cache load hits the cache and drops from ~74s to ~44s.

Checklist

  • I have run make style and fixed any issues
  • I have added tests for my changes (if applicable)
  • All existing tests pass (make fast-test)
  • My commits are signed off (git commit -s) per the DCO
@etonlels etonlels marked this pull request as ready for review July 1, 2026 18:35
@etonlels etonlels force-pushed the fix-deterministic-config-fingerprint branch from 15937a8 to 007aeef Compare July 1, 2026 18:37
`Config.fingerprint` is `crc32(pickle.dumps(config.dict()))` and is used as part
of the on-disk model cache key. Config fields such as `linter.rules` are `set`s,
and set/frozenset iteration order is not stable across Python processes. Pickling
them directly therefore produces different bytes each run, so the fingerprint —
and every cache entry keyed by it — changes on every invocation.

The practical effect is that the model definition cache never hits across
processes: every model is re-parsed on each load. On a 45-project monorepo
(~1340 models), a nominally warm-cache load re-parsed everything through the
fork pool and took ~74s; once the cache actually hits, the same load takes ~44s.

Fix by canonicalizing the config dict before hashing: recursively sort
set/frozenset members into lists so serialization is order-stable while
preserving contents. Lists/tuples/dicts are recursed into; other values are
unchanged.

Co-Authored-By: OpenCode google-vertex/claude-opus-4-8@default <noreply@opencode.ai>
Signed-off-by: etonlels <etonlels@gmail.com>
@etonlels etonlels force-pushed the fix-deterministic-config-fingerprint branch from 007aeef to 93675d4 Compare July 1, 2026 22:02