fix: make Config.fingerprint deterministic across processes#5877
Open
etonlels wants to merge 1 commit into
Open
fix: make Config.fingerprint deterministic across processes#5877etonlels wants to merge 1 commit into
Config.fingerprint deterministic across processes#5877etonlels wants to merge 1 commit into
Conversation
15937a8 to
007aeef
Compare
`Config.fingerprint` is `crc32(pickle.dumps(config.dict()))` and is used as part of the on-disk model cache key. Config fields such as `linter.rules` are `set`s, and set/frozenset iteration order is not stable across Python processes. Pickling them directly therefore produces different bytes each run, so the fingerprint — and every cache entry keyed by it — changes on every invocation. The practical effect is that the model definition cache never hits across processes: every model is re-parsed on each load. On a 45-project monorepo (~1340 models) this made a warm-cache load re-parse everything through the fork pool, ~74s -> ~44s once the cache actually hits. Fix by canonicalizing the config dict before hashing: recursively sort set/frozenset members into lists so serialization is order-stable while preserving contents. Lists/tuples/dicts are recursed into; other values are unchanged. Co-Authored-By: OpenCode google-vertex/claude-opus-4-8@default <noreply@opencode.ai> Signed-off-by: etonlels <etonlels@gmail.com>
007aeef to
93675d4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Config.fingerprintiscrc32(pickle.dumps(config.dict()))and is used as part of the on-disk model cache key. Config fields such aslinter.rulesaresets, and set/frozenset iteration order is not stable across Python processes. Pickling them directly therefore produces different bytes each run, so the fingerprint — and every cache entry keyed by it — changes on every invocation.The practical effect is that the model definition cache never hits across processes: every model is re-parsed on each load. On a 45-project monorepo (~1340 models) this made a warm-cache load re-parse everything through the fork pool, ~74s -> ~44s once the cache actually hits.
Fix by canonicalizing the config dict before hashing: recursively sort set/frozenset members into lists so serialization is order-stable while preserving contents. Lists/tuples/dicts are recursed into; other values are unchanged.
Test Plan
Added unit + regression tests in
tests/core/test_config.py:test_canonicalize_sorts_sets,test_canonicalize_recurses_into_containers,test_canonicalize_preserves_non_set_values_and_types— cover the new_canonicalizehelper: sets/frozensets sort to lists, nested dicts/lists/tuples recurse, list/tuple types are preserved, and non-set values pass through unchanged.test_config_fingerprint_is_deterministic_across_processes— the regression guard. It builds aConfigwith a set-valuedlinter.rulesand computesconfig.fingerprintin two subprocesses run with differentPYTHONHASHSEEDvalues, asserting the results match. This reproduces the original bug: it fails onmain(the two fingerprints differ) and passes with this change.Also verified end-to-end on a real 45-project (~1340 model) monorepo: before the fix the model-definition cache had zero cross-process hits (every model re-parsed on each run); after the fix a warm-cache load hits the cache and drops from ~74s to ~44s.
Checklist
make styleand fixed any issuesmake fast-test)git commit -s) per the DCO