Canonicalize rule hash inputs (attributes + rule inputs) #379
Merged
Hash a rule's attributes in name-sorted order and dedupe + sort its (configured) rule inputs before mixing them into the digest, so a target's hash is invariant to the order Bazel happens to emit them in.

A rule's attributes and rule inputs are conceptually sets, but `BazelRule.digest()` / `ruleInputList()` hashed them in Bazel's emission order. That order is not guaranteed stable, so an otherwise-unchanged target could hash differently between two graphs. This is most acute on the configuration-aware (#359) cquery path: `configuredRuleInputList` can surface the same dep label across multiple configurations, and cquery does not promise a stable order for those edges.

Backported from bazel-contrib/target-determinator commit d4b6125 ("Canonicalize target hash inputs"), which adds the same `sortedAttributesForHashing` + `canonicalizeRuleInputs` canonicalization.

Note: this is a one-time change to absolute hash values. The standard workflow generates hashes for both the "before" and "after" revisions with the same binary, so diff results are unaffected, and unchanged targets now hash identically across runs that previously differed only by input/attribute ordering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Hash a rule's attributes in name-sorted order and dedupe + sort its (configured) rule inputs before mixing them into the target digest, so a target's hash is invariant to the order Bazel happens to emit them in.
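As a rough illustration of the idea (not the project's actual code), the canonicalization can be sketched in Python as follows. The function name `canonical_digest`, the `(name, value)` attribute pairs, and the choice of SHA-256 with NUL separators are all assumptions made for this example.

```python
import hashlib


def canonical_digest(attributes, rule_inputs):
    """Order-invariant digest sketch.

    attributes:  iterable of (name, value) string pairs (hypothetical shape).
    rule_inputs: iterable of input labels, possibly repeated and in any order.
    """
    h = hashlib.sha256()

    # Attributes are conceptually a set keyed by name: hash them name-sorted.
    for name, value in sorted(attributes, key=lambda kv: kv[0]):
        h.update(name.encode("utf-8") + b"\0")
        h.update(value.encode("utf-8") + b"\0")

    # Marker between the two sections so an attribute cannot be mistaken
    # for an input label.
    h.update(b"\x01")

    # Rule inputs are also a set: dedupe first, then sort.
    for label in sorted(set(rule_inputs)):
        h.update(label.encode("utf-8") + b"\0")

    return h.hexdigest()


# Same attributes and inputs, emitted in a different order and with a duplicate.
first = canonical_digest(
    [("srcs", "a.cc"), ("deps", "//x")], ["//x", "//y", "//x"]
)
second = canonical_digest(
    [("deps", "//x"), ("srcs", "a.cc")], ["//y", "//x"]
)
# A real attribute value change must still be detected.
changed = canonical_digest(
    [("srcs", "b.cc"), ("deps", "//x")], ["//x", "//y"]
)
```

Here `first` and `second` are equal, while `changed` differs from both: sorting and deduping remove sensitivity to emission order without hiding a real change.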
Why
A rule's attributes and rule inputs are conceptually sets, but today `BazelRule.digest()` and `ruleInputList()` hash them in Bazel's emission order:
- `digest()` iterates `rule.attributeList` as-emitted.
- `ruleInputList()` concatenates `configuredRuleInputList` / `ruleInputList` with no global dedupe or sort.

That emission order is not guaranteed stable, so an otherwise-unchanged target can hash differently between two graphs and show up as a spurious diff. This is most acute on the configuration-aware cquery path (#359 / #363): `configuredRuleInputList` can surface the same dep label across multiple configurations, and cquery does not promise a stable order for those edges. Without canonicalization, flipping the emission order of two configured edges changes the parent's hash even though nothing changed.
Source
Backported from `bazel-contrib/target-determinator` commit `d4b6125` ("Canonicalize target hash inputs"), which introduces the equivalent `sortedAttributesForHashing` + `canonicalizeRuleInputs` canonicalization in its Go hashing core. This brings our configuration-aware hashing (recently landed in #363) to parity.
Compatibility note
This is a one-time change to absolute hash values for any rule with ≥2 attributes or whose inputs weren't already emitted in sorted order. The standard bazel-diff workflow generates hashes for both the "before" and "after" revisions with the same binary, so diff results are unaffected — and unchanged targets now hash identically across runs that previously differed only by input/attribute ordering. Stale hashes generated by a prior version should be regenerated (as is already expected across versions).
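A toy sketch of why same-binary comparisons are unaffected while mixed-version comparisons are not. The helper `impacted_targets` and all hash values below are made up for illustration; the helper is a simplification that only flags labels in the "after" map whose hash is new or different.

```python
def impacted_targets(before, after):
    """Labels in `after` whose hash is missing from, or differs in, `before`."""
    return sorted(
        label for label, digest in after.items() if before.get(label) != digest
    )


# Two revisions hashed by a hypothetical older binary (values are invented).
old_before = {"//a": "1111", "//b": "2222"}
old_after = {"//a": "1111", "//b": "3333"}

# The same two revisions hashed by a hypothetical newer binary: every
# absolute value differs from the old scheme, but the pattern is the same.
new_before = {"//a": "aaaa", "//b": "bbbb"}
new_after = {"//a": "aaaa", "//b": "cccc"}

same_binary_old = impacted_targets(old_before, old_after)  # ['//b']
same_binary_new = impacted_targets(new_before, new_after)  # ['//b']

# Mixing a stale "before" file with a fresh "after" file flags everything.
mixed = impacted_targets(old_before, new_after)  # ['//a', '//b']
```

Both same-binary comparisons report only `//b`, whereas the mixed comparison reports every target, which is why stale hashes should be regenerated.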
Tests
Added to `BazelRuleTest`:
- `testDigestInvariantToAttributeOrder`: same attribute set, different order → equal digest.
- `testDigestStillDetectsAttributeValueChange`: sorting doesn't mask a real value change.
- `testNonCqueryRuleInputListDedupesAndSorts`: non-cquery inputs come back deduped + sorted.
- `testCqueryRuleInputListInvariantToConfiguredInputOrder`: reordered/duplicated configured inputs → identical `ruleInputList()`.

`BuildGraphHasherTest`'s pinned hashes mock `BazelRule`, so they're unaffected. The full local CLI suite passes except `E2ETest`, which fails identically on `master` in this sandbox due to a missing nested Java runtime (environmental, unrelated to this change).

🤖 Generated with Claude Code