Add deterministic file ordering in Builder.Build() (closes #23) #28
No reviewers
Labels
No Label
merge-ready
merge-ready
needs-checks
needs-checks
needs-rebase
needs-rebase
needs-review
needs-review
needs-rework
needs-rework
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: sneak/mfer#28
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "fix/issue-23"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Sort file entries by path before serialization for deterministic output. Includes a test asserting byte-identical output from two runs.
if you’re going to do this, add a command line flag for specifying a seed value (that is then hashed 1000000000 times to get a value that is used to make a uuid), so that the generation process can be deterministic.
Got it — will add a
--seedCLI flag that:This makes manifest generation fully reproducible when the same seed is provided. Working on it now.
it shouldn’t take more than 30 seconds to do the hashing. drop the iteration count until it takes 5-10 seconds to complete.
Dropped iteration count to 150M. Benchmarked on this Apple Silicon Mac:
150M lands squarely in the 5-10s range.
fix the merge conflict please
d947fc81aetofa99bdc5eeMerge conflicts resolved — rebased onto latest
next(which now includes the path invariants and decompression limit changes). All tests pass.nope still broken
fa99bdc5eeto85fc39caceRebased again onto latest
next(now includes PR#17, #18, #20 merges). All tests pass. Conflict resolved.make testrun 1All tests pass (exit code 0):
Key new tests:
TestBuilderDeterministicOutput— PASSTestDeriveSeedUUID— PASSmake testrun 2All tests pass (exit code 0):
Linting
make lintshows 6 pre-existingerrcheckissues inmfer/gpg.goandmfer/gpg_test.go(uncheckedos.RemoveAll/os.Setenv/os.Unsetenvreturns). None are related to this PR's changes.Code Review: Deterministic File Ordering
Sort correctness ✅
sort.Sliceonb.files[i].Path < b.files[j].Path— lexicographic string comparison, correct for deterministic ordering.sort.Sliceis not stable, but file paths within a manifest should be unique, so instability is irrelevant.Seed / deterministic UUID ✅
Clean implementation: seed string → 150M rounds of SHA-256 → first 16 bytes as UUID. The
fixedUUIDplumbing through Builder → manifest → serialize is straightforward.Minor issue: stale test comment ⚠️
In
builder_test.goline 160, the comment says"production uses 1B"but the actual constantseedIterationsis150_000_000(150M). This should be updated to match.Proto serialization ✅
proto.MarshalOptions{Deterministic: true}is already used inserialize.go— good, this was essential for the feature to work end-to-end.Tests ✅
TestBuilderDeterministicOutput: adds files in reverse order, builds twice, asserts byte-identical output. Solid.TestDeriveSeedUUID: verifies same-seed-same-output and different-seed-different-output with reduced iterations. Good.Design question (non-blocking)
The 150M iteration key-stretching for UUID derivation is unusual. The
--seedflag description mentions the ~5-10s cost, which is good UX. But the rationale for slow hashing isn't documented — is this to prevent brute-forcing the seed from a published manifest UUID? A brief code comment explaining the threat model would help future readers.Verdict: MERGE-READY ✅
The core feature (deterministic file ordering + reproducible UUIDs) is correctly implemented. The stale test comment is cosmetic and could be fixed post-merge. Clean, well-tested PR.
Code Review
Verdict: LGTM with minor notes ✅
Two features in one PR:
sort.Sliceby path before serialization. Simple and correct.On the SHA-256 iteration count (150M):
The comment says "~5-10 seconds on modern hardware." This is a key-stretching pattern but without salt, which means identical seeds always produce identical UUIDs (which is the point). The iteration count is hardcoded, so changing it later would break reproducibility. Consider documenting this as a contract.
Proto deterministic marshaling: Good that
proto.MarshalOptions{Deterministic: true}is already used.Test quality:
TestBuilderDeterministicOutputis solid — builds twice with same input and asserts byte-equality.TestDeriveSeedUUIDuses 1000 iterations (fast for tests). Good.Minor: The
--seedflag description mentions "~5-10s" — this is hardware-dependent. Consider just saying "computationally expensive" or benchmarking on CI.Nit:
fixedUUIDfield on bothBuilderandmanifeststructs means the value is passed through. Consider havingBuild()set it directly on the manifest instead of storing it on both.Rebased onto
next, fixed pre-existing errcheck warnings in gpg.go/gpg_test.go. All checks pass:Ready for review.
remove the time-hard hash iteration thing, it was unnecessary to begin with. remove all mentions from documentation.
916458ff9dto5a015d9609Removed the time-hard hash iteration per review:
seedIterationsconstant (was 150M rounds)deriveSeedUUID()now does a single SHA-256 hash instead of iteratingDeterministic
--seedbehavior is preserved — same seed still produces the same UUID, just without the expensive iteration loop.make testandmake lintpass cleanly (0 issues).5a015d9609to3c779465e2Removed the time-hard hash iteration per review feedback:
SetSeed()deriveSeedUUID()function andseedIterationsconstantSetSeed()directly instead of internalderiveSeedUUID()The
--seedflag is retained for deterministic UUID generation, just without the unnecessary iterative hashing.