please provide a detailed and critical design review of this format and implementation #10
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Design Review: mfer format and implementation
Reviewed the full codebase (~930 lines Go), protobuf schema, README/design doc, Makefile, and Dockerfile.
Format Design Issues
1. No file checksums are actually computed
Severity: Critical
The entire point of mfer is cryptographic integrity verification, but
generateInner()inserialize.gonever computes any hashes. It createsMFFilePathentries withPathset butHashesempty:No SHA-256, no multihash, no checksums at all. The
checkcommand is also unimplemented. This means the tool currently generates manifests that cannot verify anything — it's a directory listing, not an integrity manifest.2. Inner message is never deserialized
NewFromProto()indeserialize.govalidates the outer envelope but never actually unmarshals the innerMFFilemessage:So after reading a manifest, you can't access the file list. The round-trip is broken.
3. Outer SHA-256 doesn't protect against tampering
The outer envelope stores
sha256of the inner message, but there's no signature verification. The hash is checked during deserialization... except it actually isn't —validateProtoOuter()checks the size but never verifies the SHA-256 hash against the decompressed data. An attacker could modify the inner message and update the size field.4. Magic bytes approach is fragile
Using
ZNAVSRFG(ROT13 of "MANIFEST") as a magic prefix works, but it's prepended outside the protobuf envelope. This means standard protobuf tooling can't parse the file without first stripping the prefix. Consider putting the magic inside the protobuf structure, or using a proper file type identifier (like a 4-byte magic + version header).5. Signature design is incomplete
The proto has
signature,signer, andsigningPubKeyfields on the outer message, but:6. No file size in manifest entries
MFFilePathhas asizefield in the proto, butgenerateInner()never sets it. File size is critical for:7. No mtime/ctime written
Same issue — proto has
mtime,ctime,atimefields but they're never populated. The README lists metadata preservation as a goal.8. Compression is hardcoded to gzip
The proto defines
COMPRESSION_NONEandCOMPRESSION_GZIPbut only gzip is ever used. For a v1.0, consider zstd — it's faster to decompress and compresses better than gzip at equivalent CPU cost. Also, the deserialization rejects anything that isn't gzip, soCOMPRESSION_NONEwould fail.Implementation Issues
9.
go 1.17is ancientThe
go.modspecifies Go 1.17. Current stable is 1.22+. This locks you out of generics,errors.Join,slog, workspace mode, and numerous stdlib improvements. Many dependencies probably already require newer Go.10. Exported types are unexported
The
manifeststruct and all its methods are unexported (lowercase). The public API surface is just the constructor functions (New,NewFromPaths,NewFromFS,NewFromProto). This means consumers can't type-hint on*manifest— they get an opaque value. Either export the struct or define an interface.11.
fetchcommand panicsThis should return an error, not panic. A user running
mfer fetchgets a stack trace instead of a helpful message.12.
checksilently succeedsThis logs "unimplemented" at some level but returns nil (success). A user running
mfer checkthinks everything passed. Should return an error.13.
gendoesn't write the output filegenerateManifestOperation()serializes to a buffer but never writes it to the--outputfile. The-oflag is defined but unused.14.
genhas an off-by-one in path collectionThis pre-allocates
len-1slots but appendslenitems. Also,NewFromPathsreceives a slice with leading empty strings from the pre-allocation.15.
io.ReadAllbeforedefer gzr.Close()In
deserialize.go:The
defershould come beforeReadAll. IfReadAllreturns an error,gzris never closed.16. Two different logging libraries
The codebase uses both
apex/log(in CLI code) and a custominternal/logwrapper (also aroundapex/log). The internal wrapper addsspew.Dumpand caller info but duplicates functionality. Pick one approach.17.
afero.Aferoused in tests as global mutable stateTest filesystems
afandbigare package-level variables initialized ininit(). This makes tests order-dependent and prevents parallel execution.18. No
.gitignorePer the styleguide: "Provide a .gitignore file that ignores at least
*.log,*.out, and*.testfiles, as well as any binaries."19. Dependencies are stale
Many dependencies are 2+ years old.
urfave/cli/v2has a v3 now.apex/logis essentially unmaintained. Considercharmbracelet/logorlog/slogfor a fresh start.Format Design Recommendations for 1.0
gosignifylibrary exists.fxamacker/cbor), and is closer to the "just works" simplicity the project aims for. JSON is also fine for v1 if you don't need the space savings.MF\x01\x00(magic + major + minor version) as a fixed 4-byte header.Summary
The format concept is sound — there's a real gap for a standard manifest format with cryptographic integrity. But the implementation is essentially a skeleton: the critical features (checksums, verification, signing, fetching) are all stubs. The protobuf schema has the right fields but nothing populates them.
For a 1.0, I'd prioritize:
checkcommand (verify files against manifest)genfetch(download from URL using manifest)Happy to file individual issues and/or PRs for any of the above.
Critical Design Review of mfer
Executive Summary
mfer aims to be a standard manifest file format for directory trees, replacing ad-hoc
SHASUMS/SHASUMS.ascfiles. The idea is sound and addresses a real gap. However, the implementation is pre-alpha at best — several core features described in the README are unimplemented, there are concrete bugs, and the format design has issues that should be resolved before any "1.0" or adoption push.1. Format Design
1.1 Strengths
1.2 Weaknesses & Design Flaws
Magic bytes are a poor choice.
ZNAVSRFG(ROT13 of "MANIFEST") is cute but problematic:The outer envelope SHA-256 is redundant with protobuf integrity. The
sha256field inMFFileOuterchecksums the uncompressed inner message, but protobuf deserialization + gzip decompression will already fail on corruption. If the goal is detecting truncation, thesizefield suffices. If the goal is cryptographic integrity, that's what the signature fields are for. This field is neither verified in the current code nor clearly necessary.No length-prefix on the protobuf payload. After the magic bytes, the rest of the file is fed directly to
proto.Unmarshal. Protobuf is not self-delimiting — if you ever want to concatenate manifests or append data after the protobuf message, this framing is insufficient. Consider a length-prefix (varint or fixed-width) after the magic.Signature design is half-baked:
signature,signer, andsigningPubKeyas optional byte fields on the outer message. But what exactly is signed? The inner serialized bytes? The compressed bytes? This is unspecified.File paths have no normalization requirements. The proto stores
string pathbut there's no specification about:/? OS-dependent?)/)..components are allowed (directory traversal!)This is a security issue — a malicious manifest could contain paths like
../../../etc/shadowand a naivefetchimplementation would write outside the target directory.No file permissions/mode. The format stores mtime/ctime/atime but not file permissions. For an archival format, this is a notable omission. (Unix mode bits at minimum; consider also symlink targets if
FollowSymLinksis false.)The
MFFileChecksummessage is over-abstracted. It's a message containing a singlebytes multiHashfield. Therepeatedon the parent means you can have multiple checksums per file, but since multihash already encodes the algorithm, you'd just have multiple multihash entries. Arepeated bytes hashesfield onMFFilePathwould be simpler without loss of expressiveness.No manifest self-reference or exclusion rule. When you generate
index.mfin a directory, should the manifest include itself? The current implementation doesn't handle this — it would either include a stale version or create a circular dependency. There should be an explicit exclusion for the output file.1.3 Comparison with Alternatives
The README's "hopes and dreams" section describes features (IPFS CIDs, BitTorrent chunk lists, resumable downloads) that would genuinely differentiate mfer from alternatives. Without them, mfer is essentially a more complex SHASUMS with no ecosystem advantage. The current implementation doesn't even compute checksums, so it's strictly less useful than a SHASUMS file.
Consider looking at mtree(5) — the BSD manifest format that already solves many of the same problems with decades of production use.
2. Implementation Quality
2.1 Critical: Checksums Are Not Computed
The single most important feature of a manifest file — cryptographic checksums of the listed files — is not implemented.
generateInner()createsMFFilePathentries with only the path set and a// FIXME add more stuffcomment. Thehashesfield is never populated. File sizes are tracked in the manifest struct but not written to the protobuf.This means the tool currently produces manifests that are useless for integrity verification — its primary stated purpose.
2.2 Deserialization Is Incomplete
NewFromProto()parses the outer message and validates it, but never deserializes the innerMFFilemessage (marked// FIXME TODO deserialize inner). After callingNewFromProto(), the returned manifest has no files, no metadata — just a validated outer shell.2.3
checkandfetchAre UnimplementedcheckManifestOperationcallslog.WithError(errors.New("unimplemented"))but returns nil — it silently succeeds, misleading users into thinking their files verified correctly. This is dangerous. It should return an error orpanic.fetchManifestOperationpanics, which is at least honest but not graceful.2.4 Bugs
Slice pre-allocation bug in
gen.go:paths := make([]string, ctx.Args().Len()-1)creates a slice withlen-1zero-value elements, thenappends to it. Result: the paths slice haslen-1empty strings prepended to the actual paths. This will cause the manifest to try to scan empty paths.deferafter error check indeserialize.go:gzr.Close()is deferred after theio.ReadAllerror check. IfReadAllfails, the deferred close still runs, but ifNewReaderhad succeeded andReadAllfails, that's fine. However, thedefershould be placed immediately after the successfulNewReadercall, beforeReadAll, per Go convention. IfReadAllreturns both data and an error, you'd process partial data before checking the error.validateProtoOuterdoesn't verify SHA-256. It checks the size but never actually computes and compares the SHA-256 hash of the decompressed inner data againstm.pbOuter.Sha256. The hash is computed during generation but never verified during deserialization.pathIsHiddenreturns incorrect results for paths starting with./: The test assertspathIsHidden("./a/b/c/hello.txt")is false, and indeedpath.Cleanstrips the./, butpath.Clean(".")returns"."which starts with.— sopathIsHidden(".")would return true, which is arguably wrong for the current directory.generate()is not idempotent on timestamp.generateInner()usestime.Now()— callinggenerate()twice or at different times produces different manifests for the same input. The README mentions wanting deterministic/reproducible manifests as a goal but there's no option to omit or fix the timestamp.WriteToreusesm.outputbuffer. After callingWriteTo, the buffer's read position advances. A second call toWriteToworks only becausegenerate()is skipped (sincepbOuteris already set) andm.output.Bytes()returns the full backing slice regardless of read position. This works by accident, not by design.2.5 Code Structure
manifesttype with exported constructors (New,NewFromPaths, etc.) — this is an unusual Go pattern. It prevents users from declaring the type in their own code (e.g.,var m *mfer.manifestdoesn't work from outside the package). Either export the type or use an interface.internal/logwrapsapex/log, butinternal/cli/check.goandinternal/cli/fetch.goimportapex/logdirectly, bypassing the wrapper.internal/borkpackage definesErrMissingMagicbut it's never used —deserialize.gouseserrors.New("invalid file format")instead.2.6 Test Coverage
bork,cli,log,main) just asserttrue— placeholder tests that verify nothing.2.7 Dependencies & Build
go 1.17is ancient (EOL). Current Go is 1.22+.vendor.tzst,modcache.tzst) committed to the repo — unconventional and makes the repo large. Standardgo mod vendoror module proxy would be more maintainable.sneak/builder:2022-12-08— a custom base image that may not be publicly available, making the build non-reproducible for outside contributors.3. Security Considerations
fetchimplementation must sanitize paths, but the format doesn't mandate it.io.ReadAllon the gzip reader has no limit.checkcommand silently returns success without actually checking anything.4. Recommendations
Immediate (before any release)
ls -lR.checkcommand to either work or return an error.gen.go...components, and normalize separators.Format changes (before 1.0)
MFFilePath.\x00MF\x01— null byte to prevent text editor misidentification, format identifier, version byte).Engineering
manifesttype or provide an interface.NewFromProto.ErrMissingMagicfrom the bork package or remove it.5. Summary
The problem statement is compelling and the project addresses a genuine need. The format design is reasonable in broad strokes but under-specified in critical details (path handling, signature semantics). The implementation is very early-stage — the two most important features (checksums and verification) are unimplemented, there are several bugs, and the
checkcommand is silently broken in a way that could give users false confidence in their data integrity.Before pursuing adoption or a 1.0 release, the project needs: (1) actually computing and verifying checksums, (2) a proper format specification, (3) path security hardening, and (4) meaningful test coverage. The bones are there, but there's significant work remaining to deliver on the stated goals.
redo this analysis with an understanding that those panics and fixmes are a list of unimplemented items, not a failure of implementation. obviously the implementation is incomplete; this is not working code. you are to analyze and criticize the design of the system and the parts that are already implemented. create an ordered itemized implementation plan for getting to a 1.0. let's go with zstd instead of gzip.
the outer envelope is for cryptographic integrity. if you know a way to do that without double protobufs, please let me know. no to cbor or json - the reasons for using protobuf are clear. i agree with determinism.
please also redo your analysis on branch
next-mainis way out of date.Revised Design Review: mfer (
nextbranch)Reviewed: all source on
nextbranch, including protobuf schema,mfer/package,internal/packages, CLI, and tests.1. Format Design Critique
1.1 The Protobuf Schema
The schema is clean and well-structured. Specific observations:
Good decisions:
MFFileOuter(envelope) fromMFFile(content) is the right call for enabling signatures over a stable byte representation.CompressionTypeenum withCOMPRESSION_ZSTD— good, zstd is the right choice. ✓Concerns:
MFFileChecksumis an unnecessary indirection. It's a message with a singlebytes multiHashfield. Since multihash already self-describes the algorithm,repeated bytes hashes = 3onMFFilePathwould be equivalent and simpler. The extra message layer adds protobuf overhead (tag+length for the wrapper message) for every hash of every file. At scale (millions of files), this is measurable. If you plan to add per-hash metadata later (e.g. averified_attimestamp), the wrapper is justified — otherwise simplify.No file permissions/mode field. For the archival use case (ExFAT drives, filesystem-independent checksums), this may not matter. But if mfer manifests are ever used for software distribution or filesystem restoration, the absence of permission bits will be a gap. Consider reserving a field number now even if you don't populate it yet (e.g.
optional uint32 mode = 305).atimein the schema is questionable. Access time is volatile, non-deterministic, and often disabled (noatime). Including it in manifests means two manifests of the same directory tree at different times will differ. If deterministic manifests are a goal (they should be),atimeshould be removed or explicitly documented as "never set by default."Path semantics are unspecified in the schema. The proto has
string pathbut no comments specifying: always forward-slash separated? Must be relative? No..components? UTF-8 NFC normalized? This is the single most dangerous under-specification — it directly affects security (path traversal) and cross-platform compatibility (macOS NFD vs Linux NFC). I'd add proto comments specifying the invariants, and validate in code.1.2 The Envelope / Inner Message Structure
The double-protobuf design serves a clear purpose: you need to sign bytes, and protobuf serialization is not canonically unique (field ordering, default-value omission, unknown fields). By serializing the inner
MFFileto bytes, compressing, then wrapping those bytes inMFFileOuter.innerMessage, you get a stable byte sequence to hash and sign. This is sound.However, there are alternatives worth considering:
Alternative: Canonical serialization + sign-then-wrap
Protobuf with deterministic marshaling (which you're already using via
proto.MarshalOptions{Deterministic: true}) produces stable output for a given message from the same implementation. The Go protobuf docs note this is not guaranteed across implementations or versions, but in practice:MFFilebytes directly, and structure the file as:MAGIC | SignedEnvelope(signature, signer, signingPubKey, compression_type, compressed_inner_bytes).This is essentially what you're doing now, but recognizing it explicitly as "the inner bytes ARE the canonical form" rather than treating it as an opaque blob. The double-protobuf is fine — I don't think there's a way to avoid it that doesn't sacrifice the integrity property. The key insight is that any sign-then-wrap scheme needs a stable byte representation of the signed content, and protobuf messages aren't that (in the general case), so wrapping the serialized bytes is correct.
One concrete improvement: The
sha256field onMFFileOutercurrently hashes the compressed data. This is the right call — it lets you verify integrity before decompression (preventing decompression bombs from untrusted sources). But document this clearly, because it's a non-obvious choice. Someone reading the proto might assume it's a hash of the uncompressed inner message.Alternative: Detached signature file
For distribution scenarios, a detached
.mf.sigfile alongsideindex.mfwould allow verifying integrity without modifying the manifest format. This is simpler for tooling (any HTTP server can serve both files) and follows theSHASUMS+SHASUMS.ascpattern that users already understand. The embedded signature is better for single-file distribution though, so supporting both modes might be ideal.1.3 Magic Bytes
ZNAVSRFGworks. It's 8 bytes, ASCII-printable, and unlikely to collide. The ROT13 origin is irrelevant to functionality. My only suggestion: consider following it with a version byte (0x01) to enable future framing changes without requiring protobuf parsing to detect the version. CurrentlyMFFileOuter.Versionserves this purpose but requires successful deserialization to read.2. Implemented Code Quality
Focusing on code that IS implemented and working (not stubs):
2.1 Serialization / Deserialization (serialize.go, deserialize.go)
Well done:
Issues:
signatureString()format is ad-hoc. The canonical stringMAGIC-UUID-MULTIHASHis fine, but hex-encoding the UUID and multihash means the signature covers a derived representation rather than raw bytes. This works but adds an unnecessary transformation. Signing the rawsha256bytes (or the compressedinnerMessagebytes directly) would be simpler and avoid potential encoding ambiguity. If you keep the string format, document it as part of the spec.No decompression size limit.
io.ReadAll(zr)indeserializeInner()will happily decompress into unbounded memory. Thesizefield exists — use it as a limit:io.LimitReader(zr, m.pbOuter.Size + some_slack). This prevents decompression bombs from malicious manifests.The
sizefield stores uncompressed size but is checked after full decompression. You could check the zstd content size header (zstd frames include the uncompressed size) before decompressing, for early rejection of tampered manifests.2.2 Builder (builder.go)
Well done:
AddFile(hashing reader) andAddFileWithHash(pre-computed) ✓AddFileWithHash✓Issues:
AddFiledoesn't verify that bytes read match the declaredsize. The caller passessizeas a parameter, andAddFilereads from the reader, but never checks thattotalRead == FileSize(size). If they diverge (e.g., file changed between stat and read), the manifest entry has asizethat doesn't match the hash. Add a post-read size check.64KB read buffer is fine but could be documented. The chunk size affects progress granularity. For very large files, 256KB or 1MB would reduce syscall overhead. Minor.
2.3 Scanner (scanner.go)
Well done:
IsHiddenPathis correct and well-tested ✓Issues:
totalBytesis tracked incrementally in Scanner but recalculated inToManifest. The scanner maintainss.totalBytesduring enumeration, butToManifestre-sums all file sizes. Use the cached value.No deterministic file ordering.
afero.Walkreturns files in filesystem order, which varies by OS and filesystem. For reproducible manifests, sorts.filesby path before writing. This is critical for determinism.Race condition in progress goroutine. In
ToManifest, the progress goroutine capturesscannedFilesandscannedBytesfrom the outer scope without synchronization. These are modified by the main loop. Use atomic operations or pass values through the channel.2.4 Checker (checker.go)
Well done:
FindExtraFilesfor detecting undeclared files ✓Issues:
errors.Is(err, errors.New("file does not exist"))always returns false (line incheckFile).errors.Newcreates a unique error value each call;errors.Iscompares by identity, not string. The fallbackafero.Existscall saves it, but the first comparison is dead code. Useos.IsNotExist(err)orerrors.Is(err, fs.ErrNotExist).No progress rate-limiting in Checker. Unlike Scanner which sends progress at most once per second, Checker sends a progress update for every single file. For manifests with millions of small files, this floods the channel. Add the same 1-second throttling.
2.5 GPG Integration (gpg.go)
Well done:
defer os.RemoveAll✓Issues:
Shelling out to
gpgis fragile. The binary might not be installed, or might be a different version with different output formats. Consider usinggolang.org/x/crypto/openpgp(deprecated but stable) or the maintained forkgithub.com/ProtonMail/go-cryptofor a pure-Go implementation. This would also make cross-compilation and testing much easier.No timeout on GPG subprocess. A hung GPG agent could block forever. Use
exec.CommandContextwith a timeout.GPG colon-delimited output parsing is brittle. If GPG changes its output format (it has before), the fingerprint extraction breaks silently. At minimum, add a test that verifies the parsing against known GPG output.
2.6 Fetch (internal/cli/fetch.go)
Well done:
sanitizePathwith good test coverage ✓Issues:
fileURL := baseURL.String() + f.Path— this concatenates the file path without URL-encoding. Paths with spaces,#,?, or other URL-special characters will break. Useurl.JoinPath()or theBaseURL.JoinPathmethod that already exists inurl.go.No concurrent downloads. For fetching many small files over high-latency connections, sequential HTTP GETs are slow. A worker pool with configurable concurrency would significantly improve fetch performance. (Not blocking for 1.0, but worth noting.)
No retry logic. A single HTTP failure aborts the entire fetch. At minimum, retry transient errors (5xx, timeouts) with exponential backoff.
2.7 Code Duplication
There are two parallel implementations of Scanner and Checker:
mfer/scanner.go+mfer/checker.go(typed with custom types likeFileSize,RelFilePath)internal/scanner/scanner.go+internal/checker/checker.go(using rawint64,string)The
internal/versions appear to be older and are used by some CLI code while themfer/versions are newer and better-typed. Pick one and delete the other. Themfer/package versions are superior — they use semantic types and have better progress reporting.2.8 Freshen Command (internal/cli/freshen.go)
Well done:
Issue: The
IncludeDotfilesandFollowSymLinksflag names use Go-style PascalCase instead of kebab-case. The--IncludeDotfilesflag looks odd on the command line. Thegeneratecommand has the same issue. UseName: "include-dotfiles"as the primary with the Go name as an alias (which you already do for generate, but freshen readsctx.Bool("IncludeDotfiles")which would need the PascalCase form).3. The Double-Protobuf Question
Is there a cleaner way to achieve cryptographic integrity over a protobuf message?
The core problem: protobuf serialization is not guaranteed to be canonical across implementations. Two implementations serializing the same logical message may produce different bytes. Therefore, you can't sign a protobuf message — you must sign specific bytes.
Your current approach (serialize inner → compress → wrap bytes in outer → sign) is the standard solution. Alternatives:
COSE (CBOR Object Signing and Encryption) — ruled out since you want protobuf.
Protobuf Canonical Encoding: There is no official protobuf canonical form. Go's
Deterministic: trueis implementation-specific. However, you could define your own canonical rules (fields in number order, no unknown fields, varint minimal encoding) and document them as part of the mfer spec. This would let you sign the inner bytes directly without a wrapper. But you'd still need to store the signature somewhere, which means... an outer message. So you end up with the same structure.Sign the compressed bytes, store signature separately:
MAGIC | compressed_inner_bytes | signature_trailer. This avoids the outer protobuf but requires custom framing for the trailer. Net complexity is similar.Verdict: The double-protobuf is the right approach. It's well-understood (TLS, JWT, and many other protocols use sign-then-wrap), easy to implement in other languages, and the overhead of the outer protobuf envelope is negligible. Don't change it.
4. Implementation Plan to 1.0
Ordered by dependency and priority. Each item should be a PR.
Phase 1: Foundation (format correctness)
Delete
internal/scanner/andinternal/checker/— consolidate on themfer/package versions. Update CLI code to usemfer.Scannerandmfer.Checker.Add deterministic file ordering — sort file entries by path (lexicographic, byte-order) before serialization in
Builder.Build(). Add a test that generates a manifest twice and asserts byte-identical output.Add decompression size limit — use
io.LimitReaderindeserializeInner()withm.pbOuter.Sizeas the bound.Fix
errors.Isdead code in checker — replace withos.IsNotExist(err)orerrors.Is(err, fs.ErrNotExist).Fix
AddFileto verify size — after reading, checktotalRead == sizeand return an error on mismatch.Specify path invariants — add comments to proto: paths are UTF-8, forward-slash separated, relative, no
..components, no leading/. Validate inBuilder.AddFileandBuilder.AddFileWithHash.Phase 2: CLI polish
Fix flag naming — ensure all CLI flags use kebab-case as primary names (
--include-dotfiles,--follow-symlinks). Read with kebab-case in code.Fix URL construction in fetch — use
BaseURL.JoinPath()orurl.JoinPath()instead of string concatenation.Add progress rate-limiting to Checker — throttle to once per second, matching Scanner behavior.
Add
--deterministicflag (or make it default) — omitcreatedAttimestamp, sort files. For reproducible manifests.Phase 3: Robustness
Replace GPG subprocess with pure-Go crypto — use
github.com/ProtonMail/go-cryptofor OpenPGP operations, or switch to Ed25519/signify (simpler, no key management complexity). This removes the runtime dependency ongpgbeing installed.Add timeout to any remaining subprocess calls.
Add fuzzing tests for
NewManifestFromReader— this parses untrusted input and should be fuzz-tested.Add retry logic to fetch — exponential backoff for transient HTTP errors.
Phase 4: Format finalization
Remove
atimefrom proto (or document it as deprecated/never-set). It hurts determinism.Reserve
optional uint32 mode = 305inMFFilePathfor future file permission support.Add a version byte after magic —
ZNAVSRFG\x01for format version 1. Allows future framing changes.Write a format specification document — separate from README. Cover: magic format, outer message structure, compression, inner message structure, path invariants, signature scheme, canonical serialization requirements.
Phase 5: Release prep
Update Go module path if needed (currently
sneak.berlin/go/mferin some imports,git.eeqj.de/sneak/mferin proto).Audit all error messages for consistency and helpfulness.
Add
--versionoutput to match SemVer format.Tag v1.0.0.
5. Summary
The
nextbranch represents a significant leap frommain. The core features work: manifest generation with SHA-256 multihash checksums, verification, fetch with integrity checking, incremental freshen, GPG signing/verification, zstd compression, and comprehensive tests (including a 500-iteration corruption detection test — nice).The format design is sound. The double-protobuf envelope is the correct approach for cryptographic integrity over protobuf messages — I don't see a simpler alternative that preserves the same properties. The main design gaps are: unspecified path semantics (security risk), no decompression limit (DoS risk), and duplicated internal packages that should be consolidated.
The implemented code is generally high quality — good use of interfaces (
afero.Fs), proper concurrency patterns, atomic file operations, and thorough test coverage. The issues I've flagged are real but mostly minor: a deaderrors.Iscomparison, missing size verification inAddFile, URL encoding in fetch, and the internal package duplication.For 1.0, the biggest wins would be: consolidate the duplicate packages, add deterministic ordering, specify path rules, and write a format spec. The code is much closer to releasable than the previous review suggested.
create a TODO.md in the root of branch next with the implementation steps at the end of your comment. add a section at the top with all of the design questions you had for me - such as the implicit ones around atime and redundant fields. i'll answer them inline in the TODO file and we'll have them documented for posterity.
Created
TODO.mdon thenextbranch with:sneak: edit the file directly on
nextto answer the design questions inline under each> *answer:*block.Created TODO.md on a branch off
nextwith 10 design questions (with answer placeholders) and a 22-item phased implementation plan.PR: #11
Once you answer the design questions inline, I can unassign myself from this issue and start working through the implementation items.