TODO: mfer 1.0
Design Questions
sneak: please answer inline below each question. These are preserved for posterity.
Format Design
1. Should MFFileChecksum be simplified?
Currently it's a separate message wrapping a single bytes multiHash field. Since multihash already self-describes the algorithm, repeated bytes hashes directly on MFFilePath would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like verified_at)?
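A multihash is varint(algorithm code) || varint(digest length) || digest, so a bare bytes value already identifies the algorithm that produced it. The stdlib-only sketch below builds and parses that layout. The helper names are hypothetical (mfer may use a multihash library instead), and 0x12 is the multicodec code for sha2-256.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// multicodecSHA256 is the multicodec identifier for sha2-256.
const multicodecSHA256 = 0x12

// encodeMultihash builds varint(code) || varint(len(digest)) || digest.
func encodeMultihash(code uint64, digest []byte) []byte {
	buf := binary.AppendUvarint(nil, code)
	buf = binary.AppendUvarint(buf, uint64(len(digest)))
	return append(buf, digest...)
}

// decodeMultihash splits a multihash back into its algorithm code and digest.
func decodeMultihash(mh []byte) (uint64, []byte, error) {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return 0, nil, errors.New("malformed code varint")
	}
	length, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return 0, nil, errors.New("malformed length varint")
	}
	digest := mh[n+m:]
	if uint64(len(digest)) != length {
		return 0, nil, fmt.Errorf("digest is %d bytes, header says %d", len(digest), length)
	}
	return code, digest, nil
}

func main() {
	sum := sha256.Sum256([]byte("example file contents"))
	mh := encodeMultihash(multicodecSHA256, sum[:])
	code, digest, err := decodeMultihash(mh)
	fmt.Printf("len=%d code=0x%x match=%v err=%v\n", len(mh), code, bytes.Equal(digest, sum[:]), err)
}
```

With this layout, a repeated bytes field on MFFilePath could hold hashes from several algorithms side by side with no wrapper message. The extra layer only pays off if per-hash metadata is planned.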
answer:
2. Should file permissions/mode be stored?
The format stores mtime/ctime but not Unix file permissions. For archival use (exFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. optional uint32 mode = 305) even if we don't populate it yet?
answer:
3. Should atime be removed from the schema?
Access time is volatile, non-deterministic, and often disabled (noatime). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?
answer:
4. What are the path normalization rules?
The proto has a string path field but does not specify whether paths must use forward slashes, must be relative, or may contain .. components. It also does not specify UTF-8 NFC vs NFD normalization (macOS vs Linux) or a maximum length. This is both a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?
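One candidate rule set can be enforced with the standard library alone: valid UTF-8, forward slashes only, relative, already in path.Clean form, and no . or .. components. This is a sketch of a hypothetical validateManifestPath that Builder.AddFile could call, not existing mfer code. NFC normalization is left out because it needs golang.org/x/text/unicode/norm, and a maximum length would be one more check.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
	"unicode/utf8"
)

// validateManifestPath checks a hypothetical set of manifest path invariants.
func validateManifestPath(p string) error {
	switch {
	case p == "":
		return errors.New("empty path")
	case !utf8.ValidString(p):
		return errors.New("path is not valid UTF-8")
	case strings.ContainsRune(p, 0):
		return errors.New("path contains a NUL byte")
	case strings.Contains(p, `\`):
		return errors.New("path contains a backslash; use forward slashes")
	case strings.HasPrefix(p, "/"):
		return errors.New("path must be relative")
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == "." || seg == ".." {
			return fmt.Errorf("path contains a %q component", seg)
		}
	}
	// Rejects empty segments ("a//b") and trailing slashes ("a/").
	if clean := path.Clean(p); clean != p {
		return fmt.Errorf("path is not in clean form (expected %q)", clean)
	}
	return nil
}

func main() {
	for _, p := range []string{"docs/readme.txt", "../etc/passwd", "/etc/passwd", "a//b", `a\b`, "a/./b"} {
		fmt.Printf("%-18q -> %v\n", p, validateManifestPath(p))
	}
}
```

Rejecting backslashes is a policy choice. They are legal in Unix file names but act as separators on Windows, so allowing them would make the same manifest mean different things on different platforms.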
answer:
5. Should we add a version byte after the magic?
Currently ZNAVSRFG is followed immediately by protobuf. Adding a version byte (ZNAVSRFG\x01) would allow future framing changes without requiring protobuf parsing to detect the version. MFFileOuter.Version serves this purpose but requires successful deserialization to read. Worth the extra byte?
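To make the version-byte option concrete, here is a sketch with hypothetical names (writeHeader, readHeader, formatV1). It shows that a reader can branch on the format version after a 9-byte prefix check, before any protobuf decoding.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

const magic = "ZNAVSRFG"

// formatV1 is a hypothetical version byte value.
const formatV1 byte = 0x01

// writeHeader prepends the magic and a version byte to an already serialized payload.
func writeHeader(version byte, payload []byte) []byte {
	out := make([]byte, 0, len(magic)+1+len(payload))
	out = append(out, magic...)
	out = append(out, version)
	return append(out, payload...)
}

// readHeader checks the magic and returns the version byte and the remaining
// payload without touching protobuf.
func readHeader(data []byte) (byte, []byte, error) {
	if !bytes.HasPrefix(data, []byte(magic)) {
		return 0, nil, errors.New("not an mfer manifest: bad magic")
	}
	if len(data) < len(magic)+1 {
		return 0, nil, errors.New("truncated header: missing version byte")
	}
	return data[len(magic)], data[len(magic)+1:], nil
}

func main() {
	file := writeHeader(formatV1, []byte("protobuf bytes would go here"))
	version, payload, err := readHeader(file)
	if err != nil {
		panic(err)
	}
	switch version {
	case formatV1:
		fmt.Printf("v1 payload: %d bytes\n", len(payload))
	default:
		fmt.Printf("unsupported format version %d\n", version)
	}
}
```

A version value of 0x01 has a convenient property. Read as the first byte of a protobuf payload, it would be a tag for field number 0, which is invalid. A reader should therefore be able to tell a versioned file from a legacy one whose protobuf starts right after the magic.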
answer:
6. Should we add a length-prefix after the magic?
Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?
answer:
Signature Design
7. Does the outer SHA-256 hash cover compressed or uncompressed data?
The review notes that it currently hashes compressed data (good for verifying before decompression), but this should be documented explicitly. Which is the intended behavior?
answer:
8. Should signatureString() sign raw bytes instead of a hex-encoded string?
Currently the canonical string is MAGIC-UUID-MULTIHASH with hex encoding, which adds a transformation layer. Signing the raw sha256 bytes (or compressed innerMessage directly) would be simpler. Keep the string format or switch to raw bytes?
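A raw-bytes message can stay unambiguous without hex encoding or separators if every field has a fixed width. This sketch assumes a hypothetical 56-byte layout: magic (8 bytes) || UUID (16 bytes) || SHA-256 of the compressed inner message (32 bytes). The leading magic also acts as a domain-separation tag. signingMessage is an illustrative name, not mfer's current signatureString().

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const magic = "ZNAVSRFG"

// signingMessage builds a hypothetical fixed-layout message to sign:
// magic || UUID || SHA-256(compressed inner message).
func signingMessage(id [16]byte, compressedInner []byte) []byte {
	sum := sha256.Sum256(compressedInner)
	msg := make([]byte, 0, len(magic)+len(id)+len(sum))
	msg = append(msg, magic...)
	msg = append(msg, id[:]...)
	return append(msg, sum[:]...)
}

func main() {
	var id [16]byte // would be the manifest's UUID
	msg := signingMessage(id, []byte("compressed inner message"))
	fmt.Printf("%d bytes: %x\n", len(msg), msg)
}
```

If the multihash is kept instead of the bare digest, the message remains unambiguous because the multihash's length varint delimits it.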
answer:
9. Should we support detached signature files (.mf.sig)?
Embedded signatures are better for single-file distribution. Detached .mf.sig files follow the familiar SHASUMS/SHASUMS.asc pattern and are simpler for HTTP serving. Support both modes?
answer:
10. GPG vs pure-Go crypto for signatures?
Shelling out to gpg is fragile (it may not be installed, and its output is version-dependent). github.com/ProtonMail/go-crypto provides pure-Go OpenPGP, or we could use Ed25519/signify (simpler: bare keys, no keyring or web-of-trust management). Which direction?
answer:
Implementation Design
11. Should manifests be deterministic by default?
This means: sort file entries by path, omit createdAt timestamp (or make it opt-in), no atime. Should determinism be the default, with a --include-timestamps flag to opt in?
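Here is a sketch of the proposed default. It sorts by path using Go's byte-wise string comparison and leaves timestamps out, so two runs over the same input produce identical bytes. The entry type and the tab-separated text serialization are hypothetical stand-ins for the protobuf builder.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical stand-in for a manifest file record.
type entry struct {
	Path string
	Size int64
}

// build sorts a copy of entries by path (Go string comparison is byte-wise
// lexicographic) and serializes them with no timestamps.
func build(entries []entry) []byte {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Path < sorted[j].Path })
	var b strings.Builder
	for _, e := range sorted {
		fmt.Fprintf(&b, "%s\t%d\n", e.Path, e.Size)
	}
	return []byte(b.String())
}

func main() {
	a := build([]entry{{"b.txt", 2}, {"a/z.txt", 1}, {"B.txt", 3}})
	b := build([]entry{{"B.txt", 3}, {"b.txt", 2}, {"a/z.txt", 1}})
	fmt.Printf("identical=%v sha256=%x\n", string(a) == string(b), sha256.Sum256(a))
	fmt.Print(string(a))
}
```

Byte order puts "B.txt" before "a/z.txt" and "b.txt", which differs from locale-aware sorting. If the builder uses google.golang.org/protobuf, setting proto.MarshalOptions{Deterministic: true} is also worth considering. That option is documented as deterministic only within a single binary, not canonical across languages or library versions.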
answer:
12. Should we consolidate or keep both scanner/checker implementations?
There are two parallel implementations: mfer/scanner.go + mfer/checker.go (typed with FileSize, RelFilePath) and internal/scanner/ + internal/checker/ (raw int64, string). The mfer/ versions are superior. Delete the internal/ versions?
answer:
13. Should the manifest type be exported?
Currently unexported with exported constructors (New, NewFromPaths, etc.). Consumers can't declare var m *mfer.manifest. Export the type, or define an interface?
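Here is one of the two options, sketched with hypothetical names. The concrete type stays unexported, but constructors return an exported Manifest interface, with a compile-time assertion that the concrete type satisfies it. Callers in other packages could then write var m mfer.Manifest.

```go
package main

import "fmt"

// manifest stands in for the current unexported type.
type manifest struct {
	paths []string
}

// Manifest is a hypothetical exported interface covering what consumers need.
type Manifest interface {
	FileCount() int
}

func (m *manifest) FileCount() int { return len(m.paths) }

// Compile-time check that *manifest satisfies Manifest.
var _ Manifest = (*manifest)(nil)

// NewManifest is a hypothetical constructor that returns the interface type.
func NewManifest(paths ...string) Manifest {
	return &manifest{paths: append([]string(nil), paths...)}
}

func main() {
	var m Manifest = NewManifest("a.txt", "b.txt")
	fmt.Println(m.FileCount())
}
```

The trade-off: once an exported interface exists, adding a method to it breaks any outside type that implements it, whereas adding a method to an exported struct does not. That asymmetry is why Go code often exports the concrete type instead.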
answer:
14. What should the Go module path be for 1.0?
Currently mixed between sneak.berlin/go/mfer and git.eeqj.de/sneak/mfer. Which is canonical?
answer:
Implementation Plan
Phase 1: Foundation (format correctness)
- Delete internal/scanner/ and internal/checker/: consolidate on mfer/ package versions; update CLI code
- Add deterministic file ordering: sort entries by path (lexicographic, byte-order) in Builder.Build(); add test asserting byte-identical output from two runs
- Add decompression size limit: io.LimitReader in deserializeInner() with m.pbOuter.Size as bound
- Fix errors.Is dead code in checker: replace with os.IsNotExist(err) or errors.Is(err, fs.ErrNotExist)
- Fix AddFile to verify size: check totalRead == size after reading, return error on mismatch
- Specify path invariants: add proto comments (UTF-8, forward-slash, relative, no .., no leading /); validate in Builder.AddFile and Builder.AddFileWithHash
Phase 2: CLI polish
- Fix flag naming: all CLI flags use kebab-case as primary (--include-dotfiles, --follow-symlinks)
- Fix URL construction in fetch: use BaseURL.JoinPath() or url.JoinPath() instead of string concatenation
- Add progress rate-limiting to Checker: throttle to once per second, matching Scanner
- Add --deterministic flag (or make it default): omit createdAt, sort files
Phase 3: Robustness
- Replace GPG subprocess with pure-Go crypto: github.com/ProtonMail/go-crypto or Ed25519/signify
- Add timeout to any remaining subprocess calls
- Add fuzzing tests for NewManifestFromReader
- Add retry logic to fetch: exponential backoff for transient HTTP errors
Phase 4: Format finalization
- Remove or deprecate atime from proto (pending design question answer)
- Reserve optional uint32 mode = 305 in MFFilePath for future file permissions
- Add version byte after magic: ZNAVSRFG\x01 for format version 1
- Write format specification document (separate from README): magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization
Phase 5: Release prep
- Finalize Go module path
- Audit all error messages for consistency and helpfulness
- Add --version output matching SemVer
- Tag v1.0.0