TODO: mfer 1.0
Design Questions
These need answers before implementation. Respond inline.
1. Should atime be removed from the proto?
Access time is volatile, non-deterministic, and often disabled (noatime
mount option). Including it means two manifests of the same unchanged
directory tree generated at different times will differ. This hurts
deterministic/reproducible manifests. Recommendation: remove it or document
it as "never set by default, opt-in only."
Answer:
2. Should MFFileChecksum stay as a wrapper message or simplify to repeated bytes?
Currently MFFilePath.hashes is repeated MFFileChecksum where
MFFileChecksum contains a single bytes multiHash field. Since multihash
already self-describes the algorithm, repeated bytes hashes = 3 on
MFFilePath would be equivalent and save protobuf overhead (one fewer
tag+length per hash per file). The wrapper is justified if you plan to add
per-hash metadata later (e.g. verified_at timestamp). Otherwise it's
unnecessary indirection.
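The size difference can be checked with a little arithmetic on the protobuf wire format. The sketch below uses only the standard library (varints via encoding/binary) instead of a protobuf library. It assumes multiHash is field 1 inside MFFileChecksum (the field number is a guess) and that hashes is field 3 on MFFilePath, as in the text above.

```go
package main

import "encoding/binary"

// wireTypeLen is protobuf wire type 2: length-delimited (bytes, strings,
// embedded messages).
const wireTypeLen = 2

// varintLen returns the number of bytes x occupies as a protobuf varint.
func varintLen(x uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], x)
}

// lenFieldSize is the encoded size of one length-delimited field:
// tag varint + length varint + payload bytes.
func lenFieldSize(fieldNum uint64, payloadLen int) int {
	tag := fieldNum<<3 | wireTypeLen
	return varintLen(tag) + varintLen(uint64(payloadLen)) + payloadLen
}

// wrappedSize: MFFilePath.hashes (field 3) holds an MFFileChecksum
// message, which holds multiHash as bytes (field 1 is an assumption).
func wrappedSize(hashLen int) int {
	inner := lenFieldSize(1, hashLen)
	return lenFieldSize(3, inner)
}

// bareSize: hashes declared directly as "repeated bytes hashes = 3".
func bareSize(hashLen int) int {
	return lenFieldSize(3, hashLen)
}
```

For a SHA-256 multihash (2-byte header plus 32-byte digest, 34 bytes), wrappedSize(34) is 38 and bareSize(34) is 36. That is two bytes per hash per file before compression, and compression will likely narrow the gap further because those extra tag bytes are highly repetitive.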
Answer:
3. Should file permissions/mode be supported?
The proto has no field for Unix permission bits. For the archival use case
(exFAT drives, filesystem-independent checksums), this may not matter. But
for software distribution or filesystem restoration, the absence of
permission bits is a gap. Recommendation: reserve optional uint32 mode = 305 in MFFilePath now, even if it's not populated yet.
Answer:
4. Path semantics: what are the rules?
The proto has string path with no specified invariants. These need to be
documented and enforced:
- Always forward-slash separated? (even on Windows)
- Must be relative? (no leading /)
- No .. components? (path traversal prevention)
- UTF-8 NFC normalized? (macOS HFS+ stores names in NFD, while Linux keeps whatever bytes it is given, typically NFC, so the same filename can differ byte-for-byte)
- Max path length?
Answer:
5. Should the sha256 field on MFFileOuter hash compressed or uncompressed data?
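Currently it hashes the compressed inner bytes. This is actually the better choice: it lets you verify integrity before decompressing anything, so corrupt input (or, if the outer message is signed, tampered input from untrusted sources) is rejected before the decompressor runs. By itself it does not stop decompression bombs, since whoever crafts a file can also compute a matching hash, so a decompression size limit is still needed. But it's a non-obvious design decision that should be documented. Confirm this is intentional.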
Answer:
6. GPG vs signify (Ed25519) for signatures?
The current implementation shells out to gpg, which is fragile (binary
might not be installed, output format changes between versions). Options:
- Keep GPG: use github.com/ProtonMail/go-crypto for pure-Go OpenPGP (no subprocess, cross-platform, testable).
- Switch to signify/Ed25519: simpler, smaller keys, no key-management complexity. github.com/frankbraun/gosignify exists, and the README already mentions this as an option.
- Support both: more work, but maximum flexibility.
Answer:
7. Should manifests be deterministic by default?
If createdAt is populated and file order depends on filesystem walk order,
two runs of mfer gen on the same directory produce different manifests.
Recommendation: sort files lexicographically by path, omit createdAt
unless --timestamp is passed. Deterministic by default.
Answer:
8. Should the magic bytes include a version byte?
Currently: ZNAVSRFG (8 bytes), then protobuf. If the framing ever needs
to change (e.g. different compression framing, post-quantum signatures),
you'd need to parse the protobuf to discover the version. Adding a version
byte after the magic (ZNAVSRFG\x01) allows future format changes without
requiring protobuf parsing for version detection.
Answer:
9. Detached signature support?
For HTTP distribution, a detached .mf.sig file alongside index.mf would
let servers serve both without modifying the manifest itself. This follows
the SHASUMS + SHASUMS.asc pattern. The embedded signature is better for
single-file distribution. Worth supporting both modes?
Answer:
10. What about the duplicate internal packages?
There are two parallel implementations:
- mfer/scanner.go + mfer/checker.go (newer, better-typed with FileSize, RelFilePath)
- internal/scanner/ + internal/checker/ (older, raw types)
The mfer/ versions are superior. Should the internal/ versions be
deleted and CLI updated to use mfer/ package?
Answer:
Implementation Plan
Ordered by dependency and priority. Each item should be a PR.
Phase 1: Foundation (format correctness)
- Delete internal/scanner/ and internal/checker/ and consolidate on the mfer/ package versions. Update CLI code to use mfer.Scanner and mfer.Checker.
- Add deterministic file ordering: sort file entries by path (lexicographic, byte-order) before serialization in Builder.Build(). Add a test that generates a manifest twice and asserts byte-identical output.
- Add decompression size limit: use io.LimitReader in deserializeInner() with m.pbOuter.Size as the bound, reading one byte past it so oversized output is rejected rather than silently truncated.
- Fix errors.Is dead code in checker: replace with errors.Is(err, fs.ErrNotExist) (or os.IsNotExist(err), though that one does not unwrap wrapped errors).
- Fix AddFile to verify size: after reading, check totalRead == size and return an error on mismatch.
- Specify path invariants: add comments to proto and validate in Builder.AddFile/Builder.AddFileWithHash.
Phase 2: CLI polish
- Fix flag naming: ensure all CLI flags use kebab-case as primary names (--include-dotfiles, --follow-symlinks).
- Fix URL construction in fetch: use url.JoinPath() instead of string concatenation.
- Add progress rate-limiting to Checker: throttle to once per second, matching Scanner behavior.
- Add --deterministic flag (or make it default): omit createdAt timestamp, sort files.
Phase 3: Robustness
- Replace GPG subprocess with pure-Go crypto: use github.com/ProtonMail/go-crypto or switch to Ed25519/signify.
- Add timeout to any remaining subprocess calls.
- Add fuzzing tests for NewManifestFromReader, which parses untrusted input.
- Add retry logic to fetch: exponential backoff for transient HTTP errors.
Phase 4: Format finalization
- Remove or deprecate atime from proto (pending design question #1).
- Reserve optional uint32 mode = 305 in MFFilePath (pending design question #3).
- Add a version byte after magic (pending design question #8).
- Write a format specification document, separate from README.
Phase 5: Release prep
- Reconcile module path (sneak.berlin/go/mfer vs git.eeqj.de/sneak/mfer).
- Audit all error messages for consistency.
- Ensure --version output matches SemVer format.
- Tag v1.0.0.