1
0
forked from sneak/mfer
mfer/TODO.md
2026-02-09 02:15:58 +01:00

6.5 KiB

TODO: mfer 1.0

Design Questions

sneak: please answer inline below each question. These are preserved for posterity.

Format Design

1. Should MFFileChecksum be simplified? Currently it's a separate message wrapping a single bytes multiHash field. Since multihash already self-describes the algorithm, repeated bytes hashes directly on MFFilePath would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like verified_at)?

answer:

2. Should file permissions/mode be stored? The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. optional uint32 mode = 305) even if we don't populate it yet?

answer:

3. Should atime be removed from the schema? Access time is volatile, non-deterministic, and often disabled (noatime). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?

answer:

4. What are the path normalization rules? The proto has string path with no specification about: always forward-slash? Must be relative? No .. components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?

answer:

5. Should we add a version byte after the magic? Currently ZNAVSRFG is followed immediately by protobuf. Adding a version byte (ZNAVSRFG\x01) would allow future framing changes without requiring protobuf parsing to detect the version. MFFileOuter.Version serves this purpose but requires successful deserialization to read. Worth the extra byte?

answer:

6. Should we add a length-prefix after the magic? Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?

answer:

Signature Design

7. What does the outer SHA-256 hash cover — compressed or uncompressed data? The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior?

answer:

8. Should signatureString() sign raw bytes instead of a hex-encoded string? Currently the canonical string is MAGIC-UUID-MULTIHASH with hex encoding, which adds a transformation layer. Signing the raw sha256 bytes (or compressed innerMessage directly) would be simpler. Keep the string format or switch to raw bytes?

answer:

9. Should we support detached signature files (.mf.sig)? Embedded signatures are better for single-file distribution. Detached .mf.sig files follow the familiar SHASUMS/SHASUMS.asc pattern and are simpler for HTTP serving. Support both modes?

answer:

10. GPG vs pure-Go crypto for signatures? Shelling out to gpg is fragile (may not be installed, version-dependent output). github.com/ProtonMail/go-crypto provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction?

answer:

Implementation Design

11. Should manifests be deterministic by default? This means: sort file entries by path, omit createdAt timestamp (or make it opt-in), no atime. Should determinism be the default, with a --include-timestamps flag to opt in?

answer:

12. Should we consolidate or keep both scanner/checker implementations? There are two parallel implementations: mfer/scanner.go + mfer/checker.go (typed with FileSize, RelFilePath) and internal/scanner/ + internal/checker/ (raw int64, string). The mfer/ versions are superior. Delete the internal/ versions?

answer:

13. Should the manifest type be exported? Currently unexported with exported constructors (New, NewFromPaths, etc.). Consumers can't declare var m *mfer.manifest. Export the type, or define an interface?

answer:

14. What should the Go module path be for 1.0? Currently mixed between sneak.berlin/go/mfer and git.eeqj.de/sneak/mfer. Which is canonical?

answer:


Implementation Plan

Phase 1: Foundation (format correctness)

  • Delete internal/scanner/ and internal/checker/ — consolidate on mfer/ package versions; update CLI code
  • Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in Builder.Build(); add test asserting byte-identical output from two runs
  • Add decompression size limit — io.LimitReader in deserializeInner() with m.pbOuter.Size as bound
  • Fix errors.Is dead code in checker — replace with os.IsNotExist(err) or errors.Is(err, fs.ErrNotExist)
  • Fix AddFile to verify size — check totalRead == size after reading, return error on mismatch
  • Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no .., no leading /); validate in Builder.AddFile and Builder.AddFileWithHash

Phase 2: CLI polish

  • Fix flag naming — all CLI flags use kebab-case as primary (--include-dotfiles, --follow-symlinks)
  • Fix URL construction in fetch — use BaseURL.JoinPath() or url.JoinPath() instead of string concatenation
  • Add progress rate-limiting to Checker — throttle to once per second, matching Scanner
  • Add --deterministic flag (or make it default) — omit createdAt, sort files

Phase 3: Robustness

  • Replace GPG subprocess with pure-Go crypto — github.com/ProtonMail/go-crypto or Ed25519/signify
  • Add timeout to any remaining subprocess calls
  • Add fuzzing tests for NewManifestFromReader
  • Add retry logic to fetch — exponential backoff for transient HTTP errors

Phase 4: Format finalization

  • Remove or deprecate atime from proto (pending design question answer)
  • Reserve optional uint32 mode = 305 in MFFilePath for future file permissions
  • Add version byte after magic — ZNAVSRFG\x01 for format version 1
  • Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization

Phase 5: Release prep

  • Finalize Go module path
  • Audit all error messages for consistency and helpfulness
  • Add --version output matching SemVer
  • Tag v1.0.0