TODO: mfer 1.0

Design Questions

These need answers before implementation. Respond inline.

1. Should atime be removed from the proto?

Access time is volatile, non-deterministic, and often disabled (noatime mount option). Including it means two manifests of the same unchanged directory tree, generated at different times, can differ, because merely reading the files updates atime on many systems. This hurts deterministic/reproducible manifests. Recommendation: remove it or document it as "never set by default, opt-in only."

Answer:

2. Should MFFileChecksum stay as a wrapper message or simplify to repeated bytes?

Currently MFFilePath.hashes is repeated MFFileChecksum where MFFileChecksum contains a single bytes multiHash field. Since multihash already self-describes the algorithm, repeated bytes hashes = 3 on MFFilePath would be equivalent and save protobuf overhead (one fewer tag+length per hash per file). The wrapper is justified if you plan to add per-hash metadata later (e.g. verified_at timestamp). Otherwise it's unnecessary indirection.

Answer:

3. Should file permissions/mode be supported?

The proto has no field for Unix permission bits. For the archival use case (ExFAT drives, filesystem-independent checksums), this may not matter. But for software distribution or filesystem restoration, the absence of permission bits is a gap. Recommendation: reserve optional uint32 mode = 305 in MFFilePath now, even if it's not populated yet.

Answer:

4. Path semantics: what are the rules?

The proto has string path with no specified invariants. These need to be documented and enforced:

  • Always forward-slash separated? (even on Windows)
  • Must be relative? (no leading /)
  • No .. components? (path traversal prevention)
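  • UTF-8 NFC normalized? (HFS+ on macOS stores names in a decomposed, NFD-like form, while Linux filesystems store whatever bytes they are given, usually NFC, so the same filename can differ in bytes)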
  • Max path length?
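
A minimal sketch of how the first three rules could be enforced, assuming the answers are "forward slashes only, relative only, no `.` or `..` segments". `validateManifestPath` is a hypothetical helper, not an existing mfer function. NFC normalization and a maximum length are not checked here: the former would need `golang.org/x/text/unicode/norm`, and the latter depends on a limit that has not been chosen yet.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// validateManifestPath is a hypothetical helper (not part of mfer) that
// rejects paths violating the candidate invariants listed above.
func validateManifestPath(p string) error {
	switch {
	case p == "":
		return errors.New("empty path")
	case !utf8.ValidString(p):
		return errors.New("path is not valid UTF-8")
	case strings.ContainsRune(p, '\\'):
		return errors.New("path contains a backslash")
	case strings.HasPrefix(p, "/"):
		return errors.New("path is absolute")
	}
	// Check every slash-separated segment for traversal or emptiness.
	for _, seg := range strings.Split(p, "/") {
		switch seg {
		case "":
			return errors.New("path contains an empty segment")
		case ".", "..":
			return fmt.Errorf("path contains %q segment", seg)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"a/b.txt", "/etc/passwd", "a/../b", "a\\b"} {
		fmt.Printf("%q: %v\n", p, validateManifestPath(p))
	}
}
```

Rejecting backslashes outright is one possible policy; converting them to forward slashes at scan time on Windows would be another.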

Answer:

5. Should the sha256 field on MFFileOuter hash compressed or uncompressed data?

Currently it hashes the compressed inner bytes. This is the better choice: integrity can be verified before decompression, so corrupted or tampered input is rejected before it reaches the decompressor (which limits exposure to decompression bombs from untrusted sources). It is a non-obvious design decision, though, and should be documented. Confirm this is intentional.
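
The verify-then-decompress order can be sketched as below. `checkCompressedPayload` and `errHashMismatch` are hypothetical names, and the payload in `main` is a plain byte string standing in for the compressed inner message. The point is only that the hash check runs on the bytes exactly as they arrived, before any decompressor sees them.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// errHashMismatch is a hypothetical sentinel error for this sketch.
var errHashMismatch = errors.New("sha256 mismatch on compressed payload")

// checkCompressedPayload (hypothetical) compares the SHA-256 of the
// still-compressed bytes against the value stored in the outer message.
// Callers would decompress only after this returns nil.
func checkCompressedPayload(compressed, wantSHA256 []byte) error {
	sum := sha256.Sum256(compressed)
	if !bytes.Equal(sum[:], wantSHA256) {
		return errHashMismatch
	}
	return nil
}

func main() {
	payload := []byte("stand-in for compressed inner bytes")
	sum := sha256.Sum256(payload)

	// Intact payload, then a payload with one extra byte appended.
	fmt.Println(checkCompressedPayload(payload, sum[:]))
	fmt.Println(checkCompressedPayload(append(payload, 0), sum[:]))
}
```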

Answer:

6. GPG vs signify (Ed25519) for signatures?

The current implementation shells out to gpg, which is fragile (binary might not be installed, output format changes between versions). Options:

  • Keep GPG: Use github.com/ProtonMail/go-crypto for pure-Go OpenPGP (no subprocess, cross-platform, testable).
  • Switch to signify/Ed25519: Simpler, smaller keys, no key management complexity. github.com/frankbraun/gosignify exists. README already mentions this as an option.
  • Support both: More work, but maximum flexibility.
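
For a sense of how small the Ed25519 option is, here is a sketch using only the standard library's `crypto/ed25519`. It produces raw 64-byte signatures and does not implement the signify file format (key files, comment lines, base64 framing), which would still need `gosignify` or equivalent code. The helper names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newKeyPair (hypothetical) generates a fresh Ed25519 key pair.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// signManifest (hypothetical) signs the serialized manifest bytes.
func signManifest(priv ed25519.PrivateKey, manifest []byte) []byte {
	return ed25519.Sign(priv, manifest)
}

// verifyManifest (hypothetical) reports whether sig is valid for manifest.
func verifyManifest(pub ed25519.PublicKey, manifest, sig []byte) bool {
	return ed25519.Verify(pub, manifest, sig)
}

func main() {
	pub, priv, err := newKeyPair()
	if err != nil {
		panic(err)
	}
	manifest := []byte("stand-in for serialized manifest bytes")
	sig := signManifest(priv, manifest)

	fmt.Println("signature bytes:", len(sig))
	fmt.Println("valid:", verifyManifest(pub, manifest, sig))
	fmt.Println("tampered valid:", verifyManifest(pub, append(manifest, '!'), sig))
}
```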

Answer:

7. Should manifests be deterministic by default?

If createdAt is populated and file order depends on filesystem walk order, two runs of mfer gen on the same directory produce different manifests. Recommendation: sort files lexicographically by path, omit createdAt unless --timestamp is passed. Deterministic by default.
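
A sketch of the proposed ordering, assuming byte-order comparison of paths. `fileEntry` is a hypothetical stand-in for whatever record type the builder holds. Go compares strings byte by byte, so the plain `<` operator already gives a locale-independent order.

```go
package main

import (
	"fmt"
	"sort"
)

// fileEntry is a hypothetical stand-in for a manifest file record.
type fileEntry struct {
	Path string
	Size int64
}

// sortEntries orders entries by path in byte order. Paths in a manifest
// are assumed to be unique, so the sort does not need to be stable.
func sortEntries(entries []fileEntry) {
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].Path < entries[j].Path
	})
}

func main() {
	entries := []fileEntry{
		{Path: "b/z.txt", Size: 3},
		{Path: "a.txt", Size: 1},
		{Path: "B.txt", Size: 2},
	}
	sortEntries(entries)
	for _, e := range entries {
		fmt.Println(e.Path)
	}
}
```

Byte order puts uppercase ASCII before lowercase, so `B.txt` sorts ahead of `a.txt`. That is surprising to read but stable across platforms and locales, which is what matters for reproducibility.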

Answer:

8. Should the magic bytes include a version byte?

Currently: ZNAVSRFG (8 bytes), then protobuf. If the framing ever needs to change (e.g. different compression framing, post-quantum signatures), you'd need to parse the protobuf to discover the version. Adding a version byte after the magic (ZNAVSRFG\x01) allows future format changes without requiring protobuf parsing for version detection.
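
A sketch of what version detection could look like under the proposed layout (8 magic bytes, then one version byte, then the protobuf). `parseHeader` and `formatVersion` are hypothetical names, and the version value 1 is taken from the `\x01` example above.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

const (
	magic = "ZNAVSRFG"
	// formatVersion is the single version this sketch accepts (assumed).
	formatVersion byte = 1
)

// parseHeader (hypothetical) checks the magic and version byte and
// returns the remaining bytes, which would hold the protobuf message.
func parseHeader(data []byte) (version byte, rest []byte, err error) {
	if len(data) < len(magic)+1 {
		return 0, nil, errors.New("short header")
	}
	if !bytes.HasPrefix(data, []byte(magic)) {
		return 0, nil, errors.New("bad magic")
	}
	version = data[len(magic)]
	if version != formatVersion {
		return version, nil, fmt.Errorf("unsupported format version %d", version)
	}
	return version, data[len(magic)+1:], nil
}

func main() {
	v, rest, err := parseHeader([]byte("ZNAVSRFG\x01protobuf-bytes"))
	fmt.Println(v, string(rest), err)

	_, _, err = parseHeader([]byte("ZNAVSRFG\x02protobuf-bytes"))
	fmt.Println(err)
}
```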

Answer:

9. Detached signature support?

For HTTP distribution, a detached .mf.sig file alongside index.mf would let servers serve both without modifying the manifest itself. This follows the SHASUMS + SHASUMS.asc pattern. The embedded signature is better for single-file distribution. Worth supporting both modes?

Answer:

10. What about the duplicate internal packages?

There are two parallel implementations:

  • mfer/scanner.go + mfer/checker.go (newer, better-typed with FileSize, RelFilePath)
  • internal/scanner/ + internal/checker/ (older, raw types)

The mfer/ versions are superior. Should the internal/ versions be deleted and the CLI updated to use the mfer/ package?

Answer:


Implementation Plan

Ordered by dependency and priority. Each item should be a PR.

Phase 1: Foundation (format correctness)

  • Delete internal/scanner/ and internal/checker/ — consolidate on the mfer/ package versions. Update CLI code to use mfer.Scanner and mfer.Checker.
  • Add deterministic file ordering — sort file entries by path (lexicographic, byte-order) before serialization in Builder.Build(). Add a test that generates a manifest twice and asserts byte-identical output.
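  • Add decompression size limit — use io.LimitReader in deserializeInner() with m.pbOuter.Size as the bound, and treat any data beyond the bound as an error, since io.LimitReader on its own truncates silently.
  • Fix errors.Is dead code in checker — replace with errors.Is(err, fs.ErrNotExist) (preferred, since it also matches wrapped errors) or os.IsNotExist(err).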
  • Fix AddFile to verify size — after reading, check totalRead == size and return an error on mismatch.
  • Specify path invariants — add comments to proto and validate in Builder.AddFile / Builder.AddFileWithHash.
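
The decompression limit item can be sketched as follows. This is an illustration under assumptions, not the mfer code: `compress/gzip` stands in for whatever compressor the format actually uses, `declaredSize` plays the role of `m.pbOuter.Size`, and `maxInnerSize` is a hypothetical hard cap, since the declared size itself comes from untrusted input.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// maxInnerSize is a hypothetical hard cap on the declared size.
const maxInnerSize = 1 << 30

var errTooLarge = errors.New("decompressed data exceeds declared size")

// decompressBounded (hypothetical) reads at most declaredSize bytes of
// decompressed output and fails if the stream holds more than that.
func decompressBounded(compressed []byte, declaredSize int64) ([]byte, error) {
	if declaredSize < 0 || declaredSize > maxInnerSize {
		return nil, fmt.Errorf("implausible declared size %d", declaredSize)
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	// Read one byte past the bound so an overrun is detectable
	// instead of being silently truncated by io.LimitReader.
	out, err := io.ReadAll(io.LimitReader(zr, declaredSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(out)) > declaredSize {
		return nil, errTooLarge
	}
	return out, nil
}

// gzipBytes compresses b in memory; used only to build test input.
func gzipBytes(b []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	c := gzipBytes([]byte("hello world"))
	out, err := decompressBounded(c, 11)
	fmt.Println(string(out), err)
	_, err = decompressBounded(c, 5)
	fmt.Println(err)
}
```

Whether output shorter than the declared size should also be an error is a separate decision that this sketch leaves open.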

Phase 2: CLI polish

  • Fix flag naming — ensure all CLI flags use kebab-case as primary names (--include-dotfiles, --follow-symlinks).
  • Fix URL construction in fetch — use url.JoinPath() instead of string concatenation.
  • Add progress rate-limiting to Checker — throttle to once per second, matching Scanner behavior.
  • Add --deterministic flag (or make it default) — omit createdAt timestamp, sort files.
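
For the URL construction item, a sketch using `url.JoinPath` (available since Go 1.19). `fileURL` is a hypothetical helper. Escaping each path segment with `url.PathEscape` before joining is a defensive assumption on my part, so that filenames containing spaces or `%` survive the round trip; the base URL and filenames are made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fileURL (hypothetical) joins a manifest-relative path onto a base URL.
// Each segment is escaped first so reserved characters in filenames
// are not misread as URL syntax.
func fileURL(base, relPath string) (string, error) {
	segs := strings.Split(relPath, "/")
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	return url.JoinPath(base, segs...)
}

func main() {
	for _, p := range []string{"sub/file.txt", "sub/b c.txt"} {
		u, err := fileURL("https://example.com/data/", p)
		fmt.Println(u, err)
	}
}
```

Unlike string concatenation, this yields the same result whether or not the base URL ends in a slash.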

Phase 3: Robustness

  • Replace GPG subprocess with pure-Go crypto — use github.com/ProtonMail/go-crypto or switch to Ed25519/signify.
  • Add timeout to any remaining subprocess calls.
  • Add fuzz tests for NewManifestFromReader — it parses untrusted input.
  • Add retry logic to fetch — exponential backoff for transient HTTP errors.
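
The retry item might look roughly like this. `retry` and `errTransient` are hypothetical names, and the sketch retries on every error. A real fetch would need to decide which failures count as transient (timeouts, 5xx responses) and would probably add jitter and a maximum delay.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical sentinel used by the example below.
var errTransient = errors.New("transient failure")

// retry (hypothetical) calls op up to attempts times, doubling the
// delay after each failure. It returns nil on the first success.
func retry(attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		// Sleep only if another attempt will follow.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err)
}
```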

Phase 4: Format finalization

  • Remove or deprecate atime from proto (pending design question #1).
  • Reserve optional uint32 mode = 305 in MFFilePath (pending design question #3).
  • Add a version byte after magic (pending design question #8).
  • Write a format specification document — separate from README.

Phase 5: Release prep

  • Reconcile module path (sneak.berlin/go/mfer vs git.eeqj.de/sneak/mfer).
  • Audit all error messages for consistency.
  • Ensure --version output matches SemVer format.
  • Tag v1.0.0.