# TODO: mfer 1.0

## Design Questions

These need answers before implementation. Respond inline.

### 1. Should `atime` be removed from the proto?

Access time is volatile, non-deterministic, and often disabled (`noatime` mount option). Including it means two manifests of the same unchanged directory tree generated at different times will differ. This hurts deterministic/reproducible manifests.

Recommendation: remove it or document it as "never set by default, opt-in only."

**Answer:**

### 2. Should `MFFileChecksum` stay as a wrapper message or simplify to `repeated bytes`?

Currently `MFFilePath.hashes` is `repeated MFFileChecksum` where `MFFileChecksum` contains a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes = 3` on `MFFilePath` would be equivalent and save protobuf overhead (one fewer tag+length per hash per file).

The wrapper is justified if you plan to add per-hash metadata later (e.g. `verified_at` timestamp). Otherwise it's unnecessary indirection.

**Answer:**

### 3. Should file permissions/mode be supported?

The proto has no field for Unix permission bits. For the archival use case (ExFAT drives, filesystem-independent checksums), this may not matter. But for software distribution or filesystem restoration, the absence of permission bits is a gap.

Recommendation: reserve `optional uint32 mode = 305` in `MFFilePath` now, even if it's not populated yet.

**Answer:**

### 4. Path semantics: what are the rules?

The proto has `string path` with no specified invariants. These need to be documented and enforced:

- Always forward-slash separated? (even on Windows)
- Must be relative? (no leading `/`)
- No `..` components? (path traversal prevention)
- UTF-8 NFC normalized? (macOS has historically stored names in NFD, while Linux stores whatever bytes it is given, usually NFC, so the same filename can differ in bytes)
- Max path length?

**Answer:**

### 5. Should the `sha256` field on `MFFileOuter` hash compressed or uncompressed data?

Currently it hashes the compressed inner bytes.
This is actually the better choice: it lets you verify integrity before decompression (preventing decompression bombs from untrusted sources). But it's a non-obvious design decision that should be documented. Confirm this is intentional.

**Answer:**

### 6. GPG vs signify (Ed25519) for signatures?

The current implementation shells out to `gpg`, which is fragile (binary might not be installed, output format changes between versions). Options:

- **Keep GPG:** Use `github.com/ProtonMail/go-crypto` for pure-Go OpenPGP (no subprocess, cross-platform, testable).
- **Switch to signify/Ed25519:** Simpler, smaller keys, no key management complexity. `github.com/frankbraun/gosignify` exists. README already mentions this as an option.
- **Support both:** More work, but maximum flexibility.

**Answer:**

### 7. Should manifests be deterministic by default?

If `createdAt` is populated and file order depends on filesystem walk order, two runs of `mfer gen` on the same directory produce different manifests.

Recommendation: sort files lexicographically by path, omit `createdAt` unless `--timestamp` is passed. Deterministic by default.

**Answer:**

### 8. Should the magic bytes include a version byte?

Currently: `ZNAVSRFG` (8 bytes), then protobuf. If the framing ever needs to change (e.g. different compression framing, post-quantum signatures), you'd need to parse the protobuf to discover the version. Adding a version byte after the magic (`ZNAVSRFG\x01`) allows future format changes without requiring protobuf parsing for version detection.

**Answer:**

### 9. Detached signature support?

For HTTP distribution, a detached `.mf.sig` file alongside `index.mf` would let servers serve both without modifying the manifest itself. This follows the `SHASUMS` + `SHASUMS.asc` pattern. The embedded signature is better for single-file distribution. Worth supporting both modes?

**Answer:**

### 10. What about the duplicate internal packages?

There are two parallel implementations:

- `mfer/scanner.go` + `mfer/checker.go` (newer, better-typed with `FileSize`, `RelFilePath`)
- `internal/scanner/` + `internal/checker/` (older, raw types)

The `mfer/` versions are superior. Should the `internal/` versions be deleted and CLI updated to use `mfer/` package?

**Answer:**

---

## Implementation Plan

Ordered by dependency and priority. Each item should be a PR.

### Phase 1: Foundation (format correctness)

- [ ] Delete `internal/scanner/` and `internal/checker/`: consolidate on the `mfer/` package versions. Update CLI code to use `mfer.Scanner` and `mfer.Checker`.
- [ ] Add deterministic file ordering: sort file entries by path (lexicographic, byte-order) before serialization in `Builder.Build()`. Add a test that generates a manifest twice and asserts byte-identical output.
- [ ] Add decompression size limit: use `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as the bound.
- [ ] Fix `errors.Is` dead code in checker: replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`.
- [ ] Fix `AddFile` to verify size: after reading, check `totalRead == size` and return an error on mismatch.
- [ ] Specify path invariants: add comments to proto and validate in `Builder.AddFile` / `Builder.AddFileWithHash`.

### Phase 2: CLI polish

- [ ] Fix flag naming: ensure all CLI flags use kebab-case as primary names (`--include-dotfiles`, `--follow-symlinks`).
- [ ] Fix URL construction in fetch: use `url.JoinPath()` instead of string concatenation.
- [ ] Add progress rate-limiting to Checker: throttle to once per second, matching Scanner behavior.
- [ ] Add `--deterministic` flag (or make it default): omit `createdAt` timestamp, sort files.

### Phase 3: Robustness

- [ ] Replace GPG subprocess with pure-Go crypto: use `github.com/ProtonMail/go-crypto` or switch to Ed25519/signify.
- [ ] Add timeout to any remaining subprocess calls.
- [ ] Add fuzzing tests for `NewManifestFromReader`, which parses untrusted input.
- [ ] Add retry logic to fetch: exponential backoff for transient HTTP errors.

### Phase 4: Format finalization

- [ ] Remove or deprecate `atime` from proto (pending design question #1).
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` (pending design question #3).
- [ ] Add a version byte after magic (pending design question #8).
- [ ] Write a format specification document, separate from README.

### Phase 5: Release prep

- [ ] Reconcile module path (`sneak.berlin/go/mfer` vs `git.eeqj.de/sneak/mfer`).
- [ ] Audit all error messages for consistency.
- [ ] Ensure `--version` output matches SemVer format.
- [ ] Tag v1.0.0.
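
As a reference point for the Phase 1 items on path invariants and deterministic ordering, here is a minimal sketch of what the checks might look like. It assumes design question #4 is answered "yes" to forward slashes, relative paths, and no `..` components; rejecting backslashes outright is an additional assumption. The names `validateManifestPath` and `sortPaths` are hypothetical and do not exist in the `mfer` package. NFC normalization is left out because it needs `golang.org/x/text/unicode/norm`, which is outside the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"unicode/utf8"
)

// validateManifestPath is a hypothetical helper (not part of mfer).
// It enforces the candidate invariants from design question #4,
// assuming each of them is adopted.
func validateManifestPath(p string) error {
	if p == "" {
		return errors.New("empty path")
	}
	if !utf8.ValidString(p) {
		return errors.New("path is not valid UTF-8")
	}
	if strings.HasPrefix(p, "/") {
		return errors.New("path must be relative (no leading slash)")
	}
	// Assumption: backslashes are rejected rather than translated.
	if strings.Contains(p, "\\") {
		return errors.New("path must use forward slashes")
	}
	for _, seg := range strings.Split(p, "/") {
		switch seg {
		case "":
			return errors.New("empty path component")
		case ".", "..":
			return fmt.Errorf("path component %q is not allowed", seg)
		}
	}
	return nil
}

// sortPaths orders paths by raw byte value. Go compares strings
// bytewise, so sort.Strings gives a locale-independent order.
func sortPaths(paths []string) {
	sort.Strings(paths)
}

func main() {
	paths := []string{"b/file.txt", "a/z.txt", "a/b/c.txt"}
	sortPaths(paths)
	for _, p := range paths {
		fmt.Println(p, validateManifestPath(p))
	}
	fmt.Println(validateManifestPath("../etc/passwd"))
}
```

Sorting on raw bytes rather than a locale-aware collation keeps the order identical across platforms, which is what the byte-identical-output test in Phase 1 would rely on.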