diff --git a/TODO.md b/TODO.md index 66e1c30..eca5603 100644 --- a/TODO.md +++ b/TODO.md @@ -1,19 +1,175 @@ -# TODO +# TODO: mfer 1.0 -## Critical +## Design Questions -- [ ] Fix broken error comparison in `internal/checker/checker.go:195` - `errors.Is(err, errors.New("file does not exist"))` always returns false because `errors.New()` creates a new instance each call -- [ ] Fix unchecked `hash.Write()` errors in `mfer/builder.go:52`, `mfer/serialize.go:56`, `internal/cli/freshen.go:340` -- [ ] Fix URL path traversal risk in `internal/cli/fetch.go:116` - path isn't URL-escaped, should use `url.JoinPath()` or proper encoding +These need answers before implementation. Respond inline. -## Important +### 1. Should `atime` be removed from the proto? -- [ ] Fix goroutine leak in signal handler `internal/cli/gen.go:98-106` - goroutine runs until channel closed, leaks if program exits normally -- [ ] Fix timestamp precision in `mfer/serialize.go:16-22` - use `t.Nanosecond()` instead of manual calculation -- [ ] Add context cancellation check to filesystem walk in `internal/cli/freshen.go` - Ctrl-C doesn't work during scan phase +Access time is volatile, non-deterministic, and often disabled (`noatime` +mount option). Including it means two manifests of the same unchanged +directory tree generated at different times will differ. This hurts +deterministic/reproducible manifests. Recommendation: remove it or document +it as "never set by default, opt-in only." -## Code Quality +**Answer:** -- [ ] Consolidate duplicate `pathIsHidden` implementations in `internal/scanner/scanner.go:385-402` and `internal/cli/freshen.go:378-397` -- [ ] Make `TotalBytes()` in `internal/scanner/scanner.go:250-259` track total incrementally instead of recalculating on every call -- [ ] Add input validation to `AddFileWithHash()` in `mfer/builder.go:107-120` - validate path, size, and hash inputs +### 2. Should `MFFileChecksum` stay as a wrapper message or simplify to `repeated bytes`? + +Currently `MFFilePath.hashes` is `repeated MFFileChecksum` where +`MFFileChecksum` contains a single `bytes multiHash` field. Since multihash +already self-describes the algorithm, `repeated bytes hashes = 3` on +`MFFilePath` would be equivalent and save protobuf overhead (one fewer +tag+length per hash per file). The wrapper is justified if you plan to add +per-hash metadata later (e.g. `verified_at` timestamp). Otherwise it's +unnecessary indirection. + +**Answer:** + +### 3. Should file permissions/mode be supported? + +The proto has no field for Unix permission bits. For the archival use case +(ExFAT drives, filesystem-independent checksums), this may not matter. But +for software distribution or filesystem restoration, the absence of +permission bits is a gap. Recommendation: reserve `optional uint32 mode = +305` in `MFFilePath` now, even if it's not populated yet. + +**Answer:** + +### 4. Path semantics: what are the rules? + +The proto has `string path` with no specified invariants. These need to be +documented and enforced: + +- Always forward-slash separated? (even on Windows) +- Must be relative? (no leading `/`) +- No `..` components? (path traversal prevention) +- UTF-8 NFC normalized? (macOS uses NFD, Linux uses NFC — same filename + looks different in bytes) +- Max path length? + +**Answer:** + +### 5. Should the `sha256` field on `MFFileOuter` hash compressed or uncompressed data? + +Currently it hashes the compressed inner bytes. This is actually the better +choice — it lets you verify integrity before decompression (preventing +decompression bombs from untrusted sources). But it's a non-obvious design +decision that should be documented. Confirm this is intentional. + +**Answer:** + +### 6. GPG vs signify (Ed25519) for signatures? + +The current implementation shells out to `gpg`, which is fragile (binary +might not be installed, output format changes between versions). Options: + +- **Keep GPG:** Use `github.com/ProtonMail/go-crypto` for pure-Go OpenPGP + (no subprocess, cross-platform, testable). +- **Switch to signify/Ed25519:** Simpler, smaller keys, no key management + complexity. `github.com/frankbraun/gosignify` exists. README already + mentions this as an option. +- **Support both:** More work, but maximum flexibility. + +**Answer:** + +### 7. Should manifests be deterministic by default? + +If `createdAt` is populated and file order depends on filesystem walk order, +two runs of `mfer gen` on the same directory produce different manifests. +Recommendation: sort files lexicographically by path, omit `createdAt` +unless `--timestamp` is passed. Deterministic by default. + +**Answer:** + +### 8. Should the magic bytes include a version byte? + +Currently: `ZNAVSRFG` (8 bytes), then protobuf. If the framing ever needs +to change (e.g. different compression framing, post-quantum signatures), +you'd need to parse the protobuf to discover the version. Adding a version +byte after the magic (`ZNAVSRFG\x01`) allows future format changes without +requiring protobuf parsing for version detection. + +**Answer:** + +### 9. Detached signature support? + +For HTTP distribution, a detached `.mf.sig` file alongside `index.mf` would +let servers serve both without modifying the manifest itself. This follows +the `SHASUMS` + `SHASUMS.asc` pattern. The embedded signature is better for +single-file distribution. Worth supporting both modes? + +**Answer:** + +### 10. What about the duplicate internal packages? + +There are two parallel implementations: +- `mfer/scanner.go` + `mfer/checker.go` (newer, better-typed with + `FileSize`, `RelFilePath`) +- `internal/scanner/` + `internal/checker/` (older, raw types) + +The `mfer/` versions are superior. Should the `internal/` versions be +deleted and CLI updated to use `mfer/` package? + +**Answer:** + +--- + +## Implementation Plan + +Ordered by dependency and priority. Each item should be a PR. + +### Phase 1: Foundation (format correctness) + +- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on + the `mfer/` package versions. Update CLI code to use `mfer.Scanner` and + `mfer.Checker`. +- [ ] Add deterministic file ordering — sort file entries by path + (lexicographic, byte-order) before serialization in `Builder.Build()`. + Add a test that generates a manifest twice and asserts byte-identical + output. +- [ ] Add decompression size limit — use `io.LimitReader` in + `deserializeInner()` with `m.pbOuter.Size` as the bound. +- [ ] Fix `errors.Is` dead code in checker — replace with + `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`. +- [ ] Fix `AddFile` to verify size — after reading, check + `totalRead == size` and return an error on mismatch. +- [ ] Specify path invariants — add comments to proto and validate in + `Builder.AddFile` / `Builder.AddFileWithHash`. + +### Phase 2: CLI polish + +- [ ] Fix flag naming — ensure all CLI flags use kebab-case as primary + names (`--include-dotfiles`, `--follow-symlinks`). +- [ ] Fix URL construction in fetch — use `url.JoinPath()` instead of + string concatenation. +- [ ] Add progress rate-limiting to Checker — throttle to once per second, + matching Scanner behavior. +- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt` + timestamp, sort files. + +### Phase 3: Robustness + +- [ ] Replace GPG subprocess with pure-Go crypto — use + `github.com/ProtonMail/go-crypto` or switch to Ed25519/signify. +- [ ] Add timeout to any remaining subprocess calls. +- [ ] Add fuzzing tests for `NewManifestFromReader` — parses untrusted + input. +- [ ] Add retry logic to fetch — exponential backoff for transient HTTP + errors. + +### Phase 4: Format finalization + +- [ ] Remove or deprecate `atime` from proto (pending design question #1). +- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` (pending design + question #3). +- [ ] Add a version byte after magic (pending design question #8). +- [ ] Write a format specification document — separate from README. + +### Phase 5: Release prep + +- [ ] Reconcile module path (`sneak.berlin/go/mfer` vs + `git.eeqj.de/sneak/mfer`). +- [ ] Audit all error messages for consistency. +- [ ] Ensure `--version` output matches SemVer format. +- [ ] Tag v1.0.0.