From 4b80c0067b736f64bd3020d1a710fb477616ba1d Mon Sep 17 00:00:00 2001 From: clawbot Date: Sun, 8 Feb 2026 18:40:31 +0100 Subject: [PATCH] docs: replace TODO.md with design questions and implementation plan --- TODO.md | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 116 insertions(+), 13 deletions(-) diff --git a/TODO.md b/TODO.md index 66e1c30..6c4cd3e 100644 --- a/TODO.md +++ b/TODO.md @@ -1,19 +1,122 @@ -# TODO +# TODO: mfer 1.0 -## Critical +## Design Questions -- [ ] Fix broken error comparison in `internal/checker/checker.go:195` - `errors.Is(err, errors.New("file does not exist"))` always returns false because `errors.New()` creates a new instance each call -- [ ] Fix unchecked `hash.Write()` errors in `mfer/builder.go:52`, `mfer/serialize.go:56`, `internal/cli/freshen.go:340` -- [ ] Fix URL path traversal risk in `internal/cli/fetch.go:116` - path isn't URL-escaped, should use `url.JoinPath()` or proper encoding +*sneak: please answer inline below each question. These are preserved for posterity.* -## Important +### Format Design -- [ ] Fix goroutine leak in signal handler `internal/cli/gen.go:98-106` - goroutine runs until channel closed, leaks if program exits normally -- [ ] Fix timestamp precision in `mfer/serialize.go:16-22` - use `t.Nanosecond()` instead of manual calculation -- [ ] Add context cancellation check to filesystem walk in `internal/cli/freshen.go` - Ctrl-C doesn't work during scan phase +**1. Should `MFFileChecksum` be simplified?** +Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like `verified_at`)? -## Code Quality +> *answer:* -- [ ] Consolidate duplicate `pathIsHidden` implementations in `internal/scanner/scanner.go:385-402` and `internal/cli/freshen.go:378-397` -- [ ] Make `TotalBytes()` in `internal/scanner/scanner.go:250-259` track total incrementally instead of recalculating on every call -- [ ] Add input validation to `AddFileWithHash()` in `mfer/builder.go:107-120` - validate path, size, and hash inputs +**2. Should file permissions/mode be stored?** +The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet? + +> *answer:* + +**3. Should `atime` be removed from the schema?** +Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"? + +> *answer:* + +**4. What are the path normalization rules?** +The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate? + +> *answer:* + +**5. Should we add a version byte after the magic?** +Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte? + +> *answer:* + +**6. Should we add a length-prefix after the magic?** +Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix? + +> *answer:* + +### Signature Design + +**7. What does the outer SHA-256 hash cover — compressed or uncompressed data?** +The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior? + +> *answer:* + +**8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?** +Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes? + +> *answer:* + +**9. Should we support detached signature files (`.mf.sig`)?** +Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes? + +> *answer:* + +**10. GPG vs pure-Go crypto for signatures?** +Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction? + +> *answer:* + +### Implementation Design + +**11. Should manifests be deterministic by default?** +This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in? + +> *answer:* + +**12. Should we consolidate or keep both scanner/checker implementations?** +There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions? + +> *answer:* + +**13. Should the `manifest` type be exported?** +Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface? + +> *answer:* + +**14. What should the Go module path be for 1.0?** +Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical? + +> *answer:* + +--- + +## Implementation Plan + +### Phase 1: Foundation (format correctness) + +- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on `mfer/` package versions; update CLI code +- [ ] Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in `Builder.Build()`; add test asserting byte-identical output from two runs +- [ ] Add decompression size limit — `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as bound +- [ ] Fix `errors.Is` dead code in checker — replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)` +- [ ] Fix `AddFile` to verify size — check `totalRead == size` after reading, return error on mismatch +- [ ] Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no `..`, no leading `/`); validate in `Builder.AddFile` and `Builder.AddFileWithHash` + +### Phase 2: CLI polish + +- [ ] Fix flag naming — all CLI flags use kebab-case as primary (`--include-dotfiles`, `--follow-symlinks`) +- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or `url.JoinPath()` instead of string concatenation +- [ ] Add progress rate-limiting to Checker — throttle to once per second, matching Scanner +- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`, sort files + +### Phase 3: Robustness + +- [ ] Replace GPG subprocess with pure-Go crypto — `github.com/ProtonMail/go-crypto` or Ed25519/signify +- [ ] Add timeout to any remaining subprocess calls +- [ ] Add fuzzing tests for `NewManifestFromReader` +- [ ] Add retry logic to fetch — exponential backoff for transient HTTP errors + +### Phase 4: Format finalization + +- [ ] Remove or deprecate `atime` from proto (pending design question answer) +- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions +- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1 +- [ ] Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization + +### Phase 5: Release prep + +- [ ] Finalize Go module path +- [ ] Audit all error messages for consistency and helpfulness +- [ ] Add `--version` output matching SemVer +- [ ] Tag v1.0.0