From 963dd829e60c6b14122bdfc248200f432bd7f40b Mon Sep 17 00:00:00 2001 From: clawbot Date: Tue, 17 Mar 2026 01:57:57 -0700 Subject: [PATCH] Integrate all TODO.md items into README TODO section Add all 14 design questions (7 were missing: atime removal, path normalization, outer SHA-256 scope, signatureString format, detached signatures, deterministic-by-default, scanner/checker consolidation). Add all missing implementation tasks from TODO.md phases 1-4: - Phase 1 foundation: consolidate internal packages, deterministic ordering, decompression size limit, errors.Is fix, AddFile size verification, path invariants - Phase 2 CLI: flag naming, URL construction, progress rate-limiting, deterministic flag - Phase 3: subprocess timeouts - Phase 4: atime deprecation Organize design questions with sub-headers (Format Design, Signature Design, Implementation Design) matching original TODO.md structure. --- README.md | 129 +++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 108 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index 29ad303..259a9b0 100644 --- a/README.md +++ b/README.md @@ -216,6 +216,8 @@ desired username for an account on this Gitea instance. These require @sneak's input before implementation. Answers should be added inline below each question. +### Format Design + **1. Should `MFFileChecksum` be simplified?** Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on @@ -233,27 +235,23 @@ even if we don't populate it yet? > _answer:_ -**3. Should the `manifest` type be exported?** Currently unexported with -exported constructors (`NewManifestFromReader`, `NewManifestFromFile`). -Consumers can't declare `var m *mfer.manifest`. Export the type, or -define an interface? +**3. Should `atime` be removed from the schema?** Access time is +volatile, non-deterministic, and often disabled (`noatime`). Including it +means two manifests of the same directory at different times will differ, +which conflicts with the determinism goal. Remove it, or document it as +"never set by default"? > _answer:_ -**4. What should the Go module path be for 1.0?** Currently -`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in -the proto `go_package` option. Which is canonical? +**4. What are the path normalization rules?** The proto has `string path` +with no specification about: always forward-slash? Must be relative? No +`..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs +Linux)? Max path length? This is a security issue (path traversal) and a +cross-platform compatibility issue. What rules should the spec mandate? > _answer:_ -**5. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is -fragile (may not be installed, version-dependent output). -`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could -use Ed25519/signify (simpler, no key management). Which direction? - -> _answer:_ - -**6. Should we add a version byte after the magic?** Currently +**5. Should we add a version byte after the magic?** Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves @@ -262,13 +260,75 @@ extra byte? > _answer:_ -**7. Should we add a length-prefix after the magic?** Protobuf is not +**6. Should we add a length-prefix after the magic?** Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix? > _answer:_ +### Signature Design + +**7. What does the outer SHA-256 hash cover — compressed or uncompressed +data?** The code currently hashes compressed data (good for verifying +before decompression), but this should be explicitly documented. Which is +the intended behavior? + +> _answer:_ + +**8. Should `signatureString()` sign raw bytes instead of a hex-encoded +string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with +hex encoding, which adds a transformation layer. Signing the raw `sha256` +bytes (or compressed `innerMessage` directly) would be simpler. Keep the +string format or switch to raw bytes? + +> _answer:_ + +**9. Should we support detached signature files (`.mf.sig`)?** Embedded +signatures are better for single-file distribution. Detached `.mf.sig` +files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are +simpler for HTTP serving. Support both modes? + +> _answer:_ + +**10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is +fragile (may not be installed, version-dependent output). +`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could +use Ed25519/signify (simpler, no key management). Which direction? + +> _answer:_ + +### Implementation Design + +**11. Should manifests be deterministic by default?** This means: sort +file entries by path, omit `createdAt` timestamp (or make it opt-in), no +`atime`. Should determinism be the default, with a +`--include-timestamps` flag to opt in? + +> _answer:_ + +**12. Should we consolidate or keep both scanner/checker +implementations?** There are two parallel implementations: +`mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, +`RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw +`int64`, `string`). The `mfer/` versions are superior. Delete the +`internal/` versions? + +> _answer:_ + +**13. Should the `manifest` type be exported?** Currently unexported with +exported constructors (`NewManifestFromReader`, `NewManifestFromFile`). +Consumers can't declare `var m *mfer.manifest`. Export the type, or +define an interface? + +> _answer:_ + +**14. What should the Go module path be for 1.0?** Currently +`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in +the proto `go_package` option. Which is canonical? + +> _answer:_ + ## Implementation Tasks ### Repo Infrastructure @@ -282,27 +342,55 @@ fixed-width length-prefix? - [ ] Resolve proto `go_package` path inconsistency (`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`) +- [ ] Specify path invariants — add proto comments requiring UTF-8, + forward-slash, relative paths, no `..`, no leading `/`; validate + in `Builder.AddFile` and `Builder.AddFileWithHash` (pending design + question answer) +- [ ] Remove or deprecate `atime` from proto (pending design question + answer) - [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions (pending design question answer) - [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1 (pending design question answer) +- [ ] Write format specification document — separate from README: + magic, outer structure, compression, inner structure, path + invariants, signature scheme, canonical serialization ### Library +- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate + on `mfer/` package versions; update CLI code (pending design + question answer) +- [ ] Add deterministic file ordering — sort entries by path + (lexicographic, byte-order) in `Builder.Build()`; add test + asserting byte-identical output from two runs +- [ ] Add decompression size limit — `io.LimitReader` in + `deserializeInner()` with `m.pbOuter.Size` as bound +- [ ] Fix `errors.Is` dead code in checker — replace with + `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)` +- [ ] Fix `AddFile` to verify size — check `totalRead == size` after + reading, return error on mismatch - [ ] Export the `manifest` type or define a public interface (pending design question answer) — currently consumers cannot hold a reference to a loaded manifest in their own type declarations - [ ] Replace GPG subprocess calls with pure-Go crypto (pending design question answer) — current implementation shells out to `gpg` which may not be installed +- [ ] Add timeout to any remaining subprocess calls ### CLI +- [ ] Fix flag naming — all CLI flags should use kebab-case as primary + (`--include-dotfiles`, `--follow-symlinks`) +- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or + `url.JoinPath()` instead of string concatenation +- [ ] Add progress rate-limiting to Checker — throttle to once per + second, matching Scanner +- [ ] Add `--deterministic` flag or make it default — omit `createdAt`, + sort files (pending design question answer) - [ ] Wire `--version` flag properly (currently only a `version` subcommand exists; top-level `--version` shows urfave/cli generic output) -- [ ] Add `--deterministic` flag documentation — deterministic output is - the default, but this isn't obvious from CLI help - [ ] Add retry logic to `fetch` — currently no retries on transient HTTP errors; needs exponential backoff - [ ] `fetch` command uses bare `http.Get` with no timeout — needs @@ -319,9 +407,8 @@ fixed-width length-prefix? ### Documentation -- [ ] Write standalone format specification document — `FORMAT.md` - exists but should be promoted to the primary spec reference; README - should link to it more prominently +- [ ] Promote `FORMAT.md` as primary spec reference; README should link + to it more prominently - [ ] Audit and update all error messages for consistency and helpfulness - [ ] Document the signature scheme more thoroughly (canonical string