Integrate all TODO.md items into README TODO section

Add all 14 design questions (7 were missing: atime removal, path
normalization, outer SHA-256 scope, signatureString format, detached
signatures, deterministic-by-default, scanner/checker consolidation).

Add all missing implementation tasks from TODO.md phases 1-4:
- Phase 1 foundation: consolidate internal packages, deterministic
  ordering, decompression size limit, errors.Is fix, AddFile size
  verification, path invariants
- Phase 2 CLI: flag naming, URL construction, progress rate-limiting,
  deterministic flag
- Phase 3: subprocess timeouts
- Phase 4: atime deprecation

Organize design questions with sub-headers (Format Design, Signature
Design, Implementation Design) matching original TODO.md structure.
This commit is contained in:
clawbot
2026-03-17 01:57:57 -07:00
parent cebf9785fc
commit 963dd829e6

129
README.md
View File

@@ -216,6 +216,8 @@ desired username for an account on this Gitea instance.
These require @sneak's input before implementation. Answers should be added These require @sneak's input before implementation. Answers should be added
inline below each question. inline below each question.
### Format Design
**1. Should `MFFileChecksum` be simplified?** Currently it's a separate **1. Should `MFFileChecksum` be simplified?** Currently it's a separate
message wrapping a single `bytes multiHash` field. Since multihash message wrapping a single `bytes multiHash` field. Since multihash
already self-describes the algorithm, `repeated bytes hashes` directly on already self-describes the algorithm, `repeated bytes hashes` directly on
@@ -233,27 +235,23 @@ even if we don't populate it yet?
> _answer:_ > _answer:_
**3. Should the `manifest` type be exported?** Currently unexported with **3. Should `atime` be removed from the schema?** Access time is
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`). volatile, non-deterministic, and often disabled (`noatime`). Including it
Consumers can't declare `var m *mfer.manifest`. Export the type, or means two manifests of the same directory at different times will differ,
define an interface? which conflicts with the determinism goal. Remove it, or document it as
"never set by default"?
> _answer:_ > _answer:_
**4. What should the Go module path be for 1.0?** Currently **4. What are the path normalization rules?** The proto has `string path`
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in with no specification about: always forward-slash? Must be relative? No
the proto `go_package` option. Which is canonical? `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs
Linux)? Max path length? This is a security issue (path traversal) and a
cross-platform compatibility issue. What rules should the spec mandate?
> _answer:_ > _answer:_
**5. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is **5. Should we add a version byte after the magic?** Currently
fragile (may not be installed, version-dependent output).
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
use Ed25519/signify (simpler, no key management). Which direction?
> _answer:_
**6. Should we add a version byte after the magic?** Currently
`ZNAVSRFG` is followed immediately by protobuf. Adding a version byte `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte
(`ZNAVSRFG\x01`) would allow future framing changes without requiring (`ZNAVSRFG\x01`) would allow future framing changes without requiring
protobuf parsing to detect the version. `MFFileOuter.Version` serves protobuf parsing to detect the version. `MFFileOuter.Version` serves
@@ -262,13 +260,75 @@ extra byte?
> _answer:_ > _answer:_
**7. Should we add a length-prefix after the magic?** Protobuf is not **6. Should we add a length-prefix after the magic?** Protobuf is not
self-delimiting. If we ever want to concatenate manifests or append data self-delimiting. If we ever want to concatenate manifests or append data
after the protobuf, the current framing is insufficient. Add a varint or after the protobuf, the current framing is insufficient. Add a varint or
fixed-width length-prefix? fixed-width length-prefix?
> _answer:_ > _answer:_
### Signature Design
**7. What does the outer SHA-256 hash cover — compressed or uncompressed
data?** The code currently hashes compressed data (good for verifying
before decompression), but this should be explicitly documented. Which is
the intended behavior?
> _answer:_
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded
string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with
hex encoding, which adds a transformation layer. Signing the raw `sha256`
bytes (or compressed `innerMessage` directly) would be simpler. Keep the
string format or switch to raw bytes?
> _answer:_
**9. Should we support detached signature files (`.mf.sig`)?** Embedded
signatures are better for single-file distribution. Detached `.mf.sig`
files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are
simpler for HTTP serving. Support both modes?
> _answer:_
**10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
fragile (may not be installed, version-dependent output).
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
use Ed25519/signify (simpler, no key management). Which direction?
> _answer:_
### Implementation Design
**11. Should manifests be deterministic by default?** This means: sort
file entries by path, omit `createdAt` timestamp (or make it opt-in), no
`atime`. Should determinism be the default, with a
`--include-timestamps` flag to opt in?
> _answer:_
**12. Should we consolidate or keep both scanner/checker
implementations?** There are two parallel implementations:
`mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`,
`RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw
`int64`, `string`). The `mfer/` versions are superior. Delete the
`internal/` versions?
> _answer:_
**13. Should the `manifest` type be exported?** Currently unexported with
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
Consumers can't declare `var m *mfer.manifest`. Export the type, or
define an interface?
> _answer:_
**14. What should the Go module path be for 1.0?** Currently
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
the proto `go_package` option. Which is canonical?
> _answer:_
## Implementation Tasks ## Implementation Tasks
### Repo Infrastructure ### Repo Infrastructure
@@ -282,27 +342,55 @@ fixed-width length-prefix?
- [ ] Resolve proto `go_package` path inconsistency - [ ] Resolve proto `go_package` path inconsistency
(`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`) (`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`)
- [ ] Specify path invariants — add proto comments requiring UTF-8,
forward-slash, relative paths, no `..`, no leading `/`; validate
in `Builder.AddFile` and `Builder.AddFileWithHash` (pending design
question answer)
- [ ] Remove or deprecate `atime` from proto (pending design question
answer)
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future - [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future
file permissions (pending design question answer) file permissions (pending design question answer)
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version - [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version
1 (pending design question answer) 1 (pending design question answer)
- [ ] Write format specification document — separate from README:
magic, outer structure, compression, inner structure, path
invariants, signature scheme, canonical serialization
### Library ### Library
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate
on `mfer/` package versions; update CLI code (pending design
question answer)
- [ ] Add deterministic file ordering — sort entries by path
(lexicographic, byte-order) in `Builder.Build()`; add test
asserting byte-identical output from two runs
- [ ] Add decompression size limit — `io.LimitReader` in
`deserializeInner()` with `m.pbOuter.Size` as bound
- [ ] Fix `errors.Is` dead code in checker — replace with
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after
reading, return error on mismatch
- [ ] Export the `manifest` type or define a public interface (pending - [ ] Export the `manifest` type or define a public interface (pending
design question answer) — currently consumers cannot hold a reference design question answer) — currently consumers cannot hold a reference
to a loaded manifest in their own type declarations to a loaded manifest in their own type declarations
- [ ] Replace GPG subprocess calls with pure-Go crypto (pending design - [ ] Replace GPG subprocess calls with pure-Go crypto (pending design
question answer) — current implementation shells out to `gpg` which question answer) — current implementation shells out to `gpg` which
may not be installed may not be installed
- [ ] Add timeout to any remaining subprocess calls
### CLI ### CLI
- [ ] Fix flag naming — all CLI flags should use kebab-case as primary
(`--include-dotfiles`, `--follow-symlinks`)
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or
`url.JoinPath()` instead of string concatenation
- [ ] Add progress rate-limiting to Checker — throttle to once per
second, matching Scanner
- [ ] Add `--deterministic` flag or make it default — omit `createdAt`,
sort files (pending design question answer)
- [ ] Wire `--version` flag properly (currently only a `version` - [ ] Wire `--version` flag properly (currently only a `version`
subcommand exists; top-level `--version` shows urfave/cli generic subcommand exists; top-level `--version` shows urfave/cli generic
output) output)
- [ ] Add `--deterministic` flag documentation — deterministic output is
the default, but this isn't obvious from CLI help
- [ ] Add retry logic to `fetch` — currently no retries on transient - [ ] Add retry logic to `fetch` — currently no retries on transient
HTTP errors; needs exponential backoff HTTP errors; needs exponential backoff
- [ ] `fetch` command uses bare `http.Get` with no timeout — needs - [ ] `fetch` command uses bare `http.Get` with no timeout — needs
@@ -319,9 +407,8 @@ fixed-width length-prefix?
### Documentation ### Documentation
- [ ] Write standalone format specification document — `FORMAT.md` - [ ] Promote `FORMAT.md` as primary spec reference; README should link
exists but should be promoted to the primary spec reference; README to it more prominently
should link to it more prominently
- [ ] Audit and update all error messages for consistency and - [ ] Audit and update all error messages for consistency and
helpfulness helpfulness
- [ ] Document the signature scheme more thoroughly (canonical string - [ ] Document the signature scheme more thoroughly (canonical string