Integrate all TODO.md items into README TODO section
Add all 14 design questions (7 were missing: atime removal, path normalization, outer SHA-256 scope, signatureString format, detached signatures, deterministic-by-default, scanner/checker consolidation). Add all missing implementation tasks from TODO.md phases 1-4: - Phase 1 foundation: consolidate internal packages, deterministic ordering, decompression size limit, errors.Is fix, AddFile size verification, path invariants - Phase 2 CLI: flag naming, URL construction, progress rate-limiting, deterministic flag - Phase 3: subprocess timeouts - Phase 4: atime deprecation Organize design questions with sub-headers (Format Design, Signature Design, Implementation Design) matching original TODO.md structure.
This commit is contained in:
129
README.md
129
README.md
@@ -216,6 +216,8 @@ desired username for an account on this Gitea instance.
|
||||
These require @sneak's input before implementation. Answers should be added
|
||||
inline below each question.
|
||||
|
||||
### Format Design
|
||||
|
||||
**1. Should `MFFileChecksum` be simplified?** Currently it's a separate
|
||||
message wrapping a single `bytes multiHash` field. Since multihash
|
||||
already self-describes the algorithm, `repeated bytes hashes` directly on
|
||||
@@ -233,27 +235,23 @@ even if we don't populate it yet?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**3. Should the `manifest` type be exported?** Currently unexported with
|
||||
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
|
||||
Consumers can't declare `var m *mfer.manifest`. Export the type, or
|
||||
define an interface?
|
||||
**3. Should `atime` be removed from the schema?** Access time is
|
||||
volatile, non-deterministic, and often disabled (`noatime`). Including it
|
||||
means two manifests of the same directory at different times will differ,
|
||||
which conflicts with the determinism goal. Remove it, or document it as
|
||||
"never set by default"?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**4. What should the Go module path be for 1.0?** Currently
|
||||
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
|
||||
the proto `go_package` option. Which is canonical?
|
||||
**4. What are the path normalization rules?** The proto has `string path`
|
||||
with no specification about: always forward-slash? Must be relative? No
|
||||
`..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs
|
||||
Linux)? Max path length? This is a security issue (path traversal) and a
|
||||
cross-platform compatibility issue. What rules should the spec mandate?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**5. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
|
||||
fragile (may not be installed, version-dependent output).
|
||||
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
|
||||
use Ed25519/signify (simpler, no key management). Which direction?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**6. Should we add a version byte after the magic?** Currently
|
||||
**5. Should we add a version byte after the magic?** Currently
|
||||
`ZNAVSRFG` is followed immediately by protobuf. Adding a version byte
|
||||
(`ZNAVSRFG\x01`) would allow future framing changes without requiring
|
||||
protobuf parsing to detect the version. `MFFileOuter.Version` serves
|
||||
@@ -262,13 +260,75 @@ extra byte?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**7. Should we add a length-prefix after the magic?** Protobuf is not
|
||||
**6. Should we add a length-prefix after the magic?** Protobuf is not
|
||||
self-delimiting. If we ever want to concatenate manifests or append data
|
||||
after the protobuf, the current framing is insufficient. Add a varint or
|
||||
fixed-width length-prefix?
|
||||
|
||||
> _answer:_
|
||||
|
||||
### Signature Design
|
||||
|
||||
**7. What does the outer SHA-256 hash cover — compressed or uncompressed
|
||||
data?** The code currently hashes compressed data (good for verifying
|
||||
before decompression), but this should be explicitly documented. Which is
|
||||
the intended behavior?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded
|
||||
string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with
|
||||
hex encoding, which adds a transformation layer. Signing the raw `sha256`
|
||||
bytes (or compressed `innerMessage` directly) would be simpler. Keep the
|
||||
string format or switch to raw bytes?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**9. Should we support detached signature files (`.mf.sig`)?** Embedded
|
||||
signatures are better for single-file distribution. Detached `.mf.sig`
|
||||
files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are
|
||||
simpler for HTTP serving. Support both modes?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
|
||||
fragile (may not be installed, version-dependent output).
|
||||
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
|
||||
use Ed25519/signify (simpler, no key management). Which direction?
|
||||
|
||||
> _answer:_
|
||||
|
||||
### Implementation Design
|
||||
|
||||
**11. Should manifests be deterministic by default?** This means: sort
|
||||
file entries by path, omit `createdAt` timestamp (or make it opt-in), no
|
||||
`atime`. Should determinism be the default, with a
|
||||
`--include-timestamps` flag to opt in?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**12. Should we consolidate or keep both scanner/checker
|
||||
implementations?** There are two parallel implementations:
|
||||
`mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`,
|
||||
`RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw
|
||||
`int64`, `string`). The `mfer/` versions are superior. Delete the
|
||||
`internal/` versions?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**13. Should the `manifest` type be exported?** Currently unexported with
|
||||
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
|
||||
Consumers can't declare `var m *mfer.manifest`. Export the type, or
|
||||
define an interface?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**14. What should the Go module path be for 1.0?** Currently
|
||||
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
|
||||
the proto `go_package` option. Which is canonical?
|
||||
|
||||
> _answer:_
|
||||
|
||||
## Implementation Tasks
|
||||
|
||||
### Repo Infrastructure
|
||||
@@ -282,27 +342,55 @@ fixed-width length-prefix?
|
||||
|
||||
- [ ] Resolve proto `go_package` path inconsistency
|
||||
(`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`)
|
||||
- [ ] Specify path invariants — add proto comments requiring UTF-8,
|
||||
forward-slash, relative paths, no `..`, no leading `/`; validate
|
||||
in `Builder.AddFile` and `Builder.AddFileWithHash` (pending design
|
||||
question answer)
|
||||
- [ ] Remove or deprecate `atime` from proto (pending design question
|
||||
answer)
|
||||
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future
|
||||
file permissions (pending design question answer)
|
||||
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version
|
||||
1 (pending design question answer)
|
||||
- [ ] Write format specification document — separate from README:
|
||||
magic, outer structure, compression, inner structure, path
|
||||
invariants, signature scheme, canonical serialization
|
||||
|
||||
### Library
|
||||
|
||||
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate
|
||||
on `mfer/` package versions; update CLI code (pending design
|
||||
question answer)
|
||||
- [ ] Add deterministic file ordering — sort entries by path
|
||||
(lexicographic, byte-order) in `Builder.Build()`; add test
|
||||
asserting byte-identical output from two runs
|
||||
- [ ] Add decompression size limit — `io.LimitReader` in
|
||||
`deserializeInner()` with `m.pbOuter.Size` as bound
|
||||
- [ ] Fix `errors.Is` dead code in checker — replace with
|
||||
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
|
||||
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after
|
||||
reading, return error on mismatch
|
||||
- [ ] Export the `manifest` type or define a public interface (pending
|
||||
design question answer) — currently consumers cannot hold a reference
|
||||
to a loaded manifest in their own type declarations
|
||||
- [ ] Replace GPG subprocess calls with pure-Go crypto (pending design
|
||||
question answer) — current implementation shells out to `gpg` which
|
||||
may not be installed
|
||||
- [ ] Add timeout to any remaining subprocess calls
|
||||
|
||||
### CLI
|
||||
|
||||
- [ ] Fix flag naming — all CLI flags should use kebab-case as primary
|
||||
(`--include-dotfiles`, `--follow-symlinks`)
|
||||
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or
|
||||
`url.JoinPath()` instead of string concatenation
|
||||
- [ ] Add progress rate-limiting to Checker — throttle to once per
|
||||
second, matching Scanner
|
||||
- [ ] Add `--deterministic` flag or make it default — omit `createdAt`,
|
||||
sort files (pending design question answer)
|
||||
- [ ] Wire `--version` flag properly (currently only a `version`
|
||||
subcommand exists; top-level `--version` shows urfave/cli generic
|
||||
output)
|
||||
- [ ] Add `--deterministic` flag documentation — deterministic output is
|
||||
the default, but this isn't obvious from CLI help
|
||||
- [ ] Add retry logic to `fetch` — currently no retries on transient
|
||||
HTTP errors; needs exponential backoff
|
||||
- [ ] `fetch` command uses bare `http.Get` with no timeout — needs
|
||||
@@ -319,9 +407,8 @@ fixed-width length-prefix?
|
||||
|
||||
### Documentation
|
||||
|
||||
- [ ] Write standalone format specification document — `FORMAT.md`
|
||||
exists but should be promoted to the primary spec reference; README
|
||||
should link to it more prominently
|
||||
- [ ] Promote `FORMAT.md` as primary spec reference; README should link
|
||||
to it more prominently
|
||||
- [ ] Audit and update all error messages for consistency and
|
||||
helpfulness
|
||||
- [ ] Document the signature scheme more thoroughly (canonical string
|
||||
|
||||
Reference in New Issue
Block a user