Add TODO section to README with 1.0 roadmap, remove TODO.md
Perform design and status review of the codebase. Add a comprehensive TODO section to README.md covering design questions requiring owner input, implementation tasks (repo infrastructure, format, library, CLI, testing, documentation), and release checklist. Integrate items from TODO.md into the README TODO section and remove TODO.md from the repo. Update AGENTS.md to reference the new location.
This commit is contained in:
@@ -26,4 +26,5 @@ source for coding standards, formatting, linting, and workflow rules.
|
||||
- The proto definition is in `mfer/mf.proto`; generated `.pb.go` files are
|
||||
committed (required for `go get` compatibility).
|
||||
- The format specification is in `FORMAT.md`.
|
||||
- See `TODO.md` for the 1.0 implementation plan and open design questions.
|
||||
- See the TODO section in `README.md` for the 1.0 implementation plan
|
||||
and open design questions.
|
||||
|
||||
128
README.md
128
README.md
@@ -25,7 +25,8 @@ software. A compatible javascript library is planned.
|
||||
|
||||
# Build Status
|
||||
|
||||
[](https://drone.datavi.be/sneak/mfer)
|
||||
CI runs via `docker build .` which executes `make check` (formatting,
|
||||
linting, tests). The `main` branch must always be green.
|
||||
|
||||
# Participation
|
||||
|
||||
@@ -208,6 +209,131 @@ regardless of filesystem format.
|
||||
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
|
||||
desired username for an account on this Gitea instance.
|
||||
|
||||
# TODO: Remaining Work for 1.0
|
||||
|
||||
## Design Questions (Owner Decision Required)
|
||||
|
||||
These require @sneak's input before implementation. Answers should be added
|
||||
inline below each question.
|
||||
|
||||
**1. Should `MFFileChecksum` be simplified?** Currently it's a separate
|
||||
message wrapping a single `bytes multiHash` field. Since multihash
|
||||
already self-describes the algorithm, `repeated bytes hashes` directly on
|
||||
`MFFilePath` would be simpler and reduce per-file protobuf overhead. Is
|
||||
the extra message layer intentional (e.g. planning to add per-hash
|
||||
metadata like `verified_at`)?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**2. Should file permissions/mode be stored?** The format stores
|
||||
mtime/ctime but not Unix file permissions. For archival use this may not
|
||||
matter, but for software distribution or filesystem restoration it's a
|
||||
gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`)
|
||||
even if we don't populate it yet?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**3. Should the `manifest` type be exported?** Currently unexported with
|
||||
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
|
||||
Consumers can't declare `var m *mfer.manifest`. Export the type, or
|
||||
define an interface?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**4. What should the Go module path be for 1.0?** Currently
|
||||
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
|
||||
the proto `go_package` option. Which is canonical?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**5. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
|
||||
fragile (may not be installed, version-dependent output).
|
||||
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
|
||||
use Ed25519/signify (simpler, no key management). Which direction?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**6. Should we add a version byte after the magic?** Currently
|
||||
`ZNAVSRFG` is followed immediately by protobuf. Adding a version byte
|
||||
(`ZNAVSRFG\x01`) would allow future framing changes without requiring
|
||||
protobuf parsing to detect the version. `MFFileOuter.Version` serves
|
||||
this purpose but requires successful deserialization to read. Worth the
|
||||
extra byte?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**7. Should we add a length-prefix after the magic?** Protobuf is not
|
||||
self-delimiting. If we ever want to concatenate manifests or append data
|
||||
after the protobuf, the current framing is insufficient. Add a varint or
|
||||
fixed-width length-prefix?
|
||||
|
||||
> _answer:_
|
||||
|
||||
## Implementation Tasks
|
||||
|
||||
### Repo Infrastructure
|
||||
|
||||
- [ ] Add `.golangci.yml` (fetch from
|
||||
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`)
|
||||
- [ ] Add `.editorconfig`
|
||||
- [ ] Add `.gitea/workflows/check.yml` that runs `docker build .`
|
||||
|
||||
### Format & Correctness
|
||||
|
||||
- [ ] Resolve proto `go_package` path inconsistency
|
||||
(`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`)
|
||||
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future
|
||||
file permissions (pending design question answer)
|
||||
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version
|
||||
1 (pending design question answer)
|
||||
|
||||
### Library
|
||||
|
||||
- [ ] Export the `manifest` type or define a public interface (pending
|
||||
design question answer) — currently consumers cannot hold a reference
|
||||
to a loaded manifest in their own type declarations
|
||||
- [ ] Replace GPG subprocess calls with pure-Go crypto (pending design
|
||||
question answer) — current implementation shells out to `gpg` which
|
||||
may not be installed
|
||||
|
||||
### CLI
|
||||
|
||||
- [ ] Wire `--version` flag properly (currently only a `version`
|
||||
subcommand exists; top-level `--version` shows urfave/cli generic
|
||||
output)
|
||||
- [ ] Add `--deterministic` flag documentation — deterministic output is
|
||||
the default, but this isn't obvious from CLI help
|
||||
- [ ] Add retry logic to `fetch` — currently no retries on transient
|
||||
HTTP errors; needs exponential backoff
|
||||
- [ ] `fetch` command uses bare `http.Get` with no timeout — needs
|
||||
`http.Client` with configurable timeout
|
||||
|
||||
### Testing & Robustness
|
||||
|
||||
- [ ] Add fuzzing tests for `NewManifestFromReader` — protobuf
|
||||
deserialization of untrusted input needs fuzz coverage
|
||||
- [ ] Add integration test for `freshen` CLI command — current tests
|
||||
only verify setup, not the actual freshen operation end-to-end
|
||||
- [ ] Add test for `fetch` CLI command end-to-end (currently only
|
||||
`downloadFile` is tested)
|
||||
|
||||
### Documentation
|
||||
|
||||
- [ ] Write standalone format specification document — `FORMAT.md`
|
||||
exists but should be promoted to the primary spec reference; README
|
||||
should link to it more prominently
|
||||
- [ ] Audit and update all error messages for consistency and
|
||||
helpfulness
|
||||
- [ ] Document the signature scheme more thoroughly (canonical string
|
||||
format, verification steps)
|
||||
|
||||
### Release
|
||||
|
||||
- [ ] Finalize Go module path
|
||||
- [ ] Update version constant in `mfer/constants.go`
|
||||
- [ ] Add `--version` output matching SemVer
|
||||
- [ ] Tag `v1.0.0`
|
||||
|
||||
# See Also
|
||||
|
||||
## Prior Art: Metalink
|
||||
|
||||
122
TODO.md
122
TODO.md
@@ -1,122 +0,0 @@
|
||||
# TODO: mfer 1.0
|
||||
|
||||
## Design Questions
|
||||
|
||||
_sneak: please answer inline below each question. These are preserved for posterity._
|
||||
|
||||
### Format Design
|
||||
|
||||
**1. Should `MFFileChecksum` be simplified?**
|
||||
Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like `verified_at`)?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**2. Should file permissions/mode be stored?**
|
||||
The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**3. Should `atime` be removed from the schema?**
|
||||
Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**4. What are the path normalization rules?**
|
||||
The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**5. Should we add a version byte after the magic?**
|
||||
Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**6. Should we add a length-prefix after the magic?**
|
||||
Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?
|
||||
|
||||
> _answer:_
|
||||
|
||||
### Signature Design
|
||||
|
||||
**7. What does the outer SHA-256 hash cover — compressed or uncompressed data?**
|
||||
The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?**
|
||||
Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**9. Should we support detached signature files (`.mf.sig`)?**
|
||||
Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**10. GPG vs pure-Go crypto for signatures?**
|
||||
Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction?
|
||||
|
||||
> _answer:_
|
||||
|
||||
### Implementation Design
|
||||
|
||||
**11. Should manifests be deterministic by default?**
|
||||
This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**12. Should we consolidate or keep both scanner/checker implementations?**
|
||||
There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**13. Should the `manifest` type be exported?**
|
||||
Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface?
|
||||
|
||||
> _answer:_
|
||||
|
||||
**14. What should the Go module path be for 1.0?**
|
||||
Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical?
|
||||
|
||||
> _answer:_
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Foundation (format correctness)
|
||||
|
||||
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on `mfer/` package versions; update CLI code
|
||||
- [ ] Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in `Builder.Build()`; add test asserting byte-identical output from two runs
|
||||
- [ ] Add decompression size limit — `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as bound
|
||||
- [ ] Fix `errors.Is` dead code in checker — replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
|
||||
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after reading, return error on mismatch
|
||||
- [ ] Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no `..`, no leading `/`); validate in `Builder.AddFile` and `Builder.AddFileWithHash`
|
||||
|
||||
### Phase 2: CLI polish
|
||||
|
||||
- [ ] Fix flag naming — all CLI flags use kebab-case as primary (`--include-dotfiles`, `--follow-symlinks`)
|
||||
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or `url.JoinPath()` instead of string concatenation
|
||||
- [ ] Add progress rate-limiting to Checker — throttle to once per second, matching Scanner
|
||||
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`, sort files
|
||||
|
||||
### Phase 3: Robustness
|
||||
|
||||
- [ ] Replace GPG subprocess with pure-Go crypto — `github.com/ProtonMail/go-crypto` or Ed25519/signify
|
||||
- [ ] Add timeout to any remaining subprocess calls
|
||||
- [ ] Add fuzzing tests for `NewManifestFromReader`
|
||||
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP errors
|
||||
|
||||
### Phase 4: Format finalization
|
||||
|
||||
- [ ] Remove or deprecate `atime` from proto (pending design question answer)
|
||||
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions
|
||||
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1
|
||||
- [ ] Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization
|
||||
|
||||
### Phase 5: Release prep
|
||||
|
||||
- [ ] Finalize Go module path
|
||||
- [ ] Audit all error messages for consistency and helpfulness
|
||||
- [ ] Add `--version` output matching SemVer
|
||||
- [ ] Tag v1.0.0
|
||||
Reference in New Issue
Block a user