forked from sneak/mfer
docs: add TODO.md with design questions and 1.0 implementation plan
This commit is contained in:
parent
5ab092098b
commit
cfb5058ad6
182
TODO.md
182
TODO.md
@ -1,19 +1,175 @@
|
|||||||
# TODO
|
# TODO: mfer 1.0
|
||||||
|
|
||||||
## Critical
|
## Design Questions
|
||||||
|
|
||||||
- [ ] Fix broken error comparison in `internal/checker/checker.go:195` - `errors.Is(err, errors.New("file does not exist"))` always returns false because `errors.New()` creates a new instance each call
|
These need answers before implementation. Respond inline.
|
||||||
- [ ] Fix unchecked `hash.Write()` errors in `mfer/builder.go:52`, `mfer/serialize.go:56`, `internal/cli/freshen.go:340`
|
|
||||||
- [ ] Fix URL path traversal risk in `internal/cli/fetch.go:116` - path isn't URL-escaped, should use `url.JoinPath()` or proper encoding
|
|
||||||
|
|
||||||
## Important
|
### 1. Should `atime` be removed from the proto?
|
||||||
|
|
||||||
- [ ] Fix goroutine leak in signal handler `internal/cli/gen.go:98-106` - goroutine runs until channel closed, leaks if program exits normally
|
Access time is volatile, non-deterministic, and often disabled (`noatime`
|
||||||
- [ ] Fix timestamp precision in `mfer/serialize.go:16-22` - use `t.Nanosecond()` instead of manual calculation
|
mount option). Including it means two manifests of the same unchanged
|
||||||
- [ ] Add context cancellation check to filesystem walk in `internal/cli/freshen.go` - Ctrl-C doesn't work during scan phase
|
directory tree generated at different times will differ. This hurts
|
||||||
|
deterministic/reproducible manifests. Recommendation: remove it or document
|
||||||
|
it as "never set by default, opt-in only."
|
||||||
|
|
||||||
## Code Quality
|
**Answer:**
|
||||||
|
|
||||||
- [ ] Consolidate duplicate `pathIsHidden` implementations in `internal/scanner/scanner.go:385-402` and `internal/cli/freshen.go:378-397`
|
### 2. Should `MFFileChecksum` stay as a wrapper message or simplify to `repeated bytes`?
|
||||||
- [ ] Make `TotalBytes()` in `internal/scanner/scanner.go:250-259` track total incrementally instead of recalculating on every call
|
|
||||||
- [ ] Add input validation to `AddFileWithHash()` in `mfer/builder.go:107-120` - validate path, size, and hash inputs
|
Currently `MFFilePath.hashes` is `repeated MFFileChecksum` where
|
||||||
|
`MFFileChecksum` contains a single `bytes multiHash` field. Since multihash
|
||||||
|
already self-describes the algorithm, `repeated bytes hashes = 3` on
|
||||||
|
`MFFilePath` would be equivalent and save protobuf overhead (one fewer
|
||||||
|
tag+length per hash per file). The wrapper is justified if you plan to add
|
||||||
|
per-hash metadata later (e.g. `verified_at` timestamp). Otherwise it's
|
||||||
|
unnecessary indirection.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 3. Should file permissions/mode be supported?
|
||||||
|
|
||||||
|
The proto has no field for Unix permission bits. For the archival use case
|
||||||
|
(ExFAT drives, filesystem-independent checksums), this may not matter. But
|
||||||
|
for software distribution or filesystem restoration, the absence of
|
||||||
|
permission bits is a gap. Recommendation: reserve `optional uint32 mode =
|
||||||
|
305` in `MFFilePath` now, even if it's not populated yet.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 4. Path semantics: what are the rules?
|
||||||
|
|
||||||
|
The proto has `string path` with no specified invariants. These need to be
|
||||||
|
documented and enforced:
|
||||||
|
|
||||||
|
- Always forward-slash separated? (even on Windows)
|
||||||
|
- Must be relative? (no leading `/`)
|
||||||
|
- No `..` components? (path traversal prevention)
|
||||||
|
- UTF-8 NFC normalized? (macOS uses NFD, Linux uses NFC — same filename
|
||||||
|
looks different in bytes)
|
||||||
|
- Max path length?
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 5. Should the `sha256` field on `MFFileOuter` hash compressed or uncompressed data?
|
||||||
|
|
||||||
|
Currently it hashes the compressed inner bytes. This is actually the better
|
||||||
|
choice — it lets you verify integrity before decompression (preventing
|
||||||
|
decompression bombs from untrusted sources). But it's a non-obvious design
|
||||||
|
decision that should be documented. Confirm this is intentional.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 6. GPG vs signify (Ed25519) for signatures?
|
||||||
|
|
||||||
|
The current implementation shells out to `gpg`, which is fragile (binary
|
||||||
|
might not be installed, output format changes between versions). Options:
|
||||||
|
|
||||||
|
- **Keep GPG:** Use `github.com/ProtonMail/go-crypto` for pure-Go OpenPGP
|
||||||
|
(no subprocess, cross-platform, testable).
|
||||||
|
- **Switch to signify/Ed25519:** Simpler, smaller keys, no key management
|
||||||
|
complexity. `github.com/frankbraun/gosignify` exists. README already
|
||||||
|
mentions this as an option.
|
||||||
|
- **Support both:** More work, but maximum flexibility.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 7. Should manifests be deterministic by default?
|
||||||
|
|
||||||
|
If `createdAt` is populated and file order depends on filesystem walk order,
|
||||||
|
two runs of `mfer gen` on the same directory produce different manifests.
|
||||||
|
Recommendation: sort files lexicographically by path, omit `createdAt`
|
||||||
|
unless `--timestamp` is passed. Deterministic by default.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 8. Should the magic bytes include a version byte?
|
||||||
|
|
||||||
|
Currently: `ZNAVSRFG` (8 bytes), then protobuf. If the framing ever needs
|
||||||
|
to change (e.g. different compression framing, post-quantum signatures),
|
||||||
|
you'd need to parse the protobuf to discover the version. Adding a version
|
||||||
|
byte after the magic (`ZNAVSRFG\x01`) allows future format changes without
|
||||||
|
requiring protobuf parsing for version detection.
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 9. Detached signature support?
|
||||||
|
|
||||||
|
For HTTP distribution, a detached `.mf.sig` file alongside `index.mf` would
|
||||||
|
let servers serve both without modifying the manifest itself. This follows
|
||||||
|
the `SHASUMS` + `SHASUMS.asc` pattern. The embedded signature is better for
|
||||||
|
single-file distribution. Worth supporting both modes?
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
### 10. What about the duplicate internal packages?
|
||||||
|
|
||||||
|
There are two parallel implementations:
|
||||||
|
- `mfer/scanner.go` + `mfer/checker.go` (newer, better-typed with
|
||||||
|
`FileSize`, `RelFilePath`)
|
||||||
|
- `internal/scanner/` + `internal/checker/` (older, raw types)
|
||||||
|
|
||||||
|
The `mfer/` versions are superior. Should the `internal/` versions be
|
||||||
|
deleted and CLI updated to use `mfer/` package?
|
||||||
|
|
||||||
|
**Answer:**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Plan
|
||||||
|
|
||||||
|
Ordered by dependency and priority. Each item should be a PR.
|
||||||
|
|
||||||
|
### Phase 1: Foundation (format correctness)
|
||||||
|
|
||||||
|
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on
|
||||||
|
the `mfer/` package versions. Update CLI code to use `mfer.Scanner` and
|
||||||
|
`mfer.Checker`.
|
||||||
|
- [ ] Add deterministic file ordering — sort file entries by path
|
||||||
|
(lexicographic, byte-order) before serialization in `Builder.Build()`.
|
||||||
|
Add a test that generates a manifest twice and asserts byte-identical
|
||||||
|
output.
|
||||||
|
- [ ] Add decompression size limit — use `io.LimitReader` in
|
||||||
|
`deserializeInner()` with `m.pbOuter.Size` as the bound.
|
||||||
|
- [ ] Fix `errors.Is` dead code in checker — replace with
|
||||||
|
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`.
|
||||||
|
- [ ] Fix `AddFile` to verify size — after reading, check
|
||||||
|
`totalRead == size` and return an error on mismatch.
|
||||||
|
- [ ] Specify path invariants — add comments to proto and validate in
|
||||||
|
`Builder.AddFile` / `Builder.AddFileWithHash`.
|
||||||
|
|
||||||
|
### Phase 2: CLI polish
|
||||||
|
|
||||||
|
- [ ] Fix flag naming — ensure all CLI flags use kebab-case as primary
|
||||||
|
names (`--include-dotfiles`, `--follow-symlinks`).
|
||||||
|
- [ ] Fix URL construction in fetch — use `url.JoinPath()` instead of
|
||||||
|
string concatenation.
|
||||||
|
- [ ] Add progress rate-limiting to Checker — throttle to once per second,
|
||||||
|
matching Scanner behavior.
|
||||||
|
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`
|
||||||
|
timestamp, sort files.
|
||||||
|
|
||||||
|
### Phase 3: Robustness
|
||||||
|
|
||||||
|
- [ ] Replace GPG subprocess with pure-Go crypto — use
|
||||||
|
`github.com/ProtonMail/go-crypto` or switch to Ed25519/signify.
|
||||||
|
- [ ] Add timeout to any remaining subprocess calls.
|
||||||
|
- [ ] Add fuzzing tests for `NewManifestFromReader` — parses untrusted
|
||||||
|
input.
|
||||||
|
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP
|
||||||
|
errors.
|
||||||
|
|
||||||
|
### Phase 4: Format finalization
|
||||||
|
|
||||||
|
- [ ] Remove or deprecate `atime` from proto (pending design question #1).
|
||||||
|
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` (pending design
|
||||||
|
question #3).
|
||||||
|
- [ ] Add a version byte after magic (pending design question #8).
|
||||||
|
- [ ] Write a format specification document — separate from README.
|
||||||
|
|
||||||
|
### Phase 5: Release prep
|
||||||
|
|
||||||
|
- [ ] Reconcile module path (`sneak.berlin/go/mfer` vs
|
||||||
|
`git.eeqj.de/sneak/mfer`).
|
||||||
|
- [ ] Audit all error messages for consistency.
|
||||||
|
- [ ] Ensure `--version` output matches SemVer format.
|
||||||
|
- [ ] Tag v1.0.0.
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user