1
0
forked from sneak/mfer
mfer/TODO.md

176 lines
6.6 KiB
Markdown

# TODO: mfer 1.0
## Design Questions
These need answers before implementation. Respond inline.
### 1. Should `atime` be removed from the proto?
Access time is volatile, non-deterministic, and often disabled (`noatime`
mount option). Including it means two manifests of the same unchanged
directory tree generated at different times will differ. This hurts
deterministic/reproducible manifests. Recommendation: remove it or document
it as "never set by default, opt-in only."
**Answer:**
### 2. Should `MFFileChecksum` stay as a wrapper message or simplify to `repeated bytes`?
Currently `MFFilePath.hashes` is `repeated MFFileChecksum` where
`MFFileChecksum` contains a single `bytes multiHash` field. Since multihash
already self-describes the algorithm, `repeated bytes hashes = 3` on
`MFFilePath` would be equivalent and save protobuf overhead (one fewer
tag+length per hash per file). The wrapper is justified if you plan to add
per-hash metadata later (e.g. `verified_at` timestamp). Otherwise it's
unnecessary indirection.
**Answer:**
### 3. Should file permissions/mode be supported?
The proto has no field for Unix permission bits. For the archival use case
(ExFAT drives, filesystem-independent checksums), this may not matter. But
for software distribution or filesystem restoration, the absence of
permission bits is a gap. Recommendation: reserve `optional uint32 mode =
305` in `MFFilePath` now, even if it's not populated yet.
**Answer:**
### 4. Path semantics: what are the rules?
The proto has `string path` with no specified invariants. These need to be
documented and enforced:
- Always forward-slash separated? (even on Windows)
- Must be relative? (no leading `/`)
- No `..` components? (path traversal prevention)
- UTF-8 NFC normalized? (macOS uses NFD, Linux uses NFC — same filename
looks different in bytes)
- Max path length?
**Answer:**
### 5. Should the `sha256` field on `MFFileOuter` hash compressed or uncompressed data?
Currently it hashes the compressed inner bytes. This is actually the better
choice — it lets you verify integrity before decompression (preventing
decompression bombs from untrusted sources). But it's a non-obvious design
decision that should be documented. Confirm this is intentional.
**Answer:**
### 6. GPG vs signify (Ed25519) for signatures?
The current implementation shells out to `gpg`, which is fragile (binary
might not be installed, output format changes between versions). Options:
- **Keep GPG:** Use `github.com/ProtonMail/go-crypto` for pure-Go OpenPGP
(no subprocess, cross-platform, testable).
- **Switch to signify/Ed25519:** Simpler, smaller keys, no key management
complexity. `github.com/frankbraun/gosignify` exists. README already
mentions this as an option.
- **Support both:** More work, but maximum flexibility.
**Answer:**
### 7. Should manifests be deterministic by default?
If `createdAt` is populated and file order depends on filesystem walk order,
two runs of `mfer gen` on the same directory produce different manifests.
Recommendation: sort files lexicographically by path, omit `createdAt`
unless `--timestamp` is passed. Deterministic by default.
**Answer:**
### 8. Should the magic bytes include a version byte?
Currently: `ZNAVSRFG` (8 bytes), then protobuf. If the framing ever needs
to change (e.g. different compression framing, post-quantum signatures),
you'd need to parse the protobuf to discover the version. Adding a version
byte after the magic (`ZNAVSRFG\x01`) allows future format changes without
requiring protobuf parsing for version detection.
**Answer:**
### 9. Detached signature support?
For HTTP distribution, a detached `.mf.sig` file alongside `index.mf` would
let servers serve both without modifying the manifest itself. This follows
the `SHASUMS` + `SHASUMS.asc` pattern. The embedded signature is better for
single-file distribution. Worth supporting both modes?
**Answer:**
### 10. What about the duplicate internal packages?
There are two parallel implementations:
- `mfer/scanner.go` + `mfer/checker.go` (newer, better-typed with
`FileSize`, `RelFilePath`)
- `internal/scanner/` + `internal/checker/` (older, raw types)
The `mfer/` versions are superior. Should the `internal/` versions be
deleted and CLI updated to use `mfer/` package?
**Answer:**
---
## Implementation Plan
Ordered by dependency and priority. Each item should be a PR.
### Phase 1: Foundation (format correctness)
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on
the `mfer/` package versions. Update CLI code to use `mfer.Scanner` and
`mfer.Checker`.
- [ ] Add deterministic file ordering — sort file entries by path
(lexicographic, byte-order) before serialization in `Builder.Build()`.
Add a test that generates a manifest twice and asserts byte-identical
output.
- [ ] Add decompression size limit — use `io.LimitReader` in
`deserializeInner()` with `m.pbOuter.Size` as the bound.
- [ ] Fix `errors.Is` dead code in checker — replace with
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`.
- [ ] Fix `AddFile` to verify size — after reading, check
`totalRead == size` and return an error on mismatch.
- [ ] Specify path invariants — add comments to proto and validate in
`Builder.AddFile` / `Builder.AddFileWithHash`.
### Phase 2: CLI polish
- [ ] Fix flag naming — ensure all CLI flags use kebab-case as primary
names (`--include-dotfiles`, `--follow-symlinks`).
- [ ] Fix URL construction in fetch — use `url.JoinPath()` instead of
string concatenation.
- [ ] Add progress rate-limiting to Checker — throttle to once per second,
matching Scanner behavior.
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`
timestamp, sort files.
### Phase 3: Robustness
- [ ] Replace GPG subprocess with pure-Go crypto — use
`github.com/ProtonMail/go-crypto` or switch to Ed25519/signify.
- [ ] Add timeout to any remaining subprocess calls.
- [ ] Add fuzzing tests for `NewManifestFromReader` — parses untrusted
input.
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP
errors.
### Phase 4: Format finalization
- [ ] Remove or deprecate `atime` from proto (pending design question #1).
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` (pending design
question #3).
- [ ] Add a version byte after magic (pending design question #8).
- [ ] Write a format specification document — separate from README.
### Phase 5: Release prep
- [ ] Reconcile module path (`sneak.berlin/go/mfer` vs
`git.eeqj.de/sneak/mfer`).
- [ ] Audit all error messages for consistency.
- [ ] Ensure `--version` output matches SemVer format.
- [ ] Tag v1.0.0.