Compare commits
4 Commits
101edd4c8e
...
897d5e1665
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
897d5e1665 | ||
| 1f7ee256ec | |||
|
|
28c6fbd220 | ||
| 5aae442156 |
2
.gitignore
vendored
2
.gitignore
vendored
@ -4,3 +4,5 @@ mfer/*.pb.go
|
|||||||
*.tmp
|
*.tmp
|
||||||
*.dockerimage
|
*.dockerimage
|
||||||
/vendor
|
/vendor
|
||||||
|
vendor.tzst
|
||||||
|
modcache.tzst
|
||||||
|
|||||||
@ -207,6 +207,15 @@ regardless of filesystem format.
|
|||||||
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
|
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
|
||||||
desired username for an account on this Gitea instance.
|
desired username for an account on this Gitea instance.
|
||||||
|
|
||||||
|
# See Also
|
||||||
|
|
||||||
|
## Prior Art: Metalink
|
||||||
|
|
||||||
|
* [Metalink - Mozilla Wiki](https://wiki.mozilla.org/Metalink)
|
||||||
|
* [Metalink - Wikipedia](https://en.wikipedia.org/wiki/Metalink)
|
||||||
|
* [RFC 5854 - The Metalink Download Description Format](https://datatracker.ietf.org/doc/html/rfc5854)
|
||||||
|
* [RFC 6249 - Metalink/HTTP: Mirrors and Hashes](https://www.rfc-editor.org/rfc/rfc6249.html)
|
||||||
|
|
||||||
## Links
|
## Links
|
||||||
|
|
||||||
* Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
|
* Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
|
||||||
|
|||||||
122
TODO.md
Normal file
122
TODO.md
Normal file
@ -0,0 +1,122 @@
|
|||||||
|
# TODO: mfer 1.0
|
||||||
|
|
||||||
|
## Design Questions
|
||||||
|
|
||||||
|
*sneak: please answer inline below each question. These are preserved for posterity.*
|
||||||
|
|
||||||
|
### Format Design
|
||||||
|
|
||||||
|
**1. Should `MFFileChecksum` be simplified?**
|
||||||
|
Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like `verified_at`)?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**2. Should file permissions/mode be stored?**
|
||||||
|
The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**3. Should `atime` be removed from the schema?**
|
||||||
|
Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**4. What are the path normalization rules?**
|
||||||
|
The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**5. Should we add a version byte after the magic?**
|
||||||
|
Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**6. Should we add a length-prefix after the magic?**
|
||||||
|
Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
### Signature Design
|
||||||
|
|
||||||
|
**7. What does the outer SHA-256 hash cover — compressed or uncompressed data?**
|
||||||
|
The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?**
|
||||||
|
Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**9. Should we support detached signature files (`.mf.sig`)?**
|
||||||
|
Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**10. GPG vs pure-Go crypto for signatures?**
|
||||||
|
Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
### Implementation Design
|
||||||
|
|
||||||
|
**11. Should manifests be deterministic by default?**
|
||||||
|
This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**12. Should we consolidate or keep both scanner/checker implementations?**
|
||||||
|
There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**13. Should the `manifest` type be exported?**
|
||||||
|
Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
**14. What should the Go module path be for 1.0?**
|
||||||
|
Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical?
|
||||||
|
|
||||||
|
> *answer:*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Plan
|
||||||
|
|
||||||
|
### Phase 1: Foundation (format correctness)
|
||||||
|
|
||||||
|
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on `mfer/` package versions; update CLI code
|
||||||
|
- [ ] Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in `Builder.Build()`; add test asserting byte-identical output from two runs
|
||||||
|
- [ ] Add decompression size limit — `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as bound
|
||||||
|
- [ ] Fix `errors.Is` dead code in checker — replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
|
||||||
|
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after reading, return error on mismatch
|
||||||
|
- [ ] Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no `..`, no leading `/`); validate in `Builder.AddFile` and `Builder.AddFileWithHash`
|
||||||
|
|
||||||
|
### Phase 2: CLI polish
|
||||||
|
|
||||||
|
- [ ] Fix flag naming — all CLI flags use kebab-case as primary (`--include-dotfiles`, `--follow-symlinks`)
|
||||||
|
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or `url.JoinPath()` instead of string concatenation
|
||||||
|
- [ ] Add progress rate-limiting to Checker — throttle to once per second, matching Scanner
|
||||||
|
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`, sort files
|
||||||
|
|
||||||
|
### Phase 3: Robustness
|
||||||
|
|
||||||
|
- [ ] Replace GPG subprocess with pure-Go crypto — `github.com/ProtonMail/go-crypto` or Ed25519/signify
|
||||||
|
- [ ] Add timeout to any remaining subprocess calls
|
||||||
|
- [ ] Add fuzzing tests for `NewManifestFromReader`
|
||||||
|
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP errors
|
||||||
|
|
||||||
|
### Phase 4: Format finalization
|
||||||
|
|
||||||
|
- [ ] Remove or deprecate `atime` from proto (pending design question answer)
|
||||||
|
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions
|
||||||
|
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1
|
||||||
|
- [ ] Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization
|
||||||
|
|
||||||
|
### Phase 5: Release prep
|
||||||
|
|
||||||
|
- [ ] Finalize Go module path
|
||||||
|
- [ ] Audit all error messages for consistency and helpfulness
|
||||||
|
- [ ] Add `--version` output matching SemVer
|
||||||
|
- [ ] Tag v1.0.0
|
||||||
3
go.mod
3
go.mod
@ -1,6 +1,6 @@
|
|||||||
module git.eeqj.de/sneak/mfer
|
module git.eeqj.de/sneak/mfer
|
||||||
|
|
||||||
go 1.17
|
go 1.22
|
||||||
|
|
||||||
require (
|
require (
|
||||||
github.com/apex/log v1.9.0
|
github.com/apex/log v1.9.0
|
||||||
@ -10,7 +10,6 @@ require (
|
|||||||
github.com/stretchr/testify v1.8.1
|
github.com/stretchr/testify v1.8.1
|
||||||
github.com/urfave/cli/v2 v2.23.6
|
github.com/urfave/cli/v2 v2.23.6
|
||||||
google.golang.org/protobuf v1.28.1
|
google.golang.org/protobuf v1.28.1
|
||||||
|
|
||||||
)
|
)
|
||||||
|
|
||||||
require (
|
require (
|
||||||
|
|||||||
1
go.sum
1
go.sum
@ -37,7 +37,6 @@ cloud.google.com/go/storage v1.10.0/go.mod h1:FLPqc6j+Ki4BU591ie1oL6qBQGu2Bl/tZ9
|
|||||||
cloud.google.com/go/storage v1.14.0/go.mod h1:GrKmX003DSIwi9o29oFT7YDnHYwZoctc3fOKtUw0Xmo=
|
cloud.google.com/go/storage v1.14.0/go.mod h1:GrKmX003DSIwi9o29oFT7YDnHYwZoctc3fOKtUw0Xmo=
|
||||||
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
|
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
|
||||||
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
|
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
|
||||||
github.com/BurntSushi/toml v1.2.1/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
|
|
||||||
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
|
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
|
||||||
github.com/MarvinJWendt/testza v0.1.0/go.mod h1:7AxNvlfeHP7Z/hDQ5JtE3OKYT3XFUeLCDE2DQninSqs=
|
github.com/MarvinJWendt/testza v0.1.0/go.mod h1:7AxNvlfeHP7Z/hDQ5JtE3OKYT3XFUeLCDE2DQninSqs=
|
||||||
github.com/MarvinJWendt/testza v0.2.1/go.mod h1:God7bhG8n6uQxwdScay+gjm9/LnO4D3kkcZX4hv9Rp8=
|
github.com/MarvinJWendt/testza v0.2.1/go.mod h1:God7bhG8n6uQxwdScay+gjm9/LnO4D3kkcZX4hv9Rp8=
|
||||||
|
|||||||
BIN
modcache.tzst
BIN
modcache.tzst
Binary file not shown.
BIN
vendor.tzst
BIN
vendor.tzst
Binary file not shown.
Loading…
Reference in New Issue
Block a user