1 Commits

Author SHA1 Message Date
2f119ad289 Add Gitea Actions workflow for CI checks
All checks were successful
check / check (push) Successful in 1m19s
Adds .gitea/workflows/check.yml that runs docker build on push,
matching the standard CI pattern used across all sneak/* repos.
The Dockerfile already runs make check (fmt-check, lint, test)
so a successful build implies all checks pass.
2026-03-16 21:33:05 -07:00
4 changed files with 133 additions and 216 deletions

View File

@@ -0,0 +1,9 @@
name: check
on: [push]
jobs:
check:
runs-on: ubuntu-latest
steps:
# actions/checkout v4.2.2, 2026-03-16
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- run: docker build .

View File

@@ -26,5 +26,4 @@ source for coding standards, formatting, linting, and workflow rules.
- The proto definition is in `mfer/mf.proto`; generated `.pb.go` files are
committed (required for `go get` compatibility).
- The format specification is in `FORMAT.md`.
- See the TODO section in `README.md` for the 1.0 implementation plan
and open design questions.
- See `TODO.md` for the 1.0 implementation plan and open design questions.

215
README.md
View File

@@ -25,8 +25,7 @@ software. A compatible javascript library is planned.
# Build Status
CI runs via `docker build .` which executes `make check` (formatting,
linting, tests). The `main` branch must always be green.
[![Build Status](https://drone.datavi.be/api/badges/sneak/mfer/status.svg)](https://drone.datavi.be/sneak/mfer)
# Participation
@@ -209,218 +208,6 @@ regardless of filesystem format.
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
desired username for an account on this Gitea instance.
# TODO: Remaining Work for 1.0
## Design Questions (Owner Decision Required)
These require @sneak's input before implementation. Answers should be added
inline below each question.
### Format Design
**1. Should `MFFileChecksum` be simplified?** Currently it's a separate
message wrapping a single `bytes multiHash` field. Since multihash
already self-describes the algorithm, `repeated bytes hashes` directly on
`MFFilePath` would be simpler and reduce per-file protobuf overhead. Is
the extra message layer intentional (e.g. planning to add per-hash
metadata like `verified_at`)?
> _answer:_
**2. Should file permissions/mode be stored?** The format stores
mtime/ctime but not Unix file permissions. For archival use this may not
matter, but for software distribution or filesystem restoration it's a
gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`)
even if we don't populate it yet?
> _answer:_
**3. Should `atime` be removed from the schema?** Access time is
volatile, non-deterministic, and often disabled (`noatime`). Including it
means two manifests of the same directory at different times will differ,
which conflicts with the determinism goal. Remove it, or document it as
"never set by default"?
> _answer:_
**4. What are the path normalization rules?** The proto has `string path`
with no specification about: always forward-slash? Must be relative? No
`..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs
Linux)? Max path length? This is a security issue (path traversal) and a
cross-platform compatibility issue. What rules should the spec mandate?
> _answer:_
**5. Should we add a version byte after the magic?** Currently
`ZNAVSRFG` is followed immediately by protobuf. Adding a version byte
(`ZNAVSRFG\x01`) would allow future framing changes without requiring
protobuf parsing to detect the version. `MFFileOuter.Version` serves
this purpose but requires successful deserialization to read. Worth the
extra byte?
> _answer:_
**6. Should we add a length-prefix after the magic?** Protobuf is not
self-delimiting. If we ever want to concatenate manifests or append data
after the protobuf, the current framing is insufficient. Add a varint or
fixed-width length-prefix?
> _answer:_
### Signature Design
**7. What does the outer SHA-256 hash cover — compressed or uncompressed
data?** The code currently hashes compressed data (good for verifying
before decompression), but this should be explicitly documented. Which is
the intended behavior?
> _answer:_
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded
string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with
hex encoding, which adds a transformation layer. Signing the raw `sha256`
bytes (or compressed `innerMessage` directly) would be simpler. Keep the
string format or switch to raw bytes?
> _answer:_
**9. Should we support detached signature files (`.mf.sig`)?** Embedded
signatures are better for single-file distribution. Detached `.mf.sig`
files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are
simpler for HTTP serving. Support both modes?
> _answer:_
**10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
fragile (may not be installed, version-dependent output).
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
use Ed25519/signify (simpler, no key management). Which direction?
> _answer:_
### Implementation Design
**11. Should manifests be deterministic by default?** This means: sort
file entries by path, omit `createdAt` timestamp (or make it opt-in), no
`atime`. Should determinism be the default, with a
`--include-timestamps` flag to opt in?
> _answer:_
**12. Should we consolidate or keep both scanner/checker
implementations?** There are two parallel implementations:
`mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`,
`RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw
`int64`, `string`). The `mfer/` versions are superior. Delete the
`internal/` versions?
> _answer:_
**13. Should the `manifest` type be exported?** Currently unexported with
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
Consumers can't declare `var m *mfer.manifest`. Export the type, or
define an interface?
> _answer:_
**14. What should the Go module path be for 1.0?** Currently
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
the proto `go_package` option. Which is canonical?
> _answer:_
## Implementation Tasks
### Repo Infrastructure
- [ ] Add `.golangci.yml` (fetch from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`)
- [ ] Add `.editorconfig`
- [ ] Add `.gitea/workflows/check.yml` that runs `docker build .`
### Format & Correctness
- [ ] Resolve proto `go_package` path inconsistency
(`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`)
- [ ] Specify path invariants — add proto comments requiring UTF-8,
forward-slash, relative paths, no `..`, no leading `/`; validate
in `Builder.AddFile` and `Builder.AddFileWithHash` (pending design
question answer)
- [ ] Remove or deprecate `atime` from proto (pending design question
answer)
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future
file permissions (pending design question answer)
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version
1 (pending design question answer)
- [ ] Write format specification document — separate from README:
magic, outer structure, compression, inner structure, path
invariants, signature scheme, canonical serialization
### Library
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate
on `mfer/` package versions; update CLI code (pending design
question answer)
- [ ] Add deterministic file ordering — sort entries by path
(lexicographic, byte-order) in `Builder.Build()`; add test
asserting byte-identical output from two runs
- [ ] Add decompression size limit — `io.LimitReader` in
`deserializeInner()` with `m.pbOuter.Size` as bound
- [ ] Fix `errors.Is` dead code in checker — replace with
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after
reading, return error on mismatch
- [ ] Export the `manifest` type or define a public interface (pending
design question answer) — currently consumers cannot hold a reference
to a loaded manifest in their own type declarations
- [ ] Replace GPG subprocess calls with pure-Go crypto (pending design
question answer) — current implementation shells out to `gpg` which
may not be installed
- [ ] Add timeout to any remaining subprocess calls
### CLI
- [ ] Fix flag naming — all CLI flags should use kebab-case as primary
(`--include-dotfiles`, `--follow-symlinks`)
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or
`url.JoinPath()` instead of string concatenation
- [ ] Add progress rate-limiting to Checker — throttle to once per
second, matching Scanner
- [ ] Add `--deterministic` flag or make it default — omit `createdAt`,
sort files (pending design question answer)
- [ ] Wire `--version` flag properly (currently only a `version`
subcommand exists; top-level `--version` shows urfave/cli generic
output)
- [ ] Add retry logic to `fetch` — currently no retries on transient
HTTP errors; needs exponential backoff
- [ ] `fetch` command uses bare `http.Get` with no timeout — needs
`http.Client` with configurable timeout
### Testing & Robustness
- [ ] Add fuzzing tests for `NewManifestFromReader` — protobuf
deserialization of untrusted input needs fuzz coverage
- [ ] Add integration test for `freshen` CLI command — current tests
only verify setup, not the actual freshen operation end-to-end
- [ ] Add test for `fetch` CLI command end-to-end (currently only
`downloadFile` is tested)
### Documentation
- [ ] Promote `FORMAT.md` as primary spec reference; README should link
to it more prominently
- [ ] Audit and update all error messages for consistency and
helpfulness
- [ ] Document the signature scheme more thoroughly (canonical string
format, verification steps)
### Release
- [ ] Finalize Go module path
- [ ] Update version constant in `mfer/constants.go`
- [ ] Add `--version` output matching SemVer
- [ ] Tag `v1.0.0`
# See Also
## Prior Art: Metalink

122
TODO.md Normal file
View File

@@ -0,0 +1,122 @@
# TODO: mfer 1.0
## Design Questions
_sneak: please answer inline below each question. These are preserved for posterity._
### Format Design
**1. Should `MFFileChecksum` be simplified?**
Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like `verified_at`)?
> _answer:_
**2. Should file permissions/mode be stored?**
The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet?
> _answer:_
**3. Should `atime` be removed from the schema?**
Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?
> _answer:_
**4. What are the path normalization rules?**
The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?
> _answer:_
**5. Should we add a version byte after the magic?**
Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte?
> _answer:_
**6. Should we add a length-prefix after the magic?**
Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?
> _answer:_
### Signature Design
**7. What does the outer SHA-256 hash cover — compressed or uncompressed data?**
The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior?
> _answer:_
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?**
Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes?
> _answer:_
**9. Should we support detached signature files (`.mf.sig`)?**
Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes?
> _answer:_
**10. GPG vs pure-Go crypto for signatures?**
Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction?
> _answer:_
### Implementation Design
**11. Should manifests be deterministic by default?**
This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in?
> _answer:_
**12. Should we consolidate or keep both scanner/checker implementations?**
There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions?
> _answer:_
**13. Should the `manifest` type be exported?**
Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface?
> _answer:_
**14. What should the Go module path be for 1.0?**
Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical?
> _answer:_
---
## Implementation Plan
### Phase 1: Foundation (format correctness)
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on `mfer/` package versions; update CLI code
- [ ] Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in `Builder.Build()`; add test asserting byte-identical output from two runs
- [ ] Add decompression size limit — `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as bound
- [ ] Fix `errors.Is` dead code in checker — replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after reading, return error on mismatch
- [ ] Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no `..`, no leading `/`); validate in `Builder.AddFile` and `Builder.AddFileWithHash`
### Phase 2: CLI polish
- [ ] Fix flag naming — all CLI flags use kebab-case as primary (`--include-dotfiles`, `--follow-symlinks`)
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or `url.JoinPath()` instead of string concatenation
- [ ] Add progress rate-limiting to Checker — throttle to once per second, matching Scanner
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`, sort files
### Phase 3: Robustness
- [ ] Replace GPG subprocess with pure-Go crypto — `github.com/ProtonMail/go-crypto` or Ed25519/signify
- [ ] Add timeout to any remaining subprocess calls
- [ ] Add fuzzing tests for `NewManifestFromReader`
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP errors
### Phase 4: Format finalization
- [ ] Remove or deprecate `atime` from proto (pending design question answer)
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1
- [ ] Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization
### Phase 5: Release prep
- [ ] Finalize Go module path
- [ ] Audit all error messages for consistency and helpfulness
- [ ] Add `--version` output matching SemVer
- [ ] Tag v1.0.0