52 Commits

Author SHA1 Message Date
9ce16a83ad Add TODO section to README with 1.0 roadmap, remove TODO.md (#54)
All checks were successful
check / check (push) Successful in 4s
## Summary

Performs a design and status review of the codebase and adds a comprehensive TODO section to `README.md` listing remaining work for a 1.0 release.

### What changed

- **README.md**: Added a `TODO: Remaining Work for 1.0` section covering:
  - **7 design questions** requiring @sneak's input before implementation (manifest type export, Go module path, GPG vs pure-Go crypto, format framing, etc.) — each with an answer field for inline decisions
  - **Implementation tasks** organized by category: repo infrastructure, format & correctness, library, CLI, testing & robustness, documentation, and release checklist
  - Updated build status section (removed stale Drone CI badge, replaced with description of current Docker-based CI)
- **TODO.md**: Removed — items integrated into README TODO section
- **AGENTS.md**: Updated reference from `TODO.md` to README TODO section

### Design review findings

**What works well:**
- Core library (Builder, Scanner, Checker) is solid with good test coverage
- Format specification is well-designed (protobuf + zstd, multihash, deterministic serialization)
- CLI covers all major operations (gen, check, list, export, freshen, fetch)
- Test suite is thorough — builder, scanner, checker, GPG, CLI integration, corruption detection
- afero abstraction enables clean testing without filesystem side effects

**Key gaps for 1.0:**
- Missing repo infrastructure (`.golangci.yml`, `.editorconfig`, CI workflow)
- `manifest` type is unexported — consumers can't use it in their own type declarations
- GPG signing shells out to `gpg` subprocess — fragile and may not be installed
- Go module path inconsistency between `go.mod` and proto `go_package`
- `fetch` command lacks retry logic and has no HTTP timeout
- Missing fuzz tests for untrusted input deserialization
- Freshen CLI command has incomplete integration test coverage

closes #47
closes #50

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #54
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-04-07 00:43:56 +02:00
01124c10d9 Add Gitea Actions CI workflow (#53)
All checks were successful
check / check (push) Successful in 18s
Follow-up to [PR #51](#51) — addresses sneak's review comment requesting Gitea Actions to run the checks.

## Changes

Adds `.gitea/workflows/check.yml` that runs `docker build .` on every push, matching the standard CI pattern used across all `sneak/*` repos (identical to [chat](https://git.eeqj.de/sneak/chat/src/branch/main/.gitea/workflows/check.yml) and [dnswatcher](https://git.eeqj.de/sneak/dnswatcher/src/branch/main/.gitea/workflows/check.yml)).

Since the Dockerfile already runs `make check` (fmt-check, lint, test), a successful build implies all checks pass.

The `actions/checkout` action is pinned by SHA (`@11bd71901bbe5b1630ceea73d27597364c9af683` = v4.2.2) per REPO_POLICIES.

## Verification

`docker build .` passes clean.

Reviewed-on: #53
Co-authored-by: clawbot <sneak+clawbot@sneak.cloud>
Co-committed-by: clawbot <sneak+clawbot@sneak.cloud>
2026-03-20 06:57:27 +01:00
6ba32f5b35 Add REPO_POLICIES.md, rename CLAUDE.md to AGENTS.md, deduplicate (#51)
Closes #48

## Changes

- **Added `REPO_POLICIES.md`** — copied from the standard template at [sneak/prompts](https://git.eeqj.de/sneak/prompts/src/branch/main/prompts/REPO_POLICIES.md) (last_modified: 2026-03-10). This is the authoritative cross-project policy document covering repository structure, tooling, Docker, formatting, testing, and workflow standards.

- **Renamed `CLAUDE.md` → `AGENTS.md`** — deduplicated content:
  - Rules already covered by `REPO_POLICIES.md` (e.g. `git add -A`, Makefile targets) are no longer repeated
  - `AGENTS.md` retains only agent-specific workflow instructions: test-first bug fixing, no AI attribution in commits, per-change make fmt/test/lint workflow, and repo-specific notes (proto files, FORMAT.md, TODO.md)

- **Updated `README.md`** — added a reference to `REPO_POLICIES.md` in the Participation section

- **Formatting** — `make fmt` (prettier) applied to all markdown files

## Verification

`docker build .` passes clean — lint, fmt-check, and all tests green.

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #51
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-17 05:07:43 +01:00
e62c709d42 Remove committed .index.mf and add to .gitignore (#52)
Closes #49

`.index.mf` is a generated manifest file produced at runtime by `mfer gen`. It was committed to the repo but should not be tracked in version control.

Changes:
- `git rm .index.mf` to remove it from tracking
- Added `.index.mf` to `.gitignore` to prevent accidental re-commits

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #52
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-17 05:06:37 +01:00
89903fa1cd Split Dockerfile: pre-built golangci-lint stage for faster CI (#45)
Closes #39

Splits the Dockerfile into a dedicated lint stage using the `golangci/golangci-lint` image directly, rather than copying the binary into the builder stage.

## Changes

### Dockerfile

- **Lint stage** (`AS lint`): Uses the pre-built `golangci/golangci-lint` image (pinned by sha256) to run `make fmt-check` and `make lint`. This is a self-contained stage with its own `go mod download` and source copy.
- **Builder stage** (`AS builder`): Runs only `make test` and the final binary build. No longer needs golangci-lint installed.
- **Stage dependency**: `COPY --from=lint /src/go.sum /dev/null` forces BuildKit to always execute the lint stage (without this, unused stages are silently skipped).
- Both stages touch `mfer/mf.pb.go` to prevent make from trying to regenerate via protoc.

With BuildKit, the lint and builder stages run in parallel after their shared `go mod download` layers complete, so lint/formatting failures surface much faster without blocking on test execution.

### Makefile

- Added `lint` to the `check` target prereqs: `check: test lint fmt-check` (was `check: test fmt-check`), matching the [REPO_POLICIES](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/REPO_POLICIES.md) requirement.

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #45
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-15 18:56:25 +01:00
b3d10106e1 next (#44)
Reviewed-on: #44
2026-03-15 18:49:19 +01:00
9712c10fe3 Merge main into next: resolve conflicts, rewrite Dockerfile for Go 1.23
- Resolve merge conflicts (README.md, TODO.md, go.mod) keeping next's versions
- Rewrite Dockerfile: replace sneak/builder:2022-12-08 (Go 1.19) with
  golang@sha256-pinned (Go 1.23), add golangci-lint for future use
- Remove references to deleted vendor.tzst, modcache.tzst
- Simplify to standard multi-stage build: check + build + scratch final image
- Keep module path sneak.berlin/go/mfer from next branch
- Add Makefile targets: check, fmt-check, hooks (per REPO_POLICIES)
- Pin golangci-lint@v2.0.2 in devprereqs
- Dockerfile version/date comment on pinned image hash
- make check runs test + fmt-check (lint deferred to follow-up issue)
2026-03-14 17:38:40 -07:00
43916c7746 1.0 quality polish — code review, tests, bug fixes, documentation (#32)
Comprehensive quality pass targeting 1.0 release:

- Code review and refactoring
- Fix open bugs (#14, #16, #23)
- Expand test coverage
- Lint clean
- README update with build instructions (#9)
- Documentation improvements

Branched from `next` (active dev branch).

Reviewed-on: #32
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 23:58:37 +01:00
bbab6e73f4 Add deterministic file ordering in Builder.Build() (closes #23) (#28)
Reviewed-on: #28
2026-02-20 12:15:13 +01:00
615eecff79 Merge branch 'next' into fix/issue-23 2026-02-20 12:14:53 +01:00
9b67de016d chore: remove committed vendor/modcache archives (#35)
Removes `vendor.tzst` and `modcache.tzst` that should never have been committed. Adds both to `.gitignore`.

Reviewed-on: #35
Co-authored-by: clawbot <sneak+clawbot@sneak.cloud>
Co-committed-by: clawbot <sneak+clawbot@sneak.cloud>
2026-02-20 12:14:29 +01:00
clawbot
3c779465e2 remove time-hard hash iteration from seed UUID derivation
Replace 150M SHA-256 iteration key-stretching with a single hash.
Remove all references to iteration counts, timing (~5-10s), and
key-stretching from code and documentation.

The seed flag is retained for deterministic UUID generation, but
now derives the UUID with a single SHA-256 hash instead of the
unnecessary iterative approach.
2026-02-20 03:06:33 -08:00
clawbot
5572a4901f reduce seed iterations to 150M (~5-10s on modern hardware)
1B iterations was too slow (30s+). Benchmarked on Apple Silicon:
- 150M iterations ≈ 6.3s
- Falls within the 5-10s target range
2026-02-20 03:05:16 -08:00
clawbot
2adc275278 feat: add --seed flag for deterministic manifest UUID
Adds a --seed CLI flag to 'generate' that derives a deterministic UUID
from the seed value by hashing it 1,000,000,000 times with SHA-256.
This makes manifest generation fully reproducible when the same seed
and input files are provided.

- Builder.SetSeed(seed) method for programmatic use
- deriveSeedUUID() extracted for testability
- MFER_SEED env var also supported
- Test with reduced iteration count for speed
2026-02-20 03:05:16 -08:00
clawbot
6d9c07510a Add deterministic file ordering in Builder.Build()
Sort file entries by path (lexicographic, byte-order) before
serialization to ensure deterministic output. Add fixedUUID support
for testing reproducibility, and a test asserting byte-identical
output from two runs with the same input.

Closes #23
2026-02-20 03:05:16 -08:00
6d1bdbb00f chore: remove stale .drone.yml (#37)
Remove obsolete Drone CI config. Added to .gitignore.

.golangci.yaml was already removed from next branch.

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #37
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-20 12:00:49 +01:00
ae70cf6fb5 Fix FindExtraFiles reporting manifest and dotfiles as extra (closes #16) (#21)
Reviewed-on: #21
2026-02-20 11:38:52 +01:00
5099b6951b Fix IsHiddenPath treating current directory as hidden (closes #14) (#19)
Reviewed-on: #19
2026-02-20 11:37:23 +01:00
1f7ee256ec fix: update go directive from 1.17 to 1.22 (fixes #33) (#34)
Reviewed-on: #34
2026-02-20 11:35:44 +01:00
clawbot
28c6fbd220 fix: update go directive from 1.17 to 1.22 (fixes #33)
go.mod specified go 1.17 but protobuf generated code (mf.pb.go) uses
go1.18+ features (any) and go1.20+ features (unsafe.StringData).
Updated to go 1.22 and ran go mod tidy.
2026-02-20 02:25:21 -08:00
clawbot
7b61bdd62b fix: handle errcheck warnings in gpg.go and gpg_test.go 2026-02-19 23:41:14 -08:00
clawbot
8c7eef6240 fix: handle errcheck warnings in gpg.go and gpg_test.go, fix gofmt 2026-02-19 23:40:50 -08:00
clawbot
97dbe47c32 fix: FindExtraFiles skips dotfiles and manifest files to avoid false positives
FindExtraFiles now skips hidden files/directories and manifest files
(index.mf, .index.mf) when looking for extra files. Previously it would
report these as 'extra' even though they are intentionally excluded from
manifests by default, making --no-extra-files unusable.

Also includes IsHiddenPath fix for '.' (needed by the new filtering).
2026-02-19 23:36:48 -08:00
clawbot
f6478858d7 fix: IsHiddenPath returns false for "." and "/" (current dir/root are not hidden)
path.Clean(".") returns "." which starts with a dot, causing IsHiddenPath
to incorrectly treat the current directory as hidden. Add explicit checks
for "." and "/" before the dot-prefix check.

Fixed in both mfer/scanner.go and internal/scanner/scanner.go.
2026-02-19 23:36:42 -08:00
5aae442156 add links to metalink format (#7)
Reviewed-on: #7
2026-02-09 02:15:58 +01:00
1f12d10cb7 Fix errors.Is with errors.New() never matching in checker (closes #12) (#17)
Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #17
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:15:08 +01:00
7f25970dd3 Fix URL encoding for file paths in fetch command (closes #13) (#18)
Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #18
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:14:20 +01:00
70af055d4e Fix newTimestampFromTime panic on extreme dates (closes #15) (#20)
Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #20
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:10:21 +01:00
04b05e01e8 Consolidate scanner/checker — delete internal/scanner/ and internal/checker/ (closes #22) (#27)
Remove unused `internal/scanner/` and `internal/checker/` packages. The CLI already uses `mfer.Scanner` and `mfer.Checker` from the `mfer/` package directly, so these were dead code.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #27
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:09:01 +01:00
7144617d0e Add decompression size limit in deserializeInner() (closes #24) (#29)
Wrap zstd decompressor with `io.LimitReader` (256MB max) to prevent decompression bombs.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #29
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:45:55 +01:00
2efffd9da8 Specify and enforce path invariants (closes #26) (#31)
Add `ValidatePath()` enforcing UTF-8, forward-slash, relative, no `..`, no empty segments. Applied in `AddFile` and `AddFileWithHash`. Proto comments document the rules.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #31
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:45:29 +01:00
ebaf2a65ca Fix AddFile to verify actual bytes read matches declared size (closes #25) (#30)
After reading file content, verify `totalRead == size` and return an error on mismatch.

Co-authored-by: clawbot <clawbot@openclaw>
Reviewed-on: #30
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:35:07 +01:00
4b80c0067b docs: replace TODO.md with design questions and implementation plan 2026-02-08 18:40:31 +01:00
5ab092098b progress 2026-02-08 09:25:58 -08:00
4a2060087d Add GPG signature verification on manifest load
- Implement gpgVerify function that creates a temporary keyring to verify
  detached signatures against embedded public keys
- Signature verification happens during deserialization after hash
  validation but before decompression
- Extract signatureString() as a method on manifest for generating the
  canonical signature string (MAGIC-UUID-MULTIHASH)
- Add --require-signature flag to check command to mandate signature from
  a specific GPG key ID
- Expose IsSigned() and Signer() methods on Checker for signature status
2025-12-18 05:56:16 -08:00
213364bab5 Add UUID to manifest and verify integrity before decompression
- Add UUID field to both inner and outer manifest messages
- Generate random v4 UUID when creating manifest
- Hash compressed data (not uncompressed) for integrity check
- Verify hash before decompression to prevent malicious payloads
- Validate UUIDs are proper format and match between inner/outer
- Sign string format: MAGIC-UUID-MULTIHASH
2025-12-18 02:20:51 -08:00
778999a285 Add GPG signing support for manifest generation
- Add --sign-key flag and MFER_SIGN_KEY env var to gen and freshen commands
- Sign inner message multihash with GPG detached signature
- Include signer fingerprint and public key in outer wrapper
- Add comprehensive tests with temporary GPG keyring
- Increase test timeout to 10s for GPG key generation
2025-12-18 02:12:54 -08:00
308c583d57 Remove codebase structure section from README
godoc provides this documentation automatically
2025-12-18 01:38:13 -08:00
019fe41c3d Update .gitignore for new bin/ build directory 2025-12-18 01:30:50 -08:00
fc0b38ea19 Add TODO.md with codebase audit findings
Document issues found during code audit including:
- Critical: broken error comparison, unchecked hash writes, URL path traversal
- Important: goroutine leak, timestamp precision, missing context cancellation
- Code quality: duplicate functions, inefficient calculations, missing validation
2025-12-18 01:30:01 -08:00
61c17ca585 Normalize markdown formatting in documentation
- Use consistent dash-style bullet points
- Remove trailing whitespace
- Add missing blank lines between sections
- Add trailing newline to README.md
2025-12-18 01:29:56 -08:00
dae6c64e24 Change build output path from mfer.cmd to bin/mfer
Use conventional bin/ directory for build output instead of
placing executable in project root.
2025-12-18 01:29:47 -08:00
a5b0343b28 Use Go 1.13+ octal literal syntax throughout codebase
Update file permission literals from legacy octal format (0755, 0644)
to explicit Go 1.13+ format (0o755, 0o644) for improved readability.
2025-12-18 01:29:40 -08:00
e25e309581 Move checker package into mfer package
Consolidate checker functionality into the mfer package alongside
scanner, removing the need for a separate internal/checker package.
2025-12-18 01:28:35 -08:00
dc115c5ba2 Add custom types for type safety throughout codebase
- Add FileCount, FileSize, RelFilePath, AbsFilePath, ModTime, Multihash types
- Add UnixSeconds and UnixNanos types for timestamp handling
- Add URL types (ManifestURL, FileURL, BaseURL) with safe path joining
- Consolidate scanner package into mfer package
- Update checker to use custom types in Result and CheckStatus
- Add ModTime.Timestamp() method for protobuf conversion
- Update all tests to use proper custom types
2025-12-18 01:01:18 -08:00
a9f0d2abe4 Update README to reflect current API (FileProgress was already a channel) 2025-12-17 17:19:08 -08:00
1588e1bb9f Remove unused legacy manifest APIs
Removed:
- New(), NewFromPaths(), NewFromFS() - unused constructors
- Scan(), addFile(), addInputPath(), addInputFS() - unused scanning code
- WriteToFile(), Write() - unused output methods (Builder.Build() is used)
- GetFileCount(), GetTotalFileSize() - unused accessors
- pathIsHidden() - duplicated in internal/scanner
- ManifestScanOptions - unused options struct
- HasError(), AddError(), WithContext() - unused error/context handling
- NewFromProto() - deprecated alias
- manifestFile struct - unused internal type

Kept:
- manifest struct (simplified to just pbInner, pbOuter, output)
- NewManifestFromReader(), NewManifestFromFile() - for loading manifests
- Files() - returns files from loaded manifest
- Builder and its methods - for creating manifests
2025-12-17 17:16:35 -08:00
09e8da0855 Update CLAUDE.md and clean up completed TODOs in README 2025-12-17 17:09:33 -08:00
efa4bb929a Update README: mark FIXMEs as resolved 2025-12-17 17:08:37 -08:00
16e3538ea6 Document WriteToFile overwrite behavior, remove misplaced FIXME 2025-12-17 17:08:11 -08:00
1ae384b6f6 Add context cancellation support to Scan 2025-12-17 17:07:02 -08:00
b55ae961c8 Validate filesystem in addInputFS 2025-12-17 17:05:42 -08:00
46 changed files with 3465 additions and 1379 deletions

View File

@@ -1,23 +0,0 @@
kind: pipeline
name: test-docker-build
steps:
- name: test-docker-build
image: plugins/docker
network_mode: bridge
settings:
repo: sneak/mfer
build_args_from_env: [ DRONE_COMMIT_SHA ]
dry_run: true
custom_dns: [ 116.202.204.30 ]
tags:
- ${DRONE_COMMIT_SHA:0:7}
- ${DRONE_BRANCH}
- latest
- name: notify
image: plugins/slack
settings:
webhook:
from_secret: SLACK_WEBHOOK_URL
when:
event: pull_request

View File

@@ -0,0 +1,9 @@
name: check
on: [push]
jobs:
check:
runs-on: ubuntu-latest
steps:
# actions/checkout v4.2.2, 2026-03-16
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- run: docker build .

10
.gitignore vendored
View File

@@ -1,5 +1,13 @@
/mfer.cmd
/bin/
/tmp
*.tmp
*.dockerimage
/vendor
vendor.tzst
modcache.tzst
# Generated manifest files
.index.mf
# Stale files
.drone.yml

30
AGENTS.md Normal file
View File

@@ -0,0 +1,30 @@
# Agent Instructions
Read `REPO_POLICIES.md` before making any changes. It is the authoritative
source for coding standards, formatting, linting, and workflow rules.
## Workflow
- When fixing a bug, write a failing test FIRST. Only after the test fails,
write the code to fix the bug. Then ensure the test passes. Leave the test in
place and commit it with the bugfix. Don't run shell commands to test bugfixes
or reproduce bugs. Write tests!
- After each change, run `make fmt`, then `make test`, then `make lint`. Fix any
failures before committing.
- After each change, commit only the files you've changed. Push after committing.
## Attribution
- Never mention Claude, Anthropic, or any AI/LLM tooling in commit messages. Do
not use attribution.
## Repository-Specific Notes
- This is a Go library + CLI tool for generating `.mf` manifest files.
- The proto definition is in `mfer/mf.proto`; generated `.pb.go` files are
committed (required for `go get` compatibility).
- The format specification is in `FORMAT.md`.
- See the TODO section in `README.md` for the 1.0 implementation plan
and open design questions.

View File

@@ -1,15 +0,0 @@
# Important Rules
* never, ever mention claude or anthropic in commit messages. do not use attribution
* after each change, run "make fmt".
* after each change, run "make test" and ensure all tests pass.
* after each change, run "make lint" and ensure no linting errors. fix any
you find, one by one.
* after each change, commit the files you've changed. push after
committing.
* NEVER use `git add -A`. always add only individual files that you've changed.

View File

@@ -1,37 +1,38 @@
################################################################################
#2345678911234567892123456789312345678941234567895123456789612345678971234567898
################################################################################
FROM sneak/builder:2022-12-08 AS builder
ENV DEBIAN_FRONTEND noninteractive
WORKDIR /build
COPY ./Makefile ./.golangci.yml ./go.mod ./go.sum /build/
COPY ./vendor.tzst /build/vendor.tzst
COPY ./modcache.tzst /build/modcache.tzst
COPY ./internal ./internal
COPY ./bin/gitrev.sh ./bin/gitrev.sh
COPY ./mfer ./mfer
COPY ./cmd ./cmd
ARG GITREV unknown
ARG DRONE_COMMIT_SHA unknown
# Lint stage — fast feedback on formatting and lint issues
# golangci/golangci-lint:v2.0.2 (2026-03-14)
FROM golangci/golangci-lint@sha256:d55581f7797e7a0877a7c3aaa399b01bdc57d2874d6412601a046cc4062cb62e AS lint
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Touch .pb.go so make does not try to regenerate via protoc (file is committed)
RUN touch mfer/mf.pb.go
RUN make fmt-check
RUN make lint
# Build stage — tests and compilation
# golang:1.23 (2026-03-14)
FROM golang@sha256:60deed95d3888cc5e4d9ff8a10c54e5edc008c6ae3fba6187be6fb592e19e8c0 AS builder
# Force BuildKit to run the lint stage by creating a stage dependency
COPY --from=lint /src/go.sum /dev/null
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Touch .pb.go so make does not try to regenerate via protoc (file is committed)
RUN touch mfer/mf.pb.go
RUN make test
RUN cd cmd/mfer && go build -tags urfave_cli_no_docs -o /mfer .
RUN mkdir -p "$(go env GOMODCACHE)" && cd "$(go env GOMODCACHE)" && \
zstdmt -d --stdout /build/modcache.tzst | tar xf - && \
rm /build/modcache.tzst && cd /build
RUN \
cd mfer && go generate . && cd .. && \
GOPACKAGESDEBUG=true golangci-lint run ./... && \
mkdir vendor && cd vendor && \
zstdmt -d --stdout /build/vendor.tzst | tar xf - && rm /build/vendor.tzst && \
cd .. && \
make mfer.cmd
RUN rm -rf /build/vendor && go mod vendor && tar -c . | zstdmt -19 > /src.tzst
################################################################################
#2345678911234567892123456789312345678941234567895123456789612345678971234567898
################################################################################
## final image
################################################################################
FROM scratch
# we put all the source into the final image for posterity, it's small
COPY --from=builder /src.tzst /src.tzst
COPY --from=builder /build/mfer.cmd /mfer
COPY --from=builder /mfer /mfer
ENTRYPOINT ["/mfer"]

143
FORMAT.md Normal file
View File

@@ -0,0 +1,143 @@
# .mf File Format Specification
Version 1.0
## Overview
An `.mf` file is a binary manifest that describes a directory tree of files,
including their paths, sizes, and cryptographic checksums. It supports
optional GPG signatures for integrity verification and optional timestamps
for metadata preservation.
## File Structure
An `.mf` file consists of two parts, concatenated:
1. **Magic bytes** (8 bytes): the ASCII string `ZNAVSRFG`
2. **Outer message**: a Protocol Buffers serialized `MFFileOuter` message
There is no length prefix or version byte between the magic and the protobuf
message. The protobuf message extends to the end of the file.
See [`mfer/mf.proto`](mfer/mf.proto) for exact field numbers and types.
## Outer Message (`MFFileOuter`)
The outer message contains:
| Field | Number | Type | Description |
| ----------------- | ------ | ---------------- | ------------------------------------------------------------------------ |
| `version` | 101 | enum | Must be `VERSION_ONE` (1) |
| `compressionType` | 102 | enum | Compression of `innerMessage`; must be `COMPRESSION_ZSTD` (1) |
| `size` | 103 | int64 | Uncompressed size of `innerMessage` (corruption detection) |
| `sha256` | 104 | bytes | SHA-256 hash of the **compressed** `innerMessage` (corruption detection) |
| `uuid` | 105 | bytes | Random v4 UUID; must match the inner message UUID |
| `innerMessage` | 199 | bytes | Zstd-compressed serialized `MFFile` message |
| `signature` | 201 | bytes (optional) | GPG signature (ASCII-armored or binary) |
| `signer` | 202 | bytes (optional) | Full GPG key ID of the signer |
| `signingPubKey` | 203 | bytes (optional) | Full GPG signing public key |
### SHA-256 Hash
The `sha256` field (104) covers the **compressed** `innerMessage` bytes.
This allows verifying data integrity before decompression.
## Compression
The `innerMessage` field is compressed with [Zstandard (zstd)](https://facebook.github.io/zstd/).
Implementations must enforce a decompression size limit to prevent
decompression bombs. The reference implementation limits decompressed size to
256 MB.
## Inner Message (`MFFile`)
After decompressing `innerMessage`, the result is a serialized `MFFile`
(referred to as the manifest):
| Field | Number | Type | Description |
| ----------- | ------ | --------------------- | ------------------------------------- |
| `version` | 100 | enum | Must be `VERSION_ONE` (1) |
| `files` | 101 | repeated `MFFilePath` | List of files in the manifest |
| `uuid` | 102 | bytes | Random v4 UUID; must match outer UUID |
| `createdAt` | 201 | Timestamp (optional) | When the manifest was created |
## File Entries (`MFFilePath`)
Each file entry contains:
| Field | Number | Type | Description |
| ---------- | ------ | ------------------------- | ----------------------------------- |
| `path` | 1 | string | Relative file path (see Path Rules) |
| `size` | 2 | int64 | File size in bytes |
| `hashes` | 3 | repeated `MFFileChecksum` | At least one hash required |
| `mimeType` | 301 | string (optional) | MIME type |
| `mtime` | 302 | Timestamp (optional) | Modification time |
| `ctime` | 303 | Timestamp (optional) | Change time (inode metadata change) |
Field 304 (`atime`) has been removed from the specification. Access time is
volatile and non-deterministic; it is not useful for integrity verification.
## Path Rules
All `path` values must satisfy these invariants:
- **UTF-8**: paths must be valid UTF-8
- **Forward slashes**: use `/` as the path separator (never `\`)
- **Relative only**: no leading `/`
- **No parent traversal**: no `..` path segments
- **No empty segments**: no `//` sequences
- **No trailing slash**: paths refer to files, not directories
Implementations must validate these invariants when reading and writing
manifests. Paths that violate these rules must be rejected.
## Hash Format (`MFFileChecksum`)
Each checksum is a single `bytes multiHash` field containing a
[multihash](https://multiformats.io/multihash/)-encoded value. Multihash is
self-describing: the encoded bytes include a varint algorithm identifier
followed by a varint digest length followed by the digest itself.
The 1.0 implementation writes SHA-256 multihashes (`0x12` algorithm code).
Implementations must be able to verify SHA-256 multihashes at minimum.
## Signature Scheme
Signing is optional. When present, the signature covers a canonical string
constructed as:
```
ZNAVSRFG-<UUID>-<SHA256>
```
Where:
- `ZNAVSRFG` is the magic bytes string (literal ASCII)
- `<UUID>` is the hex-encoded UUID from the outer message
- `<SHA256>` is the hex-encoded SHA-256 hash from the outer message (covering compressed data)
Components are separated by hyphens. The signature is produced by GPG over
this canonical string and stored in the `signature` field of the outer
message.
## Deterministic Serialization
By default, manifests are generated deterministically:
- File entries are sorted by `path` in **lexicographic byte order**
- `createdAt` is omitted unless explicitly requested
- `atime` is never included (field removed from schema)
This ensures that two independent runs over the same directory tree produce
byte-identical `.mf` files (assuming file contents and metadata have not
changed).
## MIME Type
The recommended MIME type for `.mf` files is `application/octet-stream`.
The `.mf` file extension is the canonical identifier.
## Reference
- Proto definition: [`mfer/mf.proto`](mfer/mf.proto)
- Reference implementation: [git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)

View File

@@ -5,7 +5,7 @@ export PATH := $(PATH):$(GOPATH)/bin
PROTOC_GEN_GO := $(GOPATH)/bin/protoc-gen-go
SOURCEFILES := mfer/*.go mfer/*.proto internal/*/*.go cmd/*/*.go go.mod go.sum
ARCH := $(shell uname -m)
GITREV_BUILD := $(shell bash $(PWD)/bin/gitrev.sh)
GITREV_BUILD := $(shell bash $(PWD)/bin/gitrev.sh 2>/dev/null || echo unknown)
APPNAME := mfer
VERSION := 0.1.0
export DOCKER_IMAGE_CACHE_DIR := $(HOME)/Library/Caches/Docker/$(APPNAME)-$(ARCH)
@@ -13,18 +13,18 @@ GOLDFLAGS += -X main.Version=$(VERSION)
GOLDFLAGS += -X main.Gitrev=$(GITREV_BUILD)
GOFLAGS := -ldflags "$(GOLDFLAGS)"
.PHONY: docker default run ci test fixme
.PHONY: docker default run ci test check lint fmt fmt-check hooks fixme
default: fmt test
run: ./mfer.cmd
run: ./bin/mfer
./$<
./$< gen
ci: test
test: $(SOURCEFILES) mfer/mf.pb.go
go test -v --timeout 3s ./...
go test -v --timeout 10s ./...
$(PROTOC_GEN_GO):
test -e $(PROTOC_GEN_GO) || go install -v google.golang.org/protobuf/cmd/protoc-gen-go@v1.28.1
@@ -32,18 +32,27 @@ $(PROTOC_GEN_GO):
fixme:
@grep -nir fixme . | grep -v Makefile
check: test lint fmt-check
fmt-check: mfer/mf.pb.go
sh -c 'test -z "$$(gofmt -l .)"'
hooks:
echo '#!/bin/sh\nmake check' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
devprereqs:
which golangci-lint || go install -v github.com/golangci/golangci-lint/cmd/golangci-lint@latest
which golangci-lint || go install -v github.com/golangci/golangci-lint/cmd/golangci-lint@v2.0.2
mfer/mf.pb.go: mfer/mf.proto
cd mfer && go generate .
mfer.cmd: $(SOURCEFILES) mfer/mf.pb.go
bin/mfer: $(SOURCEFILES) mfer/mf.pb.go
protoc --version
cd cmd/mfer && go build -tags urfave_cli_no_docs -o ../../mfer.cmd $(GOFLAGS) .
cd cmd/mfer && go build -tags urfave_cli_no_docs -o ../../bin/mfer $(GOFLAGS) .
clean:
rm -rfv mfer/*.pb.go mfer.cmd cmd/mfer/mfer *.dockerimage
rm -rfv mfer/*.pb.go bin/mfer cmd/mfer/mfer *.dockerimage
fmt: mfer/mf.pb.go
gofumpt -l -w mfer internal cmd

497
README.md
View File

@@ -3,273 +3,48 @@
[mfer](https://git.eeqj.de/sneak/mfer) is a reference implementation library
and thin wrapper command-line utility written in [Go](https://golang.org)
and first published in 2022 under the [WTFPL](https://wtfpl.net) (public
domain) license. It specifies and generates `.mf` manifest files over a
domain) license. It specifies and generates `.mf` manifest files over a
directory tree of files to encapsulate metadata about them (such as
cryptographic checksums or signatures over same) to aid in archiving,
downloading, and streaming, or mirroring. The manifest files' data is
downloading, and streaming, or mirroring. The manifest files' data is
serialized with Google's [protobuf serialization
format](https://developers.google.com/protocol-buffers). The structure of
format](https://developers.google.com/protocol-buffers). The structure of
these files can be found [in the format
specification](https://git.eeqj.de/sneak/mfer/src/branch/main/mfer/mf.proto)
which is included in the [project
repository](https://git.eeqj.de/sneak/mfer).
The current version is pre-1.0 and while the repo was published in 2022,
there has not yet been any versioned release. [SemVer](https://semver.org)
there has not yet been any versioned release. [SemVer](https://semver.org)
will be used for releases.
This project was started by [@sneak](https://sneak.berlin) to scratch an
itch in 2022 and is currently a one-person effort, though the goal is for
this to emerge as a de-facto standard and be incorporated into other
software. A compatible javascript library is planned.
# Phases
Manifest generation happens in two distinct phases:
## Phase 1: Enumeration
Walking directories and calling `stat()` on files to collect metadata (path, size, mtime, ctime). This builds the list of files to be scanned. Relatively fast as it only reads filesystem metadata, not file contents.
**Progress:** `EnumerateStatus` with `FilesFound` and `BytesFound`
## Phase 2: Scan (ToManifest)
Reading file contents and computing cryptographic hashes for manifest generation. This is the expensive phase that reads all file data from disk.
**Progress:** `ScanStatus` with `TotalFiles`, `ScannedFiles`, `TotalBytes`, `ScannedBytes`, `BytesPerSec`
# Code Conventions
- **Logging:** Never use `fmt.Printf` or write to stdout/stderr directly in normal code. Use the `internal/log` package for all output (`log.Info`, `log.Infof`, `log.Debug`, `log.Debugf`, `log.Progressf`, `log.ProgressDone`).
- **Filesystem abstraction:** Use `github.com/spf13/afero` for filesystem operations to enable testing and flexibility.
- **CLI framework:** Use `github.com/urfave/cli/v2` for command-line interface.
- **Serialization:** Use Protocol Buffers for manifest file format.
- **Internal packages:** Non-exported implementation details go in `internal/` subdirectories.
- **Concurrency:** Use `sync.RWMutex` for protecting shared state; prefer channels for progress reporting.
- **Progress channels:** Use buffered channels (size 1) with non-blocking sends to avoid blocking the main operation if the consumer is slow.
- **Context support:** Long-running operations should accept `context.Context` for cancellation.
- **NO_COLOR:** Respect the `NO_COLOR` environment variable for disabling colored output.
- **Options pattern:** Use `NewWithOptions(opts *Options)` constructor pattern for configurable types.
# Codebase Structure
## cmd/mfer/
### main.go
- **Variables**
- `Appname string` - Application name
- `Version string` - Version string (set at build time)
- `Gitrev string` - Git revision (set at build time)
## internal/cli/
### entry.go
- **Variables**
- `NO_COLOR bool` - Disables color output when NO_COLOR env var is set
- **Functions**
- `Run(Appname, Version, Gitrev string) int` - Main entry point for the CLI
### mfer.go
- **Types**
- `CLIApp struct` - Main CLI application container
- **Methods**
- `(*CLIApp) VersionString() string` - Returns formatted version string
## internal/log/
### log.go
- **Functions**
- `Init()` - Initializes the logger
- `Info(arg string)` - Logs at info level
- `Infof(format string, args ...interface{})` - Logs at info level with formatting
- `Debug(arg string)` - Logs at debug level with caller info
- `Debugf(format string, args ...interface{})` - Logs at debug level with formatting and caller info
- `Dump(args ...interface{})` - Logs spew dump at debug level
- `Progressf(format string, args ...interface{})` - Prints progress message (overwrites current line)
- `ProgressDone()` - Completes progress line with newline
- `EnableDebugLogging()` - Sets log level to debug
- `SetLevel(arg log.Level)` - Sets log level
- `SetLevelFromVerbosity(l int)` - Sets log level from verbosity count
- `GetLevel() log.Level` - Returns current log level
- `GetLogger() *log.Logger` - Returns underlying logger
- `WithError(e error) *log.Entry` - Returns log entry with error attached
- `DisableStyling()` - Disables colors and styling (for NO_COLOR)
## internal/scanner/
### scanner.go
- **Types**
- `Options struct` - Options for scanner behavior
- `IncludeDotfiles bool` - Include dot (hidden) files (excluded by default)
- `FollowSymLinks bool`
- `EnumerateStatus struct` - Progress information for enumeration phase
- `FilesFound int64`
- `BytesFound int64`
- `ScanStatus struct` - Progress information for scan phase
- `TotalFiles int64`
- `ScannedFiles int64`
- `TotalBytes int64`
- `ScannedBytes int64`
- `BytesPerSec float64`
- `ETA time.Duration`
- `FileEntry struct` - Represents an enumerated file
- `Path string` - Relative path (used in manifest)
- `AbsPath string` - Absolute path (used for reading file content)
- `Size int64`
- `Mtime time.Time`
- `Ctime time.Time`
- `Scanner struct` - Accumulates files and generates manifests
- **Functions**
- `New() *Scanner` - Creates a new Scanner with default options
- `NewWithOptions(opts *Options) *Scanner` - Creates a new Scanner with given options
- **Methods (Enumeration Phase)**
- `(*Scanner) EnumerateFile(path string) error` - Enumerates a single file, calling stat() for metadata
- `(*Scanner) EnumeratePath(inputPath string, progress chan<- EnumerateStatus) error` - Walks a directory and enumerates all files
- `(*Scanner) EnumeratePaths(progress chan<- EnumerateStatus, inputPaths ...string) error` - Walks multiple directories
- `(*Scanner) EnumerateFS(afs afero.Fs, basePath string, progress chan<- EnumerateStatus) error` - Walks an afero filesystem
- **Methods (Accessors)**
- `(*Scanner) Files() []*FileEntry` - Returns copy of all enumerated files
- `(*Scanner) FileCount() int64` - Returns number of files
- `(*Scanner) TotalBytes() int64` - Returns total size of all files
- **Methods (Scan Phase)**
- `(*Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- ScanStatus) error` - Reads file contents, computes hashes, generates manifest
## internal/checker/
### checker.go
- **Types**
- `Result struct` - Outcome of checking a single file
- `Path string` - File path from manifest
- `Status Status` - Verification status
- `Message string` - Error or status message
- `Status int` - Verification status enumeration
- `StatusOK` - File matches manifest
- `StatusMissing` - File not found
- `StatusSizeMismatch` - File size differs from manifest
- `StatusHashMismatch` - File hash differs from manifest
- `StatusError` - Error occurred during verification
- `CheckStatus struct` - Progress information for check operation
- `TotalFiles int64`
- `CheckedFiles int64`
- `TotalBytes int64`
- `CheckedBytes int64`
- `BytesPerSec float64`
- `ETA time.Duration`
- `Failures int64`
- `Checker struct` - Verifies files against a manifest
- **Functions**
- `NewChecker(manifestPath string, basePath string) (*Checker, error)` - Creates a new Checker for the given manifest and base path
- **Methods**
- `(s Status) String() string` - Returns string representation of status
- `(*Checker) FileCount() int64` - Returns number of files in the manifest
- `(*Checker) TotalBytes() int64` - Returns total size of all files in manifest
- `(*Checker) Check(ctx context.Context, results chan<- Result, progress chan<- CheckStatus) error` - Verifies all files against the manifest
## mfer/
### manifest.go
- **Types**
- `ManifestScanOptions struct` - Options for scanning directories
- `IncludeDotfiles bool` - Include dot (hidden) files (excluded by default)
- `FollowSymLinks bool`
- **Functions**
- `New() *manifest` - Creates a new empty manifest
- `NewFromPaths(options *ManifestScanOptions, inputPaths ...string) (*manifest, error)` - Creates manifest from filesystem paths
- `NewFromFS(options *ManifestScanOptions, fs afero.Fs) (*manifest, error)` - Creates manifest from afero filesystem
- **Methods**
- `(*manifest) HasError() bool` - Returns true if manifest has errors
- `(*manifest) AddError(e error) *manifest` - Adds an error to the manifest
- `(*manifest) WithContext(c context.Context) *manifest` - Sets context for cancellation
- `(*manifest) GetFileCount() int64` - Returns number of files in manifest
- `(*manifest) GetTotalFileSize() int64` - Returns total size of all files
- `(*manifest) Files() []*MFFilePath` - Returns all file entries from a loaded manifest
- `(*manifest) Scan() error` - Scans source filesystems and populates file list
### output.go
- **Methods**
- `(*manifest) WriteToFile(path string) error` - Writes manifest to file path
- `(*manifest) WriteTo(output io.Writer) error` - Writes manifest to io.Writer
### builder.go
- **Types**
- `FileProgress func(bytesRead int64)` - Callback for file processing progress
- `Builder struct` - Constructs manifests by adding files one at a time
- **Functions**
- `NewBuilder() *Builder` - Creates a new Builder
- **Methods**
- `(*Builder) AddFile(path string, size int64, mtime time.Time, reader io.Reader, progress FileProgress) (int64, error)` - Reads file, computes hash, adds to manifest
- `(*Builder) FileCount() int` - Returns number of files added
- `(*Builder) Build(w io.Writer) error` - Finalizes and writes manifest
### serialize.go
- **Constants**
- `MAGIC string` - Magic bytes prefix for manifest files ("ZNAVSRFG")
### deserialize.go
- **Functions**
- `NewFromProto(input io.Reader) (*manifest, error)` - Deserializes manifest from protobuf
- `NewManifestFromReader(input io.Reader) (*manifest, error)` - Reads and parses manifest from io.Reader
- `NewManifestFromFile(path string) (*manifest, error)` - Reads and parses manifest from file path
### mf.pb.go (generated from mf.proto)
- **Enum Types**
- `MFFileOuter_Version` - Outer file format version
- `MFFileOuter_VERSION_NONE`
- `MFFileOuter_VERSION_ONE`
- `MFFileOuter_CompressionType` - Compression type for inner message
- `MFFileOuter_COMPRESSION_NONE`
- `MFFileOuter_COMPRESSION_ZSTD`
- `MFFile_Version` - Inner file format version
- `MFFile_VERSION_NONE`
- `MFFile_VERSION_ONE`
- **Message Types**
- `Timestamp struct` - Timestamp with seconds and nanoseconds
- `GetSeconds() int64`
- `GetNanos() int32`
- `MFFileOuter struct` - Outer wrapper containing compressed/signed inner message
- `GetVersion() MFFileOuter_Version`
- `GetCompressionType() MFFileOuter_CompressionType`
- `GetSize() int64`
- `GetSha256() []byte`
- `GetInnerMessage() []byte`
- `GetSignature() []byte`
- `GetSigner() []byte`
- `GetSigningPubKey() []byte`
- `MFFilePath struct` - Individual file entry in manifest
- `GetPath() string`
- `GetSize() int64`
- `GetHashes() []*MFFileChecksum`
- `GetMimeType() string`
- `GetMtime() *Timestamp`
- `GetCtime() *Timestamp`
- `GetAtime() *Timestamp`
- `MFFileChecksum struct` - File checksum using multihash
- `GetMultiHash() []byte`
- `MFFile struct` - Inner manifest containing file list
- `GetVersion() MFFile_Version`
- `GetFiles() []*MFFilePath`
- `GetCreatedAt() *Timestamp`
software. A compatible javascript library is planned.
# Build Status
[![Build Status](https://drone.datavi.be/api/badges/sneak/mfer/status.svg)](https://drone.datavi.be/sneak/mfer)
CI runs via `docker build .` which executes `make check` (formatting,
linting, tests). The `main` branch must always be green.
# Participation
The community is as yet nonexistent so there are no defined policies or
norms yet. Primary development happens on a privately-run Gitea instance at
norms yet. Primary development happens on a privately-run Gitea instance at
[https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer) and issues
are [tracked there](https://git.eeqj.de/sneak/mfer/issues).
Changes must always be formatted with a standard `go fmt`, syntactically
valid, and must pass the linting defined in the repository (presently only
the `golangci-lint` defaults), which can be run with a `make lint`. The
the `golangci-lint` defaults), which can be run with a `make lint`. The
`main` branch is protected and all changes must be made via [pull
requests](https://git.eeqj.de/sneak/mfer/pulls) and pass CI to be merged.
Any changes submitted to this project must also be
[WTFPL-licensed](https://wtfpl.net) to be considered.
See [`REPO_POLICIES.md`](REPO_POLICIES.md) for detailed coding standards,
tooling requirements, and workflow conventions.
# Problem Statement
@@ -348,30 +123,9 @@ The manifest file would do several important things:
- metadata size should not be used as an excuse to sacrifice utility (such
as providing checksums over each chunk of a large file)
# Limitations
- **Manifest size:** Manifests must fit entirely in system memory during reading and writing.
# TODO
## Medium Priority
- [x] **Atomic writes for `mfer gen`** - Writes to temp file then atomic rename; cleans up temp file on error/interrupt.
- [ ] **Change FileProgress callback to channel** - `mfer/builder.go` uses a callback for progress reporting; should use channels like `EnumerateStatus` and `ScanStatus` for consistency.
- [ ] **Consolidate legacy manifest code** - `mfer/manifest.go` has old scanning code (`Scan()`, `addFile()`) that duplicates the new `internal/scanner` + `mfer/builder.go` pattern.
- [ ] **Add context cancellation to legacy code** - The old `manifest.Scan()` doesn't support context cancellation; the new scanner does.
## Lower Priority
- [x] **Add unit tests for `internal/checker`** - 88.5% coverage.
- [x] **Add unit tests for `internal/scanner`** - 80.1% coverage.
- [ ] **Clean up FIXMEs in manifest.go** - Validate input paths exist, validate filesystem.
- [x] **Validate input paths before scanning** - Fails fast with clear error if paths don't exist.
# Open Questions
- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
- If so, should the chunksize be fixed or dynamic?
- Should the manifest signature format be GnuPG signatures, or those from
@@ -455,15 +209,236 @@ regardless of filesystem format.
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
desired username for an account on this Gitea instance.
# TODO: Remaining Work for 1.0
## Design Questions (Owner Decision Required)
These require @sneak's input before implementation. Answers should be added
inline below each question.
### Format Design
**1. Should `MFFileChecksum` be simplified?** Currently it's a separate
message wrapping a single `bytes multiHash` field. Since multihash
already self-describes the algorithm, `repeated bytes hashes` directly on
`MFFilePath` would be simpler and reduce per-file protobuf overhead. Is
the extra message layer intentional (e.g. planning to add per-hash
metadata like `verified_at`)?
> _answer:_
**2. Should file permissions/mode be stored?** The format stores
mtime/ctime but not Unix file permissions. For archival use this may not
matter, but for software distribution or filesystem restoration it's a
gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`)
even if we don't populate it yet?
> _answer:_
**3. Should `atime` be removed from the schema?** Access time is
volatile, non-deterministic, and often disabled (`noatime`). Including it
means two manifests of the same directory at different times will differ,
which conflicts with the determinism goal. Remove it, or document it as
"never set by default"?
> _answer:_
**4. What are the path normalization rules?** The proto has `string path`
with no specification about: always forward-slash? Must be relative? No
`..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs
Linux)? Max path length? This is a security issue (path traversal) and a
cross-platform compatibility issue. What rules should the spec mandate?
> _answer:_
**5. Should we add a version byte after the magic?** Currently
`ZNAVSRFG` is followed immediately by protobuf. Adding a version byte
(`ZNAVSRFG\x01`) would allow future framing changes without requiring
protobuf parsing to detect the version. `MFFileOuter.Version` serves
this purpose but requires successful deserialization to read. Worth the
extra byte?
> _answer:_
**6. Should we add a length-prefix after the magic?** Protobuf is not
self-delimiting. If we ever want to concatenate manifests or append data
after the protobuf, the current framing is insufficient. Add a varint or
fixed-width length-prefix?
> _answer:_
### Signature Design
**7. What does the outer SHA-256 hash cover — compressed or uncompressed
data?** The code currently hashes compressed data (good for verifying
before decompression), but this should be explicitly documented. Which is
the intended behavior?
> _answer:_
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded
string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with
hex encoding, which adds a transformation layer. Signing the raw `sha256`
bytes (or compressed `innerMessage` directly) would be simpler. Keep the
string format or switch to raw bytes?
> _answer:_
**9. Should we support detached signature files (`.mf.sig`)?** Embedded
signatures are better for single-file distribution. Detached `.mf.sig`
files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are
simpler for HTTP serving. Support both modes?
> _answer:_
**10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is
fragile (may not be installed, version-dependent output).
`github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could
use Ed25519/signify (simpler, no key management). Which direction?
> _answer:_
### Implementation Design
**11. Should manifests be deterministic by default?** This means: sort
file entries by path, omit `createdAt` timestamp (or make it opt-in), no
`atime`. Should determinism be the default, with a
`--include-timestamps` flag to opt in?
> _answer:_
**12. Should we consolidate or keep both scanner/checker
implementations?** There are two parallel implementations:
`mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`,
`RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw
`int64`, `string`). The `mfer/` versions are superior. Delete the
`internal/` versions?
> _answer:_
**13. Should the `manifest` type be exported?** Currently unexported with
exported constructors (`NewManifestFromReader`, `NewManifestFromFile`).
Consumers can't declare `var m *mfer.manifest`. Export the type, or
define an interface?
> _answer:_
**14. What should the Go module path be for 1.0?** Currently
`sneak.berlin/go/mfer` in `go.mod` but `git.eeqj.de/sneak/mfer/mfer` in
the proto `go_package` option. Which is canonical?
> _answer:_
## Implementation Tasks
### Repo Infrastructure
- [ ] Add `.golangci.yml` (fetch from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`)
- [ ] Add `.editorconfig`
- [ ] Add `.gitea/workflows/check.yml` that runs `docker build .`
### Format & Correctness
- [ ] Resolve proto `go_package` path inconsistency
(`git.eeqj.de/sneak/mfer/mfer` vs `sneak.berlin/go/mfer`)
- [ ] Specify path invariants — add proto comments requiring UTF-8,
forward-slash, relative paths, no `..`, no leading `/`; validate
in `Builder.AddFile` and `Builder.AddFileWithHash` (pending design
question answer)
- [ ] Remove or deprecate `atime` from proto (pending design question
answer)
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future
file permissions (pending design question answer)
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version
1 (pending design question answer)
- [ ] Write format specification document — separate from README:
magic, outer structure, compression, inner structure, path
invariants, signature scheme, canonical serialization
### Library
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate
on `mfer/` package versions; update CLI code (pending design
question answer)
- [ ] Add deterministic file ordering — sort entries by path
(lexicographic, byte-order) in `Builder.Build()`; add test
asserting byte-identical output from two runs
- [ ] Add decompression size limit — `io.LimitReader` in
`deserializeInner()` with `m.pbOuter.Size` as bound
- [ ] Fix `errors.Is` dead code in checker — replace with
`os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after
reading, return error on mismatch
- [ ] Export the `manifest` type or define a public interface (pending
design question answer) — currently consumers cannot hold a reference
to a loaded manifest in their own type declarations
- [ ] Replace GPG subprocess calls with pure-Go crypto (pending design
question answer) — current implementation shells out to `gpg` which
may not be installed
- [ ] Add timeout to any remaining subprocess calls
### CLI
- [ ] Fix flag naming — all CLI flags should use kebab-case as primary
(`--include-dotfiles`, `--follow-symlinks`)
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or
`url.JoinPath()` instead of string concatenation
- [ ] Add progress rate-limiting to Checker — throttle to once per
second, matching Scanner
- [ ] Add `--deterministic` flag or make it default — omit `createdAt`,
sort files (pending design question answer)
- [ ] Wire `--version` flag properly (currently only a `version`
subcommand exists; top-level `--version` shows urfave/cli generic
output)
- [ ] Add retry logic to `fetch` — currently no retries on transient
HTTP errors; needs exponential backoff
- [ ] `fetch` command uses bare `http.Get` with no timeout — needs
`http.Client` with configurable timeout
### Testing & Robustness
- [ ] Add fuzzing tests for `NewManifestFromReader` — protobuf
deserialization of untrusted input needs fuzz coverage
- [ ] Add integration test for `freshen` CLI command — current tests
only verify setup, not the actual freshen operation end-to-end
- [ ] Add test for `fetch` CLI command end-to-end (currently only
`downloadFile` is tested)
### Documentation
- [ ] Promote `FORMAT.md` as primary spec reference; README should link
to it more prominently
- [ ] Audit and update all error messages for consistency and
helpfulness
- [ ] Document the signature scheme more thoroughly (canonical string
format, verification steps)
### Release
- [ ] Finalize Go module path
- [ ] Update version constant in `mfer/constants.go`
- [ ] Add `--version` output matching SemVer
- [ ] Tag `v1.0.0`
# See Also
## Prior Art: Metalink
- [Metalink - Mozilla Wiki](https://wiki.mozilla.org/Metalink)
- [Metalink - Wikipedia](https://en.wikipedia.org/wiki/Metalink)
- [RFC 5854 - The Metalink Download Description Format](https://datatracker.ietf.org/doc/html/rfc5854)
- [RFC 6249 - Metalink/HTTP: Mirrors and Hashes](https://www.rfc-editor.org/rfc/rfc6249.html)
## Links
* Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
* Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues)
- Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
- Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues)
# Authors
* [@sneak &lt;sneak@sneak.berlin&gt;](mailto:sneak@sneak.berlin)
- [@sneak &lt;sneak@sneak.berlin&gt;](mailto:sneak@sneak.berlin)
# License
* [WTFPL](https://wtfpl.net)
- [WTFPL](https://wtfpl.net)

255
REPO_POLICIES.md Normal file
View File

@@ -0,0 +1,255 @@
---
title: Repository Policies
last_modified: 2026-03-10
---
This document covers repository structure, tooling, and workflow standards. Code
style conventions are in separate documents:
- [Code Styleguide](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE.md)
(general, bash, Docker)
- [Go](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_GO.md)
- [JavaScript](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_JS.md)
- [Python](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_PYTHON.md)
- [Go HTTP Server Conventions](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/GO_HTTP_SERVER_CONVENTIONS.md)
---
- Cross-project documentation (such as this file) must include
`last_modified: YYYY-MM-DD` in the YAML front matter so it can be kept in sync
with the authoritative source as policies evolve.
- **ALL external references must be pinned by cryptographic hash.** This
includes Docker base images, Go modules, npm packages, GitHub Actions, and
anything else fetched from a remote source. Version tags (`@v4`, `@latest`,
`:3.21`, etc.) are server-mutable and therefore remote code execution
vulnerabilities. The ONLY acceptable way to reference an external dependency
is by its content hash (Docker `@sha256:...`, Go module hash in `go.sum`, npm
integrity hash in lockfile, GitHub Actions `@<commit-sha>`). No exceptions.
This also means never `curl | bash` to install tools like pyenv, nvm, rustup,
etc. Instead, download a specific release archive from GitHub, verify its hash
(hardcoded in the Dockerfile or script), and only then install. Unverified
install scripts are arbitrary remote code execution. This is the single most
important rule in this document. Double-check every external reference in
every file before committing. There are zero exceptions to this rule.
- Every repo with software must have a root `Makefile` with these targets:
`make test`, `make lint`, `make fmt` (writes), `make fmt-check` (read-only),
`make check` (prereqs: `test`, `lint`, `fmt-check`), `make docker`, and
`make hooks` (installs pre-commit hook). A model Makefile is at
`https://git.eeqj.de/sneak/prompts/raw/branch/main/Makefile`.
- Always use Makefile targets (`make fmt`, `make test`, `make lint`, etc.)
instead of invoking the underlying tools directly. The Makefile is the single
source of truth for how these operations are run.
- The Makefile is authoritative documentation for how the repo is used. Beyond
the required targets above, it should have targets for every common operation:
running a local development server (`make run`, `make dev`), re-initializing
or migrating the database (`make db-reset`, `make migrate`), building
artifacts (`make build`), generating code, seeding data, or anything else a
developer would do regularly. If someone checks out the repo and types
`make<tab>`, they should see every meaningful operation available. A new
contributor should be able to understand the entire development workflow by
reading the Makefile.
- Every repo should have a `Dockerfile`. All Dockerfiles must run `make check`
as a build step so the build fails if the branch is not green. For non-server
repos, the Dockerfile should bring up a development environment and run
`make check`. For server repos, `make check` should run as an early build
stage before the final image is assembled.
- Every repo should have a Gitea Actions workflow (`.gitea/workflows/`) that
runs `docker build .` on push. Since the Dockerfile already runs `make check`,
a successful build implies all checks pass.
- Use platform-standard formatters: `black` for Python, `prettier` for
JS/CSS/Markdown/HTML, `go fmt` for Go. Always use default configuration with
two exceptions: four-space indents (except Go), and `proseWrap: always` for
Markdown (hard-wrap at 80 columns). Documentation and writing repos (Markdown,
HTML, CSS) should also have `.prettierrc` and `.prettierignore`.
- Pre-commit hook: `make check` if local testing is possible, otherwise
`make lint && make fmt-check`. The Makefile should provide a `make hooks`
target to install the pre-commit hook.
- All repos with software must have tests that run via the platform-standard
test framework (`go test`, `pytest`, `jest`/`vitest`, etc.). If no meaningful
tests exist yet, add the most minimal test possible — e.g. importing the
module under test to verify it compiles/parses. There is no excuse for
`make test` to be a no-op.
- `make test` must complete in under 20 seconds. Add a 30-second timeout in the
Makefile.
- Docker builds must complete in under 5 minutes.
- `make check` must not modify any files in the repo. Tests may use temporary
directories.
- `main` must always pass `make check`, no exceptions.
- Never commit secrets. `.env` files, credentials, API keys, and private keys
must be in `.gitignore`. No exceptions.
- `.gitignore` should be comprehensive from the start: OS files (`.DS_Store`),
editor files (`.swp`, `*~`), language build artifacts, and `node_modules/`.
Fetch the standard `.gitignore` from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.gitignore` when setting up
a new repo.
- **No build artifacts in version control.** Code-derived data (compiled
bundles, minified output, generated assets) must never be committed to the
repository if it can be avoided. The build process (e.g. Dockerfile, Makefile)
should generate these at build time. Notable exception: Go protobuf generated
files (`.pb.go`) ARE committed because repos need to work with `go get`, which
downloads code but does not execute code generation.
- Never use `git add -A` or `git add .`. Always stage files explicitly by name.
- Never force-push to `main`.
- Make all changes on a feature branch. You can do whatever you want on a
feature branch.
- `.golangci.yml` is standardized and must _NEVER_ be modified by an agent, only
manually by the user. Fetch from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`.
- When pinning images or packages by hash, add a comment above the reference
with the version and date (YYYY-MM-DD).
- Use `yarn`, not `npm`.
- Write all dates as YYYY-MM-DD (ISO 8601).
- Simple projects should be configured with environment variables.
- Dockerized web services listen on port 8080 by default, overridable with
`PORT`.
- **HTTP/web services must be hardened for production internet exposure before
tagging 1.0.** This means full compliance with security best practices
including, without limitation, all of the following:
- **Security headers** on every response:
- `Strict-Transport-Security` (HSTS) with `max-age` of at least one year
and `includeSubDomains`.
- `Content-Security-Policy` (CSP) with a restrictive default policy
(`default-src 'self'` as a baseline, tightened per-resource as
needed). Never use `unsafe-inline` or `unsafe-eval` unless
unavoidable, and document the reason.
- `X-Frame-Options: DENY` (or `SAMEORIGIN` if framing is required).
Prefer the `frame-ancestors` CSP directive as the primary control.
- `X-Content-Type-Options: nosniff`.
- `Referrer-Policy: strict-origin-when-cross-origin` (or stricter).
- `Permissions-Policy` restricting access to browser features the
application does not use (camera, microphone, geolocation, etc.).
- **Request and response limits:**
- Maximum request body size enforced on all endpoints (e.g. Go
`http.MaxBytesReader`). Choose a sane default per-route; never accept
unbounded input.
- Maximum response body size where applicable (e.g. paginated APIs).
- `ReadTimeout` and `ReadHeaderTimeout` on the `http.Server` to defend
against slowloris attacks.
- `WriteTimeout` on the `http.Server`.
- `IdleTimeout` on the `http.Server`.
- Per-handler execution time limits via `context.WithTimeout` or
chi/stdlib `middleware.Timeout`.
- **Authentication and session security:**
- Rate limiting on password-based authentication endpoints. API keys are
high-entropy and not susceptible to brute force, so they are exempt.
- CSRF tokens on all state-mutating HTML forms. API endpoints
authenticated via `Authorization` header (Bearer token, API key) are
exempt because the browser does not attach these automatically.
- Passwords stored using bcrypt, scrypt, or argon2 — never plain-text,
MD5, or SHA.
- Session cookies set with `HttpOnly`, `Secure`, and `SameSite=Lax` (or
`Strict`) attributes.
- **Reverse proxy awareness:**
- True client IP detection when behind a reverse proxy
(`X-Forwarded-For`, `X-Real-IP`). The application must accept
forwarded headers only from a configured set of trusted proxy
addresses — never trust `X-Forwarded-For` unconditionally.
- **CORS:**
- Authenticated endpoints must restrict `Access-Control-Allow-Origin` to
an explicit allowlist of known origins. Wildcard (`*`) is acceptable
only for public, unauthenticated read-only APIs.
- **Error handling:**
- Internal errors must never leak stack traces, SQL queries, file paths,
or other implementation details to the client. Return generic error
messages in production; detailed errors only when `DEBUG` is enabled.
- **TLS:**
- Services never terminate TLS directly. They are always deployed behind
a TLS-terminating reverse proxy. The service itself listens on plain
HTTP. However, HSTS headers and `Secure` cookie flags must still be
set by the application so that the browser enforces HTTPS end-to-end.
This list is non-exhaustive. Apply defense-in-depth: if a standard security
hardening measure exists for HTTP services and is not listed here, it is
still expected. When in doubt, harden.
- `README.md` is the primary documentation. Required sections:
- **Description**: First line must include the project name, purpose,
category (web server, SPA, CLI tool, etc.), license, and author. Example:
"µPaaS is an MIT-licensed Go web application by @sneak that receives
git-frontend webhooks and deploys applications via Docker in realtime."
- **Getting Started**: Copy-pasteable install/usage code block.
- **Rationale**: Why does this exist?
- **Design**: How is the program structured?
- **TODO**: Update meticulously, even between commits. When planning, put
the todo list in the README so a new agent can pick up where the last one
left off.
- **License**: MIT, GPL, or WTFPL. Ask the user for new projects. Include a
`LICENSE` file in the repo root and a License section in the README.
- **Author**: [@sneak](https://sneak.berlin).
- First commit of a new repo should contain only `README.md`.
- Go module root: `sneak.berlin/go/<name>`. Always run `go mod tidy` before
committing.
- Use SemVer.
- Database migrations live in `internal/db/migrations/` and must be embedded in
the binary.
- `000_migration.sql` — contains ONLY the creation of the migrations
tracking table itself. Nothing else.
- `001_schema.sql` — the full application schema.
- **Pre-1.0.0:** never add additional migration files (002, 003, etc.).
There is no installed base to migrate. Edit `001_schema.sql` directly.
- **Post-1.0.0:** add new numbered migration files for each schema change.
Never edit existing migrations after release.
- All repos should have an `.editorconfig` enforcing the project's indentation
settings.
- Avoid putting files in the repo root unless necessary. Root should contain
only project-level config files (`README.md`, `Makefile`, `Dockerfile`,
`LICENSE`, `.gitignore`, `.editorconfig`, `REPO_POLICIES.md`, and
language-specific config). Everything else goes in a subdirectory. Canonical
subdirectory names:
- `bin/` — executable scripts and tools
- `cmd/` — Go command entrypoints
- `configs/` — configuration templates and examples
- `deploy/` — deployment manifests (k8s, compose, terraform)
- `docs/` — documentation and markdown (README.md stays in root)
- `internal/` — Go internal packages
- `internal/db/migrations/` — database migrations
- `pkg/` — Go library packages
- `share/` — systemd units, data files
- `static/` — static assets (images, fonts, etc.)
- `web/` — web frontend source
- When setting up a new repo, files from the `prompts` repo may be used as
templates. Fetch them from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/<path>`.
- New repos must contain at minimum:
- `README.md`, `.git`, `.gitignore`, `.editorconfig`
- `LICENSE`, `REPO_POLICIES.md` (copy from the `prompts` repo)
- `Makefile`
- `Dockerfile`, `.dockerignore`
- `.gitea/workflows/check.yml`
- Go: `go.mod`, `go.sum`, `.golangci.yml`
- JS: `package.json`, `yarn.lock`, `.prettierrc`, `.prettierignore`
- Python: `pyproject.toml`

33
TODO.md
View File

@@ -1,33 +0,0 @@
# TODO for 1.0 Release
## High Priority
- [ ] **Fix panic in log.go** - `internal/log/log.go:141` has a `panic("unable to get logger")` that should return an error or handle gracefully instead.
- [ ] **Clean up FIXMEs in manifest.go** - Multiple FIXMEs need attention:
- Line 67: Validate input paths exist before processing
- Line 77: Add validation for filesystem input
- Line 163: Avoid redundant stat calls
- Line 182: Add context support for cancellation
- [ ] **Fix WriteToFile overwrite behavior** - `mfer/output.go:9` has FIXME to refuse overwriting without `-f` flag.
- [ ] **Consolidate legacy manifest code** - `mfer/manifest.go` has old scanning code (`Scan()`, `addFile()`) that duplicates the new `internal/scanner` + `mfer/builder.go` pattern. Remove duplication.
## Medium Priority
- [ ] **Add unit tests for `internal/checker`** - Currently has no test files; only tested indirectly via CLI tests.
- [ ] **Add unit tests for `internal/scanner`** - Currently has no test files.
- [ ] **Add context cancellation to legacy code** - The old `manifest.Scan()` doesn't support context cancellation; the new scanner does.
- [ ] **Validate input paths before scanning** - Should fail fast with a clear error if paths don't exist.
- [ ] **Add resume support for fetch** - Allow resuming partial downloads using HTTP Range requests and existing temp files.
## Lower Priority
- [ ] **Add manifest signature support** - Implement signing and verification using signify or similar.
- [ ] **Improve error messages** - Ensure all error messages are clear and actionable.

1
go.mod
View File

@@ -6,6 +6,7 @@ require (
github.com/apex/log v1.9.0
github.com/davecgh/go-spew v1.1.1
github.com/dustin/go-humanize v1.0.1
github.com/google/uuid v1.1.2
github.com/klauspost/compress v1.18.2
github.com/multiformats/go-multihash v0.2.3
github.com/pterm/pterm v0.12.35

1
go.sum
View File

@@ -135,6 +135,7 @@ github.com/google/pprof v0.0.0-20201203190320-1bf35d6f28c2/go.mod h1:kpwsk12EmLe
github.com/google/pprof v0.0.0-20201218002935-b9804c9f04c2/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE=
github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
github.com/google/uuid v1.1.1/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/google/uuid v1.1.2 h1:EVhdT+1Kseyi1/pUmXKaFxYsDNy9RQYkMWRH68J/W7Y=
github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/googleapis/gax-go/v2 v2.0.4/go.mod h1:0Wqv26UfaUD9n4G6kQubkQ+KchISgw+vpHVxEJEs9eg=
github.com/googleapis/gax-go/v2 v2.0.5/go.mod h1:DWXyrwAJ9X0FpwwEdw+IPEYBICEFu5mhpdKc/us6bOk=

View File

@@ -1,299 +0,0 @@
package checker
import (
"bytes"
"context"
"crypto/sha256"
"errors"
"io"
"os"
"path/filepath"
"time"
"github.com/multiformats/go-multihash"
"github.com/spf13/afero"
"sneak.berlin/go/mfer/mfer"
)
// Result represents the outcome of checking a single file.
type Result struct {
Path string // Relative path from manifest
Status Status // Verification result status
Message string // Human-readable description of the result
}
// Status represents the verification status of a file.
type Status int
const (
StatusOK Status = iota // File matches manifest (size and hash verified)
StatusMissing // File not found on disk
StatusSizeMismatch // File size differs from manifest
StatusHashMismatch // File hash differs from manifest
StatusExtra // File exists on disk but not in manifest
StatusError // Error occurred during verification
)
func (s Status) String() string {
switch s {
case StatusOK:
return "OK"
case StatusMissing:
return "MISSING"
case StatusSizeMismatch:
return "SIZE_MISMATCH"
case StatusHashMismatch:
return "HASH_MISMATCH"
case StatusExtra:
return "EXTRA"
case StatusError:
return "ERROR"
default:
return "UNKNOWN"
}
}
// CheckStatus contains progress information for the check operation.
type CheckStatus struct {
TotalFiles int64 // Total number of files in manifest
CheckedFiles int64 // Number of files checked so far
TotalBytes int64 // Total bytes to verify (sum of all file sizes)
CheckedBytes int64 // Bytes verified so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
Failures int64 // Number of verification failures encountered
}
// Checker verifies files against a manifest.
type Checker struct {
basePath string
files []*mfer.MFFilePath
fs afero.Fs
// manifestPaths is a set of paths in the manifest for quick lookup
manifestPaths map[string]struct{}
}
// NewChecker creates a new Checker for the given manifest, base path, and filesystem.
// The basePath is the directory relative to which manifest paths are resolved.
// If fs is nil, the real filesystem (OsFs) is used.
func NewChecker(manifestPath string, basePath string, fs afero.Fs) (*Checker, error) {
if fs == nil {
fs = afero.NewOsFs()
}
m, err := mfer.NewManifestFromFile(fs, manifestPath)
if err != nil {
return nil, err
}
abs, err := filepath.Abs(basePath)
if err != nil {
return nil, err
}
files := m.Files()
manifestPaths := make(map[string]struct{}, len(files))
for _, f := range files {
manifestPaths[f.Path] = struct{}{}
}
return &Checker{
basePath: abs,
files: files,
fs: fs,
manifestPaths: manifestPaths,
}, nil
}
// FileCount returns the number of files in the manifest.
func (c *Checker) FileCount() int64 {
return int64(len(c.files))
}
// TotalBytes returns the total size of all files in the manifest.
func (c *Checker) TotalBytes() int64 {
var total int64
for _, f := range c.files {
total += f.Size
}
return total
}
// Check verifies all files against the manifest.
// Results are sent to the results channel as files are checked.
// Progress updates are sent to the progress channel approximately once per second.
// Both channels are closed when the method returns.
func (c *Checker) Check(ctx context.Context, results chan<- Result, progress chan<- CheckStatus) error {
if results != nil {
defer close(results)
}
if progress != nil {
defer close(progress)
}
totalFiles := int64(len(c.files))
totalBytes := c.TotalBytes()
var checkedFiles int64
var checkedBytes int64
var failures int64
startTime := time.Now()
for _, entry := range c.files {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
result := c.checkFile(entry, &checkedBytes)
if result.Status != StatusOK {
failures++
}
checkedFiles++
if results != nil {
results <- result
}
// Send progress with rate and ETA calculation
if progress != nil {
elapsed := time.Since(startTime)
var bytesPerSec float64
var eta time.Duration
if elapsed > 0 && checkedBytes > 0 {
bytesPerSec = float64(checkedBytes) / elapsed.Seconds()
remainingBytes := totalBytes - checkedBytes
if bytesPerSec > 0 {
eta = time.Duration(float64(remainingBytes)/bytesPerSec) * time.Second
}
}
sendCheckStatus(progress, CheckStatus{
TotalFiles: totalFiles,
CheckedFiles: checkedFiles,
TotalBytes: totalBytes,
CheckedBytes: checkedBytes,
BytesPerSec: bytesPerSec,
ETA: eta,
Failures: failures,
})
}
}
return nil
}
func (c *Checker) checkFile(entry *mfer.MFFilePath, checkedBytes *int64) Result {
absPath := filepath.Join(c.basePath, entry.Path)
// Check if file exists
info, err := c.fs.Stat(absPath)
if err != nil {
if errors.Is(err, afero.ErrFileNotFound) || errors.Is(err, errors.New("file does not exist")) {
return Result{Path: entry.Path, Status: StatusMissing, Message: "file not found"}
}
// Check for "file does not exist" style errors
exists, _ := afero.Exists(c.fs, absPath)
if !exists {
return Result{Path: entry.Path, Status: StatusMissing, Message: "file not found"}
}
return Result{Path: entry.Path, Status: StatusError, Message: err.Error()}
}
// Check size
if info.Size() != entry.Size {
*checkedBytes += info.Size()
return Result{
Path: entry.Path,
Status: StatusSizeMismatch,
Message: "size mismatch",
}
}
// Open and hash file
f, err := c.fs.Open(absPath)
if err != nil {
return Result{Path: entry.Path, Status: StatusError, Message: err.Error()}
}
defer func() { _ = f.Close() }()
h := sha256.New()
n, err := io.Copy(h, f)
if err != nil {
return Result{Path: entry.Path, Status: StatusError, Message: err.Error()}
}
*checkedBytes += n
// Encode as multihash and compare
computed, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
return Result{Path: entry.Path, Status: StatusError, Message: err.Error()}
}
// Check against all hashes in manifest (at least one must match)
for _, hash := range entry.Hashes {
if bytes.Equal(computed, hash.MultiHash) {
return Result{Path: entry.Path, Status: StatusOK}
}
}
return Result{Path: entry.Path, Status: StatusHashMismatch, Message: "hash mismatch"}
}
// FindExtraFiles walks the filesystem and reports files not in the manifest.
// Results are sent to the results channel. The channel is closed when done.
func (c *Checker) FindExtraFiles(ctx context.Context, results chan<- Result) error {
if results != nil {
defer close(results)
}
return afero.Walk(c.fs, c.basePath, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Skip directories
if info.IsDir() {
return nil
}
// Get relative path
relPath, err := filepath.Rel(c.basePath, path)
if err != nil {
return err
}
// Check if path is in manifest
if _, exists := c.manifestPaths[relPath]; !exists {
if results != nil {
results <- Result{
Path: relPath,
Status: StatusExtra,
Message: "not in manifest",
}
}
}
return nil
})
}
// sendCheckStatus sends a status update without blocking.
func sendCheckStatus(ch chan<- CheckStatus, status CheckStatus) {
if ch == nil {
return
}
select {
case ch <- status:
default:
}
}

View File

@@ -1,15 +1,18 @@
package cli
import (
"encoding/hex"
"fmt"
"io"
"path/filepath"
"strings"
"time"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/checker"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
// findManifest looks for a manifest file in the given directory.
@@ -32,29 +35,32 @@ func findManifest(fs afero.Fs, dir string) (string, error) {
func (mfa *CLIApp) checkManifestOperation(ctx *cli.Context) error {
log.Debug("checkManifestOperation()")
var manifestPath string
var err error
manifestPath, err := mfa.resolveManifestArg(ctx)
if err != nil {
return fmt.Errorf("check: %w", err)
}
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
// Check if arg is a directory or a file
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
// It's a directory, look for manifest inside
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
}
} else {
// Treat as a file path
manifestPath = arg
// URL manifests need to be downloaded to a temp file for the checker
if isHTTPURL(manifestPath) {
rc, fetchErr := mfa.openManifestReader(manifestPath)
if fetchErr != nil {
return fmt.Errorf("check: %w", fetchErr)
}
} else {
// No argument, look in current directory
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
tmpFile, tmpErr := afero.TempFile(mfa.Fs, "", "mfer-manifest-*.mf")
if tmpErr != nil {
_ = rc.Close()
return fmt.Errorf("check: failed to create temp file: %w", tmpErr)
}
tmpPath := tmpFile.Name()
_, cpErr := io.Copy(tmpFile, rc)
_ = rc.Close()
_ = tmpFile.Close()
if cpErr != nil {
_ = mfa.Fs.Remove(tmpPath)
return fmt.Errorf("check: failed to download manifest: %w", cpErr)
}
defer func() { _ = mfa.Fs.Remove(tmpPath) }()
manifestPath = tmpPath
}
basePath := ctx.String("base")
@@ -63,20 +69,49 @@ func (mfa *CLIApp) checkManifestOperation(ctx *cli.Context) error {
log.Infof("checking manifest %s with base %s", manifestPath, basePath)
// Create checker
chk, err := checker.NewChecker(manifestPath, basePath, mfa.Fs)
chk, err := mfer.NewChecker(manifestPath, basePath, mfa.Fs)
if err != nil {
return fmt.Errorf("failed to load manifest: %w", err)
}
// Check signature requirement
requiredSigner := ctx.String("require-signature")
if requiredSigner != "" {
// Validate fingerprint format: must be exactly 40 hex characters
if len(requiredSigner) != 40 {
return fmt.Errorf("invalid fingerprint: must be exactly 40 hex characters, got %d", len(requiredSigner))
}
if _, err := hex.DecodeString(requiredSigner); err != nil {
return fmt.Errorf("invalid fingerprint: must be valid hex: %w", err)
}
if !chk.IsSigned() {
return fmt.Errorf("manifest is not signed, but signature from %s is required", requiredSigner)
}
// Extract fingerprint from the embedded public key (not from the signer field)
// This validates the key is importable and gets its actual fingerprint
embeddedFP, err := chk.ExtractEmbeddedSigningKeyFP()
if err != nil {
return fmt.Errorf("failed to extract fingerprint from embedded signing key: %w", err)
}
// Compare fingerprints - must be exact match (case-insensitive)
if !strings.EqualFold(embeddedFP, requiredSigner) {
return fmt.Errorf("embedded signing key fingerprint %s does not match required %s", embeddedFP, requiredSigner)
}
log.Infof("manifest signature verified (signer: %s)", embeddedFP)
}
log.Infof("manifest contains %d files, %s", chk.FileCount(), humanize.IBytes(uint64(chk.TotalBytes())))
// Set up results channel
results := make(chan checker.Result, 1)
results := make(chan mfer.Result, 1)
// Set up progress channel
var progress chan checker.CheckStatus
var progress chan mfer.CheckStatus
if showProgress {
progress = make(chan checker.CheckStatus, 1)
progress = make(chan mfer.CheckStatus, 1)
go func() {
for status := range progress {
if status.ETA > 0 {
@@ -103,7 +138,7 @@ func (mfa *CLIApp) checkManifestOperation(ctx *cli.Context) error {
done := make(chan struct{})
go func() {
for result := range results {
if result.Status != checker.StatusOK {
if result.Status != mfer.StatusOK {
failures++
log.Infof("%s: %s (%s)", result.Status, result.Path, result.Message)
} else {
@@ -124,7 +159,7 @@ func (mfa *CLIApp) checkManifestOperation(ctx *cli.Context) error {
// Check for extra files if requested
if ctx.Bool("no-extra-files") {
extraResults := make(chan checker.Result, 1)
extraResults := make(chan mfer.Result, 1)
extraDone := make(chan struct{})
go func() {
for result := range extraResults {

View File

@@ -65,9 +65,9 @@ func TestGenerateCommand(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files in memory filesystem
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("test content"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("test content"), 0o644))
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -85,9 +85,9 @@ func TestGenerateAndCheckCommand(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files with subdirectory
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("test content"), 0644))
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("test content"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -104,8 +104,8 @@ func TestCheckCommandWithMissingFile(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -125,8 +125,8 @@ func TestCheckCommandWithCorruptedFile(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -134,7 +134,7 @@ func TestCheckCommandWithCorruptedFile(t *testing.T) {
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Corrupt the file (change content but keep same size)
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("HELLO WORLD"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("HELLO WORLD"), 0o644))
// Check manifest - should fail with hash mismatch
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
@@ -146,8 +146,8 @@ func TestCheckCommandWithSizeMismatch(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -155,7 +155,7 @@ func TestCheckCommandWithSizeMismatch(t *testing.T) {
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Change file size
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("different size content here"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("different size content here"), 0o644))
// Check manifest - should fail with size mismatch
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
@@ -167,8 +167,8 @@ func TestBannerOutput(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Run without -q to see banner
opts := testOpts([]string{"mfer", "generate", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -193,9 +193,9 @@ func TestGenerateExcludesDotfilesByDefault(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files including dotfiles
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0o644))
// Generate manifest without --include-dotfiles (default excludes dotfiles)
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -217,9 +217,9 @@ func TestGenerateWithIncludeDotfiles(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files including dotfiles
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0o644))
// Generate manifest with --include-dotfiles
opts := testOpts([]string{"mfer", "generate", "-q", "--include-dotfiles", "-o", "/testdir/test.mf", "/testdir"}, fs)
@@ -236,10 +236,10 @@ func TestMultipleInputPaths(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files in multiple directories
require.NoError(t, fs.MkdirAll("/dir1", 0755))
require.NoError(t, fs.MkdirAll("/dir2", 0755))
require.NoError(t, afero.WriteFile(fs, "/dir1/file1.txt", []byte("content1"), 0644))
require.NoError(t, afero.WriteFile(fs, "/dir2/file2.txt", []byte("content2"), 0644))
require.NoError(t, fs.MkdirAll("/dir1", 0o755))
require.NoError(t, fs.MkdirAll("/dir2", 0o755))
require.NoError(t, afero.WriteFile(fs, "/dir1/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/dir2/file2.txt", []byte("content2"), 0o644))
// Generate manifest from multiple paths
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/dir1", "/dir2"}, fs)
@@ -254,9 +254,9 @@ func TestNoExtraFilesPass(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
@@ -273,8 +273,8 @@ func TestNoExtraFilesFail(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
@@ -282,7 +282,7 @@ func TestNoExtraFilesFail(t *testing.T) {
require.Equal(t, 0, exitCode)
// Add an extra file after manifest generation
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0o644))
// Check with --no-extra-files (should fail - extra file exists)
opts = testOpts([]string{"mfer", "check", "-q", "--no-extra-files", "--base", "/testdir", "/manifest.mf"}, fs)
@@ -294,9 +294,9 @@ func TestNoExtraFilesWithSubdirectory(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files with subdirectory
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
@@ -304,7 +304,7 @@ func TestNoExtraFilesWithSubdirectory(t *testing.T) {
require.Equal(t, 0, exitCode)
// Add extra file in subdirectory
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/extra.txt", []byte("extra"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/extra.txt", []byte("extra"), 0o644))
// Check with --no-extra-files (should fail)
opts = testOpts([]string{"mfer", "check", "-q", "--no-extra-files", "--base", "/testdir", "/manifest.mf"}, fs)
@@ -316,8 +316,8 @@ func TestCheckWithoutNoExtraFilesIgnoresExtra(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
@@ -325,7 +325,7 @@ func TestCheckWithoutNoExtraFilesIgnoresExtra(t *testing.T) {
require.Equal(t, 0, exitCode)
// Add extra file
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0o644))
// Check WITHOUT --no-extra-files (should pass - extra files ignored)
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
@@ -337,8 +337,8 @@ func TestGenerateAtomicWriteNoTempFileOnSuccess(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
@@ -360,11 +360,11 @@ func TestGenerateAtomicWriteOverwriteWithForce(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Create existing manifest with different content
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("old content"), 0644))
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("old content"), 0o644))
// Generate manifest with --force
opts := testOpts([]string{"mfer", "generate", "-q", "-f", "-o", "/output.mf", "/testdir"}, fs)
@@ -386,11 +386,11 @@ func TestGenerateFailsWithoutForceWhenOutputExists(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Create existing manifest
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("existing"), 0644))
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("existing"), 0o644))
// Generate manifest WITHOUT --force (should fail)
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
@@ -411,8 +411,8 @@ func TestGenerateAtomicWriteUsesTemp(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
@@ -464,8 +464,8 @@ func TestGenerateAtomicWriteCleansUpOnError(t *testing.T) {
baseFs := afero.NewMemMapFs()
// Create test files - need enough content to trigger the write failure
require.NoError(t, baseFs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(baseFs, "/testdir/file1.txt", []byte("hello world this is a test file"), 0644))
require.NoError(t, baseFs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(baseFs, "/testdir/file1.txt", []byte("hello world this is a test file"), 0o644))
// Wrap with failing writer that fails after writing some bytes
fs := &failingWriterFs{Fs: baseFs, failAfter: 10}
@@ -489,8 +489,8 @@ func TestGenerateValidatesInputPaths(t *testing.T) {
fs := afero.NewMemMapFs()
// Create one valid directory
require.NoError(t, fs.MkdirAll("/validdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/validdir/file.txt", []byte("content"), 0644))
require.NoError(t, fs.MkdirAll("/validdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/validdir/file.txt", []byte("content"), 0o644))
t.Run("nonexistent path fails fast", func(t *testing.T) {
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/nonexistent"}, fs)
@@ -527,7 +527,7 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
// Create many small files with random names to generate a ~1MB manifest
// Each manifest entry is roughly 50-60 bytes, so we need ~20000 files
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
numFiles := 20000
for i := 0; i < numFiles; i++ {
@@ -536,7 +536,7 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
// Small random content
content := make([]byte, 16+rng.Intn(48))
rng.Read(content)
require.NoError(t, afero.WriteFile(fs, filename, content, 0644))
require.NoError(t, afero.WriteFile(fs, filename, content, 0o644))
}
// Generate manifest outside of testdir
@@ -551,7 +551,7 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
t.Logf("manifest size: %d bytes (%d files)", len(validManifest), numFiles)
// First corruption: truncate the manifest
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest[:len(validManifest)/2], 0644))
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest[:len(validManifest)/2], 0o644))
// Check should fail with truncated manifest
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
@@ -559,7 +559,7 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
assert.Equal(t, 1, exitCode, "check should fail with truncated manifest")
// Verify check passes with valid manifest
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0644))
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0o644))
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
require.Equal(t, 0, exitCode, "check should pass with valid manifest")
@@ -579,7 +579,7 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
}
corrupted[offset] = newByte
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", corrupted, 0644))
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", corrupted, 0o644))
// Check should fail with corrupted manifest
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
@@ -588,6 +588,6 @@ func TestCheckDetectsManifestCorruption(t *testing.T) {
i, offset, originalByte, newByte)
// Restore valid manifest for next iteration
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0644))
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0o644))
}
}

72
internal/cli/export.go Normal file
View File

@@ -0,0 +1,72 @@
package cli
import (
"encoding/hex"
"encoding/json"
"fmt"
"time"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/mfer"
)
// ExportEntry represents a single file entry in the exported JSON output.
type ExportEntry struct {
Path string `json:"path"`
Size int64 `json:"size"`
Hashes []string `json:"hashes"`
Mtime *string `json:"mtime,omitempty"`
Ctime *string `json:"ctime,omitempty"`
}
func (mfa *CLIApp) exportManifestOperation(ctx *cli.Context) error {
pathOrURL, err := mfa.resolveManifestArg(ctx)
if err != nil {
return fmt.Errorf("export: %w", err)
}
rc, err := mfa.openManifestReader(pathOrURL)
if err != nil {
return fmt.Errorf("export: %w", err)
}
defer func() { _ = rc.Close() }()
manifest, err := mfer.NewManifestFromReader(rc)
if err != nil {
return fmt.Errorf("export: failed to parse manifest: %w", err)
}
files := manifest.Files()
entries := make([]ExportEntry, 0, len(files))
for _, f := range files {
entry := ExportEntry{
Path: f.Path,
Size: f.Size,
Hashes: make([]string, 0, len(f.Hashes)),
}
for _, h := range f.Hashes {
entry.Hashes = append(entry.Hashes, hex.EncodeToString(h.MultiHash))
}
if f.Mtime != nil {
t := time.Unix(f.Mtime.Seconds, int64(f.Mtime.Nanos)).UTC().Format(time.RFC3339Nano)
entry.Mtime = &t
}
if f.Ctime != nil {
t := time.Unix(f.Ctime.Seconds, int64(f.Ctime.Nanos)).UTC().Format(time.RFC3339Nano)
entry.Ctime = &t
}
entries = append(entries, entry)
}
enc := json.NewEncoder(mfa.Stdout)
enc.SetIndent("", " ")
if err := enc.Encode(entries); err != nil {
return fmt.Errorf("export: failed to encode JSON: %w", err)
}
return nil
}

137
internal/cli/export_test.go Normal file
View File

@@ -0,0 +1,137 @@
package cli
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/mfer"
)
// buildTestManifest creates a manifest from in-memory files and returns its bytes.
func buildTestManifest(t *testing.T, files map[string][]byte) []byte {
t.Helper()
sourceFs := afero.NewMemMapFs()
for path, content := range files {
require.NoError(t, sourceFs.MkdirAll("/", 0o755))
require.NoError(t, afero.WriteFile(sourceFs, "/"+path, content, 0o644))
}
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var buf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &buf, nil))
return buf.Bytes()
}
func TestExportManifestOperation(t *testing.T) {
testFiles := map[string][]byte{
"hello.txt": []byte("Hello, World!"),
"sub/file.txt": []byte("nested content"),
}
manifestData := buildTestManifest(t, testFiles)
// Write manifest to memfs
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/test.mf", manifestData, 0o644))
var stdout, stderr bytes.Buffer
exitCode := RunWithOptions(&RunOptions{
Appname: "mfer",
Args: []string{"mfer", "export", "/test.mf"},
Stdin: &bytes.Buffer{},
Stdout: &stdout,
Stderr: &stderr,
Fs: fs,
})
require.Equal(t, 0, exitCode, "stderr: %s", stderr.String())
var entries []ExportEntry
require.NoError(t, json.Unmarshal(stdout.Bytes(), &entries))
assert.Len(t, entries, 2)
// Verify entries have expected fields
pathSet := make(map[string]bool)
for _, e := range entries {
pathSet[e.Path] = true
assert.NotEmpty(t, e.Hashes, "entry %s should have hashes", e.Path)
assert.Greater(t, e.Size, int64(0), "entry %s should have positive size", e.Path)
}
assert.True(t, pathSet["hello.txt"])
assert.True(t, pathSet["sub/file.txt"])
}
func TestExportFromHTTPURL(t *testing.T) {
testFiles := map[string][]byte{
"a.txt": []byte("aaa"),
}
manifestData := buildTestManifest(t, testFiles)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/octet-stream")
_, _ = w.Write(manifestData)
}))
defer server.Close()
var stdout, stderr bytes.Buffer
exitCode := RunWithOptions(&RunOptions{
Appname: "mfer",
Args: []string{"mfer", "export", server.URL + "/index.mf"},
Stdin: &bytes.Buffer{},
Stdout: &stdout,
Stderr: &stderr,
Fs: afero.NewMemMapFs(),
})
require.Equal(t, 0, exitCode, "stderr: %s", stderr.String())
var entries []ExportEntry
require.NoError(t, json.Unmarshal(stdout.Bytes(), &entries))
assert.Len(t, entries, 1)
assert.Equal(t, "a.txt", entries[0].Path)
}
func TestListFromHTTPURL(t *testing.T) {
testFiles := map[string][]byte{
"one.txt": []byte("1"),
"two.txt": []byte("22"),
}
manifestData := buildTestManifest(t, testFiles)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_, _ = w.Write(manifestData)
}))
defer server.Close()
var stdout, stderr bytes.Buffer
exitCode := RunWithOptions(&RunOptions{
Appname: "mfer",
Args: []string{"mfer", "list", server.URL + "/index.mf"},
Stdin: &bytes.Buffer{},
Stdout: &stdout,
Stderr: &stderr,
Fs: afero.NewMemMapFs(),
})
require.Equal(t, 0, exitCode, "stderr: %s", stderr.String())
output := stdout.String()
assert.Contains(t, output, "one.txt")
assert.Contains(t, output, "two.txt")
}
func TestIsHTTPURL(t *testing.T) {
assert.True(t, isHTTPURL("http://example.com/manifest.mf"))
assert.True(t, isHTTPURL("https://example.com/manifest.mf"))
assert.False(t, isHTTPURL("/local/path.mf"))
assert.False(t, isHTTPURL("relative/path.mf"))
assert.False(t, isHTTPURL("ftp://example.com/file"))
}

View File

@@ -67,7 +67,7 @@ func (mfa *CLIApp) fetchManifestOperation(ctx *cli.Context) error {
// Compute base URL (directory containing manifest)
baseURL, err := url.Parse(manifestURL)
if err != nil {
return err
return fmt.Errorf("fetch: invalid manifest URL: %w", err)
}
baseURL.Path = path.Dir(baseURL.Path)
if !strings.HasSuffix(baseURL.Path, "/") {
@@ -113,7 +113,7 @@ func (mfa *CLIApp) fetchManifestOperation(ctx *cli.Context) error {
return fmt.Errorf("invalid path in manifest: %w", err)
}
fileURL := baseURL.String() + f.Path
fileURL := baseURL.String() + encodeFilePath(f.Path)
log.Infof("fetching %s", f.Path)
if err := downloadFile(fileURL, localPath, f, progress); err != nil {
@@ -139,6 +139,15 @@ func (mfa *CLIApp) fetchManifestOperation(ctx *cli.Context) error {
return nil
}
// encodeFilePath URL-encodes each segment of a file path while preserving slashes.
func encodeFilePath(p string) string {
segments := strings.Split(p, "/")
for i, seg := range segments {
segments[i] = url.PathEscape(seg)
}
return strings.Join(segments, "/")
}
// sanitizePath validates and sanitizes a file path from the manifest.
// It prevents path traversal attacks and rejects unsafe paths.
func sanitizePath(p string) (string, error) {
@@ -257,8 +266,8 @@ func downloadFile(fileURL, localPath string, entry *mfer.MFFilePath, progress ch
// Create parent directories if needed
dir := filepath.Dir(localPath)
if dir != "" && dir != "." {
if err := os.MkdirAll(dir, 0755); err != nil {
return err
if err := os.MkdirAll(dir, 0o755); err != nil {
return fmt.Errorf("failed to create directory %s: %w", dir, err)
}
}
@@ -278,9 +287,9 @@ func downloadFile(fileURL, localPath string, entry *mfer.MFFilePath, progress ch
}
// Fetch file
resp, err := http.Get(fileURL)
resp, err := http.Get(fileURL) //nolint:gosec // URL constructed from manifest base
if err != nil {
return err
return fmt.Errorf("HTTP request failed: %w", err)
}
defer func() { _ = resp.Body.Close() }()
@@ -298,7 +307,7 @@ func downloadFile(fileURL, localPath string, entry *mfer.MFFilePath, progress ch
// Create temp file
out, err := os.Create(tmpPath)
if err != nil {
return err
return fmt.Errorf("failed to create temp file: %w", err)
}
// Set up hash computation

View File

@@ -13,10 +13,32 @@ import (
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/internal/scanner"
"sneak.berlin/go/mfer/mfer"
)
func TestEncodeFilePath(t *testing.T) {
tests := []struct {
input string
expected string
}{
{"file.txt", "file.txt"},
{"dir/file.txt", "dir/file.txt"},
{"my file.txt", "my%20file.txt"},
{"dir/my file.txt", "dir/my%20file.txt"},
{"file#1.txt", "file%231.txt"},
{"file?v=1.txt", "file%3Fv=1.txt"},
{"path/to/file with spaces.txt", "path/to/file%20with%20spaces.txt"},
{"100%done.txt", "100%25done.txt"},
}
for _, tt := range tests {
t.Run(tt.input, func(t *testing.T) {
result := encodeFilePath(tt.input)
assert.Equal(t, tt.expected, result)
})
}
}
func TestSanitizePath(t *testing.T) {
// Valid paths that should be accepted
validTests := []struct {
@@ -107,15 +129,15 @@ func TestFetchFromHTTP(t *testing.T) {
for path, content := range testFiles {
fullPath := "/" + path // MemMapFs needs absolute paths
dir := filepath.Dir(fullPath)
require.NoError(t, sourceFs.MkdirAll(dir, 0755))
require.NoError(t, afero.WriteFile(sourceFs, fullPath, content, 0644))
require.NoError(t, sourceFs.MkdirAll(dir, 0o755))
require.NoError(t, afero.WriteFile(sourceFs, fullPath, content, 0o644))
}
// Generate manifest using scanner
opts := &scanner.Options{
opts := &mfer.ScannerOptions{
Fs: sourceFs,
}
s := scanner.NewWithOptions(opts)
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
@@ -197,11 +219,11 @@ func TestFetchHashMismatch(t *testing.T) {
// Create source filesystem with a test file
sourceFs := afero.NewMemMapFs()
originalContent := []byte("Original content")
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0644))
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0o644))
// Generate manifest
opts := &scanner.Options{Fs: sourceFs}
s := scanner.NewWithOptions(opts)
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
@@ -249,11 +271,11 @@ func TestFetchSizeMismatch(t *testing.T) {
// Create source filesystem with a test file
sourceFs := afero.NewMemMapFs()
originalContent := []byte("Original content with specific size")
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0644))
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0o644))
// Generate manifest
opts := &scanner.Options{Fs: sourceFs}
s := scanner.NewWithOptions(opts)
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
@@ -298,11 +320,11 @@ func TestFetchProgress(t *testing.T) {
sourceFs := afero.NewMemMapFs()
// Create content large enough to trigger multiple progress updates
content := bytes.Repeat([]byte("x"), 100*1024) // 100KB
require.NoError(t, afero.WriteFile(sourceFs, "/large.txt", content, 0644))
require.NoError(t, afero.WriteFile(sourceFs, "/large.txt", content, 0o644))
// Generate manifest
opts := &scanner.Options{Fs: sourceFs}
s := scanner.NewWithOptions(opts)
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer

View File

@@ -41,8 +41,8 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
basePath := ctx.String("base")
showProgress := ctx.Bool("progress")
includeDotfiles := ctx.Bool("IncludeDotfiles")
followSymlinks := ctx.Bool("FollowSymLinks")
includeDotfiles := ctx.Bool("include-dotfiles")
followSymlinks := ctx.Bool("follow-symlinks")
// Find manifest file
var manifestPath string
@@ -54,7 +54,7 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
if statErr == nil && info.IsDir() {
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
return fmt.Errorf("freshen: %w", err)
}
} else {
manifestPath = arg
@@ -62,7 +62,7 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
} else {
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
return fmt.Errorf("freshen: %w", err)
}
}
@@ -93,7 +93,7 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
absBase, err := filepath.Abs(basePath)
if err != nil {
return err
return fmt.Errorf("freshen: invalid base path: %w", err)
}
err = afero.Walk(mfa.Fs, absBase, func(path string, info fs.FileInfo, walkErr error) error {
@@ -104,7 +104,7 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
// Get relative path
relPath, err := filepath.Rel(absBase, path)
if err != nil {
return err
return fmt.Errorf("freshen: failed to compute relative path for %s: %w", path, err)
}
// Skip the manifest file itself
@@ -113,7 +113,7 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
}
// Handle dotfiles
if !includeDotfiles && pathIsHidden(relPath) {
if !includeDotfiles && mfer.IsHiddenPath(filepath.ToSlash(relPath)) {
if info.IsDir() {
return filepath.SkipDir
}
@@ -226,6 +226,17 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
var hashedBytes int64
builder := mfer.NewBuilder()
if ctx.Bool("include-timestamps") {
builder.SetIncludeTimestamps(true)
}
// Set up signing options if sign-key is provided
if signKey := ctx.String("sign-key"); signKey != "" {
builder.SetSigningOptions(&mfer.SigningOptions{
KeyID: mfer.GPGKeyID(signKey),
})
log.Infof("signing manifest with GPG key: %s", signKey)
}
for _, e := range entries {
select {
@@ -274,10 +285,14 @@ func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
hashedFiles++
// Add to builder with computed hash
addFileToBuilder(builder, e.path, e.size, e.mtime, hash)
if err := addFileToBuilder(builder, e.path, e.size, e.mtime, hash); err != nil {
return fmt.Errorf("failed to add %s: %w", e.path, err)
}
} else {
// Use existing entry
addExistingToBuilder(builder, e.existing)
if err := addExistingToBuilder(builder, e.existing); err != nil {
return fmt.Errorf("failed to add %s: %w", e.path, err)
}
}
}
@@ -360,38 +375,15 @@ func hashFile(r io.Reader, size int64, progress func(int64)) ([]byte, int64, err
}
// addFileToBuilder adds a new file entry to the builder
func addFileToBuilder(b *mfer.Builder, path string, size int64, mtime time.Time, hash []byte) {
// Use the builder's internal method indirectly by creating an entry
// Since Builder.AddFile reads from a reader, we need to use a different approach
// We'll access the builder's files directly through a custom method
b.AddFileWithHash(path, size, mtime, hash)
func addFileToBuilder(b *mfer.Builder, path string, size int64, mtime time.Time, hash []byte) error {
return b.AddFileWithHash(mfer.RelFilePath(path), mfer.FileSize(size), mfer.ModTime(mtime), hash)
}
// addExistingToBuilder adds an existing manifest entry to the builder
func addExistingToBuilder(b *mfer.Builder, entry *mfer.MFFilePath) {
func addExistingToBuilder(b *mfer.Builder, entry *mfer.MFFilePath) error {
mtime := time.Unix(entry.Mtime.Seconds, int64(entry.Mtime.Nanos))
if len(entry.Hashes) > 0 {
b.AddFileWithHash(entry.Path, entry.Size, mtime, entry.Hashes[0].MultiHash)
if len(entry.Hashes) == 0 {
return nil
}
}
// pathIsHidden checks if a path contains hidden components
func pathIsHidden(p string) bool {
// "." is not hidden, it's the current directory
if p == "." {
return false
}
// Check each path component
for p != "" && p != "." && p != "/" {
base := filepath.Base(p)
if len(base) > 0 && base[0] == '.' {
return true
}
parent := filepath.Dir(p)
if parent == p {
break
}
p = parent
}
return false
return b.AddFileWithHash(mfer.RelFilePath(entry.Path), mfer.FileSize(entry.Size), mfer.ModTime(mtime), entry.Hashes[0].MultiHash)
}

View File

@@ -8,7 +8,6 @@ import (
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/internal/scanner"
"sneak.berlin/go/mfer/mfer"
)
@@ -16,20 +15,20 @@ func TestFreshenUnchanged(t *testing.T) {
// Create filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Generate initial manifest
opts := &scanner.Options{Fs: fs}
s := scanner.NewWithOptions(opts)
opts := &mfer.ScannerOptions{Fs: fs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumeratePath("/testdir", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Write manifest to filesystem
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0o644))
// Parse manifest to verify
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/.index.mf")
@@ -41,20 +40,20 @@ func TestFreshenWithChanges(t *testing.T) {
// Create filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Generate initial manifest
opts := &scanner.Options{Fs: fs}
s := scanner.NewWithOptions(opts)
opts := &mfer.ScannerOptions{Fs: fs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumeratePath("/testdir", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Write manifest to filesystem
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0o644))
// Verify initial manifest has 2 files
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/.index.mf")
@@ -62,10 +61,10 @@ func TestFreshenWithChanges(t *testing.T) {
assert.Len(t, manifest.Files(), 2)
// Add a new file
require.NoError(t, afero.WriteFile(fs, "/testdir/file3.txt", []byte("content3"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file3.txt", []byte("content3"), 0o644))
// Modify file2 (change content and size)
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("modified content2"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("modified content2"), 0o644))
// Remove file1
require.NoError(t, fs.Remove("/testdir/file1.txt"))

View File

@@ -13,29 +13,44 @@ import (
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/internal/scanner"
"sneak.berlin/go/mfer/mfer"
)
func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
log.Debug("generateManifestOperation()")
opts := &scanner.Options{
IncludeDotfiles: ctx.Bool("IncludeDotfiles"),
FollowSymLinks: ctx.Bool("FollowSymLinks"),
Fs: mfa.Fs,
opts := &mfer.ScannerOptions{
IncludeDotfiles: ctx.Bool("include-dotfiles"),
FollowSymLinks: ctx.Bool("follow-symlinks"),
IncludeTimestamps: ctx.Bool("include-timestamps"),
Fs: mfa.Fs,
}
s := scanner.NewWithOptions(opts)
// Set seed for deterministic UUID if provided
if seed := ctx.String("seed"); seed != "" {
opts.Seed = seed
log.Infof("using deterministic seed for manifest UUID")
}
// Set up signing options if sign-key is provided
if signKey := ctx.String("sign-key"); signKey != "" {
opts.SigningOptions = &mfer.SigningOptions{
KeyID: mfer.GPGKeyID(signKey),
}
log.Infof("signing manifest with GPG key: %s", signKey)
}
s := mfer.NewScannerWithOptions(opts)
// Phase 1: Enumeration - collect paths and stat files
args := ctx.Args()
showProgress := ctx.Bool("progress")
// Set up enumeration progress reporting
var enumProgress chan scanner.EnumerateStatus
var enumProgress chan mfer.EnumerateStatus
var enumWg sync.WaitGroup
if showProgress {
enumProgress = make(chan scanner.EnumerateStatus, 1)
enumProgress = make(chan mfer.EnumerateStatus, 1)
enumWg.Add(1)
go func() {
defer enumWg.Done()
@@ -51,7 +66,7 @@ func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
if args.Len() == 0 {
// Default to current directory
if err := s.EnumeratePath(".", enumProgress); err != nil {
return err
return fmt.Errorf("generate: failed to enumerate current directory: %w", err)
}
} else {
// Collect and validate all paths first
@@ -60,7 +75,7 @@ func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
inputPath := args.Get(i)
ap, err := filepath.Abs(inputPath)
if err != nil {
return err
return fmt.Errorf("generate: invalid path %q: %w", inputPath, err)
}
// Validate path exists before adding to list
if exists, _ := afero.Exists(mfa.Fs, ap); !exists {
@@ -70,7 +85,7 @@ func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
paths = append(paths, ap)
}
if err := s.EnumeratePaths(enumProgress, paths...); err != nil {
return err
return fmt.Errorf("generate: failed to enumerate paths: %w", err)
}
}
enumWg.Wait()
@@ -117,10 +132,10 @@ func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
}()
// Phase 2: Scan - read file contents and generate manifest
var scanProgress chan scanner.ScanStatus
var scanProgress chan mfer.ScanStatus
var scanWg sync.WaitGroup
if showProgress {
scanProgress = make(chan scanner.ScanStatus, 1)
scanProgress = make(chan mfer.ScanStatus, 1)
scanWg.Add(1)
go func() {
defer scanWg.Done()

View File

@@ -16,32 +16,20 @@ func (mfa *CLIApp) listManifestOperation(ctx *cli.Context) error {
longFormat := ctx.Bool("long")
print0 := ctx.Bool("print0")
// Find manifest file
var manifestPath string
var err error
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
}
} else {
manifestPath = arg
}
} else {
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
}
pathOrURL, err := mfa.resolveManifestArg(ctx)
if err != nil {
return fmt.Errorf("list: %w", err)
}
// Load manifest
manifest, err := mfer.NewManifestFromFile(mfa.Fs, manifestPath)
rc, err := mfa.openManifestReader(pathOrURL)
if err != nil {
return fmt.Errorf("failed to load manifest: %w", err)
return fmt.Errorf("list: %w", err)
}
defer func() { _ = rc.Close() }()
manifest, err := mfer.NewManifestFromReader(rc)
if err != nil {
return fmt.Errorf("list: failed to parse manifest: %w", err)
}
files := manifest.Files()

View File

@@ -0,0 +1,56 @@
package cli
import (
"fmt"
"io"
"net/http"
"strings"
"time"
"github.com/urfave/cli/v2"
)
// isHTTPURL returns true if the string starts with http:// or https://.
func isHTTPURL(s string) bool {
return strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://")
}
// openManifestReader opens a manifest from a path or URL and returns a ReadCloser.
// The caller must close the returned reader.
func (mfa *CLIApp) openManifestReader(pathOrURL string) (io.ReadCloser, error) {
if isHTTPURL(pathOrURL) {
client := &http.Client{Timeout: 30 * time.Second}
resp, err := client.Get(pathOrURL) //nolint:gosec // user-provided URL is intentional
if err != nil {
return nil, fmt.Errorf("failed to fetch %s: %w", pathOrURL, err)
}
if resp.StatusCode != http.StatusOK {
_ = resp.Body.Close()
return nil, fmt.Errorf("failed to fetch %s: HTTP %d", pathOrURL, resp.StatusCode)
}
return resp.Body, nil
}
f, err := mfa.Fs.Open(pathOrURL)
if err != nil {
return nil, err
}
return f, nil
}
// resolveManifestArg resolves the manifest path from CLI arguments.
// HTTP(S) URLs are returned as-is. Directories are searched for index.mf/.index.mf.
// If no argument is given, the current directory is searched.
func (mfa *CLIApp) resolveManifestArg(ctx *cli.Context) (string, error) {
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
if isHTTPURL(arg) {
return arg, nil
}
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
return findManifest(mfa.Fs, arg)
}
return arg, nil
}
return findManifest(mfa.Fs, ".")
}

View File

@@ -123,14 +123,15 @@ func (mfa *CLIApp) run(args []string) {
},
Flags: append(commonFlags(),
&cli.BoolFlag{
Name: "FollowSymLinks",
Aliases: []string{"follow-symlinks"},
Name: "follow-symlinks",
Aliases: []string{"L"},
Usage: "Resolve encountered symlinks",
},
&cli.BoolFlag{
Name: "IncludeDotfiles",
Aliases: []string{"include-dotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
Name: "include-dotfiles",
Aliases: []string{"IncludeDotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
},
&cli.StringFlag{
Name: "output",
@@ -148,6 +149,21 @@ func (mfa *CLIApp) run(args []string) {
Aliases: []string{"P"},
Usage: "Show progress during enumeration and scanning",
},
&cli.StringFlag{
Name: "sign-key",
Aliases: []string{"s"},
Usage: "GPG key ID to sign the manifest with",
EnvVars: []string{"MFER_SIGN_KEY"},
},
&cli.StringFlag{
Name: "seed",
Usage: "Seed value for deterministic manifest UUID",
EnvVars: []string{"MFER_SEED"},
},
&cli.BoolFlag{
Name: "include-timestamps",
Usage: "Include createdAt timestamp in manifest (omitted by default for determinism)",
},
),
},
{
@@ -175,6 +191,12 @@ func (mfa *CLIApp) run(args []string) {
Name: "no-extra-files",
Usage: "Fail if files exist in base directory that are not in manifest",
},
&cli.StringFlag{
Name: "require-signature",
Aliases: []string{"S"},
Usage: "Require manifest to be signed by the specified GPG key ID",
EnvVars: []string{"MFER_REQUIRE_SIGNATURE"},
},
),
},
{
@@ -194,22 +216,41 @@ func (mfa *CLIApp) run(args []string) {
Usage: "Base directory for resolving relative paths",
},
&cli.BoolFlag{
Name: "FollowSymLinks",
Aliases: []string{"follow-symlinks"},
Name: "follow-symlinks",
Aliases: []string{"L"},
Usage: "Resolve encountered symlinks",
},
&cli.BoolFlag{
Name: "IncludeDotfiles",
Aliases: []string{"include-dotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
Name: "include-dotfiles",
Aliases: []string{"IncludeDotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
},
&cli.BoolFlag{
Name: "progress",
Aliases: []string{"P"},
Usage: "Show progress during scanning and hashing",
},
&cli.StringFlag{
Name: "sign-key",
Aliases: []string{"s"},
Usage: "GPG key ID to sign the manifest with",
EnvVars: []string{"MFER_SIGN_KEY"},
},
&cli.BoolFlag{
Name: "include-timestamps",
Usage: "Include createdAt timestamp in manifest (omitted by default for determinism)",
},
),
},
{
Name: "export",
Usage: "Export manifest contents as JSON",
ArgsUsage: "[manifest file or URL]",
Action: func(c *cli.Context) error {
return mfa.exportManifestOperation(c)
},
},
{
Name: "version",
Usage: "Show version",
@@ -251,7 +292,7 @@ func (mfa *CLIApp) run(args []string) {
},
}
mfa.app.HideVersion = true
mfa.app.HideVersion = false
err := mfa.app.Run(args)
if err != nil {
mfa.exitCode = 1

View File

@@ -2,23 +2,103 @@ package mfer
import (
"crypto/sha256"
"errors"
"fmt"
"io"
"sort"
"strings"
"sync"
"time"
"unicode/utf8"
"github.com/multiformats/go-multihash"
)
// ValidatePath checks that a file path conforms to manifest path invariants:
// - Must be valid UTF-8
// - Must use forward slashes only (no backslashes)
// - Must be relative (no leading /)
// - Must not contain ".." segments
// - Must not contain empty segments (no "//")
// - Must not be empty
func ValidatePath(p string) error {
if p == "" {
return errors.New("path cannot be empty")
}
if !utf8.ValidString(p) {
return fmt.Errorf("path %q is not valid UTF-8", p)
}
if strings.ContainsRune(p, '\\') {
return fmt.Errorf("path %q contains backslash; use forward slashes only", p)
}
if strings.HasPrefix(p, "/") {
return fmt.Errorf("path %q is absolute; must be relative", p)
}
for _, seg := range strings.Split(p, "/") {
if seg == "" {
return fmt.Errorf("path %q contains empty segment", p)
}
if seg == ".." {
return fmt.Errorf("path %q contains '..' segment", p)
}
}
return nil
}
// RelFilePath represents a relative file path within a manifest.
type RelFilePath string
// AbsFilePath represents an absolute file path on the filesystem.
type AbsFilePath string
// FileSize represents the size of a file in bytes.
type FileSize int64
// FileCount represents a count of files.
type FileCount int64
// ModTime represents a file's modification time.
type ModTime time.Time
// UnixSeconds represents seconds since Unix epoch.
type UnixSeconds int64
// UnixNanos represents the nanosecond component of a timestamp (0-999999999).
type UnixNanos int32
// Timestamp converts ModTime to a protobuf Timestamp.
func (m ModTime) Timestamp() *Timestamp {
t := time.Time(m)
return &Timestamp{
Seconds: t.Unix(),
Nanos: int32(t.Nanosecond()),
}
}
// Multihash represents a multihash-encoded file hash (typically SHA2-256).
type Multihash []byte
// FileHashProgress reports progress during file hashing.
type FileHashProgress struct {
BytesRead int64 // Total bytes read so far for the current file
BytesRead FileSize // Total bytes read so far for the current file
}
// Builder constructs a manifest by adding files one at a time.
type Builder struct {
mu sync.Mutex
files []*MFFilePath
createdAt time.Time
mu sync.Mutex
files []*MFFilePath
createdAt time.Time
includeTimestamps bool
signingOptions *SigningOptions
fixedUUID []byte // if set, use this UUID instead of generating one
}
// SetSeed derives a deterministic UUID from the given seed string.
// The seed is hashed once with SHA-256 and the first 16 bytes are used
// as a fixed UUID for the manifest.
func (b *Builder) SetSeed(seed string) {
hash := sha256.Sum256([]byte(seed))
b.fixedUUID = hash[:16]
}
// NewBuilder creates a new Builder.
@@ -33,24 +113,28 @@ func NewBuilder() *Builder {
// Progress updates are sent to the progress channel (if non-nil) without blocking.
// Returns the number of bytes read.
func (b *Builder) AddFile(
path string,
size int64,
mtime time.Time,
path RelFilePath,
size FileSize,
mtime ModTime,
reader io.Reader,
progress chan<- FileHashProgress,
) (int64, error) {
) (FileSize, error) {
if err := ValidatePath(string(path)); err != nil {
return 0, err
}
// Create hash writer
h := sha256.New()
// Read file in chunks, updating hash and progress
var totalRead int64
var totalRead FileSize
buf := make([]byte, 64*1024) // 64KB chunks
for {
n, err := reader.Read(buf)
if n > 0 {
h.Write(buf[:n])
totalRead += int64(n)
totalRead += FileSize(n)
sendFileHashProgress(progress, FileHashProgress{BytesRead: totalRead})
}
if err == io.EOF {
@@ -61,6 +145,11 @@ func (b *Builder) AddFile(
}
}
// Verify actual bytes read matches declared size
if totalRead != size {
return totalRead, fmt.Errorf("size mismatch for %q: declared %d bytes but read %d bytes", path, size, totalRead)
}
// Encode hash as multihash (SHA2-256)
mh, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
@@ -69,12 +158,12 @@ func (b *Builder) AddFile(
// Create file entry
entry := &MFFilePath{
Path: path,
Size: size,
Path: string(path),
Size: int64(size),
Hashes: []*MFFileChecksum{
{MultiHash: mh},
},
Mtime: newTimestampFromTime(mtime),
Mtime: mtime.Timestamp(),
}
b.mu.Lock()
@@ -104,19 +193,47 @@ func (b *Builder) FileCount() int {
// AddFileWithHash adds a file entry with a pre-computed hash.
// This is useful when the hash is already known (e.g., from an existing manifest).
func (b *Builder) AddFileWithHash(path string, size int64, mtime time.Time, hash []byte) {
// Returns an error if path is empty, size is negative, or hash is nil/empty.
func (b *Builder) AddFileWithHash(path RelFilePath, size FileSize, mtime ModTime, hash Multihash) error {
if err := ValidatePath(string(path)); err != nil {
return fmt.Errorf("add file: %w", err)
}
if size < 0 {
return errors.New("size cannot be negative")
}
if len(hash) == 0 {
return errors.New("hash cannot be nil or empty")
}
entry := &MFFilePath{
Path: path,
Size: size,
Path: string(path),
Size: int64(size),
Hashes: []*MFFileChecksum{
{MultiHash: hash},
},
Mtime: newTimestampFromTime(mtime),
Mtime: mtime.Timestamp(),
}
b.mu.Lock()
b.files = append(b.files, entry)
b.mu.Unlock()
return nil
}
// SetIncludeTimestamps controls whether the manifest includes a createdAt timestamp.
// By default timestamps are omitted for deterministic output.
func (b *Builder) SetIncludeTimestamps(include bool) {
b.mu.Lock()
defer b.mu.Unlock()
b.includeTimestamps = include
}
// SetSigningOptions sets the GPG signing options for the manifest.
// If opts is non-nil, the manifest will be signed when Build() is called.
func (b *Builder) SetSigningOptions(opts *SigningOptions) {
b.mu.Lock()
defer b.mu.Unlock()
b.signingOptions = opts
}
// Build finalizes the manifest and writes it to the writer.
@@ -124,29 +241,41 @@ func (b *Builder) Build(w io.Writer) error {
b.mu.Lock()
defer b.mu.Unlock()
// Sort files by path for deterministic output
sort.Slice(b.files, func(i, j int) bool {
return b.files[i].Path < b.files[j].Path
})
// Create inner manifest
inner := &MFFile{
Version: MFFile_VERSION_ONE,
CreatedAt: newTimestampFromTime(b.createdAt),
Files: b.files,
Version: MFFile_VERSION_ONE,
Files: b.files,
}
if b.includeTimestamps {
inner.CreatedAt = newTimestampFromTime(b.createdAt)
}
// Create a temporary manifest to use existing serialization
m := &manifest{
pbInner: inner,
pbInner: inner,
signingOptions: b.signingOptions,
fixedUUID: b.fixedUUID,
}
// Generate outer wrapper
if err := m.generateOuter(); err != nil {
return err
return fmt.Errorf("build: generate outer: %w", err)
}
// Generate final output
if err := m.generate(); err != nil {
return err
return fmt.Errorf("build: generate: %w", err)
}
// Write to output
_, err := w.Write(m.output.Bytes())
return err
if err != nil {
return fmt.Errorf("build: write output: %w", err)
}
return nil
}

387
mfer/builder_test.go Normal file
View File

@@ -0,0 +1,387 @@
package mfer
import (
"bytes"
"strings"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestNewBuilder(t *testing.T) {
b := NewBuilder()
assert.NotNil(t, b)
assert.Equal(t, 0, b.FileCount())
}
func TestBuilderAddFile(t *testing.T) {
b := NewBuilder()
content := []byte("test content")
reader := bytes.NewReader(content)
bytesRead, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
assert.Equal(t, FileSize(len(content)), bytesRead)
assert.Equal(t, 1, b.FileCount())
}
func TestBuilderAddFileWithHash(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34) // SHA256 multihash is 34 bytes
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), hash)
require.NoError(t, err)
assert.Equal(t, 1, b.FileCount())
}
func TestBuilderAddFileWithHashValidation(t *testing.T) {
t.Run("empty path", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("", 100, ModTime(time.Now()), hash)
assert.Error(t, err)
assert.Contains(t, err.Error(), "path")
})
t.Run("negative size", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("test.txt", -1, ModTime(time.Now()), hash)
assert.Error(t, err)
assert.Contains(t, err.Error(), "size")
})
t.Run("nil hash", func(t *testing.T) {
b := NewBuilder()
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), nil)
assert.Error(t, err)
assert.Contains(t, err.Error(), "hash")
})
t.Run("empty hash", func(t *testing.T) {
b := NewBuilder()
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), []byte{})
assert.Error(t, err)
assert.Contains(t, err.Error(), "hash")
})
t.Run("valid inputs", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), hash)
assert.NoError(t, err)
assert.Equal(t, 1, b.FileCount())
})
}
func TestBuilderBuild(t *testing.T) {
b := NewBuilder()
content := []byte("test content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Should have magic bytes
assert.True(t, strings.HasPrefix(buf.String(), MAGIC))
}
func TestNewTimestampFromTimeExtremeDate(t *testing.T) {
// Regression test: newTimestampFromTime used UnixNano() which panics
// for dates outside ~1678-2262. Now uses Nanosecond() which is safe.
tests := []struct {
name string
time time.Time
}{
{"zero time", time.Time{}},
{"year 1000", time.Date(1000, 1, 1, 0, 0, 0, 0, time.UTC)},
{"year 3000", time.Date(3000, 1, 1, 0, 0, 0, 123456789, time.UTC)},
{"unix epoch", time.Unix(0, 0)},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Should not panic
ts := newTimestampFromTime(tt.time)
assert.Equal(t, tt.time.Unix(), ts.Seconds)
assert.Equal(t, int32(tt.time.Nanosecond()), ts.Nanos)
})
}
}
func TestBuilderDeterministicOutput(t *testing.T) {
buildManifest := func() []byte {
b := NewBuilder()
// Use a fixed createdAt and UUID so output is reproducible
b.createdAt = time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
b.fixedUUID = make([]byte, 16) // all zeros
mtime := ModTime(time.Date(2025, 6, 1, 0, 0, 0, 0, time.UTC))
// Add files in reverse order to test sorting
files := []struct {
path string
content string
}{
{"c/file.txt", "content c"},
{"a/file.txt", "content a"},
{"b/file.txt", "content b"},
}
for _, f := range files {
r := bytes.NewReader([]byte(f.content))
_, err := b.AddFile(RelFilePath(f.path), FileSize(len(f.content)), mtime, r, nil)
require.NoError(t, err)
}
var buf bytes.Buffer
err := b.Build(&buf)
require.NoError(t, err)
return buf.Bytes()
}
out1 := buildManifest()
out2 := buildManifest()
assert.Equal(t, out1, out2, "two builds with same input should produce byte-identical output")
}
func TestSetSeedDeterministic(t *testing.T) {
b1 := NewBuilder()
b1.SetSeed("test-seed-value")
b2 := NewBuilder()
b2.SetSeed("test-seed-value")
assert.Equal(t, b1.fixedUUID, b2.fixedUUID, "same seed should produce same UUID")
assert.Len(t, b1.fixedUUID, 16, "UUID should be 16 bytes")
b3 := NewBuilder()
b3.SetSeed("different-seed")
assert.NotEqual(t, b1.fixedUUID, b3.fixedUUID, "different seeds should produce different UUIDs")
}
func TestValidatePath(t *testing.T) {
valid := []string{
"file.txt",
"dir/file.txt",
"a/b/c/d.txt",
"file with spaces.txt",
"日本語.txt",
}
for _, p := range valid {
t.Run("valid:"+p, func(t *testing.T) {
assert.NoError(t, ValidatePath(p))
})
}
invalid := []struct {
path string
desc string
}{
{"", "empty"},
{"/absolute", "absolute path"},
{"has\\backslash", "backslash"},
{"has/../traversal", "dot-dot segment"},
{"has//double", "empty segment"},
{"..", "just dot-dot"},
{string([]byte{0xff, 0xfe}), "invalid UTF-8"},
}
for _, tt := range invalid {
t.Run("invalid:"+tt.desc, func(t *testing.T) {
assert.Error(t, ValidatePath(tt.path))
})
}
}
func TestBuilderAddFileSizeMismatch(t *testing.T) {
b := NewBuilder()
content := []byte("short")
reader := bytes.NewReader(content)
// Declare wrong size
_, err := b.AddFile("test.txt", FileSize(100), ModTime(time.Now()), reader, nil)
assert.Error(t, err)
assert.Contains(t, err.Error(), "size mismatch")
}
func TestBuilderAddFileInvalidPath(t *testing.T) {
b := NewBuilder()
content := []byte("data")
reader := bytes.NewReader(content)
_, err := b.AddFile("", FileSize(len(content)), ModTime(time.Now()), reader, nil)
assert.Error(t, err)
reader.Reset(content)
_, err = b.AddFile("/absolute", FileSize(len(content)), ModTime(time.Now()), reader, nil)
assert.Error(t, err)
}
func TestBuilderAddFileWithProgress(t *testing.T) {
b := NewBuilder()
content := bytes.Repeat([]byte("x"), 1000)
reader := bytes.NewReader(content)
progress := make(chan FileHashProgress, 100)
bytesRead, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, progress)
close(progress)
require.NoError(t, err)
assert.Equal(t, FileSize(1000), bytesRead)
var updates []FileHashProgress
for p := range progress {
updates = append(updates, p)
}
assert.NotEmpty(t, updates)
// Last update should show all bytes
assert.Equal(t, FileSize(1000), updates[len(updates)-1].BytesRead)
}
func TestBuilderBuildRoundTrip(t *testing.T) {
// Build a manifest, deserialize it, verify all fields survive round-trip
b := NewBuilder()
now := time.Date(2025, 6, 15, 12, 0, 0, 0, time.UTC)
files := []struct {
path string
content []byte
}{
{"alpha.txt", []byte("alpha content")},
{"beta/gamma.txt", []byte("gamma content")},
{"beta/delta.txt", []byte("delta content")},
}
for _, f := range files {
reader := bytes.NewReader(f.content)
_, err := b.AddFile(RelFilePath(f.path), FileSize(len(f.content)), ModTime(now), reader, nil)
require.NoError(t, err)
}
var buf bytes.Buffer
require.NoError(t, b.Build(&buf))
m, err := NewManifestFromReader(&buf)
require.NoError(t, err)
mfiles := m.Files()
require.Len(t, mfiles, 3)
// Verify sorted order
assert.Equal(t, "alpha.txt", mfiles[0].Path)
assert.Equal(t, "beta/delta.txt", mfiles[1].Path)
assert.Equal(t, "beta/gamma.txt", mfiles[2].Path)
// Verify sizes
assert.Equal(t, int64(len("alpha content")), mfiles[0].Size)
// Verify hashes are present
for _, f := range mfiles {
require.NotEmpty(t, f.Hashes, "file %s should have hashes", f.Path)
assert.NotEmpty(t, f.Hashes[0].MultiHash)
}
}
func TestNewManifestFromReaderInvalidMagic(t *testing.T) {
_, err := NewManifestFromReader(bytes.NewReader([]byte("NOT_VALID")))
assert.Error(t, err)
assert.Contains(t, err.Error(), "invalid file format")
}
func TestNewManifestFromReaderEmpty(t *testing.T) {
_, err := NewManifestFromReader(bytes.NewReader([]byte{}))
assert.Error(t, err)
}
func TestNewManifestFromReaderTruncated(t *testing.T) {
// Just the magic with nothing after
_, err := NewManifestFromReader(bytes.NewReader([]byte(MAGIC)))
assert.Error(t, err)
}
func TestManifestString(t *testing.T) {
b := NewBuilder()
content := []byte("test")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
var buf bytes.Buffer
require.NoError(t, b.Build(&buf))
m, err := NewManifestFromReader(&buf)
require.NoError(t, err)
assert.Contains(t, m.String(), "count=1")
}
func TestBuilderBuildEmpty(t *testing.T) {
b := NewBuilder()
var buf bytes.Buffer
err := b.Build(&buf)
require.NoError(t, err)
// Should still produce valid manifest with 0 files
assert.True(t, strings.HasPrefix(buf.String(), MAGIC))
}
func TestBuilderOmitsCreatedAtByDefault(t *testing.T) {
b := NewBuilder()
content := []byte("hello")
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), bytes.NewReader(content), nil)
require.NoError(t, err)
var buf bytes.Buffer
require.NoError(t, b.Build(&buf))
m, err := NewManifestFromReader(&buf)
require.NoError(t, err)
assert.Nil(t, m.pbInner.CreatedAt, "createdAt should be nil by default for deterministic output")
}
func TestBuilderIncludesCreatedAtWhenRequested(t *testing.T) {
b := NewBuilder()
b.SetIncludeTimestamps(true)
content := []byte("hello")
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), bytes.NewReader(content), nil)
require.NoError(t, err)
var buf bytes.Buffer
require.NoError(t, b.Build(&buf))
m, err := NewManifestFromReader(&buf)
require.NoError(t, err)
assert.NotNil(t, m.pbInner.CreatedAt, "createdAt should be set when IncludeTimestamps is true")
}
func TestBuilderDeterministicFileOrder(t *testing.T) {
// Two builds with same files in different order should produce same file ordering.
// Note: UUIDs differ per build, so we compare parsed file lists, not raw bytes.
buildAndParse := func(order []string) []*MFFilePath {
b := NewBuilder()
for _, name := range order {
content := []byte("content of " + name)
_, err := b.AddFile(RelFilePath(name), FileSize(len(content)), ModTime(time.Unix(1000, 0)), bytes.NewReader(content), nil)
require.NoError(t, err)
}
var buf bytes.Buffer
require.NoError(t, b.Build(&buf))
m, err := NewManifestFromReader(&buf)
require.NoError(t, err)
return m.Files()
}
files1 := buildAndParse([]string{"b.txt", "a.txt"})
files2 := buildAndParse([]string{"a.txt", "b.txt"})
require.Len(t, files1, 2)
require.Len(t, files2, 2)
for i := range files1 {
assert.Equal(t, files1[i].Path, files2[i].Path)
assert.Equal(t, files1[i].Size, files2[i].Size)
}
assert.Equal(t, "a.txt", files1[0].Path)
assert.Equal(t, "b.txt", files1[1].Path)
}

362
mfer/checker.go Normal file
View File

@@ -0,0 +1,362 @@
package mfer
import (
"bytes"
"context"
"crypto/sha256"
"errors"
"io"
"os"
"path/filepath"
"time"
"github.com/multiformats/go-multihash"
"github.com/spf13/afero"
)
// Result represents the outcome of checking a single file.
type Result struct {
Path RelFilePath // Relative path from manifest
Status Status // Verification result status
Message string // Human-readable description of the result
}
// Status represents the verification status of a file.
type Status int
const (
StatusOK Status = iota // File matches manifest (size and hash verified)
StatusMissing // File not found on disk
StatusSizeMismatch // File size differs from manifest
StatusHashMismatch // File hash differs from manifest
StatusExtra // File exists on disk but not in manifest
StatusError // Error occurred during verification
)
func (s Status) String() string {
switch s {
case StatusOK:
return "OK"
case StatusMissing:
return "MISSING"
case StatusSizeMismatch:
return "SIZE_MISMATCH"
case StatusHashMismatch:
return "HASH_MISMATCH"
case StatusExtra:
return "EXTRA"
case StatusError:
return "ERROR"
default:
return "UNKNOWN"
}
}
// CheckStatus contains progress information for the check operation.
type CheckStatus struct {
TotalFiles FileCount // Total number of files in manifest
CheckedFiles FileCount // Number of files checked so far
TotalBytes FileSize // Total bytes to verify (sum of all file sizes)
CheckedBytes FileSize // Bytes verified so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
Failures FileCount // Number of verification failures encountered
}
// Checker verifies files against a manifest.
type Checker struct {
basePath AbsFilePath
files []*MFFilePath
fs afero.Fs
// manifestPaths is a set of paths in the manifest for quick lookup
manifestPaths map[RelFilePath]struct{}
// manifestRelPath is the relative path of the manifest file from basePath (for exclusion)
manifestRelPath RelFilePath
// signature info from the manifest
signature []byte
signer []byte
signingPubKey []byte
}
// NewChecker creates a new Checker for the given manifest, base path, and filesystem.
// The basePath is the directory relative to which manifest paths are resolved.
// If fs is nil, the real filesystem (OsFs) is used.
func NewChecker(manifestPath string, basePath string, fs afero.Fs) (*Checker, error) {
if fs == nil {
fs = afero.NewOsFs()
}
m, err := NewManifestFromFile(fs, manifestPath)
if err != nil {
return nil, err
}
abs, err := filepath.Abs(basePath)
if err != nil {
return nil, err
}
files := m.Files()
manifestPaths := make(map[RelFilePath]struct{}, len(files))
for _, f := range files {
manifestPaths[RelFilePath(f.Path)] = struct{}{}
}
// Compute manifest's relative path from basePath for exclusion in FindExtraFiles
absManifest, err := filepath.Abs(manifestPath)
if err != nil {
return nil, err
}
manifestRel, err := filepath.Rel(abs, absManifest)
if err != nil {
manifestRel = ""
}
return &Checker{
basePath: AbsFilePath(abs),
files: files,
fs: fs,
manifestPaths: manifestPaths,
manifestRelPath: RelFilePath(manifestRel),
signature: m.pbOuter.Signature,
signer: m.pbOuter.Signer,
signingPubKey: m.pbOuter.SigningPubKey,
}, nil
}
// FileCount returns the number of files in the manifest.
func (c *Checker) FileCount() FileCount {
return FileCount(len(c.files))
}
// TotalBytes returns the total size of all files in the manifest.
func (c *Checker) TotalBytes() FileSize {
var total FileSize
for _, f := range c.files {
total += FileSize(f.Size)
}
return total
}
// IsSigned returns true if the manifest has a signature.
func (c *Checker) IsSigned() bool {
return len(c.signature) > 0
}
// Signer returns the signer fingerprint if the manifest is signed, nil otherwise.
func (c *Checker) Signer() []byte {
return c.signer
}
// SigningPubKey returns the signing public key if the manifest is signed, nil otherwise.
func (c *Checker) SigningPubKey() []byte {
return c.signingPubKey
}
// ExtractEmbeddedSigningKeyFP imports the manifest's embedded public key into a
// temporary keyring and extracts its fingerprint. This validates the key and
// returns its actual fingerprint from the key material itself.
func (c *Checker) ExtractEmbeddedSigningKeyFP() (string, error) {
if len(c.signingPubKey) == 0 {
return "", errors.New("manifest has no signing public key")
}
return gpgExtractPubKeyFingerprint(c.signingPubKey)
}
// Check verifies all files against the manifest.
// Results are sent to the results channel as files are checked.
// Progress updates are sent to the progress channel approximately once per second.
// Both channels are closed when the method returns.
func (c *Checker) Check(ctx context.Context, results chan<- Result, progress chan<- CheckStatus) error {
if results != nil {
defer close(results)
}
if progress != nil {
defer close(progress)
}
totalFiles := FileCount(len(c.files))
totalBytes := c.TotalBytes()
var checkedFiles FileCount
var checkedBytes FileSize
var failures FileCount
startTime := time.Now()
lastProgressTime := time.Now()
for _, entry := range c.files {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
result := c.checkFile(entry, &checkedBytes)
if result.Status != StatusOK {
failures++
}
checkedFiles++
if results != nil {
results <- result
}
// Send progress at most once per second (rate-limited)
if progress != nil {
now := time.Now()
isLast := checkedFiles == totalFiles
if isLast || now.Sub(lastProgressTime) >= time.Second {
elapsed := time.Since(startTime)
var bytesPerSec float64
var eta time.Duration
if elapsed > 0 && checkedBytes > 0 {
bytesPerSec = float64(checkedBytes) / elapsed.Seconds()
remainingBytes := totalBytes - checkedBytes
if bytesPerSec > 0 {
eta = time.Duration(float64(remainingBytes)/bytesPerSec) * time.Second
}
}
sendCheckStatus(progress, CheckStatus{
TotalFiles: totalFiles,
CheckedFiles: checkedFiles,
TotalBytes: totalBytes,
CheckedBytes: checkedBytes,
BytesPerSec: bytesPerSec,
ETA: eta,
Failures: failures,
})
lastProgressTime = now
}
}
}
return nil
}
func (c *Checker) checkFile(entry *MFFilePath, checkedBytes *FileSize) Result {
absPath := filepath.Join(string(c.basePath), entry.Path)
relPath := RelFilePath(entry.Path)
// Check if file exists
info, err := c.fs.Stat(absPath)
if err != nil {
if errors.Is(err, os.ErrNotExist) || errors.Is(err, afero.ErrFileNotFound) {
return Result{Path: relPath, Status: StatusMissing, Message: "file not found"}
}
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
// Check size
if info.Size() != entry.Size {
*checkedBytes += FileSize(info.Size())
return Result{
Path: relPath,
Status: StatusSizeMismatch,
Message: "size mismatch",
}
}
// Open and hash file
f, err := c.fs.Open(absPath)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
defer func() { _ = f.Close() }()
h := sha256.New()
n, err := io.Copy(h, f)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
*checkedBytes += FileSize(n)
// Encode as multihash and compare
computed, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
// Check against all hashes in manifest (at least one must match)
for _, hash := range entry.Hashes {
if bytes.Equal(computed, hash.MultiHash) {
return Result{Path: relPath, Status: StatusOK}
}
}
return Result{Path: relPath, Status: StatusHashMismatch, Message: "hash mismatch"}
}
// FindExtraFiles walks the filesystem and reports files not in the manifest.
// Results are sent to the results channel. The channel is closed when done.
// Hidden files/directories (starting with .) are skipped, as they are excluded
// from manifests by default. The manifest file itself is also skipped.
func (c *Checker) FindExtraFiles(ctx context.Context, results chan<- Result) error {
if results != nil {
defer close(results)
}
return afero.Walk(c.fs, string(c.basePath), func(walkPath string, info os.FileInfo, err error) error {
if err != nil {
return err
}
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Get relative path
rel, err := filepath.Rel(string(c.basePath), walkPath)
if err != nil {
return err
}
// Skip hidden files and directories (dotfiles)
if IsHiddenPath(filepath.ToSlash(rel)) {
if info.IsDir() {
return filepath.SkipDir
}
return nil
}
// Skip directories
if info.IsDir() {
return nil
}
relPath := RelFilePath(rel)
// Skip the manifest file itself
if relPath == c.manifestRelPath {
return nil
}
// Check if path is in manifest
if _, exists := c.manifestPaths[relPath]; !exists {
if results != nil {
results <- Result{
Path: relPath,
Status: StatusExtra,
Message: "not in manifest",
}
}
}
return nil
})
}
// sendCheckStatus sends a status update without blocking.
func sendCheckStatus(ch chan<- CheckStatus, status CheckStatus) {
if ch == nil {
return
}
select {
case ch <- status:
default:
}
}

View File

@@ -1,15 +1,15 @@
package checker
package mfer
import (
"bytes"
"context"
"fmt"
"testing"
"time"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/mfer"
)
func TestStatusString(t *testing.T) {
@@ -37,16 +37,16 @@ func TestStatusString(t *testing.T) {
func createTestManifest(t *testing.T, fs afero.Fs, manifestPath string, files map[string][]byte) {
t.Helper()
builder := mfer.NewBuilder()
builder := NewBuilder()
for path, content := range files {
reader := bytes.NewReader(content)
_, err := builder.AddFile(path, int64(len(content)), time.Now(), reader, nil)
_, err := builder.AddFile(RelFilePath(path), FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
}
var buf bytes.Buffer
require.NoError(t, builder.Build(&buf))
require.NoError(t, afero.WriteFile(fs, manifestPath, buf.Bytes(), 0644))
require.NoError(t, afero.WriteFile(fs, manifestPath, buf.Bytes(), 0o644))
}
// createFilesOnDisk creates the given files on the filesystem.
@@ -55,8 +55,8 @@ func createFilesOnDisk(t *testing.T, fs afero.Fs, basePath string, files map[str
for path, content := range files {
fullPath := basePath + "/" + path
require.NoError(t, fs.MkdirAll(basePath, 0755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0644))
require.NoError(t, fs.MkdirAll(basePath, 0o755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0o644))
}
}
@@ -72,7 +72,7 @@ func TestNewChecker(t *testing.T) {
chk, err := NewChecker("/manifest.mf", "/", fs)
require.NoError(t, err)
assert.NotNil(t, chk)
assert.Equal(t, int64(2), chk.FileCount())
assert.Equal(t, FileCount(2), chk.FileCount())
})
t.Run("missing manifest", func(t *testing.T) {
@@ -83,7 +83,7 @@ func TestNewChecker(t *testing.T) {
t.Run("invalid manifest", func(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/bad.mf", []byte("not a manifest"), 0644))
require.NoError(t, afero.WriteFile(fs, "/bad.mf", []byte("not a manifest"), 0o644))
_, err := NewChecker("/bad.mf", "/", fs)
assert.Error(t, err)
})
@@ -101,8 +101,8 @@ func TestCheckerFileCountAndTotalBytes(t *testing.T) {
chk, err := NewChecker("/manifest.mf", "/", fs)
require.NoError(t, err)
assert.Equal(t, int64(3), chk.FileCount())
assert.Equal(t, int64(2+11+1000), chk.TotalBytes())
assert.Equal(t, FileCount(3), chk.FileCount())
assert.Equal(t, FileSize(2+11+1000), chk.TotalBytes())
}
func TestCheckAllFilesOK(t *testing.T) {
@@ -158,7 +158,7 @@ func TestCheckMissingFile(t *testing.T) {
okCount++
case StatusMissing:
missingCount++
assert.Equal(t, "missing.txt", r.Path)
assert.Equal(t, RelFilePath("missing.txt"), r.Path)
}
}
@@ -186,7 +186,7 @@ func TestCheckSizeMismatch(t *testing.T) {
r := <-results
assert.Equal(t, StatusSizeMismatch, r.Status)
assert.Equal(t, "file.txt", r.Path)
assert.Equal(t, RelFilePath("file.txt"), r.Path)
}
func TestCheckHashMismatch(t *testing.T) {
@@ -212,7 +212,7 @@ func TestCheckHashMismatch(t *testing.T) {
r := <-results
assert.Equal(t, StatusHashMismatch, r.Status)
assert.Equal(t, "file.txt", r.Path)
assert.Equal(t, RelFilePath("file.txt"), r.Path)
}
func TestCheckWithProgress(t *testing.T) {
@@ -246,11 +246,11 @@ func TestCheckWithProgress(t *testing.T) {
assert.NotEmpty(t, progressUpdates)
// Final progress should show all files checked
final := progressUpdates[len(progressUpdates)-1]
assert.Equal(t, int64(2), final.TotalFiles)
assert.Equal(t, int64(2), final.CheckedFiles)
assert.Equal(t, int64(300), final.TotalBytes)
assert.Equal(t, int64(300), final.CheckedBytes)
assert.Equal(t, int64(0), final.Failures)
assert.Equal(t, FileCount(2), final.TotalFiles)
assert.Equal(t, FileCount(2), final.CheckedFiles)
assert.Equal(t, FileSize(300), final.TotalBytes)
assert.Equal(t, FileSize(300), final.CheckedBytes)
assert.Equal(t, FileCount(0), final.Failures)
}
func TestCheckContextCancellation(t *testing.T) {
@@ -301,11 +301,49 @@ func TestFindExtraFiles(t *testing.T) {
}
assert.Len(t, extras, 1)
assert.Equal(t, "file2.txt", extras[0].Path)
assert.Equal(t, RelFilePath("file2.txt"), extras[0].Path)
assert.Equal(t, StatusExtra, extras[0].Status)
assert.Equal(t, "not in manifest", extras[0].Message)
}
func TestFindExtraFilesSkipsManifestAndDotfiles(t *testing.T) {
fs := afero.NewMemMapFs()
manifestFiles := map[string][]byte{
"file1.txt": []byte("in manifest"),
}
createTestManifest(t, fs, "/data/.index.mf", manifestFiles)
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"file1.txt": []byte("in manifest"),
})
// Create dotfile and manifest that should be skipped
require.NoError(t, afero.WriteFile(fs, "/data/.hidden", []byte("hidden"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/data/.config/settings", []byte("cfg"), 0o644))
// Create a real extra file
require.NoError(t, fs.MkdirAll("/data", 0o755))
require.NoError(t, afero.WriteFile(fs, "/data/extra.txt", []byte("extra"), 0o644))
chk, err := NewChecker("/data/.index.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.FindExtraFiles(context.Background(), results)
require.NoError(t, err)
var extras []Result
for r := range results {
extras = append(extras, r)
}
// Should only report extra.txt, not .hidden, .config/settings, or .index.mf
for _, e := range extras {
t.Logf("extra: %s", e.Path)
}
assert.Len(t, extras, 1)
if len(extras) > 0 {
assert.Equal(t, RelFilePath("extra.txt"), extras[0].Path)
}
}
func TestFindExtraFilesContextCancellation(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{"file.txt": []byte("data")}
@@ -363,8 +401,8 @@ func TestCheckSubdirectories(t *testing.T) {
// Create files with full directory structure
for path, content := range files {
fullPath := "/data/" + path
require.NoError(t, fs.MkdirAll("/data/dir1/dir2/dir3", 0755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0644))
require.NoError(t, fs.MkdirAll("/data/dir1/dir2/dir3", 0o755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0o644))
}
chk, err := NewChecker("/manifest.mf", "/data", fs)
@@ -382,6 +420,94 @@ func TestCheckSubdirectories(t *testing.T) {
assert.Equal(t, 3, okCount)
}
func TestCheckMissingFileDetectedWithoutFallback(t *testing.T) {
// Regression test: errors.Is(err, errors.New("...")) never matches because
// errors.New creates a new value each time. The fix uses os.ErrNotExist instead.
fs := afero.NewMemMapFs()
files := map[string][]byte{
"exists.txt": []byte("here"),
"missing.txt": []byte("not on disk"),
}
createTestManifest(t, fs, "/manifest.mf", files)
// Only create one file on disk
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"exists.txt": []byte("here"),
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
statusCounts := map[Status]int{}
for r := range results {
statusCounts[r.Status]++
if r.Status == StatusMissing {
assert.Equal(t, RelFilePath("missing.txt"), r.Path)
}
}
assert.Equal(t, 1, statusCounts[StatusOK], "one file should be OK")
assert.Equal(t, 1, statusCounts[StatusMissing], "one file should be MISSING")
assert.Equal(t, 0, statusCounts[StatusError], "no files should be ERROR")
}
func TestFindExtraFilesSkipsDotfiles(t *testing.T) {
// Regression test for #16: FindExtraFiles should not report dotfiles
// or the manifest file itself as extra files.
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file1.txt": []byte("in manifest"),
}
createTestManifest(t, fs, "/data/.index.mf", files)
createFilesOnDisk(t, fs, "/data", files)
// Add dotfiles and manifest file on disk
require.NoError(t, afero.WriteFile(fs, "/data/.hidden", []byte("dotfile"), 0o644))
require.NoError(t, fs.MkdirAll("/data/.git", 0o755))
require.NoError(t, afero.WriteFile(fs, "/data/.git/config", []byte("git config"), 0o644))
chk, err := NewChecker("/data/.index.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.FindExtraFiles(context.Background(), results)
require.NoError(t, err)
var extras []Result
for r := range results {
extras = append(extras, r)
}
// Should report NO extra files — dotfiles and manifest should be skipped
assert.Empty(t, extras, "FindExtraFiles should not report dotfiles or manifest file as extra; got: %v", extras)
}
func TestFindExtraFilesSkipsManifestFile(t *testing.T) {
// The manifest file itself should never be reported as extra
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file1.txt": []byte("content"),
}
createTestManifest(t, fs, "/data/index.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/data/index.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.FindExtraFiles(context.Background(), results)
require.NoError(t, err)
var extras []Result
for r := range results {
extras = append(extras, r)
}
assert.Empty(t, extras, "manifest file should not be reported as extra; got: %v", extras)
}
func TestCheckEmptyManifest(t *testing.T) {
fs := afero.NewMemMapFs()
// Create manifest with no files
@@ -390,8 +516,8 @@ func TestCheckEmptyManifest(t *testing.T) {
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
assert.Equal(t, int64(0), chk.FileCount())
assert.Equal(t, int64(0), chk.TotalBytes())
assert.Equal(t, FileCount(0), chk.FileCount())
assert.Equal(t, FileSize(0), chk.TotalBytes())
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
@@ -403,3 +529,40 @@ func TestCheckEmptyManifest(t *testing.T) {
}
assert.Equal(t, 0, count)
}
func TestCheckProgressRateLimited(t *testing.T) {
// Create many small files - progress should be rate-limited, not one per file.
// With rate-limiting to once per second, we should get far fewer progress
// updates than files (plus one final update).
fs := afero.NewMemMapFs()
files := make(map[string][]byte, 100)
for i := 0; i < 100; i++ {
name := fmt.Sprintf("file%03d.txt", i)
files[name] = []byte("content")
}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 200)
progress := make(chan CheckStatus, 200)
err = chk.Check(context.Background(), results, progress)
require.NoError(t, err)
// Drain results
for range results {
}
// Count progress updates
var progressCount int
for range progress {
progressCount++
}
// Should be far fewer than 100 (rate-limited to once per second)
// At minimum we get the final update
assert.GreaterOrEqual(t, progressCount, 1, "should get at least the final progress update")
assert.Less(t, progressCount, 100, "progress should be rate-limited, not one per file")
}

View File

@@ -3,4 +3,9 @@ package mfer
const (
Version = "0.1.0"
ReleaseDate = "2025-12-17"
// MaxDecompressedSize is the maximum allowed size of decompressed manifest
// data (256 MB). This prevents decompression bombs from consuming excessive
// memory.
MaxDecompressedSize int64 = 256 * 1024 * 1024
)

View File

@@ -2,9 +2,12 @@ package mfer
import (
"bytes"
"crypto/sha256"
"errors"
"fmt"
"io"
"github.com/google/uuid"
"github.com/klauspost/compress/zstd"
"github.com/spf13/afero"
"google.golang.org/protobuf/proto"
@@ -12,6 +15,19 @@ import (
"sneak.berlin/go/mfer/internal/log"
)
// validateUUID checks that the byte slice is a valid UUID (16 bytes, parseable).
func validateUUID(data []byte) error {
if len(data) != 16 {
return errors.New("invalid UUID length")
}
// Try to parse as UUID to validate format
_, err := uuid.FromBytes(data)
if err != nil {
return errors.New("invalid UUID format")
}
return nil
}
func (m *manifest) deserializeInner() error {
if m.pbOuter.Version != MFFileOuter_VERSION_ONE {
return errors.New("unknown version")
@@ -20,17 +36,59 @@ func (m *manifest) deserializeInner() error {
return errors.New("unknown compression type")
}
// Validate outer UUID before any decompression
if err := validateUUID(m.pbOuter.Uuid); err != nil {
return errors.New("outer UUID invalid: " + err.Error())
}
// Verify hash of compressed data before decompression
h := sha256.New()
if _, err := h.Write(m.pbOuter.InnerMessage); err != nil {
return fmt.Errorf("deserialize: hash write: %w", err)
}
sha256Hash := h.Sum(nil)
if !bytes.Equal(sha256Hash, m.pbOuter.Sha256) {
return errors.New("compressed data hash mismatch")
}
// Verify signature if present
if len(m.pbOuter.Signature) > 0 {
if len(m.pbOuter.SigningPubKey) == 0 {
return errors.New("signature present but no public key")
}
sigString, err := m.signatureString()
if err != nil {
return fmt.Errorf("failed to generate signature string for verification: %w", err)
}
if err := gpgVerify([]byte(sigString), m.pbOuter.Signature, m.pbOuter.SigningPubKey); err != nil {
return fmt.Errorf("signature verification failed: %w", err)
}
log.Infof("signature verified successfully")
}
bb := bytes.NewBuffer(m.pbOuter.InnerMessage)
zr, err := zstd.NewReader(bb)
if err != nil {
return err
return fmt.Errorf("deserialize: zstd reader: %w", err)
}
defer zr.Close()
dat, err := io.ReadAll(zr)
// Limit decompressed size to prevent decompression bombs.
// Use declared size + 1 byte to detect overflow, capped at MaxDecompressedSize.
maxSize := MaxDecompressedSize
if m.pbOuter.Size > 0 && m.pbOuter.Size < int64(maxSize) {
maxSize = int64(m.pbOuter.Size) + 1
}
limitedReader := io.LimitReader(zr, maxSize)
dat, err := io.ReadAll(limitedReader)
if err != nil {
return err
return fmt.Errorf("deserialize: decompress: %w", err)
}
if int64(len(dat)) >= MaxDecompressedSize {
return fmt.Errorf("decompressed data exceeds maximum allowed size of %d bytes", MaxDecompressedSize)
}
isize := len(dat)
@@ -42,7 +100,17 @@ func (m *manifest) deserializeInner() error {
// Deserialize inner message
m.pbInner = new(MFFile)
if err := proto.Unmarshal(dat, m.pbInner); err != nil {
return err
return fmt.Errorf("deserialize: unmarshal inner: %w", err)
}
// Validate inner UUID
if err := validateUUID(m.pbInner.Uuid); err != nil {
return errors.New("inner UUID invalid: " + err.Error())
}
// Verify UUIDs match
if !bytes.Equal(m.pbOuter.Uuid, m.pbInner.Uuid) {
return errors.New("outer and inner UUID mismatch")
}
log.Infof("loaded manifest with %d files", len(m.pbInner.Files))
@@ -61,7 +129,7 @@ func validateMagic(dat []byte) bool {
// NewManifestFromReader reads a manifest from an io.Reader.
func NewManifestFromReader(input io.Reader) (*manifest, error) {
m := New()
m := &manifest{}
dat, err := io.ReadAll(input)
if err != nil {
return nil, err
@@ -102,8 +170,3 @@ func NewManifestFromFile(fs afero.Fs, path string) (*manifest, error) {
defer func() { _ = f.Close() }()
return NewManifestFromReader(f)
}
// NewFromProto is deprecated, use NewManifestFromReader instead.
func NewFromProto(input io.Reader) (*manifest, error) {
return NewManifestFromReader(input)
}

212
mfer/gpg.go Normal file
View File

@@ -0,0 +1,212 @@
package mfer
import (
"bytes"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
)
// GPGKeyID represents a GPG key identifier (fingerprint or key ID).
type GPGKeyID string
// SigningOptions contains options for GPG signing.
type SigningOptions struct {
KeyID GPGKeyID
}
// gpgSign creates a detached signature of the data using the specified key.
// Returns the armored detached signature.
func gpgSign(data []byte, keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg", "--batch", "--no-tty",
"--detach-sign",
"--armor",
"--local-user", string(keyID),
)
cmd.Stdin = bytes.NewReader(data)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg sign failed: %w: %s", err, stderr.String())
}
return stdout.Bytes(), nil
}
// gpgExportPublicKey exports the public key for the specified key ID.
// Returns the armored public key.
func gpgExportPublicKey(keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg", "--batch", "--no-tty",
"--export",
"--armor",
string(keyID),
)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg export failed: %w: %s", err, stderr.String())
}
if stdout.Len() == 0 {
return nil, fmt.Errorf("gpg key not found: %s", keyID)
}
return stdout.Bytes(), nil
}
// gpgGetKeyFingerprint gets the full fingerprint for a key ID.
func gpgGetKeyFingerprint(keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg", "--batch", "--no-tty",
"--with-colons",
"--fingerprint",
string(keyID),
)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg fingerprint lookup failed: %w: %s", err, stderr.String())
}
// Parse the colon-delimited output to find the fingerprint
lines := strings.Split(stdout.String(), "\n")
for _, line := range lines {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
return []byte(fields[9]), nil
}
}
return nil, fmt.Errorf("fingerprint not found for key: %s", keyID)
}
// gpgExtractPubKeyFingerprint imports a public key into a temporary keyring
// and extracts its fingerprint. This verifies the key is valid and returns
// the actual fingerprint from the key material.
func gpgExtractPubKeyFingerprint(pubKey []byte) (string, error) {
// Create temporary directory for GPG operations
tmpDir, err := os.MkdirTemp("", "mfer-gpg-fingerprint-*")
if err != nil {
return "", fmt.Errorf("failed to create temp dir: %w", err)
}
defer func() { _ = os.RemoveAll(tmpDir) }()
// Set restrictive permissions
if err := os.Chmod(tmpDir, 0o700); err != nil {
return "", fmt.Errorf("failed to set temp dir permissions: %w", err)
}
// Write public key to temp file
pubKeyFile := filepath.Join(tmpDir, "pubkey.asc")
if err := os.WriteFile(pubKeyFile, pubKey, 0o600); err != nil {
return "", fmt.Errorf("failed to write public key: %w", err)
}
// Import the public key into the temporary keyring
importCmd := exec.Command("gpg", "--batch", "--no-tty",
"--homedir", tmpDir,
"--import",
pubKeyFile,
)
var importStderr bytes.Buffer
importCmd.Stderr = &importStderr
if err := importCmd.Run(); err != nil {
return "", fmt.Errorf("failed to import public key: %w: %s", err, importStderr.String())
}
// List keys to get fingerprint
listCmd := exec.Command("gpg", "--batch", "--no-tty",
"--homedir", tmpDir,
"--with-colons",
"--fingerprint",
)
var listStdout, listStderr bytes.Buffer
listCmd.Stdout = &listStdout
listCmd.Stderr = &listStderr
if err := listCmd.Run(); err != nil {
return "", fmt.Errorf("failed to list keys: %w: %s", err, listStderr.String())
}
// Parse the colon-delimited output to find the fingerprint
lines := strings.Split(listStdout.String(), "\n")
for _, line := range lines {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
return fields[9], nil
}
}
return "", fmt.Errorf("fingerprint not found in imported key")
}
// gpgVerify verifies a detached signature against data using the provided public key.
// It creates a temporary keyring to import the public key for verification.
func gpgVerify(data, signature, pubKey []byte) error {
// Create temporary directory for GPG operations
tmpDir, err := os.MkdirTemp("", "mfer-gpg-verify-*")
if err != nil {
return fmt.Errorf("failed to create temp dir: %w", err)
}
defer func() { _ = os.RemoveAll(tmpDir) }()
// Set restrictive permissions
if err := os.Chmod(tmpDir, 0o700); err != nil {
return fmt.Errorf("failed to set temp dir permissions: %w", err)
}
// Write public key to temp file
pubKeyFile := filepath.Join(tmpDir, "pubkey.asc")
if err := os.WriteFile(pubKeyFile, pubKey, 0o600); err != nil {
return fmt.Errorf("failed to write public key: %w", err)
}
// Write signature to temp file
sigFile := filepath.Join(tmpDir, "signature.asc")
if err := os.WriteFile(sigFile, signature, 0o600); err != nil {
return fmt.Errorf("failed to write signature: %w", err)
}
// Write data to temp file
dataFile := filepath.Join(tmpDir, "data")
if err := os.WriteFile(dataFile, data, 0o600); err != nil {
return fmt.Errorf("failed to write data: %w", err)
}
// Import the public key into the temporary keyring
importCmd := exec.Command("gpg", "--batch", "--no-tty",
"--homedir", tmpDir,
"--import",
pubKeyFile,
)
var importStderr bytes.Buffer
importCmd.Stderr = &importStderr
if err := importCmd.Run(); err != nil {
return fmt.Errorf("failed to import public key: %w: %s", err, importStderr.String())
}
// Verify the signature
verifyCmd := exec.Command("gpg", "--batch", "--no-tty",
"--homedir", tmpDir,
"--verify",
sigFile,
dataFile,
)
var verifyStderr bytes.Buffer
verifyCmd.Stderr = &verifyStderr
if err := verifyCmd.Run(); err != nil {
return fmt.Errorf("signature verification failed: %w: %s", err, verifyStderr.String())
}
return nil
}

347
mfer/gpg_test.go Normal file
View File

@@ -0,0 +1,347 @@
package mfer
import (
"bytes"
"context"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// testGPGEnv sets up a temporary GPG home directory with a test key.
// Returns the key ID and a cleanup function.
func testGPGEnv(t *testing.T) (GPGKeyID, func()) {
t.Helper()
// Check if gpg is installed
if _, err := exec.LookPath("gpg"); err != nil {
t.Skip("gpg not installed, skipping signing test")
return "", func() {}
}
// Create temporary GPG home directory
gpgHome, err := os.MkdirTemp("", "mfer-gpg-test-*")
require.NoError(t, err)
// Set restrictive permissions on GPG home
require.NoError(t, os.Chmod(gpgHome, 0o700))
// Save original GNUPGHOME and set new one
origGPGHome := os.Getenv("GNUPGHOME")
require.NoError(t, os.Setenv("GNUPGHOME", gpgHome))
cleanup := func() {
if origGPGHome == "" {
_ = os.Unsetenv("GNUPGHOME")
} else {
_ = os.Setenv("GNUPGHOME", origGPGHome)
}
_ = os.RemoveAll(gpgHome)
}
// Generate a test key with no passphrase
keyParams := `%no-protection
Key-Type: RSA
Key-Length: 2048
Name-Real: MFER Test Key
Name-Email: test@mfer.test
Expire-Date: 0
%commit
`
paramsFile := filepath.Join(gpgHome, "key-params")
require.NoError(t, os.WriteFile(paramsFile, []byte(keyParams), 0o600))
cmd := exec.Command("gpg", "--batch", "--gen-key", paramsFile)
cmd.Env = append(os.Environ(), "GNUPGHOME="+gpgHome)
output, err := cmd.CombinedOutput()
if err != nil {
cleanup()
t.Skipf("failed to generate test GPG key: %v: %s", err, output)
return "", func() {}
}
// Get the key fingerprint
cmd = exec.Command("gpg", "--list-keys", "--with-colons", "test@mfer.test")
cmd.Env = append(os.Environ(), "GNUPGHOME="+gpgHome)
output, err = cmd.Output()
if err != nil {
cleanup()
t.Fatalf("failed to list test key: %v", err)
}
// Parse fingerprint from output
var keyID string
for _, line := range strings.Split(string(output), "\n") {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
keyID = fields[9]
break
}
}
if keyID == "" {
cleanup()
t.Fatal("failed to find test key fingerprint")
}
return GPGKeyID(keyID), cleanup
}
func TestGPGSign(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
assert.NotEmpty(t, sig)
assert.Contains(t, string(sig), "-----BEGIN PGP SIGNATURE-----")
assert.Contains(t, string(sig), "-----END PGP SIGNATURE-----")
}
func TestGPGExportPublicKey(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
assert.NotEmpty(t, pubKey)
assert.Contains(t, string(pubKey), "-----BEGIN PGP PUBLIC KEY BLOCK-----")
assert.Contains(t, string(pubKey), "-----END PGP PUBLIC KEY BLOCK-----")
}
func TestGPGGetKeyFingerprint(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
fingerprint, err := gpgGetKeyFingerprint(keyID)
require.NoError(t, err)
assert.NotEmpty(t, fingerprint)
// The fingerprint should be 40 hex chars
assert.Len(t, fingerprint, 40, "fingerprint should be 40 hex chars")
}
func TestGPGSignInvalidKey(t *testing.T) {
// Set up test environment (we need GNUPGHOME set)
_, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data")
_, err := gpgSign(data, GPGKeyID("NONEXISTENT_KEY_ID_12345"))
assert.Error(t, err)
}
func TestBuilderWithSigning(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a builder with signing options
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
// Add a test file
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest and verify signature fields are populated
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest.pbOuter)
assert.NotEmpty(t, manifest.pbOuter.Signature, "signature should be populated")
assert.NotEmpty(t, manifest.pbOuter.Signer, "signer should be populated")
assert.NotEmpty(t, manifest.pbOuter.SigningPubKey, "signing public key should be populated")
// Verify signature is a valid PGP signature
assert.Contains(t, string(manifest.pbOuter.Signature), "-----BEGIN PGP SIGNATURE-----")
// Verify public key is a valid PGP public key block
assert.Contains(t, string(manifest.pbOuter.SigningPubKey), "-----BEGIN PGP PUBLIC KEY BLOCK-----")
}
func TestScannerWithSigning(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create in-memory filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Create scanner with signing options
opts := &ScannerOptions{
Fs: fs,
SigningOptions: &SigningOptions{
KeyID: keyID,
},
}
s := NewScannerWithOptions(opts)
// Enumerate files
require.NoError(t, s.EnumeratePath("/testdir", nil))
assert.Equal(t, FileCount(2), s.FileCount())
// Generate signed manifest
var buf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &buf, nil))
// Parse and verify
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
assert.NotEmpty(t, manifest.pbOuter.Signature)
assert.NotEmpty(t, manifest.pbOuter.Signer)
assert.NotEmpty(t, manifest.pbOuter.SigningPubKey)
}
func TestGPGVerify(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign and verify")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
// Verify the signature
err = gpgVerify(data, sig, pubKey)
require.NoError(t, err)
}
func TestGPGVerifyInvalidSignature(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
// Try to verify with different data - should fail
wrongData := []byte("different data")
err = gpgVerify(wrongData, sig, pubKey)
assert.Error(t, err)
}
func TestGPGVerifyBadPublicKey(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
// Try to verify with invalid public key - should fail
badPubKey := []byte("not a valid public key")
err = gpgVerify(data, sig, badPubKey)
assert.Error(t, err)
}
func TestManifestSignatureVerification(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a builder with signing options
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
// Add a test file
content := []byte("test file content for verification")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest - signature should be verified during load
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest)
// Signature should be present and valid
assert.NotEmpty(t, manifest.pbOuter.Signature)
}
func TestManifestTamperedSignatureFails(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a signed manifest
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Tamper with the signature by replacing some bytes
data := buf.Bytes()
// Find and modify a byte in the signature portion
for i := range data {
if i > 100 && data[i] == 'A' {
data[i] = 'B'
break
}
}
// Try to load the tampered manifest - should fail
_, err = NewManifestFromReader(bytes.NewReader(data))
assert.Error(t, err)
}
func TestBuilderWithoutSigning(t *testing.T) {
// Create a builder without signing options
b := NewBuilder()
// Add a test file
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest and verify signature fields are empty
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest.pbOuter)
assert.Empty(t, manifest.pbOuter.Signature, "signature should be empty when not signing")
assert.Empty(t, manifest.pbOuter.Signer, "signer should be empty when not signing")
assert.Empty(t, manifest.pbOuter.SigningPubKey, "signing public key should be empty when not signing")
}

View File

@@ -2,133 +2,30 @@ package mfer
import (
"bytes"
"context"
"encoding/hex"
"errors"
"fmt"
"io/fs"
"os"
"path"
"path/filepath"
"strings"
"github.com/spf13/afero"
"sneak.berlin/go/mfer/internal/log"
"github.com/multiformats/go-multihash"
)
type manifestFile struct {
path string
info fs.FileInfo
}
func (m *manifestFile) String() string {
return fmt.Sprintf("<File \"%s\">", m.path)
}
// manifest holds the internal representation of a manifest file.
// Use NewManifestFromFile or NewManifestFromReader to load an existing manifest,
// or use Builder to create a new one.
type manifest struct {
sourceFS []afero.Fs
files []*manifestFile
scanOptions *ManifestScanOptions
totalFileSize int64
pbInner *MFFile
pbOuter *MFFileOuter
output *bytes.Buffer
ctx context.Context
errors []*error
pbInner *MFFile
pbOuter *MFFileOuter
output *bytes.Buffer
signingOptions *SigningOptions
fixedUUID []byte // if set, use this UUID instead of generating one
}
func (m *manifest) String() string {
return fmt.Sprintf("<Manifest count=%d totalSize=%d>", len(m.files), m.totalFileSize)
}
// ManifestScanOptions configures behavior when scanning directories for manifest generation.
type ManifestScanOptions struct {
IncludeDotfiles bool // Include files and directories starting with a dot (default: exclude)
FollowSymLinks bool // Resolve symlinks instead of skipping them
}
func (m *manifest) HasError() bool {
return len(m.errors) > 0
}
func (m *manifest) AddError(e error) *manifest {
m.errors = append(m.errors, &e)
return m
}
func (m *manifest) WithContext(c context.Context) *manifest {
m.ctx = c
return m
}
func (m *manifest) addInputPath(inputPath string) error {
abs, err := filepath.Abs(inputPath)
if err != nil {
return err
}
// Validate path exists
if _, err := os.Stat(abs); err != nil {
return fmt.Errorf("path does not exist: %s", inputPath)
}
afs := afero.NewReadOnlyFs(afero.NewBasePathFs(afero.NewOsFs(), abs))
return m.addInputFS(afs)
}
func (m *manifest) addInputFS(f afero.Fs) error {
if m.sourceFS == nil {
m.sourceFS = make([]afero.Fs, 0)
}
m.sourceFS = append(m.sourceFS, f)
// FIXME do some sort of check on f here?
return nil
}
// New creates an empty manifest.
func New() *manifest {
m := &manifest{}
return m
}
// NewFromPaths creates a manifest configured to scan the given filesystem paths.
func NewFromPaths(options *ManifestScanOptions, inputPaths ...string) (*manifest, error) {
log.Dump(inputPaths)
m := New()
m.scanOptions = options
for _, p := range inputPaths {
err := m.addInputPath(p)
if err != nil {
return nil, err
}
}
return m, nil
}
// NewFromFS creates a manifest configured to scan the given afero filesystem.
func NewFromFS(options *ManifestScanOptions, fs afero.Fs) (*manifest, error) {
m := New()
m.scanOptions = options
err := m.addInputFS(fs)
if err != nil {
return nil, err
}
return m, nil
}
func (m *manifest) GetFileCount() int64 {
count := 0
if m.pbInner != nil {
return int64(len(m.pbInner.Files))
count = len(m.pbInner.Files)
}
return int64(len(m.files))
}
func (m *manifest) GetTotalFileSize() int64 {
if m.pbInner != nil {
var total int64
for _, f := range m.pbInner.Files {
total += f.Size
}
return total
}
return m.totalFileSize
return fmt.Sprintf("<Manifest count=%d>", count)
}
// Files returns all file entries from a loaded manifest.
@@ -139,64 +36,25 @@ func (m *manifest) Files() []*MFFilePath {
return m.pbInner.Files
}
func pathIsHidden(p string) bool {
tp := path.Clean(p)
if strings.HasPrefix(tp, ".") {
return true
// signatureString generates the canonical string used for signing/verification.
// Format: MAGIC-UUID-MULTIHASH where UUID and multihash are hex-encoded.
// Requires pbOuter to be set with Uuid and Sha256 fields.
func (m *manifest) signatureString() (string, error) {
if m.pbOuter == nil {
return "", errors.New("pbOuter not set")
}
for {
d, f := path.Split(tp)
if strings.HasPrefix(f, ".") {
return true
}
if d == "" {
return false
}
tp = d[0 : len(d)-1] // trim trailing slash from dir
if len(m.pbOuter.Uuid) == 0 {
return "", errors.New("UUID not set")
}
if len(m.pbOuter.Sha256) == 0 {
return "", errors.New("SHA256 hash not set")
}
}
func (m *manifest) addFile(p string, fi fs.FileInfo, sfsIndex int) error {
if !m.scanOptions.IncludeDotfiles && pathIsHidden(p) {
return nil
mh, err := multihash.Encode(m.pbOuter.Sha256, multihash.SHA2_256)
if err != nil {
return "", fmt.Errorf("failed to encode multihash: %w", err)
}
if fi == nil {
// fi should come from Walk; if nil, stat to get info
var err error
fi, err = m.sourceFS[sfsIndex].Stat(p)
if err != nil {
return err
}
}
if fi.IsDir() {
// manifests contain only files, directories are implied.
return nil
}
cleanPath := p
if cleanPath[0:1] == "/" {
cleanPath = cleanPath[1:]
}
nf := &manifestFile{
path: cleanPath,
info: fi,
}
m.files = append(m.files, nf)
m.totalFileSize = m.totalFileSize + fi.Size()
return nil
}
func (m *manifest) Scan() error {
// FIXME scan and whatever function does the hashing should take ctx
for idx, sfs := range m.sourceFS {
if sfs == nil {
return errors.New("invalid source fs")
}
e := afero.Walk(sfs, "/", func(p string, info fs.FileInfo, err error) error {
return m.addFile(p, info, idx)
})
if e != nil {
return e
}
}
return nil
uuidStr := hex.EncodeToString(m.pbOuter.Uuid)
mhStr := hex.EncodeToString(mh)
return fmt.Sprintf("%s-%s-%s", MAGIC, uuidStr, mhStr), nil
}

View File

@@ -1,7 +1,7 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.11
// protoc v6.33.0
// protoc v6.33.4
// source: mf.proto
package mfer
@@ -218,8 +218,10 @@ type MFFileOuter struct {
CompressionType MFFileOuter_CompressionType `protobuf:"varint,102,opt,name=compressionType,proto3,enum=MFFileOuter_CompressionType" json:"compressionType,omitempty"`
// these are used solely to detect corruption/truncation
// and not for cryptographic integrity.
Size int64 `protobuf:"varint,103,opt,name=size,proto3" json:"size,omitempty"`
Sha256 []byte `protobuf:"bytes,104,opt,name=sha256,proto3" json:"sha256,omitempty"`
Size int64 `protobuf:"varint,103,opt,name=size,proto3" json:"size,omitempty"`
Sha256 []byte `protobuf:"bytes,104,opt,name=sha256,proto3" json:"sha256,omitempty"`
// uuid must match the uuid in the inner message
Uuid []byte `protobuf:"bytes,105,opt,name=uuid,proto3" json:"uuid,omitempty"`
InnerMessage []byte `protobuf:"bytes,199,opt,name=innerMessage,proto3" json:"innerMessage,omitempty"`
// detached signature, ascii or binary
Signature []byte `protobuf:"bytes,201,opt,name=signature,proto3,oneof" json:"signature,omitempty"`
@@ -289,6 +291,13 @@ func (x *MFFileOuter) GetSha256() []byte {
return nil
}
func (x *MFFileOuter) GetUuid() []byte {
if x != nil {
return x.Uuid
}
return nil
}
func (x *MFFileOuter) GetInnerMessage() []byte {
if x != nil {
return x.InnerMessage
@@ -320,6 +329,9 @@ func (x *MFFileOuter) GetSigningPubKey() []byte {
type MFFilePath struct {
state protoimpl.MessageState `protogen:"open.v1"`
// required attributes:
// Path invariants: must be valid UTF-8, use forward slashes only,
// be relative (no leading /), contain no ".." segments, and no
// empty segments (no "//").
Path string `protobuf:"bytes,1,opt,name=path,proto3" json:"path,omitempty"`
Size int64 `protobuf:"varint,2,opt,name=size,proto3" json:"size,omitempty"`
// gotta have at least one:
@@ -328,7 +340,6 @@ type MFFilePath struct {
MimeType *string `protobuf:"bytes,301,opt,name=mimeType,proto3,oneof" json:"mimeType,omitempty"`
Mtime *Timestamp `protobuf:"bytes,302,opt,name=mtime,proto3,oneof" json:"mtime,omitempty"`
Ctime *Timestamp `protobuf:"bytes,303,opt,name=ctime,proto3,oneof" json:"ctime,omitempty"`
Atime *Timestamp `protobuf:"bytes,304,opt,name=atime,proto3,oneof" json:"atime,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
@@ -405,13 +416,6 @@ func (x *MFFilePath) GetCtime() *Timestamp {
return nil
}
func (x *MFFilePath) GetAtime() *Timestamp {
if x != nil {
return x.Atime
}
return nil
}
type MFFileChecksum struct {
state protoimpl.MessageState `protogen:"open.v1"`
// 1.0 golang implementation must write a multihash here
@@ -463,6 +467,9 @@ type MFFile struct {
Version MFFile_Version `protobuf:"varint,100,opt,name=version,proto3,enum=MFFile_Version" json:"version,omitempty"`
// required manifest attributes:
Files []*MFFilePath `protobuf:"bytes,101,rep,name=files,proto3" json:"files,omitempty"`
// uuid is a random v4 UUID generated when creating the manifest
// used as part of the signature to prevent replay attacks
Uuid []byte `protobuf:"bytes,102,opt,name=uuid,proto3" json:"uuid,omitempty"`
// optional manifest attributes 2xx:
CreatedAt *Timestamp `protobuf:"bytes,201,opt,name=createdAt,proto3,oneof" json:"createdAt,omitempty"`
unknownFields protoimpl.UnknownFields
@@ -513,6 +520,13 @@ func (x *MFFile) GetFiles() []*MFFilePath {
return nil
}
func (x *MFFile) GetUuid() []byte {
if x != nil {
return x.Uuid
}
return nil
}
func (x *MFFile) GetCreatedAt() *Timestamp {
if x != nil {
return x.CreatedAt
@@ -527,12 +541,13 @@ const file_mf_proto_rawDesc = "" +
"\bmf.proto\";\n" +
"\tTimestamp\x12\x18\n" +
"\aseconds\x18\x01 \x01(\x03R\aseconds\x12\x14\n" +
"\x05nanos\x18\x02 \x01(\x05R\x05nanos\"\xdc\x03\n" +
"\x05nanos\x18\x02 \x01(\x05R\x05nanos\"\xf0\x03\n" +
"\vMFFileOuter\x12.\n" +
"\aversion\x18e \x01(\x0e2\x14.MFFileOuter.VersionR\aversion\x12F\n" +
"\x0fcompressionType\x18f \x01(\x0e2\x1c.MFFileOuter.CompressionTypeR\x0fcompressionType\x12\x12\n" +
"\x04size\x18g \x01(\x03R\x04size\x12\x16\n" +
"\x06sha256\x18h \x01(\fR\x06sha256\x12#\n" +
"\x06sha256\x18h \x01(\fR\x06sha256\x12\x12\n" +
"\x04uuid\x18i \x01(\fR\x04uuid\x12#\n" +
"\finnerMessage\x18\xc7\x01 \x01(\fR\finnerMessage\x12\"\n" +
"\tsignature\x18\xc9\x01 \x01(\fH\x00R\tsignature\x88\x01\x01\x12\x1c\n" +
"\x06signer\x18\xca\x01 \x01(\fH\x01R\x06signer\x88\x01\x01\x12*\n" +
@@ -546,7 +561,7 @@ const file_mf_proto_rawDesc = "" +
"\n" +
"_signatureB\t\n" +
"\a_signerB\x10\n" +
"\x0e_signingPubKey\"\xa2\x02\n" +
"\x0e_signingPubKey\"\xf0\x01\n" +
"\n" +
"MFFilePath\x12\x12\n" +
"\x04path\x18\x01 \x01(\tR\x04path\x12\x12\n" +
@@ -556,18 +571,16 @@ const file_mf_proto_rawDesc = "" +
"\x05mtime\x18\xae\x02 \x01(\v2\n" +
".TimestampH\x01R\x05mtime\x88\x01\x01\x12&\n" +
"\x05ctime\x18\xaf\x02 \x01(\v2\n" +
".TimestampH\x02R\x05ctime\x88\x01\x01\x12&\n" +
"\x05atime\x18\xb0\x02 \x01(\v2\n" +
".TimestampH\x03R\x05atime\x88\x01\x01B\v\n" +
".TimestampH\x02R\x05ctime\x88\x01\x01B\v\n" +
"\t_mimeTypeB\b\n" +
"\x06_mtimeB\b\n" +
"\x06_ctimeB\b\n" +
"\x06_atime\".\n" +
"\x06_ctime\".\n" +
"\x0eMFFileChecksum\x12\x1c\n" +
"\tmultiHash\x18\x01 \x01(\fR\tmultiHash\"\xc2\x01\n" +
"\tmultiHash\x18\x01 \x01(\fR\tmultiHash\"\xd6\x01\n" +
"\x06MFFile\x12)\n" +
"\aversion\x18d \x01(\x0e2\x0f.MFFile.VersionR\aversion\x12!\n" +
"\x05files\x18e \x03(\v2\v.MFFilePathR\x05files\x12.\n" +
"\x05files\x18e \x03(\v2\v.MFFilePathR\x05files\x12\x12\n" +
"\x04uuid\x18f \x01(\fR\x04uuid\x12.\n" +
"\tcreatedAt\x18\xc9\x01 \x01(\v2\n" +
".TimestampH\x00R\tcreatedAt\x88\x01\x01\",\n" +
"\aVersion\x12\x10\n" +
@@ -606,15 +619,14 @@ var file_mf_proto_depIdxs = []int32{
6, // 2: MFFilePath.hashes:type_name -> MFFileChecksum
3, // 3: MFFilePath.mtime:type_name -> Timestamp
3, // 4: MFFilePath.ctime:type_name -> Timestamp
3, // 5: MFFilePath.atime:type_name -> Timestamp
2, // 6: MFFile.version:type_name -> MFFile.Version
5, // 7: MFFile.files:type_name -> MFFilePath
3, // 8: MFFile.createdAt:type_name -> Timestamp
9, // [9:9] is the sub-list for method output_type
9, // [9:9] is the sub-list for method input_type
9, // [9:9] is the sub-list for extension type_name
9, // [9:9] is the sub-list for extension extendee
0, // [0:9] is the sub-list for field type_name
2, // 5: MFFile.version:type_name -> MFFile.Version
5, // 6: MFFile.files:type_name -> MFFilePath
3, // 7: MFFile.createdAt:type_name -> Timestamp
8, // [8:8] is the sub-list for method output_type
8, // [8:8] is the sub-list for method input_type
8, // [8:8] is the sub-list for extension type_name
8, // [8:8] is the sub-list for extension extendee
0, // [0:8] is the sub-list for field type_name
}
func init() { file_mf_proto_init() }

View File

@@ -28,6 +28,9 @@ message MFFileOuter {
int64 size = 103;
bytes sha256 = 104;
// uuid must match the uuid in the inner message
bytes uuid = 105;
bytes innerMessage = 199;
// 2xx for optional manifest root attributes
// think we might use gosignify instead of gpg:
@@ -43,6 +46,9 @@ message MFFileOuter {
message MFFilePath {
// required attributes:
// Path invariants: must be valid UTF-8, use forward slashes only,
// be relative (no leading /), contain no ".." segments, and no
// empty segments (no "//").
string path = 1;
int64 size = 2;
@@ -53,7 +59,6 @@ message MFFilePath {
optional string mimeType = 301;
optional Timestamp mtime = 302;
optional Timestamp ctime = 303;
optional Timestamp atime = 304;
}
message MFFileChecksum {
@@ -72,6 +77,10 @@ message MFFile {
// required manifest attributes:
repeated MFFilePath files = 101;
// uuid is a random v4 UUID generated when creating the manifest
// used as part of the signature to prevent replay attacks
bytes uuid = 102;
// optional manifest attributes 2xx:
optional Timestamp createdAt = 201;
}

View File

@@ -1,15 +0,0 @@
package mfer
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestPathHiddenFunc(t *testing.T) {
assert.False(t, pathIsHidden("/a/b/c/hello.txt"))
assert.True(t, pathIsHidden("/a/b/c/.hello.txt"))
assert.True(t, pathIsHidden("/a/.b/c/hello.txt"))
assert.True(t, pathIsHidden("/.a/b/c/hello.txt"))
assert.False(t, pathIsHidden("./a/b/c/hello.txt"))
}

View File

@@ -1,33 +0,0 @@
package mfer
import (
"io"
"os"
)
func (m *manifest) WriteToFile(path string) error {
// FIXME refuse to overwrite without -f if file exists
f, err := os.Create(path)
if err != nil {
return err
}
defer func() { _ = f.Close() }()
return m.Write(f)
}
func (m *manifest) Write(output io.Writer) error {
if m.pbOuter == nil {
err := m.generate()
if err != nil {
return err
}
}
_, err := output.Write(m.output.Bytes())
if err != nil {
return err
}
return nil
}

View File

@@ -1,4 +1,4 @@
package scanner
package mfer
import (
"context"
@@ -13,7 +13,6 @@ import (
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
// Phase 1: Enumeration
@@ -23,8 +22,8 @@ import (
// EnumerateStatus contains progress information for the enumeration phase.
type EnumerateStatus struct {
FilesFound int64 // Number of files discovered so far
BytesFound int64 // Total size of discovered files (from stat)
FilesFound FileCount // Number of files discovered so far
BytesFound FileSize // Total size of discovered files (from stat)
}
// Phase 2: Scan (ToManifest)
@@ -34,47 +33,51 @@ type EnumerateStatus struct {
// ScanStatus contains progress information for the scan phase.
type ScanStatus struct {
TotalFiles int64 // Total number of files to scan
ScannedFiles int64 // Number of files scanned so far
TotalBytes int64 // Total bytes to read (sum of all file sizes)
ScannedBytes int64 // Bytes read so far
TotalFiles FileCount // Total number of files to scan
ScannedFiles FileCount // Number of files scanned so far
TotalBytes FileSize // Total bytes to read (sum of all file sizes)
ScannedBytes FileSize // Bytes read so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
}
// Options configures scanner behavior.
type Options struct {
IncludeDotfiles bool // Include files and directories starting with a dot (default: exclude)
FollowSymLinks bool // Resolve symlinks instead of skipping them
Fs afero.Fs // Filesystem to use, defaults to OsFs if nil
// ScannerOptions configures scanner behavior.
type ScannerOptions struct {
IncludeDotfiles bool // Include files and directories starting with a dot (default: exclude)
FollowSymLinks bool // Resolve symlinks instead of skipping them
IncludeTimestamps bool // Include createdAt timestamp in manifest (default: omit for determinism)
Fs afero.Fs // Filesystem to use, defaults to OsFs if nil
SigningOptions *SigningOptions // GPG signing options (nil = no signing)
Seed string // If set, derive a deterministic UUID from this seed
}
// FileEntry represents a file that has been enumerated.
type FileEntry struct {
Path string // Relative path (used in manifest)
AbsPath string // Absolute path (used for reading file content)
Size int64 // File size in bytes
Mtime time.Time // Last modification time
Ctime time.Time // Creation time (platform-dependent)
Path RelFilePath // Relative path (used in manifest)
AbsPath AbsFilePath // Absolute path (used for reading file content)
Size FileSize // File size in bytes
Mtime ModTime // Last modification time
Ctime time.Time // Creation time (platform-dependent)
}
// Scanner accumulates files and generates manifests from them.
type Scanner struct {
mu sync.RWMutex
files []*FileEntry
options *Options
fs afero.Fs
mu sync.RWMutex
files []*FileEntry
totalBytes FileSize // cached sum of all file sizes
options *ScannerOptions
fs afero.Fs
}
// New creates a new Scanner with default options.
func New() *Scanner {
return NewWithOptions(nil)
// NewScanner creates a new Scanner with default options.
func NewScanner() *Scanner {
return NewScannerWithOptions(nil)
}
// NewWithOptions creates a new Scanner with the given options.
func NewWithOptions(opts *Options) *Scanner {
// NewScannerWithOptions creates a new Scanner with the given options.
func NewScannerWithOptions(opts *ScannerOptions) *Scanner {
if opts == nil {
opts = &Options{}
opts = &ScannerOptions{}
}
fs := opts.Fs
if fs == nil {
@@ -154,7 +157,7 @@ func (s *Scanner) enumerateFS(afs afero.Fs, basePath string, progress chan<- Enu
if err != nil {
return err
}
if !s.options.IncludeDotfiles && pathIsHidden(p) {
if !s.options.IncludeDotfiles && IsHiddenPath(p) {
if info.IsDir() {
return filepath.SkipDir
}
@@ -206,21 +209,19 @@ func (s *Scanner) enumerateFileWithInfo(filePath string, basePath string, info f
}
entry := &FileEntry{
Path: cleanPath,
AbsPath: absPath,
Size: info.Size(),
Mtime: info.ModTime(),
Path: RelFilePath(cleanPath),
AbsPath: AbsFilePath(absPath),
Size: FileSize(info.Size()),
Mtime: ModTime(info.ModTime()),
// Note: Ctime not available from fs.FileInfo on all platforms
// Will need platform-specific code to extract it
}
s.mu.Lock()
s.files = append(s.files, entry)
filesFound := int64(len(s.files))
var bytesFound int64
for _, f := range s.files {
bytesFound += f.Size
}
s.totalBytes += entry.Size
filesFound := FileCount(len(s.files))
bytesFound := s.totalBytes
s.mu.Unlock()
sendEnumerateStatus(progress, EnumerateStatus{
@@ -241,21 +242,17 @@ func (s *Scanner) Files() []*FileEntry {
}
// FileCount returns the number of files in the scanner.
func (s *Scanner) FileCount() int64 {
func (s *Scanner) FileCount() FileCount {
s.mu.RLock()
defer s.mu.RUnlock()
return int64(len(s.files))
return FileCount(len(s.files))
}
// TotalBytes returns the total size of all files in the scanner.
func (s *Scanner) TotalBytes() int64 {
func (s *Scanner) TotalBytes() FileSize {
s.mu.RLock()
defer s.mu.RUnlock()
var total int64
for _, f := range s.files {
total += f.Size
}
return total
return s.totalBytes
}
// ToManifest reads all file contents, computes hashes, and generates a manifest.
@@ -270,17 +267,26 @@ func (s *Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- S
s.mu.RLock()
files := make([]*FileEntry, len(s.files))
copy(files, s.files)
totalFiles := int64(len(files))
var totalBytes int64
totalFiles := FileCount(len(files))
var totalBytes FileSize
for _, f := range files {
totalBytes += f.Size
}
s.mu.RUnlock()
builder := mfer.NewBuilder()
builder := NewBuilder()
if s.options.IncludeTimestamps {
builder.SetIncludeTimestamps(true)
}
if s.options.SigningOptions != nil {
builder.SetSigningOptions(s.options.SigningOptions)
}
if s.options.Seed != "" {
builder.SetSeed(s.options.Seed)
}
var scannedFiles int64
var scannedBytes int64
var scannedFiles FileCount
var scannedBytes FileSize
lastProgressTime := time.Now()
startTime := time.Now()
@@ -293,18 +299,18 @@ func (s *Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- S
}
// Open file
f, err := s.fs.Open(entry.AbsPath)
f, err := s.fs.Open(string(entry.AbsPath))
if err != nil {
return err
}
// Create progress channel for this file
var fileProgress chan mfer.FileHashProgress
var fileProgress chan FileHashProgress
var wg sync.WaitGroup
if progress != nil {
fileProgress = make(chan mfer.FileHashProgress, 1)
fileProgress = make(chan FileHashProgress, 1)
wg.Add(1)
go func(baseScannedBytes int64) {
go func(baseScannedBytes FileSize) {
defer wg.Done()
for p := range fileProgress {
// Send progress at most once per second
@@ -382,10 +388,14 @@ func (s *Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- S
return builder.Build(w)
}
// pathIsHidden returns true if the path or any of its parent directories
// IsHiddenPath returns true if the path or any of its parent directories
// start with a dot (hidden files/directories).
func pathIsHidden(p string) bool {
// The path should use forward slashes.
func IsHiddenPath(p string) bool {
tp := path.Clean(p)
if tp == "." || tp == "/" {
return false
}
if strings.HasPrefix(tp, ".") {
return true
}

View File

@@ -1,4 +1,4 @@
package scanner
package mfer
import (
"bytes"
@@ -11,77 +11,77 @@ import (
"github.com/stretchr/testify/require"
)
func TestNew(t *testing.T) {
s := New()
func TestNewScanner(t *testing.T) {
s := NewScanner()
assert.NotNil(t, s)
assert.Equal(t, int64(0), s.FileCount())
assert.Equal(t, int64(0), s.TotalBytes())
assert.Equal(t, FileCount(0), s.FileCount())
assert.Equal(t, FileSize(0), s.TotalBytes())
}
func TestNewWithOptions(t *testing.T) {
func TestNewScannerWithOptions(t *testing.T) {
t.Run("nil options", func(t *testing.T) {
s := NewWithOptions(nil)
s := NewScannerWithOptions(nil)
assert.NotNil(t, s)
})
t.Run("with options", func(t *testing.T) {
fs := afero.NewMemMapFs()
opts := &Options{
opts := &ScannerOptions{
IncludeDotfiles: true,
FollowSymLinks: true,
Fs: fs,
}
s := NewWithOptions(opts)
s := NewScannerWithOptions(opts)
assert.NotNil(t, s)
})
}
func TestEnumerateFile(t *testing.T) {
func TestScannerEnumerateFile(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello world"), 0644))
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello world"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFile("/test.txt")
require.NoError(t, err)
assert.Equal(t, int64(1), s.FileCount())
assert.Equal(t, int64(11), s.TotalBytes())
assert.Equal(t, FileCount(1), s.FileCount())
assert.Equal(t, FileSize(11), s.TotalBytes())
files := s.Files()
require.Len(t, files, 1)
assert.Equal(t, "test.txt", files[0].Path)
assert.Equal(t, int64(11), files[0].Size)
assert.Equal(t, RelFilePath("test.txt"), files[0].Path)
assert.Equal(t, FileSize(11), files[0].Size)
}
func TestEnumerateFileMissing(t *testing.T) {
func TestScannerEnumerateFileMissing(t *testing.T) {
fs := afero.NewMemMapFs()
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFile("/nonexistent.txt")
assert.Error(t, err)
}
func TestEnumeratePath(t *testing.T) {
func TestScannerEnumeratePath(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file3.txt", []byte("three"), 0644))
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file3.txt", []byte("three"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, int64(3), s.FileCount())
assert.Equal(t, int64(3+3+5), s.TotalBytes())
assert.Equal(t, FileCount(3), s.FileCount())
assert.Equal(t, FileSize(3+3+5), s.TotalBytes())
}
func TestEnumeratePathWithProgress(t *testing.T) {
func TestScannerEnumeratePathWithProgress(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
progress := make(chan EnumerateStatus, 10)
err := s.EnumeratePath("/testdir", progress)
@@ -95,80 +95,57 @@ func TestEnumeratePathWithProgress(t *testing.T) {
assert.NotEmpty(t, updates)
// Final update should show all files
final := updates[len(updates)-1]
assert.Equal(t, int64(2), final.FilesFound)
assert.Equal(t, int64(6), final.BytesFound)
assert.Equal(t, FileCount(2), final.FilesFound)
assert.Equal(t, FileSize(6), final.BytesFound)
}
func TestEnumeratePaths(t *testing.T) {
func TestScannerEnumeratePaths(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/dir1", 0755))
require.NoError(t, fs.MkdirAll("/dir2", 0755))
require.NoError(t, afero.WriteFile(fs, "/dir1/a.txt", []byte("aaa"), 0644))
require.NoError(t, afero.WriteFile(fs, "/dir2/b.txt", []byte("bbb"), 0644))
require.NoError(t, fs.MkdirAll("/dir1", 0o755))
require.NoError(t, fs.MkdirAll("/dir2", 0o755))
require.NoError(t, afero.WriteFile(fs, "/dir1/a.txt", []byte("aaa"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/dir2/b.txt", []byte("bbb"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePaths(nil, "/dir1", "/dir2")
require.NoError(t, err)
assert.Equal(t, int64(2), s.FileCount())
assert.Equal(t, FileCount(2), s.FileCount())
}
func TestExcludeDotfiles(t *testing.T) {
func TestScannerExcludeDotfiles(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/.hidden", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/visible.txt", []byte("visible"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden.txt", []byte("hidden"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden/inside.txt", []byte("inside"), 0644))
require.NoError(t, fs.MkdirAll("/testdir/.hidden", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/visible.txt", []byte("visible"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden.txt", []byte("hidden"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden/inside.txt", []byte("inside"), 0o644))
t.Run("exclude by default", func(t *testing.T) {
s := NewWithOptions(&Options{Fs: fs, IncludeDotfiles: false})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs, IncludeDotfiles: false})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, int64(1), s.FileCount())
assert.Equal(t, FileCount(1), s.FileCount())
files := s.Files()
assert.Equal(t, "visible.txt", files[0].Path)
assert.Equal(t, RelFilePath("visible.txt"), files[0].Path)
})
t.Run("include when enabled", func(t *testing.T) {
s := NewWithOptions(&Options{Fs: fs, IncludeDotfiles: true})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs, IncludeDotfiles: true})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, int64(3), s.FileCount())
assert.Equal(t, FileCount(3), s.FileCount())
})
}
func TestPathIsHidden(t *testing.T) {
tests := []struct {
path string
hidden bool
}{
{"file.txt", false},
{".hidden", true},
{"dir/file.txt", false},
{"dir/.hidden", true},
{".dir/file.txt", true},
{"/absolute/path", false},
{"/absolute/.hidden", true},
{"./relative", false}, // path.Clean removes leading ./
{"a/b/c/.d/e", true},
}
for _, tt := range tests {
t.Run(tt.path, func(t *testing.T) {
assert.Equal(t, tt.hidden, pathIsHidden(tt.path), "pathIsHidden(%q)", tt.path)
})
}
}
func TestToManifest(t *testing.T) {
func TestScannerToManifest(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content one"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content two"), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content two"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
@@ -178,15 +155,15 @@ func TestToManifest(t *testing.T) {
// Manifest should have magic bytes
assert.True(t, buf.Len() > 0)
assert.Equal(t, "ZNAVSRFG", string(buf.Bytes()[:8]))
assert.Equal(t, MAGIC, string(buf.Bytes()[:8]))
}
func TestToManifestWithProgress(t *testing.T) {
func TestScannerToManifestWithProgress(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", bytes.Repeat([]byte("x"), 1000), 0644))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", bytes.Repeat([]byte("x"), 1000), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
@@ -204,22 +181,22 @@ func TestToManifestWithProgress(t *testing.T) {
assert.NotEmpty(t, updates)
// Final update should show completion
final := updates[len(updates)-1]
assert.Equal(t, int64(1), final.TotalFiles)
assert.Equal(t, int64(1), final.ScannedFiles)
assert.Equal(t, int64(1000), final.TotalBytes)
assert.Equal(t, int64(1000), final.ScannedBytes)
assert.Equal(t, FileCount(1), final.TotalFiles)
assert.Equal(t, FileCount(1), final.ScannedFiles)
assert.Equal(t, FileSize(1000), final.TotalBytes)
assert.Equal(t, FileSize(1000), final.ScannedBytes)
}
func TestToManifestContextCancellation(t *testing.T) {
func TestScannerToManifestContextCancellation(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
// Create many files to ensure we have time to cancel
for i := 0; i < 100; i++ {
name := string(rune('a'+i%26)) + string(rune('0'+i/26)) + ".txt"
require.NoError(t, afero.WriteFile(fs, "/testdir/"+name, bytes.Repeat([]byte("x"), 100), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/"+name, bytes.Repeat([]byte("x"), 100), 0o644))
}
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
@@ -231,9 +208,9 @@ func TestToManifestContextCancellation(t *testing.T) {
assert.ErrorIs(t, err, context.Canceled)
}
func TestToManifestEmptyScanner(t *testing.T) {
func TestScannerToManifestEmptyScanner(t *testing.T) {
fs := afero.NewMemMapFs()
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
var buf bytes.Buffer
err := s.ToManifest(context.Background(), &buf, nil)
@@ -241,14 +218,14 @@ func TestToManifestEmptyScanner(t *testing.T) {
// Should still produce a valid manifest
assert.True(t, buf.Len() > 0)
assert.Equal(t, "ZNAVSRFG", string(buf.Bytes()[:8]))
assert.Equal(t, MAGIC, string(buf.Bytes()[:8]))
}
func TestFilesCopiesSlice(t *testing.T) {
func TestScannerFilesCopiesSlice(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello"), 0o644))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
require.NoError(t, s.EnumerateFile("/test.txt"))
files1 := s.Files()
@@ -258,20 +235,20 @@ func TestFilesCopiesSlice(t *testing.T) {
assert.NotSame(t, &files1[0], &files2[0])
}
func TestEnumerateFS(t *testing.T) {
func TestScannerEnumerateFS(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/sub", 0755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", []byte("hello"), 0644))
require.NoError(t, afero.WriteFile(fs, "/testdir/sub/nested.txt", []byte("world"), 0644))
require.NoError(t, fs.MkdirAll("/testdir/sub", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/sub/nested.txt", []byte("world"), 0o644))
// Create a basepath filesystem
baseFs := afero.NewBasePathFs(fs, "/testdir")
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFS(baseFs, "/testdir", nil)
require.NoError(t, err)
assert.Equal(t, int64(2), s.FileCount())
assert.Equal(t, FileCount(2), s.FileCount())
}
func TestSendEnumerateStatusNonBlocking(t *testing.T) {
@@ -317,37 +294,37 @@ func TestSendStatusNilChannel(t *testing.T) {
sendScanStatus(nil, ScanStatus{})
}
func TestFileEntryFields(t *testing.T) {
func TestScannerFileEntryFields(t *testing.T) {
fs := afero.NewMemMapFs()
now := time.Now().Truncate(time.Second)
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("content"), 0644))
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("content"), 0o644))
require.NoError(t, fs.Chtimes("/test.txt", now, now))
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
require.NoError(t, s.EnumerateFile("/test.txt"))
files := s.Files()
require.Len(t, files, 1)
entry := files[0]
assert.Equal(t, "test.txt", entry.Path)
assert.Contains(t, entry.AbsPath, "test.txt")
assert.Equal(t, int64(7), entry.Size)
assert.Equal(t, RelFilePath("test.txt"), entry.Path)
assert.Contains(t, string(entry.AbsPath), "test.txt")
assert.Equal(t, FileSize(7), entry.Size)
// Mtime should be set (within a second of now)
assert.WithinDuration(t, now, entry.Mtime, 2*time.Second)
assert.WithinDuration(t, now, time.Time(entry.Mtime), 2*time.Second)
}
func TestLargeFileEnumeration(t *testing.T) {
func TestScannerLargeFileEnumeration(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0755))
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
// Create 100 files
for i := 0; i < 100; i++ {
name := "/testdir/" + string(rune('a'+i%26)) + string(rune('0'+i/26%10)) + ".txt"
require.NoError(t, afero.WriteFile(fs, name, []byte("data"), 0644))
require.NoError(t, afero.WriteFile(fs, name, []byte("data"), 0o644))
}
s := NewWithOptions(&Options{Fs: fs})
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
progress := make(chan EnumerateStatus, 200)
err := s.EnumeratePath("/testdir", progress)
@@ -357,6 +334,33 @@ func TestLargeFileEnumeration(t *testing.T) {
for range progress {
}
assert.Equal(t, int64(100), s.FileCount())
assert.Equal(t, int64(400), s.TotalBytes()) // 100 * 4 bytes
assert.Equal(t, FileCount(100), s.FileCount())
assert.Equal(t, FileSize(400), s.TotalBytes()) // 100 * 4 bytes
}
func TestIsHiddenPath(t *testing.T) {
tests := []struct {
path string
hidden bool
}{
{"file.txt", false},
{".hidden", true},
{"dir/file.txt", false},
{"dir/.hidden", true},
{".dir/file.txt", true},
{"/absolute/path", false},
{"/absolute/.hidden", true},
{"./relative", false}, // path.Clean removes leading ./
{"a/b/c/.d/e", true},
{".", false}, // current directory is not hidden (#14)
{"/", false}, // root is not hidden
{"./", false}, // current directory with trailing slash
{"./file.txt", false}, // file in current directory
}
for _, tt := range tests {
t.Run(tt.path, func(t *testing.T) {
assert.Equal(t, tt.hidden, IsHiddenPath(tt.path), "IsHiddenPath(%q)", tt.path)
})
}
}

View File

@@ -4,8 +4,10 @@ import (
"bytes"
"crypto/sha256"
"errors"
"fmt"
"time"
"github.com/google/uuid"
"github.com/klauspost/compress/zstd"
"google.golang.org/protobuf/proto"
)
@@ -14,11 +16,10 @@ import (
const MAGIC string = "ZNAVSRFG"
func newTimestampFromTime(t time.Time) *Timestamp {
out := &Timestamp{
return &Timestamp{
Seconds: t.Unix(),
Nanos: int32(t.UnixNano() - (t.Unix() * 1000000000)),
Nanos: int32(t.Nanosecond()),
}
return out
}
func (m *manifest) generate() error {
@@ -33,12 +34,12 @@ func (m *manifest) generate() error {
}
dat, err := proto.MarshalOptions{Deterministic: true}.Marshal(m.pbOuter)
if err != nil {
return err
return fmt.Errorf("serialize: marshal outer: %w", err)
}
m.output = bytes.NewBuffer([]byte(MAGIC))
_, err = m.output.Write(dat)
if err != nil {
return err
return fmt.Errorf("serialize: write output: %w", err)
}
return nil
}
@@ -47,33 +48,76 @@ func (m *manifest) generateOuter() error {
if m.pbInner == nil {
return errors.New("internal error")
}
// Use fixed UUID if provided, otherwise generate a new one
var manifestUUID uuid.UUID
if len(m.fixedUUID) == 16 {
copy(manifestUUID[:], m.fixedUUID)
} else {
manifestUUID = uuid.New()
}
m.pbInner.Uuid = manifestUUID[:]
innerData, err := proto.MarshalOptions{Deterministic: true}.Marshal(m.pbInner)
if err != nil {
return err
return fmt.Errorf("serialize: marshal inner: %w", err)
}
h := sha256.New()
h.Write(innerData)
// Compress the inner data
idc := new(bytes.Buffer)
zw, err := zstd.NewWriter(idc, zstd.WithEncoderLevel(zstd.SpeedBestCompression))
if err != nil {
return err
return fmt.Errorf("serialize: create compressor: %w", err)
}
_, err = zw.Write(innerData)
if err != nil {
return err
return fmt.Errorf("serialize: compress: %w", err)
}
_ = zw.Close()
o := &MFFileOuter{
InnerMessage: idc.Bytes(),
compressedData := idc.Bytes()
// Hash the compressed data for integrity verification before decompression
h := sha256.New()
if _, err := h.Write(compressedData); err != nil {
return fmt.Errorf("serialize: hash write: %w", err)
}
sha256Hash := h.Sum(nil)
m.pbOuter = &MFFileOuter{
InnerMessage: compressedData,
Size: int64(len(innerData)),
Sha256: h.Sum(nil),
Sha256: sha256Hash,
Uuid: manifestUUID[:],
Version: MFFileOuter_VERSION_ONE,
CompressionType: MFFileOuter_COMPRESSION_ZSTD,
}
m.pbOuter = o
// Sign the manifest if signing options are provided
if m.signingOptions != nil && m.signingOptions.KeyID != "" {
sigString, err := m.signatureString()
if err != nil {
return fmt.Errorf("failed to generate signature string: %w", err)
}
sig, err := gpgSign([]byte(sigString), m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to sign manifest: %w", err)
}
m.pbOuter.Signature = sig
fingerprint, err := gpgGetKeyFingerprint(m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to get key fingerprint: %w", err)
}
m.pbOuter.Signer = fingerprint
pubKey, err := gpgExportPublicKey(m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to export public key: %w", err)
}
m.pbOuter.SigningPubKey = pubKey
}
return nil
}

57
mfer/url.go Normal file
View File

@@ -0,0 +1,57 @@
package mfer
import (
"net/url"
"strings"
)
// ManifestURL represents a URL pointing to a manifest file.
type ManifestURL string
// FileURL represents a URL pointing to a file to be fetched.
type FileURL string
// BaseURL represents a base URL for constructing file URLs.
type BaseURL string
// JoinPath safely joins a relative file path to a base URL.
// The path is properly URL-encoded to prevent path traversal.
func (b BaseURL) JoinPath(path RelFilePath) (FileURL, error) {
base, err := url.Parse(string(b))
if err != nil {
return "", err
}
// Ensure base path ends with /
if !strings.HasSuffix(base.Path, "/") {
base.Path += "/"
}
// Encode each path segment individually to preserve slashes
segments := strings.Split(string(path), "/")
for i, seg := range segments {
segments[i] = url.PathEscape(seg)
}
ref, err := url.Parse(strings.Join(segments, "/"))
if err != nil {
return "", err
}
resolved := base.ResolveReference(ref)
return FileURL(resolved.String()), nil
}
// String returns the URL as a string.
func (b BaseURL) String() string {
return string(b)
}
// String returns the URL as a string.
func (f FileURL) String() string {
return string(f)
}
// String returns the URL as a string.
func (m ManifestURL) String() string {
return string(m)
}

44
mfer/url_test.go Normal file
View File

@@ -0,0 +1,44 @@
package mfer
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestBaseURLJoinPath(t *testing.T) {
tests := []struct {
base BaseURL
path RelFilePath
expected string
}{
{"https://example.com/dir/", "file.txt", "https://example.com/dir/file.txt"},
{"https://example.com/dir", "file.txt", "https://example.com/dir/file.txt"},
{"https://example.com/", "sub/file.txt", "https://example.com/sub/file.txt"},
{"https://example.com/dir/", "file with spaces.txt", "https://example.com/dir/file%20with%20spaces.txt"},
}
for _, tt := range tests {
t.Run(string(tt.base)+"+"+string(tt.path), func(t *testing.T) {
result, err := tt.base.JoinPath(tt.path)
require.NoError(t, err)
assert.Equal(t, tt.expected, string(result))
})
}
}
func TestBaseURLString(t *testing.T) {
b := BaseURL("https://example.com/")
assert.Equal(t, "https://example.com/", b.String())
}
func TestFileURLString(t *testing.T) {
f := FileURL("https://example.com/file.txt")
assert.Equal(t, "https://example.com/file.txt", f.String())
}
func TestManifestURLString(t *testing.T) {
m := ManifestURL("https://example.com/index.mf")
assert.Equal(t, "https://example.com/index.mf", m.String())
}

Binary file not shown.

Binary file not shown.