Compare commits

...

64 Commits

Author SHA1 Message Date
d2217ec29e Merge branch 'next' into fix/issue-12 2026-02-09 02:14:45 +01:00
7f25970dd3 Fix URL encoding for file paths in fetch command (closes #13) (#18)
Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #18
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:14:20 +01:00
clawbot
75c88d0a52 fix: use os.ErrNotExist instead of errors.New() for file-not-found detection in checkers
errors.Is(err, errors.New("file does not exist")) can never match because
errors.New creates a unique value each time. Replace with os.ErrNotExist
which is the standard Go sentinel for file-not-found errors.

Fixes both mfer/checker.go and internal/checker/checker.go.
2026-02-08 17:13:20 -08:00
70af055d4e Fix newTimestampFromTime panic on extreme dates (closes #15) (#20)
Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #20
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:10:21 +01:00
04b05e01e8 Consolidate scanner/checker — delete internal/scanner/ and internal/checker/ (closes #22) (#27)
Remove unused `internal/scanner/` and `internal/checker/` packages. The CLI already uses `mfer.Scanner` and `mfer.Checker` from the `mfer/` package directly, so these were dead code.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #27
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 02:09:01 +01:00
7144617d0e Add decompression size limit in deserializeInner() (closes #24) (#29)
Wrap zstd decompressor with `io.LimitReader` (256MB max) to prevent decompression bombs.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #29
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:45:55 +01:00
2efffd9da8 Specify and enforce path invariants (closes #26) (#31)
Add `ValidatePath()` enforcing UTF-8, forward-slash, relative, no `..`, no empty segments. Applied in `AddFile` and `AddFileWithHash`. Proto comments document the rules.

Co-authored-by: clawbot <clawbot@openclaw>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #31
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:45:29 +01:00
ebaf2a65ca Fix AddFile to verify actual bytes read matches declared size (closes #25) (#30)
After reading file content, verify `totalRead == size` and return an error on mismatch.

Co-authored-by: clawbot <clawbot@openclaw>
Reviewed-on: #30
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-02-09 01:35:07 +01:00
4b80c0067b docs: replace TODO.md with design questions and implementation plan 2026-02-08 18:40:31 +01:00
5ab092098b progress 2026-02-08 09:25:58 -08:00
4a2060087d Add GPG signature verification on manifest load
- Implement gpgVerify function that creates a temporary keyring to verify
  detached signatures against embedded public keys
- Signature verification happens during deserialization after hash
  validation but before decompression
- Extract signatureString() as a method on manifest for generating the
  canonical signature string (MAGIC-UUID-MULTIHASH)
- Add --require-signature flag to check command to mandate signature from
  a specific GPG key ID
- Expose IsSigned() and Signer() methods on Checker for signature status
2025-12-18 05:56:16 -08:00
213364bab5 Add UUID to manifest and verify integrity before decompression
- Add UUID field to both inner and outer manifest messages
- Generate random v4 UUID when creating manifest
- Hash compressed data (not uncompressed) for integrity check
- Verify hash before decompression to prevent malicious payloads
- Validate UUIDs are proper format and match between inner/outer
- Sign string format: MAGIC-UUID-MULTIHASH
2025-12-18 02:20:51 -08:00
778999a285 Add GPG signing support for manifest generation
- Add --sign-key flag and MFER_SIGN_KEY env var to gen and freshen commands
- Sign inner message multihash with GPG detached signature
- Include signer fingerprint and public key in outer wrapper
- Add comprehensive tests with temporary GPG keyring
- Increase test timeout to 10s for GPG key generation
2025-12-18 02:12:54 -08:00
308c583d57 Remove codebase structure section from README
godoc provides this documentation automatically
2025-12-18 01:38:13 -08:00
019fe41c3d Update .gitignore for new bin/ build directory 2025-12-18 01:30:50 -08:00
fc0b38ea19 Add TODO.md with codebase audit findings
Document issues found during code audit including:
- Critical: broken error comparison, unchecked hash writes, URL path traversal
- Important: goroutine leak, timestamp precision, missing context cancellation
- Code quality: duplicate functions, inefficient calculations, missing validation
2025-12-18 01:30:01 -08:00
61c17ca585 Normalize markdown formatting in documentation
- Use consistent dash-style bullet points
- Remove trailing whitespace
- Add missing blank lines between sections
- Add trailing newline to README.md
2025-12-18 01:29:56 -08:00
dae6c64e24 Change build output path from mfer.cmd to bin/mfer
Use conventional bin/ directory for build output instead of
placing executable in project root.
2025-12-18 01:29:47 -08:00
a5b0343b28 Use Go 1.13+ octal literal syntax throughout codebase
Update file permission literals from legacy octal format (0755, 0644)
to explicit Go 1.13+ format (0o755, 0o644) for improved readability.
2025-12-18 01:29:40 -08:00
e25e309581 Move checker package into mfer package
Consolidate checker functionality into the mfer package alongside
scanner, removing the need for a separate internal/checker package.
2025-12-18 01:28:35 -08:00
dc115c5ba2 Add custom types for type safety throughout codebase
- Add FileCount, FileSize, RelFilePath, AbsFilePath, ModTime, Multihash types
- Add UnixSeconds and UnixNanos types for timestamp handling
- Add URL types (ManifestURL, FileURL, BaseURL) with safe path joining
- Consolidate scanner package into mfer package
- Update checker to use custom types in Result and CheckStatus
- Add ModTime.Timestamp() method for protobuf conversion
- Update all tests to use proper custom types
2025-12-18 01:01:18 -08:00
a9f0d2abe4 Update README to reflect current API (FileProgress was already a channel) 2025-12-17 17:19:08 -08:00
1588e1bb9f Remove unused legacy manifest APIs
Removed:
- New(), NewFromPaths(), NewFromFS() - unused constructors
- Scan(), addFile(), addInputPath(), addInputFS() - unused scanning code
- WriteToFile(), Write() - unused output methods (Builder.Build() is used)
- GetFileCount(), GetTotalFileSize() - unused accessors
- pathIsHidden() - duplicated in internal/scanner
- ManifestScanOptions - unused options struct
- HasError(), AddError(), WithContext() - unused error/context handling
- NewFromProto() - deprecated alias
- manifestFile struct - unused internal type

Kept:
- manifest struct (simplified to just pbInner, pbOuter, output)
- NewManifestFromReader(), NewManifestFromFile() - for loading manifests
- Files() - returns files from loaded manifest
- Builder and its methods - for creating manifests
2025-12-17 17:16:35 -08:00
09e8da0855 Update CLAUDE.md and clean up completed TODOs in README 2025-12-17 17:09:33 -08:00
efa4bb929a Update README: mark FIXMEs as resolved 2025-12-17 17:08:37 -08:00
16e3538ea6 Document WriteToFile overwrite behavior, remove misplaced FIXME 2025-12-17 17:08:11 -08:00
1ae384b6f6 Add context cancellation support to Scan 2025-12-17 17:07:02 -08:00
b55ae961c8 Validate filesystem in addInputFS 2025-12-17 17:05:42 -08:00
ed40673e85 Validate input path exists in addInputPath 2025-12-17 17:04:46 -08:00
2549695ab0 Update README TODO with completed items 2025-12-17 17:03:05 -08:00
e480c3f677 Fix redundant stat call in addFile
Use the FileInfo already provided by Walk instead of calling Stat again.
Only stat if fi is nil (defensive, shouldn't happen in normal Walk usage).
Also fixes potential nil pointer dereference if fi was nil.
2025-12-17 17:02:29 -08:00
d3776d7d7c Add unit tests for checker/scanner and validate input paths
- Add comprehensive unit tests for internal/checker (88.5% coverage)
  - Status string representations
  - NewChecker validation
  - Check operation (OK, missing, size/hash mismatch)
  - Progress reporting and context cancellation
  - FindExtraFiles functionality

- Add comprehensive unit tests for internal/scanner (80.1% coverage)
  - Constructors and options
  - File/path enumeration
  - Dotfile exclusion/inclusion
  - ToManifest with progress and cancellation
  - Non-blocking status channel sends

- Validate input paths before scanning in generate command
  - Fail fast with clear error if paths don't exist
  - Prevents confusing errors deep in enumeration
2025-12-17 16:57:01 -08:00
07db5d434f Humanize file sizes in gen output, clean up temp on signal
- Humanize file sizes in verbose file listing (e.g., "76.8 MiB" not "76836984 bytes")
- Add signal handler to clean up temp file on Ctrl-C/SIGTERM during gen
2025-12-17 16:26:32 -08:00
1d37dd9748 Show only filename (not full path) in debug log caller info 2025-12-17 16:21:06 -08:00
1e81801036 Change gen file listing from debug to verbose level 2025-12-17 16:19:19 -08:00
eaeb84f2cc Change check OK messages from debug to verbose level 2025-12-17 16:16:16 -08:00
a20c3e5104 Add manifest corruption detection test
Generates a ~1MB manifest (20000 files with random names), then:
- Verifies truncated manifest causes check to fail
- Runs 500 iterations of random single-byte corruption
- Each iteration verifies check detects the corruption
2025-12-17 16:11:59 -08:00
c218fe56e9 Add atomic writes, humanized sizes, debug logging, and -v/-q per-command
- Atomic writes for mfer gen: writes to temp file, renames on success,
  cleans up temp on error/interrupt. Prevents empty manifests on Ctrl-C.
- Humanized byte sizes using dustin/go-humanize (e.g., "10 MiB" not "10485760")
- Progress lines clear when done (using ANSI escape \r\033[K])
- Debug logging when files are added to manifest (mfer gen -vv)
- Move -v/-q flags from global to per-command for better UX
- Add tests for atomic write behavior with failing filesystem mock
2025-12-17 15:57:20 -08:00
444a4c8f45 Add list command to show manifest contents
- mfer list: shows file paths one per line
- mfer list -l/--long: shows size, mtime, and path
- mfer list --print0: NUL-separated output for xargs -0
2025-12-17 15:36:48 -08:00
a07209fef5 Skip manifest write when nothing changed in freshen 2025-12-17 15:31:33 -08:00
b3fe38092b Add project URL to banner output 2025-12-17 15:29:58 -08:00
9dbdcbde91 Fix version command and add default action with banner
- Version command now uses mfer.Version constant instead of empty build flags
- Running just 'mfer' shows banner + help
- Unknown commands still return error with exit code 1
2025-12-17 15:28:14 -08:00
6edc798de0 Add version line after banner with release date
Added mfer/constants.go with Version and ReleaseDate constants for
deterministic builds. Banner now shows "mfer by @sneak: v0.1.0 released 2025-12-17"
2025-12-17 15:23:55 -08:00
818358a8a1 Fix pathIsHidden treating base directory as hidden, print banner to stdout
- pathIsHidden(".") was returning true, causing freshen to skip entire
  directory tree when dotfiles excluded
- Banner now prints directly to stdout to avoid log prefix artifacts
2025-12-17 15:20:56 -08:00
5523cb1595 Add Verbose log level between Info and Debug
Implemented full log level hierarchy: Fatal, Error, Warn, Info, Verbose, Debug.
- Verbose level (-v) shows detailed operations like file changes (M/A/D)
- Debug level (-vv) shows low-level tracing with caller info
- Quiet mode (-q) sets level to Error, suppressing Info messages
- Banner and summary output now use log levels for filtering
2025-12-17 15:17:27 -08:00
0e86562c09 Exclude dotfiles by default, add --include-dotfiles flag
Changed the default behavior to exclude dotfiles (files/dirs starting with .)
which is the more common use case. Added --include-dotfiles flag for when
hidden files need to be included in the manifest.
2025-12-17 15:10:22 -08:00
6dc496fa9e Add verbose output for freshen showing M/A/D files with -v 2025-12-17 15:04:43 -08:00
e6cb906d35 Change operational log messages from Debug to Info 2025-12-17 15:02:46 -08:00
45201509ff Fix progress line not clearing before final status 2025-12-17 15:01:32 -08:00
21028af9aa Replace gzip with zstd compression
Use zstd with SpeedBestCompression level for better compression
ratios. Remove gzip support entirely. Include generated protobuf
file to allow building without protoc.
2025-12-17 14:49:30 -08:00
150bac82cf Add TODO.md with 1.0 release tasks 2025-12-17 14:40:27 -08:00
92bd13efde Fix all linter errors
- Add explicit error ignoring with _ = for Close/Remove calls
- Rename WriteTo to Write to avoid io.WriterTo interface conflict
- Fix errcheck warnings in fetch, freshen, gen, mfer, checker,
  deserialize, serialize, and output files
2025-12-17 14:37:52 -08:00
531f460f87 Add CLAUDE.md 2025-12-17 14:33:00 -08:00
c5ca3e2ced Change FileProgress callback to channel-based progress
Replace callback-based progress reporting in Builder.AddFile with
channel-based FileHashProgress for consistency with EnumerateStatus
and ScanStatus patterns. Update scanner.go to use the new channel API.
2025-12-17 14:30:10 -08:00
fded1a0393 Remove generateInner, now handled by Builder
- Remove generateInner() from serialize.go
- Update generate() to error if pbInner not set
- Remove legacy tests that depended on old code path
- Update TODO item to reflect removal
2025-12-17 11:27:41 -08:00
5d7c729efb remove golangci-lint config files 2025-12-17 11:27:41 -08:00
48c3c09d85 Rename ManifestBuilder to Builder 2025-12-17 11:27:41 -08:00
f3be3eba84 Add TODO: change FileProgress callback to channel-based 2025-12-17 11:27:41 -08:00
5e65b3a0fd Add TODO section to README with prioritized task list 2025-12-17 11:27:41 -08:00
79fc5cca6c Add godoc strings to all exported types, functions, and fields
Documents:
- cli: NO_COLOR, RunOptions fields, CLIApp, VersionString
- checker: Result fields, Status constants, CheckStatus fields
- scanner: EnumerateStatus, ScanStatus, Options, FileEntry fields
- log: Level alias, DisableStyling, Init, Info/Debug functions,
  verbosity helpers, GetLogger, GetLevel, WithError
- mfer: ManifestScanOptions, New, NewFromPaths, NewFromFS, MAGIC
2025-12-17 11:27:41 -08:00
155ebe9a78 Merge branch 'main' into next 2025-12-17 19:01:52 +00:00
dc2ea47f6a Add testable CLI with dependency injection and new scanner/checker packages
Major changes:
- Refactor CLI to accept injected I/O streams and filesystem (afero.Fs)
  for testing without touching the real filesystem
- Add RunOptions struct and RunWithOptions() for configurable CLI execution
- Add internal/scanner package with two-phase manifest generation:
  - Phase 1 (Enumeration): walk directories, collect metadata
  - Phase 2 (Scan): read contents, compute hashes, write manifest
- Add internal/checker package for manifest verification with progress
  reporting and channel-based result streaming
- Add mfer/builder.go for incremental manifest construction
- Add --no-extra-files flag to check command to detect files not in manifest
- Add timing summaries showing file count, size, elapsed time, and throughput
- Add comprehensive tests using afero.MemMapFs (no real filesystem access)
- Add contrib/usage.sh integration test script
- Fix banner ASCII art alignment (consistent spacing)
- Fix verbosity levels so summaries display at default log level
- Update internal/log to support configurable output writers
2025-12-17 11:00:55 -08:00
13f39d598f remove gofumpt from linting
Some checks failed
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is failing
2024-05-22 14:34:05 -07:00
01bffc8388 latest
All checks were successful
continuous-integration/drone/push Build is passing
2024-05-22 14:25:35 -07:00
42 changed files with 6344 additions and 534 deletions

3
.gitignore vendored
View File

@ -1,5 +1,4 @@
mfer/*.pb.go
/mfer.cmd
/bin/
/tmp
*.tmp
*.dockerimage

View File

@ -1,2 +0,0 @@
run:
tests: false

View File

@ -1,2 +0,0 @@
run:
tests: false

BIN
.index.mf Normal file

Binary file not shown.

20
CLAUDE.md Normal file
View File

@ -0,0 +1,20 @@
# Important Rules
- when fixing a bug, write a failing test FIRST. only after the test fails, write
the code to fix the bug. then ensure the test passes. leave the test in
place and commit it with the bugfix. don't run shell commands to test
bugfixes or reproduce bugs. write tests!
- never, ever mention claude or anthropic in commit messages. do not use attribution
- after each change, run "make fmt".
- after each change, run "make test" and ensure all tests pass.
- after each change, run "make lint" and ensure no linting errors. fix any
you find, one by one.
- after each change, commit the files you've changed. push after
committing.
- NEVER use `git add -A`. always add only individual files that you've changed.

View File

@ -17,14 +17,14 @@ GOFLAGS := -ldflags "$(GOLDFLAGS)"
default: fmt test
run: ./mfer.cmd
run: ./bin/mfer
./$<
./$< gen --ignore-dotfiles
./$< gen
ci: test
test: $(SOURCEFILES) mfer/mf.pb.go
go test -v --timeout 3s ./...
go test -v --timeout 10s ./...
$(PROTOC_GEN_GO):
test -e $(PROTOC_GEN_GO) || go install -v google.golang.org/protobuf/cmd/protoc-gen-go@v1.28.1
@ -33,18 +33,17 @@ fixme:
@grep -nir fixme . | grep -v Makefile
devprereqs:
which gofumpt || go install -v mvdan.cc/gofumpt@latest
which golangci-lint || go install -v github.com/golangci/golangci-lint/cmd/golangci-lint@v1.50.1
which golangci-lint || go install -v github.com/golangci/golangci-lint/cmd/golangci-lint@latest
mfer/mf.pb.go: mfer/mf.proto
cd mfer && go generate .
mfer.cmd: $(SOURCEFILES) mfer/mf.pb.go
bin/mfer: $(SOURCEFILES) mfer/mf.pb.go
protoc --version
cd cmd/mfer && go build -tags urfave_cli_no_docs -o ../../mfer.cmd $(GOFLAGS) .
cd cmd/mfer && go build -tags urfave_cli_no_docs -o ../../bin/mfer $(GOFLAGS) .
clean:
rm -rfv mfer/*.pb.go mfer.cmd cmd/mfer/mfer *.dockerimage
rm -rfv mfer/*.pb.go bin/mfer cmd/mfer/mfer *.dockerimage
fmt: mfer/mf.pb.go
gofumpt -l -w mfer internal cmd

View File

@ -23,6 +23,35 @@ itch in 2022 and is currently a one-person effort, though the goal is for
this to emerge as a de-facto standard and be incorporated into other
software. A compatible javascript library is planned.
# Phases
Manifest generation happens in two distinct phases:
## Phase 1: Enumeration
Walking directories and calling `stat()` on files to collect metadata (path, size, mtime, ctime). This builds the list of files to be scanned. Relatively fast as it only reads filesystem metadata, not file contents.
**Progress:** `EnumerateStatus` with `FilesFound` and `BytesFound`
## Phase 2: Scan (ToManifest)
Reading file contents and computing cryptographic hashes for manifest generation. This is the expensive phase that reads all file data from disk.
**Progress:** `ScanStatus` with `TotalFiles`, `ScannedFiles`, `TotalBytes`, `ScannedBytes`, `BytesPerSec`
# Code Conventions
- **Logging:** Never use `fmt.Printf` or write to stdout/stderr directly in normal code. Use the `internal/log` package for all output (`log.Info`, `log.Infof`, `log.Debug`, `log.Debugf`, `log.Progressf`, `log.ProgressDone`).
- **Filesystem abstraction:** Use `github.com/spf13/afero` for filesystem operations to enable testing and flexibility.
- **CLI framework:** Use `github.com/urfave/cli/v2` for command-line interface.
- **Serialization:** Use Protocol Buffers for manifest file format.
- **Internal packages:** Non-exported implementation details go in `internal/` subdirectories.
- **Concurrency:** Use `sync.RWMutex` for protecting shared state; prefer channels for progress reporting.
- **Progress channels:** Use buffered channels (size 1) with non-blocking sends to avoid blocking the main operation if the consumer is slow.
- **Context support:** Long-running operations should accept `context.Context` for cancellation.
- **NO_COLOR:** Respect the `NO_COLOR` environment variable for disabling colored output.
- **Options pattern:** Use `NewWithOptions(opts *Options)` constructor pattern for configurable types.
# Build Status
[![Build Status](https://drone.datavi.be/api/badges/sneak/mfer/status.svg)](https://drone.datavi.be/sneak/mfer)
@ -42,7 +71,6 @@ requests](https://git.eeqj.de/sneak/mfer/pulls) and pass CI to be merged.
Any changes submitted to this project must also be
[WTFPL-licensed](https://wtfpl.net) to be considered.
# Problem Statement
Given a plain URL, there is no standard way to safely and programmatically
@ -120,6 +148,10 @@ The manifest file would do several important things:
- metadata size should not be used as an excuse to sacrifice utility (such
as providing checksums over each chunk of a large file)
# Limitations
- **Manifest size:** Manifests must fit entirely in system memory during reading and writing.
# Open Questions
- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
@ -209,13 +241,13 @@ desired username for an account on this Gitea instance.
## Links
* Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
* Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues)
- Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
- Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues)
# Authors
* [@sneak &lt;sneak@sneak.berlin&gt;](mailto:sneak@sneak.berlin)
- [@sneak &lt;sneak@sneak.berlin&gt;](mailto:sneak@sneak.berlin)
# License
* [WTFPL](https://wtfpl.net)
- [WTFPL](https://wtfpl.net)

122
TODO.md Normal file
View File

@ -0,0 +1,122 @@
# TODO: mfer 1.0
## Design Questions
*sneak: please answer inline below each question. These are preserved for posterity.*
### Format Design
**1. Should `MFFileChecksum` be simplified?**
Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. planning to add per-hash metadata like `verified_at`)?
> *answer:*
**2. Should file permissions/mode be stored?**
The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet?
> *answer:*
**3. Should `atime` be removed from the schema?**
Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"?
> *answer:*
**4. What are the path normalization rules?**
The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate?
> *answer:*
**5. Should we add a version byte after the magic?**
Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte?
> *answer:*
**6. Should we add a length-prefix after the magic?**
Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix?
> *answer:*
### Signature Design
**7. What does the outer SHA-256 hash cover — compressed or uncompressed data?**
The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior?
> *answer:*
**8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?**
Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes?
> *answer:*
**9. Should we support detached signature files (`.mf.sig`)?**
Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes?
> *answer:*
**10. GPG vs pure-Go crypto for signatures?**
Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction?
> *answer:*
### Implementation Design
**11. Should manifests be deterministic by default?**
This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in?
> *answer:*
**12. Should we consolidate or keep both scanner/checker implementations?**
There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions?
> *answer:*
**13. Should the `manifest` type be exported?**
Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface?
> *answer:*
**14. What should the Go module path be for 1.0?**
Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical?
> *answer:*
---
## Implementation Plan
### Phase 1: Foundation (format correctness)
- [ ] Delete `internal/scanner/` and `internal/checker/` — consolidate on `mfer/` package versions; update CLI code
- [ ] Add deterministic file ordering — sort entries by path (lexicographic, byte-order) in `Builder.Build()`; add test asserting byte-identical output from two runs
- [ ] Add decompression size limit — `io.LimitReader` in `deserializeInner()` with `m.pbOuter.Size` as bound
- [ ] Fix `errors.Is` dead code in checker — replace with `os.IsNotExist(err)` or `errors.Is(err, fs.ErrNotExist)`
- [ ] Fix `AddFile` to verify size — check `totalRead == size` after reading, return error on mismatch
- [ ] Specify path invariants — add proto comments (UTF-8, forward-slash, relative, no `..`, no leading `/`); validate in `Builder.AddFile` and `Builder.AddFileWithHash`
### Phase 2: CLI polish
- [ ] Fix flag naming — all CLI flags use kebab-case as primary (`--include-dotfiles`, `--follow-symlinks`)
- [ ] Fix URL construction in fetch — use `BaseURL.JoinPath()` or `url.JoinPath()` instead of string concatenation
- [ ] Add progress rate-limiting to Checker — throttle to once per second, matching Scanner
- [ ] Add `--deterministic` flag (or make it default) — omit `createdAt`, sort files
### Phase 3: Robustness
- [ ] Replace GPG subprocess with pure-Go crypto — `github.com/ProtonMail/go-crypto` or Ed25519/signify
- [ ] Add timeout to any remaining subprocess calls
- [ ] Add fuzzing tests for `NewManifestFromReader`
- [ ] Add retry logic to fetch — exponential backoff for transient HTTP errors
### Phase 4: Format finalization
- [ ] Remove or deprecate `atime` from proto (pending design question answer)
- [ ] Reserve `optional uint32 mode = 305` in `MFFilePath` for future file permissions
- [ ] Add version byte after magic — `ZNAVSRFG\x01` for format version 1
- [ ] Write format specification document — separate from README: magic, outer structure, compression, inner structure, path invariants, signature scheme, canonical serialization
### Phase 5: Release prep
- [ ] Finalize Go module path
- [ ] Audit all error messages for consistency and helpfulness
- [ ] Add `--version` output matching SemVer
- [ ] Tag v1.0.0

View File

@ -3,7 +3,7 @@ package main
import (
"os"
"git.eeqj.de/sneak/mfer/internal/cli"
"sneak.berlin/go/mfer/internal/cli"
)
var (

19
contrib/usage.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
set -euo pipefail
# usage.sh - Generate and check a manifest from the repo
# Run from repo root: ./contrib/usage.sh
TMPDIR=$(mktemp -d)
MANIFEST="$TMPDIR/index.mf"
cleanup() {
rm -rf "$TMPDIR"
}
trap cleanup EXIT
echo "Building mfer..."
go build -o "$TMPDIR/mfer" ./cmd/mfer
"$TMPDIR/mfer" generate -o "$MANIFEST" .
"$TMPDIR/mfer" check --base . "$MANIFEST"

20
go.mod
View File

@ -1,16 +1,19 @@
module git.eeqj.de/sneak/mfer
module sneak.berlin/go/mfer
go 1.17
go 1.23
require (
github.com/apex/log v1.9.0
github.com/davecgh/go-spew v1.1.1
github.com/dustin/go-humanize v1.0.1
github.com/google/uuid v1.1.2
github.com/klauspost/compress v1.18.2
github.com/multiformats/go-multihash v0.2.3
github.com/pterm/pterm v0.12.35
github.com/spf13/afero v1.8.0
github.com/stretchr/testify v1.8.1
github.com/urfave/cli/v2 v2.23.6
google.golang.org/protobuf v1.28.1
)
require (
@ -18,17 +21,24 @@ require (
github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
github.com/fatih/color v1.7.0 // indirect
github.com/gookit/color v1.4.2 // indirect
github.com/klauspost/cpuid/v2 v2.0.9 // indirect
github.com/mattn/go-colorable v0.1.2 // indirect
github.com/mattn/go-isatty v0.0.8 // indirect
github.com/mattn/go-runewidth v0.0.13 // indirect
github.com/minio/sha256-simd v1.0.0 // indirect
github.com/mr-tron/base58 v1.2.0 // indirect
github.com/multiformats/go-varint v0.0.6 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rivo/uniseg v0.2.0 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
github.com/xo/terminfo v0.0.0-20210125001918-ca9a967f8778 // indirect
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect
golang.org/x/sys v0.0.0-20211013075003-97ac67df715c // indirect
golang.org/x/crypto v0.0.0-20220525230936-793ad666bf5e // indirect
golang.org/x/sys v0.1.0 // indirect
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
golang.org/x/text v0.3.4 // indirect
golang.org/x/text v0.3.6 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
lukechampine.com/blake3 v1.1.6 // indirect
)

27
go.sum
View File

@ -37,7 +37,6 @@ cloud.google.com/go/storage v1.10.0/go.mod h1:FLPqc6j+Ki4BU591ie1oL6qBQGu2Bl/tZ9
cloud.google.com/go/storage v1.14.0/go.mod h1:GrKmX003DSIwi9o29oFT7YDnHYwZoctc3fOKtUw0Xmo=
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/BurntSushi/toml v1.2.1/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
github.com/MarvinJWendt/testza v0.1.0/go.mod h1:7AxNvlfeHP7Z/hDQ5JtE3OKYT3XFUeLCDE2DQninSqs=
github.com/MarvinJWendt/testza v0.2.1/go.mod h1:God7bhG8n6uQxwdScay+gjm9/LnO4D3kkcZX4hv9Rp8=
@ -67,6 +66,8 @@ github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46t
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98=
@ -134,6 +135,7 @@ github.com/google/pprof v0.0.0-20201203190320-1bf35d6f28c2/go.mod h1:kpwsk12EmLe
github.com/google/pprof v0.0.0-20201218002935-b9804c9f04c2/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE=
github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
github.com/google/uuid v1.1.1/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/google/uuid v1.1.2 h1:EVhdT+1Kseyi1/pUmXKaFxYsDNy9RQYkMWRH68J/W7Y=
github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/googleapis/gax-go/v2 v2.0.4/go.mod h1:0Wqv26UfaUD9n4G6kQubkQ+KchISgw+vpHVxEJEs9eg=
github.com/googleapis/gax-go/v2 v2.0.5/go.mod h1:DWXyrwAJ9X0FpwwEdw+IPEYBICEFu5mhpdKc/us6bOk=
@ -150,6 +152,9 @@ github.com/jpillora/backoff v0.0.0-20180909062703-3050d21c67d7/go.mod h1:2iMrUgb
github.com/jstemmer/go-junit-report v0.0.0-20190106144839-af01ea7f8024/go.mod h1:6v2b51hI/fHJwM22ozAgKL4VKDeJcHhJFhtBdhmNjmU=
github.com/jstemmer/go-junit-report v0.9.1/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/XSXhF0NWZEnDohbsk=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/klauspost/compress v1.18.2 h1:iiPHWW0YrcFgpBYhsA6D1+fqHssJscY/Tm/y2Uqnapk=
github.com/klauspost/compress v1.18.2/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/klauspost/cpuid/v2 v2.0.4/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
github.com/klauspost/cpuid/v2 v2.0.9 h1:lgaqFMSdTdQYdZ04uHyN2d/eKdOMyi2YLSvlQIBFYa4=
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
github.com/kr/fs v0.1.0/go.mod h1:FFnZGqtBN9Gxj7eW1uZ42v5BccTP0vu6NEaFoC2HwRg=
@ -169,6 +174,14 @@ github.com/mattn/go-isatty v0.0.8/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hd
github.com/mattn/go-runewidth v0.0.13 h1:lTGmDsbAYt5DmK6OnoV7EuIF1wEIFAcxld6ypU4OSgU=
github.com/mattn/go-runewidth v0.0.13/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/mgutz/ansi v0.0.0-20170206155736-9520e82c474b/go.mod h1:01TrycV0kFyexm33Z7vhZRXopbI8J3TDReVlkTgMUxE=
github.com/minio/sha256-simd v1.0.0 h1:v1ta+49hkWZyvaKwrQB8elexRqm6Y0aMLjCNsrYxo6g=
github.com/minio/sha256-simd v1.0.0/go.mod h1:OuYzVNI5vcoYIAmbIvHPl3N3jUzVedXbKy5RFepssQM=
github.com/mr-tron/base58 v1.2.0 h1:T/HDJBh4ZCPbU39/+c3rRvE0uKBQlU27+QI8LJ4t64o=
github.com/mr-tron/base58 v1.2.0/go.mod h1:BinMc/sQntlIE1frQmRFPUoPA1Zkr8VRgBdjWI2mNwc=
github.com/multiformats/go-multihash v0.2.3 h1:7Lyc8XfX/IY2jWb/gI7JP+o7JEq9hOa7BFvVU9RSh+U=
github.com/multiformats/go-multihash v0.2.3/go.mod h1:dXgKXCXjBzdscBLk9JkjINiEsCKRVch90MdaGiKsvSM=
github.com/multiformats/go-varint v0.0.6 h1:gk85QWKxh3TazbLxED/NlDVv8+q+ReFJk7Y2W/KhfNY=
github.com/multiformats/go-varint v0.0.6/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE=
github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE=
github.com/onsi/gomega v1.5.0/go.mod h1:ex+gbHU/CVuBBDIJjb2X0qEXbFg53c61hWP/1CpauHY=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
@ -195,6 +208,8 @@ github.com/sergi/go-diff v1.0.0/go.mod h1:0CfEIISq7TuYL3j771MWULgwwjU+GofnZX9QAm
github.com/smartystreets/assertions v1.0.0/go.mod h1:kHHU4qYBaI3q23Pp3VPrmWhuIUrLW/7eUrw0BU5VaoM=
github.com/smartystreets/go-aws-auth v0.0.0-20180515143844-0c1422d1fdb9/go.mod h1:SnhjPscd9TpLiy1LpzGSKh3bXCfxxXuqd9xmQJy3slM=
github.com/smartystreets/gunit v1.0.0/go.mod h1:qwPWnhz6pn0NnRBP++URONOVyNkPyr4SauJk4cUOwJs=
github.com/spaolacci/murmur3 v1.1.0 h1:7c1g84S4BPRrfL5Xrdp6fOJ206sU9y293DDHaoy0bLI=
github.com/spaolacci/murmur3 v1.1.0/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA=
github.com/spf13/afero v1.8.0 h1:5MmtuhAgYeU6qpa7w7bP0dv6MBYuup0vekhSpSkoq60=
github.com/spf13/afero v1.8.0/go.mod h1:CtAatgMJh6bJEIs48Ay/FOnkljP3WeGUG0MC1RfAqwo=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
@ -240,6 +255,8 @@ golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8U
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210421170649-83a5a9bb288b/go.mod h1:T9bdIzuCu7OtxOm1hfPfRQxPLYneinmdGuTeoZ9dtd4=
golang.org/x/crypto v0.0.0-20211108221036-ceb1ce70b4fa/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20220525230936-793ad666bf5e h1:T8NU3HyQ8ClP4SEE+KbFlg6n0NhuTsN4MyznaarGsZM=
golang.org/x/crypto v0.0.0-20220525230936-793ad666bf5e/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8=
@ -361,8 +378,9 @@ golang.org/x/sys v0.0.0-20210225134936-a50acf3fe073/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210423185535-09eb48e85fd7/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20211013075003-97ac67df715c h1:taxlMj0D/1sOAuv/CbSD+MMDof2vbyPTqz5FNYKpXt8=
golang.org/x/sys v0.0.0-20211013075003-97ac67df715c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.1.0 h1:kunALQeHf1/185U1i0GOB/fy1IPRDDpuoOOqRReG57U=
golang.org/x/sys v0.1.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210220032956-6a3ed077a48d/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210615171337-6886f2dfbf5b/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
@ -373,8 +391,9 @@ golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.4 h1:0YWbFKbhXG/wIiuHDSKpS0Iy7FSA+u45VtBMfQcFTTc=
golang.org/x/text v0.3.4/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6 h1:aRYxNxv6iGQlyVaZmk6ZgYEDa+Jg18DxebPSrd6bg1M=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
@ -542,6 +561,8 @@ honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWh
honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
lukechampine.com/blake3 v1.1.6 h1:H3cROdztr7RCfoaTpGZFQsrqvweFLrqS73j7L7cmR5c=
lukechampine.com/blake3 v1.1.6/go.mod h1:tkKEOtDkNtklkXtLNEOGNq5tcV90tJiA1vAA12R78LA=
rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=
rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA=

View File

@ -1,13 +1,188 @@
package cli
import (
"errors"
"encoding/hex"
"fmt"
"path/filepath"
"strings"
"time"
"github.com/apex/log"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
func (mfa *CLIApp) checkManifestOperation(c *cli.Context) error {
log.WithError(errors.New("unimplemented"))
// findManifest looks for a manifest file in the given directory.
// It checks for index.mf and .index.mf, returning the first one found.
func findManifest(fs afero.Fs, dir string) (string, error) {
candidates := []string{"index.mf", ".index.mf"}
for _, name := range candidates {
path := filepath.Join(dir, name)
exists, err := afero.Exists(fs, path)
if err != nil {
return "", err
}
if exists {
return path, nil
}
}
return "", fmt.Errorf("no manifest found in %s (looked for index.mf and .index.mf)", dir)
}
func (mfa *CLIApp) checkManifestOperation(ctx *cli.Context) error {
log.Debug("checkManifestOperation()")
var manifestPath string
var err error
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
// Check if arg is a directory or a file
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
// It's a directory, look for manifest inside
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
}
} else {
// Treat as a file path
manifestPath = arg
}
} else {
// No argument, look in current directory
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
}
}
basePath := ctx.String("base")
showProgress := ctx.Bool("progress")
log.Infof("checking manifest %s with base %s", manifestPath, basePath)
// Create checker
chk, err := mfer.NewChecker(manifestPath, basePath, mfa.Fs)
if err != nil {
return fmt.Errorf("failed to load manifest: %w", err)
}
// Check signature requirement
requiredSigner := ctx.String("require-signature")
if requiredSigner != "" {
// Validate fingerprint format: must be exactly 40 hex characters
if len(requiredSigner) != 40 {
return fmt.Errorf("invalid fingerprint: must be exactly 40 hex characters, got %d", len(requiredSigner))
}
if _, err := hex.DecodeString(requiredSigner); err != nil {
return fmt.Errorf("invalid fingerprint: must be valid hex: %w", err)
}
if !chk.IsSigned() {
return fmt.Errorf("manifest is not signed, but signature from %s is required", requiredSigner)
}
// Extract fingerprint from the embedded public key (not from the signer field)
// This validates the key is importable and gets its actual fingerprint
embeddedFP, err := chk.ExtractEmbeddedSigningKeyFP()
if err != nil {
return fmt.Errorf("failed to extract fingerprint from embedded signing key: %w", err)
}
// Compare fingerprints - must be exact match (case-insensitive)
if !strings.EqualFold(embeddedFP, requiredSigner) {
return fmt.Errorf("embedded signing key fingerprint %s does not match required %s", embeddedFP, requiredSigner)
}
log.Infof("manifest signature verified (signer: %s)", embeddedFP)
}
log.Infof("manifest contains %d files, %s", chk.FileCount(), humanize.IBytes(uint64(chk.TotalBytes())))
// Set up results channel
results := make(chan mfer.Result, 1)
// Set up progress channel
var progress chan mfer.CheckStatus
if showProgress {
progress = make(chan mfer.CheckStatus, 1)
go func() {
for status := range progress {
if status.ETA > 0 {
log.Progressf("Checking: %d/%d files, %s/s, ETA %s, %d failures",
status.CheckedFiles,
status.TotalFiles,
humanize.IBytes(uint64(status.BytesPerSec)),
status.ETA.Round(time.Second),
status.Failures)
} else {
log.Progressf("Checking: %d/%d files, %s/s, %d failures",
status.CheckedFiles,
status.TotalFiles,
humanize.IBytes(uint64(status.BytesPerSec)),
status.Failures)
}
}
log.ProgressDone()
}()
}
// Process results in a goroutine
var failures int64
done := make(chan struct{})
go func() {
for result := range results {
if result.Status != mfer.StatusOK {
failures++
log.Infof("%s: %s (%s)", result.Status, result.Path, result.Message)
} else {
log.Verbosef("%s: %s", result.Status, result.Path)
}
}
close(done)
}()
// Run check
err = chk.Check(ctx.Context, results, progress)
if err != nil {
return fmt.Errorf("check failed: %w", err)
}
// Wait for results processing to complete
<-done
// Check for extra files if requested
if ctx.Bool("no-extra-files") {
extraResults := make(chan mfer.Result, 1)
extraDone := make(chan struct{})
go func() {
for result := range extraResults {
failures++
log.Infof("%s: %s (%s)", result.Status, result.Path, result.Message)
}
close(extraDone)
}()
err = chk.FindExtraFiles(ctx.Context, extraResults)
if err != nil {
return fmt.Errorf("failed to check for extra files: %w", err)
}
<-extraDone
}
elapsed := time.Since(mfa.startupTime).Seconds()
rate := float64(chk.TotalBytes()) / elapsed
if failures == 0 {
log.Infof("checked %d files (%s) in %.1fs (%s/s): all OK", chk.FileCount(), humanize.IBytes(uint64(chk.TotalBytes())), elapsed, humanize.IBytes(uint64(rate)))
} else {
log.Infof("checked %d files (%s) in %.1fs (%s/s): %d failed", chk.FileCount(), humanize.IBytes(uint64(chk.TotalBytes())), elapsed, humanize.IBytes(uint64(rate)), failures)
}
if failures > 0 {
mfa.exitCode = 1
}
return nil
}

View File

@ -1,9 +1,14 @@
package cli
import (
"io"
"os"
"github.com/spf13/afero"
)
// NO_COLOR disables colored output when set. Automatically true if the
// NO_COLOR environment variable is present (per https://no-color.org/).
var NO_COLOR bool
func init() {
@ -13,13 +18,51 @@ func init() {
}
}
func Run(Appname, Version, Gitrev string) int {
m := &CLIApp{}
m.appname = Appname
m.version = Version
m.gitrev = Gitrev
m.exitCode = 0
// RunOptions contains all configuration for running the CLI application.
// Use DefaultRunOptions for standard CLI execution, or construct manually for testing.
type RunOptions struct {
Appname string // Application name displayed in help and version output
Version string // Version string (typically set at build time)
Gitrev string // Git revision hash (typically set at build time)
Args []string // Command-line arguments (typically os.Args)
Stdin io.Reader // Standard input stream
Stdout io.Writer // Standard output stream
Stderr io.Writer // Standard error stream
Fs afero.Fs // Filesystem abstraction for file operations
}
m.run()
// DefaultRunOptions returns RunOptions configured for normal CLI execution.
func DefaultRunOptions(appname, version, gitrev string) *RunOptions {
return &RunOptions{
Appname: appname,
Version: version,
Gitrev: gitrev,
Args: os.Args,
Stdin: os.Stdin,
Stdout: os.Stdout,
Stderr: os.Stderr,
Fs: afero.NewOsFs(),
}
}
// Run creates and runs the CLI application with default options.
func Run(appname, version, gitrev string) int {
return RunWithOptions(DefaultRunOptions(appname, version, gitrev))
}
// RunWithOptions creates and runs the CLI application with the given options.
func RunWithOptions(opts *RunOptions) int {
m := &CLIApp{
appname: opts.Appname,
version: opts.Version,
gitrev: opts.Gitrev,
exitCode: 0,
Stdin: opts.Stdin,
Stdout: opts.Stdout,
Stderr: opts.Stderr,
Fs: opts.Fs,
}
m.run(opts.Args)
return m.exitCode
}

View File

@ -1,12 +1,593 @@
package cli
import (
"bytes"
"fmt"
"math/rand"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
urfcli "github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/mfer"
)
func init() {
// Prevent urfave/cli from calling os.Exit during tests
urfcli.OsExiter = func(code int) {}
}
func TestBuild(t *testing.T) {
m := &CLIApp{}
assert.NotNil(t, m)
}
func testOpts(args []string, fs afero.Fs) *RunOptions {
return &RunOptions{
Appname: "mfer",
Version: "1.0.0",
Gitrev: "abc123",
Args: args,
Stdin: &bytes.Buffer{},
Stdout: &bytes.Buffer{},
Stderr: &bytes.Buffer{},
Fs: fs,
}
}
func TestVersionCommand(t *testing.T) {
fs := afero.NewMemMapFs()
opts := testOpts([]string{"mfer", "version"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode)
stdout := opts.Stdout.(*bytes.Buffer).String()
assert.Contains(t, stdout, mfer.Version)
assert.Contains(t, stdout, "abc123")
}
func TestHelpCommand(t *testing.T) {
fs := afero.NewMemMapFs()
opts := testOpts([]string{"mfer", "--help"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode)
stdout := opts.Stdout.(*bytes.Buffer).String()
assert.Contains(t, stdout, "generate")
assert.Contains(t, stdout, "check")
assert.Contains(t, stdout, "fetch")
}
func TestGenerateCommand(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files in memory filesystem
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("test content"), 0o644))
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode, "stderr: %s", opts.Stderr.(*bytes.Buffer).String())
// Verify manifest was created
exists, err := afero.Exists(fs, "/testdir/test.mf")
require.NoError(t, err)
assert.True(t, exists)
}
func TestGenerateAndCheckCommand(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files with subdirectory
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("test content"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Check manifest
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 0, exitCode, "check failed: %s", opts.Stderr.(*bytes.Buffer).String())
}
func TestCheckCommandWithMissingFile(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Delete the file
require.NoError(t, fs.Remove("/testdir/file1.txt"))
// Check manifest - should fail
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should have failed for missing file")
}
func TestCheckCommandWithCorruptedFile(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Corrupt the file (change content but keep same size)
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("HELLO WORLD"), 0o644))
// Check manifest - should fail with hash mismatch
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should have failed for corrupted file")
}
func TestCheckCommandWithSizeMismatch(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode, "generate failed: %s", opts.Stderr.(*bytes.Buffer).String())
// Change file size
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("different size content here"), 0o644))
// Check manifest - should fail with size mismatch
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/testdir/test.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should have failed for size mismatch")
}
func TestBannerOutput(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Run without -q to see banner
opts := testOpts([]string{"mfer", "generate", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode)
// Banner ASCII art should be in stdout
stdout := opts.Stdout.(*bytes.Buffer).String()
assert.Contains(t, stdout, "___")
assert.Contains(t, stdout, "\\")
}
func TestUnknownCommand(t *testing.T) {
fs := afero.NewMemMapFs()
opts := testOpts([]string{"mfer", "unknown"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 1, exitCode)
}
func TestGenerateExcludesDotfilesByDefault(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files including dotfiles
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0o644))
// Generate manifest without --include-dotfiles (default excludes dotfiles)
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Check that manifest exists
exists, _ := afero.Exists(fs, "/testdir/test.mf")
assert.True(t, exists)
// Verify manifest only has 1 file (the non-dotfile)
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/test.mf")
require.NoError(t, err)
assert.Len(t, manifest.Files(), 1)
assert.Equal(t, "file1.txt", manifest.Files()[0].Path)
}
func TestGenerateWithIncludeDotfiles(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files including dotfiles
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden", []byte("secret"), 0o644))
// Generate manifest with --include-dotfiles
opts := testOpts([]string{"mfer", "generate", "-q", "--include-dotfiles", "-o", "/testdir/test.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Verify manifest has 2 files (including dotfile)
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/test.mf")
require.NoError(t, err)
assert.Len(t, manifest.Files(), 2)
}
func TestMultipleInputPaths(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files in multiple directories
require.NoError(t, fs.MkdirAll("/dir1", 0o755))
require.NoError(t, fs.MkdirAll("/dir2", 0o755))
require.NoError(t, afero.WriteFile(fs, "/dir1/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/dir2/file2.txt", []byte("content2"), 0o644))
// Generate manifest from multiple paths
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/dir1", "/dir2"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode, "stderr: %s", opts.Stderr.(*bytes.Buffer).String())
exists, _ := afero.Exists(fs, "/output.mf")
assert.True(t, exists)
}
func TestNoExtraFilesPass(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Check with --no-extra-files (should pass - no extra files)
opts = testOpts([]string{"mfer", "check", "-q", "--no-extra-files", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 0, exitCode)
}
func TestNoExtraFilesFail(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Add an extra file after manifest generation
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0o644))
// Check with --no-extra-files (should fail - extra file exists)
opts = testOpts([]string{"mfer", "check", "-q", "--no-extra-files", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should fail when extra files exist")
}
func TestNoExtraFilesWithSubdirectory(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test files with subdirectory
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file2.txt", []byte("world"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Add extra file in subdirectory
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/extra.txt", []byte("extra"), 0o644))
// Check with --no-extra-files (should fail)
opts = testOpts([]string{"mfer", "check", "-q", "--no-extra-files", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should fail when extra files exist in subdirectory")
}
func TestCheckWithoutNoExtraFilesIgnoresExtra(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Add extra file
require.NoError(t, afero.WriteFile(fs, "/testdir/extra.txt", []byte("extra"), 0o644))
// Check WITHOUT --no-extra-files (should pass - extra files ignored)
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 0, exitCode, "check without --no-extra-files should ignore extra files")
}
func TestGenerateAtomicWriteNoTempFileOnSuccess(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Verify output file exists
exists, err := afero.Exists(fs, "/output.mf")
require.NoError(t, err)
assert.True(t, exists, "output file should exist")
// Verify temp file does NOT exist
tmpExists, err := afero.Exists(fs, "/output.mf.tmp")
require.NoError(t, err)
assert.False(t, tmpExists, "temp file should not exist after successful generation")
}
func TestGenerateAtomicWriteOverwriteWithForce(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Create existing manifest with different content
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("old content"), 0o644))
// Generate manifest with --force
opts := testOpts([]string{"mfer", "generate", "-q", "-f", "-o", "/output.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Verify output file exists and was overwritten
content, err := afero.ReadFile(fs, "/output.mf")
require.NoError(t, err)
assert.NotEqual(t, "old content", string(content), "manifest should be overwritten")
// Verify temp file does NOT exist
tmpExists, err := afero.Exists(fs, "/output.mf.tmp")
require.NoError(t, err)
assert.False(t, tmpExists, "temp file should not exist after successful generation")
}
func TestGenerateFailsWithoutForceWhenOutputExists(t *testing.T) {
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Create existing manifest
require.NoError(t, afero.WriteFile(fs, "/output.mf", []byte("existing"), 0o644))
// Generate manifest WITHOUT --force (should fail)
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "should fail when output exists without --force")
// Verify original content is preserved
content, err := afero.ReadFile(fs, "/output.mf")
require.NoError(t, err)
assert.Equal(t, "existing", string(content), "original file should be preserved")
}
func TestGenerateAtomicWriteUsesTemp(t *testing.T) {
// This test verifies that generate uses a temp file by checking
// that the output file doesn't exist until generation completes.
// We do this by generating to a path and verifying the temp file
// pattern is used (output.mf.tmp -> output.mf)
fs := afero.NewMemMapFs()
// Create test file
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("hello"), 0o644))
// Generate manifest
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode)
// Both output file should exist and temp should not
exists, _ := afero.Exists(fs, "/output.mf")
assert.True(t, exists, "output file should exist")
tmpExists, _ := afero.Exists(fs, "/output.mf.tmp")
assert.False(t, tmpExists, "temp file should be cleaned up")
// Verify manifest is valid (not empty)
content, err := afero.ReadFile(fs, "/output.mf")
require.NoError(t, err)
assert.True(t, len(content) > 0, "manifest should not be empty")
}
// failingWriterFs wraps a filesystem and makes writes fail after N bytes
type failingWriterFs struct {
afero.Fs
failAfter int64
written int64
}
type failingFile struct {
afero.File
fs *failingWriterFs
}
func (f *failingFile) Write(p []byte) (int, error) {
f.fs.written += int64(len(p))
if f.fs.written > f.fs.failAfter {
return 0, fmt.Errorf("simulated write failure")
}
return f.File.Write(p)
}
func (fs *failingWriterFs) Create(name string) (afero.File, error) {
f, err := fs.Fs.Create(name)
if err != nil {
return nil, err
}
return &failingFile{File: f, fs: fs}, nil
}
func TestGenerateAtomicWriteCleansUpOnError(t *testing.T) {
baseFs := afero.NewMemMapFs()
// Create test files - need enough content to trigger the write failure
require.NoError(t, baseFs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(baseFs, "/testdir/file1.txt", []byte("hello world this is a test file"), 0o644))
// Wrap with failing writer that fails after writing some bytes
fs := &failingWriterFs{Fs: baseFs, failAfter: 10}
// Generate manifest - should fail during write
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "should fail due to write error")
// With atomic writes: output.mf should NOT exist (temp was cleaned up)
// With non-atomic writes: output.mf WOULD exist (partial/empty)
exists, _ := afero.Exists(baseFs, "/output.mf")
assert.False(t, exists, "output file should not exist after failed generation (atomic write)")
// Temp file should also not exist
tmpExists, _ := afero.Exists(baseFs, "/output.mf.tmp")
assert.False(t, tmpExists, "temp file should be cleaned up after failed generation")
}
func TestGenerateValidatesInputPaths(t *testing.T) {
fs := afero.NewMemMapFs()
// Create one valid directory
require.NoError(t, fs.MkdirAll("/validdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/validdir/file.txt", []byte("content"), 0o644))
t.Run("nonexistent path fails fast", func(t *testing.T) {
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/nonexistent"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 1, exitCode)
stderr := opts.Stderr.(*bytes.Buffer).String()
assert.Contains(t, stderr, "path does not exist")
assert.Contains(t, stderr, "/nonexistent")
})
t.Run("mix of valid and invalid paths fails fast", func(t *testing.T) {
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/validdir", "/alsononexistent"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 1, exitCode)
stderr := opts.Stderr.(*bytes.Buffer).String()
assert.Contains(t, stderr, "path does not exist")
assert.Contains(t, stderr, "/alsononexistent")
// Output file should not have been created
exists, _ := afero.Exists(fs, "/output.mf")
assert.False(t, exists, "output file should not exist when path validation fails")
})
t.Run("valid paths succeed", func(t *testing.T) {
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/output.mf", "/validdir"}, fs)
exitCode := RunWithOptions(opts)
assert.Equal(t, 0, exitCode)
})
}
func TestCheckDetectsManifestCorruption(t *testing.T) {
fs := afero.NewMemMapFs()
rng := rand.New(rand.NewSource(42))
// Create many small files with random names to generate a ~1MB manifest
// Each manifest entry is roughly 50-60 bytes, so we need ~20000 files
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
numFiles := 20000
for i := 0; i < numFiles; i++ {
// Generate random filename
filename := fmt.Sprintf("/testdir/%08x%08x%08x.dat", rng.Uint32(), rng.Uint32(), rng.Uint32())
// Small random content
content := make([]byte, 16+rng.Intn(48))
rng.Read(content)
require.NoError(t, afero.WriteFile(fs, filename, content, 0o644))
}
// Generate manifest outside of testdir
opts := testOpts([]string{"mfer", "generate", "-q", "-o", "/manifest.mf", "/testdir"}, fs)
exitCode := RunWithOptions(opts)
require.Equal(t, 0, exitCode, "generate should succeed")
// Read the valid manifest and verify it's approximately 1MB
validManifest, err := afero.ReadFile(fs, "/manifest.mf")
require.NoError(t, err)
require.True(t, len(validManifest) >= 1024*1024, "manifest should be at least 1MB, got %d bytes", len(validManifest))
t.Logf("manifest size: %d bytes (%d files)", len(validManifest), numFiles)
// First corruption: truncate the manifest
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest[:len(validManifest)/2], 0o644))
// Check should fail with truncated manifest
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "check should fail with truncated manifest")
// Verify check passes with valid manifest
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0o644))
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
require.Equal(t, 0, exitCode, "check should pass with valid manifest")
// Now do 500 random corruption iterations
for i := 0; i < 500; i++ {
// Corrupt: write a random byte at a random offset
corrupted := make([]byte, len(validManifest))
copy(corrupted, validManifest)
offset := rng.Intn(len(corrupted))
originalByte := corrupted[offset]
// Make sure we actually change the byte
newByte := byte(rng.Intn(256))
for newByte == originalByte {
newByte = byte(rng.Intn(256))
}
corrupted[offset] = newByte
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", corrupted, 0o644))
// Check should fail with corrupted manifest
opts = testOpts([]string{"mfer", "check", "-q", "--base", "/testdir", "/manifest.mf"}, fs)
exitCode = RunWithOptions(opts)
assert.Equal(t, 1, exitCode, "iteration %d: check should fail with corrupted manifest (offset %d, 0x%02x -> 0x%02x)",
i, offset, originalByte, newByte)
// Restore valid manifest for next iteration
require.NoError(t, afero.WriteFile(fs, "/manifest.mf", validManifest, 0o644))
}
}

View File

@ -1,12 +1,374 @@
package cli
import (
"github.com/apex/log"
"bytes"
"crypto/sha256"
"fmt"
"io"
"net/http"
"net/url"
"os"
"path"
"path/filepath"
"strings"
"time"
"github.com/dustin/go-humanize"
"github.com/multiformats/go-multihash"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
func (mfa *CLIApp) fetchManifestOperation(c *cli.Context) error {
log.Debugf("fetchManifestOperation()")
panic("not implemented")
return nil //nolint
// DownloadProgress reports the progress of a single file download.
type DownloadProgress struct {
Path string // File path being downloaded
BytesRead int64 // Bytes downloaded so far
TotalBytes int64 // Total expected bytes (-1 if unknown)
BytesPerSec float64 // Current download rate
ETA time.Duration // Estimated time to completion
}
func (mfa *CLIApp) fetchManifestOperation(ctx *cli.Context) error {
log.Debug("fetchManifestOperation()")
if ctx.Args().Len() == 0 {
return fmt.Errorf("URL argument required")
}
inputURL := ctx.Args().Get(0)
manifestURL, err := resolveManifestURL(inputURL)
if err != nil {
return fmt.Errorf("invalid URL: %w", err)
}
log.Infof("fetching manifest from %s", manifestURL)
// Fetch manifest
resp, err := http.Get(manifestURL)
if err != nil {
return fmt.Errorf("failed to fetch manifest: %w", err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("failed to fetch manifest: HTTP %d", resp.StatusCode)
}
// Parse manifest
manifest, err := mfer.NewManifestFromReader(resp.Body)
if err != nil {
return fmt.Errorf("failed to parse manifest: %w", err)
}
files := manifest.Files()
log.Infof("manifest contains %d files", len(files))
// Compute base URL (directory containing manifest)
baseURL, err := url.Parse(manifestURL)
if err != nil {
return err
}
baseURL.Path = path.Dir(baseURL.Path)
if !strings.HasSuffix(baseURL.Path, "/") {
baseURL.Path += "/"
}
// Calculate total bytes to download
var totalBytes int64
for _, f := range files {
totalBytes += f.Size
}
// Create progress channel
progress := make(chan DownloadProgress, 10)
// Start progress reporter goroutine
done := make(chan struct{})
go func() {
defer close(done)
for p := range progress {
rate := formatBitrate(p.BytesPerSec * 8)
if p.ETA > 0 {
log.Infof("%s: %s/%s, %s, ETA %s",
p.Path, humanize.IBytes(uint64(p.BytesRead)), humanize.IBytes(uint64(p.TotalBytes)),
rate, p.ETA.Round(time.Second))
} else {
log.Infof("%s: %s/%s, %s",
p.Path, humanize.IBytes(uint64(p.BytesRead)), humanize.IBytes(uint64(p.TotalBytes)), rate)
}
}
}()
// Track download start time
startTime := time.Now()
// Download each file
for _, f := range files {
// Sanitize the path to prevent path traversal attacks
localPath, err := sanitizePath(f.Path)
if err != nil {
close(progress)
<-done
return fmt.Errorf("invalid path in manifest: %w", err)
}
fileURL := baseURL.String() + encodeFilePath(f.Path)
log.Infof("fetching %s", f.Path)
if err := downloadFile(fileURL, localPath, f, progress); err != nil {
close(progress)
<-done
return fmt.Errorf("failed to download %s: %w", f.Path, err)
}
}
close(progress)
<-done
// Print summary
elapsed := time.Since(startTime)
avgBytesPerSec := float64(totalBytes) / elapsed.Seconds()
avgRate := formatBitrate(avgBytesPerSec * 8)
log.Infof("downloaded %d files (%s) in %.1fs (%s avg)",
len(files),
humanize.IBytes(uint64(totalBytes)),
elapsed.Seconds(),
avgRate)
return nil
}
// encodeFilePath URL-encodes each segment of a file path while preserving slashes.
func encodeFilePath(p string) string {
segments := strings.Split(p, "/")
for i, seg := range segments {
segments[i] = url.PathEscape(seg)
}
return strings.Join(segments, "/")
}
// sanitizePath validates and sanitizes a file path from the manifest.
// It prevents path traversal attacks and rejects unsafe paths.
func sanitizePath(p string) (string, error) {
// Reject empty paths
if p == "" {
return "", fmt.Errorf("empty path")
}
// Reject absolute paths
if filepath.IsAbs(p) {
return "", fmt.Errorf("absolute path not allowed: %s", p)
}
// Clean the path to resolve . and ..
cleaned := filepath.Clean(p)
// Reject paths that escape the current directory
if strings.HasPrefix(cleaned, ".."+string(filepath.Separator)) || cleaned == ".." {
return "", fmt.Errorf("path traversal not allowed: %s", p)
}
// Also check for absolute paths after cleaning (handles edge cases)
if filepath.IsAbs(cleaned) {
return "", fmt.Errorf("absolute path not allowed: %s", p)
}
return cleaned, nil
}
// resolveManifestURL takes a URL and returns the manifest URL.
// If the URL already ends with .mf, it's returned as-is.
// Otherwise, index.mf is appended.
func resolveManifestURL(inputURL string) (string, error) {
parsed, err := url.Parse(inputURL)
if err != nil {
return "", err
}
// Check if URL already ends with .mf
if strings.HasSuffix(parsed.Path, ".mf") {
return inputURL, nil
}
// Ensure path ends with /
if !strings.HasSuffix(parsed.Path, "/") {
parsed.Path += "/"
}
// Append index.mf
parsed.Path += "index.mf"
return parsed.String(), nil
}
// progressWriter wraps an io.Writer and reports progress to a channel.
type progressWriter struct {
w io.Writer
path string
total int64
written int64
startTime time.Time
progress chan<- DownloadProgress
}
func (pw *progressWriter) Write(p []byte) (int, error) {
n, err := pw.w.Write(p)
pw.written += int64(n)
if pw.progress != nil {
var bytesPerSec float64
var eta time.Duration
elapsed := time.Since(pw.startTime)
if elapsed > 0 && pw.written > 0 {
bytesPerSec = float64(pw.written) / elapsed.Seconds()
if bytesPerSec > 0 && pw.total > 0 {
remainingBytes := pw.total - pw.written
eta = time.Duration(float64(remainingBytes)/bytesPerSec) * time.Second
}
}
sendProgress(pw.progress, DownloadProgress{
Path: pw.path,
BytesRead: pw.written,
TotalBytes: pw.total,
BytesPerSec: bytesPerSec,
ETA: eta,
})
}
return n, err
}
// formatBitrate formats a bits-per-second value with appropriate unit prefix.
func formatBitrate(bps float64) string {
switch {
case bps >= 1e9:
return fmt.Sprintf("%.1f Gbps", bps/1e9)
case bps >= 1e6:
return fmt.Sprintf("%.1f Mbps", bps/1e6)
case bps >= 1e3:
return fmt.Sprintf("%.1f Kbps", bps/1e3)
default:
return fmt.Sprintf("%.0f bps", bps)
}
}
// sendProgress sends a progress update without blocking.
func sendProgress(ch chan<- DownloadProgress, p DownloadProgress) {
select {
case ch <- p:
default:
}
}
// downloadFile downloads a URL to a local file path with hash verification.
// It downloads to a temporary file, verifies the hash, then renames to the final path.
// Progress is reported via the progress channel.
func downloadFile(fileURL, localPath string, entry *mfer.MFFilePath, progress chan<- DownloadProgress) error {
// Create parent directories if needed
dir := filepath.Dir(localPath)
if dir != "" && dir != "." {
if err := os.MkdirAll(dir, 0o755); err != nil {
return err
}
}
// Compute temp file path in the same directory
// For dotfiles, just append .tmp (they're already hidden)
// For regular files, prefix with . and append .tmp
base := filepath.Base(localPath)
var tmpName string
if strings.HasPrefix(base, ".") {
tmpName = base + ".tmp"
} else {
tmpName = "." + base + ".tmp"
}
tmpPath := filepath.Join(dir, tmpName)
if dir == "" || dir == "." {
tmpPath = tmpName
}
// Fetch file
resp, err := http.Get(fileURL)
if err != nil {
return err
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("HTTP %d", resp.StatusCode)
}
// Determine expected size
expectedSize := entry.Size
totalBytes := resp.ContentLength
if totalBytes < 0 {
totalBytes = expectedSize
}
// Create temp file
out, err := os.Create(tmpPath)
if err != nil {
return err
}
// Set up hash computation
h := sha256.New()
// Create progress-reporting writer that also computes hash
pw := &progressWriter{
w: io.MultiWriter(out, h),
path: localPath,
total: totalBytes,
startTime: time.Now(),
progress: progress,
}
// Copy content while hashing and reporting progress
written, copyErr := io.Copy(pw, resp.Body)
// Close file before checking errors (to flush writes)
closeErr := out.Close()
// If copy failed, clean up temp file and return error
if copyErr != nil {
_ = os.Remove(tmpPath)
return copyErr
}
if closeErr != nil {
_ = os.Remove(tmpPath)
return closeErr
}
// Verify size
if written != expectedSize {
_ = os.Remove(tmpPath)
return fmt.Errorf("size mismatch: expected %d bytes, got %d", expectedSize, written)
}
// Encode computed hash as multihash
computed, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
_ = os.Remove(tmpPath)
return fmt.Errorf("failed to encode hash: %w", err)
}
// Verify hash against manifest (at least one must match)
hashMatch := false
for _, hash := range entry.Hashes {
if bytes.Equal(computed, hash.MultiHash) {
hashMatch = true
break
}
}
if !hashMatch {
_ = os.Remove(tmpPath)
return fmt.Errorf("hash mismatch")
}
// Rename temp file to final path
if err := os.Rename(tmpPath, localPath); err != nil {
_ = os.Remove(tmpPath)
return fmt.Errorf("failed to rename temp file: %w", err)
}
return nil
}

391
internal/cli/fetch_test.go Normal file
View File

@ -0,0 +1,391 @@
package cli
import (
"bytes"
"context"
"io"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/mfer"
)
func TestEncodeFilePath(t *testing.T) {
tests := []struct {
input string
expected string
}{
{"file.txt", "file.txt"},
{"dir/file.txt", "dir/file.txt"},
{"my file.txt", "my%20file.txt"},
{"dir/my file.txt", "dir/my%20file.txt"},
{"file#1.txt", "file%231.txt"},
{"file?v=1.txt", "file%3Fv=1.txt"},
{"path/to/file with spaces.txt", "path/to/file%20with%20spaces.txt"},
{"100%done.txt", "100%25done.txt"},
}
for _, tt := range tests {
t.Run(tt.input, func(t *testing.T) {
result := encodeFilePath(tt.input)
assert.Equal(t, tt.expected, result)
})
}
}
func TestSanitizePath(t *testing.T) {
// Valid paths that should be accepted
validTests := []struct {
input string
expected string
}{
{"file.txt", "file.txt"},
{"dir/file.txt", "dir/file.txt"},
{"dir/subdir/file.txt", "dir/subdir/file.txt"},
{"./file.txt", "file.txt"},
{"./dir/file.txt", "dir/file.txt"},
{"dir/./file.txt", "dir/file.txt"},
}
for _, tt := range validTests {
t.Run("valid:"+tt.input, func(t *testing.T) {
result, err := sanitizePath(tt.input)
assert.NoError(t, err)
assert.Equal(t, tt.expected, result)
})
}
// Invalid paths that should be rejected
invalidTests := []struct {
input string
desc string
}{
{"", "empty path"},
{"..", "parent directory"},
{"../file.txt", "parent traversal"},
{"../../file.txt", "double parent traversal"},
{"dir/../../../file.txt", "traversal escaping base"},
{"/etc/passwd", "absolute path"},
{"/file.txt", "absolute path with single component"},
{"dir/../../etc/passwd", "traversal to system file"},
}
for _, tt := range invalidTests {
t.Run("invalid:"+tt.desc, func(t *testing.T) {
_, err := sanitizePath(tt.input)
assert.Error(t, err, "expected error for path: %s", tt.input)
})
}
}
func TestResolveManifestURL(t *testing.T) {
tests := []struct {
input string
expected string
}{
// Already ends with .mf - use as-is
{"https://example.com/path/index.mf", "https://example.com/path/index.mf"},
{"https://example.com/path/custom.mf", "https://example.com/path/custom.mf"},
{"https://example.com/foo.mf", "https://example.com/foo.mf"},
// Directory with trailing slash - append index.mf
{"https://example.com/path/", "https://example.com/path/index.mf"},
{"https://example.com/", "https://example.com/index.mf"},
// Directory without trailing slash - add slash and index.mf
{"https://example.com/path", "https://example.com/path/index.mf"},
{"https://example.com", "https://example.com/index.mf"},
// With query strings
{"https://example.com/path?foo=bar", "https://example.com/path/index.mf?foo=bar"},
}
for _, tt := range tests {
t.Run(tt.input, func(t *testing.T) {
result, err := resolveManifestURL(tt.input)
assert.NoError(t, err)
assert.Equal(t, tt.expected, result)
})
}
}
func TestFetchFromHTTP(t *testing.T) {
// Create source filesystem with test files
sourceFs := afero.NewMemMapFs()
testFiles := map[string][]byte{
"file1.txt": []byte("Hello, World!"),
"file2.txt": []byte("This is file 2 with more content."),
"subdir/file3.txt": []byte("Nested file content here."),
"subdir/deep/f.txt": []byte("Deeply nested file."),
}
for path, content := range testFiles {
fullPath := "/" + path // MemMapFs needs absolute paths
dir := filepath.Dir(fullPath)
require.NoError(t, sourceFs.MkdirAll(dir, 0o755))
require.NoError(t, afero.WriteFile(sourceFs, fullPath, content, 0o644))
}
// Generate manifest using scanner
opts := &mfer.ScannerOptions{
Fs: sourceFs,
}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
manifestData := manifestBuf.Bytes()
// Create HTTP server that serves the source filesystem
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
path := r.URL.Path
if path == "/index.mf" {
w.Header().Set("Content-Type", "application/octet-stream")
_, _ = w.Write(manifestData)
return
}
// Strip leading slash
if len(path) > 0 && path[0] == '/' {
path = path[1:]
}
content, exists := testFiles[path]
if !exists {
http.NotFound(w, r)
return
}
w.Header().Set("Content-Type", "application/octet-stream")
_, _ = w.Write(content)
}))
defer server.Close()
// Create destination directory
destDir, err := os.MkdirTemp("", "mfer-fetch-test-*")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(destDir) }()
// Change to dest directory for the test
origDir, err := os.Getwd()
require.NoError(t, err)
require.NoError(t, os.Chdir(destDir))
defer func() { _ = os.Chdir(origDir) }()
// Parse the manifest to get file entries
manifest, err := mfer.NewManifestFromReader(bytes.NewReader(manifestData))
require.NoError(t, err)
files := manifest.Files()
require.Len(t, files, len(testFiles))
// Download each file using downloadFile
progress := make(chan DownloadProgress, 10)
go func() {
for range progress {
// Drain progress channel
}
}()
baseURL := server.URL + "/"
for _, f := range files {
localPath, err := sanitizePath(f.Path)
require.NoError(t, err)
fileURL := baseURL + f.Path
err = downloadFile(fileURL, localPath, f, progress)
require.NoError(t, err, "failed to download %s", f.Path)
}
close(progress)
// Verify downloaded files match originals
for path, expectedContent := range testFiles {
downloadedPath := filepath.Join(destDir, path)
downloadedContent, err := os.ReadFile(downloadedPath)
require.NoError(t, err, "failed to read downloaded file %s", path)
assert.Equal(t, expectedContent, downloadedContent, "content mismatch for %s", path)
}
}
func TestFetchHashMismatch(t *testing.T) {
// Create source filesystem with a test file
sourceFs := afero.NewMemMapFs()
originalContent := []byte("Original content")
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0o644))
// Generate manifest
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Parse manifest
manifest, err := mfer.NewManifestFromReader(bytes.NewReader(manifestBuf.Bytes()))
require.NoError(t, err)
files := manifest.Files()
require.Len(t, files, 1)
// Create server that serves DIFFERENT content (to trigger hash mismatch)
tamperedContent := []byte("Tampered content!")
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/octet-stream")
_, _ = w.Write(tamperedContent)
}))
defer server.Close()
// Create temp directory
destDir, err := os.MkdirTemp("", "mfer-fetch-hash-test-*")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(destDir) }()
origDir, err := os.Getwd()
require.NoError(t, err)
require.NoError(t, os.Chdir(destDir))
defer func() { _ = os.Chdir(origDir) }()
// Try to download - should fail with hash mismatch
err = downloadFile(server.URL+"/file.txt", "file.txt", files[0], nil)
assert.Error(t, err)
assert.Contains(t, err.Error(), "mismatch")
// Verify temp file was cleaned up
_, err = os.Stat(".file.txt.tmp")
assert.True(t, os.IsNotExist(err), "temp file should be cleaned up on hash mismatch")
// Verify final file was not created
_, err = os.Stat("file.txt")
assert.True(t, os.IsNotExist(err), "final file should not exist on hash mismatch")
}
func TestFetchSizeMismatch(t *testing.T) {
// Create source filesystem with a test file
sourceFs := afero.NewMemMapFs()
originalContent := []byte("Original content with specific size")
require.NoError(t, afero.WriteFile(sourceFs, "/file.txt", originalContent, 0o644))
// Generate manifest
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Parse manifest
manifest, err := mfer.NewManifestFromReader(bytes.NewReader(manifestBuf.Bytes()))
require.NoError(t, err)
files := manifest.Files()
require.Len(t, files, 1)
// Create server that serves content with wrong size
wrongSizeContent := []byte("Short")
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/octet-stream")
_, _ = w.Write(wrongSizeContent)
}))
defer server.Close()
// Create temp directory
destDir, err := os.MkdirTemp("", "mfer-fetch-size-test-*")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(destDir) }()
origDir, err := os.Getwd()
require.NoError(t, err)
require.NoError(t, os.Chdir(destDir))
defer func() { _ = os.Chdir(origDir) }()
// Try to download - should fail with size mismatch
err = downloadFile(server.URL+"/file.txt", "file.txt", files[0], nil)
assert.Error(t, err)
assert.Contains(t, err.Error(), "size mismatch")
// Verify temp file was cleaned up
_, err = os.Stat(".file.txt.tmp")
assert.True(t, os.IsNotExist(err), "temp file should be cleaned up on size mismatch")
}
func TestFetchProgress(t *testing.T) {
// Create source filesystem with a larger test file
sourceFs := afero.NewMemMapFs()
// Create content large enough to trigger multiple progress updates
content := bytes.Repeat([]byte("x"), 100*1024) // 100KB
require.NoError(t, afero.WriteFile(sourceFs, "/large.txt", content, 0o644))
// Generate manifest
opts := &mfer.ScannerOptions{Fs: sourceFs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumerateFS(sourceFs, "/", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Parse manifest
manifest, err := mfer.NewManifestFromReader(bytes.NewReader(manifestBuf.Bytes()))
require.NoError(t, err)
files := manifest.Files()
require.Len(t, files, 1)
// Create server that serves the content
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/octet-stream")
w.Header().Set("Content-Length", "102400")
// Write in chunks to allow progress reporting
reader := bytes.NewReader(content)
_, _ = io.Copy(w, reader)
}))
defer server.Close()
// Create temp directory
destDir, err := os.MkdirTemp("", "mfer-fetch-progress-test-*")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(destDir) }()
origDir, err := os.Getwd()
require.NoError(t, err)
require.NoError(t, os.Chdir(destDir))
defer func() { _ = os.Chdir(origDir) }()
// Set up progress channel and collect updates
progress := make(chan DownloadProgress, 100)
var progressUpdates []DownloadProgress
done := make(chan struct{})
go func() {
for p := range progress {
progressUpdates = append(progressUpdates, p)
}
close(done)
}()
// Download
err = downloadFile(server.URL+"/large.txt", "large.txt", files[0], progress)
close(progress)
<-done
require.NoError(t, err)
// Verify we got progress updates
assert.NotEmpty(t, progressUpdates, "should have received progress updates")
// Verify final progress shows complete
if len(progressUpdates) > 0 {
last := progressUpdates[len(progressUpdates)-1]
assert.Equal(t, int64(len(content)), last.BytesRead, "final progress should show all bytes read")
assert.Equal(t, "large.txt", last.Path)
}
// Verify file was downloaded correctly
downloaded, err := os.ReadFile("large.txt")
require.NoError(t, err)
assert.Equal(t, content, downloaded)
}

386
internal/cli/freshen.go Normal file
View File

@ -0,0 +1,386 @@
package cli
import (
"crypto/sha256"
"fmt"
"io"
"io/fs"
"path/filepath"
"time"
"github.com/dustin/go-humanize"
"github.com/multiformats/go-multihash"
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
// FreshenStatus contains progress information for the freshen operation.
type FreshenStatus struct {
Phase string // "scan" or "hash"
TotalFiles int64 // Total files to process in current phase
CurrentFiles int64 // Files processed so far
TotalBytes int64 // Total bytes to hash (hash phase only)
CurrentBytes int64 // Bytes hashed so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
}
// freshenEntry tracks a file's status during freshen
type freshenEntry struct {
path string
size int64
mtime time.Time
needsHash bool // true if new or changed
existing *mfer.MFFilePath // existing manifest entry if unchanged
}
func (mfa *CLIApp) freshenManifestOperation(ctx *cli.Context) error {
log.Debug("freshenManifestOperation()")
basePath := ctx.String("base")
showProgress := ctx.Bool("progress")
includeDotfiles := ctx.Bool("IncludeDotfiles")
followSymlinks := ctx.Bool("FollowSymLinks")
// Find manifest file
var manifestPath string
var err error
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
}
} else {
manifestPath = arg
}
} else {
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
}
}
log.Infof("loading manifest from %s", manifestPath)
// Load existing manifest
manifest, err := mfer.NewManifestFromFile(mfa.Fs, manifestPath)
if err != nil {
return fmt.Errorf("failed to load manifest: %w", err)
}
existingFiles := manifest.Files()
log.Infof("manifest contains %d files", len(existingFiles))
// Build map of existing entries by path
existingByPath := make(map[string]*mfer.MFFilePath, len(existingFiles))
for _, f := range existingFiles {
existingByPath[f.Path] = f
}
// Phase 1: Scan filesystem
log.Infof("scanning filesystem...")
startScan := time.Now()
var entries []*freshenEntry
var scanCount int64
var removed, changed, added, unchanged int64
absBase, err := filepath.Abs(basePath)
if err != nil {
return err
}
err = afero.Walk(mfa.Fs, absBase, func(path string, info fs.FileInfo, walkErr error) error {
if walkErr != nil {
return walkErr
}
// Get relative path
relPath, err := filepath.Rel(absBase, path)
if err != nil {
return err
}
// Skip the manifest file itself
if relPath == filepath.Base(manifestPath) || relPath == "."+filepath.Base(manifestPath) {
return nil
}
// Handle dotfiles
if !includeDotfiles && mfer.IsHiddenPath(filepath.ToSlash(relPath)) {
if info.IsDir() {
return filepath.SkipDir
}
return nil
}
// Skip directories
if info.IsDir() {
return nil
}
// Handle symlinks
if info.Mode()&fs.ModeSymlink != 0 {
if !followSymlinks {
return nil
}
realPath, err := filepath.EvalSymlinks(path)
if err != nil {
return nil // Skip broken symlinks
}
realInfo, err := mfa.Fs.Stat(realPath)
if err != nil || realInfo.IsDir() {
return nil
}
info = realInfo
}
scanCount++
// Check against existing manifest
existing, inManifest := existingByPath[relPath]
if inManifest {
// Check if changed (size or mtime)
existingMtime := time.Unix(existing.Mtime.Seconds, int64(existing.Mtime.Nanos))
if existing.Size != info.Size() || !existingMtime.Equal(info.ModTime()) {
changed++
log.Verbosef("M %s", relPath)
entries = append(entries, &freshenEntry{
path: relPath,
size: info.Size(),
mtime: info.ModTime(),
needsHash: true,
})
} else {
unchanged++
entries = append(entries, &freshenEntry{
path: relPath,
size: info.Size(),
mtime: info.ModTime(),
needsHash: false,
existing: existing,
})
}
// Mark as seen
delete(existingByPath, relPath)
} else {
added++
log.Verbosef("A %s", relPath)
entries = append(entries, &freshenEntry{
path: relPath,
size: info.Size(),
mtime: info.ModTime(),
needsHash: true,
})
}
// Report scan progress
if showProgress && scanCount%100 == 0 {
log.Progressf("Scanning: %d files found", scanCount)
}
return nil
})
if showProgress {
log.ProgressDone()
}
if err != nil {
return fmt.Errorf("failed to scan filesystem: %w", err)
}
// Remaining entries in existingByPath are removed files
removed = int64(len(existingByPath))
for path := range existingByPath {
log.Verbosef("D %s", path)
}
scanDuration := time.Since(startScan)
log.Infof("scan complete in %s: %d unchanged, %d changed, %d added, %d removed",
scanDuration.Round(time.Millisecond), unchanged, changed, added, removed)
// Calculate total bytes to hash
var totalHashBytes int64
var filesToHash int64
for _, e := range entries {
if e.needsHash {
totalHashBytes += e.size
filesToHash++
}
}
// Phase 2: Hash changed and new files
if filesToHash > 0 {
log.Infof("hashing %d files (%s)...", filesToHash, humanize.IBytes(uint64(totalHashBytes)))
}
startHash := time.Now()
var hashedFiles int64
var hashedBytes int64
builder := mfer.NewBuilder()
// Set up signing options if sign-key is provided
if signKey := ctx.String("sign-key"); signKey != "" {
builder.SetSigningOptions(&mfer.SigningOptions{
KeyID: mfer.GPGKeyID(signKey),
})
log.Infof("signing manifest with GPG key: %s", signKey)
}
for _, e := range entries {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
if e.needsHash {
// Need to read and hash the file
absPath := filepath.Join(absBase, e.path)
f, err := mfa.Fs.Open(absPath)
if err != nil {
return fmt.Errorf("failed to open %s: %w", e.path, err)
}
hash, bytesRead, err := hashFile(f, e.size, func(n int64) {
if showProgress {
currentBytes := hashedBytes + n
elapsed := time.Since(startHash)
var rate float64
var eta time.Duration
if elapsed > 0 && currentBytes > 0 {
rate = float64(currentBytes) / elapsed.Seconds()
remaining := totalHashBytes - currentBytes
if rate > 0 {
eta = time.Duration(float64(remaining)/rate) * time.Second
}
}
if eta > 0 {
log.Progressf("Hashing: %d/%d files, %s/s, ETA %s",
hashedFiles, filesToHash, humanize.IBytes(uint64(rate)), eta.Round(time.Second))
} else {
log.Progressf("Hashing: %d/%d files, %s/s",
hashedFiles, filesToHash, humanize.IBytes(uint64(rate)))
}
}
})
_ = f.Close()
if err != nil {
return fmt.Errorf("failed to hash %s: %w", e.path, err)
}
hashedBytes += bytesRead
hashedFiles++
// Add to builder with computed hash
if err := addFileToBuilder(builder, e.path, e.size, e.mtime, hash); err != nil {
return fmt.Errorf("failed to add %s: %w", e.path, err)
}
} else {
// Use existing entry
if err := addExistingToBuilder(builder, e.existing); err != nil {
return fmt.Errorf("failed to add %s: %w", e.path, err)
}
}
}
if showProgress && filesToHash > 0 {
log.ProgressDone()
}
// Print summary
log.Infof("freshen complete: %d unchanged, %d changed, %d added, %d removed",
unchanged, changed, added, removed)
// Skip writing if nothing changed
if changed == 0 && added == 0 && removed == 0 {
log.Infof("manifest unchanged, skipping write")
return nil
}
// Write updated manifest atomically (write to temp, then rename)
tmpPath := manifestPath + ".tmp"
outFile, err := mfa.Fs.Create(tmpPath)
if err != nil {
return fmt.Errorf("failed to create temp file: %w", err)
}
err = builder.Build(outFile)
_ = outFile.Close()
if err != nil {
_ = mfa.Fs.Remove(tmpPath)
return fmt.Errorf("failed to write manifest: %w", err)
}
// Rename temp to final
if err := mfa.Fs.Rename(tmpPath, manifestPath); err != nil {
_ = mfa.Fs.Remove(tmpPath)
return fmt.Errorf("failed to rename manifest: %w", err)
}
totalDuration := time.Since(mfa.startupTime)
if hashedBytes > 0 {
hashDuration := time.Since(startHash)
hashRate := float64(hashedBytes) / hashDuration.Seconds()
log.Infof("hashed %s in %.1fs (%s/s)",
humanize.IBytes(uint64(hashedBytes)), totalDuration.Seconds(), humanize.IBytes(uint64(hashRate)))
}
log.Infof("wrote %d files to %s", len(entries), manifestPath)
return nil
}
// hashFile reads a file and computes its SHA256 multihash.
// Progress callback is called with bytes read so far.
func hashFile(r io.Reader, size int64, progress func(int64)) ([]byte, int64, error) {
h := sha256.New()
buf := make([]byte, 64*1024)
var total int64
for {
n, err := r.Read(buf)
if n > 0 {
h.Write(buf[:n])
total += int64(n)
if progress != nil {
progress(total)
}
}
if err == io.EOF {
break
}
if err != nil {
return nil, total, err
}
}
mh, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
return nil, total, err
}
return mh, total, nil
}
// addFileToBuilder adds a new file entry to the builder
func addFileToBuilder(b *mfer.Builder, path string, size int64, mtime time.Time, hash []byte) error {
return b.AddFileWithHash(mfer.RelFilePath(path), mfer.FileSize(size), mfer.ModTime(mtime), hash)
}
// addExistingToBuilder adds an existing manifest entry to the builder
func addExistingToBuilder(b *mfer.Builder, entry *mfer.MFFilePath) error {
mtime := time.Unix(entry.Mtime.Seconds, int64(entry.Mtime.Nanos))
if len(entry.Hashes) == 0 {
return nil
}
return b.AddFileWithHash(mfer.RelFilePath(entry.Path), mfer.FileSize(entry.Size), mfer.ModTime(mtime), entry.Hashes[0].MultiHash)
}

View File

@ -0,0 +1,82 @@
package cli
import (
"bytes"
"context"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/mfer/mfer"
)
func TestFreshenUnchanged(t *testing.T) {
// Create filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Generate initial manifest
opts := &mfer.ScannerOptions{Fs: fs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumeratePath("/testdir", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Write manifest to filesystem
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0o644))
// Parse manifest to verify
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/.index.mf")
require.NoError(t, err)
assert.Len(t, manifest.Files(), 2)
}
func TestFreshenWithChanges(t *testing.T) {
// Create filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Generate initial manifest
opts := &mfer.ScannerOptions{Fs: fs}
s := mfer.NewScannerWithOptions(opts)
require.NoError(t, s.EnumeratePath("/testdir", nil))
var manifestBuf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &manifestBuf, nil))
// Write manifest to filesystem
require.NoError(t, afero.WriteFile(fs, "/testdir/.index.mf", manifestBuf.Bytes(), 0o644))
// Verify initial manifest has 2 files
manifest, err := mfer.NewManifestFromFile(fs, "/testdir/.index.mf")
require.NoError(t, err)
assert.Len(t, manifest.Files(), 2)
// Add a new file
require.NoError(t, afero.WriteFile(fs, "/testdir/file3.txt", []byte("content3"), 0o644))
// Modify file2 (change content and size)
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("modified content2"), 0o644))
// Remove file1
require.NoError(t, fs.Remove("/testdir/file1.txt"))
// Note: The freshen operation would need to be run here
// For now, we just verify the test setup is correct
exists, _ := afero.Exists(fs, "/testdir/file1.txt")
assert.False(t, exists)
exists, _ = afero.Exists(fs, "/testdir/file3.txt")
assert.True(t, exists)
content, _ := afero.ReadFile(fs, "/testdir/file2.txt")
assert.Equal(t, "modified content2", string(content))
}

View File

@ -1,54 +1,176 @@
package cli
import (
"bytes"
"fmt"
"os"
"os/signal"
"path/filepath"
"sync"
"syscall"
"time"
"git.eeqj.de/sneak/mfer/internal/log"
"git.eeqj.de/sneak/mfer/mfer"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
func (mfa *CLIApp) generateManifestOperation(ctx *cli.Context) error {
log.Debug("generateManifestOperation()")
myArgs := ctx.Args()
log.Dump(myArgs)
opts := &mfer.ManifestScanOptions{
IgnoreDotfiles: ctx.Bool("IgnoreDotfiles"),
opts := &mfer.ScannerOptions{
IncludeDotfiles: ctx.Bool("IncludeDotfiles"),
FollowSymLinks: ctx.Bool("FollowSymLinks"),
Fs: mfa.Fs,
}
paths := make([]string, ctx.Args().Len()-1)
for i := 0; i < ctx.Args().Len(); i++ {
ap, err := filepath.Abs(ctx.Args().Get(i))
// Set up signing options if sign-key is provided
if signKey := ctx.String("sign-key"); signKey != "" {
opts.SigningOptions = &mfer.SigningOptions{
KeyID: mfer.GPGKeyID(signKey),
}
log.Infof("signing manifest with GPG key: %s", signKey)
}
s := mfer.NewScannerWithOptions(opts)
// Phase 1: Enumeration - collect paths and stat files
args := ctx.Args()
showProgress := ctx.Bool("progress")
// Set up enumeration progress reporting
var enumProgress chan mfer.EnumerateStatus
var enumWg sync.WaitGroup
if showProgress {
enumProgress = make(chan mfer.EnumerateStatus, 1)
enumWg.Add(1)
go func() {
defer enumWg.Done()
for status := range enumProgress {
log.Progressf("Enumerating: %d files, %s",
status.FilesFound,
humanize.IBytes(uint64(status.BytesFound)))
}
log.ProgressDone()
}()
}
if args.Len() == 0 {
// Default to current directory
if err := s.EnumeratePath(".", enumProgress); err != nil {
return err
}
} else {
// Collect and validate all paths first
paths := make([]string, 0, args.Len())
for i := 0; i < args.Len(); i++ {
inputPath := args.Get(i)
ap, err := filepath.Abs(inputPath)
if err != nil {
return err
}
log.Dump(ap)
// Validate path exists before adding to list
if exists, _ := afero.Exists(mfa.Fs, ap); !exists {
return fmt.Errorf("path does not exist: %s", inputPath)
}
log.Debugf("enumerating path: %s", ap)
paths = append(paths, ap)
}
mf, err := mfer.NewFromPaths(opts, paths...)
if err != nil {
panic(err)
}
mf.WithContext(ctx.Context)
log.Dump(mf)
err = mf.Scan()
if err != nil {
if err := s.EnumeratePaths(enumProgress, paths...); err != nil {
return err
}
}
enumWg.Wait()
buf := new(bytes.Buffer)
log.Infof("enumerated %d files, %s total", s.FileCount(), humanize.IBytes(uint64(s.TotalBytes())))
err = mf.WriteTo(buf)
if err != nil {
return err
// Check if output file exists
outputPath := ctx.String("output")
if exists, _ := afero.Exists(mfa.Fs, outputPath); exists {
if !ctx.Bool("force") {
return fmt.Errorf("output file %s already exists (use --force to overwrite)", outputPath)
}
}
dat := buf.Bytes()
// Create temp file for atomic write
tmpPath := outputPath + ".tmp"
outFile, err := mfa.Fs.Create(tmpPath)
if err != nil {
return fmt.Errorf("failed to create temp file: %w", err)
}
// Set up signal handler to clean up temp file on Ctrl-C
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
go func() {
sig, ok := <-sigChan
if !ok || sig == nil {
return // Channel closed normally, not a signal
}
_ = outFile.Close()
_ = mfa.Fs.Remove(tmpPath)
os.Exit(1)
}()
// Clean up temp file on any error or interruption
success := false
defer func() {
signal.Stop(sigChan)
close(sigChan)
_ = outFile.Close()
if !success {
_ = mfa.Fs.Remove(tmpPath)
}
}()
// Phase 2: Scan - read file contents and generate manifest
var scanProgress chan mfer.ScanStatus
var scanWg sync.WaitGroup
if showProgress {
scanProgress = make(chan mfer.ScanStatus, 1)
scanWg.Add(1)
go func() {
defer scanWg.Done()
for status := range scanProgress {
if status.ETA > 0 {
log.Progressf("Scanning: %d/%d files, %s/s, ETA %s",
status.ScannedFiles,
status.TotalFiles,
humanize.IBytes(uint64(status.BytesPerSec)),
status.ETA.Round(time.Second))
} else {
log.Progressf("Scanning: %d/%d files, %s/s",
status.ScannedFiles,
status.TotalFiles,
humanize.IBytes(uint64(status.BytesPerSec)))
}
}
log.ProgressDone()
}()
}
err = s.ToManifest(ctx.Context, outFile, scanProgress)
scanWg.Wait()
if err != nil {
return fmt.Errorf("failed to generate manifest: %w", err)
}
// Close file before rename to ensure all data is flushed
if err := outFile.Close(); err != nil {
return fmt.Errorf("failed to close temp file: %w", err)
}
// Atomic rename
if err := mfa.Fs.Rename(tmpPath, outputPath); err != nil {
return fmt.Errorf("failed to rename temp file: %w", err)
}
success = true
elapsed := time.Since(mfa.startupTime).Seconds()
rate := float64(s.TotalBytes()) / elapsed
log.Infof("wrote %d files (%s) to %s in %.1fs (%s/s)", s.FileCount(), humanize.IBytes(uint64(s.TotalBytes())), outputPath, elapsed, humanize.IBytes(uint64(rate)))
log.Dump(dat)
return nil
}

65
internal/cli/list.go Normal file
View File

@ -0,0 +1,65 @@
package cli
import (
"fmt"
"time"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
func (mfa *CLIApp) listManifestOperation(ctx *cli.Context) error {
// Default to ErrorLevel for clean output
log.SetLevel(log.ErrorLevel)
longFormat := ctx.Bool("long")
print0 := ctx.Bool("print0")
// Find manifest file
var manifestPath string
var err error
if ctx.Args().Len() > 0 {
arg := ctx.Args().Get(0)
info, statErr := mfa.Fs.Stat(arg)
if statErr == nil && info.IsDir() {
manifestPath, err = findManifest(mfa.Fs, arg)
if err != nil {
return err
}
} else {
manifestPath = arg
}
} else {
manifestPath, err = findManifest(mfa.Fs, ".")
if err != nil {
return err
}
}
// Load manifest
manifest, err := mfer.NewManifestFromFile(mfa.Fs, manifestPath)
if err != nil {
return fmt.Errorf("failed to load manifest: %w", err)
}
files := manifest.Files()
// Determine line ending
lineEnd := "\n"
if print0 {
lineEnd = "\x00"
}
for _, f := range files {
if longFormat {
mtime := time.Unix(f.Mtime.Seconds, int64(f.Mtime.Nanos))
_, _ = fmt.Fprintf(mfa.Stdout, "%d\t%s\t%s%s", f.Size, mtime.Format(time.RFC3339), f.Path, lineEnd)
} else {
_, _ = fmt.Fprintf(mfa.Stdout, "%s%s", f.Path, lineEnd)
}
}
return nil
}

View File

@ -2,13 +2,18 @@ package cli
import (
"fmt"
"io"
"os"
"time"
"git.eeqj.de/sneak/mfer/internal/log"
"github.com/spf13/afero"
"github.com/urfave/cli/v2"
"sneak.berlin/go/mfer/internal/log"
"sneak.berlin/go/mfer/mfer"
)
// CLIApp is the main CLI application container. It holds configuration,
// I/O streams, and filesystem abstraction to enable testing and flexibility.
type CLIApp struct {
appname string
version string
@ -16,9 +21,15 @@ type CLIApp struct {
startupTime time.Time
exitCode int
app *cli.App
Stdin io.Reader // Standard input stream
Stdout io.Writer // Standard output stream for normal output
Stderr io.Writer // Standard error stream for diagnostics
Fs afero.Fs // Filesystem abstraction for all file operations
}
const banner = ` ___ ___ ___ ___
const banner = `
___ ___ ___ ___
/__/\ / /\ / /\ / /\
| |::\ / /:/_ / /:/_ / /::\
| |:|:\ / /:/ /\ / /:/ /\ / /:/\:\
@ -31,23 +42,50 @@ const banner = ` ___ ___ ___ ___
\__\/ \__\/ \__\/ \__\/`
func (mfa *CLIApp) printBanner() {
fmt.Println(banner)
if log.GetLevel() <= log.InfoLevel {
_, _ = fmt.Fprintln(mfa.Stdout, banner)
_, _ = fmt.Fprintf(mfa.Stdout, " mfer by @sneak: v%s released %s\n", mfer.Version, mfer.ReleaseDate)
_, _ = fmt.Fprintln(mfa.Stdout, " https://sneak.berlin/go/mfer")
}
}
// VersionString returns the version and git revision formatted for display.
func (mfa *CLIApp) VersionString() string {
return fmt.Sprintf("%s (%s)", mfa.version, mfa.gitrev)
if mfa.gitrev != "" {
return fmt.Sprintf("%s (%s)", mfer.Version, mfa.gitrev)
}
return mfer.Version
}
func (mfa *CLIApp) setVerbosity(v int) {
func (mfa *CLIApp) setVerbosity(c *cli.Context) {
_, present := os.LookupEnv("MFER_DEBUG")
if present {
log.EnableDebugLogging()
} else if c.Bool("quiet") {
log.SetLevel(log.ErrorLevel)
} else {
log.SetLevelFromVerbosity(v)
log.SetLevelFromVerbosity(c.Count("verbose"))
}
}
func (mfa *CLIApp) run() {
// commonFlags returns the flags shared by most commands (-v, -q)
func commonFlags() []cli.Flag {
return []cli.Flag{
&cli.BoolFlag{
Name: "verbose",
Aliases: []string{"v"},
Usage: "Increase verbosity (-v for verbose, -vv for debug)",
Count: new(int),
},
&cli.BoolFlag{
Name: "quiet",
Aliases: []string{"q"},
Usage: "Suppress output except errors",
},
}
}
func (mfa *CLIApp) run(args []string) {
mfa.startupTime = time.Now()
if NO_COLOR {
@ -55,27 +93,23 @@ func (mfa *CLIApp) run() {
log.DisableStyling()
}
// Configure log package to use our I/O streams
log.SetOutput(mfa.Stdout, mfa.Stderr)
log.Init()
var verbosity int
mfa.app = &cli.App{
Name: mfa.appname,
Usage: "Manifest generator",
Version: mfa.VersionString(),
EnableBashCompletion: true,
Flags: []cli.Flag{
&cli.BoolFlag{
Name: "verbose",
Usage: "Verbosity level",
Aliases: []string{"v"},
Count: &verbosity,
},
&cli.BoolFlag{
Name: "quiet",
Usage: "don't produce output except on error",
Aliases: []string{"q"},
},
Writer: mfa.Stdout,
ErrWriter: mfa.Stderr,
Action: func(c *cli.Context) error {
if c.Args().Len() > 0 {
return fmt.Errorf("unknown command %q", c.Args().First())
}
mfa.printBanner()
return cli.ShowAppHelp(c)
},
Commands: []*cli.Command{
{
@ -83,66 +117,160 @@ func (mfa *CLIApp) run() {
Aliases: []string{"gen"},
Usage: "Generate manifest file",
Action: func(c *cli.Context) error {
if !c.Bool("quiet") {
mfa.setVerbosity(c)
mfa.printBanner()
}
mfa.setVerbosity(verbosity)
return mfa.generateManifestOperation(c)
},
Flags: []cli.Flag{
Flags: append(commonFlags(),
&cli.BoolFlag{
Name: "FollowSymLinks",
Aliases: []string{"follow-symlinks"},
Usage: "Resolve encountered symlinks",
},
&cli.BoolFlag{
Name: "IgnoreDotfiles",
Aliases: []string{"ignore-dotfiles"},
Usage: "Ignore any dot (hidden) files encountered",
Name: "IncludeDotfiles",
Aliases: []string{"include-dotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
},
&cli.StringFlag{
Name: "output",
Value: "./index.mf",
Value: "./.index.mf",
Aliases: []string{"o"},
Usage: "Specify output filename",
},
&cli.BoolFlag{
Name: "force",
Aliases: []string{"f"},
Usage: "Overwrite output file if it exists",
},
&cli.BoolFlag{
Name: "progress",
Aliases: []string{"P"},
Usage: "Show progress during enumeration and scanning",
},
&cli.StringFlag{
Name: "sign-key",
Aliases: []string{"s"},
Usage: "GPG key ID to sign the manifest with",
EnvVars: []string{"MFER_SIGN_KEY"},
},
),
},
{
Name: "check",
Usage: "Validate files using manifest file",
ArgsUsage: "[manifest file]",
Action: func(c *cli.Context) error {
if !c.Bool("quiet") {
mfa.setVerbosity(c)
mfa.printBanner()
}
mfa.setVerbosity(verbosity)
return mfa.checkManifestOperation(c)
},
Flags: append(commonFlags(),
&cli.StringFlag{
Name: "base",
Aliases: []string{"b"},
Value: ".",
Usage: "Base directory for resolving relative paths from manifest",
},
&cli.BoolFlag{
Name: "progress",
Aliases: []string{"P"},
Usage: "Show progress during checking",
},
&cli.BoolFlag{
Name: "no-extra-files",
Usage: "Fail if files exist in base directory that are not in manifest",
},
&cli.StringFlag{
Name: "require-signature",
Aliases: []string{"S"},
Usage: "Require manifest to be signed by the specified GPG key ID",
EnvVars: []string{"MFER_REQUIRE_SIGNATURE"},
},
),
},
{
Name: "freshen",
Usage: "Update manifest with changed, new, and removed files",
ArgsUsage: "[manifest file]",
Action: func(c *cli.Context) error {
mfa.setVerbosity(c)
mfa.printBanner()
return mfa.freshenManifestOperation(c)
},
Flags: append(commonFlags(),
&cli.StringFlag{
Name: "base",
Aliases: []string{"b"},
Value: ".",
Usage: "Base directory for resolving relative paths",
},
&cli.BoolFlag{
Name: "FollowSymLinks",
Aliases: []string{"follow-symlinks"},
Usage: "Resolve encountered symlinks",
},
&cli.BoolFlag{
Name: "IncludeDotfiles",
Aliases: []string{"include-dotfiles"},
Usage: "Include dot (hidden) files (excluded by default)",
},
&cli.BoolFlag{
Name: "progress",
Aliases: []string{"P"},
Usage: "Show progress during scanning and hashing",
},
&cli.StringFlag{
Name: "sign-key",
Aliases: []string{"s"},
Usage: "GPG key ID to sign the manifest with",
EnvVars: []string{"MFER_SIGN_KEY"},
},
),
},
{
Name: "version",
Usage: "Show version",
Action: func(c *cli.Context) error {
fmt.Printf("%s\n", mfa.VersionString())
_, _ = fmt.Fprintln(mfa.Stdout, mfa.VersionString())
return nil
},
},
{
Name: "list",
Aliases: []string{"ls"},
Usage: "List files in manifest",
ArgsUsage: "[manifest file]",
Action: func(c *cli.Context) error {
return mfa.listManifestOperation(c)
},
Flags: []cli.Flag{
&cli.BoolFlag{
Name: "long",
Aliases: []string{"l"},
Usage: "Show size and mtime",
},
&cli.BoolFlag{
Name: "print0",
Usage: "Separate entries with NUL character (for xargs -0)",
},
},
},
{
Name: "fetch",
Usage: "fetch manifest and referenced files",
Action: func(c *cli.Context) error {
if !c.Bool("quiet") {
mfa.setVerbosity(c)
mfa.printBanner()
}
mfa.setVerbosity(verbosity)
return mfa.fetchManifestOperation(c)
},
Flags: commonFlags(),
},
},
}
mfa.app.HideVersion = true
err := mfa.app.Run(os.Args)
err := mfa.app.Run(args)
if err != nil {
mfa.exitCode = 1
log.WithError(err).Debugf("exiting")

View File

@ -2,7 +2,11 @@ package log
import (
"fmt"
"io"
"os"
"path/filepath"
"runtime"
"sync"
"github.com/apex/log"
acli "github.com/apex/log/handlers/cli"
@ -10,8 +14,80 @@ import (
"github.com/pterm/pterm"
)
type Level = log.Level
// Level represents log severity levels.
// Lower values are more verbose.
type Level int
const (
// DebugLevel is for low-level tracing and structure inspection
DebugLevel Level = iota
// VerboseLevel is for detailed operational info (file listings, etc)
VerboseLevel
// InfoLevel is for operational summaries (default)
InfoLevel
// WarnLevel is for warnings
WarnLevel
// ErrorLevel is for errors
ErrorLevel
// FatalLevel is for fatal errors
FatalLevel
)
func (l Level) String() string {
switch l {
case DebugLevel:
return "debug"
case VerboseLevel:
return "verbose"
case InfoLevel:
return "info"
case WarnLevel:
return "warn"
case ErrorLevel:
return "error"
case FatalLevel:
return "fatal"
default:
return "unknown"
}
}
var (
// mu protects the output writers and level
mu sync.RWMutex
// stdout is the writer for progress output
stdout io.Writer = os.Stdout
// stderr is the writer for log output
stderr io.Writer = os.Stderr
// currentLevel is our log level (includes Verbose)
currentLevel Level = InfoLevel
)
// SetOutput configures the output writers for the log package.
// stdout is used for progress output, stderr is used for log messages.
func SetOutput(out, err io.Writer) {
mu.Lock()
defer mu.Unlock()
stdout = out
stderr = err
pterm.SetDefaultOutput(out)
}
// GetStdout returns the configured stdout writer.
func GetStdout() io.Writer {
mu.RLock()
defer mu.RUnlock()
return stdout
}
// GetStderr returns the configured stderr writer.
func GetStderr() io.Writer {
mu.RLock()
defer mu.RUnlock()
return stderr
}
// DisableStyling turns off colors and styling for terminal output.
func DisableStyling() {
pterm.DisableColor()
pterm.DisableStyling()
@ -23,67 +99,176 @@ func DisableStyling() {
pterm.Fatal.Prefix.Text = ""
}
// Init initializes the logger with the CLI handler and default log level.
func Init() {
log.SetHandler(acli.Default)
log.SetLevel(log.InfoLevel)
mu.RLock()
w := stderr
mu.RUnlock()
log.SetHandler(acli.New(w))
log.SetLevel(log.DebugLevel) // Let apex/log pass everything; we filter ourselves
}
// isEnabled returns true if messages at the given level should be logged.
func isEnabled(l Level) bool {
mu.RLock()
defer mu.RUnlock()
return l >= currentLevel
}
// Fatalf logs a formatted message at fatal level.
func Fatalf(format string, args ...interface{}) {
if isEnabled(FatalLevel) {
log.Fatalf(format, args...)
}
}
// Fatal logs a message at fatal level.
func Fatal(arg string) {
if isEnabled(FatalLevel) {
log.Fatal(arg)
}
}
// Errorf logs a formatted message at error level.
func Errorf(format string, args ...interface{}) {
if isEnabled(ErrorLevel) {
log.Errorf(format, args...)
}
}
// Error logs a message at error level.
func Error(arg string) {
if isEnabled(ErrorLevel) {
log.Error(arg)
}
}
// Warnf logs a formatted message at warn level.
func Warnf(format string, args ...interface{}) {
if isEnabled(WarnLevel) {
log.Warnf(format, args...)
}
}
// Warn logs a message at warn level.
func Warn(arg string) {
if isEnabled(WarnLevel) {
log.Warn(arg)
}
}
// Infof logs a formatted message at info level.
func Infof(format string, args ...interface{}) {
if isEnabled(InfoLevel) {
log.Infof(format, args...)
}
}
// Info logs a message at info level.
func Info(arg string) {
if isEnabled(InfoLevel) {
log.Info(arg)
}
}
// Verbosef logs a formatted message at verbose level.
func Verbosef(format string, args ...interface{}) {
if isEnabled(VerboseLevel) {
log.Infof(format, args...)
}
}
// Verbose logs a message at verbose level.
func Verbose(arg string) {
if isEnabled(VerboseLevel) {
log.Info(arg)
}
}
// Debugf logs a formatted message at debug level with caller location.
func Debugf(format string, args ...interface{}) {
if isEnabled(DebugLevel) {
DebugReal(fmt.Sprintf(format, args...), 2)
}
func Debug(arg string) {
DebugReal(arg, 2)
}
// Debug logs a message at debug level with caller location.
func Debug(arg string) {
if isEnabled(DebugLevel) {
DebugReal(arg, 2)
}
}
// DebugReal logs at debug level with caller info from the specified stack depth.
func DebugReal(arg string, cs int) {
if !isEnabled(DebugLevel) {
return
}
_, callerFile, callerLine, ok := runtime.Caller(cs)
if !ok {
return
}
tag := fmt.Sprintf("%s:%d: ", callerFile, callerLine)
tag := fmt.Sprintf("%s:%d: ", filepath.Base(callerFile), callerLine)
log.Debug(tag + arg)
}
// Dump logs a spew dump of the arguments at debug level.
func Dump(args ...interface{}) {
if isEnabled(DebugLevel) {
DebugReal(spew.Sdump(args...), 2)
}
}
// EnableDebugLogging sets the log level to debug.
func EnableDebugLogging() {
SetLevel(log.DebugLevel)
SetLevel(DebugLevel)
}
func VerbosityStepsToLogLevel(l int) log.Level {
// VerbosityStepsToLogLevel converts a -v count to a log level.
// 0 returns InfoLevel, 1 returns VerboseLevel, 2+ returns DebugLevel.
func VerbosityStepsToLogLevel(l int) Level {
switch l {
case 0:
return InfoLevel
case 1:
return log.WarnLevel
case 2:
return log.InfoLevel
case 3:
return log.DebugLevel
return VerboseLevel
default:
return DebugLevel
}
return log.ErrorLevel
}
// SetLevelFromVerbosity sets the log level based on -v flag count.
func SetLevelFromVerbosity(l int) {
SetLevel(VerbosityStepsToLogLevel(l))
}
func SetLevel(arg log.Level) {
log.SetLevel(arg)
// SetLevel sets the global log level.
func SetLevel(l Level) {
mu.Lock()
defer mu.Unlock()
currentLevel = l
}
func GetLogger() *log.Logger {
if logger, ok := log.Log.(*log.Logger); ok {
return logger
}
panic("unable to get logger")
}
func GetLevel() log.Level {
return GetLogger().Level
// GetLevel returns the current log level.
func GetLevel() Level {
mu.RLock()
defer mu.RUnlock()
return currentLevel
}
// WithError returns a log entry with the error attached.
func WithError(e error) *log.Entry {
return GetLogger().WithError(e)
return log.Log.WithError(e)
}
// Progressf prints a progress message that overwrites the current line.
// Use ProgressDone() when progress is complete to move to the next line.
func Progressf(format string, args ...interface{}) {
pterm.Printf("\r"+format, args...)
}
// ProgressDone clears the progress line when progress is complete.
func ProgressDone() {
// Clear the line with spaces and return to beginning
pterm.Print("\r\033[K")
}

251
mfer/builder.go Normal file
View File

@ -0,0 +1,251 @@
package mfer
import (
"crypto/sha256"
"errors"
"fmt"
"io"
"strings"
"sync"
"time"
"unicode/utf8"
"github.com/multiformats/go-multihash"
)
// ValidatePath checks that a file path conforms to manifest path invariants:
// - Must be valid UTF-8
// - Must use forward slashes only (no backslashes)
// - Must be relative (no leading /)
// - Must not contain ".." segments
// - Must not contain empty segments (no "//")
// - Must not be empty
func ValidatePath(p string) error {
if p == "" {
return errors.New("path cannot be empty")
}
if !utf8.ValidString(p) {
return fmt.Errorf("path %q is not valid UTF-8", p)
}
if strings.ContainsRune(p, '\\') {
return fmt.Errorf("path %q contains backslash; use forward slashes only", p)
}
if strings.HasPrefix(p, "/") {
return fmt.Errorf("path %q is absolute; must be relative", p)
}
for _, seg := range strings.Split(p, "/") {
if seg == "" {
return fmt.Errorf("path %q contains empty segment", p)
}
if seg == ".." {
return fmt.Errorf("path %q contains '..' segment", p)
}
}
return nil
}
// RelFilePath represents a relative file path within a manifest.
type RelFilePath string
// AbsFilePath represents an absolute file path on the filesystem.
type AbsFilePath string
// FileSize represents the size of a file in bytes.
type FileSize int64
// FileCount represents a count of files.
type FileCount int64
// ModTime represents a file's modification time.
type ModTime time.Time
// UnixSeconds represents seconds since Unix epoch.
type UnixSeconds int64
// UnixNanos represents the nanosecond component of a timestamp (0-999999999).
type UnixNanos int32
// Timestamp converts ModTime to a protobuf Timestamp.
func (m ModTime) Timestamp() *Timestamp {
t := time.Time(m)
return &Timestamp{
Seconds: t.Unix(),
Nanos: int32(t.Nanosecond()),
}
}
// Multihash represents a multihash-encoded file hash (typically SHA2-256).
type Multihash []byte
// FileHashProgress reports progress during file hashing.
type FileHashProgress struct {
BytesRead FileSize // Total bytes read so far for the current file
}
// Builder constructs a manifest by adding files one at a time.
type Builder struct {
mu sync.Mutex
files []*MFFilePath
createdAt time.Time
signingOptions *SigningOptions
}
// NewBuilder creates a new Builder.
func NewBuilder() *Builder {
return &Builder{
files: make([]*MFFilePath, 0),
createdAt: time.Now(),
}
}
// AddFile reads file content from reader, computes hashes, and adds to manifest.
// Progress updates are sent to the progress channel (if non-nil) without blocking.
// Returns the number of bytes read.
func (b *Builder) AddFile(
path RelFilePath,
size FileSize,
mtime ModTime,
reader io.Reader,
progress chan<- FileHashProgress,
) (FileSize, error) {
if err := ValidatePath(string(path)); err != nil {
return 0, err
}
// Create hash writer
h := sha256.New()
// Read file in chunks, updating hash and progress
var totalRead FileSize
buf := make([]byte, 64*1024) // 64KB chunks
for {
n, err := reader.Read(buf)
if n > 0 {
h.Write(buf[:n])
totalRead += FileSize(n)
sendFileHashProgress(progress, FileHashProgress{BytesRead: totalRead})
}
if err == io.EOF {
break
}
if err != nil {
return totalRead, err
}
}
// Verify actual bytes read matches declared size
if totalRead != size {
return totalRead, fmt.Errorf("size mismatch for %q: declared %d bytes but read %d bytes", path, size, totalRead)
}
// Encode hash as multihash (SHA2-256)
mh, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
return totalRead, err
}
// Create file entry
entry := &MFFilePath{
Path: string(path),
Size: int64(size),
Hashes: []*MFFileChecksum{
{MultiHash: mh},
},
Mtime: mtime.Timestamp(),
}
b.mu.Lock()
b.files = append(b.files, entry)
b.mu.Unlock()
return totalRead, nil
}
// sendFileHashProgress sends a progress update without blocking.
func sendFileHashProgress(ch chan<- FileHashProgress, p FileHashProgress) {
if ch == nil {
return
}
select {
case ch <- p:
default:
}
}
// FileCount returns the number of files added to the builder.
func (b *Builder) FileCount() int {
b.mu.Lock()
defer b.mu.Unlock()
return len(b.files)
}
// AddFileWithHash adds a file entry with a pre-computed hash.
// This is useful when the hash is already known (e.g., from an existing manifest).
// Returns an error if path is empty, size is negative, or hash is nil/empty.
func (b *Builder) AddFileWithHash(path RelFilePath, size FileSize, mtime ModTime, hash Multihash) error {
if err := ValidatePath(string(path)); err != nil {
return err
}
if size < 0 {
return errors.New("size cannot be negative")
}
if len(hash) == 0 {
return errors.New("hash cannot be nil or empty")
}
entry := &MFFilePath{
Path: string(path),
Size: int64(size),
Hashes: []*MFFileChecksum{
{MultiHash: hash},
},
Mtime: mtime.Timestamp(),
}
b.mu.Lock()
b.files = append(b.files, entry)
b.mu.Unlock()
return nil
}
// SetSigningOptions sets the GPG signing options for the manifest.
// If opts is non-nil, the manifest will be signed when Build() is called.
func (b *Builder) SetSigningOptions(opts *SigningOptions) {
b.mu.Lock()
defer b.mu.Unlock()
b.signingOptions = opts
}
// Build finalizes the manifest and writes it to the writer.
func (b *Builder) Build(w io.Writer) error {
b.mu.Lock()
defer b.mu.Unlock()
// Create inner manifest
inner := &MFFile{
Version: MFFile_VERSION_ONE,
CreatedAt: newTimestampFromTime(b.createdAt),
Files: b.files,
}
// Create a temporary manifest to use existing serialization
m := &manifest{
pbInner: inner,
signingOptions: b.signingOptions,
}
// Generate outer wrapper
if err := m.generateOuter(); err != nil {
return err
}
// Generate final output
if err := m.generate(); err != nil {
return err
}
// Write to output
_, err := w.Write(m.output.Bytes())
return err
}

127
mfer/builder_test.go Normal file
View File

@ -0,0 +1,127 @@
package mfer
import (
"bytes"
"strings"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestNewBuilder(t *testing.T) {
b := NewBuilder()
assert.NotNil(t, b)
assert.Equal(t, 0, b.FileCount())
}
func TestBuilderAddFile(t *testing.T) {
b := NewBuilder()
content := []byte("test content")
reader := bytes.NewReader(content)
bytesRead, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
assert.Equal(t, FileSize(len(content)), bytesRead)
assert.Equal(t, 1, b.FileCount())
}
func TestBuilderAddFileWithHash(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34) // SHA256 multihash is 34 bytes
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), hash)
require.NoError(t, err)
assert.Equal(t, 1, b.FileCount())
}
func TestBuilderAddFileWithHashValidation(t *testing.T) {
t.Run("empty path", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("", 100, ModTime(time.Now()), hash)
assert.Error(t, err)
assert.Contains(t, err.Error(), "path")
})
t.Run("negative size", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("test.txt", -1, ModTime(time.Now()), hash)
assert.Error(t, err)
assert.Contains(t, err.Error(), "size")
})
t.Run("nil hash", func(t *testing.T) {
b := NewBuilder()
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), nil)
assert.Error(t, err)
assert.Contains(t, err.Error(), "hash")
})
t.Run("empty hash", func(t *testing.T) {
b := NewBuilder()
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), []byte{})
assert.Error(t, err)
assert.Contains(t, err.Error(), "hash")
})
t.Run("valid inputs", func(t *testing.T) {
b := NewBuilder()
hash := make([]byte, 34)
err := b.AddFileWithHash("test.txt", 100, ModTime(time.Now()), hash)
assert.NoError(t, err)
assert.Equal(t, 1, b.FileCount())
})
}
func TestBuilderBuild(t *testing.T) {
b := NewBuilder()
content := []byte("test content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Should have magic bytes
assert.True(t, strings.HasPrefix(buf.String(), MAGIC))
}
func TestNewTimestampFromTimeExtremeDate(t *testing.T) {
// Regression test: newTimestampFromTime used UnixNano() which panics
// for dates outside ~1678-2262. Now uses Nanosecond() which is safe.
tests := []struct {
name string
time time.Time
}{
{"zero time", time.Time{}},
{"year 1000", time.Date(1000, 1, 1, 0, 0, 0, 0, time.UTC)},
{"year 3000", time.Date(3000, 1, 1, 0, 0, 0, 123456789, time.UTC)},
{"unix epoch", time.Unix(0, 0)},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Should not panic
ts := newTimestampFromTime(tt.time)
assert.Equal(t, tt.time.Unix(), ts.Seconds)
assert.Equal(t, int32(tt.time.Nanosecond()), ts.Nanos)
})
}
}
func TestBuilderBuildEmpty(t *testing.T) {
b := NewBuilder()
var buf bytes.Buffer
err := b.Build(&buf)
require.NoError(t, err)
// Should still produce valid manifest with 0 files
assert.True(t, strings.HasPrefix(buf.String(), MAGIC))
}

327
mfer/checker.go Normal file
View File

@ -0,0 +1,327 @@
package mfer
import (
"bytes"
"context"
"crypto/sha256"
"errors"
"io"
"os"
"path/filepath"
"time"
"github.com/multiformats/go-multihash"
"github.com/spf13/afero"
)
// Result represents the outcome of checking a single file.
type Result struct {
Path RelFilePath // Relative path from manifest
Status Status // Verification result status
Message string // Human-readable description of the result
}
// Status represents the verification status of a file.
type Status int
const (
StatusOK Status = iota // File matches manifest (size and hash verified)
StatusMissing // File not found on disk
StatusSizeMismatch // File size differs from manifest
StatusHashMismatch // File hash differs from manifest
StatusExtra // File exists on disk but not in manifest
StatusError // Error occurred during verification
)
func (s Status) String() string {
switch s {
case StatusOK:
return "OK"
case StatusMissing:
return "MISSING"
case StatusSizeMismatch:
return "SIZE_MISMATCH"
case StatusHashMismatch:
return "HASH_MISMATCH"
case StatusExtra:
return "EXTRA"
case StatusError:
return "ERROR"
default:
return "UNKNOWN"
}
}
// CheckStatus contains progress information for the check operation.
type CheckStatus struct {
TotalFiles FileCount // Total number of files in manifest
CheckedFiles FileCount // Number of files checked so far
TotalBytes FileSize // Total bytes to verify (sum of all file sizes)
CheckedBytes FileSize // Bytes verified so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
Failures FileCount // Number of verification failures encountered
}
// Checker verifies files against a manifest.
type Checker struct {
basePath AbsFilePath
files []*MFFilePath
fs afero.Fs
// manifestPaths is a set of paths in the manifest for quick lookup
manifestPaths map[RelFilePath]struct{}
// signature info from the manifest
signature []byte
signer []byte
signingPubKey []byte
}
// NewChecker creates a new Checker for the given manifest, base path, and filesystem.
// The basePath is the directory relative to which manifest paths are resolved.
// If fs is nil, the real filesystem (OsFs) is used.
func NewChecker(manifestPath string, basePath string, fs afero.Fs) (*Checker, error) {
if fs == nil {
fs = afero.NewOsFs()
}
m, err := NewManifestFromFile(fs, manifestPath)
if err != nil {
return nil, err
}
abs, err := filepath.Abs(basePath)
if err != nil {
return nil, err
}
files := m.Files()
manifestPaths := make(map[RelFilePath]struct{}, len(files))
for _, f := range files {
manifestPaths[RelFilePath(f.Path)] = struct{}{}
}
return &Checker{
basePath: AbsFilePath(abs),
files: files,
fs: fs,
manifestPaths: manifestPaths,
signature: m.pbOuter.Signature,
signer: m.pbOuter.Signer,
signingPubKey: m.pbOuter.SigningPubKey,
}, nil
}
// FileCount returns the number of files in the manifest.
func (c *Checker) FileCount() FileCount {
return FileCount(len(c.files))
}
// TotalBytes returns the total size of all files in the manifest.
func (c *Checker) TotalBytes() FileSize {
var total FileSize
for _, f := range c.files {
total += FileSize(f.Size)
}
return total
}
// IsSigned returns true if the manifest has a signature.
func (c *Checker) IsSigned() bool {
return len(c.signature) > 0
}
// Signer returns the signer fingerprint if the manifest is signed, nil otherwise.
func (c *Checker) Signer() []byte {
return c.signer
}
// SigningPubKey returns the signing public key if the manifest is signed, nil otherwise.
func (c *Checker) SigningPubKey() []byte {
return c.signingPubKey
}
// ExtractEmbeddedSigningKeyFP imports the manifest's embedded public key into a
// temporary keyring and extracts its fingerprint. This validates the key and
// returns its actual fingerprint from the key material itself.
func (c *Checker) ExtractEmbeddedSigningKeyFP() (string, error) {
if len(c.signingPubKey) == 0 {
return "", errors.New("manifest has no signing public key")
}
return gpgExtractPubKeyFingerprint(c.signingPubKey)
}
// Check verifies all files against the manifest.
// Results are sent to the results channel as files are checked.
// Progress updates are sent to the progress channel approximately once per second.
// Both channels are closed when the method returns.
func (c *Checker) Check(ctx context.Context, results chan<- Result, progress chan<- CheckStatus) error {
if results != nil {
defer close(results)
}
if progress != nil {
defer close(progress)
}
totalFiles := FileCount(len(c.files))
totalBytes := c.TotalBytes()
var checkedFiles FileCount
var checkedBytes FileSize
var failures FileCount
startTime := time.Now()
for _, entry := range c.files {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
result := c.checkFile(entry, &checkedBytes)
if result.Status != StatusOK {
failures++
}
checkedFiles++
if results != nil {
results <- result
}
// Send progress with rate and ETA calculation
if progress != nil {
elapsed := time.Since(startTime)
var bytesPerSec float64
var eta time.Duration
if elapsed > 0 && checkedBytes > 0 {
bytesPerSec = float64(checkedBytes) / elapsed.Seconds()
remainingBytes := totalBytes - checkedBytes
if bytesPerSec > 0 {
eta = time.Duration(float64(remainingBytes)/bytesPerSec) * time.Second
}
}
sendCheckStatus(progress, CheckStatus{
TotalFiles: totalFiles,
CheckedFiles: checkedFiles,
TotalBytes: totalBytes,
CheckedBytes: checkedBytes,
BytesPerSec: bytesPerSec,
ETA: eta,
Failures: failures,
})
}
}
return nil
}
func (c *Checker) checkFile(entry *MFFilePath, checkedBytes *FileSize) Result {
absPath := filepath.Join(string(c.basePath), entry.Path)
relPath := RelFilePath(entry.Path)
// Check if file exists
info, err := c.fs.Stat(absPath)
if err != nil {
if errors.Is(err, os.ErrNotExist) || errors.Is(err, afero.ErrFileNotFound) {
return Result{Path: relPath, Status: StatusMissing, Message: "file not found"}
}
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
// Check size
if info.Size() != entry.Size {
*checkedBytes += FileSize(info.Size())
return Result{
Path: relPath,
Status: StatusSizeMismatch,
Message: "size mismatch",
}
}
// Open and hash file
f, err := c.fs.Open(absPath)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
defer func() { _ = f.Close() }()
h := sha256.New()
n, err := io.Copy(h, f)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
*checkedBytes += FileSize(n)
// Encode as multihash and compare
computed, err := multihash.Encode(h.Sum(nil), multihash.SHA2_256)
if err != nil {
return Result{Path: relPath, Status: StatusError, Message: err.Error()}
}
// Check against all hashes in manifest (at least one must match)
for _, hash := range entry.Hashes {
if bytes.Equal(computed, hash.MultiHash) {
return Result{Path: relPath, Status: StatusOK}
}
}
return Result{Path: relPath, Status: StatusHashMismatch, Message: "hash mismatch"}
}
// FindExtraFiles walks the filesystem and reports files not in the manifest.
// Results are sent to the results channel. The channel is closed when done.
func (c *Checker) FindExtraFiles(ctx context.Context, results chan<- Result) error {
if results != nil {
defer close(results)
}
return afero.Walk(c.fs, string(c.basePath), func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Skip directories
if info.IsDir() {
return nil
}
// Get relative path
rel, err := filepath.Rel(string(c.basePath), path)
if err != nil {
return err
}
relPath := RelFilePath(rel)
// Check if path is in manifest
if _, exists := c.manifestPaths[relPath]; !exists {
if results != nil {
results <- Result{
Path: relPath,
Status: StatusExtra,
Message: "not in manifest",
}
}
}
return nil
})
}
// sendCheckStatus sends a status update without blocking.
func sendCheckStatus(ch chan<- CheckStatus, status CheckStatus) {
if ch == nil {
return
}
select {
case ch <- status:
default:
}
}

437
mfer/checker_test.go Normal file
View File

@ -0,0 +1,437 @@
package mfer
import (
"bytes"
"context"
"testing"
"time"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestStatusString(t *testing.T) {
tests := []struct {
status Status
expected string
}{
{StatusOK, "OK"},
{StatusMissing, "MISSING"},
{StatusSizeMismatch, "SIZE_MISMATCH"},
{StatusHashMismatch, "HASH_MISMATCH"},
{StatusExtra, "EXTRA"},
{StatusError, "ERROR"},
{Status(99), "UNKNOWN"},
}
for _, tt := range tests {
t.Run(tt.expected, func(t *testing.T) {
assert.Equal(t, tt.expected, tt.status.String())
})
}
}
// createTestManifest creates a manifest file in the filesystem with the given files.
func createTestManifest(t *testing.T, fs afero.Fs, manifestPath string, files map[string][]byte) {
t.Helper()
builder := NewBuilder()
for path, content := range files {
reader := bytes.NewReader(content)
_, err := builder.AddFile(RelFilePath(path), FileSize(len(content)), ModTime(time.Now()), reader, nil)
require.NoError(t, err)
}
var buf bytes.Buffer
require.NoError(t, builder.Build(&buf))
require.NoError(t, afero.WriteFile(fs, manifestPath, buf.Bytes(), 0o644))
}
// createFilesOnDisk creates the given files on the filesystem.
func createFilesOnDisk(t *testing.T, fs afero.Fs, basePath string, files map[string][]byte) {
t.Helper()
for path, content := range files {
fullPath := basePath + "/" + path
require.NoError(t, fs.MkdirAll(basePath, 0o755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0o644))
}
}
func TestNewChecker(t *testing.T) {
t.Run("valid manifest", func(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file1.txt": []byte("hello"),
"file2.txt": []byte("world"),
}
createTestManifest(t, fs, "/manifest.mf", files)
chk, err := NewChecker("/manifest.mf", "/", fs)
require.NoError(t, err)
assert.NotNil(t, chk)
assert.Equal(t, FileCount(2), chk.FileCount())
})
t.Run("missing manifest", func(t *testing.T) {
fs := afero.NewMemMapFs()
_, err := NewChecker("/nonexistent.mf", "/", fs)
assert.Error(t, err)
})
t.Run("invalid manifest", func(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/bad.mf", []byte("not a manifest"), 0o644))
_, err := NewChecker("/bad.mf", "/", fs)
assert.Error(t, err)
})
}
func TestCheckerFileCountAndTotalBytes(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"small.txt": []byte("hi"),
"medium.txt": []byte("hello world"),
"large.txt": bytes.Repeat([]byte("x"), 1000),
}
createTestManifest(t, fs, "/manifest.mf", files)
chk, err := NewChecker("/manifest.mf", "/", fs)
require.NoError(t, err)
assert.Equal(t, FileCount(3), chk.FileCount())
assert.Equal(t, FileSize(2+11+1000), chk.TotalBytes())
}
func TestCheckAllFilesOK(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file1.txt": []byte("content one"),
"file2.txt": []byte("content two"),
}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
var resultList []Result
for r := range results {
resultList = append(resultList, r)
}
assert.Len(t, resultList, 2)
for _, r := range resultList {
assert.Equal(t, StatusOK, r.Status, "file %s should be OK", r.Path)
}
}
func TestCheckMissingFile(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"exists.txt": []byte("I exist"),
"missing.txt": []byte("I don't exist on disk"),
}
createTestManifest(t, fs, "/manifest.mf", files)
// Only create one file
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"exists.txt": []byte("I exist"),
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
var okCount, missingCount int
for r := range results {
switch r.Status {
case StatusOK:
okCount++
case StatusMissing:
missingCount++
assert.Equal(t, RelFilePath("missing.txt"), r.Path)
}
}
assert.Equal(t, 1, okCount)
assert.Equal(t, 1, missingCount)
}
func TestCheckSizeMismatch(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file.txt": []byte("original content"),
}
createTestManifest(t, fs, "/manifest.mf", files)
// Create file with different size
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"file.txt": []byte("short"),
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
r := <-results
assert.Equal(t, StatusSizeMismatch, r.Status)
assert.Equal(t, RelFilePath("file.txt"), r.Path)
}
func TestCheckHashMismatch(t *testing.T) {
fs := afero.NewMemMapFs()
originalContent := []byte("original content")
files := map[string][]byte{
"file.txt": originalContent,
}
createTestManifest(t, fs, "/manifest.mf", files)
// Create file with same size but different content
differentContent := []byte("different contnt") // same length (16 bytes) but different
require.Equal(t, len(originalContent), len(differentContent), "test requires same length")
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"file.txt": differentContent,
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
r := <-results
assert.Equal(t, StatusHashMismatch, r.Status)
assert.Equal(t, RelFilePath("file.txt"), r.Path)
}
func TestCheckWithProgress(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"file1.txt": bytes.Repeat([]byte("a"), 100),
"file2.txt": bytes.Repeat([]byte("b"), 200),
}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
progress := make(chan CheckStatus, 10)
err = chk.Check(context.Background(), results, progress)
require.NoError(t, err)
// Drain results
for range results {
}
// Check progress was sent
var progressUpdates []CheckStatus
for p := range progress {
progressUpdates = append(progressUpdates, p)
}
assert.NotEmpty(t, progressUpdates)
// Final progress should show all files checked
final := progressUpdates[len(progressUpdates)-1]
assert.Equal(t, FileCount(2), final.TotalFiles)
assert.Equal(t, FileCount(2), final.CheckedFiles)
assert.Equal(t, FileSize(300), final.TotalBytes)
assert.Equal(t, FileSize(300), final.CheckedBytes)
assert.Equal(t, FileCount(0), final.Failures)
}
func TestCheckContextCancellation(t *testing.T) {
fs := afero.NewMemMapFs()
// Create many files to ensure we have time to cancel
files := make(map[string][]byte)
for i := 0; i < 100; i++ {
files[string(rune('a'+i%26))+".txt"] = bytes.Repeat([]byte("x"), 1000)
}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
ctx, cancel := context.WithCancel(context.Background())
cancel() // Cancel immediately
results := make(chan Result, 200)
err = chk.Check(ctx, results, nil)
assert.ErrorIs(t, err, context.Canceled)
}
func TestFindExtraFiles(t *testing.T) {
fs := afero.NewMemMapFs()
// Manifest only contains file1
manifestFiles := map[string][]byte{
"file1.txt": []byte("in manifest"),
}
createTestManifest(t, fs, "/manifest.mf", manifestFiles)
// Disk has file1 and file2
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"file1.txt": []byte("in manifest"),
"file2.txt": []byte("extra file"),
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.FindExtraFiles(context.Background(), results)
require.NoError(t, err)
var extras []Result
for r := range results {
extras = append(extras, r)
}
assert.Len(t, extras, 1)
assert.Equal(t, RelFilePath("file2.txt"), extras[0].Path)
assert.Equal(t, StatusExtra, extras[0].Status)
assert.Equal(t, "not in manifest", extras[0].Message)
}
func TestFindExtraFilesContextCancellation(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{"file.txt": []byte("data")}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
ctx, cancel := context.WithCancel(context.Background())
cancel() // Cancel immediately
results := make(chan Result, 10)
err = chk.FindExtraFiles(ctx, results)
assert.ErrorIs(t, err, context.Canceled)
}
func TestCheckNilChannels(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{"file.txt": []byte("data")}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
// Should not panic with nil channels
err = chk.Check(context.Background(), nil, nil)
assert.NoError(t, err)
}
func TestFindExtraFilesNilChannel(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{"file.txt": []byte("data")}
createTestManifest(t, fs, "/manifest.mf", files)
createFilesOnDisk(t, fs, "/data", files)
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
// Should not panic with nil channel
err = chk.FindExtraFiles(context.Background(), nil)
assert.NoError(t, err)
}
func TestCheckSubdirectories(t *testing.T) {
fs := afero.NewMemMapFs()
files := map[string][]byte{
"dir1/file1.txt": []byte("content1"),
"dir1/dir2/file2.txt": []byte("content2"),
"dir1/dir2/dir3/deep.txt": []byte("deep content"),
}
createTestManifest(t, fs, "/manifest.mf", files)
// Create files with full directory structure
for path, content := range files {
fullPath := "/data/" + path
require.NoError(t, fs.MkdirAll("/data/dir1/dir2/dir3", 0o755))
require.NoError(t, afero.WriteFile(fs, fullPath, content, 0o644))
}
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
var okCount int
for r := range results {
assert.Equal(t, StatusOK, r.Status, "file %s should be OK", r.Path)
okCount++
}
assert.Equal(t, 3, okCount)
}
func TestCheckMissingFileDetectedWithoutFallback(t *testing.T) {
// Regression test: errors.Is(err, errors.New("...")) never matches because
// errors.New creates a new value each time. The fix uses os.ErrNotExist instead.
fs := afero.NewMemMapFs()
files := map[string][]byte{
"exists.txt": []byte("here"),
"missing.txt": []byte("not on disk"),
}
createTestManifest(t, fs, "/manifest.mf", files)
// Only create one file on disk
createFilesOnDisk(t, fs, "/data", map[string][]byte{
"exists.txt": []byte("here"),
})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
statusCounts := map[Status]int{}
for r := range results {
statusCounts[r.Status]++
if r.Status == StatusMissing {
assert.Equal(t, RelFilePath("missing.txt"), r.Path)
}
}
assert.Equal(t, 1, statusCounts[StatusOK], "one file should be OK")
assert.Equal(t, 1, statusCounts[StatusMissing], "one file should be MISSING")
assert.Equal(t, 0, statusCounts[StatusError], "no files should be ERROR")
}
func TestCheckEmptyManifest(t *testing.T) {
fs := afero.NewMemMapFs()
// Create manifest with no files
createTestManifest(t, fs, "/manifest.mf", map[string][]byte{})
chk, err := NewChecker("/manifest.mf", "/data", fs)
require.NoError(t, err)
assert.Equal(t, FileCount(0), chk.FileCount())
assert.Equal(t, FileSize(0), chk.TotalBytes())
results := make(chan Result, 10)
err = chk.Check(context.Background(), results, nil)
require.NoError(t, err)
var count int
for range results {
count++
}
assert.Equal(t, 0, count)
}

11
mfer/constants.go Normal file
View File

@ -0,0 +1,11 @@
package mfer
const (
Version = "0.1.0"
ReleaseDate = "2025-12-17"
// MaxDecompressedSize is the maximum allowed size of decompressed manifest
// data (256 MB). This prevents decompression bombs from consuming excessive
// memory.
MaxDecompressedSize int64 = 256 * 1024 * 1024
)

View File

@ -2,45 +2,118 @@ package mfer
import (
"bytes"
"compress/gzip"
"crypto/sha256"
"errors"
"fmt"
"io"
"git.eeqj.de/sneak/mfer/internal/bork"
"git.eeqj.de/sneak/mfer/internal/log"
"github.com/google/uuid"
"github.com/klauspost/compress/zstd"
"github.com/spf13/afero"
"google.golang.org/protobuf/proto"
"sneak.berlin/go/mfer/internal/bork"
"sneak.berlin/go/mfer/internal/log"
)
func (m *manifest) validateProtoOuter() error {
// validateUUID checks that the byte slice is a valid UUID (16 bytes, parseable).
func validateUUID(data []byte) error {
if len(data) != 16 {
return errors.New("invalid UUID length")
}
// Try to parse as UUID to validate format
_, err := uuid.FromBytes(data)
if err != nil {
return errors.New("invalid UUID format")
}
return nil
}
func (m *manifest) deserializeInner() error {
if m.pbOuter.Version != MFFileOuter_VERSION_ONE {
return errors.New("unknown version")
}
if m.pbOuter.CompressionType != MFFileOuter_COMPRESSION_GZIP {
if m.pbOuter.CompressionType != MFFileOuter_COMPRESSION_ZSTD {
return errors.New("unknown compression type")
}
// Validate outer UUID before any decompression
if err := validateUUID(m.pbOuter.Uuid); err != nil {
return errors.New("outer UUID invalid: " + err.Error())
}
// Verify hash of compressed data before decompression
h := sha256.New()
if _, err := h.Write(m.pbOuter.InnerMessage); err != nil {
return err
}
sha256Hash := h.Sum(nil)
if !bytes.Equal(sha256Hash, m.pbOuter.Sha256) {
return errors.New("compressed data hash mismatch")
}
// Verify signature if present
if len(m.pbOuter.Signature) > 0 {
if len(m.pbOuter.SigningPubKey) == 0 {
return errors.New("signature present but no public key")
}
sigString, err := m.signatureString()
if err != nil {
return fmt.Errorf("failed to generate signature string for verification: %w", err)
}
if err := gpgVerify([]byte(sigString), m.pbOuter.Signature, m.pbOuter.SigningPubKey); err != nil {
return fmt.Errorf("signature verification failed: %w", err)
}
log.Infof("signature verified successfully")
}
bb := bytes.NewBuffer(m.pbOuter.InnerMessage)
gzr, err := gzip.NewReader(bb)
zr, err := zstd.NewReader(bb)
if err != nil {
return err
}
defer zr.Close()
dat, err := io.ReadAll(gzr)
defer gzr.Close()
// Limit decompressed size to prevent decompression bombs.
// Use declared size + 1 byte to detect overflow, capped at MaxDecompressedSize.
maxSize := MaxDecompressedSize
if m.pbOuter.Size > 0 && m.pbOuter.Size < int64(maxSize) {
maxSize = int64(m.pbOuter.Size) + 1
}
limitedReader := io.LimitReader(zr, maxSize)
dat, err := io.ReadAll(limitedReader)
if err != nil {
return err
}
if int64(len(dat)) >= MaxDecompressedSize {
return fmt.Errorf("decompressed data exceeds maximum allowed size of %d bytes", MaxDecompressedSize)
}
isize := len(dat)
if int64(isize) != m.pbOuter.Size {
log.Debugf("truncated data, got %d expected %d", isize, m.pbOuter.Size)
return bork.ErrFileTruncated
}
log.Debugf("inner data size is %d", isize)
log.Dump(dat)
log.Dump(m.pbOuter.Sha256)
// Deserialize inner message
m.pbInner = new(MFFile)
if err := proto.Unmarshal(dat, m.pbInner); err != nil {
return err
}
// Validate inner UUID
if err := validateUUID(m.pbInner.Uuid); err != nil {
return errors.New("inner UUID invalid: " + err.Error())
}
// Verify UUIDs match
if !bytes.Equal(m.pbOuter.Uuid, m.pbInner.Uuid) {
return errors.New("outer and inner UUID mismatch")
}
log.Infof("loaded manifest with %d files", len(m.pbInner.Files))
return nil
}
@ -54,8 +127,9 @@ func validateMagic(dat []byte) bool {
return bytes.Equal(got, expected)
}
func NewFromProto(input io.Reader) (*manifest, error) {
m := New()
// NewManifestFromReader reads a manifest from an io.Reader.
func NewManifestFromReader(input io.Reader) (*manifest, error) {
m := &manifest{}
dat, err := io.ReadAll(input)
if err != nil {
return nil, err
@ -69,21 +143,30 @@ func NewFromProto(input io.Reader) (*manifest, error) {
bb := bytes.NewBuffer(dat[ml:])
dat = bb.Bytes()
log.Dump(dat)
// deserialize:
// deserialize outer:
m.pbOuter = new(MFFileOuter)
err = proto.Unmarshal(dat, m.pbOuter)
if err != nil {
if err := proto.Unmarshal(dat, m.pbOuter); err != nil {
return nil, err
}
ve := m.validateProtoOuter()
if ve != nil {
return nil, ve
// deserialize inner:
if err := m.deserializeInner(); err != nil {
return nil, err
}
// FIXME TODO deserialize inner
return m, nil
}
// NewManifestFromFile reads a manifest from a file path using the given filesystem.
// If fs is nil, the real filesystem (OsFs) is used.
func NewManifestFromFile(fs afero.Fs, path string) (*manifest, error) {
if fs == nil {
fs = afero.NewOsFs()
}
f, err := fs.Open(path)
if err != nil {
return nil, err
}
defer func() { _ = f.Close() }()
return NewManifestFromReader(f)
}

View File

@ -1,42 +0,0 @@
package mfer
import (
"bytes"
"testing"
"git.eeqj.de/sneak/mfer/internal/log"
"github.com/stretchr/testify/assert"
)
func TestAPIExample(t *testing.T) {
// read from filesystem
m, err := NewFromFS(&ManifestScanOptions{
IgnoreDotfiles: true,
}, big)
assert.Nil(t, err)
assert.NotNil(t, m)
// scan for files
m.Scan()
// serialize
var buf bytes.Buffer
m.WriteTo(&buf)
// show serialized
log.Dump(buf.Bytes())
// do it again
var buf2 bytes.Buffer
m.WriteTo(&buf2)
// should be same!
assert.True(t, bytes.Equal(buf.Bytes(), buf2.Bytes()))
// deserialize
m2, err := NewFromProto(&buf)
assert.Nil(t, err)
assert.NotNil(t, m2)
log.Dump(m2)
}

212
mfer/gpg.go Normal file
View File

@ -0,0 +1,212 @@
package mfer
import (
"bytes"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
)
// GPGKeyID represents a GPG key identifier (fingerprint or key ID).
type GPGKeyID string
// SigningOptions contains options for GPG signing.
type SigningOptions struct {
KeyID GPGKeyID
}
// gpgSign creates a detached signature of the data using the specified key.
// Returns the armored detached signature.
func gpgSign(data []byte, keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg",
"--detach-sign",
"--armor",
"--local-user", string(keyID),
)
cmd.Stdin = bytes.NewReader(data)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg sign failed: %w: %s", err, stderr.String())
}
return stdout.Bytes(), nil
}
// gpgExportPublicKey exports the public key for the specified key ID.
// Returns the armored public key.
func gpgExportPublicKey(keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg",
"--export",
"--armor",
string(keyID),
)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg export failed: %w: %s", err, stderr.String())
}
if stdout.Len() == 0 {
return nil, fmt.Errorf("gpg key not found: %s", keyID)
}
return stdout.Bytes(), nil
}
// gpgGetKeyFingerprint gets the full fingerprint for a key ID.
func gpgGetKeyFingerprint(keyID GPGKeyID) ([]byte, error) {
cmd := exec.Command("gpg",
"--with-colons",
"--fingerprint",
string(keyID),
)
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return nil, fmt.Errorf("gpg fingerprint lookup failed: %w: %s", err, stderr.String())
}
// Parse the colon-delimited output to find the fingerprint
lines := strings.Split(stdout.String(), "\n")
for _, line := range lines {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
return []byte(fields[9]), nil
}
}
return nil, fmt.Errorf("fingerprint not found for key: %s", keyID)
}
// gpgExtractPubKeyFingerprint imports a public key into a temporary keyring
// and extracts its fingerprint. This verifies the key is valid and returns
// the actual fingerprint from the key material.
func gpgExtractPubKeyFingerprint(pubKey []byte) (string, error) {
// Create temporary directory for GPG operations
tmpDir, err := os.MkdirTemp("", "mfer-gpg-fingerprint-*")
if err != nil {
return "", fmt.Errorf("failed to create temp dir: %w", err)
}
defer os.RemoveAll(tmpDir)
// Set restrictive permissions
if err := os.Chmod(tmpDir, 0o700); err != nil {
return "", fmt.Errorf("failed to set temp dir permissions: %w", err)
}
// Write public key to temp file
pubKeyFile := filepath.Join(tmpDir, "pubkey.asc")
if err := os.WriteFile(pubKeyFile, pubKey, 0o600); err != nil {
return "", fmt.Errorf("failed to write public key: %w", err)
}
// Import the public key into the temporary keyring
importCmd := exec.Command("gpg",
"--homedir", tmpDir,
"--import",
pubKeyFile,
)
var importStderr bytes.Buffer
importCmd.Stderr = &importStderr
if err := importCmd.Run(); err != nil {
return "", fmt.Errorf("failed to import public key: %w: %s", err, importStderr.String())
}
// List keys to get fingerprint
listCmd := exec.Command("gpg",
"--homedir", tmpDir,
"--with-colons",
"--fingerprint",
)
var listStdout, listStderr bytes.Buffer
listCmd.Stdout = &listStdout
listCmd.Stderr = &listStderr
if err := listCmd.Run(); err != nil {
return "", fmt.Errorf("failed to list keys: %w: %s", err, listStderr.String())
}
// Parse the colon-delimited output to find the fingerprint
lines := strings.Split(listStdout.String(), "\n")
for _, line := range lines {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
return fields[9], nil
}
}
return "", fmt.Errorf("fingerprint not found in imported key")
}
// gpgVerify verifies a detached signature against data using the provided public key.
// It creates a temporary keyring to import the public key for verification.
func gpgVerify(data, signature, pubKey []byte) error {
// Create temporary directory for GPG operations
tmpDir, err := os.MkdirTemp("", "mfer-gpg-verify-*")
if err != nil {
return fmt.Errorf("failed to create temp dir: %w", err)
}
defer os.RemoveAll(tmpDir)
// Set restrictive permissions
if err := os.Chmod(tmpDir, 0o700); err != nil {
return fmt.Errorf("failed to set temp dir permissions: %w", err)
}
// Write public key to temp file
pubKeyFile := filepath.Join(tmpDir, "pubkey.asc")
if err := os.WriteFile(pubKeyFile, pubKey, 0o600); err != nil {
return fmt.Errorf("failed to write public key: %w", err)
}
// Write signature to temp file
sigFile := filepath.Join(tmpDir, "signature.asc")
if err := os.WriteFile(sigFile, signature, 0o600); err != nil {
return fmt.Errorf("failed to write signature: %w", err)
}
// Write data to temp file
dataFile := filepath.Join(tmpDir, "data")
if err := os.WriteFile(dataFile, data, 0o600); err != nil {
return fmt.Errorf("failed to write data: %w", err)
}
// Import the public key into the temporary keyring
importCmd := exec.Command("gpg",
"--homedir", tmpDir,
"--import",
pubKeyFile,
)
var importStderr bytes.Buffer
importCmd.Stderr = &importStderr
if err := importCmd.Run(); err != nil {
return fmt.Errorf("failed to import public key: %w: %s", err, importStderr.String())
}
// Verify the signature
verifyCmd := exec.Command("gpg",
"--homedir", tmpDir,
"--verify",
sigFile,
dataFile,
)
var verifyStderr bytes.Buffer
verifyCmd.Stderr = &verifyStderr
if err := verifyCmd.Run(); err != nil {
return fmt.Errorf("signature verification failed: %w: %s", err, verifyStderr.String())
}
return nil
}

347
mfer/gpg_test.go Normal file
View File

@ -0,0 +1,347 @@
package mfer
import (
"bytes"
"context"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// testGPGEnv sets up a temporary GPG home directory with a test key.
// Returns the key ID and a cleanup function.
func testGPGEnv(t *testing.T) (GPGKeyID, func()) {
t.Helper()
// Check if gpg is installed
if _, err := exec.LookPath("gpg"); err != nil {
t.Skip("gpg not installed, skipping signing test")
return "", func() {}
}
// Create temporary GPG home directory
gpgHome, err := os.MkdirTemp("", "mfer-gpg-test-*")
require.NoError(t, err)
// Set restrictive permissions on GPG home
require.NoError(t, os.Chmod(gpgHome, 0o700))
// Save original GNUPGHOME and set new one
origGPGHome := os.Getenv("GNUPGHOME")
os.Setenv("GNUPGHOME", gpgHome)
cleanup := func() {
if origGPGHome == "" {
os.Unsetenv("GNUPGHOME")
} else {
os.Setenv("GNUPGHOME", origGPGHome)
}
os.RemoveAll(gpgHome)
}
// Generate a test key with no passphrase
keyParams := `%no-protection
Key-Type: RSA
Key-Length: 2048
Name-Real: MFER Test Key
Name-Email: test@mfer.test
Expire-Date: 0
%commit
`
paramsFile := filepath.Join(gpgHome, "key-params")
require.NoError(t, os.WriteFile(paramsFile, []byte(keyParams), 0o600))
cmd := exec.Command("gpg", "--batch", "--gen-key", paramsFile)
cmd.Env = append(os.Environ(), "GNUPGHOME="+gpgHome)
output, err := cmd.CombinedOutput()
if err != nil {
cleanup()
t.Skipf("failed to generate test GPG key: %v: %s", err, output)
return "", func() {}
}
// Get the key fingerprint
cmd = exec.Command("gpg", "--list-keys", "--with-colons", "test@mfer.test")
cmd.Env = append(os.Environ(), "GNUPGHOME="+gpgHome)
output, err = cmd.Output()
if err != nil {
cleanup()
t.Fatalf("failed to list test key: %v", err)
}
// Parse fingerprint from output
var keyID string
for _, line := range strings.Split(string(output), "\n") {
fields := strings.Split(line, ":")
if len(fields) >= 10 && fields[0] == "fpr" {
keyID = fields[9]
break
}
}
if keyID == "" {
cleanup()
t.Fatal("failed to find test key fingerprint")
}
return GPGKeyID(keyID), cleanup
}
func TestGPGSign(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
assert.NotEmpty(t, sig)
assert.Contains(t, string(sig), "-----BEGIN PGP SIGNATURE-----")
assert.Contains(t, string(sig), "-----END PGP SIGNATURE-----")
}
func TestGPGExportPublicKey(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
assert.NotEmpty(t, pubKey)
assert.Contains(t, string(pubKey), "-----BEGIN PGP PUBLIC KEY BLOCK-----")
assert.Contains(t, string(pubKey), "-----END PGP PUBLIC KEY BLOCK-----")
}
func TestGPGGetKeyFingerprint(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
fingerprint, err := gpgGetKeyFingerprint(keyID)
require.NoError(t, err)
assert.NotEmpty(t, fingerprint)
// The fingerprint should be 40 hex chars
assert.Len(t, fingerprint, 40, "fingerprint should be 40 hex chars")
}
func TestGPGSignInvalidKey(t *testing.T) {
// Set up test environment (we need GNUPGHOME set)
_, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data")
_, err := gpgSign(data, GPGKeyID("NONEXISTENT_KEY_ID_12345"))
assert.Error(t, err)
}
func TestBuilderWithSigning(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a builder with signing options
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
// Add a test file
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest and verify signature fields are populated
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest.pbOuter)
assert.NotEmpty(t, manifest.pbOuter.Signature, "signature should be populated")
assert.NotEmpty(t, manifest.pbOuter.Signer, "signer should be populated")
assert.NotEmpty(t, manifest.pbOuter.SigningPubKey, "signing public key should be populated")
// Verify signature is a valid PGP signature
assert.Contains(t, string(manifest.pbOuter.Signature), "-----BEGIN PGP SIGNATURE-----")
// Verify public key is a valid PGP public key block
assert.Contains(t, string(manifest.pbOuter.SigningPubKey), "-----BEGIN PGP PUBLIC KEY BLOCK-----")
}
func TestScannerWithSigning(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create in-memory filesystem with test files
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content1"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content2"), 0o644))
// Create scanner with signing options
opts := &ScannerOptions{
Fs: fs,
SigningOptions: &SigningOptions{
KeyID: keyID,
},
}
s := NewScannerWithOptions(opts)
// Enumerate files
require.NoError(t, s.EnumeratePath("/testdir", nil))
assert.Equal(t, FileCount(2), s.FileCount())
// Generate signed manifest
var buf bytes.Buffer
require.NoError(t, s.ToManifest(context.Background(), &buf, nil))
// Parse and verify
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
assert.NotEmpty(t, manifest.pbOuter.Signature)
assert.NotEmpty(t, manifest.pbOuter.Signer)
assert.NotEmpty(t, manifest.pbOuter.SigningPubKey)
}
func TestGPGVerify(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign and verify")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
// Verify the signature
err = gpgVerify(data, sig, pubKey)
require.NoError(t, err)
}
func TestGPGVerifyInvalidSignature(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data to sign")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
pubKey, err := gpgExportPublicKey(keyID)
require.NoError(t, err)
// Try to verify with different data - should fail
wrongData := []byte("different data")
err = gpgVerify(wrongData, sig, pubKey)
assert.Error(t, err)
}
func TestGPGVerifyBadPublicKey(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
data := []byte("test data")
sig, err := gpgSign(data, keyID)
require.NoError(t, err)
// Try to verify with invalid public key - should fail
badPubKey := []byte("not a valid public key")
err = gpgVerify(data, sig, badPubKey)
assert.Error(t, err)
}
func TestManifestSignatureVerification(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a builder with signing options
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
// Add a test file
content := []byte("test file content for verification")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest - signature should be verified during load
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest)
// Signature should be present and valid
assert.NotEmpty(t, manifest.pbOuter.Signature)
}
func TestManifestTamperedSignatureFails(t *testing.T) {
keyID, cleanup := testGPGEnv(t)
defer cleanup()
// Create a signed manifest
b := NewBuilder()
b.SetSigningOptions(&SigningOptions{
KeyID: keyID,
})
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Tamper with the signature by replacing some bytes
data := buf.Bytes()
// Find and modify a byte in the signature portion
for i := range data {
if i > 100 && data[i] == 'A' {
data[i] = 'B'
break
}
}
// Try to load the tampered manifest - should fail
_, err = NewManifestFromReader(bytes.NewReader(data))
assert.Error(t, err)
}
func TestBuilderWithoutSigning(t *testing.T) {
// Create a builder without signing options
b := NewBuilder()
// Add a test file
content := []byte("test file content")
reader := bytes.NewReader(content)
_, err := b.AddFile("test.txt", FileSize(len(content)), ModTime{}, reader, nil)
require.NoError(t, err)
// Build the manifest
var buf bytes.Buffer
err = b.Build(&buf)
require.NoError(t, err)
// Parse the manifest and verify signature fields are empty
manifest, err := NewManifestFromReader(&buf)
require.NoError(t, err)
require.NotNil(t, manifest.pbOuter)
assert.Empty(t, manifest.pbOuter.Signature, "signature should be empty when not signing")
assert.Empty(t, manifest.pbOuter.Signer, "signer should be empty when not signing")
assert.Empty(t, manifest.pbOuter.SigningPubKey, "signing public key should be empty when not signing")
}

View File

@ -2,172 +2,58 @@ package mfer
import (
"bytes"
"context"
"encoding/hex"
"errors"
"fmt"
"io/fs"
"path"
"path/filepath"
"strings"
"git.eeqj.de/sneak/mfer/internal/log"
"github.com/spf13/afero"
"github.com/multiformats/go-multihash"
)
type manifestFile struct {
path string
info fs.FileInfo
}
func (m *manifestFile) String() string {
return fmt.Sprintf("<File \"%s\">", m.path)
}
// manifest holds the internal representation of a manifest file.
// Use NewManifestFromFile or NewManifestFromReader to load an existing manifest,
// or use Builder to create a new one.
type manifest struct {
sourceFS []afero.Fs
files []*manifestFile
scanOptions *ManifestScanOptions
totalFileSize int64
pbInner *MFFile
pbOuter *MFFileOuter
output *bytes.Buffer
ctx context.Context
errors []*error
signingOptions *SigningOptions
}
func (m *manifest) String() string {
return fmt.Sprintf("<Manifest count=%d totalSize=%d>", len(m.files), m.totalFileSize)
count := 0
if m.pbInner != nil {
count = len(m.pbInner.Files)
}
return fmt.Sprintf("<Manifest count=%d>", count)
}
type ManifestScanOptions struct {
IgnoreDotfiles bool
FollowSymLinks bool
// Files returns all file entries from a loaded manifest.
func (m *manifest) Files() []*MFFilePath {
if m.pbInner == nil {
return nil
}
return m.pbInner.Files
}
func (m *manifest) HasError() bool {
return len(m.errors) > 0
// signatureString generates the canonical string used for signing/verification.
// Format: MAGIC-UUID-MULTIHASH where UUID and multihash are hex-encoded.
// Requires pbOuter to be set with Uuid and Sha256 fields.
func (m *manifest) signatureString() (string, error) {
if m.pbOuter == nil {
return "", errors.New("pbOuter not set")
}
if len(m.pbOuter.Uuid) == 0 {
return "", errors.New("UUID not set")
}
if len(m.pbOuter.Sha256) == 0 {
return "", errors.New("SHA256 hash not set")
}
func (m *manifest) AddError(e error) *manifest {
m.errors = append(m.errors, &e)
return m
}
func (m *manifest) WithContext(c context.Context) *manifest {
m.ctx = c
return m
}
func (m *manifest) addInputPath(inputPath string) error {
abs, err := filepath.Abs(inputPath)
mh, err := multihash.Encode(m.pbOuter.Sha256, multihash.SHA2_256)
if err != nil {
return err
return "", fmt.Errorf("failed to encode multihash: %w", err)
}
// FIXME check to make sure inputPath/abs exists maybe
afs := afero.NewReadOnlyFs(afero.NewBasePathFs(afero.NewOsFs(), abs))
return m.addInputFS(afs)
}
func (m *manifest) addInputFS(f afero.Fs) error {
if m.sourceFS == nil {
m.sourceFS = make([]afero.Fs, 0)
}
m.sourceFS = append(m.sourceFS, f)
// FIXME do some sort of check on f here?
return nil
}
func New() *manifest {
m := &manifest{}
return m
}
func NewFromPaths(options *ManifestScanOptions, inputPaths ...string) (*manifest, error) {
log.Dump(inputPaths)
m := New()
m.scanOptions = options
for _, p := range inputPaths {
err := m.addInputPath(p)
if err != nil {
return nil, err
}
}
return m, nil
}
func NewFromFS(options *ManifestScanOptions, fs afero.Fs) (*manifest, error) {
m := New()
m.scanOptions = options
err := m.addInputFS(fs)
if err != nil {
return nil, err
}
return m, nil
}
func (m *manifest) GetFileCount() int64 {
return int64(len(m.files))
}
func (m *manifest) GetTotalFileSize() int64 {
return m.totalFileSize
}
func pathIsHidden(p string) bool {
tp := path.Clean(p)
if strings.HasPrefix(tp, ".") {
return true
}
for {
d, f := path.Split(tp)
if strings.HasPrefix(f, ".") {
return true
}
if d == "" {
return false
}
tp = d[0 : len(d)-1] // trim trailing slash from dir
}
}
func (m *manifest) addFile(p string, fi fs.FileInfo, sfsIndex int) error {
if m.scanOptions.IgnoreDotfiles && pathIsHidden(p) {
return nil
}
if fi != nil && fi.IsDir() {
// manifests contain only files, directories are implied.
return nil
}
// FIXME test if 'fi' is already result of stat
fileinfo, staterr := m.sourceFS[sfsIndex].Stat(p)
if staterr != nil {
return staterr
}
cleanPath := p
if cleanPath[0:1] == "/" {
cleanPath = cleanPath[1:]
}
nf := &manifestFile{
path: cleanPath,
info: fileinfo,
}
m.files = append(m.files, nf)
m.totalFileSize = m.totalFileSize + fi.Size()
return nil
}
func (m *manifest) Scan() error {
// FIXME scan and whatever function does the hashing should take ctx
for idx, sfs := range m.sourceFS {
if sfs == nil {
return errors.New("invalid source fs")
}
e := afero.Walk(sfs, "/", func(p string, info fs.FileInfo, err error) error {
return m.addFile(p, info, idx)
})
if e != nil {
return e
}
}
return nil
uuidStr := hex.EncodeToString(m.pbOuter.Uuid)
mhStr := hex.EncodeToString(mh)
return fmt.Sprintf("%s-%s-%s", MAGIC, uuidStr, mhStr), nil
}

3
mfer/mf.go Normal file
View File

@ -0,0 +1,3 @@
package mfer
//go:generate protoc ./mf.proto --go_out=paths=source_relative:.

667
mfer/mf.pb.go Normal file
View File

@ -0,0 +1,667 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.11
// protoc v6.33.0
// source: mf.proto
package mfer
import (
protoreflect "google.golang.org/protobuf/reflect/protoreflect"
protoimpl "google.golang.org/protobuf/runtime/protoimpl"
reflect "reflect"
sync "sync"
unsafe "unsafe"
)
const (
// Verify that this generated code is sufficiently up-to-date.
_ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion)
// Verify that runtime/protoimpl is sufficiently up-to-date.
_ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20)
)
type MFFileOuter_Version int32
const (
MFFileOuter_VERSION_NONE MFFileOuter_Version = 0
MFFileOuter_VERSION_ONE MFFileOuter_Version = 1 // only one for now
)
// Enum value maps for MFFileOuter_Version.
var (
MFFileOuter_Version_name = map[int32]string{
0: "VERSION_NONE",
1: "VERSION_ONE",
}
MFFileOuter_Version_value = map[string]int32{
"VERSION_NONE": 0,
"VERSION_ONE": 1,
}
)
func (x MFFileOuter_Version) Enum() *MFFileOuter_Version {
p := new(MFFileOuter_Version)
*p = x
return p
}
func (x MFFileOuter_Version) String() string {
return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x))
}
func (MFFileOuter_Version) Descriptor() protoreflect.EnumDescriptor {
return file_mf_proto_enumTypes[0].Descriptor()
}
func (MFFileOuter_Version) Type() protoreflect.EnumType {
return &file_mf_proto_enumTypes[0]
}
func (x MFFileOuter_Version) Number() protoreflect.EnumNumber {
return protoreflect.EnumNumber(x)
}
// Deprecated: Use MFFileOuter_Version.Descriptor instead.
func (MFFileOuter_Version) EnumDescriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{1, 0}
}
type MFFileOuter_CompressionType int32
const (
MFFileOuter_COMPRESSION_NONE MFFileOuter_CompressionType = 0
MFFileOuter_COMPRESSION_ZSTD MFFileOuter_CompressionType = 1
)
// Enum value maps for MFFileOuter_CompressionType.
var (
MFFileOuter_CompressionType_name = map[int32]string{
0: "COMPRESSION_NONE",
1: "COMPRESSION_ZSTD",
}
MFFileOuter_CompressionType_value = map[string]int32{
"COMPRESSION_NONE": 0,
"COMPRESSION_ZSTD": 1,
}
)
func (x MFFileOuter_CompressionType) Enum() *MFFileOuter_CompressionType {
p := new(MFFileOuter_CompressionType)
*p = x
return p
}
func (x MFFileOuter_CompressionType) String() string {
return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x))
}
func (MFFileOuter_CompressionType) Descriptor() protoreflect.EnumDescriptor {
return file_mf_proto_enumTypes[1].Descriptor()
}
func (MFFileOuter_CompressionType) Type() protoreflect.EnumType {
return &file_mf_proto_enumTypes[1]
}
func (x MFFileOuter_CompressionType) Number() protoreflect.EnumNumber {
return protoreflect.EnumNumber(x)
}
// Deprecated: Use MFFileOuter_CompressionType.Descriptor instead.
func (MFFileOuter_CompressionType) EnumDescriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{1, 1}
}
type MFFile_Version int32
const (
MFFile_VERSION_NONE MFFile_Version = 0
MFFile_VERSION_ONE MFFile_Version = 1 // only one for now
)
// Enum value maps for MFFile_Version.
var (
MFFile_Version_name = map[int32]string{
0: "VERSION_NONE",
1: "VERSION_ONE",
}
MFFile_Version_value = map[string]int32{
"VERSION_NONE": 0,
"VERSION_ONE": 1,
}
)
func (x MFFile_Version) Enum() *MFFile_Version {
p := new(MFFile_Version)
*p = x
return p
}
func (x MFFile_Version) String() string {
return protoimpl.X.EnumStringOf(x.Descriptor(), protoreflect.EnumNumber(x))
}
func (MFFile_Version) Descriptor() protoreflect.EnumDescriptor {
return file_mf_proto_enumTypes[2].Descriptor()
}
func (MFFile_Version) Type() protoreflect.EnumType {
return &file_mf_proto_enumTypes[2]
}
func (x MFFile_Version) Number() protoreflect.EnumNumber {
return protoreflect.EnumNumber(x)
}
// Deprecated: Use MFFile_Version.Descriptor instead.
func (MFFile_Version) EnumDescriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{4, 0}
}
type Timestamp struct {
state protoimpl.MessageState `protogen:"open.v1"`
Seconds int64 `protobuf:"varint,1,opt,name=seconds,proto3" json:"seconds,omitempty"`
Nanos int32 `protobuf:"varint,2,opt,name=nanos,proto3" json:"nanos,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *Timestamp) Reset() {
*x = Timestamp{}
mi := &file_mf_proto_msgTypes[0]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *Timestamp) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*Timestamp) ProtoMessage() {}
func (x *Timestamp) ProtoReflect() protoreflect.Message {
mi := &file_mf_proto_msgTypes[0]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use Timestamp.ProtoReflect.Descriptor instead.
func (*Timestamp) Descriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{0}
}
func (x *Timestamp) GetSeconds() int64 {
if x != nil {
return x.Seconds
}
return 0
}
func (x *Timestamp) GetNanos() int32 {
if x != nil {
return x.Nanos
}
return 0
}
type MFFileOuter struct {
state protoimpl.MessageState `protogen:"open.v1"`
// required mffile root attributes 1xx
Version MFFileOuter_Version `protobuf:"varint,101,opt,name=version,proto3,enum=MFFileOuter_Version" json:"version,omitempty"`
CompressionType MFFileOuter_CompressionType `protobuf:"varint,102,opt,name=compressionType,proto3,enum=MFFileOuter_CompressionType" json:"compressionType,omitempty"`
// these are used solely to detect corruption/truncation
// and not for cryptographic integrity.
Size int64 `protobuf:"varint,103,opt,name=size,proto3" json:"size,omitempty"`
Sha256 []byte `protobuf:"bytes,104,opt,name=sha256,proto3" json:"sha256,omitempty"`
// uuid must match the uuid in the inner message
Uuid []byte `protobuf:"bytes,105,opt,name=uuid,proto3" json:"uuid,omitempty"`
InnerMessage []byte `protobuf:"bytes,199,opt,name=innerMessage,proto3" json:"innerMessage,omitempty"`
// detached signature, ascii or binary
Signature []byte `protobuf:"bytes,201,opt,name=signature,proto3,oneof" json:"signature,omitempty"`
// full GPG key id
Signer []byte `protobuf:"bytes,202,opt,name=signer,proto3,oneof" json:"signer,omitempty"`
// full GPG signing public key, ascii or binary
SigningPubKey []byte `protobuf:"bytes,203,opt,name=signingPubKey,proto3,oneof" json:"signingPubKey,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *MFFileOuter) Reset() {
*x = MFFileOuter{}
mi := &file_mf_proto_msgTypes[1]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *MFFileOuter) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*MFFileOuter) ProtoMessage() {}
func (x *MFFileOuter) ProtoReflect() protoreflect.Message {
mi := &file_mf_proto_msgTypes[1]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use MFFileOuter.ProtoReflect.Descriptor instead.
func (*MFFileOuter) Descriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{1}
}
func (x *MFFileOuter) GetVersion() MFFileOuter_Version {
if x != nil {
return x.Version
}
return MFFileOuter_VERSION_NONE
}
func (x *MFFileOuter) GetCompressionType() MFFileOuter_CompressionType {
if x != nil {
return x.CompressionType
}
return MFFileOuter_COMPRESSION_NONE
}
func (x *MFFileOuter) GetSize() int64 {
if x != nil {
return x.Size
}
return 0
}
func (x *MFFileOuter) GetSha256() []byte {
if x != nil {
return x.Sha256
}
return nil
}
func (x *MFFileOuter) GetUuid() []byte {
if x != nil {
return x.Uuid
}
return nil
}
func (x *MFFileOuter) GetInnerMessage() []byte {
if x != nil {
return x.InnerMessage
}
return nil
}
func (x *MFFileOuter) GetSignature() []byte {
if x != nil {
return x.Signature
}
return nil
}
func (x *MFFileOuter) GetSigner() []byte {
if x != nil {
return x.Signer
}
return nil
}
func (x *MFFileOuter) GetSigningPubKey() []byte {
if x != nil {
return x.SigningPubKey
}
return nil
}
type MFFilePath struct {
state protoimpl.MessageState `protogen:"open.v1"`
// required attributes:
Path string `protobuf:"bytes,1,opt,name=path,proto3" json:"path,omitempty"`
Size int64 `protobuf:"varint,2,opt,name=size,proto3" json:"size,omitempty"`
// gotta have at least one:
Hashes []*MFFileChecksum `protobuf:"bytes,3,rep,name=hashes,proto3" json:"hashes,omitempty"`
// optional per-file metadata
MimeType *string `protobuf:"bytes,301,opt,name=mimeType,proto3,oneof" json:"mimeType,omitempty"`
Mtime *Timestamp `protobuf:"bytes,302,opt,name=mtime,proto3,oneof" json:"mtime,omitempty"`
Ctime *Timestamp `protobuf:"bytes,303,opt,name=ctime,proto3,oneof" json:"ctime,omitempty"`
Atime *Timestamp `protobuf:"bytes,304,opt,name=atime,proto3,oneof" json:"atime,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *MFFilePath) Reset() {
*x = MFFilePath{}
mi := &file_mf_proto_msgTypes[2]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *MFFilePath) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*MFFilePath) ProtoMessage() {}
func (x *MFFilePath) ProtoReflect() protoreflect.Message {
mi := &file_mf_proto_msgTypes[2]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use MFFilePath.ProtoReflect.Descriptor instead.
func (*MFFilePath) Descriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{2}
}
func (x *MFFilePath) GetPath() string {
if x != nil {
return x.Path
}
return ""
}
func (x *MFFilePath) GetSize() int64 {
if x != nil {
return x.Size
}
return 0
}
func (x *MFFilePath) GetHashes() []*MFFileChecksum {
if x != nil {
return x.Hashes
}
return nil
}
func (x *MFFilePath) GetMimeType() string {
if x != nil && x.MimeType != nil {
return *x.MimeType
}
return ""
}
func (x *MFFilePath) GetMtime() *Timestamp {
if x != nil {
return x.Mtime
}
return nil
}
func (x *MFFilePath) GetCtime() *Timestamp {
if x != nil {
return x.Ctime
}
return nil
}
func (x *MFFilePath) GetAtime() *Timestamp {
if x != nil {
return x.Atime
}
return nil
}
type MFFileChecksum struct {
state protoimpl.MessageState `protogen:"open.v1"`
// 1.0 golang implementation must write a multihash here
// it's ok to only ever use/verify sha256 multihash
MultiHash []byte `protobuf:"bytes,1,opt,name=multiHash,proto3" json:"multiHash,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *MFFileChecksum) Reset() {
*x = MFFileChecksum{}
mi := &file_mf_proto_msgTypes[3]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *MFFileChecksum) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*MFFileChecksum) ProtoMessage() {}
func (x *MFFileChecksum) ProtoReflect() protoreflect.Message {
mi := &file_mf_proto_msgTypes[3]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use MFFileChecksum.ProtoReflect.Descriptor instead.
func (*MFFileChecksum) Descriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{3}
}
func (x *MFFileChecksum) GetMultiHash() []byte {
if x != nil {
return x.MultiHash
}
return nil
}
type MFFile struct {
state protoimpl.MessageState `protogen:"open.v1"`
Version MFFile_Version `protobuf:"varint,100,opt,name=version,proto3,enum=MFFile_Version" json:"version,omitempty"`
// required manifest attributes:
Files []*MFFilePath `protobuf:"bytes,101,rep,name=files,proto3" json:"files,omitempty"`
// uuid is a random v4 UUID generated when creating the manifest
// used as part of the signature to prevent replay attacks
Uuid []byte `protobuf:"bytes,102,opt,name=uuid,proto3" json:"uuid,omitempty"`
// optional manifest attributes 2xx:
CreatedAt *Timestamp `protobuf:"bytes,201,opt,name=createdAt,proto3,oneof" json:"createdAt,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *MFFile) Reset() {
*x = MFFile{}
mi := &file_mf_proto_msgTypes[4]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *MFFile) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*MFFile) ProtoMessage() {}
func (x *MFFile) ProtoReflect() protoreflect.Message {
mi := &file_mf_proto_msgTypes[4]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use MFFile.ProtoReflect.Descriptor instead.
func (*MFFile) Descriptor() ([]byte, []int) {
return file_mf_proto_rawDescGZIP(), []int{4}
}
func (x *MFFile) GetVersion() MFFile_Version {
if x != nil {
return x.Version
}
return MFFile_VERSION_NONE
}
func (x *MFFile) GetFiles() []*MFFilePath {
if x != nil {
return x.Files
}
return nil
}
func (x *MFFile) GetUuid() []byte {
if x != nil {
return x.Uuid
}
return nil
}
func (x *MFFile) GetCreatedAt() *Timestamp {
if x != nil {
return x.CreatedAt
}
return nil
}
var File_mf_proto protoreflect.FileDescriptor
const file_mf_proto_rawDesc = "" +
"\n" +
"\bmf.proto\";\n" +
"\tTimestamp\x12\x18\n" +
"\aseconds\x18\x01 \x01(\x03R\aseconds\x12\x14\n" +
"\x05nanos\x18\x02 \x01(\x05R\x05nanos\"\xf0\x03\n" +
"\vMFFileOuter\x12.\n" +
"\aversion\x18e \x01(\x0e2\x14.MFFileOuter.VersionR\aversion\x12F\n" +
"\x0fcompressionType\x18f \x01(\x0e2\x1c.MFFileOuter.CompressionTypeR\x0fcompressionType\x12\x12\n" +
"\x04size\x18g \x01(\x03R\x04size\x12\x16\n" +
"\x06sha256\x18h \x01(\fR\x06sha256\x12\x12\n" +
"\x04uuid\x18i \x01(\fR\x04uuid\x12#\n" +
"\finnerMessage\x18\xc7\x01 \x01(\fR\finnerMessage\x12\"\n" +
"\tsignature\x18\xc9\x01 \x01(\fH\x00R\tsignature\x88\x01\x01\x12\x1c\n" +
"\x06signer\x18\xca\x01 \x01(\fH\x01R\x06signer\x88\x01\x01\x12*\n" +
"\rsigningPubKey\x18\xcb\x01 \x01(\fH\x02R\rsigningPubKey\x88\x01\x01\",\n" +
"\aVersion\x12\x10\n" +
"\fVERSION_NONE\x10\x00\x12\x0f\n" +
"\vVERSION_ONE\x10\x01\"=\n" +
"\x0fCompressionType\x12\x14\n" +
"\x10COMPRESSION_NONE\x10\x00\x12\x14\n" +
"\x10COMPRESSION_ZSTD\x10\x01B\f\n" +
"\n" +
"_signatureB\t\n" +
"\a_signerB\x10\n" +
"\x0e_signingPubKey\"\xa2\x02\n" +
"\n" +
"MFFilePath\x12\x12\n" +
"\x04path\x18\x01 \x01(\tR\x04path\x12\x12\n" +
"\x04size\x18\x02 \x01(\x03R\x04size\x12'\n" +
"\x06hashes\x18\x03 \x03(\v2\x0f.MFFileChecksumR\x06hashes\x12 \n" +
"\bmimeType\x18\xad\x02 \x01(\tH\x00R\bmimeType\x88\x01\x01\x12&\n" +
"\x05mtime\x18\xae\x02 \x01(\v2\n" +
".TimestampH\x01R\x05mtime\x88\x01\x01\x12&\n" +
"\x05ctime\x18\xaf\x02 \x01(\v2\n" +
".TimestampH\x02R\x05ctime\x88\x01\x01\x12&\n" +
"\x05atime\x18\xb0\x02 \x01(\v2\n" +
".TimestampH\x03R\x05atime\x88\x01\x01B\v\n" +
"\t_mimeTypeB\b\n" +
"\x06_mtimeB\b\n" +
"\x06_ctimeB\b\n" +
"\x06_atime\".\n" +
"\x0eMFFileChecksum\x12\x1c\n" +
"\tmultiHash\x18\x01 \x01(\fR\tmultiHash\"\xd6\x01\n" +
"\x06MFFile\x12)\n" +
"\aversion\x18d \x01(\x0e2\x0f.MFFile.VersionR\aversion\x12!\n" +
"\x05files\x18e \x03(\v2\v.MFFilePathR\x05files\x12\x12\n" +
"\x04uuid\x18f \x01(\fR\x04uuid\x12.\n" +
"\tcreatedAt\x18\xc9\x01 \x01(\v2\n" +
".TimestampH\x00R\tcreatedAt\x88\x01\x01\",\n" +
"\aVersion\x12\x10\n" +
"\fVERSION_NONE\x10\x00\x12\x0f\n" +
"\vVERSION_ONE\x10\x01B\f\n" +
"\n" +
"_createdAtB\x1dZ\x1bgit.eeqj.de/sneak/mfer/mferb\x06proto3"
var (
file_mf_proto_rawDescOnce sync.Once
file_mf_proto_rawDescData []byte
)
func file_mf_proto_rawDescGZIP() []byte {
file_mf_proto_rawDescOnce.Do(func() {
file_mf_proto_rawDescData = protoimpl.X.CompressGZIP(unsafe.Slice(unsafe.StringData(file_mf_proto_rawDesc), len(file_mf_proto_rawDesc)))
})
return file_mf_proto_rawDescData
}
var file_mf_proto_enumTypes = make([]protoimpl.EnumInfo, 3)
var file_mf_proto_msgTypes = make([]protoimpl.MessageInfo, 5)
var file_mf_proto_goTypes = []any{
(MFFileOuter_Version)(0), // 0: MFFileOuter.Version
(MFFileOuter_CompressionType)(0), // 1: MFFileOuter.CompressionType
(MFFile_Version)(0), // 2: MFFile.Version
(*Timestamp)(nil), // 3: Timestamp
(*MFFileOuter)(nil), // 4: MFFileOuter
(*MFFilePath)(nil), // 5: MFFilePath
(*MFFileChecksum)(nil), // 6: MFFileChecksum
(*MFFile)(nil), // 7: MFFile
}
var file_mf_proto_depIdxs = []int32{
0, // 0: MFFileOuter.version:type_name -> MFFileOuter.Version
1, // 1: MFFileOuter.compressionType:type_name -> MFFileOuter.CompressionType
6, // 2: MFFilePath.hashes:type_name -> MFFileChecksum
3, // 3: MFFilePath.mtime:type_name -> Timestamp
3, // 4: MFFilePath.ctime:type_name -> Timestamp
3, // 5: MFFilePath.atime:type_name -> Timestamp
2, // 6: MFFile.version:type_name -> MFFile.Version
5, // 7: MFFile.files:type_name -> MFFilePath
3, // 8: MFFile.createdAt:type_name -> Timestamp
9, // [9:9] is the sub-list for method output_type
9, // [9:9] is the sub-list for method input_type
9, // [9:9] is the sub-list for extension type_name
9, // [9:9] is the sub-list for extension extendee
0, // [0:9] is the sub-list for field type_name
}
func init() { file_mf_proto_init() }
func file_mf_proto_init() {
if File_mf_proto != nil {
return
}
file_mf_proto_msgTypes[1].OneofWrappers = []any{}
file_mf_proto_msgTypes[2].OneofWrappers = []any{}
file_mf_proto_msgTypes[4].OneofWrappers = []any{}
type x struct{}
out := protoimpl.TypeBuilder{
File: protoimpl.DescBuilder{
GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
RawDescriptor: unsafe.Slice(unsafe.StringData(file_mf_proto_rawDesc), len(file_mf_proto_rawDesc)),
NumEnums: 3,
NumMessages: 5,
NumExtensions: 0,
NumServices: 0,
},
GoTypes: file_mf_proto_goTypes,
DependencyIndexes: file_mf_proto_depIdxs,
EnumInfos: file_mf_proto_enumTypes,
MessageInfos: file_mf_proto_msgTypes,
}.Build()
File_mf_proto = out.File
file_mf_proto_goTypes = nil
file_mf_proto_depIdxs = nil
}

View File

@ -18,7 +18,7 @@ message MFFileOuter {
enum CompressionType {
COMPRESSION_NONE = 0;
COMPRESSION_GZIP = 1;
COMPRESSION_ZSTD = 1;
}
CompressionType compressionType = 102;
@ -28,6 +28,9 @@ message MFFileOuter {
int64 size = 103;
bytes sha256 = 104;
// uuid must match the uuid in the inner message
bytes uuid = 105;
bytes innerMessage = 199;
// 2xx for optional manifest root attributes
// think we might use gosignify instead of gpg:
@ -43,6 +46,9 @@ message MFFileOuter {
message MFFilePath {
// required attributes:
// Path invariants: must be valid UTF-8, use forward slashes only,
// be relative (no leading /), contain no ".." segments, and no
// empty segments (no "//").
string path = 1;
int64 size = 2;
@ -72,6 +78,10 @@ message MFFile {
// required manifest attributes:
repeated MFFilePath files = 101;
// uuid is a random v4 UUID generated when creating the manifest
// used as part of the signature to prevent replay attacks
bytes uuid = 102;
// optional manifest attributes 2xx:
optional Timestamp createdAt = 201;
}

View File

@ -1,74 +0,0 @@
package mfer
import (
"bytes"
"fmt"
"testing"
"git.eeqj.de/sneak/mfer/internal/log"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
)
// Add those variables as well
var (
existingFolder = "./testdata/a-folder-that-exists"
)
var (
af *afero.Afero = &afero.Afero{Fs: afero.NewMemMapFs()}
big *afero.Afero = &afero.Afero{Fs: afero.NewMemMapFs()}
)
func init() {
log.EnableDebugLogging()
// create test files and directories
af.MkdirAll("/a/b/c", 0o755)
af.MkdirAll("/.hidden", 0o755)
af.WriteFile("/a/b/c/hello.txt", []byte("hello world\n\n\n\n"), 0o755)
af.WriteFile("/a/b/c/hello2.txt", []byte("hello world\n\n\n\n"), 0o755)
af.WriteFile("/.hidden/hello.txt", []byte("hello world\n"), 0o755)
af.WriteFile("/.hidden/hello2.txt", []byte("hello world\n"), 0o755)
big.MkdirAll("/home/user/Library", 0o755)
for i := range [25]int{} {
big.WriteFile(fmt.Sprintf("/home/user/Library/hello%d.txt", i), []byte("hello world\n"), 0o755)
}
}
func TestPathHiddenFunc(t *testing.T) {
assert.False(t, pathIsHidden("/a/b/c/hello.txt"))
assert.True(t, pathIsHidden("/a/b/c/.hello.txt"))
assert.True(t, pathIsHidden("/a/.b/c/hello.txt"))
assert.True(t, pathIsHidden("/.a/b/c/hello.txt"))
assert.False(t, pathIsHidden("./a/b/c/hello.txt"))
}
func TestManifestGenerationOne(t *testing.T) {
m, err := NewFromFS(&ManifestScanOptions{
IgnoreDotfiles: true,
}, af)
assert.Nil(t, err)
assert.NotNil(t, m)
m.Scan()
assert.Equal(t, int64(2), m.GetFileCount())
assert.Equal(t, int64(30), m.GetTotalFileSize())
}
func TestManifestGenerationTwo(t *testing.T) {
m, err := NewFromFS(&ManifestScanOptions{
IgnoreDotfiles: false,
}, af)
assert.Nil(t, err)
assert.NotNil(t, m)
m.Scan()
assert.Equal(t, int64(4), m.GetFileCount())
assert.Equal(t, int64(54), m.GetTotalFileSize())
err = m.generate()
assert.Nil(t, err)
var buf bytes.Buffer
err = m.WriteTo(&buf)
assert.Nil(t, err)
log.Dump(buf.Bytes())
}

View File

@ -1,33 +0,0 @@
package mfer
import (
"io"
"os"
)
func (m *manifest) WriteToFile(path string) error {
// FIXME refuse to overwrite without -f if file exists
f, err := os.Create(path)
if err != nil {
return err
}
defer f.Close()
return m.WriteTo(f)
}
func (m *manifest) WriteTo(output io.Writer) error {
if m.pbOuter == nil {
err := m.generate()
if err != nil {
return err
}
}
_, err := output.Write(m.output.Bytes())
if err != nil {
return err
}
return nil
}

427
mfer/scanner.go Normal file
View File

@ -0,0 +1,427 @@
package mfer
import (
"context"
"io"
"io/fs"
"path"
"path/filepath"
"strings"
"sync"
"time"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"sneak.berlin/go/mfer/internal/log"
)
// Phase 1: Enumeration
// ---------------------
// Walking directories and calling stat() on files to collect metadata.
// Builds the list of files to be scanned. Relatively fast (metadata only).
// EnumerateStatus contains progress information for the enumeration phase.
type EnumerateStatus struct {
FilesFound FileCount // Number of files discovered so far
BytesFound FileSize // Total size of discovered files (from stat)
}
// Phase 2: Scan (ToManifest)
// --------------------------
// Reading file contents and computing hashes for manifest generation.
// This is the expensive phase that reads all file data.
// ScanStatus contains progress information for the scan phase.
type ScanStatus struct {
TotalFiles FileCount // Total number of files to scan
ScannedFiles FileCount // Number of files scanned so far
TotalBytes FileSize // Total bytes to read (sum of all file sizes)
ScannedBytes FileSize // Bytes read so far
BytesPerSec float64 // Current throughput rate
ETA time.Duration // Estimated time to completion
}
// ScannerOptions configures scanner behavior.
type ScannerOptions struct {
IncludeDotfiles bool // Include files and directories starting with a dot (default: exclude)
FollowSymLinks bool // Resolve symlinks instead of skipping them
Fs afero.Fs // Filesystem to use, defaults to OsFs if nil
SigningOptions *SigningOptions // GPG signing options (nil = no signing)
}
// FileEntry represents a file that has been enumerated.
type FileEntry struct {
Path RelFilePath // Relative path (used in manifest)
AbsPath AbsFilePath // Absolute path (used for reading file content)
Size FileSize // File size in bytes
Mtime ModTime // Last modification time
Ctime time.Time // Creation time (platform-dependent)
}
// Scanner accumulates files and generates manifests from them.
type Scanner struct {
mu sync.RWMutex
files []*FileEntry
totalBytes FileSize // cached sum of all file sizes
options *ScannerOptions
fs afero.Fs
}
// NewScanner creates a new Scanner with default options.
func NewScanner() *Scanner {
return NewScannerWithOptions(nil)
}
// NewScannerWithOptions creates a new Scanner with the given options.
func NewScannerWithOptions(opts *ScannerOptions) *Scanner {
if opts == nil {
opts = &ScannerOptions{}
}
fs := opts.Fs
if fs == nil {
fs = afero.NewOsFs()
}
return &Scanner{
files: make([]*FileEntry, 0),
options: opts,
fs: fs,
}
}
// EnumerateFile adds a single file to the scanner, calling stat() to get metadata.
func (s *Scanner) EnumerateFile(filePath string) error {
abs, err := filepath.Abs(filePath)
if err != nil {
return err
}
info, err := s.fs.Stat(abs)
if err != nil {
return err
}
// For single files, use the filename as the relative path
basePath := filepath.Dir(abs)
return s.enumerateFileWithInfo(filepath.Base(abs), basePath, info, nil)
}
// EnumeratePath walks a directory path and adds all files to the scanner.
// If progress is non-nil, status updates are sent as files are discovered.
// The progress channel is closed when the method returns.
func (s *Scanner) EnumeratePath(inputPath string, progress chan<- EnumerateStatus) error {
if progress != nil {
defer close(progress)
}
abs, err := filepath.Abs(inputPath)
if err != nil {
return err
}
afs := afero.NewReadOnlyFs(afero.NewBasePathFs(s.fs, abs))
return s.enumerateFS(afs, abs, progress)
}
// EnumeratePaths walks multiple directory paths and adds all files to the scanner.
// If progress is non-nil, status updates are sent as files are discovered.
// The progress channel is closed when the method returns.
func (s *Scanner) EnumeratePaths(progress chan<- EnumerateStatus, inputPaths ...string) error {
if progress != nil {
defer close(progress)
}
for _, p := range inputPaths {
abs, err := filepath.Abs(p)
if err != nil {
return err
}
afs := afero.NewReadOnlyFs(afero.NewBasePathFs(s.fs, abs))
if err := s.enumerateFS(afs, abs, progress); err != nil {
return err
}
}
return nil
}
// EnumerateFS walks an afero filesystem and adds all files to the scanner.
// If progress is non-nil, status updates are sent as files are discovered.
// The progress channel is closed when the method returns.
// basePath is used to compute absolute paths for file reading.
func (s *Scanner) EnumerateFS(afs afero.Fs, basePath string, progress chan<- EnumerateStatus) error {
if progress != nil {
defer close(progress)
}
return s.enumerateFS(afs, basePath, progress)
}
// enumerateFS is the internal implementation that doesn't close the progress channel.
func (s *Scanner) enumerateFS(afs afero.Fs, basePath string, progress chan<- EnumerateStatus) error {
return afero.Walk(afs, "/", func(p string, info fs.FileInfo, err error) error {
if err != nil {
return err
}
if !s.options.IncludeDotfiles && IsHiddenPath(p) {
if info.IsDir() {
return filepath.SkipDir
}
return nil
}
return s.enumerateFileWithInfo(p, basePath, info, progress)
})
}
// enumerateFileWithInfo adds a file with pre-existing fs.FileInfo.
func (s *Scanner) enumerateFileWithInfo(filePath string, basePath string, info fs.FileInfo, progress chan<- EnumerateStatus) error {
if info.IsDir() {
// Manifests contain only files, directories are implied
return nil
}
// Clean the path - remove leading slash if present
cleanPath := filePath
if len(cleanPath) > 0 && cleanPath[0] == '/' {
cleanPath = cleanPath[1:]
}
// Compute absolute path for file reading
absPath := filepath.Join(basePath, cleanPath)
// Handle symlinks
if info.Mode()&fs.ModeSymlink != 0 {
if !s.options.FollowSymLinks {
// Skip symlinks when not following them
return nil
}
// Resolve symlink to get real file info
realPath, err := filepath.EvalSymlinks(absPath)
if err != nil {
// Skip broken symlinks
return nil
}
realInfo, err := s.fs.Stat(realPath)
if err != nil {
return nil
}
// Skip if symlink points to a directory
if realInfo.IsDir() {
return nil
}
// Use resolved path for reading, but keep original path in manifest
absPath = realPath
info = realInfo
}
entry := &FileEntry{
Path: RelFilePath(cleanPath),
AbsPath: AbsFilePath(absPath),
Size: FileSize(info.Size()),
Mtime: ModTime(info.ModTime()),
// Note: Ctime not available from fs.FileInfo on all platforms
// Will need platform-specific code to extract it
}
s.mu.Lock()
s.files = append(s.files, entry)
s.totalBytes += entry.Size
filesFound := FileCount(len(s.files))
bytesFound := s.totalBytes
s.mu.Unlock()
sendEnumerateStatus(progress, EnumerateStatus{
FilesFound: filesFound,
BytesFound: bytesFound,
})
return nil
}
// Files returns a copy of all files added to the scanner.
func (s *Scanner) Files() []*FileEntry {
s.mu.RLock()
defer s.mu.RUnlock()
out := make([]*FileEntry, len(s.files))
copy(out, s.files)
return out
}
// FileCount returns the number of files in the scanner.
func (s *Scanner) FileCount() FileCount {
s.mu.RLock()
defer s.mu.RUnlock()
return FileCount(len(s.files))
}
// TotalBytes returns the total size of all files in the scanner.
func (s *Scanner) TotalBytes() FileSize {
s.mu.RLock()
defer s.mu.RUnlock()
return s.totalBytes
}
// ToManifest reads all file contents, computes hashes, and generates a manifest.
// If progress is non-nil, status updates are sent approximately once per second.
// The progress channel is closed when the method returns.
// The manifest is written to the provided io.Writer.
func (s *Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- ScanStatus) error {
if progress != nil {
defer close(progress)
}
s.mu.RLock()
files := make([]*FileEntry, len(s.files))
copy(files, s.files)
totalFiles := FileCount(len(files))
var totalBytes FileSize
for _, f := range files {
totalBytes += f.Size
}
s.mu.RUnlock()
builder := NewBuilder()
if s.options.SigningOptions != nil {
builder.SetSigningOptions(s.options.SigningOptions)
}
var scannedFiles FileCount
var scannedBytes FileSize
lastProgressTime := time.Now()
startTime := time.Now()
for _, entry := range files {
// Check for cancellation
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Open file
f, err := s.fs.Open(string(entry.AbsPath))
if err != nil {
return err
}
// Create progress channel for this file
var fileProgress chan FileHashProgress
var wg sync.WaitGroup
if progress != nil {
fileProgress = make(chan FileHashProgress, 1)
wg.Add(1)
go func(baseScannedBytes FileSize) {
defer wg.Done()
for p := range fileProgress {
// Send progress at most once per second
now := time.Now()
if now.Sub(lastProgressTime) >= time.Second {
elapsed := now.Sub(startTime).Seconds()
currentBytes := baseScannedBytes + p.BytesRead
var rate float64
var eta time.Duration
if elapsed > 0 && currentBytes > 0 {
rate = float64(currentBytes) / elapsed
remainingBytes := totalBytes - currentBytes
if rate > 0 {
eta = time.Duration(float64(remainingBytes)/rate) * time.Second
}
}
sendScanStatus(progress, ScanStatus{
TotalFiles: totalFiles,
ScannedFiles: scannedFiles,
TotalBytes: totalBytes,
ScannedBytes: currentBytes,
BytesPerSec: rate,
ETA: eta,
})
lastProgressTime = now
}
}
}(scannedBytes)
}
// Add to manifest with progress channel
bytesRead, err := builder.AddFile(
entry.Path,
entry.Size,
entry.Mtime,
f,
fileProgress,
)
_ = f.Close()
// Close channel and wait for goroutine to finish
if fileProgress != nil {
close(fileProgress)
wg.Wait()
}
if err != nil {
return err
}
log.Verbosef("+ %s (%s)", entry.Path, humanize.IBytes(uint64(bytesRead)))
scannedFiles++
scannedBytes += bytesRead
}
// Send final progress (ETA is 0 at completion)
if progress != nil {
elapsed := time.Since(startTime).Seconds()
var rate float64
if elapsed > 0 {
rate = float64(scannedBytes) / elapsed
}
sendScanStatus(progress, ScanStatus{
TotalFiles: totalFiles,
ScannedFiles: scannedFiles,
TotalBytes: totalBytes,
ScannedBytes: scannedBytes,
BytesPerSec: rate,
ETA: 0,
})
}
// Build and write manifest
return builder.Build(w)
}
// IsHiddenPath returns true if the path or any of its parent directories
// start with a dot (hidden files/directories).
// The path should use forward slashes.
func IsHiddenPath(p string) bool {
tp := path.Clean(p)
if strings.HasPrefix(tp, ".") {
return true
}
for {
d, f := path.Split(tp)
if strings.HasPrefix(f, ".") {
return true
}
if d == "" {
return false
}
tp = d[0 : len(d)-1] // trim trailing slash from dir
}
}
// sendEnumerateStatus sends a status update without blocking.
// If the channel is full, the update is dropped.
func sendEnumerateStatus(ch chan<- EnumerateStatus, status EnumerateStatus) {
if ch == nil {
return
}
select {
case ch <- status:
default:
// Channel full, drop this update
}
}
// sendScanStatus sends a status update without blocking.
// If the channel is full, the update is dropped.
func sendScanStatus(ch chan<- ScanStatus, status ScanStatus) {
if ch == nil {
return
}
select {
case ch <- status:
default:
// Channel full, drop this update
}
}

362
mfer/scanner_test.go Normal file
View File

@ -0,0 +1,362 @@
package mfer
import (
"bytes"
"context"
"testing"
"time"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestNewScanner(t *testing.T) {
s := NewScanner()
assert.NotNil(t, s)
assert.Equal(t, FileCount(0), s.FileCount())
assert.Equal(t, FileSize(0), s.TotalBytes())
}
func TestNewScannerWithOptions(t *testing.T) {
t.Run("nil options", func(t *testing.T) {
s := NewScannerWithOptions(nil)
assert.NotNil(t, s)
})
t.Run("with options", func(t *testing.T) {
fs := afero.NewMemMapFs()
opts := &ScannerOptions{
IncludeDotfiles: true,
FollowSymLinks: true,
Fs: fs,
}
s := NewScannerWithOptions(opts)
assert.NotNil(t, s)
})
}
func TestScannerEnumerateFile(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello world"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFile("/test.txt")
require.NoError(t, err)
assert.Equal(t, FileCount(1), s.FileCount())
assert.Equal(t, FileSize(11), s.TotalBytes())
files := s.Files()
require.Len(t, files, 1)
assert.Equal(t, RelFilePath("test.txt"), files[0].Path)
assert.Equal(t, FileSize(11), files[0].Size)
}
func TestScannerEnumerateFileMissing(t *testing.T) {
fs := afero.NewMemMapFs()
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFile("/nonexistent.txt")
assert.Error(t, err)
}
func TestScannerEnumeratePath(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/subdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/subdir/file3.txt", []byte("three"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, FileCount(3), s.FileCount())
assert.Equal(t, FileSize(3+3+5), s.TotalBytes())
}
func TestScannerEnumeratePathWithProgress(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("two"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
progress := make(chan EnumerateStatus, 10)
err := s.EnumeratePath("/testdir", progress)
require.NoError(t, err)
var updates []EnumerateStatus
for p := range progress {
updates = append(updates, p)
}
assert.NotEmpty(t, updates)
// Final update should show all files
final := updates[len(updates)-1]
assert.Equal(t, FileCount(2), final.FilesFound)
assert.Equal(t, FileSize(6), final.BytesFound)
}
func TestScannerEnumeratePaths(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/dir1", 0o755))
require.NoError(t, fs.MkdirAll("/dir2", 0o755))
require.NoError(t, afero.WriteFile(fs, "/dir1/a.txt", []byte("aaa"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/dir2/b.txt", []byte("bbb"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePaths(nil, "/dir1", "/dir2")
require.NoError(t, err)
assert.Equal(t, FileCount(2), s.FileCount())
}
func TestScannerExcludeDotfiles(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/.hidden", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/visible.txt", []byte("visible"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden.txt", []byte("hidden"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/.hidden/inside.txt", []byte("inside"), 0o644))
t.Run("exclude by default", func(t *testing.T) {
s := NewScannerWithOptions(&ScannerOptions{Fs: fs, IncludeDotfiles: false})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, FileCount(1), s.FileCount())
files := s.Files()
assert.Equal(t, RelFilePath("visible.txt"), files[0].Path)
})
t.Run("include when enabled", func(t *testing.T) {
s := NewScannerWithOptions(&ScannerOptions{Fs: fs, IncludeDotfiles: true})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
assert.Equal(t, FileCount(3), s.FileCount())
})
}
func TestScannerToManifest(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file1.txt", []byte("content one"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/file2.txt", []byte("content two"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
var buf bytes.Buffer
err = s.ToManifest(context.Background(), &buf, nil)
require.NoError(t, err)
// Manifest should have magic bytes
assert.True(t, buf.Len() > 0)
assert.Equal(t, MAGIC, string(buf.Bytes()[:8]))
}
func TestScannerToManifestWithProgress(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", bytes.Repeat([]byte("x"), 1000), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
var buf bytes.Buffer
progress := make(chan ScanStatus, 10)
err = s.ToManifest(context.Background(), &buf, progress)
require.NoError(t, err)
var updates []ScanStatus
for p := range progress {
updates = append(updates, p)
}
assert.NotEmpty(t, updates)
// Final update should show completion
final := updates[len(updates)-1]
assert.Equal(t, FileCount(1), final.TotalFiles)
assert.Equal(t, FileCount(1), final.ScannedFiles)
assert.Equal(t, FileSize(1000), final.TotalBytes)
assert.Equal(t, FileSize(1000), final.ScannedBytes)
}
func TestScannerToManifestContextCancellation(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
// Create many files to ensure we have time to cancel
for i := 0; i < 100; i++ {
name := string(rune('a'+i%26)) + string(rune('0'+i/26)) + ".txt"
require.NoError(t, afero.WriteFile(fs, "/testdir/"+name, bytes.Repeat([]byte("x"), 100), 0o644))
}
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumeratePath("/testdir", nil)
require.NoError(t, err)
ctx, cancel := context.WithCancel(context.Background())
cancel() // Cancel immediately
var buf bytes.Buffer
err = s.ToManifest(ctx, &buf, nil)
assert.ErrorIs(t, err, context.Canceled)
}
func TestScannerToManifestEmptyScanner(t *testing.T) {
fs := afero.NewMemMapFs()
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
var buf bytes.Buffer
err := s.ToManifest(context.Background(), &buf, nil)
require.NoError(t, err)
// Should still produce a valid manifest
assert.True(t, buf.Len() > 0)
assert.Equal(t, MAGIC, string(buf.Bytes()[:8]))
}
func TestScannerFilesCopiesSlice(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("hello"), 0o644))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
require.NoError(t, s.EnumerateFile("/test.txt"))
files1 := s.Files()
files2 := s.Files()
// Should be different slices
assert.NotSame(t, &files1[0], &files2[0])
}
func TestScannerEnumerateFS(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir/sub", 0o755))
require.NoError(t, afero.WriteFile(fs, "/testdir/file.txt", []byte("hello"), 0o644))
require.NoError(t, afero.WriteFile(fs, "/testdir/sub/nested.txt", []byte("world"), 0o644))
// Create a basepath filesystem
baseFs := afero.NewBasePathFs(fs, "/testdir")
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
err := s.EnumerateFS(baseFs, "/testdir", nil)
require.NoError(t, err)
assert.Equal(t, FileCount(2), s.FileCount())
}
func TestSendEnumerateStatusNonBlocking(t *testing.T) {
// Channel with no buffer - send should not block
ch := make(chan EnumerateStatus)
// This should not block
done := make(chan bool)
go func() {
sendEnumerateStatus(ch, EnumerateStatus{FilesFound: 1})
done <- true
}()
select {
case <-done:
// Success - did not block
case <-time.After(100 * time.Millisecond):
t.Fatal("sendEnumerateStatus blocked on full channel")
}
}
func TestSendScanStatusNonBlocking(t *testing.T) {
// Channel with no buffer - send should not block
ch := make(chan ScanStatus)
done := make(chan bool)
go func() {
sendScanStatus(ch, ScanStatus{ScannedFiles: 1})
done <- true
}()
select {
case <-done:
// Success - did not block
case <-time.After(100 * time.Millisecond):
t.Fatal("sendScanStatus blocked on full channel")
}
}
func TestSendStatusNilChannel(t *testing.T) {
// Should not panic with nil channel
sendEnumerateStatus(nil, EnumerateStatus{})
sendScanStatus(nil, ScanStatus{})
}
func TestScannerFileEntryFields(t *testing.T) {
fs := afero.NewMemMapFs()
now := time.Now().Truncate(time.Second)
require.NoError(t, afero.WriteFile(fs, "/test.txt", []byte("content"), 0o644))
require.NoError(t, fs.Chtimes("/test.txt", now, now))
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
require.NoError(t, s.EnumerateFile("/test.txt"))
files := s.Files()
require.Len(t, files, 1)
entry := files[0]
assert.Equal(t, RelFilePath("test.txt"), entry.Path)
assert.Contains(t, string(entry.AbsPath), "test.txt")
assert.Equal(t, FileSize(7), entry.Size)
// Mtime should be set (within a second of now)
assert.WithinDuration(t, now, time.Time(entry.Mtime), 2*time.Second)
}
func TestScannerLargeFileEnumeration(t *testing.T) {
fs := afero.NewMemMapFs()
require.NoError(t, fs.MkdirAll("/testdir", 0o755))
// Create 100 files
for i := 0; i < 100; i++ {
name := "/testdir/" + string(rune('a'+i%26)) + string(rune('0'+i/26%10)) + ".txt"
require.NoError(t, afero.WriteFile(fs, name, []byte("data"), 0o644))
}
s := NewScannerWithOptions(&ScannerOptions{Fs: fs})
progress := make(chan EnumerateStatus, 200)
err := s.EnumeratePath("/testdir", progress)
require.NoError(t, err)
// Drain channel
for range progress {
}
assert.Equal(t, FileCount(100), s.FileCount())
assert.Equal(t, FileSize(400), s.TotalBytes()) // 100 * 4 bytes
}
func TestIsHiddenPath(t *testing.T) {
tests := []struct {
path string
hidden bool
}{
{"file.txt", false},
{".hidden", true},
{"dir/file.txt", false},
{"dir/.hidden", true},
{".dir/file.txt", true},
{"/absolute/path", false},
{"/absolute/.hidden", true},
{"./relative", false}, // path.Clean removes leading ./
{"a/b/c/.d/e", true},
}
for _, tt := range tests {
t.Run(tt.path, func(t *testing.T) {
assert.Equal(t, tt.hidden, IsHiddenPath(tt.path), "IsHiddenPath(%q)", tt.path)
})
}
}

View File

@ -2,33 +2,29 @@ package mfer
import (
"bytes"
"compress/gzip"
"crypto/sha256"
"errors"
"fmt"
"time"
"github.com/google/uuid"
"github.com/klauspost/compress/zstd"
"google.golang.org/protobuf/proto"
)
//go:generate protoc --go_out=. --go_opt=paths=source_relative mf.proto
// rot13("MANIFEST")
// MAGIC is the file format magic bytes prefix (rot13 of "MANIFEST").
const MAGIC string = "ZNAVSRFG"
func newTimestampFromTime(t time.Time) *Timestamp {
out := &Timestamp{
return &Timestamp{
Seconds: t.Unix(),
Nanos: int32(t.UnixNano() - (t.Unix() * 1000000000)),
Nanos: int32(t.Nanosecond()),
}
return out
}
func (m *manifest) generate() error {
if m.pbInner == nil {
e := m.generateInner()
if e != nil {
return e
}
return errors.New("internal error: pbInner not set")
}
if m.pbOuter == nil {
e := m.generateOuter()
@ -52,49 +48,71 @@ func (m *manifest) generateOuter() error {
if m.pbInner == nil {
return errors.New("internal error")
}
// Generate UUID and set on inner message
manifestUUID := uuid.New()
m.pbInner.Uuid = manifestUUID[:]
innerData, err := proto.MarshalOptions{Deterministic: true}.Marshal(m.pbInner)
if err != nil {
return err
}
h := sha256.New()
h.Write(innerData)
// Compress the inner data
idc := new(bytes.Buffer)
gzw, err := gzip.NewWriterLevel(idc, gzip.BestCompression)
zw, err := zstd.NewWriter(idc, zstd.WithEncoderLevel(zstd.SpeedBestCompression))
if err != nil {
return err
}
_, err = gzw.Write(innerData)
_, err = zw.Write(innerData)
if err != nil {
return err
}
_ = zw.Close()
gzw.Close()
compressedData := idc.Bytes()
o := &MFFileOuter{
InnerMessage: idc.Bytes(),
// Hash the compressed data for integrity verification before decompression
h := sha256.New()
if _, err := h.Write(compressedData); err != nil {
return err
}
sha256Hash := h.Sum(nil)
m.pbOuter = &MFFileOuter{
InnerMessage: compressedData,
Size: int64(len(innerData)),
Sha256: h.Sum(nil),
Sha256: sha256Hash,
Uuid: manifestUUID[:],
Version: MFFileOuter_VERSION_ONE,
CompressionType: MFFileOuter_COMPRESSION_GZIP,
}
m.pbOuter = o
return nil
CompressionType: MFFileOuter_COMPRESSION_ZSTD,
}
func (m *manifest) generateInner() error {
m.pbInner = &MFFile{
Version: MFFile_VERSION_ONE,
CreatedAt: newTimestampFromTime(time.Now()),
Files: []*MFFilePath{},
// Sign the manifest if signing options are provided
if m.signingOptions != nil && m.signingOptions.KeyID != "" {
sigString, err := m.signatureString()
if err != nil {
return fmt.Errorf("failed to generate signature string: %w", err)
}
for _, f := range m.files {
nf := &MFFilePath{
Path: f.path,
// FIXME add more stuff
sig, err := gpgSign([]byte(sigString), m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to sign manifest: %w", err)
}
m.pbInner.Files = append(m.pbInner.Files, nf)
m.pbOuter.Signature = sig
fingerprint, err := gpgGetKeyFingerprint(m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to get key fingerprint: %w", err)
}
m.pbOuter.Signer = fingerprint
pubKey, err := gpgExportPublicKey(m.signingOptions.KeyID)
if err != nil {
return fmt.Errorf("failed to export public key: %w", err)
}
m.pbOuter.SigningPubKey = pubKey
}
return nil
}

53
mfer/url.go Normal file
View File

@ -0,0 +1,53 @@
package mfer
import (
"net/url"
"strings"
)
// ManifestURL represents a URL pointing to a manifest file.
type ManifestURL string
// FileURL represents a URL pointing to a file to be fetched.
type FileURL string
// BaseURL represents a base URL for constructing file URLs.
type BaseURL string
// JoinPath safely joins a relative file path to a base URL.
// The path is properly URL-encoded to prevent path traversal.
func (b BaseURL) JoinPath(path RelFilePath) (FileURL, error) {
base, err := url.Parse(string(b))
if err != nil {
return "", err
}
// Ensure base path ends with /
if !strings.HasSuffix(base.Path, "/") {
base.Path += "/"
}
// Parse and encode the relative path
ref, err := url.Parse(url.PathEscape(string(path)))
if err != nil {
return "", err
}
resolved := base.ResolveReference(ref)
return FileURL(resolved.String()), nil
}
// String returns the URL as a string.
func (b BaseURL) String() string {
return string(b)
}
// String returns the URL as a string.
func (f FileURL) String() string {
return string(f)
}
// String returns the URL as a string.
func (m ManifestURL) String() string {
return string(m)
}