Add testable CLI with dependency injection and new scanner/checker packages

Major changes:

- Refactor CLI to accept injected I/O streams and filesystem (afero.Fs)
  for testing without touching the real filesystem
- Add RunOptions struct and RunWithOptions() for configurable CLI execution
- Add internal/scanner package with two-phase manifest generation:
  - Phase 1 (Enumeration): walk directories, collect metadata
  - Phase 2 (Scan): read contents, compute hashes, write manifest
- Add internal/checker package for manifest verification with progress
  reporting and channel-based result streaming
- Add mfer/builder.go for incremental manifest construction
- Add --no-extra-files flag to check command to detect files not in manifest
- Add timing summaries showing file count, size, elapsed time, and throughput
- Add comprehensive tests using afero.MemMapFs (no real filesystem access)
- Add contrib/usage.sh integration test script
- Fix banner ASCII art alignment (consistent spacing)
- Fix verbosity levels so summaries display at default log level
- Update internal/log to support configurable output writers

230
README.md
@@ -2,6 +2,232 @@

Manifest file generator and checker.

# Phases

Manifest generation happens in two distinct phases:

## Phase 1: Enumeration

Walks directories and calls `stat()` on each file to collect metadata (path, size, mtime, ctime), building the list of files to be scanned. This phase is relatively fast because it reads only filesystem metadata, not file contents.

**Progress:** `EnumerateStatus` with `FilesFound` and `BytesFound`

## Phase 2: Scan (ToManifest)

Reads file contents and computes cryptographic hashes for manifest generation. This is the expensive phase: it reads all file data from disk.

**Progress:** `ScanStatus` with `TotalFiles`, `ScannedFiles`, `TotalBytes`, `ScannedBytes`, `BytesPerSec`

# Code Conventions

- **Logging:** Never use `fmt.Printf` or write to stdout/stderr directly in normal code. Use the `internal/log` package for all output (`log.Info`, `log.Infof`, `log.Debug`, `log.Debugf`, `log.Progressf`, `log.ProgressDone`).
- **Filesystem abstraction:** Use `github.com/spf13/afero` for filesystem operations to enable testing and flexibility.
- **CLI framework:** Use `github.com/urfave/cli/v2` for command-line interface.
- **Serialization:** Use Protocol Buffers for manifest file format.
- **Internal packages:** Non-exported implementation details go in `internal/` subdirectories.
- **Concurrency:** Use `sync.RWMutex` for protecting shared state; prefer channels for progress reporting.
- **Progress channels:** Use buffered channels (size 1) with non-blocking sends to avoid blocking the main operation if the consumer is slow.
- **Context support:** Long-running operations should accept `context.Context` for cancellation.
- **NO_COLOR:** Respect the `NO_COLOR` environment variable for disabling colored output.
- **Options pattern:** Use `NewWithOptions(opts *Options)` constructor pattern for configurable types.

# Codebase Structure

## cmd/mfer/

### main.go
- **Variables**
  - `Appname string` - Application name
  - `Version string` - Version string (set at build time)
  - `Gitrev string` - Git revision (set at build time)

## internal/cli/

### entry.go
- **Variables**
  - `NO_COLOR bool` - Disables color output when NO_COLOR env var is set
- **Functions**
  - `Run(Appname, Version, Gitrev string) int` - Main entry point for the CLI

### mfer.go
- **Types**
  - `CLIApp struct` - Main CLI application container
- **Methods**
  - `(*CLIApp) VersionString() string` - Returns formatted version string

## internal/log/

### log.go
- **Functions**
  - `Init()` - Initializes the logger
  - `Info(arg string)` - Logs at info level
  - `Infof(format string, args ...interface{})` - Logs at info level with formatting
  - `Debug(arg string)` - Logs at debug level with caller info
  - `Debugf(format string, args ...interface{})` - Logs at debug level with formatting and caller info
  - `Dump(args ...interface{})` - Logs spew dump at debug level
  - `Progressf(format string, args ...interface{})` - Prints progress message (overwrites current line)
  - `ProgressDone()` - Completes progress line with newline
  - `EnableDebugLogging()` - Sets log level to debug
  - `SetLevel(arg log.Level)` - Sets log level
  - `SetLevelFromVerbosity(l int)` - Sets log level from verbosity count
  - `GetLevel() log.Level` - Returns current log level
  - `GetLogger() *log.Logger` - Returns underlying logger
  - `WithError(e error) *log.Entry` - Returns log entry with error attached
  - `DisableStyling()` - Disables colors and styling (for NO_COLOR)

## internal/scanner/

### scanner.go
- **Types**
  - `Options struct` - Options for scanner behavior
    - `IgnoreDotfiles bool`
    - `FollowSymLinks bool`
  - `EnumerateStatus struct` - Progress information for enumeration phase
    - `FilesFound int64`
    - `BytesFound int64`
  - `ScanStatus struct` - Progress information for scan phase
    - `TotalFiles int64`
    - `ScannedFiles int64`
    - `TotalBytes int64`
    - `ScannedBytes int64`
    - `BytesPerSec float64`
  - `FileEntry struct` - Represents an enumerated file
    - `Path string` - Relative path (used in manifest)
    - `AbsPath string` - Absolute path (used for reading file content)
    - `Size int64`
    - `Mtime time.Time`
    - `Ctime time.Time`
  - `Scanner struct` - Accumulates files and generates manifests
- **Functions**
  - `New() *Scanner` - Creates a new Scanner with default options
  - `NewWithOptions(opts *Options) *Scanner` - Creates a new Scanner with given options
- **Methods (Enumeration Phase)**
  - `(*Scanner) EnumerateFile(path string) error` - Enumerates a single file, calling stat() for metadata
  - `(*Scanner) EnumeratePath(inputPath string, progress chan<- EnumerateStatus) error` - Walks a directory and enumerates all files
  - `(*Scanner) EnumeratePaths(progress chan<- EnumerateStatus, inputPaths ...string) error` - Walks multiple directories
  - `(*Scanner) EnumerateFS(afs afero.Fs, basePath string, progress chan<- EnumerateStatus) error` - Walks an afero filesystem
- **Methods (Accessors)**
  - `(*Scanner) Files() []*FileEntry` - Returns copy of all enumerated files
  - `(*Scanner) FileCount() int64` - Returns number of files
  - `(*Scanner) TotalBytes() int64` - Returns total size of all files
- **Methods (Scan Phase)**
  - `(*Scanner) ToManifest(ctx context.Context, w io.Writer, progress chan<- ScanStatus) error` - Reads file contents, computes hashes, generates manifest

## internal/checker/

### checker.go
- **Types**
  - `Result struct` - Outcome of checking a single file
    - `Path string` - File path from manifest
    - `Status Status` - Verification status
    - `Message string` - Error or status message
  - `Status int` - Verification status enumeration
    - `StatusOK` - File matches manifest
    - `StatusMissing` - File not found
    - `StatusSizeMismatch` - File size differs from manifest
    - `StatusHashMismatch` - File hash differs from manifest
    - `StatusError` - Error occurred during verification
  - `CheckStatus struct` - Progress information for check operation
    - `TotalFiles int64`
    - `CheckedFiles int64`
    - `TotalBytes int64`
    - `CheckedBytes int64`
    - `BytesPerSec float64`
    - `Failures int64`
  - `Checker struct` - Verifies files against a manifest
- **Functions**
  - `NewChecker(manifestPath string, basePath string) (*Checker, error)` - Creates a new Checker for the given manifest and base path
- **Methods**
  - `(s Status) String() string` - Returns string representation of status
  - `(*Checker) FileCount() int64` - Returns number of files in the manifest
  - `(*Checker) TotalBytes() int64` - Returns total size of all files in manifest
  - `(*Checker) Check(ctx context.Context, results chan<- Result, progress chan<- CheckStatus) error` - Verifies all files against the manifest

## mfer/

### manifest.go
- **Types**
  - `ManifestScanOptions struct` - Options for scanning directories
    - `IgnoreDotfiles bool`
    - `FollowSymLinks bool`
- **Functions**
  - `New() *manifest` - Creates a new empty manifest
  - `NewFromPaths(options *ManifestScanOptions, inputPaths ...string) (*manifest, error)` - Creates manifest from filesystem paths
  - `NewFromFS(options *ManifestScanOptions, fs afero.Fs) (*manifest, error)` - Creates manifest from afero filesystem
- **Methods**
  - `(*manifest) HasError() bool` - Returns true if manifest has errors
  - `(*manifest) AddError(e error) *manifest` - Adds an error to the manifest
  - `(*manifest) WithContext(c context.Context) *manifest` - Sets context for cancellation
  - `(*manifest) GetFileCount() int64` - Returns number of files in manifest
  - `(*manifest) GetTotalFileSize() int64` - Returns total size of all files
  - `(*manifest) Files() []*MFFilePath` - Returns all file entries from a loaded manifest
  - `(*manifest) Scan() error` - Scans source filesystems and populates file list

### output.go
- **Methods**
  - `(*manifest) WriteToFile(path string) error` - Writes manifest to file path
  - `(*manifest) WriteTo(output io.Writer) error` - Writes manifest to io.Writer

### builder.go
- **Types**
  - `FileProgress func(bytesRead int64)` - Callback for file processing progress
  - `ManifestBuilder struct` - Constructs manifests by adding files one at a time
- **Functions**
  - `NewBuilder() *ManifestBuilder` - Creates a new ManifestBuilder
- **Methods**
  - `(*ManifestBuilder) AddFile(path string, size int64, mtime time.Time, reader io.Reader, progress FileProgress) (int64, error)` - Reads file, computes hash, adds to manifest
  - `(*ManifestBuilder) FileCount() int` - Returns number of files added
  - `(*ManifestBuilder) Build(w io.Writer) error` - Finalizes and writes manifest

### serialize.go
- **Constants**
  - `MAGIC string` - Magic bytes prefix for manifest files ("ZNAVSRFG")

### deserialize.go
- **Functions**
  - `NewFromProto(input io.Reader) (*manifest, error)` - Deserializes manifest from protobuf
  - `NewManifestFromReader(input io.Reader) (*manifest, error)` - Reads and parses manifest from io.Reader
  - `NewManifestFromFile(path string) (*manifest, error)` - Reads and parses manifest from file path

### mf.pb.go (generated from mf.proto)
- **Enum Types**
  - `MFFileOuter_Version` - Outer file format version
    - `MFFileOuter_VERSION_NONE`
    - `MFFileOuter_VERSION_ONE`
  - `MFFileOuter_CompressionType` - Compression type for inner message
    - `MFFileOuter_COMPRESSION_NONE`
    - `MFFileOuter_COMPRESSION_GZIP`
  - `MFFile_Version` - Inner file format version
    - `MFFile_VERSION_NONE`
    - `MFFile_VERSION_ONE`
- **Message Types**
  - `Timestamp struct` - Timestamp with seconds and nanoseconds
    - `GetSeconds() int64`
    - `GetNanos() int32`
  - `MFFileOuter struct` - Outer wrapper containing compressed/signed inner message
    - `GetVersion() MFFileOuter_Version`
    - `GetCompressionType() MFFileOuter_CompressionType`
    - `GetSize() int64`
    - `GetSha256() []byte`
    - `GetInnerMessage() []byte`
    - `GetSignature() []byte`
    - `GetSigner() []byte`
    - `GetSigningPubKey() []byte`
  - `MFFilePath struct` - Individual file entry in manifest
    - `GetPath() string`
    - `GetSize() int64`
    - `GetHashes() []*MFFileChecksum`
    - `GetMimeType() string`
    - `GetMtime() *Timestamp`
    - `GetCtime() *Timestamp`
    - `GetAtime() *Timestamp`
  - `MFFileChecksum struct` - File checksum using multihash
    - `GetMultiHash() []byte`
  - `MFFile struct` - Inner manifest containing file list
    - `GetVersion() MFFile_Version`
    - `GetFiles() []*MFFilePath`
    - `GetCreatedAt() *Timestamp`

# Build Status

[Build status](https://drone.datavi.be/sneak/mfer)

@@ -83,6 +309,10 @@ The manifest file would do several important things:
- metadata size should not be used as an excuse to sacrifice utility (such
  as providing checksums over each chunk of a large file)

# Limitations

- **Manifest size:** Manifests must fit entirely in system memory during reading and writing.

# Open Questions

- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?