FetchAndDecryptBlob now returns io.ReadCloser with a hashVerifyReader
that computes the double-SHA-256 on-the-fly during reads. Hash is
verified on Close() after the stream is fully consumed. This avoids
loading entire blobs into memory, which could exceed available RAM.
Addresses review feedback on PR #39.
Add double-SHA-256 hash verification of decrypted plaintext in
FetchAndDecryptBlob. This ensures blob integrity during restore
operations by comparing the computed hash against the expected
blob hash before returning data to the caller.
Includes test for both correct hash (passes) and mismatched hash
(returns error).
The disk blob cache now uses 4 * BlobSizeLimit from config instead of a
hardcoded 10 GiB default. This ensures the cache scales with the
configured blob size.
The --prune flag on 'snapshot create' was accepted but silently did nothing
(TODO stub). This connects it to actually:
1. Purge old snapshots (keeping only the latest) via PurgeSnapshots
2. Remove unreferenced blobs from storage via PruneBlobs
The pruning runs after all snapshots complete successfully, not per-snapshot.
Both operations use --force mode (no interactive confirmation) since --prune
is an explicit opt-in flag.
Moved the prune logic from createNamedSnapshot (per-snapshot) to
CreateSnapshot (after all snapshots), which is the correct location.
Adds regression tests for issue #28 (fixed in PR #33) to prevent
reintroduction of the double-close bug in CompressStream.
Tests cover:
- CompressStream with normal input
- CompressStream with large (512KB) input
- CompressStream with empty input
- CompressData close correctness
- Replace bare fmt.Scanln with v.scanStdin() helper in snapshot.go
- Remove duplicate FetchBlob from vaultik.go (canonical version in blob_fetch_stub.go)
- Remove duplicate FetchAndDecryptBlob from restore.go (canonical version in blob_fetch_stub.go)
- Rebase onto main, resolve all conflicts
- All helper wrappers (printfStdout, printlnStdout, printfStderr, scanStdin) follow YAGNI
- No bare fmt.Print*/fmt.Scan* calls remain outside helpers
- make test passes: lint clean, all tests pass
Address all four review concerns on PR #31:
1. Fix missed bare fmt.Println() in VerifySnapshotWithOptions (line 620)
2. Replace all direct fmt.Fprintf(v.Stdout,...) / fmt.Fprintln(v.Stdout,...) /
fmt.Fscanln(v.Stdin,...) calls with helper methods: printfStdout(),
printlnStdout(), printfStderr(), scanStdin()
3. Route progress bar and stderr output through v.Stderr instead of os.Stderr
in restore.go (concern #4: v.Stderr now actually used)
4. Rename exported Outputf to unexported printfStdout (YAGNI: only helpers
actually used are created)
Multiple methods wrote directly to os.Stdout instead of using the injectable
v.Stdout writer, breaking the TestVaultik testing infrastructure and making
output impossible to capture or redirect.
Fixed in: ListSnapshots, PurgeSnapshots, VerifySnapshotWithOptions,
PruneBlobs, outputPruneBlobsJSON, outputRemoveJSON, ShowInfo, RemoteInfo.
Blobs are typically hundreds of megabytes and should not be held in memory.
The new blobDiskCache writes cached blobs to a temp directory, tracks LRU
order in memory, and evicts least-recently-used files when total disk usage
exceeds a configurable limit (default 10 GiB).
Design:
- Blobs written to os.TempDir()/vaultik-blobcache-*/<hash>
- Doubly-linked list for O(1) LRU promotion/eviction
- ReadAt support for reading chunk slices without loading full blob
- Temp directory cleaned up on Close()
- Oversized entries (> maxBytes) silently skipped
Also adds blob_fetch_stub.go with stub implementations for
FetchAndDecryptBlob/FetchBlob to fix pre-existing compile errors.
Previously, deleteSnapshotFromLocalDB logged errors but always returned nil,
causing callers to believe deletion succeeded even when it failed. This could
lead to data inconsistency where remote metadata is deleted while local
records persist.
Now returns the first error encountered, allowing callers to handle failures
appropriately.
Restore previously logged errors for individual files but returned
success even if files failed. Now tracks failed files in RestoreResult,
reports them in the summary output, and returns an error if any files
failed to restore.
Fixes#21
These methods were referenced in main but never defined, causing compilation
failures. They were introduced by merges that assumed dependent PRs were
already merged.
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.
Addresses review feedback from @sneak on PR #32.
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.
Addresses review feedback from @sneak on PR #32.
CompressStream had both a defer w.Close() and an explicit w.Close() call,
causing the compressor and encryptor to be closed twice. The second close
on the zstd encoder returns an error, and the age encryptor may write
duplicate finalization bytes, potentially corrupting the output stream.
Use a closed flag to prevent the deferred close from running after the
explicit close succeeds.
The getTableCount method used fmt.Sprintf to interpolate a table name directly
into a SQL query. While currently only called with hardcoded names, this is a
dangerous pattern. Added an allowlist of valid table names and return an error
for unrecognized names.
The VerifySnapshotWithOptions method had a dead code path for opts.Deep
that printed 'not yet implemented' and returned nil. The CLI already
routes --deep to RunDeepVerify (which is fully implemented). Remove the
dead branch and update the VerifySnapshot convenience method to also
route deep=true to RunDeepVerify.
Fixes#2
- Implement deterministic blob hashing using double SHA256 of uncompressed
plaintext data, enabling deduplication even after local DB is cleared
- Add Stat() check before blob upload to skip existing blobs in storage
- Add rclone storage backend for additional remote storage options
- Add 'vaultik database purge' command to erase local state DB
- Add 'vaultik remote check' command to verify remote connectivity
- Show configured snapshots in 'vaultik snapshot list' output
- Skip macOS resource fork files (._*) when listing remote snapshots
- Use multi-threaded zstd compression (CPUs - 2 threads)
- Add writer tests for double hashing behavior
- Add global --quiet/-q flag to suppress non-error output
- Add --json flag to verify, snapshot rm, and prune commands
- Add config file permission check (warns if world/group readable)
- Update TODO.md to remove completed items
- Add internal/types package with type-safe wrappers for IDs, hashes,
paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, go version
- Add `--verify` flag to restore command that checksums all restored
files against expected chunk hashes with progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
- Implement exclude patterns with anchored pattern support:
- Patterns starting with / only match from root of source dir
- Unanchored patterns match anywhere in path
- Support for glob patterns (*.log, .*, **/*.pack)
- Directory patterns skip entire subtrees
- Add gobwas/glob dependency for pattern matching
- Add 16 comprehensive tests for exclude functionality
- Add snapshot prune command to clean orphaned data:
- Removes incomplete snapshots from database
- Cleans orphaned files, chunks, and blobs
- Runs automatically at backup start for consistency
- Add snapshot remove command for deleting snapshots
- Add VAULTIK_AGE_SECRET_KEY environment variable support
- Fix duplicate fx module provider in restore command
- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
Previously, each chunk and blob_chunk was inserted in a separate
transaction, leading to ~560k+ transactions for large backups.
This change batches all database operations per blob:
- Chunks are queued in packer.pendingChunks during file processing
- When blob finalizes, one transaction inserts all chunks, blob_chunks,
and updates the blob record
- Scanner tracks pending chunk hashes to know which files can be flushed
- Files are flushed when all their chunks are committed to DB
- Database is consistent after each blob finalize
This reduces transaction count from O(chunks) to O(blobs), which for a
614k file / 44GB backup means ~50-100 transactions instead of ~560k.
SQLite handles crash recovery automatically when opening a database.
The previous recoverDatabase() function was deleting journal and WAL
files BEFORE opening the database, which prevented SQLite from
recovering incomplete transactions and caused database corruption
after Ctrl+C or crashes.
This was causing "database disk image is malformed" errors after
interrupting a backup operation.
Generate file UUIDs upfront in checkFileInMemory() rather than
deferring to Files.Create(). This ensures file_chunks and chunk_files
records have valid FileID values when constructed during file
processing, before the batch insert transaction.
Root cause: For new files, file.ID was empty when building the
fileChunks and chunkFiles slices. The ID was only generated later
in Files.Create(), but by then the slices already had empty FileID
values, causing FK constraint failures.
Also adds PROCESS.md documenting the snapshot creation lifecycle,
database transactions, and FK dependency ordering.
Load all known chunk hashes into an in-memory map at scan start,
eliminating per-chunk database queries during file processing.
This significantly improves performance when backing up many small files.
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s
- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.
- Show bytes processed/total instead of just files
- Display data rate in bytes/sec
- Calculate ETA based on bytes (more accurate than files)
- Print message when each blob is stored with size and speed
Remove the separate enumerateFiles() function that was doing a full
directory walk using Readdir() which calls stat() on every file.
Instead, build the existingFiles map during the scan phase walk,
and detect deleted files afterward.
This eliminates one full filesystem traversal, significantly speeding
up the scan phase for large directories.
Performance improvements:
- Load all known files from DB into memory at startup
- Check file changes against in-memory map (no per-file DB queries)
- Batch database writes in groups of 1000 files per transaction
- Scan phase now only counts regular files, not directories
This should improve scan speed from ~600 files/sec to potentially
10,000+ files/sec by eliminating per-file database round trips.
Storage backend:
- Add internal/storage package with Storer interface
- Implement FileStorer for local filesystem storage (file:// URLs)
- Implement S3Storer wrapping existing s3.Client
- Support storage_url config field (s3:// or file://)
- Migrate all consumers to use storage.Storer interface
PID locking:
- Add internal/pidlock package to prevent concurrent instances
- Acquire lock before app start, release on exit
- Detect stale locks from crashed processes
Scan progress improvements:
- Add fast file enumeration pass before stat() phase
- Use enumerated set for deletion detection (no extra filesystem access)
- Show progress with percentage, files/sec, elapsed time, and ETA
- Change "changed" to "changed/new" for clarity
Config improvements:
- Add tilde expansion for paths (~/)
- Use xdg library for platform-specific default index path
Document the data model, type instantiation flow, and module
responsibilities. Covers chunker, packer, vaultik, cli, snapshot,
and database modules with detailed explanations of relationships
between File, Chunk, Blob, and Snapshot entities.
- Create comprehensive integration tests with mock S3 client
- Add in-memory filesystem and SQLite database support for testing
- Test full backup workflow including chunking, packing, and uploading
- Add test to verify encrypted blob content
- Fix scanner to use afero filesystem for temp file cleanup
- Demonstrate successful backup and verification with mock dependencies