Remove all ctime from the codebase per sneak's decision on [PR #48](#48).
## Rationale
- ctime means different things on macOS (birth time) vs Linux (inode change time) — ambiguous cross-platform
- Vaultik never uses ctime operationally (scanning triggers on mtime change)
- Cannot be restored on either platform
- Write-only forensic data with no consumer
## Changes
- **Schema** (`internal/database/schema.sql`): Removed `ctime` column from `files` table
- **Model** (`internal/database/models.go`): Removed `CTime` field from `File` struct
- **Database layer** (`internal/database/files.go`): Removed ctime from all INSERT/SELECT queries, ON CONFLICT updates, and scan targets in both `scanFile` and `scanFileRows` helpers; updated `CreateBatch` accordingly
- **Scanner** (`internal/snapshot/scanner.go`): Removed `CTime: info.ModTime()` assignment in `checkFileInMemory()`
- **Tests**: Removed all `CTime` field assignments from 8 test files
- **Documentation**: Removed ctime references from `ARCHITECTURE.md` and `docs/DATAMODEL.md`
`docker build .` passes clean (lint, fmt-check, all tests).
closes#54
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #55
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
Add `ON DELETE CASCADE` to the two foreign keys that were missing it:
- `snapshot_files.file_id` → `files(id)`
- `snapshot_blobs.blob_id` → `blobs(id)`
This ensures that when a file or blob row is deleted, the corresponding snapshot junction rows are automatically cleaned up, consistent with the other CASCADE FKs already in the schema.
closes #19
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #46
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
The `uploads` table's foreign key on `snapshot_id` did not cascade deletes, unlike `snapshot_files` and `snapshot_blobs`. This caused FK violations when deleting snapshots with associated upload records (if FK enforcement is enabled) unless uploads were manually deleted first.
Adds `ON DELETE CASCADE` to the `snapshot_id` FK in `schema.sql` for consistency with the other snapshot-referencing tables.
`docker build .` passes (fmt-check, lint, all tests, build).
closes #18
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #44
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
Renames `internal/vaultik/blob_fetch_stub.go` to `internal/vaultik/blob_fetch.go`.
The file contains production code (`hashVerifyReader`, `FetchAndDecryptBlob`), not stubs. The `_stub` suffix was a misnomer from the original implementation in [PR #39](#39).
Pure rename — no code changes. All tests, linting, and formatting pass.
closes#52
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #53
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
`ListSnapshots()` silently deleted local snapshot records not found in remote storage. A list/read operation should not have destructive side effects.
## Changes
1. **Removed destructive sync from `ListSnapshots()`** — the inline loop that deleted local snapshots not present in remote storage has been removed entirely. `ListSnapshots()` now only reads and displays data.
2. **Improved `syncWithRemote()` cascade cleanup** — updated `syncWithRemote()` to use `deleteSnapshotFromLocalDB()` instead of directly calling `Repositories.Snapshots.Delete()`. This ensures proper cascade deletion of related records (`snapshot_files`, `snapshot_blobs`, `snapshot_uploads`) before deleting the snapshot record itself, matching the thorough cleanup that the removed `ListSnapshots` code was doing.
The explicit sync behavior remains available via `syncWithRemote()`, which is called by `PurgeSnapshots()`.
## Testing
- `docker build .` passes (lint, fmt-check, all tests, compilation)
closes #15
Co-authored-by: clawbot <clawbot@eeqj.de>
Reviewed-on: #49
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
Add double-SHA-256 hash verification of decrypted plaintext in `FetchAndDecryptBlob`. This ensures blob integrity during restore operations by comparing the computed hash against the expected blob hash before returning data to the caller.
The blob hash is `SHA256(SHA256(plaintext))` as produced by `blobgen.Writer.Sum256()`. Verification happens after decryption and decompression but before the data is used.
## Test
Added `blob_fetch_hash_test.go` with tests for:
- Correct hash passes verification
- Mismatched hash returns descriptive error
## make test output
```
golangci-lint run
0 issues.
ok git.eeqj.de/sneak/vaultik/internal/blob 4.563s
ok git.eeqj.de/sneak/vaultik/internal/blobgen 3.981s
ok git.eeqj.de/sneak/vaultik/internal/chunker 4.127s
ok git.eeqj.de/sneak/vaultik/internal/cli 1.499s
ok git.eeqj.de/sneak/vaultik/internal/config 1.905s
ok git.eeqj.de/sneak/vaultik/internal/crypto 0.519s
ok git.eeqj.de/sneak/vaultik/internal/database 4.590s
ok git.eeqj.de/sneak/vaultik/internal/globals 0.650s
ok git.eeqj.de/sneak/vaultik/internal/models 0.779s
ok git.eeqj.de/sneak/vaultik/internal/pidlock 2.945s
ok git.eeqj.de/sneak/vaultik/internal/s3 3.286s
ok git.eeqj.de/sneak/vaultik/internal/snapshot 3.979s
ok git.eeqj.de/sneak/vaultik/internal/vaultik 4.418s
```
All tests pass, 0 lint issues.
Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #39
Co-authored-by: clawbot <sneak+clawbot@sneak.cloud>
Co-committed-by: clawbot <sneak+clawbot@sneak.cloud>
Adds a `make check` target that verifies formatting (gofmt), linting (golangci-lint), and tests (go test -race) without modifying files.
Also adds `.gitea/workflows/check.yml` CI workflow that runs on pushes and PRs to main.
`make check` passes cleanly on current main.
Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: clawbot <clawbot@sneak.berlin>
Reviewed-on: #42
Co-authored-by: clawbot <sneak+clawbot@sneak.cloud>
Co-committed-by: clawbot <sneak+clawbot@sneak.cloud>
Add an interactive progress bar (using schollz/progressbar) to the file restore loop, matching the existing pattern in verify. Shows bytes restored with ETA when output is a terminal.
Fixes#20
Co-authored-by: clawbot <clawbot@eeqj.de>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #23
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
The disk blob cache now uses 4 * BlobSizeLimit from config instead of a
hardcoded 10 GiB default. This ensures the cache scales with the
configured blob size.
The --prune flag on 'snapshot create' was accepted but silently did nothing
(TODO stub). This connects it to actually:
1. Purge old snapshots (keeping only the latest) via PurgeSnapshots
2. Remove unreferenced blobs from storage via PruneBlobs
The pruning runs after all snapshots complete successfully, not per-snapshot.
Both operations use --force mode (no interactive confirmation) since --prune
is an explicit opt-in flag.
Moved the prune logic from createNamedSnapshot (per-snapshot) to
CreateSnapshot (after all snapshots), which is the correct location.
Adds regression tests for issue #28 (fixed in PR #33) to prevent
reintroduction of the double-close bug in CompressStream.
Tests cover:
- CompressStream with normal input
- CompressStream with large (512KB) input
- CompressStream with empty input
- CompressData close correctness
- Replace bare fmt.Scanln with v.scanStdin() helper in snapshot.go
- Remove duplicate FetchBlob from vaultik.go (canonical version in blob_fetch_stub.go)
- Remove duplicate FetchAndDecryptBlob from restore.go (canonical version in blob_fetch_stub.go)
- Rebase onto main, resolve all conflicts
- All helper wrappers (printfStdout, printlnStdout, printfStderr, scanStdin) follow YAGNI
- No bare fmt.Print*/fmt.Scan* calls remain outside helpers
- make test passes: lint clean, all tests pass
Address all four review concerns on PR #31:
1. Fix missed bare fmt.Println() in VerifySnapshotWithOptions (line 620)
2. Replace all direct fmt.Fprintf(v.Stdout,...) / fmt.Fprintln(v.Stdout,...) /
fmt.Fscanln(v.Stdin,...) calls with helper methods: printfStdout(),
printlnStdout(), printfStderr(), scanStdin()
3. Route progress bar and stderr output through v.Stderr instead of os.Stderr
in restore.go (concern #4: v.Stderr now actually used)
4. Rename exported Outputf to unexported printfStdout (YAGNI: only helpers
actually used are created)
Multiple methods wrote directly to os.Stdout instead of using the injectable
v.Stdout writer, breaking the TestVaultik testing infrastructure and making
output impossible to capture or redirect.
Fixed in: ListSnapshots, PurgeSnapshots, VerifySnapshotWithOptions,
PruneBlobs, outputPruneBlobsJSON, outputRemoveJSON, ShowInfo, RemoteInfo.
Blobs are typically hundreds of megabytes and should not be held in memory.
The new blobDiskCache writes cached blobs to a temp directory, tracks LRU
order in memory, and evicts least-recently-used files when total disk usage
exceeds a configurable limit (default 10 GiB).
Design:
- Blobs written to os.TempDir()/vaultik-blobcache-*/<hash>
- Doubly-linked list for O(1) LRU promotion/eviction
- ReadAt support for reading chunk slices without loading full blob
- Temp directory cleaned up on Close()
- Oversized entries (> maxBytes) silently skipped
Also adds blob_fetch_stub.go with stub implementations for
FetchAndDecryptBlob/FetchBlob to fix pre-existing compile errors.
Previously, deleteSnapshotFromLocalDB logged errors but always returned nil,
causing callers to believe deletion succeeded even when it failed. This could
lead to data inconsistency where remote metadata is deleted while local
records persist.
Now returns the first error encountered, allowing callers to handle failures
appropriately.
Restore previously logged errors for individual files but returned
success even if files failed. Now tracks failed files in RestoreResult,
reports them in the summary output, and returns an error if any files
failed to restore.
Fixes#21
These methods were referenced in main but never defined, causing compilation
failures. They were introduced by merges that assumed dependent PRs were
already merged.
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.
Addresses review feedback from @sneak on PR #32.
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.
Addresses review feedback from @sneak on PR #32.
CompressStream had both a defer w.Close() and an explicit w.Close() call,
causing the compressor and encryptor to be closed twice. The second close
on the zstd encoder returns an error, and the age encryptor may write
duplicate finalization bytes, potentially corrupting the output stream.
Use a closed flag to prevent the deferred close from running after the
explicit close succeeds.
The getTableCount method used fmt.Sprintf to interpolate a table name directly
into a SQL query. While currently only called with hardcoded names, this is a
dangerous pattern. Added an allowlist of valid table names and return an error
for unrecognized names.
The VerifySnapshotWithOptions method had a dead code path for opts.Deep
that printed 'not yet implemented' and returned nil. The CLI already
routes --deep to RunDeepVerify (which is fully implemented). Remove the
dead branch and update the VerifySnapshot convenience method to also
route deep=true to RunDeepVerify.
Fixes#2
- Implement deterministic blob hashing using double SHA256 of uncompressed
plaintext data, enabling deduplication even after local DB is cleared
- Add Stat() check before blob upload to skip existing blobs in storage
- Add rclone storage backend for additional remote storage options
- Add 'vaultik database purge' command to erase local state DB
- Add 'vaultik remote check' command to verify remote connectivity
- Show configured snapshots in 'vaultik snapshot list' output
- Skip macOS resource fork files (._*) when listing remote snapshots
- Use multi-threaded zstd compression (CPUs - 2 threads)
- Add writer tests for double hashing behavior
- Add global --quiet/-q flag to suppress non-error output
- Add --json flag to verify, snapshot rm, and prune commands
- Add config file permission check (warns if world/group readable)
- Update TODO.md to remove completed items
- Add internal/types package with type-safe wrappers for IDs, hashes,
paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, go version
- Add `--verify` flag to restore command that checksums all restored
files against expected chunk hashes with progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
- Implement exclude patterns with anchored pattern support:
- Patterns starting with / only match from root of source dir
- Unanchored patterns match anywhere in path
- Support for glob patterns (*.log, .*, **/*.pack)
- Directory patterns skip entire subtrees
- Add gobwas/glob dependency for pattern matching
- Add 16 comprehensive tests for exclude functionality
- Add snapshot prune command to clean orphaned data:
- Removes incomplete snapshots from database
- Cleans orphaned files, chunks, and blobs
- Runs automatically at backup start for consistency
- Add snapshot remove command for deleting snapshots
- Add VAULTIK_AGE_SECRET_KEY environment variable support
- Fix duplicate fx module provider in restore command
- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
Previously, each chunk and blob_chunk was inserted in a separate
transaction, leading to ~560k+ transactions for large backups.
This change batches all database operations per blob:
- Chunks are queued in packer.pendingChunks during file processing
- When blob finalizes, one transaction inserts all chunks, blob_chunks,
and updates the blob record
- Scanner tracks pending chunk hashes to know which files can be flushed
- Files are flushed when all their chunks are committed to DB
- Database is consistent after each blob finalize
This reduces transaction count from O(chunks) to O(blobs), which for a
614k file / 44GB backup means ~50-100 transactions instead of ~560k.
SQLite handles crash recovery automatically when opening a database.
The previous recoverDatabase() function was deleting journal and WAL
files BEFORE opening the database, which prevented SQLite from
recovering incomplete transactions and caused database corruption
after Ctrl+C or crashes.
This was causing "database disk image is malformed" errors after
interrupting a backup operation.
Generate file UUIDs upfront in checkFileInMemory() rather than
deferring to Files.Create(). This ensures file_chunks and chunk_files
records have valid FileID values when constructed during file
processing, before the batch insert transaction.
Root cause: For new files, file.ID was empty when building the
fileChunks and chunkFiles slices. The ID was only generated later
in Files.Create(), but by then the slices already had empty FileID
values, causing FK constraint failures.
Also adds PROCESS.md documenting the snapshot creation lifecycle,
database transactions, and FK dependency ordering.
Load all known chunk hashes into an in-memory map at scan start,
eliminating per-chunk database queries during file processing.
This significantly improves performance when backing up many small files.
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s
- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.