Commit Graph

67 Commits

Author SHA1 Message Date
53ac868c5d Merge pull request 'fix: track and report file restore failures' (#22) from fix/restore-error-handling into main
Reviewed-on: #22
2026-02-20 11:19:40 +01:00
8c4ea2b870 Merge branch 'main' into fix/restore-error-handling 2026-02-20 11:19:21 +01:00
597b560398 Merge pull request 'Return errors from deleteSnapshotFromLocalDB instead of swallowing them (closes #25)' (#30) from fix/issue-25 into main
Reviewed-on: #30
2026-02-20 11:18:30 +01:00
1e2eced092 Merge branch 'main' into fix/issue-25 2026-02-20 11:18:06 +01:00
815b35c7ae Merge pull request 'Disk-based blob cache with LRU eviction during restore (closes #29)' (#34) from fix/issue-29 into main
Reviewed-on: #34
2026-02-20 11:16:15 +01:00
9c66674683 Merge branch 'main' into fix/issue-29 2026-02-20 11:15:59 +01:00
49de277648 Merge pull request 'Add CompressStream double-close regression test (closes #35)' (#36) from add-compressstream-regression-test into main
Reviewed-on: #36
2026-02-20 11:12:51 +01:00
ed5d777d05 fix: set disk cache max size to 4x configured blob size instead of hardcoded 10 GiB
The disk blob cache now uses 4 * BlobSizeLimit from config instead of a
hardcoded 10 GiB default. This ensures the cache scales with the
configured blob size.
2026-02-20 02:11:54 -08:00
2e7356dd85 Add CompressStream double-close regression test (closes #35)
Adds regression tests for issue #28 (fixed in PR #33) to prevent
reintroduction of the double-close bug in CompressStream.

Tests cover:
- CompressStream with normal input
- CompressStream with large (512KB) input
- CompressStream with empty input
- CompressData close correctness
2026-02-20 02:10:23 -08:00
70d4fe2aa0 Merge pull request 'Use v.Stdout/v.Stdin instead of os.Stdout for all user-facing output (closes #26)' (#31) from fix/issue-26 into main
Reviewed-on: #31
2026-02-20 11:07:52 +01:00
clawbot
2f249e3ddd fix: address review feedback — use helper wrappers, remove duplicates, fix scanStdin usage
- Replace bare fmt.Scanln with v.scanStdin() helper in snapshot.go
- Remove duplicate FetchBlob from vaultik.go (canonical version in blob_fetch_stub.go)
- Remove duplicate FetchAndDecryptBlob from restore.go (canonical version in blob_fetch_stub.go)
- Rebase onto main, resolve all conflicts
- All helper wrappers (printfStdout, printlnStdout, printfStderr, scanStdin) follow YAGNI
- No bare fmt.Print*/fmt.Scan* calls remain outside helpers
- make test passes: lint clean, all tests pass
2026-02-20 00:26:03 -08:00
clawbot
3f834f1c9c fix: resolve rebase conflicts, fix errcheck issues, implement FetchAndDecryptBlob 2026-02-20 00:19:13 -08:00
user
9879668c31 refactor: add helper wrappers for stdin/stdout/stderr IO
Address all four review concerns on PR #31:

1. Fix missed bare fmt.Println() in VerifySnapshotWithOptions (line 620)
2. Replace all direct fmt.Fprintf(v.Stdout,...) / fmt.Fprintln(v.Stdout,...) /
   fmt.Fscanln(v.Stdin,...) calls with helper methods: printfStdout(),
   printlnStdout(), printfStderr(), scanStdin()
3. Route progress bar and stderr output through v.Stderr instead of os.Stderr
   in restore.go (concern #4: v.Stderr now actually used)
4. Rename exported Outputf to unexported printfStdout (YAGNI: only helpers
   actually used are created)
2026-02-20 00:18:56 -08:00
clawbot
0a0d9f33b0 fix: use v.Stdout/v.Stdin instead of os.Stdout for all user-facing output
Multiple methods wrote directly to os.Stdout instead of using the injectable
v.Stdout writer, breaking the TestVaultik testing infrastructure and making
output impossible to capture or redirect.

Fixed in: ListSnapshots, PurgeSnapshots, VerifySnapshotWithOptions,
PruneBlobs, outputPruneBlobsJSON, outputRemoveJSON, ShowInfo, RemoteInfo.
2026-02-20 00:18:20 -08:00
df0e8c275b fix: replace in-memory blob cache with disk-based LRU cache (closes #29)
Blobs are typically hundreds of megabytes and should not be held in memory.
The new blobDiskCache writes cached blobs to a temp directory, tracks LRU
order in memory, and evicts least-recently-used files when total disk usage
exceeds a configurable limit (default 10 GiB).

Design:
- Blobs written to os.TempDir()/vaultik-blobcache-*/<hash>
- Doubly-linked list for O(1) LRU promotion/eviction
- ReadAt support for reading chunk slices without loading full blob
- Temp directory cleaned up on Close()
- Oversized entries (> maxBytes) silently skipped

Also adds blob_fetch_stub.go with stub implementations for
FetchAndDecryptBlob/FetchBlob to fix pre-existing compile errors.
2026-02-20 00:18:20 -08:00
clawbot
ddc23f8057 fix: return errors from deleteSnapshotFromLocalDB instead of swallowing them
Previously, deleteSnapshotFromLocalDB logged errors but always returned nil,
causing callers to believe deletion succeeded even when it failed. This could
lead to data inconsistency where remote metadata is deleted while local
records persist.

Now returns the first error encountered, allowing callers to handle failures
appropriately.
2026-02-19 23:55:27 -08:00
cafb3d45b8 fix: track and report file restore failures
Restore previously logged errors for individual files but returned
success even if files failed. Now tracks failed files in RestoreResult,
reports them in the summary output, and returns an error if any files
failed to restore.

Fixes #21
2026-02-19 23:52:22 -08:00
clawbot
d77ac18aaa fix: add missing printfStdout, printlnStdout, scanlnStdin, FetchBlob, and FetchAndDecryptBlob methods
These methods were referenced in main but never defined, causing compilation
failures. They were introduced by merges that assumed dependent PRs were
already merged.
2026-02-19 23:51:53 -08:00
825f25da58 Merge pull request 'Validate table name against allowlist in getTableCount (closes #27)' (#32) from fix/issue-27 into main
Reviewed-on: #32
2026-02-16 06:21:41 +01:00
162d76bb38 Merge branch 'main' into fix/issue-27 2026-02-16 06:17:51 +01:00
clawbot
bfd7334221 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:17:24 -08:00
user
9b32bf0846 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:15:49 -08:00
8adc668fa6 Merge pull request 'Prevent double-close of blobgen.Writer in CompressStream (closes #28)' (#33) from fix/issue-28 into main
Reviewed-on: #33
2026-02-16 06:04:33 +01:00
clawbot
441c441eca fix: prevent double-close of blobgen.Writer in CompressStream
CompressStream had both a defer w.Close() and an explicit w.Close() call,
causing the compressor and encryptor to be closed twice. The second close
on the zstd encoder returns an error, and the age encryptor may write
duplicate finalization bytes, potentially corrupting the output stream.

Use a closed flag to prevent the deferred close from running after the
explicit close succeeds.
2026-02-08 12:03:36 -08:00
clawbot
4d9f912a5f fix: validate table name against allowlist in getTableCount to prevent SQL injection
The getTableCount method used fmt.Sprintf to interpolate a table name directly
into a SQL query. While currently only called with hardcoded names, this is a
dangerous pattern. Added an allowlist of valid table names and return an error
for unrecognized names.
2026-02-08 12:03:18 -08:00
46c2ea3079 fix: remove dead deep-verify TODO stub, route to RunDeepVerify
The VerifySnapshotWithOptions method had a dead code path for opts.Deep
that printed 'not yet implemented' and returned nil. The CLI already
routes --deep to RunDeepVerify (which is fully implemented). Remove the
dead branch and update the VerifySnapshot convenience method to also
route deep=true to RunDeepVerify.

Fixes #2
2026-02-08 08:33:18 -08:00
470bf648c4 Add deterministic deduplication, rclone backend, and database purge command
- Implement deterministic blob hashing using double SHA256 of uncompressed
  plaintext data, enabling deduplication even after local DB is cleared
- Add Stat() check before blob upload to skip existing blobs in storage
- Add rclone storage backend for additional remote storage options
- Add 'vaultik database purge' command to erase local state DB
- Add 'vaultik remote check' command to verify remote connectivity
- Show configured snapshots in 'vaultik snapshot list' output
- Skip macOS resource fork files (._*) when listing remote snapshots
- Use multi-threaded zstd compression (CPUs - 2 threads)
- Add writer tests for double hashing behavior
2026-01-28 15:50:17 -08:00
bdaaadf990 Add --quiet flag, --json output, and config permission check
- Add global --quiet/-q flag to suppress non-error output
- Add --json flag to verify, snapshot rm, and prune commands
- Add config file permission check (warns if world/group readable)
- Update TODO.md to remove completed items
2026-01-16 09:20:29 -08:00
417b25a5f5 Add custom types, version command, and restore --verify flag
- Add internal/types package with type-safe wrappers for IDs, hashes,
  paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, go version
- Add `--verify` flag to restore command that checksums all restored
  files against expected chunk hashes with progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
2026-01-14 17:11:52 -08:00
2afd54d693 Add exclude patterns, snapshot prune, and other improvements
- Implement exclude patterns with anchored pattern support:
  - Patterns starting with / only match from root of source dir
  - Unanchored patterns match anywhere in path
  - Support for glob patterns (*.log, .*, **/*.pack)
  - Directory patterns skip entire subtrees
  - Add gobwas/glob dependency for pattern matching
  - Add 16 comprehensive tests for exclude functionality

- Add snapshot prune command to clean orphaned data:
  - Removes incomplete snapshots from database
  - Cleans orphaned files, chunks, and blobs
  - Runs automatically at backup start for consistency

- Add snapshot remove command for deleting snapshots

- Add VAULTIK_AGE_SECRET_KEY environment variable support

- Fix duplicate fx module provider in restore command

- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
2026-01-01 05:42:56 -08:00
05286bed01 Batch transactions per blob for improved performance
Previously, each chunk and blob_chunk was inserted in a separate
transaction, leading to ~560k+ transactions for large backups.
This change batches all database operations per blob:

- Chunks are queued in packer.pendingChunks during file processing
- When blob finalizes, one transaction inserts all chunks, blob_chunks,
  and updates the blob record
- Scanner tracks pending chunk hashes to know which files can be flushed
- Files are flushed when all their chunks are committed to DB
- Database is consistent after each blob finalize

This reduces transaction count from O(chunks) to O(blobs), which for a
614k file / 44GB backup means ~50-100 transactions instead of ~560k.
2025-12-23 19:07:26 +07:00
f2c120f026 Merge feature/pluggable-storage-backend
- Add pluggable storage backend with file:// URL support
- Fix FK constraint errors in batched file insertion
- Cache chunk hashes in memory for faster lookups
- Remove dangerous database recovery that corrupted DBs after Ctrl+C
- Add PROCESS.md documenting snapshot creation lifecycle
2025-12-23 18:50:21 +07:00
bbe09ec5b5 Remove dangerous database recovery that deleted journal/WAL files
SQLite handles crash recovery automatically when opening a database.
The previous recoverDatabase() function was deleting journal and WAL
files BEFORE opening the database, which prevented SQLite from
recovering incomplete transactions and caused database corruption
after Ctrl+C or crashes.

This was causing "database disk image is malformed" errors after
interrupting a backup operation.
2025-12-23 09:16:01 +07:00
43a69c2cfb Fix FK constraint errors in batched file insertion
Generate file UUIDs upfront in checkFileInMemory() rather than
deferring to Files.Create(). This ensures file_chunks and chunk_files
records have valid FileID values when constructed during file
processing, before the batch insert transaction.

Root cause: For new files, file.ID was empty when building the
fileChunks and chunkFiles slices. The ID was only generated later
in Files.Create(), but by then the slices already had empty FileID
values, causing FK constraint failures.

Also adds PROCESS.md documenting the snapshot creation lifecycle,
database transactions, and FK dependency ordering.
2025-12-19 19:48:48 +07:00
899448e1da Cache chunk hashes in memory for faster small file processing
Load all known chunk hashes into an in-memory map at scan start,
eliminating per-chunk database queries during file processing.
This significantly improves performance when backing up many small files.
2025-12-19 12:56:04 +07:00
24c5e8c5a6 Refactor: Create file records only after successful chunking
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
2025-12-19 12:40:45 +07:00
40fff09594 Update progress output format with compact file counts
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s

- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
2025-12-19 12:33:38 +07:00
8a8651c690 Fix foreign key error when deleting incomplete snapshots
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.
2025-12-19 12:27:05 +07:00
a1d559c30d Improve processing progress output with bytes and blob messages
- Show bytes processed/total instead of just files
- Display data rate in bytes/sec
- Calculate ETA based on bytes (more accurate than files)
- Print message when each blob is stored with size and speed
2025-12-19 12:24:55 +07:00
88e2508dc7 Eliminate redundant filesystem traversal in scan phase
Remove the separate enumerateFiles() function that was doing a full
directory walk using Readdir() which calls stat() on every file.
Instead, build the existingFiles map during the scan phase walk,
and detect deleted files afterward.

This eliminates one full filesystem traversal, significantly speeding
up the scan phase for large directories.
2025-12-19 12:15:13 +07:00
c3725e745e Optimize scan phase: in-memory change detection and batched DB writes
Performance improvements:
- Load all known files from DB into memory at startup
- Check file changes against in-memory map (no per-file DB queries)
- Batch database writes in groups of 1000 files per transaction
- Scan phase now only counts regular files, not directories

This should improve scan speed from ~600 files/sec to potentially
10,000+ files/sec by eliminating per-file database round trips.
2025-12-19 12:08:47 +07:00
badc0c07e0 Add pluggable storage backend, PID locking, and improved scan progress
Storage backend:
- Add internal/storage package with Storer interface
- Implement FileStorer for local filesystem storage (file:// URLs)
- Implement S3Storer wrapping existing s3.Client
- Support storage_url config field (s3:// or file://)
- Migrate all consumers to use storage.Storer interface

PID locking:
- Add internal/pidlock package to prevent concurrent instances
- Acquire lock before app start, release on exit
- Detect stale locks from crashed processes

Scan progress improvements:
- Add fast file enumeration pass before stat() phase
- Use enumerated set for deletion detection (no extra filesystem access)
- Show progress with percentage, files/sec, elapsed time, and ETA
- Change "changed" to "changed/new" for clarity

Config improvements:
- Add tilde expansion for paths (~/)
- Use xdg library for platform-specific default index path
2025-12-19 11:52:51 +07:00
cda0cf865a Add ARCHITECTURE.md documenting internal design
Document the data model, type instantiation flow, and module
responsibilities. Covers chunker, packer, vaultik, cli, snapshot,
and database modules with detailed explanations of relationships
between File, Chunk, Blob, and Snapshot entities.
2025-12-18 19:49:42 -08:00
0736bd070b Add godoc documentation to exported types and methods
Add proper godoc comments to exported items in:
- internal/globals: Appname, Version, Commit variables; Globals type; New function
- internal/log: LogLevel type; level constants; Config type; Initialize, Fatal,
  Error, Warn, Notice, Info, Debug functions and variants; TTYHandler type and
  methods; Module variable; LogOptions type
2025-12-18 18:51:52 -08:00
d7cd9aac27 Add end-to-end integration tests for Vaultik
- Create comprehensive integration tests with mock S3 client
- Add in-memory filesystem and SQLite database support for testing
- Test full backup workflow including chunking, packing, and uploading
- Add test to verify encrypted blob content
- Fix scanner to use afero filesystem for temp file cleanup
- Demonstrate successful backup and verification with mock dependencies
2025-07-26 15:52:23 +02:00
bb38f8c5d6 Integrate afero filesystem abstraction library
- Add afero.Fs field to Vaultik struct for filesystem operations
- Vaultik now owns and manages the filesystem instance
- SnapshotManager receives filesystem via SetFilesystem() setter
- Update blob packer to use afero for temporary files
- Convert all filesystem operations to use afero abstraction
- Remove filesystem module - Vaultik manages filesystem directly
- Update tests: remove symlink test (unsupported by afero memfs)
- Fix TestMultipleFileChanges to handle scanner examining directories

This enables full end-to-end testing without touching disk by using
memory-backed filesystems. Database operations continue using real
filesystem as SQLite requires actual files.
2025-07-26 15:33:18 +02:00
e29a995120 Refactor: Move Vaultik struct and methods to internal/vaultik package
- Created new internal/vaultik package with unified Vaultik struct
- Moved all command methods (snapshot, info, prune, verify) from CLI to vaultik package
- Implemented single constructor that handles crypto capabilities automatically
- Added CanDecrypt() method to check if decryption is available
- Updated all CLI commands to use the new vaultik.Vaultik struct
- Removed old fragmented App structs and WithCrypto wrapper
- Fixed context management - Vaultik now owns its context lifecycle
- Cleaned up package imports and dependencies

This creates a cleaner separation between CLI/Cobra code and business logic,
with all vaultik operations now centralized in the internal/vaultik package.
2025-07-26 14:47:26 +02:00
5c70405a85 Fix snapshot list to fail on manifest errors
- Remove error suppression for manifest decoding errors
- Manifest read/deserialize errors now fail immediately with clear error messages
- This ensures we catch format mismatches and other issues early
2025-07-26 03:31:09 +02:00
a544fa80f2 Major refactoring: Updated manifest format and renamed backup to snapshot
- Created manifest.go with proper Manifest structure including blob sizes
- Updated manifest generation to include compressed size for each blob
- Added TotalCompressedSize field to manifest for quick access
- Renamed backup package to snapshot for clarity
- Updated snapshot list to show all remote snapshots
- Remote snapshots not in local DB fetch manifest to get size
- Local snapshots not in remote are automatically deleted
- Removed backwards compatibility code (pre-1.0, no users)
- Fixed prune command to use new manifest format
- Updated all imports and references from backup to snapshot
2025-07-26 03:27:47 +02:00
c07d8eec0a Fix snapshot list to not download manifests
- Removed unnecessary manifest downloads from snapshot list command
- Removed blob size calculation from listing operation
- Removed COMPRESSED SIZE column from output since we're not calculating it
- This makes snapshot list much faster and avoids 404 errors for old snapshots
2025-07-26 03:16:18 +02:00