52 Commits

Author SHA1 Message Date
9c66674683 Merge branch 'main' into fix/issue-29 2026-02-20 11:15:59 +01:00
49de277648 Merge pull request 'Add CompressStream double-close regression test (closes #35)' (#36) from add-compressstream-regression-test into main
Reviewed-on: #36
2026-02-20 11:12:51 +01:00
ed5d777d05 fix: set disk cache max size to 4x configured blob size instead of hardcoded 10 GiB
The disk blob cache now uses 4 * BlobSizeLimit from config instead of a
hardcoded 10 GiB default. This ensures the cache scales with the
configured blob size.
2026-02-20 02:11:54 -08:00
2e7356dd85 Add CompressStream double-close regression test (closes #35)
Adds regression tests for issue #28 (fixed in PR #33) to prevent
reintroduction of the double-close bug in CompressStream.

Tests cover:
- CompressStream with normal input
- CompressStream with large (512KB) input
- CompressStream with empty input
- CompressData close correctness
2026-02-20 02:10:23 -08:00
70d4fe2aa0 Merge pull request 'Use v.Stdout/v.Stdin instead of os.Stdout for all user-facing output (closes #26)' (#31) from fix/issue-26 into main
Reviewed-on: #31
2026-02-20 11:07:52 +01:00
clawbot
2f249e3ddd fix: address review feedback — use helper wrappers, remove duplicates, fix scanStdin usage
- Replace bare fmt.Scanln with v.scanStdin() helper in snapshot.go
- Remove duplicate FetchBlob from vaultik.go (canonical version in blob_fetch_stub.go)
- Remove duplicate FetchAndDecryptBlob from restore.go (canonical version in blob_fetch_stub.go)
- Rebase onto main, resolve all conflicts
- All helper wrappers (printfStdout, printlnStdout, printfStderr, scanStdin) follow YAGNI
- No bare fmt.Print*/fmt.Scan* calls remain outside helpers
- make test passes: lint clean, all tests pass
2026-02-20 00:26:03 -08:00
clawbot
3f834f1c9c fix: resolve rebase conflicts, fix errcheck issues, implement FetchAndDecryptBlob 2026-02-20 00:19:13 -08:00
user
9879668c31 refactor: add helper wrappers for stdin/stdout/stderr IO
Address all four review concerns on PR #31:

1. Fix missed bare fmt.Println() in VerifySnapshotWithOptions (line 620)
2. Replace all direct fmt.Fprintf(v.Stdout,...) / fmt.Fprintln(v.Stdout,...) /
   fmt.Fscanln(v.Stdin,...) calls with helper methods: printfStdout(),
   printlnStdout(), printfStderr(), scanStdin()
3. Route progress bar and stderr output through v.Stderr instead of os.Stderr
   in restore.go (concern #4: v.Stderr now actually used)
4. Rename exported Outputf to unexported printfStdout (YAGNI: only helpers
   actually used are created)
2026-02-20 00:18:56 -08:00
clawbot
0a0d9f33b0 fix: use v.Stdout/v.Stdin instead of os.Stdout for all user-facing output
Multiple methods wrote directly to os.Stdout instead of using the injectable
v.Stdout writer, breaking the TestVaultik testing infrastructure and making
output impossible to capture or redirect.

Fixed in: ListSnapshots, PurgeSnapshots, VerifySnapshotWithOptions,
PruneBlobs, outputPruneBlobsJSON, outputRemoveJSON, ShowInfo, RemoteInfo.
2026-02-20 00:18:20 -08:00
df0e8c275b fix: replace in-memory blob cache with disk-based LRU cache (closes #29)
Blobs are typically hundreds of megabytes and should not be held in memory.
The new blobDiskCache writes cached blobs to a temp directory, tracks LRU
order in memory, and evicts least-recently-used files when total disk usage
exceeds a configurable limit (default 10 GiB).

Design:
- Blobs written to os.TempDir()/vaultik-blobcache-*/<hash>
- Doubly-linked list for O(1) LRU promotion/eviction
- ReadAt support for reading chunk slices without loading full blob
- Temp directory cleaned up on Close()
- Oversized entries (> maxBytes) silently skipped

Also adds blob_fetch_stub.go with stub implementations for
FetchAndDecryptBlob/FetchBlob to fix pre-existing compile errors.
2026-02-20 00:18:20 -08:00
clawbot
d77ac18aaa fix: add missing printfStdout, printlnStdout, scanlnStdin, FetchBlob, and FetchAndDecryptBlob methods
These methods were referenced in main but never defined, causing compilation
failures. They were introduced by merges that assumed dependent PRs were
already merged.
2026-02-19 23:51:53 -08:00
825f25da58 Merge pull request 'Validate table name against allowlist in getTableCount (closes #27)' (#32) from fix/issue-27 into main
Reviewed-on: #32
2026-02-16 06:21:41 +01:00
162d76bb38 Merge branch 'main' into fix/issue-27 2026-02-16 06:17:51 +01:00
clawbot
bfd7334221 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:17:24 -08:00
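The sanitization approach described in this commit can be sketched as follows; the function name and query shape are illustrative, not vaultik's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// validTable accepts only [a-z0-9_] table names before interpolation,
// replacing a hardcoded allowlist that needed updating per new table.
var validTable = regexp.MustCompile(`^[a-z0-9_]+$`)

// countQuery builds a COUNT query only for sanitized table names.
func countQuery(table string) (string, error) {
	if !validTable.MatchString(table) {
		return "", fmt.Errorf("invalid table name: %q", table)
	}
	return fmt.Sprintf("SELECT COUNT(*) FROM %s", table), nil
}

func main() {
	q, err := countQuery("blob_chunks")
	fmt.Println(q, err)
	_, err = countQuery("blobs; DROP TABLE blobs")
	fmt.Println(err) // rejected: semicolons and spaces fail the regexp
}
```

Table names cannot be passed as SQL placeholders, so sanitizing before interpolation is the standard defence here.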
user
9b32bf0846 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:15:49 -08:00
8adc668fa6 Merge pull request 'Prevent double-close of blobgen.Writer in CompressStream (closes #28)' (#33) from fix/issue-28 into main
Reviewed-on: #33
2026-02-16 06:04:33 +01:00
clawbot
441c441eca fix: prevent double-close of blobgen.Writer in CompressStream
CompressStream had both a defer w.Close() and an explicit w.Close() call,
causing the compressor and encryptor to be closed twice. The second close
on the zstd encoder returns an error, and the age encryptor may write
duplicate finalization bytes, potentially corrupting the output stream.

Use a closed flag to prevent the deferred close from running after the
explicit close succeeds.
2026-02-08 12:03:36 -08:00
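The closed-flag pattern described above can be sketched like this, with a stand-in writer in place of `blobgen.Writer` (the stand-in and function body are illustrative, not vaultik's actual code):

```go
package main

import "fmt"

// doubleCloser stands in for blobgen.Writer: its second Close errors,
// like the zstd encoder described above.
type doubleCloser struct{ closes int }

func (d *doubleCloser) Close() error {
	d.closes++
	if d.closes > 1 {
		return fmt.Errorf("already closed")
	}
	return nil
}

// compressStream shows the fix: a closed flag makes the deferred Close a
// no-op once the explicit Close has succeeded, so the writer is closed
// exactly once on both the happy path and error paths.
func compressStream(w *doubleCloser) (err error) {
	closed := false
	defer func() {
		if !closed { // only runs if the explicit close never happened
			if cerr := w.Close(); err == nil {
				err = cerr
			}
		}
	}()
	// ... write compressed, encrypted data ...
	if err := w.Close(); err != nil { // explicit close on the happy path
		return err
	}
	closed = true
	return nil
}

func main() {
	w := &doubleCloser{}
	fmt.Println(compressStream(w), w.closes) // <nil> 1
}
```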
clawbot
4d9f912a5f fix: validate table name against allowlist in getTableCount to prevent SQL injection
The getTableCount method used fmt.Sprintf to interpolate a table name directly
into a SQL query. While currently only called with hardcoded names, this is a
dangerous pattern. Added an allowlist of valid table names and return an error
for unrecognized names.
2026-02-08 12:03:18 -08:00
46c2ea3079 fix: remove dead deep-verify TODO stub, route to RunDeepVerify
The VerifySnapshotWithOptions method had a dead code path for opts.Deep
that printed 'not yet implemented' and returned nil. The CLI already
routes --deep to RunDeepVerify (which is fully implemented). Remove the
dead branch and update the VerifySnapshot convenience method to also
route deep=true to RunDeepVerify.

Fixes #2
2026-02-08 08:33:18 -08:00
470bf648c4 Add deterministic deduplication, rclone backend, and database purge command
- Implement deterministic blob hashing using double SHA256 of uncompressed
  plaintext data, enabling deduplication even after local DB is cleared
- Add Stat() check before blob upload to skip existing blobs in storage
- Add rclone storage backend for additional remote storage options
- Add 'vaultik database purge' command to erase local state DB
- Add 'vaultik remote check' command to verify remote connectivity
- Show configured snapshots in 'vaultik snapshot list' output
- Skip macOS resource fork files (._*) when listing remote snapshots
- Use multi-threaded zstd compression (CPUs - 2 threads)
- Add writer tests for double hashing behavior
2026-01-28 15:50:17 -08:00
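The deterministic double-SHA256 scheme in the first bullet can be sketched in a few lines; the function name is illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey computes SHA256(SHA256(plaintext)). Because the key depends only
// on the uncompressed plaintext, identical data always yields the same blob
// name, so a Stat() against remote storage can detect an existing blob and
// skip the upload even after the local DB has been cleared.
func blobKey(plaintext []byte) string {
	first := sha256.Sum256(plaintext)
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

func main() {
	a := blobKey([]byte("hello"))
	b := blobKey([]byte("hello"))
	fmt.Println(a == b, len(a)) // true 64: deterministic, 64 hex chars
}
```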
bdaaadf990 Add --quiet flag, --json output, and config permission check
- Add global --quiet/-q flag to suppress non-error output
- Add --json flag to verify, snapshot rm, and prune commands
- Add config file permission check (warns if world/group readable)
- Update TODO.md to remove completed items
2026-01-16 09:20:29 -08:00
417b25a5f5 Add custom types, version command, and restore --verify flag
- Add internal/types package with type-safe wrappers for IDs, hashes,
  paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, go version
- Add `--verify` flag to restore command that checksums all restored
  files against expected chunk hashes with progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
2026-01-14 17:11:52 -08:00
2afd54d693 Add exclude patterns, snapshot prune, and other improvements
- Implement exclude patterns with anchored pattern support:
  - Patterns starting with / only match from root of source dir
  - Unanchored patterns match anywhere in path
  - Support for glob patterns (*.log, .*, **/*.pack)
  - Directory patterns skip entire subtrees
  - Add gobwas/glob dependency for pattern matching
  - Add 16 comprehensive tests for exclude functionality

- Add snapshot prune command to clean orphaned data:
  - Removes incomplete snapshots from database
  - Cleans orphaned files, chunks, and blobs
  - Runs automatically at backup start for consistency

- Add snapshot remove command for deleting snapshots

- Add VAULTIK_AGE_SECRET_KEY environment variable support

- Fix duplicate fx module provider in restore command

- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
2026-01-01 05:42:56 -08:00
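The anchoring rule described in the first bullet group can be sketched with the standard library's `path.Match` (vaultik itself uses gobwas/glob, which additionally supports `**`); the function name and matching details are illustrative:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded applies the anchoring rule: patterns starting with "/" only
// match from the root of the source dir, while unanchored patterns match
// anywhere in the path.
func excluded(pattern, rel string) bool {
	if strings.HasPrefix(pattern, "/") {
		ok, _ := path.Match(strings.TrimPrefix(pattern, "/"), rel)
		return ok
	}
	// Unanchored: try the pattern against every trailing sub-path.
	parts := strings.Split(rel, "/")
	for i := range parts {
		if ok, _ := path.Match(pattern, strings.Join(parts[i:], "/")); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(excluded("*.log", "var/app/debug.log")) // true: matches anywhere
	fmt.Println(excluded("/cache", "home/user/cache"))  // false: anchored to root
	fmt.Println(excluded("/cache", "cache"))            // true: at root
}
```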
05286bed01 Batch transactions per blob for improved performance
Previously, each chunk and blob_chunk was inserted in a separate
transaction, leading to ~560k+ transactions for large backups.
This change batches all database operations per blob:

- Chunks are queued in packer.pendingChunks during file processing
- When blob finalizes, one transaction inserts all chunks, blob_chunks,
  and updates the blob record
- Scanner tracks pending chunk hashes to know which files can be flushed
- Files are flushed when all their chunks are committed to DB
- Database is consistent after each blob finalize

This reduces transaction count from O(chunks) to O(blobs), which for a
614k file / 44GB backup means ~50-100 transactions instead of ~560k.
2025-12-23 19:07:26 +07:00
f2c120f026 Merge feature/pluggable-storage-backend
- Add pluggable storage backend with file:// URL support
- Fix FK constraint errors in batched file insertion
- Cache chunk hashes in memory for faster lookups
- Remove dangerous database recovery that corrupted DBs after Ctrl+C
- Add PROCESS.md documenting snapshot creation lifecycle
2025-12-23 18:50:21 +07:00
bbe09ec5b5 Remove dangerous database recovery that deleted journal/WAL files
SQLite handles crash recovery automatically when opening a database.
The previous recoverDatabase() function was deleting journal and WAL
files BEFORE opening the database, which prevented SQLite from
recovering incomplete transactions and caused database corruption
after Ctrl+C or crashes.

This was causing "database disk image is malformed" errors after
interrupting a backup operation.
2025-12-23 09:16:01 +07:00
43a69c2cfb Fix FK constraint errors in batched file insertion
Generate file UUIDs upfront in checkFileInMemory() rather than
deferring to Files.Create(). This ensures file_chunks and chunk_files
records have valid FileID values when constructed during file
processing, before the batch insert transaction.

Root cause: For new files, file.ID was empty when building the
fileChunks and chunkFiles slices. The ID was only generated later
in Files.Create(), but by then the slices already had empty FileID
values, causing FK constraint failures.

Also adds PROCESS.md documenting the snapshot creation lifecycle,
database transactions, and FK dependency ordering.
2025-12-19 19:48:48 +07:00
899448e1da Cache chunk hashes in memory for faster small file processing
Load all known chunk hashes into an in-memory map at scan start,
eliminating per-chunk database queries during file processing.
This significantly improves performance when backing up many small files.
2025-12-19 12:56:04 +07:00
24c5e8c5a6 Refactor: Create file records only after successful chunking
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
2025-12-19 12:40:45 +07:00
40fff09594 Update progress output format with compact file counts
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s

- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
2025-12-19 12:33:38 +07:00
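The compact k/M count formatting shown above could look roughly like this; the exact rounding thresholds are assumptions, not vaultik's actual code:

```go
package main

import "fmt"

// compactCount renders file counts with k/M suffixes, e.g. 5700 -> "5.7k"
// and 610000 -> "610k", matching the bracketed counts in the sample line.
func compactCount(n int64) string {
	switch {
	case n >= 1_000_000:
		return fmt.Sprintf("%.1fM", float64(n)/1_000_000)
	case n >= 10_000:
		return fmt.Sprintf("%.0fk", float64(n)/1_000)
	case n >= 1_000:
		return fmt.Sprintf("%.1fk", float64(n)/1_000)
	default:
		return fmt.Sprintf("%d", n)
	}
}

func main() {
	fmt.Printf("Progress [%s/%s]\n", compactCount(5700), compactCount(610000))
}
```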
8a8651c690 Fix foreign key error when deleting incomplete snapshots
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.
2025-12-19 12:27:05 +07:00
a1d559c30d Improve processing progress output with bytes and blob messages
- Show bytes processed/total instead of just files
- Display data rate in bytes/sec
- Calculate ETA based on bytes (more accurate than files)
- Print message when each blob is stored with size and speed
2025-12-19 12:24:55 +07:00
88e2508dc7 Eliminate redundant filesystem traversal in scan phase
Remove the separate enumerateFiles() function that was doing a full
directory walk using Readdir() which calls stat() on every file.
Instead, build the existingFiles map during the scan phase walk,
and detect deleted files afterward.

This eliminates one full filesystem traversal, significantly speeding
up the scan phase for large directories.
2025-12-19 12:15:13 +07:00
c3725e745e Optimize scan phase: in-memory change detection and batched DB writes
Performance improvements:
- Load all known files from DB into memory at startup
- Check file changes against in-memory map (no per-file DB queries)
- Batch database writes in groups of 1000 files per transaction
- Scan phase now only counts regular files, not directories

This should improve scan speed from ~600 files/sec to potentially
10,000+ files/sec by eliminating per-file database round trips.
2025-12-19 12:08:47 +07:00
badc0c07e0 Add pluggable storage backend, PID locking, and improved scan progress
Storage backend:
- Add internal/storage package with Storer interface
- Implement FileStorer for local filesystem storage (file:// URLs)
- Implement S3Storer wrapping existing s3.Client
- Support storage_url config field (s3:// or file://)
- Migrate all consumers to use storage.Storer interface

PID locking:
- Add internal/pidlock package to prevent concurrent instances
- Acquire lock before app start, release on exit
- Detect stale locks from crashed processes

Scan progress improvements:
- Add fast file enumeration pass before stat() phase
- Use enumerated set for deletion detection (no extra filesystem access)
- Show progress with percentage, files/sec, elapsed time, and ETA
- Change "changed" to "changed/new" for clarity

Config improvements:
- Add tilde expansion for paths (~/)
- Use xdg library for platform-specific default index path
2025-12-19 11:52:51 +07:00
cda0cf865a Add ARCHITECTURE.md documenting internal design
Document the data model, type instantiation flow, and module
responsibilities. Covers chunker, packer, vaultik, cli, snapshot,
and database modules with detailed explanations of relationships
between File, Chunk, Blob, and Snapshot entities.
2025-12-18 19:49:42 -08:00
0736bd070b Add godoc documentation to exported types and methods
Add proper godoc comments to exported items in:
- internal/globals: Appname, Version, Commit variables; Globals type; New function
- internal/log: LogLevel type; level constants; Config type; Initialize, Fatal,
  Error, Warn, Notice, Info, Debug functions and variants; TTYHandler type and
  methods; Module variable; LogOptions type
2025-12-18 18:51:52 -08:00
d7cd9aac27 Add end-to-end integration tests for Vaultik
- Create comprehensive integration tests with mock S3 client
- Add in-memory filesystem and SQLite database support for testing
- Test full backup workflow including chunking, packing, and uploading
- Add test to verify encrypted blob content
- Fix scanner to use afero filesystem for temp file cleanup
- Demonstrate successful backup and verification with mock dependencies
2025-07-26 15:52:23 +02:00
bb38f8c5d6 Integrate afero filesystem abstraction library
- Add afero.Fs field to Vaultik struct for filesystem operations
- Vaultik now owns and manages the filesystem instance
- SnapshotManager receives filesystem via SetFilesystem() setter
- Update blob packer to use afero for temporary files
- Convert all filesystem operations to use afero abstraction
- Remove filesystem module - Vaultik manages filesystem directly
- Update tests: remove symlink test (unsupported by afero memfs)
- Fix TestMultipleFileChanges to handle scanner examining directories

This enables full end-to-end testing without touching disk by using
memory-backed filesystems. Database operations continue using real
filesystem as SQLite requires actual files.
2025-07-26 15:33:18 +02:00
e29a995120 Refactor: Move Vaultik struct and methods to internal/vaultik package
- Created new internal/vaultik package with unified Vaultik struct
- Moved all command methods (snapshot, info, prune, verify) from CLI to vaultik package
- Implemented single constructor that handles crypto capabilities automatically
- Added CanDecrypt() method to check if decryption is available
- Updated all CLI commands to use the new vaultik.Vaultik struct
- Removed old fragmented App structs and WithCrypto wrapper
- Fixed context management - Vaultik now owns its context lifecycle
- Cleaned up package imports and dependencies

This creates a cleaner separation between CLI/Cobra code and business logic,
with all vaultik operations now centralized in the internal/vaultik package.
2025-07-26 14:47:26 +02:00
5c70405a85 Fix snapshot list to fail on manifest errors
- Remove error suppression for manifest decoding errors
- Manifest read/deserialize errors now fail immediately with clear error messages
- This ensures we catch format mismatches and other issues early
2025-07-26 03:31:09 +02:00
a544fa80f2 Major refactoring: Updated manifest format and renamed backup to snapshot
- Created manifest.go with proper Manifest structure including blob sizes
- Updated manifest generation to include compressed size for each blob
- Added TotalCompressedSize field to manifest for quick access
- Renamed backup package to snapshot for clarity
- Updated snapshot list to show all remote snapshots
- Remote snapshots not in local DB fetch manifest to get size
- Local snapshots not in remote are automatically deleted
- Removed backwards compatibility code (pre-1.0, no users)
- Fixed prune command to use new manifest format
- Updated all imports and references from backup to snapshot
2025-07-26 03:27:47 +02:00
c07d8eec0a Fix snapshot list to not download manifests
- Removed unnecessary manifest downloads from snapshot list command
- Removed blob size calculation from listing operation
- Removed COMPRESSED SIZE column from output since we're not calculating it
- This makes snapshot list much faster and avoids 404 errors for old snapshots
2025-07-26 03:16:18 +02:00
0cbb5aa0a6 Update snapshot list to sync with remote
- Added syncWithRemote method that lists remote snapshots from S3
- Removes local snapshots that don't exist in remote storage
- Ensures local database stays in sync with actual remote state
- This prevents showing snapshots that have been deleted from S3
2025-07-26 03:14:20 +02:00
fb220685a2 Fix manifest generation to not encrypt manifests
- Manifests are now only compressed (not encrypted) so pruning operations can work without private keys
- Updated generateBlobManifest to use zstd compression directly
- Updated prune command to handle unencrypted manifests
- Updated snapshot list command to handle new manifest format
- Updated documentation to reflect manifest.json.zst (not .age)
- Removed unnecessary VAULTIK_PRIVATE_KEY check from prune command
2025-07-26 02:54:52 +02:00
1d027bde57 Fix prune command to use config file for bucket and prefix
- Remove --bucket and --prefix command line flags
- Use bucket and prefix from S3 configuration in config file
- Update command to follow same pattern as other commands
- Maintain consistency that all configuration comes from config file
2025-07-26 02:41:00 +02:00
bb2292de7f Fix file content change handling and improve log messages
- Delete old file_chunks and chunk_files when file content changes
- Add DeleteByFileID method to ChunkFileRepository
- Add tests to verify old chunks are properly disassociated
- Make log messages more precise throughout scanner and snapshot
- Support metadata-only snapshots when no files have changed
- Add periodic status output during scan and snapshot operations
- Improve scan summary output with clearer information
2025-07-26 02:38:50 +02:00
d3afa65420 Fix foreign key constraints and improve snapshot tracking
- Add unified compression/encryption package in internal/blobgen
- Update DATAMODEL.md to reflect current schema implementation
- Refactor snapshot cleanup into well-named methods for clarity
- Add snapshot_id to uploads table to track new blobs per snapshot
- Fix blob count reporting for incremental backups
- Add DeleteOrphaned method to BlobChunkRepository
- Fix cleanup order to respect foreign key constraints
- Update tests to reflect schema changes
2025-07-26 02:22:25 +02:00
78af626759 Major refactoring: UUID-based storage, streaming architecture, and CLI improvements
This commit represents a significant architectural overhaul of vaultik:

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
2025-07-22 14:56:44 +02:00
86b533d6ee Refactor blob storage to use UUID primary keys and implement streaming chunking
- Changed blob table to use ID (UUID) as primary key instead of hash
- Blob records are now created at packing start, enabling immediate chunk associations
- Implemented streaming chunking to process large files without memory exhaustion
- Fixed blob manifest generation to include all referenced blobs
- Updated all foreign key references from blob_hash to blob_id
- Added progress reporting and improved error handling
- Enforced encryption requirement for all blob packing
- Updated tests to use test encryption keys
- Added Cyrillic transliteration to README
2025-07-22 07:43:39 +02:00
26db096913 Move StartTime initialization to application startup hook
- Remove StartTime initialization from globals.New()
- Add setupGlobals function in app.go to set StartTime during fx OnStart
- Simplify globals package to be just a key/value store
- Remove fx dependencies from globals test
2025-07-20 12:05:24 +02:00
36c59cb7b3 Set up S3 testing infrastructure for backup implementation
- Add gofakes3 for in-process S3-compatible test server
- Create test server that runs on localhost:9999 with temp directory
- Implement basic S3 client wrapper with standard operations
- Add comprehensive tests for blob and metadata storage patterns
- Test cleanup properly removes temporary directories
- Use AWS SDK v2 for S3 operations with proper error handling
2025-07-20 11:19:16 +02:00
121 changed files with 23571 additions and 1629 deletions

ARCHITECTURE.md Normal file


@@ -0,0 +1,380 @@
# Vaultik Architecture
This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.
## Overview
Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using [uber-go/fx](https://github.com/uber-go/fx).
## Data Flow
```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │   Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │   Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │   Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │   Uploads blobs to remote storage
└─────────────────┘
```
## Data Model
### Core Entities
The database tracks five primary entities and their relationships:
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Snapshot   │─────▶│     File     │─────▶│    Chunk     │
└──────┬───────┘      └──────────────┘      └──────┬───────┘
       │                                           │
       ▼                                           ▼
┌──────────────┐                            ┌──────────────┐
│     Blob     │◀───────────────────────────│  BlobChunk   │
└──────────────┘                            └──────────────┘
```
### Entity Descriptions
#### File (`database.File`)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)
#### Chunk (`database.Chunk`)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- `ChunkHash`: SHA256 hash of chunk content (primary key)
- `Size`: Chunk size in bytes
Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for 64KB average).
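As a worked example of the bound derivation above (a sketch; the helper name is illustrative):

```go
package main

import "fmt"

// chunkBounds derives the FastCDC size bounds stated above:
// min = avg/4, max = avg*4, so a 64 KiB average yields 16 KiB-256 KiB.
func chunkBounds(avg int) (min, max int) {
	return avg / 4, avg * 4
}

func main() {
	min, max := chunkBounds(64 * 1024)
	fmt.Println(min, max) // 16384 262144
}
```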
#### FileChunk (`database.FileChunk`)
Maps files to their constituent chunks:
- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk
#### Blob (`database.Blob`)
The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:
- `ID`: UUID assigned at creation
- `Hash`: SHA256 of final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps
Blob creation process:
1. Chunks are accumulated (up to MaxBlobSize, typically 10GB)
2. Compressed with zstd
3. Encrypted with age (recipients configured in config)
4. SHA256 hash computed → becomes filename in S3
5. Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
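Steps 4-5 above can be sketched as a small key-derivation helper (the function name is illustrative, not vaultik's actual code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobPath fans blobs out under two levels of hex-pair directories,
// derived from the blob's SHA256 digest, to keep S3 prefixes shallow.
func blobPath(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	sum := sha256.Sum256([]byte("example blob content"))
	h := hex.EncodeToString(sum[:])
	fmt.Println(blobPath(h))
}
```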
#### BlobChunk (`database.BlobChunk`)
Maps chunks to their position within blobs:
- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size
#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until snapshot finishes successfully
#### SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.
### Relationship Summary
```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File 1──────────▶ N FileChunk N ◀────────── 1 Chunk
Blob 1──────────▶ N BlobChunk N ◀────────── 1 Chunk
```
## Type Instantiation
### Application Startup
The CLI uses fx for dependency injection. Here's the instantiation order:
```go
// cli/app.go: NewApp()
fx.New(
fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
fx.Supply(opts.LogOptions), // 2. Log options
fx.Provide(globals.New), // 3. Globals
fx.Provide(log.New), // 4. Logger config
config.Module, // 5. Config
database.Module, // 6. Database + Repositories
log.Module, // 7. Logger initialization
s3.Module, // 8. S3 client
snapshot.Module, // 9. SnapshotManager + ScannerFactory
fx.Provide(vaultik.New), // 10. Vaultik orchestrator
)
```
### Key Type Instantiation Points
#### 1. Config (`config.Config`)
- **Created by**: `config.Module` via `config.LoadConfig()`
- **When**: Application startup (fx DI)
- **Contains**: All configuration from YAML file (S3 credentials, encryption keys, paths, etc.)
#### 2. Database (`database.DB`)
- **Created by**: `database.Module` via `database.New()`
- **When**: Application startup (fx DI)
- **Contains**: SQLite connection, path reference
#### 3. Repositories (`database.Repositories`)
- **Created by**: `database.Module` via `database.NewRepositories()`
- **When**: Application startup (fx DI)
- **Contains**: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)
#### 4. Vaultik (`vaultik.Vaultik`)
- **Created by**: `vaultik.New(VaultikParams)`
- **When**: Application startup (fx DI)
- **Contains**: All dependencies for backup operations
```go
type Vaultik struct {
Globals *globals.Globals
Config *config.Config
DB *database.DB
Repositories *database.Repositories
S3Client *s3.Client
ScannerFactory snapshot.ScannerFactory
SnapshotManager *snapshot.SnapshotManager
Shutdowner fx.Shutdowner
Fs afero.Fs
ctx context.Context
cancel context.CancelFunc
}
```
#### 5. SnapshotManager (`snapshot.SnapshotManager`)
- **Created by**: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- **When**: Application startup (fx DI)
- **Responsibility**: Creates/completes snapshots, exports metadata to S3
#### 6. Scanner (`snapshot.Scanner`)
- **Created by**: `ScannerFactory(ScannerParams)`
- **When**: Each `CreateSnapshot()` call
- **Contains**: Chunker, Packer, progress reporter
```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
EnableProgress: !opts.Cron,
Fs: v.Fs,
})
```
#### 7. Chunker (`chunker.Chunker`)
- **Created by**: `chunker.NewChunker(avgChunkSize)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
- `avgChunkSize`: From config (typically 64KB)
- `minChunkSize`: avgChunkSize / 4
- `maxChunkSize`: avgChunkSize * 4
#### 8. Packer (`blob.Packer`)
- **Created by**: `blob.NewPacker(PackerConfig)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
- `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
- `CompressionLevel`: zstd level (1-19)
- `Recipients`: age public keys for encryption
```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
MaxBlobSize: cfg.MaxBlobSize,
CompressionLevel: cfg.CompressionLevel,
Recipients: cfg.AgeRecipients,
Repositories: cfg.Repositories,
Fs: cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```
## Module Responsibilities
### `internal/cli`
Entry point for fx application. Combines all modules and handles signal interrupts.
Key functions:
- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper
### `internal/vaultik`
Main orchestrator containing all dependencies and command implementations.
Key methods:
- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots
### `internal/chunker`
Content-defined chunking using FastCDC algorithm.
Key types:
- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize
Key methods:
- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)
### `internal/blob`
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.
Key types:
- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload
Key methods:
- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload
### `internal/snapshot`
#### Scanner
Orchestrates the backup process for a directory.
Key methods:
- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation
Scan phases:
1. **Phase 0**: Detect deleted files from previous snapshots
2. **Phase 1**: Walk directory, identify files needing processing
3. **Phase 2**: Process files (chunk → pack → upload)
#### SnapshotManager
Manages snapshot lifecycle and metadata export.
Key methods:
- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots
### `internal/database`
SQLite database for local index. Single-writer mode for thread safety.
Key types:
- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces
Repository interfaces:
- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)
## Snapshot Creation Flow
```
CreateSnapshot(opts)
├─► CleanupIncompleteSnapshots() // Critical: avoid dedup errors
├─► SnapshotManager.CreateSnapshot() // Create DB record
├─► For each source directory:
│ │
│ ├─► scanner.Scan(ctx, path, snapshotID)
│ │ │
│ │ ├─► Phase 0: detectDeletedFiles()
│ │ │
│ │ ├─► Phase 1: scanPhase()
│ │ │ Walk directory
│ │ │ Check file metadata changes
│ │ │ Build list of files to process
│ │ │
│ │ └─► Phase 2: processPhase()
│ │ For each file:
│ │ chunker.ChunkReaderStreaming()
│ │ For each chunk:
│ │ packer.AddChunk()
│ │ If blob full → FinalizeBlob()
│ │ → handleBlobReady()
│ │ → s3Client.PutObjectWithProgress()
│ │ packer.Flush() // Final blob
│ │
│ └─► Accumulate statistics
├─► SnapshotManager.UpdateSnapshotStatsExtended()
├─► SnapshotManager.CompleteSnapshot()
└─► SnapshotManager.ExportSnapshotMetadata()
├─► Copy database to temp file
├─► Clean to only current snapshot data
├─► Dump to SQL
├─► Compress with zstd
├─► Encrypt with age
├─► Upload db.zst.age to S3
└─► Upload manifest.json.zst to S3
```
## Deduplication Strategy
1. **File-level**: Files unchanged since last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)
2. **Chunk-level**: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.
3. **Blob-level**: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
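A minimal sketch of the chunk-level check, with an in-memory hash set standing in for the local `chunks` table:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupe counts how many chunks actually need uploading: content is
// addressed by SHA-256, so a chunk is stored only the first time its
// hash is seen. The seen map stands in for the local SQLite index.
func dedupe(chunks [][]byte, seen map[string]bool) (toUpload int) {
	for _, data := range chunks {
		sum := sha256.Sum256(data)
		h := hex.EncodeToString(sum[:])
		if seen[h] {
			continue // already stored; skip re-upload
		}
		seen[h] = true
		toUpload++
	}
	return toUpload
}

func main() {
	seen := map[string]bool{}
	chunks := [][]byte{[]byte("a"), []byte("b"), []byte("a"), []byte("a")}
	fmt.Println(dedupe(chunks, seen)) // 2 unique chunks
}
```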
## Storage Layout in S3
```
bucket/
├── blobs/
│ └── {hash[0:2]}/
│ └── {hash[2:4]}/
│ └── {full-hash} # Compressed+encrypted blob
└── metadata/
└── {snapshot-id}/
├── db.zst.age # Encrypted database dump
└── manifest.json.zst # Blob list (for verification)
```
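The blob key derivation shown above can be sketched as follows (the helper name is illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// blobKey fans blobs out over two directory levels using the first two
// hex byte pairs, matching the blobs/{hash[0:2]}/{hash[2:4]} layout.
func blobKey(blobHash string) string {
	return path.Join("blobs", blobHash[0:2], blobHash[2:4], blobHash)
}

func main() {
	sum := sha256.Sum256([]byte("example"))
	h := hex.EncodeToString(sum[:])
	fmt.Println(blobKey(h))
}
```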
## Thread Safety
- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.
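A toy model of the `Packer` locking discipline, showing why concurrent `AddChunk` calls stay safe. Sizes and finalization are simplified; compression and encryption are elided:

```go
package main

import (
	"fmt"
	"sync"
)

// packer is a minimal stand-in for blob.Packer: one mutex guards the
// current blob so concurrent AddChunk calls serialize safely.
type packer struct {
	mu    sync.Mutex
	size  int
	limit int
	blobs int
}

func (p *packer) AddChunk(n int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.size+n > p.limit {
		p.blobs++ // finalize current blob (compress/encrypt elided)
		p.size = 0
	}
	p.size += n
}

func main() {
	p := &packer{limit: 100}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				p.AddChunk(10)
			}
		}()
	}
	wg.Wait()
	// Every byte is accounted for: nothing lost to races.
	fmt.Println(p.blobs*100+p.size == 8*100*10)
}
```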


@@ -10,6 +10,9 @@ Read the rules in AGENTS.md and follow them.
corporate advertising for Anthropic and is therefore completely
unacceptable in commit messages.
* NEVER use `git add -A`. Always add only the files you intentionally
changed.
* Tests should always be run before committing code. No commits should be
made that do not pass tests.
@@ -26,3 +29,16 @@ Read the rules in AGENTS.md and follow them.
* Do not stop working on a task until you have reached the definition of
done provided to you in the initial instruction. Don't do part or most of
the work, do all of the work until the criteria for done are met.
* We do not need to support migrations; schema upgrades can be handled by
deleting the local state file and doing a full backup to re-create it.
* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
estimate about 4 seconds per gigabyte of backup time.
* When running tests, don't run individual tests, or grep the output. run
the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or try to grep the output.
never run "go test". only ever run "make test" to run the full test
suite, and examine the full output.

DESIGN.md

@@ -1,385 +0,0 @@
# vaultik: Design Document
`vaultik` is a secure backup tool written in Go. It performs
streaming backups using content-defined chunking, blob grouping, asymmetric
encryption, and object storage. The system is designed for environments
where the backup source host cannot store secrets and cannot retrieve or
decrypt any data from the destination.
The source host is **stateful**: it maintains a local SQLite index to detect
changes, deduplicate content, and track uploads across backup runs. All
remote storage is encrypted and append-only. Pruning of unreferenced data is
done from a trusted host with access to decryption keys, as even the
metadata indices are encrypted in the blob store.
---
## Why
ANOTHER backup tool??
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
My requirements are:
* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## Design Goals
1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and
a private key.
4. Prune must be possible, although this requires a private key so must be
done on different hosts.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age)
(X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs.
This reduces the number of objects in the blob store for filesystems with
many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is
chunked into multiple encrypted `.age` parts, to support large
filesystems.
10. CLI interface is structured using `cobra`.
---
## S3 Bucket Layout
S3 stores only four things:
1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the
filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes
referenced in the snapshot, enabling pruning without decryption.
```
s3://<bucket>/<prefix>/
├── blobs/
│ ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│ ├── <snapshot_id>.sqlite.age
│ ├── <snapshot_id>.sqlite.00.age
│ ├── <snapshot_id>.sqlite.01.age
│ ├── <snapshot_id>.manifest.json.zst
```
To retrieve a given file, you would:
* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct
the full database file
* verify the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs
If clever, it may be possible to do this chunk by chunk without touching
disk (except for the output file) as each uncompressed blob should fit in
memory (<10GB).
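The chunk-by-chunk reassembly can be sketched with in-memory stand-ins for the metadata DB and blob store (types and names here are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
)

// blobChunk mirrors a blob_chunks row: where a chunk lives inside an
// already decrypted, decompressed blob.
type blobChunk struct {
	blobHash string
	offset   int64
	length   int64
}

// reconstruct reassembles a file from its ordered chunk hashes
// (file_chunks idx order), slicing each chunk out of a fetched blob.
func reconstruct(chunkOrder []string, loc map[string]blobChunk, blobs map[string][]byte) []byte {
	var out bytes.Buffer
	for _, h := range chunkOrder {
		bc := loc[h]
		blob := blobs[bc.blobHash]
		out.Write(blob[bc.offset : bc.offset+bc.length])
	}
	return out.Bytes()
}

func main() {
	blobs := map[string][]byte{"b1": []byte("helloworld")}
	loc := map[string]blobChunk{
		"c1": {"b1", 0, 5},
		"c2": {"b1", 5, 5},
	}
	fmt.Println(string(reconstruct([]string{"c1", "c2"}, loc, blobs))) // helloworld
}
```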
### Path Rules
* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first and second bytes of the blob hash, hex-encoded.
### Blob Manifest Format
The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:
```json
{
"snapshot_id": "2023-10-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0...",
...
]
}
```
This allows pruning operations to determine which blobs are referenced without requiring decryption keys.
---
## 3. Local SQLite Index Schema (source host)
```sql
CREATE TABLE files (
path TEXT PRIMARY KEY,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL
);
-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
path TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (path, idx)
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
sha256 TEXT NOT NULL,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
blob_hash TEXT PRIMARY KEY,
final_hash TEXT NOT NULL,
created_ts INTEGER NOT NULL
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_path TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_path)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
created_ts INTEGER NOT NULL,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL
);
```
---
## 4. Snapshot Metadata Schema (stored in S3)
Identical schema to the local index, filtered to live snapshot state. Stored
as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than
a configured `chunk_size`, it is split and uploaded as:
```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```
---
## 5. Data Flow
### 5.1 Backup
1. Load config
2. Open local SQLite index
3. Walk source directories:
* For each file:
* Check mtime and size in index
* If changed or new:
* Chunk file
* For each chunk:
* Hash with SHA256
* Check if already uploaded
* If not:
* Add chunk to blob packer
* Record file-chunk mapping in index
4. When blob reaches threshold size (e.g. 1GB):
* Compress with `zstd`
* Encrypt with `age`
* Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
* Record blob-chunk layout in local index
5. Once all files are processed:
* Build snapshot SQLite DB from index delta
* Compress + encrypt
* If larger than `chunk_size`, split into parts
* Upload to:
`s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create snapshot record in local index that lists:
* snapshot ID
* hostname
* vaultik version
* timestamp
* counts of files, chunks, and blobs
* list of all blobs referenced in the snapshot (some new, some old) for
efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in snapshot
13. Compress manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the
snapshot, based on local state db
### 5.2 Manual Prune
1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract set of referenced blob hashes from manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
* If the hash is not in the manifest:
* Issue `DeleteObject` to remove it
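The set-difference step in the flow above can be sketched as (function name is illustrative):

```go
package main

import "fmt"

// pruneCandidates returns the listed blob hashes that the latest
// manifest no longer references. No decryption is needed, since the
// manifest is stored unencrypted.
func pruneCandidates(listed, referenced []string) []string {
	ref := make(map[string]bool, len(referenced))
	for _, h := range referenced {
		ref[h] = true
	}
	var del []string
	for _, h := range listed {
		if !ref[h] {
			del = append(del, h)
		}
	}
	return del
}

func main() {
	listed := []string{"aa11", "bb22", "cc33"}
	referenced := []string{"aa11", "cc33"}
	fmt.Println(pruneCandidates(listed, referenced)) // [bb22]
}
```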
### 5.3 Verify
Verify runs on a host that has no state, but access to the bucket.
1. Fetch latest metadata snapshot files from S3
2. Fetch latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and
reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
* Fetch the blob metadata from the snapshot db
* Ensure the blob exists in S3
* Check the S3 content hash matches the expected blob hash
* If not using --quick mode:
* Download and decrypt the blob
* Decompress and verify chunk hashes match metadata
---
## 6. CLI Commands
```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```
* `VAULTIK_PRIVATE_KEY` is required for `restore`, `prune`, `verify`, and
`fetch` commands.
* It is passed via environment variable containing the age private key.
---
## 7. Function and Method Signatures
### 7.1 CLI
```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```
### 7.2 Configuration
```go
type Config struct {
BackupPubKey string // age recipient
BackupInterval time.Duration // used in daemon mode, irrelevant for cron mode
BlobSizeLimit int64 // default 10GB
ChunkSize int64 // default 10MB
Exclude []string // list of regex of files to exclude from backup, absolute path
Hostname string
IndexPath string // path to local SQLite index db, default /var/lib/vaultik/index.db
MetadataPrefix string // S3 prefix for metadata, default "metadata/"
MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour - for daemon mode
S3 S3Config // S3 configuration
ScanInterval time.Duration // interval to full stat() scan source dirs, default 24h
SourceDirs []string // list of source directories to back up, absolute paths
}
type S3Config struct {
Endpoint string
Bucket string
Prefix string
AccessKeyID string
SecretAccessKey string
Region string
}
func Load(path string) (*Config, error)
```
### 7.3 Index
```go
type Index struct {
db *sql.DB
}
func OpenIndex(path string) (*Index, error)
func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```
### 7.4 Blob Packing
```go
type BlobWriter struct {
// internal buffer, current size, encrypted writer, etc
}
func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```
### 7.5 Metadata
```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```
### 7.6 Prune
```go
func RunPrune(bucket, prefix, privateKey string) error
```

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Jeffrey Paul sneak@sneak.berlin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Makefile

@@ -1,19 +1,27 @@
 .PHONY: test fmt lint build clean all
+
+# Version number
+VERSION := 0.0.1

 # Build variables
-VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
-COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
+GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")

 # Linker flags
 LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-	-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(COMMIT)'
+	-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'

 # Default target
-all: test
+all: vaultik

 # Run tests
 test: lint fmt-check
-	go test -v ./...
+	@echo "Running tests..."
+	@if ! go test -v -timeout 10s ./... 2>&1; then \
+		echo ""; \
+		echo "TEST FAILURES DETECTED"; \
+		echo "Run 'go test -v ./internal/database' to see database test details"; \
+		exit 1; \
+	fi

 # Check if code is formatted
 fmt-check:
@@ -31,8 +39,8 @@ lint:
 	golangci-lint run

 # Build binary
-build:
-	go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
+vaultik: internal/*/*.go cmd/vaultik/*.go
+	go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik

 # Clean build artifacts
 clean:
@@ -52,3 +60,10 @@ test-coverage:
 # Run integration tests
 test-integration:
 	go test -v -tags=integration ./...
+
+local:
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
+
+install: vaultik
+	cp ./vaultik $(HOME)/bin/

PROCESS.md Normal file

@@ -0,0 +1,556 @@
# Vaultik Snapshot Creation Process
This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.
## Database Schema Overview
### Tables and Foreign Key Dependencies
```
┌─────────────────────────────────────────────────────────────────────────┐
│ FOREIGN KEY GRAPH │
│ │
│ snapshots ◄────── snapshot_files ────────► files │
│ │ │ │
│ └───────── snapshot_blobs ────────► blobs │ │
│ │ │ │
│ │ ├──► file_chunks ◄── chunks│
│ │ │ ▲ │
│ │ └──► chunk_files ────┘ │
│ │ │
│ └──► blob_chunks ─────────────┘│
│ │
│ uploads ───────► blobs.blob_hash │
│ └──────────► snapshots.id │
└─────────────────────────────────────────────────────────────────────────┘
```
### Critical Constraint: `chunks` Must Exist First
These tables reference `chunks.chunk_hash` **without CASCADE**:
- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`
**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
### Order of Operations Required by Schema
```
1. snapshots (created first, before scan)
2. blobs (created when packer starts new blob)
3. chunks (created during file processing)
4. blob_chunks (created immediately after chunk added to packer)
5. files (created after file fully chunked)
6. file_chunks (created with file record)
7. chunk_files (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads (created after blob uploaded)
```
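The hard FK edge from step 4 back to step 3 can be illustrated with a toy store that rejects a `blob_chunks` insert until its chunk exists (hypothetical types, not the real repositories):

```go
package main

import "fmt"

// db is a toy store enforcing the FK discipline described above: a
// blob_chunks row may only be added once its chunk row exists,
// mirroring the non-CASCADE foreign keys on chunks.chunk_hash.
type db struct {
	chunks     map[string]bool
	blobChunks []string
}

func (d *db) insertChunk(hash string) { d.chunks[hash] = true }

func (d *db) insertBlobChunk(chunkHash string) error {
	if !d.chunks[chunkHash] {
		return fmt.Errorf("FOREIGN KEY constraint failed: chunk %s not committed", chunkHash)
	}
	d.blobChunks = append(d.blobChunks, chunkHash)
	return nil
}

func main() {
	d := &db{chunks: map[string]bool{}}
	fmt.Println(d.insertBlobChunk("x") != nil) // true: chunk not yet committed
	d.insertChunk("x")
	fmt.Println(d.insertBlobChunk("x") == nil) // true: succeeds after commit
}
```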
---
## Snapshot Creation Phases
### Phase 0: Initialization
**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table
**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
COMMIT
```
---
### Phase 1: Scan Directory
**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing
**Transactions:**
```
(None during scan - all in-memory)
```
---
### Phase 1b: Associate Unchanged Files
**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000
**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
... (up to 1000 inserts)
COMMIT
```
---
### Phase 2: Process Files
For each file that needs processing:
#### Step 2a: Open and Chunk File
**Location:** `processFileStreaming()`
For each chunk produced by content-defined chunking:
##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns
// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```
**Transaction:**
```
T_chunk: BEGIN
INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
COMMIT
```
##### Step 2a-3: Add Chunk to Packer
```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```
**Inside packer.AddChunk → addChunkToCurrentBlob():**
```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
blobChunk := &database.BlobChunk{
BlobID: p.currentBlob.id,
ChunkHash: chunk.Hash,
Offset: offset,
Length: chunkSize,
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
})
// COMMIT immediately
}
```
**Transaction:**
```
T_blob_chunk: BEGIN
INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.
---
#### Step 2b: Blob Size Limit Handling
If adding a chunk would exceed blob size limit:
```go
if err == blob.ErrBlobSizeLimitExceeded {
if err := s.packer.FinalizeBlob(); err != nil { ... }
// Retry adding the chunk
if err := s.packer.AddChunk(...); err != nil { ... }
}
```
**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
COMMIT
```
Then blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)
T_blob_uploaded: BEGIN
UPDATE blobs SET uploaded_ts=? WHERE id=?
INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
COMMIT
```
---
#### Step 2c: Queue File for Batch Insertion
After all chunks for a file are processed:
```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))
// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
})
```
**No transaction yet** - just adds to `pendingFiles` slice.
If `len(pendingFiles) >= fileBatchSize (100)`, triggers `flushPendingFiles()`.
---
### Step 2d: Flush Pending Files
**Location:** `flushPendingFiles()` - called when batch is full or at end of processing
```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
for _, data := range files {
// 1. Create file record
s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE
// 2. Delete old associations
s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)
// 3. Create file_chunks records
for _, fc := range data.fileChunks {
s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
}
// 4. Create chunk_files records
for _, cf := range data.chunkFiles {
s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
}
// 5. Add file to snapshot
s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
}
return nil
})
// COMMIT (all or nothing for the batch)
```
**Transaction:**
```
T_files_batch: BEGIN
-- For each file in batch:
INSERT OR REPLACE INTO files (...) VALUES (...)
DELETE FROM file_chunks WHERE file_id = ?
DELETE FROM chunk_files WHERE file_id = ?
INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?) -- FK: chunks
INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
-- Repeat for each file
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.
---
### Phase 2 End: Final Flush
```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }
// Final packer flush
s.packer.Flush()
```
---
## The Current Bug
### Problem
The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, but if the chunks haven't been committed yet, the FK constraint fails.
### Why It's Happening
Looking at the sequence:
1. Process file A, chunk X
2. Create chunk X in DB (Transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (Transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)
The chunks ARE being created individually. But something is going wrong.
### Actual Issue
Re-reading the code more carefully, the issue is this:
In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID,
Idx: ci.fileChunk.Idx,
ChunkHash: ci.fileChunk.ChunkHash,
}
```
The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!
Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
return file, true // file.ID is empty string!
}
// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```
**For NEW files, `file.ID` is empty!**
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```
But `data.fileChunks` was built with the EMPTY ID!
### The Real Problem
For new files:
1. `checkFileInMemory` creates file record with empty ID
2. `processFileStreaming` queues file_chunks with empty `FileID`
3. `flushPendingFiles` creates file (generates ID), but file_chunks still have empty `FileID`
At first glance `Files.Create` should INSERT OR REPLACE by path and update the file struct. Looking more carefully at the code path, the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID, and the ID is not updated after the file is created.
Looking at the current code, however:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```
In `checkFileInMemory`, new files get a file struct with no ID set. The database repository's `Files.Create` does `INSERT OR REPLACE`, which implies the ID (presumably a UUID) should be pre-generated before the struct is queued, but it is not obvious from the File struct usage where that generation happens.
Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```
Error 787 is SQLite's foreign key constraint error. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.
So the chunks are not yet in the database when the file_chunks insert runs. The transaction timing needs a closer trace.
---
## Transaction Timing Issue
The problem is transaction visibility in SQLite.
Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:
1. Chunk transactions commit one at a time
2. File batch transaction runs later
If chunks are being inserted but transaction isolation misbehaves, the file batch might not see them. However, SQLite in WAL mode provides serializable isolation for committed transactions, so they should be visible. One remaining possibility is that the in-memory chunk cache is masking a database problem; another is that the bug is simpler than an isolation issue, which makes the current code flow worth re-checking.
---
## Current Code Flow Analysis
Looking at `processFileStreaming` in the current broken state:
```go
// For each chunk:
if !chunkExists {
    err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
        dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
        return s.repos.Chunks.Create(txCtx, tx, dbChunk)
    })
    // ... check error ...
    s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```
Then at end of function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
    file:       fileToProcess.File,
    fileChunks: fileChunks,
    chunkFiles: chunkFiles,
})
```
At end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```
The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.
Unless... there's a bug in how the chunks are being referenced. Let me check if the chunk_hash values are correct.
Or... maybe the test database is being recreated between operations somehow?
Actually, let me check the test setup. Maybe the issue is specific to the test environment.
---
## Summary of Object Lifecycle
| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |
---
## Root Cause Analysis
After detailed analysis, I believe the issue is one of the following:
### Hypothesis 1: File ID Not Set
Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
    return file, true // file.ID is empty string!
}
```
For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // Empty for new files!
    ...
}
```
The `FileID` in the built `fileChunks` slice is empty.
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
s.repos.FileChunks.Create(...) // Uses empty FileID
}
```
**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
    ID:   uuid.New().String(), // Generate ID immediately
    Path: path,
    ...
}
```
### Hypothesis 2: Transaction Isolation
SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions. Committed data should be visible to subsequent transactions.
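This is straightforward to sanity-check outside the codebase. A minimal sketch (Python's `sqlite3` as a stand-in) showing that a committed insert from one connection is visible to a second connection against the same database file:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE chunks (chunk_hash TEXT PRIMARY KEY)")
writer.execute("INSERT INTO chunks VALUES ('abc')")
writer.commit()  # commit before the reader looks

reader = sqlite3.connect(path)
count = reader.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
print(count)  # 1 -- committed data is visible across connections
```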
However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.
## Recommended Fix
**Step 1: Generate file IDs upfront**
In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
    ID:   uuid.New().String(), // Always generate ID
    Path: path,
    ...
}
```
This ensures `file.ID` is set when building `fileChunks` and `chunkFiles` slices.
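A minimal sketch of the upfront-ID pattern (Python stand-in for the Go structs; `new_file` is a hypothetical helper, not a function from the codebase):

```python
import uuid

def new_file(path):
    # Generate the parent ID at construction time, never lazily at insert.
    return {"id": str(uuid.uuid4()), "path": path}

f = new_file("/etc/hosts")

# Child rows built later now always carry a valid parent ID.
file_chunk = {"file_id": f["id"], "idx": 0, "chunk_hash": "abc123"}
print(file_chunk["file_id"] != "")  # True
```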
**Step 2: Verify by reverting to per-file transactions**
If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:
```go
// Instead of queuing:
//   return s.addPendingFile(ctx, pendingFileData{...})
// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    // Create file
    s.repos.Files.Create(txCtx, tx, fileToProcess.File)

    // Delete old associations
    s.repos.FileChunks.DeleteByFileID(...)
    s.repos.ChunkFiles.DeleteByFileID(...)

    // Create new associations
    for _, fc := range fileChunks {
        s.repos.FileChunks.Create(...)
    }
    for _, cf := range chunkFiles {
        s.repos.ChunkFiles.Create(...)
    }

    // Add to snapshot
    s.repos.Snapshots.AddFileByID(...)
    return nil
})
```
**Step 3: If batching is still desired**
After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.

---

## README.md

# vaultik (ваултик)

WIP: pre-1.0, some functions may not be fully implemented yet

`vaultik` is an incremental backup daemon written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.
It includes table-stakes features such as:
* modern encryption (the excellent `age`)
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database, enables write-only
incremental backups to destination
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
down) even if the source system has many small files
## why
Existing backup software fails under one or more of these conditions:
* Requires secrets (passwords, private keys) on the source system, which
compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Creates one-blob-per-file, which results in excessive S3 operation counts
* is slow
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
My requirements are:
* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## design goals
1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
## what

vaultik walks the configured source directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
database of metadata is created, encrypted, and uploaded alongside the
blobs.

No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.
## how

1. **install**

    ```sh
    go install git.eeqj.de/sneak/vaultik@latest
    ```

1. **generate keypair**

    ```sh
    age-keygen -o agekey.txt
    grep 'public key:' agekey.txt
    ```

1. **write config**

    ```yaml
    # Named snapshots - each snapshot can contain multiple paths
    snapshots:
      system:
        paths:
          - /etc
          - /var/lib
        # Snapshot-specific exclusions
        exclude:
          - '*.cache'
      home:
        paths:
          - /home/user/documents
          - /home/user/photos

    # Global exclusions (apply to all snapshots)
    exclude:
      - '*.log'
      - '*.tmp'
      - '.git'
      - 'node_modules'

    age_recipients:
      - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj

    s3:
      endpoint: https://s3.example.com
      bucket: vaultik-data
      access_key_id: ...
      secret_access_key: ...
      region: us-east-1

    backup_interval: 1h
    full_scan_interval: 24h
    min_time_between_run: 15m

    chunk_size: 10MB
    blob_size_limit: 1GB
    ```

1. **run**

    ```sh
    # Create all configured snapshots
    vaultik --config /etc/vaultik.yaml snapshot create

    # Create specific snapshots by name
    vaultik --config /etc/vaultik.yaml snapshot create home system

    # Silent mode for cron
    vaultik --config /etc/vaultik.yaml snapshot create --cron
    ```
---

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```

### environment

* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.

### command details

**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt

**snapshot prune**: Clean orphaned data from local database

**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories

**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs

**info**: Display system and configuration information

**store info**: Display S3 bucket configuration and storage statistics

---

## architecture
### s3 bucket layout
```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age
        └── manifest.json.zst
```
* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
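A small sketch of how the sharded blob key can be derived from a blob hash (the function name is illustrative, not from the codebase):

```python
def blob_key(prefix: str, blob_hash: str) -> str:
    # Two-level sharding: the first 4 hex chars become the aa/bb components.
    return f"{prefix}/blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"

print(blob_key("host1", "aa1234567890abcdef"))
# host1/blobs/aa/12/aa1234567890abcdef
```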
### blob manifest format
The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
```json
{
"snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0..."
]
}
```
Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
### local sqlite schema
```sql
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL
);

CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT NOT NULL UNIQUE,
    uncompressed INTEGER NOT NULL,
    compressed INTEGER NOT NULL,
    uploaded_at INTEGER
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL,
    total_size INTEGER NOT NULL,
    blob_size INTEGER NOT NULL,
    compression_ratio REAL NOT NULL
);

CREATE TABLE snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id)
);

CREATE TABLE snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id)
);
```
### data flow
#### backup
1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support
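The packing step (4-5) reduces to an accumulate-and-flush loop; a minimal sketch, with a tiny size limit for illustration rather than the real `blob_size_limit`:

```python
BLOB_SIZE_LIMIT = 10  # bytes; tiny for illustration only

blobs, current, size = [], [], 0
for chunk in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    current.append(chunk)
    size += len(chunk)
    if size >= BLOB_SIZE_LIMIT:
        # Threshold reached: flush the packed blob (real code would
        # compress, encrypt, and upload here).
        blobs.append(b"".join(current))
        current, size = [], 0
if current:
    blobs.append(b"".join(current))  # flush the final partial blob

print(len(blobs))  # 2
```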
#### restore
1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid
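Steps 4-6 above amount to joining decrypted chunks in `file_chunks.idx` order; a minimal sketch with inline data:

```python
# Decrypted chunks keyed by chunk_hash (values are illustrative).
chunk_store = {"h1": b"hello ", "h2": b"world"}

# Ordered hash list for one file, as read from file_chunks by idx.
ordered_hashes = ["h1", "h2"]

data = b"".join(chunk_store[h] for h in ordered_hashes)
print(data.decode())  # hello world
```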
#### prune
1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set
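The prune steps above are a set difference over blob hashes; a minimal sketch with inline manifest data:

```python
# Decoded manifests (illustrative data, shaped like manifest.json.zst).
manifests = [
    {"snapshot_id": "host_home_2025-01-01T12:00:00Z", "blob_hashes": ["aa11", "bb22"]},
    {"snapshot_id": "host_home_2025-01-02T12:00:00Z", "blob_hashes": ["bb22", "cc33"]},
]
stored = {"aa11", "bb22", "cc33", "dd44"}  # blob keys listed from the bucket

referenced = set()
for m in manifests:
    referenced |= set(m["blob_hashes"])

orphans = stored - referenced  # safe to delete
print(sorted(orphans))  # ['dd44']
```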
### chunking

* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
### encryption

* Each blob encrypted independently
* Metadata databases also encrypted

### compression

* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency
---

## does not

* Require a symmetric passphrase or password
* Trust the source system with anything

## does

* Incremental deduplicated backup
---

## requirements

* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

---

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)

---

## TODO.md

# Vaultik 1.0 TODO

Linear list of tasks to complete before 1.0 release.

## Rclone Storage Backend (Complete)

Add rclone as a storage backend via Go library import, allowing vaultik to
use any of rclone's 70+ supported cloud storage providers.

**Configuration:**

```yaml
storage_url: "rclone://myremote/path/to/backups"
```

User must have rclone configured separately (via `rclone config`).

**Implementation Steps:**

1. [x] Add rclone dependency to go.mod
2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
    - `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
    - `Put` / `PutWithProgress` - use `operations.Rcat()`
    - `Get` - use `fs.NewObject()` then `obj.Open()`
    - `Stat` - use `fs.NewObject()` for size/metadata
    - `Delete` - use `obj.Remove()`
    - `List` / `ListStream` - use `operations.ListFn()`
    - `Info` - return remote name
3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
5. [x] Test with real rclone remote

**Error Mapping:**

- `fs.ErrorObjectNotFound` → `ErrNotFound`
- `fs.ErrorDirNotFound` → `ErrNotFound`
- `fs.ErrorNotFoundInConfigFile` → `ErrRemoteNotFound` (new)

---

## CLI Polish (Priority)

1. Improve error messages throughout
    - Ensure all errors include actionable context
    - Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")

## Security (Priority)

1. Audit encryption implementation
    - Verify age encryption is used correctly
    - Ensure no plaintext leaks in logs or errors
    - Verify blob hashes are computed correctly

1. Secure memory handling for secrets
    - Clear S3 credentials from memory after client init
    - Document that age_secret_key is env-var only (already implemented)

## Testing

1. Write integration tests for restore command
1. Write end-to-end integration test
    - Create backup
    - Verify backup
    - Restore backup
    - Compare restored files to originals
1. Add tests for edge cases
    - Empty directories
    - Symlinks
    - Special characters in filenames
    - Very large files (multi-GB)
    - Many small files (100k+)
1. Add tests for error conditions
    - Network failures during upload
    - Disk full during restore
    - Corrupted blobs
    - Missing blobs

## Performance

1. Profile and optimize restore performance
    - Parallel blob downloads
    - Streaming decompression/decryption
    - Efficient chunk reassembly
1. Add bandwidth limiting option
    - `--bwlimit` flag for upload/download speed limiting

## Documentation

1. Add man page or --help improvements
    - Detailed help for each command
    - Examples in help output

## Final Polish

1. Ensure version is set correctly in releases
1. Create release process
    - Binary releases for supported platforms
    - Checksums for binaries
    - Release notes template
1. Final code review
    - Remove debug statements
    - Ensure consistent code style
1. Tag and release v1.0.0

---

## Post-1.0 (Daemon Mode)

1. Implement inotify file watcher for Linux
    - Watch source directories for changes
    - Track dirty paths in memory
1. Implement FSEvents watcher for macOS
    - Watch source directories for changes
    - Track dirty paths in memory
1. Implement backup scheduler in daemon mode
    - Respect backup_interval config
    - Trigger backup when dirty paths exist and interval elapsed
    - Implement full_scan_interval for periodic full scans
1. Add proper signal handling for daemon
    - Graceful shutdown on SIGTERM/SIGINT
    - Complete in-progress backup before exit
1. Write tests for daemon mode

---

## main (Go entrypoint)

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"

	"git.eeqj.de/sneak/vaultik/internal/cli"
)

func main() {
	// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
	if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
		f, err := os.Create(cpuProfile)
		if err != nil {
			panic("could not create CPU profile: " + err.Error())
		}
		defer func() { _ = f.Close() }()
		if err := pprof.StartCPUProfile(f); err != nil {
			panic("could not start CPU profile: " + err.Error())
		}
		defer pprof.StopCPUProfile()
	}

	// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
	if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
		defer func() {
			f, err := os.Create(memProfile)
			if err != nil {
				panic("could not create memory profile: " + err.Error())
			}
			defer func() { _ = f.Close() }()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				panic("could not write memory profile: " + err.Error())
			}
		}()
	}

	cli.CLIEntry()
}
```

---

## config.example.yml
# vaultik configuration file example
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need

# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Generate with: age-keygen | grep "public key"
age_recipients:
  - age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz

# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
  testing:
    paths:
      - ~/dev/vaultik
  apps:
    paths:
      - /Applications
    exclude:
      - "/App Store.app"
      - "/Apps.app"
      - "/Automator.app"
      - "/Books.app"
      - "/Calculator.app"
      - "/Calendar.app"
      - "/Chess.app"
      - "/Clock.app"
      - "/Contacts.app"
      - "/Dictionary.app"
      - "/FaceTime.app"
      - "/FindMy.app"
      - "/Font Book.app"
      - "/Freeform.app"
      - "/Games.app"
      - "/GarageBand.app"
      - "/Home.app"
      - "/Image Capture.app"
      - "/Image Playground.app"
      - "/Journal.app"
      - "/Keynote.app"
      - "/Mail.app"
      - "/Maps.app"
      - "/Messages.app"
      - "/Mission Control.app"
      - "/Music.app"
      - "/News.app"
      - "/Notes.app"
      - "/Numbers.app"
      - "/Pages.app"
      - "/Passwords.app"
      - "/Phone.app"
      - "/Photo Booth.app"
      - "/Photos.app"
      - "/Podcasts.app"
      - "/Preview.app"
      - "/QuickTime Player.app"
      - "/Reminders.app"
      - "/Safari.app"
      - "/Shortcuts.app"
      - "/Siri.app"
      - "/Stickies.app"
      - "/Stocks.app"
      - "/System Settings.app"
      - "/TV.app"
      - "/TextEdit.app"
      - "/Time Machine.app"
      - "/Tips.app"
      - "/Utilities/Activity Monitor.app"
      - "/Utilities/AirPort Utility.app"
      - "/Utilities/Audio MIDI Setup.app"
      - "/Utilities/Bluetooth File Exchange.app"
      - "/Utilities/Boot Camp Assistant.app"
      - "/Utilities/ColorSync Utility.app"
      - "/Utilities/Console.app"
      - "/Utilities/Digital Color Meter.app"
      - "/Utilities/Disk Utility.app"
      - "/Utilities/Grapher.app"
      - "/Utilities/Magnifier.app"
      - "/Utilities/Migration Assistant.app"
      - "/Utilities/Print Center.app"
      - "/Utilities/Screen Sharing.app"
      - "/Utilities/Screenshot.app"
      - "/Utilities/Script Editor.app"
      - "/Utilities/System Information.app"
      - "/Utilities/Terminal.app"
      - "/Utilities/VoiceOver Utility.app"
      - "/VoiceMemos.app"
      - "/Weather.app"
      - "/iMovie.app"
      - "/iPhone Mirroring.app"
  home:
    paths:
      - "~"
    exclude:
      - "/.Trash"
      - "/tmp"
      - "/Library/Caches"
      - "/Library/Accounts"
      - "/Library/AppleMediaServices"
      - "/Library/Application Support/AddressBook"
      - "/Library/Application Support/CallHistoryDB"
      - "/Library/Application Support/CallHistoryTransactions"
      - "/Library/Application Support/DifferentialPrivacy"
      - "/Library/Application Support/FaceTime"
      - "/Library/Application Support/FileProvider"
      - "/Library/Application Support/Knowledge"
- "/Library/Application Support/com.apple.TCC"
- "/Library/Application Support/com.apple.avfoundation/Frecents"
- "/Library/Application Support/com.apple.sharedfilelist"
- "/Library/Assistant/SiriVocabulary"
- "/Library/Autosave Information"
- "/Library/Biome"
- "/Library/ContainerManager"
- "/Library/Containers/com.apple.Home"
- "/Library/Containers/com.apple.Maps/Data/Maps"
- "/Library/Containers/com.apple.MobileSMS"
- "/Library/Containers/com.apple.Notes"
- "/Library/Containers/com.apple.Safari"
- "/Library/Containers/com.apple.Safari.WebApp"
- "/Library/Containers/com.apple.VoiceMemos"
- "/Library/Containers/com.apple.archiveutility"
- "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
- "/Library/Containers/com.apple.mail"
- "/Library/Containers/com.apple.news"
- "/Library/Containers/com.apple.stocks"
- "/Library/Cookies"
- "/Library/CoreFollowUp"
- "/Library/Daemon Containers"
- "/Library/DoNotDisturb"
- "/Library/DuetExpertCenter"
- "/Library/Group Containers/com.apple.Home.group"
- "/Library/Group Containers/com.apple.MailPersonaStorage"
- "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
- "/Library/Group Containers/com.apple.bird"
- "/Library/Group Containers/com.apple.stickersd.group"
- "/Library/Group Containers/com.apple.systempreferences.cache"
- "/Library/Group Containers/group.com.apple.AppleSpell"
- "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
- "/Library/Group Containers/group.com.apple.DeviceActivity"
- "/Library/Group Containers/group.com.apple.Journal"
- "/Library/Group Containers/group.com.apple.ManagedSettings"
- "/Library/Group Containers/group.com.apple.PegasusConfiguration"
- "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
- "/Library/Group Containers/group.com.apple.SiriTTS"
- "/Library/Group Containers/group.com.apple.UserNotifications"
- "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
- "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
- "/Library/Group Containers/group.com.apple.amsondevicestoraged"
- "/Library/Group Containers/group.com.apple.appstoreagent"
- "/Library/Group Containers/group.com.apple.calendar"
- "/Library/Group Containers/group.com.apple.chronod"
- "/Library/Group Containers/group.com.apple.contacts"
- "/Library/Group Containers/group.com.apple.controlcenter"
- "/Library/Group Containers/group.com.apple.corerepair"
- "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
- "/Library/Group Containers/group.com.apple.energykit"
- "/Library/Group Containers/group.com.apple.feedback"
- "/Library/Group Containers/group.com.apple.feedbacklogger"
- "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
- "/Library/Group Containers/group.com.apple.iCloudDrive"
- "/Library/Group Containers/group.com.apple.icloud.fmfcore"
- "/Library/Group Containers/group.com.apple.icloud.fmipcore"
- "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
- "/Library/Group Containers/group.com.apple.liveactivitiesd"
- "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
- "/Library/Group Containers/group.com.apple.mail"
- "/Library/Group Containers/group.com.apple.mlhost"
- "/Library/Group Containers/group.com.apple.moments"
- "/Library/Group Containers/group.com.apple.news"
- "/Library/Group Containers/group.com.apple.newsd"
- "/Library/Group Containers/group.com.apple.notes"
- "/Library/Group Containers/group.com.apple.notes.import"
- "/Library/Group Containers/group.com.apple.photolibraryd.private"
- "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
- "/Library/Group Containers/group.com.apple.printtool"
- "/Library/Group Containers/group.com.apple.private.translation"
- "/Library/Group Containers/group.com.apple.reminders"
- "/Library/Group Containers/group.com.apple.replicatord"
- "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
- "/Library/Group Containers/group.com.apple.sharingd"
- "/Library/Group Containers/group.com.apple.shortcuts"
- "/Library/Group Containers/group.com.apple.siri.inference"
- "/Library/Group Containers/group.com.apple.siri.referenceResolution"
- "/Library/Group Containers/group.com.apple.siri.remembers"
- "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
- "/Library/Group Containers/group.com.apple.spotlight"
- "/Library/Group Containers/group.com.apple.stocks"
- "/Library/Group Containers/group.com.apple.stocks-news"
- "/Library/Group Containers/group.com.apple.studentd"
- "/Library/Group Containers/group.com.apple.swtransparency"
- "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
- "/Library/Group Containers/group.com.apple.tips"
- "/Library/Group Containers/group.com.apple.tipsnext"
- "/Library/Group Containers/group.com.apple.transparency"
- "/Library/Group Containers/group.com.apple.usernoted"
- "/Library/Group Containers/group.com.apple.weather"
- "/Library/HomeKit"
- "/Library/IdentityServices"
- "/Library/IntelligencePlatform"
- "/Library/Mail"
- "/Library/Messages"
- "/Library/Metadata/CoreSpotlight"
- "/Library/Metadata/com.apple.IntelligentSuggestions"
- "/Library/PersonalizationPortrait"
- "/Library/Safari"
- "/Library/Sharing"
- "/Library/Shortcuts"
- "/Library/StatusKit"
- "/Library/Suggestions"
- "/Library/Trial"
- "/Library/Weather"
- "/Library/com.apple.aiml.instrumentation"
- "/Movies/TV"
system:
paths:
- /
exclude:
# Virtual/transient filesystems
- /proc
- /sys
- /dev
- /run
- /tmp
- /var/tmp
- /var/run
- /var/lock
- /var/cache
- /media
- /mnt
# Swap
- /swapfile
- /swap.img
# Package manager caches
- /var/cache/apt
- /var/cache/yum
- /var/cache/dnf
- /var/cache/pacman
# Trash
- "*/.local/share/Trash"
dev:
paths:
- /Users/user/dev
exclude:
- "**/node_modules"
- "**/target"
- "**/build"
- "**/__pycache__"
- "**/*.pyc"
- "**/.venv"
- "**/vendor"
# Global patterns to exclude from all backups
exclude:
- "*.tmp"
# Storage URL - use either this OR the s3 section below
# Supports: s3://bucket/prefix, file:///path, rclone://remote/path
storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"
# S3-compatible storage configuration
#s3:
# # S3-compatible endpoint URL
# # Examples: https://s3.amazonaws.com, https://storage.googleapis.com
# endpoint: http://10.100.205.122:8333
#
# # Bucket name where backups will be stored
# bucket: testbucket
#
# # Prefix (folder) within the bucket for this host's backups
# # Useful for organizing backups from multiple hosts
# # Default: empty (root of bucket)
# #prefix: "hosts/myserver/"
#
# # S3 access credentials
# access_key_id: Z9GT22M9YFU08WRMC5D4
# secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
#
# # S3 region
# # Default: us-east-1
# #region: us-east-1
#
# # Use SSL/TLS for S3 connections
# # Default: true
# #use_ssl: true
#
# # Part size for multipart uploads
# # Minimum 5MB, affects memory usage during upload
# # Supports: 5MB, 10M, 100MiB, etc.
# # Default: 5MB
# #part_size: 5MB
# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc
# Default: 1h
#backup_interval: 1h
# How often to do a full filesystem scan in daemon mode
# Between full scans, inotify is used to detect changes
# Default: 24h
#full_scan_interval: 24h
# Minimum time between backup runs in daemon mode
# Prevents backups from running too frequently
# Default: 15m
#min_time_between_run: 15m
# Path to local SQLite index database
# This database tracks file state for incremental backups
# Default: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite
# Average chunk size for content-defined chunking
# Smaller chunks = better deduplication but more metadata
# Supports: 10MB, 5M, 1GB, 500KB, 64MiB, etc.
# Default: 10MB
#chunk_size: 10MB
# Maximum blob size
# Multiple chunks are packed into blobs up to this size
# Supports: 1GB, 10G, 500MB, 1GiB, etc.
# Default: 10GB
#blob_size_limit: 10GB
# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5
# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver

docs/DATAMODEL.md Normal file

@@ -0,0 +1,268 @@
# Vaultik Data Model
## Overview
Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.
**Important Notes:**
- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.
## Database Tables
### 1. `files`
Stores metadata about files in the filesystem being backed up.
**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID for the file record
- `path` (TEXT NOT NULL UNIQUE) - Absolute file path
- `mtime` (INTEGER NOT NULL) - Modification time as Unix timestamp
- `ctime` (INTEGER NOT NULL) - Change time as Unix timestamp
- `size` (INTEGER NOT NULL) - File size in bytes
- `mode` (INTEGER NOT NULL) - Unix file permissions and type
- `uid` (INTEGER NOT NULL) - User ID of file owner
- `gid` (INTEGER NOT NULL) - Group ID of file owner
- `link_target` (TEXT) - Symlink target path (NULL for regular files)
**Indexes:**
- `idx_files_path` on `path` for efficient lookups
**Purpose:** Tracks file metadata to detect changes between backup runs. Used for incremental backup decisions. The UUID primary key provides stable references that don't change if files are moved.
### 2. `chunks`
Stores information about content-defined chunks created from files.
**Columns:**
- `chunk_hash` (TEXT PRIMARY KEY) - SHA256 hash of chunk content
- `size` (INTEGER NOT NULL) - Chunk size in bytes
**Purpose:** Enables deduplication by tracking unique chunks across all files.
### 3. `file_chunks`
Maps files to their constituent chunks in order.
**Columns:**
- `file_id` (TEXT) - File ID (FK to files.id)
- `idx` (INTEGER) - Chunk index within file (0-based)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- PRIMARY KEY (`file_id`, `idx`)
**Purpose:** Allows reconstruction of files from chunks during restore.
### 4. `chunk_files`
Reverse mapping showing which files contain each chunk.
**Columns:**
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `file_id` (TEXT) - File ID (FK to files.id)
- `file_offset` (INTEGER) - Byte offset of chunk within file
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`chunk_hash`, `file_id`)
**Purpose:** Supports efficient queries for chunk usage and deduplication statistics.
### 5. `blobs`
Stores information about packed, compressed, and encrypted blob files.
**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID assigned when blob creation starts
- `blob_hash` (TEXT UNIQUE) - SHA256 hash of final blob (NULL until finalized)
- `created_ts` (INTEGER NOT NULL) - Creation timestamp
- `finished_ts` (INTEGER) - Finalization timestamp (NULL if in progress)
- `uncompressed_size` (INTEGER NOT NULL DEFAULT 0) - Total size of chunks before compression
- `compressed_size` (INTEGER NOT NULL DEFAULT 0) - Size after compression and encryption
- `uploaded_ts` (INTEGER) - Upload completion timestamp (NULL if not uploaded)
**Purpose:** Tracks blob lifecycle from creation through upload. The UUID primary key allows immediate association of chunks with blobs.
### 6. `blob_chunks`
Maps chunks to the blobs that contain them.
**Columns:**
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `offset` (INTEGER) - Byte offset of chunk within blob (before compression)
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`blob_id`, `chunk_hash`)
**Purpose:** Enables chunk retrieval from blobs during restore operations.
### 7. `snapshots`
Tracks backup snapshots.
**Columns:**
- `id` (TEXT PRIMARY KEY) - Snapshot ID (format: hostname-YYYYMMDD-HHMMSSZ)
- `hostname` (TEXT) - Hostname where backup was created
- `vaultik_version` (TEXT) - Version of Vaultik used
- `vaultik_git_revision` (TEXT) - Git revision of Vaultik used
- `started_at` (INTEGER) - Start timestamp
- `completed_at` (INTEGER) - Completion timestamp (NULL if in progress)
- `file_count` (INTEGER) - Number of files in snapshot
- `chunk_count` (INTEGER) - Number of unique chunks
- `blob_count` (INTEGER) - Number of blobs referenced
- `total_size` (INTEGER) - Total size of all files
- `blob_size` (INTEGER) - Total size of all blobs (compressed)
- `blob_uncompressed_size` (INTEGER) - Total uncompressed size of all referenced blobs
- `compression_ratio` (REAL) - Compression ratio achieved
- `compression_level` (INTEGER) - Compression level used for this snapshot
- `upload_bytes` (INTEGER) - Total bytes uploaded during this snapshot
- `upload_duration_ms` (INTEGER) - Total milliseconds spent uploading to S3
**Purpose:** Provides snapshot metadata and statistics including version tracking for compatibility.
### 8. `snapshot_files`
Maps snapshots to the files they contain.
**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `file_id` (TEXT) - File ID (FK to files.id)
- PRIMARY KEY (`snapshot_id`, `file_id`)
**Purpose:** Records which files are included in each snapshot.
### 9. `snapshot_blobs`
Maps snapshots to the blobs they reference.
**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `blob_hash` (TEXT) - Denormalized blob hash for manifest generation
- PRIMARY KEY (`snapshot_id`, `blob_id`)
**Purpose:** Tracks blob dependencies for snapshots and enables manifest generation.
### 10. `uploads`
Tracks blob upload metrics.
**Columns:**
- `blob_hash` (TEXT PRIMARY KEY) - Hash of uploaded blob
- `snapshot_id` (TEXT NOT NULL) - The snapshot that triggered this upload (FK to snapshots.id)
- `uploaded_at` (INTEGER) - Upload timestamp
- `size` (INTEGER) - Size of uploaded blob
- `duration_ms` (INTEGER) - Upload duration in milliseconds
**Purpose:** Performance monitoring and tracking which blobs were newly created (uploaded) during each snapshot.
## Data Flow and Operations
### 1. Backup Process
1. **File Scanning**
- `INSERT OR REPLACE INTO files` - Update file metadata
- `SELECT * FROM files WHERE path = ?` - Check if file has changed
- `INSERT INTO snapshot_files` - Add file to current snapshot
2. **Chunking** (for changed files)
- `INSERT OR IGNORE INTO chunks` - Store new chunks
- `INSERT INTO file_chunks` - Map chunks to file
- `INSERT INTO chunk_files` - Create reverse mapping
3. **Blob Packing**
- `INSERT INTO blobs` - Create blob record with UUID (blob_hash NULL)
- `INSERT INTO blob_chunks` - Associate chunks with blob immediately
- `UPDATE blobs SET blob_hash = ?, finished_ts = ?` - Finalize blob after packing
4. **Upload**
- `UPDATE blobs SET uploaded_ts = ?` - Mark blob as uploaded
- `INSERT INTO uploads` - Record upload metrics with snapshot_id
- `INSERT INTO snapshot_blobs` - Associate blob with snapshot
5. **Snapshot Completion**
- `UPDATE snapshots SET completed_at = ?, stats...` - Finalize snapshot
- Generate and upload blob manifest from `snapshot_blobs`
### 2. Incremental Backup
1. **Change Detection**
- `SELECT * FROM files WHERE path = ?` - Get previous file metadata
- Compare mtime, size, mode to detect changes
- Skip unchanged files but still add to `snapshot_files`
2. **Chunk Reuse**
- `SELECT * FROM blob_chunks WHERE chunk_hash = ?` - Find existing chunks
- `INSERT INTO snapshot_blobs` - Reference existing blobs for unchanged files
### 3. Snapshot Metadata Export
After a snapshot is completed:
1. Copy database to temporary file
2. Clean temporary database to contain only current snapshot data
3. Export to SQL dump using sqlite3
4. Compress with zstd and encrypt with age
5. Upload to S3 as `metadata/{snapshot-id}/db.zst.age`
6. Generate blob manifest and upload as `metadata/{snapshot-id}/manifest.json.zst`
### 4. Restore Process
The restore process doesn't use the local database. Instead:
1. Downloads snapshot metadata from S3
2. Downloads required blobs based on manifest
3. Reconstructs files from decrypted and decompressed chunks
### 5. Pruning
1. **Identify Unreferenced Blobs**
- Query blobs not referenced by any remaining snapshot
- Delete from S3 and local database
### 6. Incomplete Snapshot Cleanup
Before each backup:
1. Query incomplete snapshots (where `completed_at IS NULL`)
2. Check if metadata exists in S3
3. If no metadata, delete snapshot and all associations
4. Clean up orphaned files, chunks, and blobs
## Repository Pattern
Vaultik uses a repository pattern for database access:
- `FileRepository` - CRUD operations for files and file metadata
- `ChunkRepository` - CRUD operations for content chunks
- `FileChunkRepository` - Manage file-to-chunk mappings
- `ChunkFileRepository` - Manage chunk-to-file reverse mappings
- `BlobRepository` - Manage blob lifecycle (creation, finalization, upload)
- `BlobChunkRepository` - Manage blob-to-chunk associations
- `SnapshotRepository` - Manage snapshots and their relationships
- `UploadRepository` - Track blob upload metrics
Each repository provides methods like:
- `Create()` - Insert new record
- `GetByID()` / `GetByPath()` / `GetByHash()` - Retrieve records
- `Update()` - Update existing records
- `Delete()` - Remove records
- Specialized queries for each entity type (e.g., `DeleteOrphaned()`, `GetIncompleteByHostname()`)
## Transaction Management
All database operations that modify multiple tables are wrapped in transactions:
```go
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
	// Multiple repository operations using tx
	return nil
})
```
This ensures consistency, especially important for operations like:
- Creating file-chunk mappings
- Associating chunks with blobs
- Updating snapshot statistics
## Performance Considerations
1. **Indexes**:
- Primary keys are automatically indexed
- `idx_files_path` on `files(path)` for efficient file lookups
2. **Prepared Statements**: All queries use prepared statements for performance and security
3. **Batch Operations**: Where possible, operations are batched within transactions
4. **Write-Ahead Logging**: SQLite WAL mode is enabled for better concurrency
## Data Integrity
1. **Foreign Keys**: Enforced through CASCADE DELETE and application-level repository methods
2. **Unique Constraints**: Chunk hashes, file paths, and blob hashes are unique
3. **Null Handling**: Nullable fields clearly indicate in-progress operations
4. **Timestamp Tracking**: All major operations record timestamps for auditing

docs/REPOSTRUCTURE.md Normal file

@@ -0,0 +1,143 @@
# Vaultik S3 Repository Structure
This document describes the structure and organization of data stored in the S3 bucket by Vaultik.
## Overview
Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:
1. **Blobs** - The actual backup data (content-addressed, encrypted)
2. **Metadata** - Snapshot information and manifests (partially encrypted)
## Directory Structure
```
<bucket>/<prefix>/
├── blobs/
│ └── <hash[0:2]>/
│ └── <hash[2:4]>/
│ └── <full-hash>
└── metadata/
└── <snapshot-id>/
├── db.zst.age
└── manifest.json.zst
```
## Blobs Directory (`blobs/`)
### Structure
- **Path format**: `blobs/<first-2-chars>/<next-2-chars>/<full-hash>`
- **Example**: `blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678`
- **Sharding**: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
### Content
- **What it contains**: Packed collections of content-defined chunks from files
- **Format**: Zstandard compressed, then Age encrypted
- **Encryption**: Always encrypted with Age using the configured recipients
- **Naming**: Content-addressed using SHA256 hash of the encrypted blob
### Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.
## Metadata Directory (`metadata/`)
Each snapshot has its own subdirectory named with the snapshot ID.
### Snapshot ID Format
- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
- **Example**: `laptop-20240115-143052Z`
- **Components**:
- Hostname (may contain hyphens)
- Date in YYYYMMDD format
- Time in HHMMSSZ format (Z indicates UTC)
### Files in Each Snapshot Directory
#### `db.zst.age` - Encrypted Database Dump
- **What it contains**: Complete SQLite database dump for this snapshot
- **Format**: SQL dump → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership
#### `manifest.json.zst` - Unencrypted Blob Manifest
- **What it contains**: JSON list of all blob hashes referenced by this snapshot
- **Format**: JSON → Zstandard compressed (NOT encrypted)
- **Encryption**: NOT encrypted
- **Purpose**: Enables pruning operations without requiring decryption keys
- **Structure**:
```json
{
"snapshot_id": "laptop-20240115-143052Z",
"timestamp": "2024-01-15T14:30:52Z",
"blob_count": 42,
"blobs": [
"cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
"deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
...
]
}
```
### Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
1. **Pruning operations** - Identifying unreferenced blobs for deletion
2. **Storage analysis** - Understanding space usage without decryption
3. **Verification** - Checking blob existence without decryption
4. **Cross-snapshot deduplication analysis** - Finding shared blobs between snapshots
The manifest only contains blob hashes, not file names or any other sensitive information.
## Security Considerations
### What's Encrypted
- **All file content** (in blobs)
- **All file metadata** (paths, permissions, timestamps, ownership in db.zst.age)
- **File-to-chunk mappings** (in db.zst.age)
### What's Not Encrypted
- **Blob hashes** (in manifest.json.zst)
- **Snapshot IDs** (directory names)
- **Blob count per snapshot** (in manifest.json.zst)
### Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob
An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files
## Consistency Guarantees
1. **Blobs are immutable** - Once written, a blob is never modified
2. **Blobs are written before metadata** - A snapshot's metadata is only written after all its blobs are successfully uploaded
3. **Metadata is written atomically** - Both db.zst.age and manifest.json.zst are written as complete files
4. **Snapshots are marked complete in local DB only after metadata upload** - Ensures consistency between local and remote state
## Pruning Safety
The prune operation is safe because:
1. It only deletes blobs not referenced in any manifest
2. Manifests are unencrypted and can be read without keys
3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs
## Restoration Requirements
To restore from a backup, you need:
1. **The Age private key** - To decrypt blobs and database
2. **The snapshot metadata** - Both files from the snapshot's metadata directory
3. **All referenced blobs** - As listed in the manifest
The restoration process:
1. Download and decrypt the database dump to understand file structure
2. Download and decrypt the required blobs
3. Reconstruct files from their chunks
4. Restore file metadata (permissions, timestamps, etc.)

go.mod

@@ -3,26 +3,303 @@ module git.eeqj.de/sneak/vaultik
go 1.24.4
require (
filippo.io/age v1.2.1
git.eeqj.de/sneak/smartconfig v1.0.0
github.com/adrg/xdg v0.5.3
github.com/aws/aws-sdk-go-v2 v1.39.6
github.com/aws/aws-sdk-go-v2/config v1.31.17
github.com/aws/aws-sdk-go-v2/credentials v1.18.21
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.20.4
github.com/aws/aws-sdk-go-v2/service/s3 v1.90.0
github.com/aws/smithy-go v1.23.2
github.com/dustin/go-humanize v1.0.1
github.com/gobwas/glob v0.2.3
github.com/google/uuid v1.6.0
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
github.com/klauspost/compress v1.18.1
github.com/mattn/go-sqlite3 v1.14.29
github.com/rclone/rclone v1.72.1
github.com/schollz/progressbar/v3 v3.19.0
github.com/spf13/afero v1.15.0
github.com/spf13/cobra v1.10.1
github.com/stretchr/testify v1.11.1
go.uber.org/fx v1.24.0
golang.org/x/term v0.37.0
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.38.0
)
require (
cloud.google.com/go/auth v0.17.0 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/compute/metadata v0.9.0 // indirect
cloud.google.com/go/iam v1.5.2 // indirect
cloud.google.com/go/secretmanager v1.15.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.5.3 // indirect
github.com/Azure/go-ntlmssp v0.0.2-0.20251110135918-10b7b7e7cd26 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.6.0 // indirect
github.com/Files-com/files-sdk-go/v3 v3.2.264 // indirect
github.com/IBM/go-sdk-core/v5 v5.21.0 // indirect
github.com/Max-Sum/base32768 v0.0.0-20230304063302-18e6ce5945fd // indirect
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/ProtonMail/bcrypt v0.0.0-20211005172633-e235017c1baf // indirect
github.com/ProtonMail/gluon v0.17.1-0.20230724134000-308be39be96e // indirect
github.com/ProtonMail/go-crypto v1.3.0 // indirect
github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f // indirect
github.com/ProtonMail/go-srp v0.0.7 // indirect
github.com/ProtonMail/gopenpgp/v2 v2.9.0 // indirect
github.com/PuerkitoBio/goquery v1.10.3 // indirect
github.com/a1ex3/zstd-seekable-format-go/pkg v0.10.0 // indirect
github.com/abbot/go-http-auth v0.4.0 // indirect
github.com/anchore/go-lzo v0.1.0 // indirect
github.com/andybalholm/cascadia v1.3.3 // indirect
github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc // indirect
github.com/armon/go-metrics v0.4.1 // indirect
github.com/aws/aws-sdk-go v1.44.256 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.3 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.13 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.4 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.13 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.13 // indirect
github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.35.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.1 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.5 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.39.1 // indirect
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/boombuler/barcode v1.1.0 // indirect
github.com/bradenaw/juniper v0.15.3 // indirect
github.com/bradfitz/iter v0.0.0-20191230175014-e8f45d346db8 // indirect
github.com/buengese/sgzip v0.1.1 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/calebcase/tmpfile v1.0.3 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 // indirect
github.com/clipperhouse/stringish v0.1.1 // indirect
github.com/clipperhouse/uax29/v2 v2.3.0 // indirect
github.com/cloudflare/circl v1.6.1 // indirect
github.com/cloudinary/cloudinary-go/v2 v2.13.0 // indirect
github.com/cloudsoda/go-smb2 v0.0.0-20250228001242-d4c70e6251cc // indirect
github.com/cloudsoda/sddl v0.0.0-20250224235906-926454e91efc // indirect
github.com/colinmarc/hdfs/v2 v2.4.0 // indirect
github.com/coreos/go-semver v0.3.1 // indirect
github.com/coreos/go-systemd/v22 v22.6.0 // indirect
github.com/creasty/defaults v1.8.0 // indirect
github.com/cronokirby/saferith v0.33.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/diskfs/go-diskfs v1.7.0 // indirect
github.com/dropbox/dropbox-sdk-go-unofficial/v6 v6.0.5 // indirect
github.com/ebitengine/purego v0.9.1 // indirect
github.com/emersion/go-message v0.18.2 // indirect
github.com/emersion/go-vcard v0.0.0-20241024213814-c9703dde27ff // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/fatih/color v1.16.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/flynn/noise v1.1.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/gabriel-vasile/mimetype v1.4.11 // indirect
github.com/geoffgarside/ber v1.2.0 // indirect
github.com/go-chi/chi/v5 v5.2.3 // indirect
github.com/go-darwin/apfs v0.0.0-20211011131704-f84b94dbf348 // indirect
github.com/go-git/go-billy/v5 v5.6.2 // indirect
github.com/go-jose/go-jose/v4 v4.1.2 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.3.0 // indirect
github.com/go-openapi/errors v0.22.4 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/strfmt v0.25.0 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.28.0 // indirect
github.com/go-resty/resty/v2 v2.16.5 // indirect
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/gofrs/flock v0.13.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/btree v1.1.3 // indirect
github.com/google/gnostic-models v0.6.9 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/s2a-go v0.1.9 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
github.com/googleapis/gax-go/v2 v2.15.0 // indirect
github.com/gopherjs/gopherjs v1.17.2 // indirect
github.com/gorilla/schema v1.4.1 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
github.com/hashicorp/consul/api v1.32.1 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/go-hclog v1.6.3 // indirect
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
github.com/hashicorp/go-secure-stdlib/parseutil v0.1.6 // indirect
github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect
github.com/hashicorp/go-sockaddr v1.0.2 // indirect
github.com/hashicorp/go-uuid v1.0.3 // indirect
github.com/hashicorp/golang-lru v0.5.4 // indirect
github.com/hashicorp/hcl v1.0.1-vault-7 // indirect
github.com/hashicorp/serf v0.10.1 // indirect
github.com/hashicorp/vault/api v1.20.0 // indirect
github.com/henrybear327/Proton-API-Bridge v1.0.0 // indirect
github.com/henrybear327/go-proton-api v1.0.0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jcmturner/aescts/v2 v2.0.0 // indirect
github.com/jcmturner/dnsutils/v2 v2.0.0 // indirect
github.com/jcmturner/gofork v1.7.6 // indirect
github.com/jcmturner/goidentity/v6 v6.0.1 // indirect
github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
github.com/jcmturner/rpc/v2 v2.0.3 // indirect
github.com/jlaffaye/ftp v0.2.1-0.20240918233326-1b970516f5d3 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/jtolds/gls v4.20.0+incompatible // indirect
github.com/jtolio/noiseconn v0.0.0-20231127013910-f6d9ecbf1de7 // indirect
github.com/jzelinskie/whirlpool v0.0.0-20201016144138-0675e54bb004 // indirect
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
github.com/koofr/go-httpclient v0.0.0-20240520111329-e20f8f203988 // indirect
github.com/koofr/go-koofrclient v0.0.0-20221207135200-cbd7fc9ad6a6 // indirect
github.com/kr/fs v0.1.0 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lanrat/extsort v1.4.2 // indirect
github.com/leodido/go-urn v1.4.0 // indirect
github.com/lpar/date v1.0.0 // indirect
github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
github.com/mailru/easyjson v0.9.1 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/ncruces/go-strftime v0.1.9 // indirect
github.com/ncw/swift/v2 v2.0.5 // indirect
github.com/oklog/ulid v1.3.1 // indirect
github.com/onsi/ginkgo/v2 v2.23.3 // indirect
github.com/oracle/oci-go-sdk/v65 v65.104.0 // indirect
github.com/panjf2000/ants/v2 v2.11.3 // indirect
github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
github.com/pengsrc/go-shared v0.2.1-0.20190131101655-1999055a4a14 // indirect
github.com/peterh/liner v1.2.2 // indirect
github.com/pierrec/lz4/v4 v4.1.22 // indirect
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pkg/sftp v1.13.10 // indirect
github.com/pkg/xattr v0.4.12 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
github.com/pquerna/otp v1.5.0 // indirect
github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.67.2 // indirect
github.com/prometheus/procfs v0.19.2 // indirect
github.com/putdotio/go-putio/putio v0.0.0-20200123120452-16d982cac2b8 // indirect
github.com/relvacode/iso8601 v1.7.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/rfjakob/eme v1.1.2 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/ryanuber/go-glob v1.0.0 // indirect
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect
github.com/samber/lo v1.52.0 // indirect
github.com/shirou/gopsutil/v4 v4.25.10 // indirect
github.com/sirupsen/logrus v1.9.4-0.20230606125235-dd1b4c2e81af // indirect
github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 // indirect
github.com/smarty/assertions v1.16.0 // indirect
github.com/sony/gobreaker v1.0.0 // indirect
github.com/spacemonkeygo/monkit/v3 v3.0.25-0.20251022131615-eb24eb109368 // indirect
github.com/spf13/pflag v1.0.10 // indirect
github.com/t3rm1n4l/go-mega v0.0.0-20251031123324-a804aaa87491 // indirect
github.com/tidwall/gjson v1.18.0 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.0 // indirect
github.com/tklauser/go-sysconf v0.3.15 // indirect
github.com/tklauser/numcpus v0.10.0 // indirect
github.com/ulikunitz/xz v0.5.15 // indirect
github.com/unknwon/goconfig v1.0.0 // indirect
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
github.com/x448/float16 v0.8.4 // indirect
github.com/xanzy/ssh-agent v0.3.3 // indirect
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
github.com/yunify/qingstor-sdk-go/v3 v3.2.0 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
github.com/zeebo/blake3 v0.2.4 // indirect
github.com/zeebo/errs v1.4.0 // indirect
github.com/zeebo/xxh3 v1.0.2 // indirect
go.etcd.io/bbolt v1.4.3 // indirect
go.etcd.io/etcd/api/v3 v3.6.2 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.6.2 // indirect
go.etcd.io/etcd/client/v3 v3.6.2 // indirect
go.mongodb.org/mongo-driver v1.17.6 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
go.opentelemetry.io/otel v1.38.0 // indirect
go.opentelemetry.io/otel/metric v1.38.0 // indirect
go.opentelemetry.io/otel/trace v1.38.0 // indirect
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
go.uber.org/dig v1.19.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/oauth2 v0.33.0 // indirect
golang.org/x/sync v0.18.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/time v0.14.0 // indirect
golang.org/x/tools v0.38.0 // indirect
google.golang.org/api v0.255.0 // indirect
google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250804133106-a7a43d27e69b // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20251103181224-f26f9409b101 // indirect
google.golang.org/grpc v1.76.0 // indirect
google.golang.org/protobuf v1.36.10 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/validator.v2 v2.0.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
k8s.io/api v0.33.3 // indirect
k8s.io/apimachinery v0.33.3 // indirect
k8s.io/client-go v0.33.3 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
modernc.org/libc v1.65.10 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
moul.io/http2curl/v2 v2.3.0 // indirect
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
sigs.k8s.io/yaml v1.6.0 // indirect
storj.io/common v0.0.0-20251107171817-6221ae45072c // indirect
storj.io/drpc v0.0.35-0.20250513201419-f7819ea69b55 // indirect
storj.io/eventkit v0.0.0-20250410172343-61f26d3de156 // indirect
storj.io/infectious v0.0.2 // indirect
storj.io/picobuf v0.0.4 // indirect
storj.io/uplink v1.13.1 // indirect
)

1323
go.sum

File diff suppressed because it is too large

6
internal/blob/errors.go Normal file

@@ -0,0 +1,6 @@
package blob
import "errors"
// ErrBlobSizeLimitExceeded is returned when adding a chunk would exceed the blob size limit
var ErrBlobSizeLimitExceeded = errors.New("adding chunk would exceed blob size limit")

555
internal/blob/packer.go Normal file

@@ -0,0 +1,555 @@
// Package blob handles the creation of blobs - the final storage units for Vaultik.
// A blob is a large file (up to 10GB) containing many compressed and encrypted chunks
// from multiple source files. Blobs are content-addressed, meaning their filename
// is derived from the SHA256 hash of their compressed and encrypted content.
//
// The blob creation process:
// 1. Chunks are accumulated from multiple files
// 2. The collection is compressed using zstd
// 3. The compressed data is encrypted using age
// 4. The encrypted blob is hashed to create its content-addressed name
// 5. The blob is uploaded to S3 using the hash as the filename
//
// This design optimizes storage efficiency by batching many small chunks into
// larger blobs, reducing the number of S3 operations and associated costs.
package blob
import (
"context"
"database/sql"
"encoding/hex"
"fmt"
"io"
"sync"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/google/uuid"
"github.com/spf13/afero"
)
// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
// The handler receives a BlobWithReader containing the blob metadata and a reader for
// the compressed and encrypted blob content. The handler is responsible for uploading
// the blob to storage and cleaning up any temporary files.
type BlobHandler func(blob *BlobWithReader) error
// PackerConfig holds configuration for creating a Packer.
// MaxBlobSize, Recipients, and Fs are required; BlobHandler and Repositories are optional.
type PackerConfig struct {
MaxBlobSize int64 // Maximum size of a blob before forcing finalization
CompressionLevel int // Zstd compression level (1-19, higher = better compression)
Recipients []string // Age recipients for encryption
Repositories *database.Repositories // Database repositories for tracking blob metadata
BlobHandler BlobHandler // Optional callback when blob is ready for upload
Fs afero.Fs // Filesystem for temporary files
}
// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
Hash string
Size int64
}
// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
type Packer struct {
maxBlobSize int64
compressionLevel int
recipients []string // Age recipients for encryption
blobHandler BlobHandler // Called when blob is ready
repos *database.Repositories // For creating blob records
fs afero.Fs // Filesystem for temporary files
// Mutex for thread-safe blob creation
mu sync.Mutex
// Current blob being packed
currentBlob *blobInProgress
finishedBlobs []*FinishedBlob // Only used if no handler provided
// Pending chunks to be inserted when blob finalizes
pendingChunks []PendingChunk
}
// blobInProgress represents a blob being assembled
type blobInProgress struct {
id string // UUID of the blob
chunks []*chunkInfo // Track chunk metadata
chunkSet map[string]bool // Track unique chunks in this blob
tempFile afero.File // Temporary file for encrypted compressed data
writer *blobgen.Writer // Unified compression/encryption/hashing writer
startTime time.Time
size int64 // Current uncompressed size
}
// ChunkRef represents a chunk to be added to a blob.
// The Hash is the content-addressed identifier (SHA256) of the chunk,
// and Data contains the raw chunk bytes. After adding to a blob,
// the Data can be safely discarded as it's written to the blob immediately.
type ChunkRef struct {
Hash string // SHA256 hash of the chunk data
Data []byte // Raw chunk content
}
// chunkInfo tracks chunk metadata in a blob
type chunkInfo struct {
Hash string
Offset int64
Size int64
}
// FinishedBlob represents a completed blob ready for storage
type FinishedBlob struct {
ID string
Hash string
Data []byte // Compressed data
Chunks []*BlobChunkRef
CreatedTS time.Time
Uncompressed int64
Compressed int64
}
// BlobChunkRef represents a chunk's position within a blob
type BlobChunkRef struct {
ChunkHash string
Offset int64
Length int64
}
// BlobWithReader wraps a FinishedBlob with its data reader
type BlobWithReader struct {
*FinishedBlob
Reader io.ReadSeeker
TempFile afero.File // Optional, only set for disk-based blobs
InsertedChunkHashes []string // Chunk hashes that were inserted to DB with this blob
}
// NewPacker creates a new blob packer that accumulates chunks into blobs.
// The packer will automatically finalize blobs when they reach MaxBlobSize.
// Returns an error if required configuration fields are missing or invalid.
func NewPacker(cfg PackerConfig) (*Packer, error) {
if len(cfg.Recipients) == 0 {
return nil, fmt.Errorf("recipients are required - blobs must be encrypted")
}
if cfg.MaxBlobSize <= 0 {
return nil, fmt.Errorf("max blob size must be positive")
}
if cfg.Fs == nil {
return nil, fmt.Errorf("filesystem is required")
}
return &Packer{
maxBlobSize: cfg.MaxBlobSize,
compressionLevel: cfg.CompressionLevel,
recipients: cfg.Recipients,
blobHandler: cfg.BlobHandler,
repos: cfg.Repositories,
fs: cfg.Fs,
finishedBlobs: make([]*FinishedBlob, 0),
}, nil
}
// SetBlobHandler sets the handler to be called when a blob is finalized.
// The handler is responsible for uploading the blob to storage.
// If no handler is set, finalized blobs are stored in memory and can be
// retrieved with GetFinishedBlobs().
func (p *Packer) SetBlobHandler(handler BlobHandler) {
p.mu.Lock()
defer p.mu.Unlock()
p.blobHandler = handler
}
// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
p.mu.Lock()
defer p.mu.Unlock()
p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}
// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
// The chunk data is written immediately and can be garbage collected after this call.
// Thread-safe.
func (p *Packer) AddChunk(chunk *ChunkRef) error {
p.mu.Lock()
defer p.mu.Unlock()
// Initialize new blob if needed
if p.currentBlob == nil {
if err := p.startNewBlob(); err != nil {
return fmt.Errorf("starting new blob: %w", err)
}
}
// Check if adding this chunk would exceed blob size limit
// Use conservative estimate: assume no compression
// Skip size check if chunk already exists in blob
if !p.currentBlob.chunkSet[chunk.Hash] {
currentSize := p.currentBlob.size
newSize := currentSize + int64(len(chunk.Data))
if newSize > p.maxBlobSize && len(p.currentBlob.chunks) > 0 {
// Return error indicating size limit would be exceeded
return ErrBlobSizeLimitExceeded
}
}
// Add chunk to current blob
if err := p.addChunkToCurrentBlob(chunk); err != nil {
return err
}
return nil
}
// Flush finalizes any in-progress blob, compressing, encrypting, and hashing it.
// This should be called after all chunks have been added to ensure no data is lost.
// If a BlobHandler is set, it will be called with the finalized blob.
// Thread-safe.
func (p *Packer) Flush() error {
p.mu.Lock()
defer p.mu.Unlock()
if p.currentBlob != nil && len(p.currentBlob.chunks) > 0 {
if err := p.finalizeCurrentBlob(); err != nil {
return fmt.Errorf("finalizing blob: %w", err)
}
}
return nil
}
// FinalizeBlob finalizes the current blob being assembled.
// This compresses the accumulated chunks, encrypts the result, and computes
// the content-addressed hash. The finalized blob is either passed to the
// BlobHandler (if set) or stored internally.
// The caller is responsible for retrying any chunk whose addition triggered
// ErrBlobSizeLimitExceeded. Thread-safe.
func (p *Packer) FinalizeBlob() error {
p.mu.Lock()
defer p.mu.Unlock()
if p.currentBlob == nil {
return nil
}
return p.finalizeCurrentBlob()
}
// GetFinishedBlobs returns all completed blobs and clears the internal list.
// This is only used when no BlobHandler is set. After calling this method,
// the caller is responsible for uploading the blobs to storage.
// Thread-safe.
func (p *Packer) GetFinishedBlobs() []*FinishedBlob {
p.mu.Lock()
defer p.mu.Unlock()
blobs := p.finishedBlobs
p.finishedBlobs = make([]*FinishedBlob, 0)
return blobs
}
// startNewBlob initializes a new blob (must be called with lock held)
func (p *Packer) startNewBlob() error {
// Generate UUID for the blob
blobID := uuid.New().String()
// Create blob record in database
if p.repos != nil {
blobIDTyped, err := types.ParseBlobID(blobID)
if err != nil {
return fmt.Errorf("parsing blob ID: %w", err)
}
blob := &database.Blob{
ID: blobIDTyped,
Hash: types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
CreatedTS: time.Now().UTC(),
FinishedTS: nil,
UncompressedSize: 0,
CompressedSize: 0,
UploadedTS: nil,
}
if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.Blobs.Create(ctx, tx, blob)
}); err != nil {
return fmt.Errorf("creating blob record: %w", err)
}
}
// Create temporary file
tempFile, err := afero.TempFile(p.fs, "", "vaultik-blob-*.tmp")
if err != nil {
return fmt.Errorf("creating temp file: %w", err)
}
// Create blobgen writer for unified compression/encryption/hashing
writer, err := blobgen.NewWriter(tempFile, p.compressionLevel, p.recipients)
if err != nil {
_ = tempFile.Close()
_ = p.fs.Remove(tempFile.Name())
return fmt.Errorf("creating blobgen writer: %w", err)
}
p.currentBlob = &blobInProgress{
id: blobID,
chunks: make([]*chunkInfo, 0),
chunkSet: make(map[string]bool),
startTime: time.Now().UTC(),
tempFile: tempFile,
writer: writer,
size: 0,
}
log.Debug("Created new blob container", "blob_id", blobID, "temp_file", tempFile.Name())
return nil
}
// addChunkToCurrentBlob adds a chunk to the current blob (must be called with lock held)
func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
// Skip if chunk already in current blob
if p.currentBlob.chunkSet[chunk.Hash] {
log.Debug("Skipping duplicate chunk already in current blob", "chunk_hash", chunk.Hash)
return nil
}
// Track offset before writing
offset := p.currentBlob.size
// Write to the blobgen writer (compression -> encryption -> disk)
if _, err := p.currentBlob.writer.Write(chunk.Data); err != nil {
return fmt.Errorf("writing to blob stream: %w", err)
}
// Track chunk info
chunkSize := int64(len(chunk.Data))
chunkInfo := &chunkInfo{
Hash: chunk.Hash,
Offset: offset,
Size: chunkSize,
}
p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
p.currentBlob.chunkSet[chunk.Hash] = true
// Note: blob_chunk records are inserted in batch when blob is finalized
// to reduce transaction overhead. The chunk info is already stored in
// p.currentBlob.chunks for later insertion.
// Update total size
p.currentBlob.size += chunkSize
log.Debug("Added chunk to blob container",
"blob_id", p.currentBlob.id,
"chunk_hash", chunk.Hash,
"chunk_size", len(chunk.Data),
"offset", offset,
"blob_chunks", len(p.currentBlob.chunks),
"uncompressed_size", p.currentBlob.size)
return nil
}
// finalizeCurrentBlob completes the current blob (must be called with lock held)
func (p *Packer) finalizeCurrentBlob() error {
if p.currentBlob == nil {
return nil
}
// Close blobgen writer to flush all data
if err := p.currentBlob.writer.Close(); err != nil {
p.cleanupTempFile()
return fmt.Errorf("closing blobgen writer: %w", err)
}
// Sync file to ensure all data is written
if err := p.currentBlob.tempFile.Sync(); err != nil {
p.cleanupTempFile()
return fmt.Errorf("syncing temp file: %w", err)
}
// Get the final size (encrypted if applicable)
finalSize, err := p.currentBlob.tempFile.Seek(0, io.SeekCurrent)
if err != nil {
p.cleanupTempFile()
return fmt.Errorf("getting file size: %w", err)
}
// Reset to beginning for reading
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
p.cleanupTempFile()
return fmt.Errorf("seeking to start: %w", err)
}
// Get hash from blobgen writer (of final encrypted data)
finalHash := p.currentBlob.writer.Sum256()
blobHash := hex.EncodeToString(finalHash)
// Create chunk references with offsets
chunkRefs := make([]*BlobChunkRef, 0, len(p.currentBlob.chunks))
for _, chunk := range p.currentBlob.chunks {
chunkRefs = append(chunkRefs, &BlobChunkRef{
ChunkHash: chunk.Hash,
Offset: chunk.Offset,
Length: chunk.Size,
})
}
// Get pending chunks (will be inserted to DB and reported to handler)
chunksToInsert := p.pendingChunks
p.pendingChunks = nil // Clear pending list
// Insert pending chunks, blob_chunks, and update blob in a single transaction
if p.repos != nil {
blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
if parseErr != nil {
p.cleanupTempFile()
return fmt.Errorf("parsing blob ID: %w", parseErr)
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
// First insert all pending chunks (required for blob_chunks FK)
for _, chunk := range chunksToInsert {
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(chunk.Hash),
Size: chunk.Size,
}
if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
return fmt.Errorf("creating chunk: %w", err)
}
}
// Insert all blob_chunk records in batch
for _, chunk := range p.currentBlob.chunks {
blobChunk := &database.BlobChunk{
BlobID: blobIDTyped,
ChunkHash: types.ChunkHash(chunk.Hash),
Offset: chunk.Offset,
Length: chunk.Size,
}
if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
return fmt.Errorf("creating blob_chunk: %w", err)
}
}
// Update blob record with final hash and sizes
return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
p.currentBlob.size, finalSize)
})
if err != nil {
p.cleanupTempFile()
return fmt.Errorf("finalizing blob transaction: %w", err)
}
log.Debug("Committed blob transaction",
"chunks_inserted", len(chunksToInsert),
"blob_chunks_inserted", len(p.currentBlob.chunks))
}
// Create finished blob
finished := &FinishedBlob{
ID: p.currentBlob.id,
Hash: blobHash,
Data: nil, // We don't load data into memory anymore
Chunks: chunkRefs,
CreatedTS: p.currentBlob.startTime,
Uncompressed: p.currentBlob.size,
Compressed: finalSize,
}
compressionRatio := float64(finished.Compressed) / float64(finished.Uncompressed)
log.Info("Finalized blob (compressed and encrypted)",
"hash", blobHash,
"chunks", len(chunkRefs),
"uncompressed", finished.Uncompressed,
"compressed", finished.Compressed,
"ratio", fmt.Sprintf("%.2f", compressionRatio),
"duration", time.Since(p.currentBlob.startTime))
// Collect inserted chunk hashes for the scanner to track
var insertedChunkHashes []string
for _, chunk := range chunksToInsert {
insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
}
// Call blob handler if set
if p.blobHandler != nil {
// Reset file position for handler
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
p.cleanupTempFile()
return fmt.Errorf("seeking for handler: %w", err)
}
// Create a blob reader that includes the data stream
blobWithReader := &BlobWithReader{
FinishedBlob: finished,
Reader: p.currentBlob.tempFile,
TempFile: p.currentBlob.tempFile,
InsertedChunkHashes: insertedChunkHashes,
}
if err := p.blobHandler(blobWithReader); err != nil {
p.cleanupTempFile()
return fmt.Errorf("blob handler failed: %w", err)
}
// Note: blob handler is responsible for closing/cleaning up temp file
p.currentBlob = nil
} else {
log.Debug("No blob handler callback configured", "blob_hash", blobHash[:8]+"...")
// No handler, need to read data for legacy behavior
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
p.cleanupTempFile()
return fmt.Errorf("seeking to read data: %w", err)
}
data, err := io.ReadAll(p.currentBlob.tempFile)
if err != nil {
p.cleanupTempFile()
return fmt.Errorf("reading blob data: %w", err)
}
finished.Data = data
p.finishedBlobs = append(p.finishedBlobs, finished)
// Cleanup
p.cleanupTempFile()
p.currentBlob = nil
}
return nil
}
// cleanupTempFile removes the temporary file
func (p *Packer) cleanupTempFile() {
if p.currentBlob != nil && p.currentBlob.tempFile != nil {
name := p.currentBlob.tempFile.Name()
_ = p.currentBlob.tempFile.Close()
_ = p.fs.Remove(name)
}
}
// PackChunks is a convenience method to pack multiple chunks at once
func (p *Packer) PackChunks(chunks []*ChunkRef) error {
for _, chunk := range chunks {
err := p.AddChunk(chunk)
if err == ErrBlobSizeLimitExceeded {
// Finalize current blob and retry
if err := p.FinalizeBlob(); err != nil {
return fmt.Errorf("finalizing blob before retry: %w", err)
}
// Retry the chunk
if err := p.AddChunk(chunk); err != nil {
return fmt.Errorf("adding chunk %s after finalize: %w", chunk.Hash, err)
}
} else if err != nil {
return fmt.Errorf("adding chunk %s: %w", chunk.Hash, err)
}
}
return p.Flush()
}


@@ -0,0 +1,385 @@
package blob
import (
"bytes"
"context"
"crypto/sha256"
"database/sql"
"encoding/hex"
"io"
"testing"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/klauspost/compress/zstd"
"github.com/spf13/afero"
)
const (
// Test key from test/insecure-integration-test.key
testPrivateKey = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
testPublicKey = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)
func TestPacker(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Parse test identity
identity, err := age.ParseX25519Identity(testPrivateKey)
if err != nil {
t.Fatalf("failed to parse test identity: %v", err)
}
t.Run("single chunk creates single blob", func(t *testing.T) {
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test db: %v", err)
}
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
cfg := PackerConfig{
MaxBlobSize: 10 * 1024 * 1024, // 10MB
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
t.Fatalf("failed to create packer: %v", err)
}
// Create a chunk
data := []byte("Hello, World!")
hash := sha256.Sum256(data)
hashStr := hex.EncodeToString(hash[:])
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return repos.Chunks.Create(ctx, tx, dbChunk)
})
if err != nil {
t.Fatalf("failed to create chunk in db: %v", err)
}
chunk := &ChunkRef{
Hash: hashStr,
Data: data,
}
// Add chunk
if err := packer.AddChunk(chunk); err != nil {
t.Fatalf("failed to add chunk: %v", err)
}
// Flush
if err := packer.Flush(); err != nil {
t.Fatalf("failed to flush: %v", err)
}
// Get finished blobs
blobs := packer.GetFinishedBlobs()
if len(blobs) != 1 {
t.Fatalf("expected 1 blob, got %d", len(blobs))
}
blob := blobs[0]
if len(blob.Chunks) != 1 {
t.Errorf("expected 1 chunk in blob, got %d", len(blob.Chunks))
}
// Note: Very small data may not compress well
t.Logf("Compression: %d -> %d bytes", blob.Uncompressed, blob.Compressed)
// Decrypt the blob data
decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
if err != nil {
t.Fatalf("failed to decrypt blob: %v", err)
}
// Decompress the decrypted data
reader, err := zstd.NewReader(decrypted)
if err != nil {
t.Fatalf("failed to create decompressor: %v", err)
}
defer reader.Close()
var decompressed bytes.Buffer
if _, err := io.Copy(&decompressed, reader); err != nil {
t.Fatalf("failed to decompress: %v", err)
}
if !bytes.Equal(decompressed.Bytes(), data) {
t.Error("decompressed data doesn't match original")
}
})
t.Run("multiple chunks packed together", func(t *testing.T) {
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test db: %v", err)
}
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
cfg := PackerConfig{
MaxBlobSize: 10 * 1024 * 1024, // 10MB
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
t.Fatalf("failed to create packer: %v", err)
}
// Create multiple small chunks
chunks := make([]*ChunkRef, 10)
for i := 0; i < 10; i++ {
data := bytes.Repeat([]byte{byte(i)}, 1000)
hash := sha256.Sum256(data)
hashStr := hex.EncodeToString(hash[:])
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return repos.Chunks.Create(ctx, tx, dbChunk)
})
if err != nil {
t.Fatalf("failed to create chunk in db: %v", err)
}
chunks[i] = &ChunkRef{
Hash: hashStr,
Data: data,
}
}
// Add all chunks
for _, chunk := range chunks {
err := packer.AddChunk(chunk)
if err != nil {
t.Fatalf("failed to add chunk: %v", err)
}
}
// Flush
if err := packer.Flush(); err != nil {
t.Fatalf("failed to flush: %v", err)
}
// Should have one blob with all chunks
blobs := packer.GetFinishedBlobs()
if len(blobs) != 1 {
t.Fatalf("expected 1 blob, got %d", len(blobs))
}
if len(blobs[0].Chunks) != 10 {
t.Errorf("expected 10 chunks in blob, got %d", len(blobs[0].Chunks))
}
// Verify offsets are correct
expectedOffset := int64(0)
for i, chunkRef := range blobs[0].Chunks {
if chunkRef.Offset != expectedOffset {
t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunkRef.Offset)
}
if chunkRef.Length != 1000 {
t.Errorf("chunk %d: expected length 1000, got %d", i, chunkRef.Length)
}
expectedOffset += chunkRef.Length
}
})
t.Run("blob size limit enforced", func(t *testing.T) {
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test db: %v", err)
}
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
// Small blob size limit to force multiple blobs
cfg := PackerConfig{
MaxBlobSize: 5000, // 5KB max
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
t.Fatalf("failed to create packer: %v", err)
}
// Create chunks that will exceed the limit
chunks := make([]*ChunkRef, 10)
for i := 0; i < 10; i++ {
data := bytes.Repeat([]byte{byte(i)}, 1000) // 1KB each
hash := sha256.Sum256(data)
hashStr := hex.EncodeToString(hash[:])
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return repos.Chunks.Create(ctx, tx, dbChunk)
})
if err != nil {
t.Fatalf("failed to create chunk in db: %v", err)
}
chunks[i] = &ChunkRef{
Hash: hashStr,
Data: data,
}
}
blobCount := 0
// Add chunks and handle size limit errors
for _, chunk := range chunks {
err := packer.AddChunk(chunk)
if err == ErrBlobSizeLimitExceeded {
// Finalize current blob
if err := packer.FinalizeBlob(); err != nil {
t.Fatalf("failed to finalize blob: %v", err)
}
blobCount++
// Retry adding the chunk
if err := packer.AddChunk(chunk); err != nil {
t.Fatalf("failed to add chunk after finalize: %v", err)
}
} else if err != nil {
t.Fatalf("failed to add chunk: %v", err)
}
}
// Flush remaining
if err := packer.Flush(); err != nil {
t.Fatalf("failed to flush: %v", err)
}
// Get all blobs
blobs := packer.GetFinishedBlobs()
totalBlobs := blobCount + len(blobs)
// Should have multiple blobs due to size limit
if totalBlobs < 2 {
t.Errorf("expected multiple blobs due to size limit, got %d", totalBlobs)
}
// Verify each blob respects size limit (approximately)
for _, blob := range blobs {
if blob.Compressed > 6000 { // Allow some overhead
t.Errorf("blob size %d exceeds limit", blob.Compressed)
}
}
})
t.Run("with encryption", func(t *testing.T) {
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test db: %v", err)
}
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
// Generate test identity (using the one from parent test)
cfg := PackerConfig{
MaxBlobSize: 10 * 1024 * 1024, // 10MB
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
t.Fatalf("failed to create packer: %v", err)
}
// Create test data
data := bytes.Repeat([]byte("Test data for encryption!"), 100)
hash := sha256.Sum256(data)
hashStr := hex.EncodeToString(hash[:])
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return repos.Chunks.Create(ctx, tx, dbChunk)
})
if err != nil {
t.Fatalf("failed to create chunk in db: %v", err)
}
chunk := &ChunkRef{
Hash: hashStr,
Data: data,
}
// Add chunk and flush
if err := packer.AddChunk(chunk); err != nil {
t.Fatalf("failed to add chunk: %v", err)
}
if err := packer.Flush(); err != nil {
t.Fatalf("failed to flush: %v", err)
}
// Get blob
blobs := packer.GetFinishedBlobs()
if len(blobs) != 1 {
t.Fatalf("expected 1 blob, got %d", len(blobs))
}
blob := blobs[0]
// Decrypt the blob
decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
if err != nil {
t.Fatalf("failed to decrypt blob: %v", err)
}
var decryptedData bytes.Buffer
if _, err := decryptedData.ReadFrom(decrypted); err != nil {
t.Fatalf("failed to read decrypted data: %v", err)
}
// Decompress
reader, err := zstd.NewReader(&decryptedData)
if err != nil {
t.Fatalf("failed to create decompressor: %v", err)
}
defer reader.Close()
var decompressed bytes.Buffer
if _, err := decompressed.ReadFrom(reader); err != nil {
t.Fatalf("failed to decompress: %v", err)
}
// Verify data
if !bytes.Equal(decompressed.Bytes(), data) {
t.Error("decrypted and decompressed data doesn't match original")
}
})
}


@@ -0,0 +1,74 @@
package blobgen
import (
"bytes"
"encoding/hex"
"fmt"
"io"
)
// CompressResult contains the result of compressing and encrypting data
type CompressResult struct {
Data []byte
UncompressedSize int64
CompressedSize int64
SHA256 string
}
// CompressData compresses and encrypts data, returning the result with hash
func CompressData(data []byte, compressionLevel int, recipients []string) (*CompressResult, error) {
var buf bytes.Buffer
// Create writer
w, err := NewWriter(&buf, compressionLevel, recipients)
if err != nil {
return nil, fmt.Errorf("creating writer: %w", err)
}
// Write data
if _, err := w.Write(data); err != nil {
_ = w.Close()
return nil, fmt.Errorf("writing data: %w", err)
}
// Close to flush
if err := w.Close(); err != nil {
return nil, fmt.Errorf("closing writer: %w", err)
}
return &CompressResult{
Data: buf.Bytes(),
UncompressedSize: int64(len(data)),
CompressedSize: int64(buf.Len()),
SHA256: hex.EncodeToString(w.Sum256()),
}, nil
}
// CompressStream compresses and encrypts from reader to writer, returning hash
func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipients []string) (written int64, hash string, err error) {
// Create writer
w, err := NewWriter(dst, compressionLevel, recipients)
if err != nil {
return 0, "", fmt.Errorf("creating writer: %w", err)
}
closed := false
defer func() {
if !closed {
_ = w.Close()
}
}()
// Copy data
if _, err := io.Copy(w, src); err != nil {
return 0, "", fmt.Errorf("copying data: %w", err)
}
// Close to flush
if err := w.Close(); err != nil {
return 0, "", fmt.Errorf("closing writer: %w", err)
}
closed = true
return w.BytesWritten(), hex.EncodeToString(w.Sum256()), nil
}


@@ -0,0 +1,64 @@
package blobgen
import (
"bytes"
"crypto/rand"
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// testRecipient is a static age recipient for tests.
const testRecipient = "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
// TestCompressStreamNoDoubleClose is a regression test for issue #28.
// It verifies that CompressStream does not panic or return an error due to
// double-closing the underlying blobgen.Writer. Before the fix in PR #33,
// the explicit Close() on the happy path combined with defer Close() would
// cause a double close.
func TestCompressStreamNoDoubleClose(t *testing.T) {
input := []byte("regression test data for issue #28 double-close fix")
var buf bytes.Buffer
written, hash, err := CompressStream(&buf, bytes.NewReader(input), 3, []string{testRecipient})
require.NoError(t, err, "CompressStream should not return an error")
assert.True(t, written > 0, "expected bytes written > 0")
assert.NotEmpty(t, hash, "expected non-empty hash")
assert.True(t, buf.Len() > 0, "expected non-empty output")
}
// TestCompressStreamLargeInput exercises CompressStream with a larger payload
// to ensure no double-close issues surface under heavier I/O.
func TestCompressStreamLargeInput(t *testing.T) {
data := make([]byte, 512*1024) // 512 KB
_, err := rand.Read(data)
require.NoError(t, err)
var buf bytes.Buffer
written, hash, err := CompressStream(&buf, bytes.NewReader(data), 3, []string{testRecipient})
require.NoError(t, err)
assert.True(t, written > 0)
assert.NotEmpty(t, hash)
}
// TestCompressStreamEmptyInput verifies CompressStream handles empty input
// without double-close issues.
func TestCompressStreamEmptyInput(t *testing.T) {
var buf bytes.Buffer
_, hash, err := CompressStream(&buf, strings.NewReader(""), 3, []string{testRecipient})
require.NoError(t, err)
assert.NotEmpty(t, hash)
}
// TestCompressDataNoDoubleClose mirrors the stream test for CompressData,
// ensuring the explicit Close + error-path Close pattern is also safe.
func TestCompressDataNoDoubleClose(t *testing.T) {
input := []byte("CompressData regression test for double-close")
result, err := CompressData(input, 3, []string{testRecipient})
require.NoError(t, err)
assert.True(t, result.CompressedSize > 0)
assert.True(t, result.UncompressedSize == int64(len(input)))
assert.NotEmpty(t, result.SHA256)
}


@@ -0,0 +1,73 @@
package blobgen
import (
"crypto/sha256"
"fmt"
"hash"
"io"
"filippo.io/age"
"github.com/klauspost/compress/zstd"
)
// Reader wraps decompression and decryption with SHA256 verification
type Reader struct {
reader io.Reader
decompressor *zstd.Decoder
decryptor io.Reader
hasher hash.Hash
teeReader io.Reader
bytesRead int64
}
// NewReader creates a new Reader that decrypts, decompresses, and verifies data
func NewReader(r io.Reader, identity age.Identity) (*Reader, error) {
// Create decryption reader
decReader, err := age.Decrypt(r, identity)
if err != nil {
return nil, fmt.Errorf("creating decryption reader: %w", err)
}
// Create decompression reader
decompressor, err := zstd.NewReader(decReader)
if err != nil {
return nil, fmt.Errorf("creating decompression reader: %w", err)
}
// Create SHA256 hasher
hasher := sha256.New()
// Create tee reader that reads from decompressor and writes to hasher
teeReader := io.TeeReader(decompressor, hasher)
return &Reader{
reader: r,
decompressor: decompressor,
decryptor: decReader,
hasher: hasher,
teeReader: teeReader,
}, nil
}
// Read implements io.Reader
func (r *Reader) Read(p []byte) (n int, err error) {
n, err = r.teeReader.Read(p)
r.bytesRead += int64(n)
return n, err
}
// Close closes the decompressor
func (r *Reader) Close() error {
r.decompressor.Close()
return nil
}
// Sum256 returns the SHA256 hash of all data read
func (r *Reader) Sum256() []byte {
return r.hasher.Sum(nil)
}
// BytesRead returns the number of uncompressed bytes read
func (r *Reader) BytesRead() int64 {
return r.bytesRead
}

internal/blobgen/writer.go Normal file

@@ -0,0 +1,127 @@
package blobgen
import (
"crypto/sha256"
"fmt"
"hash"
"io"
"runtime"
"filippo.io/age"
"github.com/klauspost/compress/zstd"
)
// Writer wraps compression and encryption with SHA256 hashing.
// Data flows: input -> tee(hasher, compressor -> encryptor -> destination)
// The hash is computed on the uncompressed input for deterministic content-addressing.
type Writer struct {
teeWriter io.Writer // Tee to hasher and compressor
compressor *zstd.Encoder // Compression layer
encryptor io.WriteCloser // Encryption layer
hasher hash.Hash // SHA256 hasher (on uncompressed input)
compressionLevel int
bytesWritten int64
}
// NewWriter creates a new Writer that compresses, encrypts, and hashes data.
// The hash is computed on the uncompressed input for deterministic content-addressing.
func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer, error) {
// Validate compression level
if err := validateCompressionLevel(compressionLevel); err != nil {
return nil, err
}
// Create SHA256 hasher for the uncompressed input
hasher := sha256.New()
// Parse recipients
var ageRecipients []age.Recipient
for _, recipient := range recipients {
r, err := age.ParseX25519Recipient(recipient)
if err != nil {
return nil, fmt.Errorf("parsing recipient %s: %w", recipient, err)
}
ageRecipients = append(ageRecipients, r)
}
// Create encryption writer that outputs to destination
encWriter, err := age.Encrypt(w, ageRecipients...)
if err != nil {
return nil, fmt.Errorf("creating encryption writer: %w", err)
}
// Calculate compression concurrency: CPUs - 2, minimum 1
concurrency := runtime.NumCPU() - 2
if concurrency < 1 {
concurrency = 1
}
// Create compression writer with encryption as destination
compressor, err := zstd.NewWriter(encWriter,
zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)),
zstd.WithEncoderConcurrency(concurrency),
)
if err != nil {
_ = encWriter.Close()
return nil, fmt.Errorf("creating compression writer: %w", err)
}
// Create tee writer: input goes to both hasher and compressor
teeWriter := io.MultiWriter(hasher, compressor)
return &Writer{
teeWriter: teeWriter,
compressor: compressor,
encryptor: encWriter,
hasher: hasher,
compressionLevel: compressionLevel,
}, nil
}
// Write implements io.Writer
func (w *Writer) Write(p []byte) (n int, err error) {
n, err = w.teeWriter.Write(p)
w.bytesWritten += int64(n)
return n, err
}
// Close closes all layers and returns any errors
func (w *Writer) Close() error {
// Close compressor first
if err := w.compressor.Close(); err != nil {
return fmt.Errorf("closing compressor: %w", err)
}
// Then close encryptor
if err := w.encryptor.Close(); err != nil {
return fmt.Errorf("closing encryptor: %w", err)
}
return nil
}
// Sum256 returns the double SHA256 hash of the uncompressed input data.
// Double hashing (SHA256(SHA256(data))) prevents information leakage about
// the plaintext - an attacker cannot confirm existence of known content
// by computing its hash and checking for a matching blob filename.
func (w *Writer) Sum256() []byte {
// First hash: SHA256(plaintext)
firstHash := w.hasher.Sum(nil)
// Second hash: SHA256(firstHash) - this is the blob ID
secondHash := sha256.Sum256(firstHash)
return secondHash[:]
}
// BytesWritten returns the number of uncompressed bytes written
func (w *Writer) BytesWritten() int64 {
return w.bytesWritten
}
func validateCompressionLevel(level int) error {
// Zstd compression levels: 1-19 (default is 3)
// SpeedFastest = 1, SpeedDefault = 3, SpeedBetterCompression = 7, SpeedBestCompression = 11
if level < 1 || level > 19 {
return fmt.Errorf("invalid compression level %d: must be between 1 and 19", level)
}
return nil
}


@@ -0,0 +1,105 @@
package blobgen
import (
"bytes"
"crypto/rand"
"crypto/sha256"
"encoding/hex"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestWriterHashIsDoubleHash verifies that Writer.Sum256() returns
// the double hash SHA256(SHA256(plaintext)) for security.
// Double hashing prevents attackers from confirming existence of known content.
func TestWriterHashIsDoubleHash(t *testing.T) {
// Test data - random data that doesn't compress well
testData := make([]byte, 1024*1024) // 1MB
_, err := rand.Read(testData)
require.NoError(t, err)
// Test recipient (generated with age-keygen)
testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
// Create a buffer to capture the encrypted output
var encryptedBuf bytes.Buffer
// Create blobgen writer
writer, err := NewWriter(&encryptedBuf, 3, []string{testRecipient})
require.NoError(t, err)
// Write test data
n, err := writer.Write(testData)
require.NoError(t, err)
assert.Equal(t, len(testData), n)
// Close to flush all data
err = writer.Close()
require.NoError(t, err)
// Get the hash from the writer
writerHash := hex.EncodeToString(writer.Sum256())
// Calculate the expected double hash: SHA256(SHA256(plaintext))
firstHash := sha256.Sum256(testData)
secondHash := sha256.Sum256(firstHash[:])
expectedDoubleHash := hex.EncodeToString(secondHash[:])
// Also compute single hash to verify it's different
singleHashStr := hex.EncodeToString(firstHash[:])
t.Logf("Input size: %d bytes", len(testData))
t.Logf("Single hash (SHA256(data)): %s", singleHashStr)
t.Logf("Double hash (SHA256(SHA256(data))): %s", expectedDoubleHash)
t.Logf("Writer hash: %s", writerHash)
// The writer hash should match the double hash
assert.Equal(t, expectedDoubleHash, writerHash,
"Writer.Sum256() should return SHA256(SHA256(plaintext)) for security")
// Verify it's NOT the single hash (would leak information)
assert.NotEqual(t, singleHashStr, writerHash,
"Writer hash should not be single hash (would allow content confirmation attacks)")
}
// TestWriterDeterministicHash verifies that the same input always produces
// the same hash, even with non-deterministic encryption.
func TestWriterDeterministicHash(t *testing.T) {
// Test data
testData := []byte("Hello, World! This is test data for deterministic hashing.")
// Test recipient
testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
// Create two writers and verify they produce the same hash
var buf1, buf2 bytes.Buffer
writer1, err := NewWriter(&buf1, 3, []string{testRecipient})
require.NoError(t, err)
_, err = writer1.Write(testData)
require.NoError(t, err)
require.NoError(t, writer1.Close())
writer2, err := NewWriter(&buf2, 3, []string{testRecipient})
require.NoError(t, err)
_, err = writer2.Write(testData)
require.NoError(t, err)
require.NoError(t, writer2.Close())
hash1 := hex.EncodeToString(writer1.Sum256())
hash2 := hex.EncodeToString(writer2.Sum256())
// Hashes should be identical (deterministic)
assert.Equal(t, hash1, hash2, "Same input should produce same hash")
// Encrypted outputs should be different (non-deterministic encryption)
assert.NotEqual(t, buf1.Bytes(), buf2.Bytes(),
"Encrypted outputs should differ due to non-deterministic encryption")
t.Logf("Hash 1: %s", hash1)
t.Logf("Hash 2: %s", hash2)
t.Logf("Encrypted size 1: %d bytes", buf1.Len())
t.Logf("Encrypted size 2: %d bytes", buf2.Len())
}

internal/chunker/chunker.go Normal file

@@ -0,0 +1,153 @@
package chunker
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"io"
"os"
)
// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
// Each chunk is identified by its SHA256 hash and contains the raw data along with
// its position and size information from the original file.
type Chunk struct {
Hash string // Content hash of the chunk
Data []byte // Chunk data
Offset int64 // Offset in the original file
Size int64 // Size of the chunk
}
// Chunker provides content-defined chunking using the FastCDC algorithm.
// It splits data into variable-sized chunks based on content patterns, ensuring
// that identical data sequences produce identical chunks regardless of their
// position in the file. This enables efficient deduplication.
type Chunker struct {
avgChunkSize int
minChunkSize int
maxChunkSize int
}
// NewChunker creates a new chunker with the specified average chunk size.
// The actual chunk sizes will vary between avgChunkSize/4 and avgChunkSize*4
// as recommended by the FastCDC algorithm. Typical values for avgChunkSize
// are 64KB (65536), 256KB (262144), or 1MB (1048576).
func NewChunker(avgChunkSize int64) *Chunker {
// FastCDC recommends min = avg/4 and max = avg*4
return &Chunker{
avgChunkSize: int(avgChunkSize),
minChunkSize: int(avgChunkSize / 4),
maxChunkSize: int(avgChunkSize * 4),
}
}
// ChunkReader splits the reader into content-defined chunks and returns all chunks at once.
// This method loads all chunk data into memory, so it should only be used for
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
defer chunker.Release()
var chunks []Chunk
offset := int64(0)
for {
chunk, err := chunker.Next()
if err == io.EOF {
break
}
if err != nil {
return nil, fmt.Errorf("reading chunk: %w", err)
}
// Calculate hash
hash := sha256.Sum256(chunk.Data)
// Make a copy of the data since the chunker reuses the buffer
chunkData := make([]byte, len(chunk.Data))
copy(chunkData, chunk.Data)
chunks = append(chunks, Chunk{
Hash: hex.EncodeToString(hash[:]),
Data: chunkData,
Offset: offset,
Size: int64(len(chunk.Data)),
})
offset += int64(len(chunk.Data))
}
return chunks, nil
}
// ChunkCallback is a function called for each chunk as it's processed.
// The callback receives a Chunk containing the hash, data, offset, and size.
// If the callback returns an error, chunk processing stops and the error is propagated.
type ChunkCallback func(chunk Chunk) error
// ChunkReaderStreaming splits the reader into chunks and calls the callback for each chunk.
// This is the preferred method for processing large files or streams as it doesn't
// accumulate all chunks in memory. The callback is invoked for each chunk as it's
// produced, allowing for streaming processing and immediate storage or transmission.
// Returns the SHA256 hash of the entire file content and an error if chunking fails,
// reading fails, or if the callback returns an error.
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (string, error) {
// Create a tee reader to calculate full file hash while chunking
fileHasher := sha256.New()
teeReader := io.TeeReader(r, fileHasher)
chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
defer chunker.Release()
offset := int64(0)
for {
chunk, err := chunker.Next()
if err == io.EOF {
break
}
if err != nil {
return "", fmt.Errorf("reading chunk: %w", err)
}
// Calculate chunk hash
hash := sha256.Sum256(chunk.Data)
// Pass the data directly - caller must process it before we call Next() again
// (chunker reuses its internal buffer, but since we process synchronously
// and completely before continuing, no copy is needed)
if err := callback(Chunk{
Hash: hex.EncodeToString(hash[:]),
Data: chunk.Data,
Offset: offset,
Size: int64(len(chunk.Data)),
}); err != nil {
return "", fmt.Errorf("callback error: %w", err)
}
offset += int64(len(chunk.Data))
}
// Return the full file hash
return hex.EncodeToString(fileHasher.Sum(nil)), nil
}
// ChunkFile splits a file into content-defined chunks by reading the entire file.
// This is a convenience method that opens the file and passes it to ChunkReader.
// For large files, consider using ChunkReaderStreaming with a file handle instead.
// Returns an error if the file cannot be opened or if chunking fails.
func (c *Chunker) ChunkFile(path string) ([]Chunk, error) {
file, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("opening file: %w", err)
}
defer func() {
// Close errors on a file opened read-only are not actionable here;
// comparing err.Error() against a string would be fragile, so ignore them.
_ = file.Close()
}()
return c.ChunkReader(file)
}


@@ -0,0 +1,77 @@
package chunker
import (
"bytes"
"testing"
)
func TestChunkerExpectedChunkCount(t *testing.T) {
tests := []struct {
name string
fileSize int
avgChunkSize int64
minExpected int
maxExpected int
}{
{
name: "1MB file with 64KB average",
fileSize: 1024 * 1024,
avgChunkSize: 64 * 1024,
minExpected: 8, // At least half the expected count
maxExpected: 32, // At most double the expected count
},
{
name: "10MB file with 256KB average",
fileSize: 10 * 1024 * 1024,
avgChunkSize: 256 * 1024,
minExpected: 10, // FastCDC may produce larger chunks
maxExpected: 80,
},
{
name: "512KB file with 64KB average",
fileSize: 512 * 1024,
avgChunkSize: 64 * 1024,
minExpected: 4, // ~8 expected
maxExpected: 16,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
chunker := NewChunker(tt.avgChunkSize)
// Create data with some variation to trigger chunk boundaries
data := make([]byte, tt.fileSize)
for i := 0; i < len(data); i++ {
// Use a pattern that should create boundaries
data[i] = byte((i * 17) ^ (i >> 5))
}
chunks, err := chunker.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
t.Logf("Created %d chunks for %d bytes with %d average chunk size",
len(chunks), tt.fileSize, tt.avgChunkSize)
if len(chunks) < tt.minExpected {
t.Errorf("too few chunks: got %d, expected at least %d",
len(chunks), tt.minExpected)
}
if len(chunks) > tt.maxExpected {
t.Errorf("too many chunks: got %d, expected at most %d",
len(chunks), tt.maxExpected)
}
// Verify chunks reconstruct to original
var reconstructed []byte
for _, chunk := range chunks {
reconstructed = append(reconstructed, chunk.Data...)
}
if !bytes.Equal(data, reconstructed) {
t.Error("reconstructed data doesn't match original")
}
})
}
}


@@ -0,0 +1,128 @@
package chunker
import (
"bytes"
"crypto/rand"
"testing"
)
func TestChunker(t *testing.T) {
t.Run("small file produces single chunk", func(t *testing.T) {
chunker := NewChunker(1024 * 1024) // 1MB average
data := bytes.Repeat([]byte("hello"), 100) // 500 bytes
chunks, err := chunker.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
if len(chunks) != 1 {
t.Errorf("expected 1 chunk, got %d", len(chunks))
}
if chunks[0].Size != int64(len(data)) {
t.Errorf("expected chunk size %d, got %d", len(data), chunks[0].Size)
}
})
t.Run("large file produces multiple chunks", func(t *testing.T) {
chunker := NewChunker(256 * 1024) // 256KB average chunk size
// Generate 2MB of random data
data := make([]byte, 2*1024*1024)
if _, err := rand.Read(data); err != nil {
t.Fatalf("failed to generate random data: %v", err)
}
chunks, err := chunker.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
// Should produce multiple chunks - with FastCDC we expect around 8 chunks for 2MB with 256KB average
if len(chunks) < 4 || len(chunks) > 16 {
t.Errorf("expected 4-16 chunks, got %d", len(chunks))
}
// Verify chunks reconstruct original data
var reconstructed []byte
for _, chunk := range chunks {
reconstructed = append(reconstructed, chunk.Data...)
}
if !bytes.Equal(data, reconstructed) {
t.Error("reconstructed data doesn't match original")
}
// Verify offsets
var expectedOffset int64
for i, chunk := range chunks {
if chunk.Offset != expectedOffset {
t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunk.Offset)
}
expectedOffset += chunk.Size
}
})
t.Run("deterministic chunking", func(t *testing.T) {
chunker1 := NewChunker(256 * 1024)
chunker2 := NewChunker(256 * 1024)
// Use deterministic data
data := bytes.Repeat([]byte("abcdefghijklmnopqrstuvwxyz"), 20000) // ~520KB
chunks1, err := chunker1.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
chunks2, err := chunker2.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
// Should produce same chunks
if len(chunks1) != len(chunks2) {
t.Fatalf("different number of chunks: %d vs %d", len(chunks1), len(chunks2))
}
for i := range chunks1 {
if chunks1[i].Hash != chunks2[i].Hash {
t.Errorf("chunk %d: different hashes", i)
}
if chunks1[i].Size != chunks2[i].Size {
t.Errorf("chunk %d: different sizes", i)
}
}
})
}
func TestChunkBoundaries(t *testing.T) {
chunker := NewChunker(256 * 1024) // 256KB average
// FastCDC uses avg/4 for min and avg*4 for max
avgSize := int64(256 * 1024)
minSize := avgSize / 4
maxSize := avgSize * 4
// Test that minimum chunk size is respected
data := make([]byte, minSize+1024)
if _, err := rand.Read(data); err != nil {
t.Fatalf("failed to generate random data: %v", err)
}
chunks, err := chunker.ChunkReader(bytes.NewReader(data))
if err != nil {
t.Fatalf("chunking failed: %v", err)
}
for i, chunk := range chunks {
// Last chunk can be smaller than minimum
if i < len(chunks)-1 && chunk.Size < minSize {
t.Errorf("chunk %d size %d is below minimum %d", i, chunk.Size, minSize)
}
if chunk.Size > maxSize {
t.Errorf("chunk %d size %d exceeds maximum %d", i, chunk.Size, maxSize)
}
}
}

internal/chunker/fastcdc.go Normal file

@@ -0,0 +1,265 @@
package chunker
import (
"io"
"math"
"sync"
)
// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
// this implementation uses sync.Pool to reuse buffers across files.
type ReusableChunker struct {
minSize int
maxSize int
normSize int
bufSize int
maskS uint64
maskL uint64
rd io.Reader
buf []byte
cursor int
offset int
eof bool
}
// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
var reusableChunkerPool = sync.Pool{
New: func() interface{} {
return &ReusableChunker{}
},
}
// bufferPools contains pools for different buffer sizes.
// Key is the buffer size.
var bufferPools = sync.Map{}
func getBuffer(size int) []byte {
poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
New: func() interface{} {
buf := make([]byte, size)
return &buf
},
})
pool := poolI.(*sync.Pool)
return *pool.Get().(*[]byte)
}
func putBuffer(buf []byte) {
size := cap(buf)
poolI, ok := bufferPools.Load(size)
if ok {
pool := poolI.(*sync.Pool)
b := buf[:size]
pool.Put(&b)
}
}
// FastCDCChunk represents a chunk from the FastCDC algorithm.
type FastCDCChunk struct {
Offset int
Length int
Data []byte
Fingerprint uint64
}
// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
c := reusableChunkerPool.Get().(*ReusableChunker)
bufSize := maxSize * 2
// Reuse buffer if it's the right size, otherwise get a new one
if c.buf == nil || cap(c.buf) != bufSize {
if c.buf != nil {
putBuffer(c.buf)
}
c.buf = getBuffer(bufSize)
} else {
// Restore buffer to full capacity (may have been truncated by previous EOF)
c.buf = c.buf[:cap(c.buf)]
}
bits := int(math.Round(math.Log2(float64(avgSize))))
normalization := 2
smallBits := bits + normalization
largeBits := bits - normalization
c.minSize = minSize
c.maxSize = maxSize
c.normSize = avgSize
c.bufSize = bufSize
c.maskS = (1 << smallBits) - 1
c.maskL = (1 << largeBits) - 1
c.rd = rd
c.cursor = bufSize
c.offset = 0
c.eof = false
return c
}
// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
c.rd = nil
reusableChunkerPool.Put(c)
}
func (c *ReusableChunker) fillBuffer() error {
n := len(c.buf) - c.cursor
if n >= c.maxSize {
return nil
}
// Move all data after the cursor to the start of the buffer
copy(c.buf[:n], c.buf[c.cursor:])
c.cursor = 0
if c.eof {
c.buf = c.buf[:n]
return nil
}
// Restore buffer to full capacity for reading
c.buf = c.buf[:c.bufSize]
// Fill the rest of the buffer
m, err := io.ReadFull(c.rd, c.buf[n:])
if err == io.EOF || err == io.ErrUnexpectedEOF {
c.buf = c.buf[:n+m]
c.eof = true
} else if err != nil {
return err
}
return nil
}
// Next returns the next chunk or io.EOF when done.
// The returned Data slice is only valid until the next call to Next.
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
if err := c.fillBuffer(); err != nil {
return FastCDCChunk{}, err
}
if len(c.buf) == 0 {
return FastCDCChunk{}, io.EOF
}
length, fp := c.nextChunk(c.buf[c.cursor:])
chunk := FastCDCChunk{
Offset: c.offset,
Length: length,
Data: c.buf[c.cursor : c.cursor+length],
Fingerprint: fp,
}
c.cursor += length
c.offset += chunk.Length
return chunk, nil
}
func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
fp := uint64(0)
i := c.minSize
if len(data) <= c.minSize {
return len(data), fp
}
n := min(len(data), c.maxSize)
for ; i < min(n, c.normSize); i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskS) == 0 {
return i + 1, fp
}
}
for ; i < n; i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskL) == 0 {
return i + 1, fp
}
}
return i, fp
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}

View File

@@ -2,28 +2,63 @@ package cli
 import (
 	"context"
+	"errors"
 	"fmt"
+	"os"
+	"os/signal"
+	"path/filepath"
+	"syscall"
+	"time"
 	"git.eeqj.de/sneak/vaultik/internal/config"
 	"git.eeqj.de/sneak/vaultik/internal/database"
 	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/pidlock"
+	"git.eeqj.de/sneak/vaultik/internal/snapshot"
+	"git.eeqj.de/sneak/vaultik/internal/storage"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
+	"github.com/adrg/xdg"
 	"go.uber.org/fx"
 )
-// AppOptions contains common options for creating the fx application
+// AppOptions contains common options for creating the fx application.
+// It includes the configuration file path, logging options, and additional
+// fx modules and invocations that should be included in the application.
 type AppOptions struct {
 	ConfigPath string
+	LogOptions log.LogOptions
 	Modules    []fx.Option
 	Invokes    []fx.Option
 }
-// NewApp creates a new fx application with common modules
+// setupGlobals sets up the globals with application startup time
+func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
+	lc.Append(fx.Hook{
+		OnStart: func(ctx context.Context) error {
+			g.StartTime = time.Now().UTC()
+			return nil
+		},
+	})
+}
+// NewApp creates a new fx application with common modules.
+// It sets up the base modules (config, database, logging, globals) and
+// combines them with any additional modules specified in the options.
+// The returned fx.App is ready to be started with RunApp.
 func NewApp(opts AppOptions) *fx.App {
 	baseModules := []fx.Option{
 		fx.Supply(config.ConfigPath(opts.ConfigPath)),
+		fx.Supply(opts.LogOptions),
 		fx.Provide(globals.New),
+		fx.Provide(log.New),
 		config.Module,
 		database.Module,
+		log.Module,
+		storage.Module,
+		snapshot.Module,
+		fx.Provide(vaultik.New),
+		fx.Invoke(setupGlobals),
 		fx.NopLogger,
 	}
@@ -33,24 +68,77 @@ func NewApp(opts AppOptions) *fx.App {
 	return fx.New(allOptions...)
 }
-// RunApp starts and stops the fx application within the given context
+// RunApp starts and stops the fx application within the given context.
+// It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
+// ensures the application stops cleanly. The function blocks until the
+// application completes or is interrupted. Returns an error if startup fails.
 func RunApp(ctx context.Context, app *fx.App) error {
+	// Set up signal handling for graceful shutdown
+	sigChan := make(chan os.Signal, 1)
+	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
+	// Create a context that will be cancelled on signal
+	ctx, cancel := context.WithCancel(ctx)
+	defer cancel()
+	// Start the app
 	if err := app.Start(ctx); err != nil {
 		return fmt.Errorf("failed to start app: %w", err)
 	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
+	// Handle shutdown
+	shutdownComplete := make(chan struct{})
+	go func() {
+		defer close(shutdownComplete)
+		<-sigChan
+		log.Notice("Received interrupt signal, shutting down gracefully...")
+		// Create a timeout context for shutdown
+		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
+		defer shutdownCancel()
+		if err := app.Stop(shutdownCtx); err != nil {
+			log.Error("Error during shutdown", "error", err)
+		}
+	}()
-	// Wait for context cancellation
-	<-ctx.Done()
-	return nil
+	// Wait for either the signal handler to complete shutdown or the app to request shutdown
+	select {
+	case <-shutdownComplete:
+		// Shutdown completed via signal
+		return nil
+	case <-ctx.Done():
+		// Context cancelled (shouldn't happen in normal operation)
+		if err := app.Stop(context.Background()); err != nil {
+			log.Error("Error stopping app", "error", err)
+		}
+		return ctx.Err()
+	case <-app.Done():
+		// App finished running (e.g., backup completed)
+		return nil
+	}
 }
-// RunWithApp is a helper that creates and runs an fx app with the given options
+// RunWithApp is a helper that creates and runs an fx app with the given options.
+// It combines NewApp and RunApp into a single convenient function. This is the
+// preferred way to run CLI commands that need the full application context.
+// It acquires a PID lock before starting to prevent concurrent instances.
 func RunWithApp(ctx context.Context, opts AppOptions) error {
+	// Acquire PID lock to prevent concurrent instances
+	lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
+	lock, err := pidlock.Acquire(lockDir)
+	if err != nil {
+		if errors.Is(err, pidlock.ErrAlreadyRunning) {
+			return fmt.Errorf("cannot start: %w", err)
+		}
+		return fmt.Errorf("failed to acquire lock: %w", err)
+	}
+	defer func() {
+		if err := lock.Release(); err != nil {
+			log.Warn("Failed to release PID lock", "error", err)
+		}
+	}()
 	app := NewApp(opts)
 	return RunApp(ctx, app)
 }

View File

@@ -1,83 +0,0 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// BackupOptions contains options for the backup command
type BackupOptions struct {
ConfigPath string
Daemon bool
Cron bool
Prune bool
}
// NewBackupCommand creates the backup command
func NewBackupCommand() *cobra.Command {
opts := &BackupOptions{}
cmd := &cobra.Command{
Use: "backup",
Short: "Perform incremental backup",
Long: `Backup configured directories using incremental deduplication and encryption.
Config is located at /etc/vaultik/config.yml, but can be overridden by specifying
a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// If --config not specified, check environment variable
if opts.ConfigPath == "" {
opts.ConfigPath = os.Getenv("VAULTIK_CONFIG")
}
// If still not specified, use default
if opts.ConfigPath == "" {
defaultConfig := "/etc/vaultik/config.yml"
if _, err := os.Stat(defaultConfig); err == nil {
opts.ConfigPath = defaultConfig
} else {
return fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultConfig)
}
}
return runBackup(cmd.Context(), opts)
},
}
cmd.Flags().StringVar(&opts.ConfigPath, "config", "", "Path to config file")
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
return cmd
}
func runBackup(ctx context.Context, opts *BackupOptions) error {
return RunWithApp(ctx, AppOptions{
ConfigPath: opts.ConfigPath,
Invokes: []fx.Option{
fx.Invoke(func(g *globals.Globals, cfg *config.Config, repos *database.Repositories) error {
// TODO: Implement backup logic
fmt.Printf("Running backup with config: %s\n", opts.ConfigPath)
fmt.Printf("Version: %s, Commit: %s\n", g.Version, g.Commit)
fmt.Printf("Index path: %s\n", cfg.IndexPath)
if opts.Daemon {
fmt.Println("Running in daemon mode")
}
if opts.Cron {
fmt.Println("Running in cron mode")
}
if opts.Prune {
fmt.Println("Pruning enabled - will delete old snapshots after backup")
}
return nil
}),
},
})
}

102
internal/cli/database.go Normal file
View File

@@ -0,0 +1,102 @@
package cli
import (
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/cobra"
)
// NewDatabaseCommand creates the database command group
func NewDatabaseCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "database",
Short: "Manage the local state database",
Long: `Commands for managing the local SQLite state database.`,
}
cmd.AddCommand(
newDatabasePurgeCommand(),
)
return cmd
}
// newDatabasePurgeCommand creates the database purge command
func newDatabasePurgeCommand() *cobra.Command {
var force bool
cmd := &cobra.Command{
Use: "purge",
Short: "Delete the local state database",
Long: `Completely removes the local SQLite state database.
This will erase all local tracking of:
- File metadata and change detection state
- Chunk and blob mappings
- Local snapshot records
The remote storage is NOT affected. After purging, the next backup will
perform a full scan and re-deduplicate against existing remote blobs.
Use --force to skip the confirmation prompt.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Resolve config path
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Load config to get database path
cfg, err := config.Load(configPath)
if err != nil {
return fmt.Errorf("failed to load config: %w", err)
}
dbPath := cfg.IndexPath
// Check if database exists
if _, err := os.Stat(dbPath); os.IsNotExist(err) {
fmt.Printf("Database does not exist: %s\n", dbPath)
return nil
}
// Confirm unless --force
if !force {
fmt.Printf("This will delete the local state database at:\n %s\n\n", dbPath)
fmt.Print("Are you sure? Type 'yes' to confirm: ")
var confirm string
if _, err := fmt.Scanln(&confirm); err != nil || confirm != "yes" {
fmt.Println("Aborted.")
return nil
}
}
// Delete the database file
if err := os.Remove(dbPath); err != nil {
return fmt.Errorf("failed to delete database: %w", err)
}
// Also delete WAL and SHM files if they exist
walPath := dbPath + "-wal"
shmPath := dbPath + "-shm"
_ = os.Remove(walPath) // Ignore errors - files may not exist
_ = os.Remove(shmPath)
rootFlags := GetRootFlags()
if !rootFlags.Quiet {
fmt.Printf("Database purged: %s\n", dbPath)
}
log.Info("Local state database purged", "path", dbPath)
return nil
},
}
cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")
return cmd
}

94
internal/cli/duration.go Normal file
View File

@@ -0,0 +1,94 @@
package cli
import (
"fmt"
"regexp"
"strconv"
"strings"
"time"
)
// parseDuration parses duration strings. Supports standard Go duration format
// (e.g., "3h30m", "1h45m30s") as well as extended units:
// - d: days (e.g., "30d", "7d")
// - w: weeks (e.g., "2w", "4w")
// - mo: months (30 days) (e.g., "6mo", "1mo")
// - y: years (365 days) (e.g., "1y", "2y")
//
// Can combine units: "1y6mo", "2w3d", "1d12h30m"
func parseDuration(s string) (time.Duration, error) {
// First try standard Go duration parsing
if d, err := time.ParseDuration(s); err == nil {
return d, nil
}
// Extended duration parsing
// Check for negative values
if strings.HasPrefix(strings.TrimSpace(s), "-") {
return 0, fmt.Errorf("negative durations are not supported")
}
// Pattern matches: number + unit, repeated
re := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-zA-Z]+)`)
matches := re.FindAllStringSubmatch(s, -1)
if len(matches) == 0 {
return 0, fmt.Errorf("invalid duration format: %q", s)
}
var total time.Duration
for _, match := range matches {
valueStr := match[1]
unit := strings.ToLower(match[2])
value, err := strconv.ParseFloat(valueStr, 64)
if err != nil {
return 0, fmt.Errorf("invalid number %q: %w", valueStr, err)
}
var d time.Duration
switch unit {
// Standard time units
case "ns", "nanosecond", "nanoseconds":
d = time.Duration(value)
case "us", "µs", "microsecond", "microseconds":
d = time.Duration(value * float64(time.Microsecond))
case "ms", "millisecond", "milliseconds":
d = time.Duration(value * float64(time.Millisecond))
case "s", "sec", "second", "seconds":
d = time.Duration(value * float64(time.Second))
case "m", "min", "minute", "minutes":
d = time.Duration(value * float64(time.Minute))
case "h", "hr", "hour", "hours":
d = time.Duration(value * float64(time.Hour))
// Extended units
case "d", "day", "days":
d = time.Duration(value * float64(24*time.Hour))
case "w", "week", "weeks":
d = time.Duration(value * float64(7*24*time.Hour))
case "mo", "month", "months":
// Using 30 days as an approximation
d = time.Duration(value * float64(30*24*time.Hour))
case "y", "year", "years":
// Using 365 days as an approximation
d = time.Duration(value * float64(365*24*time.Hour))
default:
// Try parsing as standard Go duration unit
testStr := fmt.Sprintf("1%s", unit)
if _, err := time.ParseDuration(testStr); err == nil {
// It's a valid Go duration unit, parse the full value
fullStr := fmt.Sprintf("%g%s", value, unit)
if d, err = time.ParseDuration(fullStr); err != nil {
return 0, fmt.Errorf("invalid duration %q: %w", fullStr, err)
}
} else {
return 0, fmt.Errorf("unknown time unit %q", unit)
}
}
total += d
}
return total, nil
}

View File

@@ -0,0 +1,263 @@
package cli
import (
"testing"
"time"
"github.com/stretchr/testify/assert"
)
func TestParseDuration(t *testing.T) {
tests := []struct {
name string
input string
expected time.Duration
wantErr bool
}{
// Standard Go durations
{
name: "standard seconds",
input: "30s",
expected: 30 * time.Second,
},
{
name: "standard minutes",
input: "45m",
expected: 45 * time.Minute,
},
{
name: "standard hours",
input: "2h",
expected: 2 * time.Hour,
},
{
name: "standard combined",
input: "3h30m",
expected: 3*time.Hour + 30*time.Minute,
},
{
name: "standard complex",
input: "1h45m30s",
expected: 1*time.Hour + 45*time.Minute + 30*time.Second,
},
{
name: "standard with milliseconds",
input: "1s500ms",
expected: 1*time.Second + 500*time.Millisecond,
},
// Extended units - days
{
name: "single day",
input: "1d",
expected: 24 * time.Hour,
},
{
name: "multiple days",
input: "7d",
expected: 7 * 24 * time.Hour,
},
{
name: "fractional days",
input: "1.5d",
expected: 36 * time.Hour,
},
{
name: "days spelled out",
input: "3days",
expected: 3 * 24 * time.Hour,
},
// Extended units - weeks
{
name: "single week",
input: "1w",
expected: 7 * 24 * time.Hour,
},
{
name: "multiple weeks",
input: "4w",
expected: 4 * 7 * 24 * time.Hour,
},
{
name: "weeks spelled out",
input: "2weeks",
expected: 2 * 7 * 24 * time.Hour,
},
// Extended units - months
{
name: "single month",
input: "1mo",
expected: 30 * 24 * time.Hour,
},
{
name: "multiple months",
input: "6mo",
expected: 6 * 30 * 24 * time.Hour,
},
{
name: "months spelled out",
input: "3months",
expected: 3 * 30 * 24 * time.Hour,
},
// Extended units - years
{
name: "single year",
input: "1y",
expected: 365 * 24 * time.Hour,
},
{
name: "multiple years",
input: "2y",
expected: 2 * 365 * 24 * time.Hour,
},
{
name: "years spelled out",
input: "1year",
expected: 365 * 24 * time.Hour,
},
// Combined extended units
{
name: "weeks and days",
input: "2w3d",
expected: 2*7*24*time.Hour + 3*24*time.Hour,
},
{
name: "years and months",
input: "1y6mo",
expected: 365*24*time.Hour + 6*30*24*time.Hour,
},
{
name: "days and hours",
input: "1d12h",
expected: 24*time.Hour + 12*time.Hour,
},
{
name: "complex combination",
input: "1y2mo3w4d5h6m7s",
expected: 365*24*time.Hour + 2*30*24*time.Hour + 3*7*24*time.Hour + 4*24*time.Hour + 5*time.Hour + 6*time.Minute + 7*time.Second,
},
{
name: "with spaces",
input: "1d 12h 30m",
expected: 24*time.Hour + 12*time.Hour + 30*time.Minute,
},
// Edge cases
{
name: "zero duration",
input: "0s",
expected: 0,
},
{
name: "large duration",
input: "10y",
expected: 10 * 365 * 24 * time.Hour,
},
// Error cases
{
name: "empty string",
input: "",
wantErr: true,
},
{
name: "invalid format",
input: "abc",
wantErr: true,
},
{
name: "unknown unit",
input: "5x",
wantErr: true,
},
{
name: "invalid number",
input: "xyzd",
wantErr: true,
},
{
name: "negative not supported",
input: "-5d",
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := parseDuration(tt.input)
if tt.wantErr {
assert.Error(t, err, "expected error for input %q", tt.input)
return
}
assert.NoError(t, err, "unexpected error for input %q", tt.input)
assert.Equal(t, tt.expected, got, "duration mismatch for input %q", tt.input)
})
}
}
func TestParseDurationSpecialCases(t *testing.T) {
// Test that standard Go durations work exactly as expected
standardDurations := []string{
"300ms",
"1.5h",
"2h45m",
"72h",
"1us",
"1µs",
"1ns",
}
for _, d := range standardDurations {
expected, err := time.ParseDuration(d)
assert.NoError(t, err)
got, err := parseDuration(d)
assert.NoError(t, err)
assert.Equal(t, expected, got, "standard duration %q should parse identically", d)
}
}
func TestParseDurationRealWorldExamples(t *testing.T) {
// Test real-world snapshot purge scenarios
tests := []struct {
description string
input string
olderThan time.Duration
}{
{
description: "keep snapshots from last 30 days",
input: "30d",
olderThan: 30 * 24 * time.Hour,
},
{
description: "keep snapshots from last 6 months",
input: "6mo",
olderThan: 6 * 30 * 24 * time.Hour,
},
{
description: "keep snapshots from last year",
input: "1y",
olderThan: 365 * 24 * time.Hour,
},
{
description: "keep snapshots from last week and a half",
input: "1w3d",
olderThan: 10 * 24 * time.Hour,
},
{
description: "keep snapshots from last 90 days",
input: "90d",
olderThan: 90 * 24 * time.Hour,
},
}
for _, tt := range tests {
t.Run(tt.description, func(t *testing.T) {
got, err := parseDuration(tt.input)
assert.NoError(t, err)
assert.Equal(t, tt.olderThan, got)
// Verify the duration makes sense for snapshot purging
assert.Greater(t, got, time.Hour, "snapshot purge duration should be at least an hour")
})
}
}

View File

@@ -4,7 +4,9 @@ import (
 	"os"
 )
-// CLIEntry is the main entry point for the CLI application
+// CLIEntry is the main entry point for the CLI application.
+// It creates the root command, executes it, and exits with status 1
+// if an error occurs. This function should be called from main().
 func CLIEntry() {
 	rootCmd := NewRootCommand()
 	if err := rootCmd.Execute(); err != nil {

View File

@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
 	}
 	// Verify all subcommands are registered
-	expectedCommands := []string{"backup", "restore", "prune", "verify", "fetch"}
+	expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
 	for _, expected := range expectedCommands {
 		found := false
 		for _, cmd := range cmd.Commands() {
@@ -32,19 +32,24 @@ func TestCLIEntry(t *testing.T) {
 		}
 	}
-	// Verify backup command has proper flags
-	backupCmd, _, err := cmd.Find([]string{"backup"})
+	// Verify snapshot command has subcommands
+	snapshotCmd, _, err := cmd.Find([]string{"snapshot"})
 	if err != nil {
-		t.Errorf("Failed to find backup command: %v", err)
+		t.Errorf("Failed to find snapshot command: %v", err)
 	} else {
-		if backupCmd.Flag("config") == nil {
-			t.Error("Backup command missing --config flag")
-		}
-		if backupCmd.Flag("daemon") == nil {
-			t.Error("Backup command missing --daemon flag")
-		}
-		if backupCmd.Flag("cron") == nil {
-			t.Error("Backup command missing --cron flag")
+		// Check snapshot subcommands
+		expectedSubCommands := []string{"create", "list", "purge", "verify"}
+		for _, expected := range expectedSubCommands {
+			found := false
+			for _, subcmd := range snapshotCmd.Commands() {
+				if subcmd.Use == expected || subcmd.Name() == expected {
+					found = true
+					break
+				}
+			}
+			if !found {
+				t.Errorf("Expected snapshot subcommand '%s' not found", expected)
+			}
 		}
 	}
 }

View File

@@ -1,88 +0,0 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// FetchOptions contains options for the fetch command
type FetchOptions struct {
Bucket string
Prefix string
SnapshotID string
FilePath string
Target string
}
// NewFetchCommand creates the fetch command
func NewFetchCommand() *cobra.Command {
opts := &FetchOptions{}
cmd := &cobra.Command{
Use: "fetch",
Short: "Extract single file from backup",
Long: `Download and decrypt a single file from a backup snapshot`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate required flags
if opts.Bucket == "" {
return fmt.Errorf("--bucket is required")
}
if opts.Prefix == "" {
return fmt.Errorf("--prefix is required")
}
if opts.SnapshotID == "" {
return fmt.Errorf("--snapshot is required")
}
if opts.FilePath == "" {
return fmt.Errorf("--file is required")
}
if opts.Target == "" {
return fmt.Errorf("--target is required")
}
return runFetch(cmd.Context(), opts)
},
}
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID")
cmd.Flags().StringVar(&opts.FilePath, "file", "", "Path of file to extract from backup")
cmd.Flags().StringVar(&opts.Target, "target", "", "Target path for extracted file")
return cmd
}
func runFetch(ctx context.Context, opts *FetchOptions) error {
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
}
app := fx.New(
fx.Supply(opts),
fx.Provide(globals.New),
// Additional modules will be added here
fx.Invoke(func(g *globals.Globals) error {
// TODO: Implement fetch logic
fmt.Printf("Fetching %s from snapshot %s to %s\n", opts.FilePath, opts.SnapshotID, opts.Target)
return nil
}),
fx.NopLogger,
)
if err := app.Start(ctx); err != nil {
return fmt.Errorf("failed to start fetch: %w", err)
}
defer func() {
if err := app.Stop(ctx); err != nil {
fmt.Printf("error stopping app: %v\n", err)
}
}()
return nil
}

71
internal/cli/info.go Normal file
View File

@@ -0,0 +1,71 @@
package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewInfoCommand creates the info command
func NewInfoCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "info",
Short: "Display system and configuration information",
Long: `Shows information about the current vaultik configuration, including:
- System details (OS, architecture, version)
- Storage configuration (S3 bucket, endpoint)
- Backup settings (source directories, compression)
- Encryption configuration (recipients)
- Local database statistics`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.ShowInfo(); err != nil {
if err != context.Canceled {
log.Error("Failed to show info", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
return cmd
}

View File

@@ -2,77 +2,83 @@ package cli
 import (
 	"context"
-	"fmt"
 	"os"
-	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )
-// PruneOptions contains options for the prune command
-type PruneOptions struct {
-	Bucket string
-	Prefix string
-	DryRun bool
-}
 // NewPruneCommand creates the prune command
 func NewPruneCommand() *cobra.Command {
-	opts := &PruneOptions{}
+	opts := &vaultik.PruneOptions{}
 	cmd := &cobra.Command{
 		Use:   "prune",
 		Short: "Remove unreferenced blobs",
-		Long:  `Delete blobs that are no longer referenced by any snapshot`,
+		Long: `Removes blobs that are not referenced by any snapshot.
+This command scans all snapshots and their manifests to build a list of
+referenced blobs, then removes any blobs in storage that are not in this list.
+Use this command after deleting snapshots with 'vaultik purge' to reclaim
+storage space.`,
 		Args: cobra.NoArgs,
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			return runPrune(cmd.Context(), opts)
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet || opts.JSON,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the prune operation in a goroutine
+								go func() {
+									// Run the prune operation
+									if err := v.PruneBlobs(opts); err != nil {
+										if err != context.Canceled {
+											if !opts.JSON {
+												log.Error("Prune operation failed", "error", err)
+											}
+											os.Exit(1)
+										}
+									}
+									// Shutdown the app when prune completes
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping prune operation")
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
 		},
 	}
-	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
-	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
-	cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without actually deleting")
+	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
+	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")
 	return cmd
 }
-func runPrune(ctx context.Context, opts *PruneOptions) error {
-	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
-		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
-	}
-	app := fx.New(
-		fx.Supply(opts),
-		fx.Provide(globals.New),
-		// Additional modules will be added here
-		fx.Invoke(func(g *globals.Globals) error {
-			// TODO: Implement prune logic
-			fmt.Printf("Pruning bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
-			if opts.DryRun {
-				fmt.Println("Running in dry-run mode")
-			}
-			return nil
-		}),
-		fx.NopLogger,
-	)
-	if err := app.Start(ctx); err != nil {
-		return fmt.Errorf("failed to start prune: %w", err)
-	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
-	return nil
-}

internal/cli/purge.go

@@ -0,0 +1,100 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// PurgeOptions contains options for the purge command
type PurgeOptions struct {
KeepLatest bool
OlderThan string
Force bool
}
// NewPurgeCommand creates the purge command
func NewPurgeCommand() *cobra.Command {
opts := &PurgeOptions{}
cmd := &cobra.Command{
Use: "purge",
Short: "Purge old snapshots",
Long: `Removes snapshots based on age or count criteria.
This command allows you to:
- Keep only the latest snapshot (--keep-latest)
- Remove snapshots older than a specific duration (--older-than)
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate flags
if !opts.KeepLatest && opts.OlderThan == "" {
return fmt.Errorf("must specify either --keep-latest or --older-than")
}
if opts.KeepLatest && opts.OlderThan != "" {
return fmt.Errorf("cannot specify both --keep-latest and --older-than")
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the purge operation in a goroutine
go func() {
// Run the purge operation
if err := v.PurgeSnapshots(opts.KeepLatest, opts.OlderThan, opts.Force); err != nil {
if err != context.Canceled {
log.Error("Purge operation failed", "error", err)
os.Exit(1)
}
}
// Shutdown the app when purge completes
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping purge operation")
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot")
cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6m, 1y)")
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")
return cmd
}

internal/cli/remote.go

@@ -0,0 +1,89 @@
package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewRemoteCommand creates the remote command and subcommands
func NewRemoteCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "remote",
Short: "Remote storage management commands",
Long: "Commands for inspecting and managing remote storage",
}
// Add subcommands
cmd.AddCommand(newRemoteInfoCommand())
return cmd
}
// newRemoteInfoCommand creates the 'remote info' subcommand
func newRemoteInfoCommand() *cobra.Command {
var jsonOutput bool
cmd := &cobra.Command{
Use: "info",
Short: "Display remote storage information",
Long: `Shows detailed information about remote storage, including:
- Size of all snapshot metadata (per snapshot and total)
- Count and total size of all blobs
- Count and size of referenced blobs (from all manifests)
- Count and size of orphaned blobs (not referenced by any manifest)`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || jsonOutput,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.RemoteInfo(jsonOutput); err != nil {
if err != context.Canceled {
if !jsonOutput {
log.Error("Failed to get remote info", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")
return cmd
}


@@ -2,20 +2,30 @@ package cli
import (
"context"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// RestoreOptions contains options for the restore command
type RestoreOptions struct {
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files after restore
}
// RestoreApp contains all dependencies needed for restore
type RestoreApp struct {
Globals *globals.Globals
Config *config.Config
Storage storage.Storer
Vaultik *vaultik.Vaultik
Shutdowner fx.Shutdowner
}
// NewRestoreCommand creates the restore command
@@ -23,61 +33,104 @@ func NewRestoreCommand() *cobra.Command {
opts := &RestoreOptions{}
cmd := &cobra.Command{
Use: "restore <snapshot-id> <target-dir> [paths...]",
Short: "Restore files from backup",
Long: `Download and decrypt files from a backup snapshot.
This command will restore files from the specified snapshot to the target directory.
If no paths are specified, all files are restored.
If paths are specified, only matching files/directories are restored.
Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
Examples:
# Restore entire snapshot
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
# Restore specific file
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
# Restore specific directory
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
# Restore and verify all files
vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
Args: cobra.MinimumNArgs(2),
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
opts.TargetDir = args[1]
if len(args) > 2 {
opts.Paths = args[2:]
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config,
storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
return &RestoreApp{
Globals: g,
Config: cfg,
Storage: storer,
Vaultik: v,
Shutdowner: shutdowner,
}
},
)),
},
Invokes: []fx.Option{
fx.Invoke(func(app *RestoreApp, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the restore operation in a goroutine
go func() {
// Run the restore operation
restoreOpts := &vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: opts.TargetDir,
Paths: opts.Paths,
Verify: opts.Verify,
}
if err := app.Vaultik.Restore(restoreOpts); err != nil {
if err != context.Canceled {
log.Error("Restore operation failed", "error", err)
}
}
// Shutdown the app when restore completes
if err := app.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping restore operation")
app.Vaultik.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
return cmd
}


@@ -1,10 +1,26 @@
package cli
import (
"fmt"
"os"
"github.com/spf13/cobra"
)
// RootFlags holds global flags that apply to all commands.
// These flags are defined on the root command and inherited by all subcommands.
type RootFlags struct {
ConfigPath string
Verbose bool
Debug bool
Quiet bool
}
var rootFlags RootFlags
// NewRootCommand creates the root cobra command for the vaultik CLI.
// It sets up the command structure, global flags, and adds all subcommands.
// This is the main entry point for the CLI command hierarchy.
func NewRootCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "vaultik",
@@ -15,15 +31,54 @@ on the source system.`,
SilenceUsage: true,
}
// Add global flags
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")
// Add subcommands
cmd.AddCommand(
NewRestoreCommand(),
NewPruneCommand(),
NewVerifyCommand(),
NewStoreCommand(),
NewSnapshotCommand(),
NewInfoCommand(),
NewVersionCommand(),
NewRemoteCommand(),
NewDatabaseCommand(),
)
return cmd
}
// GetRootFlags returns the global flags that were parsed from the command line.
// This allows subcommands to access global flag values like verbosity and config path.
func GetRootFlags() RootFlags {
return rootFlags
}
// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Returns an error if no valid
// config file can be found through any of these methods.
func ResolveConfigPath() (string, error) {
// First check global flag
if rootFlags.ConfigPath != "" {
return rootFlags.ConfigPath, nil
}
// Then check environment variable
if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
return envPath, nil
}
// Finally check default location
defaultPath := "/etc/vaultik/config.yml"
if _, err := os.Stat(defaultPath); err == nil {
return defaultPath, nil
}
return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}


@@ -1,90 +1,467 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewSnapshotCommand creates the snapshot command and subcommands
func NewSnapshotCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "snapshot",
Short: "Snapshot management commands",
Long: "Commands for creating, listing, and managing snapshots",
}
// Add subcommands
cmd.AddCommand(newSnapshotCreateCommand())
cmd.AddCommand(newSnapshotListCommand())
cmd.AddCommand(newSnapshotPurgeCommand())
cmd.AddCommand(newSnapshotVerifyCommand())
cmd.AddCommand(newSnapshotRemoveCommand())
cmd.AddCommand(newSnapshotPruneCommand())
return cmd
}
// newSnapshotCreateCommand creates the 'snapshot create' subcommand
func newSnapshotCreateCommand() *cobra.Command {
opts := &vaultik.SnapshotCreateOptions{}
cmd := &cobra.Command{
Use: "create [snapshot-names...]",
Short: "Create new snapshots",
Long: `Creates new snapshots of the configured directories.
If snapshot names are provided, only those snapshots are created.
If no names are provided, all configured snapshots are created.
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.ArbitraryArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Pass snapshot names from args
opts.Snapshots = args
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the backup functionality from cli package
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Cron: opts.Cron,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the snapshot creation in a goroutine
go func() {
// Run the snapshot creation
if err := v.CreateSnapshot(opts); err != nil {
if err != context.Canceled {
log.Error("Snapshot creation failed", "error", err)
}
}
// Shutdown the app when snapshot completes
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping snapshot creation")
// Cancel the Vaultik context
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")
return cmd
}
// newSnapshotListCommand creates the 'snapshot list' subcommand
func newSnapshotListCommand() *cobra.Command {
var jsonOutput bool
cmd := &cobra.Command{
Use: "list",
Aliases: []string{"ls"},
Short: "List all snapshots",
Long: "Lists all snapshots with their ID, timestamp, and compressed size",
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.ListSnapshots(jsonOutput); err != nil {
if err != context.Canceled {
log.Error("Failed to list snapshots", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")
return cmd
}
// newSnapshotPurgeCommand creates the 'snapshot purge' subcommand
func newSnapshotPurgeCommand() *cobra.Command {
var keepLatest bool
var olderThan string
var force bool
cmd := &cobra.Command{
Use: "purge",
Short: "Purge old snapshots",
Long: "Removes snapshots based on age or count criteria",
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate flags
if !keepLatest && olderThan == "" {
return fmt.Errorf("must specify either --keep-latest or --older-than")
}
if keepLatest && olderThan != "" {
return fmt.Errorf("cannot specify both --keep-latest and --older-than")
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.PurgeSnapshots(keepLatest, olderThan, force); err != nil {
if err != context.Canceled {
log.Error("Failed to purge snapshots", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&keepLatest, "keep-latest", false, "Keep only the latest snapshot")
cmd.Flags().StringVar(&olderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")
return cmd
}
// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
opts := &vaultik.VerifyOptions{}
cmd := &cobra.Command{
Use: "verify <snapshot-id>",
Short: "Verify snapshot integrity",
Long: "Verifies that all blobs referenced in a snapshot exist",
Args: func(cmd *cobra.Command, args []string) error {
if len(args) != 1 {
_ = cmd.Help()
if len(args) == 0 {
return fmt.Errorf("snapshot ID required")
}
return fmt.Errorf("expected 1 argument, got %d", len(args))
}
return nil
},
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
var err error
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
}
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Download and verify blob hashes")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
return cmd
}
// newSnapshotRemoveCommand creates the 'snapshot remove' subcommand
func newSnapshotRemoveCommand() *cobra.Command {
opts := &vaultik.RemoveOptions{}
cmd := &cobra.Command{
Use: "remove [snapshot-id]",
Aliases: []string{"rm"},
Short: "Remove a snapshot from the local database",
Long: `Removes a snapshot from the local database.
By default, only removes from the local database. Use --remote to also remove
the snapshot metadata from remote storage.
Note: This does NOT remove blobs. Use 'vaultik prune' to remove orphaned blobs
after removing snapshots.
Use --all --force to remove all snapshots.`,
Args: func(cmd *cobra.Command, args []string) error {
all, _ := cmd.Flags().GetBool("all")
if all {
if len(args) > 0 {
_ = cmd.Help()
return fmt.Errorf("--all cannot be used with a snapshot ID")
}
return nil
}
if len(args) != 1 {
_ = cmd.Help()
if len(args) == 0 {
return fmt.Errorf("snapshot ID required (or use --all --force)")
}
return fmt.Errorf("expected 1 argument, got %d", len(args))
}
return nil
},
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
var err error
if opts.All {
_, err = v.RemoveAllSnapshots(opts)
} else {
_, err = v.RemoveSnapshot(args[0], opts)
}
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Failed to remove snapshot", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVarP(&opts.Force, "force", "f", false, "Skip confirmation prompt")
cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be removed without removing")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output result as JSON")
cmd.Flags().BoolVar(&opts.Remote, "remote", false, "Also remove snapshot metadata from remote storage")
cmd.Flags().BoolVar(&opts.All, "all", false, "Remove all snapshots (requires --force)")
return cmd
}
// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "prune",
Short: "Remove orphaned data from local database",
Long: `Removes orphaned files, chunks, and blobs from the local database.
This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if _, err := v.PruneDatabase(); err != nil {
if err != context.Canceled {
log.Error("Failed to prune database", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
return cmd
}

internal/cli/store.go

@@ -0,0 +1,158 @@
package cli
import (
"context"
"fmt"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// StoreApp contains dependencies for store commands
type StoreApp struct {
Storage storage.Storer
Shutdowner fx.Shutdowner
}
// NewStoreCommand creates the store command and subcommands
func NewStoreCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "store",
Short: "Storage information commands",
Long: "Commands for viewing information about the storage backend",
}
// Add subcommands
cmd.AddCommand(newStoreInfoCommand())
return cmd
}
// newStoreInfoCommand creates the 'store info' subcommand
func newStoreInfoCommand() *cobra.Command {
return &cobra.Command{
Use: "info",
Short: "Display storage information",
Long: "Shows storage configuration and statistics including snapshots and blobs",
RunE: func(cmd *cobra.Command, args []string) error {
return runWithApp(cmd.Context(), func(app *StoreApp) error {
return app.Info(cmd.Context())
})
},
}
}
// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
// Get storage info
storageInfo := app.Storage.Info()
fmt.Printf("Storage Information\n")
fmt.Printf("==================\n\n")
fmt.Printf("Storage Configuration:\n")
fmt.Printf(" Type: %s\n", storageInfo.Type)
fmt.Printf(" Location: %s\n\n", storageInfo.Location)
// Count snapshots by listing metadata/ prefix
snapshotCount := 0
snapshotCh := app.Storage.ListStream(ctx, "metadata/")
snapshotDirs := make(map[string]bool)
for object := range snapshotCh {
if object.Err != nil {
return fmt.Errorf("listing snapshots: %w", object.Err)
}
// Extract snapshot ID from path like metadata/2024-01-15-143052-hostname/
parts := strings.Split(object.Key, "/")
if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
snapshotDirs[parts[1]] = true
}
}
snapshotCount = len(snapshotDirs)
// Count blobs and calculate total size by listing blobs/ prefix
blobCount := 0
var totalSize int64
blobCh := app.Storage.ListStream(ctx, "blobs/")
for object := range blobCh {
if object.Err != nil {
return fmt.Errorf("listing blobs: %w", object.Err)
}
if !strings.HasSuffix(object.Key, "/") { // Skip directories
blobCount++
totalSize += object.Size
}
}
fmt.Printf("Storage Statistics:\n")
fmt.Printf(" Snapshots: %d\n", snapshotCount)
fmt.Printf(" Blobs: %d\n", blobCount)
fmt.Printf(" Total Size: %s\n", formatBytes(totalSize))
return nil
}
// formatBytes formats bytes into human-readable format
func formatBytes(bytes int64) string {
const unit = 1024
if bytes < unit {
return fmt.Sprintf("%d B", bytes)
}
div, exp := int64(unit), 0
for n := bytes / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}
// runWithApp creates the FX app and runs the given function
func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
var result error
rootFlags := GetRootFlags()
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
err = RunWithApp(ctx, AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {
return &StoreApp{
Storage: storer,
Shutdowner: shutdowner,
}
}),
},
Invokes: []fx.Option{
fx.Invoke(func(app *StoreApp, shutdowner fx.Shutdowner) {
result = fn(app)
// Shutdown after command completes
go func() {
time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
if err := shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
}),
},
})
if err != nil {
return err
}
return result
}
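The `formatBytes` helper in store.go uses base-1024 units with single-decimal precision and "KMGTPE" prefixes. Since it is unexported, this standalone sketch duplicates it to show the rounding behavior at a few boundary values:

```go
package main

import "fmt"

// formatBytes is duplicated from store.go above (it is unexported there):
// base-1024 units, one decimal place, "KMGTPE" prefixes.
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(formatBytes(512))     // 512 B
	fmt.Println(formatBytes(1536))    // 1.5 KB
	fmt.Println(formatBytes(1 << 30)) // 1.0 GB
}
```

Note the labels are SI-looking ("KB", "MB") but the divisor is 1024, so strictly these are KiB/MiB quantities; that mismatch is common in CLI tools and harmless here.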


@@ -0,0 +1,10 @@
package cli
import "time"
// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
ID string `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}


@@ -2,85 +2,97 @@ package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewVerifyCommand creates the verify command
func NewVerifyCommand() *cobra.Command {
opts := &vaultik.VerifyOptions{}
cmd := &cobra.Command{
Use: "verify <snapshot-id>",
Short: "Verify snapshot integrity",
Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.
Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs
Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly
The command will fail immediately on any verification error and exit with non-zero status.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error { RunE: func(cmd *cobra.Command, args []string) error {
// Validate required flags snapshotID := args[0]
if opts.Bucket == "" {
return fmt.Errorf("--bucket is required") // Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
} }
if opts.Prefix == "" {
return fmt.Errorf("--prefix is required") // Use the app framework for all verification
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Run the verify operation directly
go func() {
var err error
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
} }
return runVerify(cmd.Context(), opts)
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping verify operation")
v.Cancel()
return nil
},
})
}),
},
})
}, },
} }
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name") cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix") cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to verify (optional, defaults to latest)")
cmd.Flags().BoolVar(&opts.Quick, "quick", false, "Perform quick verification by checking blob existence and S3 content hashes without downloading")
return cmd return cmd
} }
func runVerify(ctx context.Context, opts *VerifyOptions) error {
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
}
app := fx.New(
fx.Supply(opts),
fx.Provide(globals.New),
// Additional modules will be added here
fx.Invoke(func(g *globals.Globals) error {
// TODO: Implement verify logic
if opts.SnapshotID == "" {
fmt.Printf("Verifying latest snapshot in bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
} else {
fmt.Printf("Verifying snapshot %s in bucket %s with prefix %s\n", opts.SnapshotID, opts.Bucket, opts.Prefix)
}
if opts.Quick {
fmt.Println("Performing quick verification")
} else {
fmt.Println("Performing deep verification")
}
return nil
}),
fx.NopLogger,
)
if err := app.Start(ctx); err != nil {
return fmt.Errorf("failed to start verify: %w", err)
}
defer func() {
if err := app.Stop(ctx); err != nil {
fmt.Printf("error stopping app: %v\n", err)
}
}()
return nil
}

internal/cli/version.go Normal file

@@ -0,0 +1,27 @@
package cli
import (
"fmt"
"runtime"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
)
// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "version",
Short: "Print version information",
Long: `Print version, git commit, and build information for vaultik.`,
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
fmt.Printf("vaultik %s\n", globals.Version)
fmt.Printf(" commit: %s\n", globals.Commit)
fmt.Printf(" go: %s\n", runtime.Version())
fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
},
}
return cmd
}


@@ -3,30 +3,112 @@ package config

 import (
 	"fmt"
 	"os"
+	"path/filepath"
+	"sort"
+	"strings"
 	"time"

+	"filippo.io/age"
+	"git.eeqj.de/sneak/smartconfig"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"github.com/adrg/xdg"
 	"go.uber.org/fx"
 	"gopkg.in/yaml.v3"
 )

-// Config represents the application configuration
+const appName = "berlin.sneak.app.vaultik"
+
+// expandTilde expands ~ at the start of a path to the user's home directory.
+func expandTilde(path string) string {
+	if path == "~" {
+		home, _ := os.UserHomeDir()
+		return home
+	}
+	if strings.HasPrefix(path, "~/") {
+		home, _ := os.UserHomeDir()
+		return filepath.Join(home, path[2:])
+	}
+	return path
+}
+
+// expandTildeInURL expands ~ in file:// URLs.
+func expandTildeInURL(url string) string {
+	if strings.HasPrefix(url, "file://~/") {
+		home, _ := os.UserHomeDir()
+		return "file://" + filepath.Join(home, url[9:])
+	}
+	return url
+}
+
+// SnapshotConfig represents configuration for a named snapshot.
+// Each snapshot backs up one or more paths and can have its own exclude patterns
+// in addition to the global excludes.
+type SnapshotConfig struct {
+	Paths   []string `yaml:"paths"`
+	Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
+}
+
+// GetExcludes returns the combined exclude patterns for a named snapshot.
+// It merges global excludes with the snapshot-specific excludes.
+func (c *Config) GetExcludes(snapshotName string) []string {
+	snap, ok := c.Snapshots[snapshotName]
+	if !ok {
+		return c.Exclude
+	}
+	if len(snap.Exclude) == 0 {
+		return c.Exclude
+	}
+	// Combine global and snapshot-specific excludes
+	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
+	combined = append(combined, c.Exclude...)
+	combined = append(combined, snap.Exclude...)
+	return combined
+}
+
+// SnapshotNames returns the names of all configured snapshots in sorted order.
+func (c *Config) SnapshotNames() []string {
+	names := make([]string, 0, len(c.Snapshots))
+	for name := range c.Snapshots {
+		names = append(names, name)
+	}
+	// Sort for deterministic order
+	sort.Strings(names)
+	return names
+}
+
+// Config represents the application configuration for Vaultik.
+// It defines all settings for backup operations, including source directories,
+// encryption recipients, storage configuration, and performance tuning parameters.
+// Configuration is typically loaded from a YAML file.
 type Config struct {
-	AgeRecipient      string        `yaml:"age_recipient"`
+	AgeRecipients     []string      `yaml:"age_recipients"`
+	AgeSecretKey      string        `yaml:"age_secret_key"`
 	BackupInterval    time.Duration `yaml:"backup_interval"`
-	BlobSizeLimit     int64         `yaml:"blob_size_limit"`
-	ChunkSize         int64         `yaml:"chunk_size"`
-	Exclude           []string      `yaml:"exclude"`
+	BlobSizeLimit     Size          `yaml:"blob_size_limit"`
+	ChunkSize         Size          `yaml:"chunk_size"`
+	Exclude           []string      `yaml:"exclude"` // Global excludes applied to all snapshots
 	FullScanInterval  time.Duration `yaml:"full_scan_interval"`
 	Hostname          string        `yaml:"hostname"`
 	IndexPath         string        `yaml:"index_path"`
+	IndexPrefix       string        `yaml:"index_prefix"`
 	MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
 	S3                S3Config      `yaml:"s3"`
-	SourceDirs        []string      `yaml:"source_dirs"`
+	Snapshots         map[string]SnapshotConfig `yaml:"snapshots"`
 	CompressionLevel  int           `yaml:"compression_level"`
+
+	// StorageURL specifies the storage backend using a URL format.
+	// Takes precedence over S3Config if set.
+	// Supported formats:
+	//   - s3://bucket/prefix?endpoint=host&region=us-east-1
+	//   - file:///path/to/backup
+	// For S3 URLs, credentials are still read from s3.access_key_id and s3.secret_access_key.
+	StorageURL string `yaml:"storage_url"`
 }

-// S3Config represents S3 storage configuration
+// S3Config represents S3 storage configuration for backup storage.
+// It supports both AWS S3 and S3-compatible storage services.
+// All fields except UseSSL and PartSize are required.
 type S3Config struct {
 	Endpoint        string `yaml:"endpoint"`
 	Bucket          string `yaml:"bucket"`
@@ -35,13 +117,17 @@ type S3Config struct {
 	SecretAccessKey string `yaml:"secret_access_key"`
 	Region          string `yaml:"region"`
 	UseSSL          bool   `yaml:"use_ssl"`
-	PartSize        int64  `yaml:"part_size"`
+	PartSize        Size   `yaml:"part_size"`
 }

-// ConfigPath wraps the config file path for fx injection
+// ConfigPath wraps the config file path for fx dependency injection.
+// This type allows the config file path to be injected as a distinct type
+// rather than a plain string, avoiding conflicts with other string dependencies.
 type ConfigPath string

-// New creates a new Config instance
+// New creates a new Config instance by loading from the specified path.
+// This function is used by the fx dependency injection framework.
+// Returns an error if the path is empty or if loading fails.
 func New(path ConfigPath) (*Config, error) {
 	if path == "" {
 		return nil, fmt.Errorf("config path not provided")
@@ -55,32 +141,60 @@ func New(path ConfigPath) (*Config, error) {
 	return cfg, nil
 }

-// Load reads and parses the configuration file
+// Load reads and parses the configuration file from the specified path.
+// It applies default values for optional fields, performs environment variable
+// substitution using smartconfig, and validates the configuration.
+// The configuration file should be in YAML format. Returns an error if the file
+// cannot be read, parsed, or if validation fails.
 func Load(path string) (*Config, error) {
-	data, err := os.ReadFile(path)
+	// Load config using smartconfig for interpolation
+	sc, err := smartconfig.NewFromConfigPath(path)
 	if err != nil {
-		return nil, fmt.Errorf("failed to read config file: %w", err)
+		return nil, fmt.Errorf("failed to load config file: %w", err)
 	}

 	cfg := &Config{
 		// Set defaults
-		BlobSizeLimit:     10 * 1024 * 1024 * 1024, // 10GB
-		ChunkSize:         10 * 1024 * 1024,        // 10MB
+		BlobSizeLimit:     Size(10 * 1024 * 1024 * 1024), // 10GB
+		ChunkSize:         Size(10 * 1024 * 1024),        // 10MB
 		BackupInterval:    1 * time.Hour,
 		FullScanInterval:  24 * time.Hour,
 		MinTimeBetweenRun: 15 * time.Minute,
-		IndexPath:         "/var/lib/vaultik/index.sqlite",
+		IndexPath:         filepath.Join(xdg.DataHome, appName, "index.sqlite"),
+		IndexPrefix:       "index/",
 		CompressionLevel:  3,
 	}

-	if err := yaml.Unmarshal(data, cfg); err != nil {
+	// Convert smartconfig data to YAML then unmarshal
+	configData := sc.Data()
+	yamlBytes, err := yaml.Marshal(configData)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal config data: %w", err)
+	}
+	if err := yaml.Unmarshal(yamlBytes, cfg); err != nil {
 		return nil, fmt.Errorf("failed to parse config: %w", err)
 	}

+	// Expand tilde in all path fields
+	cfg.IndexPath = expandTilde(cfg.IndexPath)
+	cfg.StorageURL = expandTildeInURL(cfg.StorageURL)
+
+	// Expand tildes in snapshot paths
+	for name, snap := range cfg.Snapshots {
+		for i, path := range snap.Paths {
+			snap.Paths[i] = expandTilde(path)
+		}
+		cfg.Snapshots[name] = snap
+	}
+
 	// Check for environment variable override for IndexPath
 	if envIndexPath := os.Getenv("VAULTIK_INDEX_PATH"); envIndexPath != "" {
-		cfg.IndexPath = envIndexPath
+		cfg.IndexPath = expandTilde(envIndexPath)
+	}
+
+	// Check for environment variable override for AgeSecretKey
+	if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
+		cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
 	}

 	// Get hostname if not set
@@ -97,7 +211,18 @@ func Load(path string) (*Config, error) {
 		cfg.S3.Region = "us-east-1"
 	}
 	if cfg.S3.PartSize == 0 {
-		cfg.S3.PartSize = 5 * 1024 * 1024 // 5MB
+		cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
+	}
+
+	// Check config file permissions (warn if world or group readable)
+	if info, err := os.Stat(path); err == nil {
+		mode := info.Mode().Perm()
+		if mode&0044 != 0 { // group or world readable
+			log.Warn("Config file has insecure permissions (contains S3 credentials)",
+				"path", path,
+				"mode", fmt.Sprintf("%04o", mode),
+				"recommendation", "chmod 600 "+path)
+		}
 	}

 	if err := cfg.Validate(); err != nil {
@@ -107,37 +232,40 @@ func Load(path string) (*Config, error) {
 	return cfg, nil
 }

-// Validate checks if the configuration is valid
+// Validate checks if the configuration is valid and complete.
+// It ensures all required fields are present and have valid values:
+//   - At least one age recipient must be specified
+//   - At least one snapshot must be configured with at least one path
+//   - Storage must be configured (either storage_url or s3.* fields)
+//   - Chunk size must be at least 1MB
+//   - Blob size limit must be at least the chunk size
+//   - Compression level must be between 1 and 19
+//
+// Returns an error describing the first validation failure encountered.
 func (c *Config) Validate() error {
-	if c.AgeRecipient == "" {
-		return fmt.Errorf("age_recipient is required")
+	if len(c.AgeRecipients) == 0 {
+		return fmt.Errorf("at least one age_recipient is required")
 	}
-	if len(c.SourceDirs) == 0 {
-		return fmt.Errorf("at least one source directory is required")
+	if len(c.Snapshots) == 0 {
+		return fmt.Errorf("at least one snapshot must be configured")
 	}
-	if c.S3.Endpoint == "" {
-		return fmt.Errorf("s3.endpoint is required")
+	for name, snap := range c.Snapshots {
+		if len(snap.Paths) == 0 {
+			return fmt.Errorf("snapshot %q must have at least one path", name)
+		}
 	}
-	if c.S3.Bucket == "" {
-		return fmt.Errorf("s3.bucket is required")
+	// Validate storage configuration
+	if err := c.validateStorage(); err != nil {
+		return err
 	}
-	if c.S3.AccessKeyID == "" {
-		return fmt.Errorf("s3.access_key_id is required")
-	}
-	if c.S3.SecretAccessKey == "" {
-		return fmt.Errorf("s3.secret_access_key is required")
-	}
-	if c.ChunkSize < 1024*1024 { // 1MB minimum
+	if c.ChunkSize.Int64() < 1024*1024 { // 1MB minimum
 		return fmt.Errorf("chunk_size must be at least 1MB")
 	}
-	if c.BlobSizeLimit < c.ChunkSize {
+	if c.BlobSizeLimit.Int64() < c.ChunkSize.Int64() {
 		return fmt.Errorf("blob_size_limit must be at least chunk_size")
 	}
@@ -148,7 +276,71 @@ func (c *Config) Validate() error {
 	return nil
 }

-// Module exports the config module for fx
+// validateStorage validates storage configuration.
+// If StorageURL is set, it takes precedence. S3 URLs require credentials.
+// File URLs don't require any S3 configuration.
+// If StorageURL is not set, legacy S3 configuration is required.
+func (c *Config) validateStorage() error {
+	if c.StorageURL != "" {
+		// URL-based configuration
+		if strings.HasPrefix(c.StorageURL, "file://") {
+			// File storage doesn't need S3 credentials
+			return nil
+		}
+		if strings.HasPrefix(c.StorageURL, "s3://") {
+			// S3 storage needs credentials
+			if c.S3.AccessKeyID == "" {
+				return fmt.Errorf("s3.access_key_id is required for s3:// URLs")
+			}
+			if c.S3.SecretAccessKey == "" {
+				return fmt.Errorf("s3.secret_access_key is required for s3:// URLs")
+			}
+			return nil
+		}
+		if strings.HasPrefix(c.StorageURL, "rclone://") {
+			// Rclone storage uses rclone's own config
+			return nil
+		}
+		return fmt.Errorf("storage_url must start with s3://, file://, or rclone://")
+	}
+
+	// Legacy S3 configuration
+	if c.S3.Endpoint == "" {
+		return fmt.Errorf("s3.endpoint is required (or set storage_url)")
+	}
+	if c.S3.Bucket == "" {
+		return fmt.Errorf("s3.bucket is required (or set storage_url)")
+	}
+	if c.S3.AccessKeyID == "" {
+		return fmt.Errorf("s3.access_key_id is required")
+	}
+	if c.S3.SecretAccessKey == "" {
+		return fmt.Errorf("s3.secret_access_key is required")
+	}
+	return nil
+}
+
+// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
+// the age library's parser, which handles comments and whitespace.
+func extractAgeSecretKey(input string) string {
+	identities, err := age.ParseIdentities(strings.NewReader(input))
+	if err != nil || len(identities) == 0 {
+		// Fall back to trimmed input if parsing fails
+		return strings.TrimSpace(input)
+	}
+	// Return the string representation of the first identity
+	if id, ok := identities[0].(*age.X25519Identity); ok {
+		return id.String()
+	}
+	return strings.TrimSpace(input)
+}
+
+// Module exports the config module for fx dependency injection.
+// It provides the Config type to other modules in the application.
 var Module = fx.Module("config",
 	fx.Provide(New),
 )
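Pulling the new config surface together (the age_recipients list, Size strings, the snapshots map with per-snapshot excludes, and storage_url), a hypothetical config file might look like the following; all names, paths, and the recipient key are illustrative, not taken from the repository:

```yaml
age_recipients:
  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
storage_url: "file://~/backups/vaultik"   # tilde is expanded by expandTildeInURL
blob_size_limit: "10GB"                   # parsed by Size.UnmarshalYAML
chunk_size: 10485760                      # raw byte counts also accepted
exclude:
  - "*.tmp"
snapshots:
  home:
    paths:
      - ~/Documents
    exclude:
      - "*.cache"                         # merged with global excludes by GetExcludes
```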


@@ -6,6 +6,12 @@ import (
 	"testing"
 )

+const (
+	TEST_SNEAK_AGE_PUBLIC_KEY        = "age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj"
+	TEST_INTEGRATION_AGE_PUBLIC_KEY  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
+	TEST_INTEGRATION_AGE_PRIVATE_KEY = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
+)
+
 func TestMain(m *testing.M) {
 	// Set up test environment
 	testConfigPath := filepath.Join("..", "..", "test", "config.yaml")

@@ -32,16 +38,28 @@ func TestConfigLoad(t *testing.T) {
 	}

 	// Basic validation
-	if cfg.AgeRecipient != "age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" {
-		t.Errorf("Expected age recipient to be set, got '%s'", cfg.AgeRecipient)
+	if len(cfg.AgeRecipients) != 2 {
+		t.Errorf("Expected 2 age recipients, got %d", len(cfg.AgeRecipients))
+	}
+	if cfg.AgeRecipients[0] != TEST_SNEAK_AGE_PUBLIC_KEY {
+		t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
 	}

-	if len(cfg.SourceDirs) != 2 {
-		t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
+	if len(cfg.Snapshots) != 1 {
+		t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
 	}
-	if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
-		t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
+	testSnap, ok := cfg.Snapshots["test"]
+	if !ok {
+		t.Fatal("Expected 'test' snapshot to exist")
+	}
+	if len(testSnap.Paths) != 2 {
+		t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
+	}
+	if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
+		t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
 	}

 	if cfg.S3.Bucket != "vaultik-test-bucket" {

@@ -65,3 +83,65 @@ func TestConfigFromEnv(t *testing.T) {
 		t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
 	}
 }
+
+// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
+func TestExtractAgeSecretKey(t *testing.T) {
+	tests := []struct {
+		name     string
+		input    string
+		expected string
+	}{
+		{
+			name:     "plain key",
+			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "key with trailing newline",
+			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name: "full age-keygen output",
+			input: `# created: 2025-01-14T12:00:00Z
+# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
+AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
+`,
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name: "age-keygen output with extra blank lines",
+			input: `# created: 2025-01-14T12:00:00Z
+
+# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
+
+AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
+`,
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "key with leading whitespace",
+			input:    "  AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5  ",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "empty input",
+			input:    "",
+			expected: "",
+		},
+		{
+			name:     "only comments",
+			input:    "# this is a comment\n# another comment",
+			expected: "# this is a comment\n# another comment",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			result := extractAgeSecretKey(tt.input)
+			if result != tt.expected {
+				t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
+			}
+		})
+	}
+}

internal/config/size.go Normal file

@@ -0,0 +1,62 @@
package config
import (
"fmt"
"github.com/dustin/go-humanize"
)
// Size represents a byte size that can be specified in configuration files.
// It can unmarshal from both numeric values (interpreted as bytes) and
// human-readable strings like "10MB", "2.5GB", or "1TB".
type Size int64
// UnmarshalYAML implements yaml.Unmarshaler for Size, allowing it to be
// parsed from YAML configuration files. It accepts both numeric values
// (interpreted as bytes) and string values with units (e.g., "10MB").
func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
// Try to unmarshal as int64 first
var intVal int64
if err := unmarshal(&intVal); err == nil {
*s = Size(intVal)
return nil
}
// Try to unmarshal as string
var strVal string
if err := unmarshal(&strVal); err != nil {
return fmt.Errorf("size must be a number or string")
}
// Parse the string using go-humanize
bytes, err := humanize.ParseBytes(strVal)
if err != nil {
return fmt.Errorf("invalid size format: %w", err)
}
*s = Size(bytes)
return nil
}
// Int64 returns the size as int64 bytes.
// This is useful when the size needs to be passed to APIs that expect
// a numeric byte count.
func (s Size) Int64() int64 {
return int64(s)
}
// String returns the size as a human-readable string.
// For example, 1048576 bytes would be formatted as "1.0 MB".
// This implements the fmt.Stringer interface.
func (s Size) String() string {
return humanize.Bytes(uint64(s))
}
// ParseSize parses a size string into a Size value
func ParseSize(s string) (Size, error) {
bytes, err := humanize.ParseBytes(s)
if err != nil {
return 0, fmt.Errorf("invalid size format: %w", err)
}
return Size(bytes), nil
}
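Size delegates string parsing to go-humanize. To make the accepted formats concrete, here is a hedged stdlib-only sketch of the same idea; the real humanize.ParseBytes also handles IEC suffixes like "MiB" and more units, so this toy parser is an illustration, not the library's code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize is a minimal stand-in for humanize.ParseBytes: it accepts a
// decimal number followed by an optional KB/MB/GB/TB suffix. Like go-humanize,
// it treats these SI suffixes as powers of 1000 (MiB-style IEC suffixes,
// which go-humanize also supports, are omitted here).
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	multipliers := []struct {
		suffix string
		factor int64
	}{
		{"TB", 1e12}, {"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1},
	}
	for _, m := range multipliers {
		if strings.HasSuffix(strings.ToUpper(s), m.suffix) {
			num := strings.TrimSpace(s[:len(s)-len(m.suffix)])
			f, err := strconv.ParseFloat(num, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", s, err)
			}
			return int64(f * float64(m.factor)), nil
		}
	}
	// Bare number: interpreted as bytes, matching Size's int64 branch.
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n, nil
}

func main() {
	for _, in := range []string{"10MB", "2.5GB", "1048576"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err) // 10MB -> 10000000, 2.5GB -> 2500000000, 1048576 -> 1048576
	}
}
```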


@@ -0,0 +1,209 @@
package crypto
import (
"bytes"
"fmt"
"io"
"sync"
"filippo.io/age"
"go.uber.org/fx"
)
// Encryptor provides thread-safe encryption using the age encryption library.
// It supports encrypting data for multiple recipients simultaneously, allowing
// any of the corresponding private keys to decrypt the data. This is useful
// for backup scenarios where multiple parties should be able to decrypt the data.
type Encryptor struct {
recipients []age.Recipient
mu sync.RWMutex
}
// NewEncryptor creates a new encryptor with the given age public keys.
// Each public key should be a valid age X25519 recipient string (e.g., "age1...")
// At least one recipient must be provided. Returns an error if any of the
// public keys are invalid or if no recipients are specified.
func NewEncryptor(publicKeys []string) (*Encryptor, error) {
if len(publicKeys) == 0 {
return nil, fmt.Errorf("at least one recipient is required")
}
recipients := make([]age.Recipient, 0, len(publicKeys))
for _, key := range publicKeys {
recipient, err := age.ParseX25519Recipient(key)
if err != nil {
return nil, fmt.Errorf("parsing age recipient %s: %w", key, err)
}
recipients = append(recipients, recipient)
}
return &Encryptor{
recipients: recipients,
}, nil
}
// Encrypt encrypts data using age encryption for all configured recipients.
// The encrypted data can be decrypted by any of the corresponding private keys.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use EncryptStream or EncryptWriter instead.
func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
e.mu.RLock()
recipients := e.recipients
e.mu.RUnlock()
var buf bytes.Buffer
// Create encrypted writer for all recipients
w, err := age.Encrypt(&buf, recipients...)
if err != nil {
return nil, fmt.Errorf("creating encrypted writer: %w", err)
}
// Write data
if _, err := w.Write(data); err != nil {
return nil, fmt.Errorf("writing encrypted data: %w", err)
}
// Close to flush
if err := w.Close(); err != nil {
return nil, fmt.Errorf("closing encrypted writer: %w", err)
}
return buf.Bytes(), nil
}
// EncryptStream encrypts data from reader to writer using age encryption.
// This method is suitable for encrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The encrypted data is written directly to the destination writer.
func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
e.mu.RLock()
recipients := e.recipients
e.mu.RUnlock()
// Create encrypted writer for all recipients
w, err := age.Encrypt(dst, recipients...)
if err != nil {
return fmt.Errorf("creating encrypted writer: %w", err)
}
// Copy data
if _, err := io.Copy(w, src); err != nil {
return fmt.Errorf("copying encrypted data: %w", err)
}
// Close to flush
if err := w.Close(); err != nil {
return fmt.Errorf("closing encrypted writer: %w", err)
}
return nil
}
// EncryptWriter creates a writer that encrypts data written to it.
// All data written to the returned WriteCloser will be encrypted and written
// to the destination writer. The caller must call Close() on the returned
// writer to ensure all encrypted data is properly flushed and finalized.
// This is useful for integrating encryption into existing writer-based pipelines.
func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
e.mu.RLock()
recipients := e.recipients
e.mu.RUnlock()
// Create encrypted writer for all recipients
w, err := age.Encrypt(dst, recipients...)
if err != nil {
return nil, fmt.Errorf("creating encrypted writer: %w", err)
}
return w, nil
}
// UpdateRecipients updates the recipients for future encryption operations.
// This method is thread-safe and can be called while other encryption operations
// are in progress. Existing encryption operations will continue with the old
// recipients. At least one recipient must be provided. Returns an error if any
// of the public keys are invalid or if no recipients are specified.
func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
if len(publicKeys) == 0 {
return fmt.Errorf("at least one recipient is required")
}
recipients := make([]age.Recipient, 0, len(publicKeys))
for _, key := range publicKeys {
recipient, err := age.ParseX25519Recipient(key)
if err != nil {
return fmt.Errorf("parsing age recipient %s: %w", key, err)
}
recipients = append(recipients, recipient)
}
e.mu.Lock()
e.recipients = recipients
e.mu.Unlock()
return nil
}
// Decryptor provides thread-safe decryption using the age encryption library.
// It uses a private key to decrypt data that was encrypted for the corresponding
// public key.
type Decryptor struct {
identity age.Identity
mu sync.RWMutex
}
// NewDecryptor creates a new decryptor with the given age private key.
// The private key should be a valid age X25519 identity string.
// Returns an error if the private key is invalid.
func NewDecryptor(privateKey string) (*Decryptor, error) {
identity, err := age.ParseX25519Identity(privateKey)
if err != nil {
return nil, fmt.Errorf("parsing age identity: %w", err)
}
return &Decryptor{
identity: identity,
}, nil
}
// Decrypt decrypts data using age decryption.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use DecryptStream instead.
func (d *Decryptor) Decrypt(data []byte) ([]byte, error) {
d.mu.RLock()
identity := d.identity
d.mu.RUnlock()
r, err := age.Decrypt(bytes.NewReader(data), identity)
if err != nil {
return nil, fmt.Errorf("creating decrypted reader: %w", err)
}
decrypted, err := io.ReadAll(r)
if err != nil {
return nil, fmt.Errorf("reading decrypted data: %w", err)
}
return decrypted, nil
}
// DecryptStream returns a reader that decrypts data from the provided reader.
// This method is suitable for decrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The caller should close the input reader when done.
func (d *Decryptor) DecryptStream(src io.Reader) (io.Reader, error) {
d.mu.RLock()
identity := d.identity
d.mu.RUnlock()
r, err := age.Decrypt(src, identity)
if err != nil {
return nil, fmt.Errorf("creating decrypted reader: %w", err)
}
return r, nil
}
// Module exports the crypto module for fx dependency injection.
var Module = fx.Module("crypto")


@@ -0,0 +1,157 @@
package crypto
import (
"bytes"
"testing"
"filippo.io/age"
)
func TestEncryptor(t *testing.T) {
// Generate a test key pair
identity, err := age.GenerateX25519Identity()
if err != nil {
t.Fatalf("failed to generate identity: %v", err)
}
publicKey := identity.Recipient().String()
// Create encryptor
enc, err := NewEncryptor([]string{publicKey})
if err != nil {
t.Fatalf("failed to create encryptor: %v", err)
}
// Test data
plaintext := []byte("Hello, World! This is a test message.")
// Encrypt
ciphertext, err := enc.Encrypt(plaintext)
if err != nil {
t.Fatalf("failed to encrypt: %v", err)
}
// Verify it's actually encrypted (should be larger and different)
if bytes.Equal(plaintext, ciphertext) {
t.Error("ciphertext equals plaintext")
}
// Decrypt to verify
r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
if err != nil {
t.Fatalf("failed to decrypt: %v", err)
}
var decrypted bytes.Buffer
if _, err := decrypted.ReadFrom(r); err != nil {
t.Fatalf("failed to read decrypted data: %v", err)
}
if !bytes.Equal(plaintext, decrypted.Bytes()) {
t.Error("decrypted data doesn't match original")
}
}
func TestEncryptorMultipleRecipients(t *testing.T) {
// Generate three test key pairs
identity1, err := age.GenerateX25519Identity()
if err != nil {
t.Fatalf("failed to generate identity1: %v", err)
}
identity2, err := age.GenerateX25519Identity()
if err != nil {
t.Fatalf("failed to generate identity2: %v", err)
}
identity3, err := age.GenerateX25519Identity()
if err != nil {
t.Fatalf("failed to generate identity3: %v", err)
}
publicKeys := []string{
identity1.Recipient().String(),
identity2.Recipient().String(),
identity3.Recipient().String(),
}
// Create encryptor with multiple recipients
enc, err := NewEncryptor(publicKeys)
if err != nil {
t.Fatalf("failed to create encryptor: %v", err)
}
// Test data
plaintext := []byte("Secret message for multiple recipients")
// Encrypt
ciphertext, err := enc.Encrypt(plaintext)
if err != nil {
t.Fatalf("failed to encrypt: %v", err)
}
// Verify each recipient can decrypt
identities := []age.Identity{identity1, identity2, identity3}
for i, identity := range identities {
r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
if err != nil {
t.Fatalf("recipient %d failed to decrypt: %v", i+1, err)
}
var decrypted bytes.Buffer
if _, err := decrypted.ReadFrom(r); err != nil {
t.Fatalf("recipient %d failed to read decrypted data: %v", i+1, err)
}
if !bytes.Equal(plaintext, decrypted.Bytes()) {
t.Errorf("recipient %d: decrypted data doesn't match original", i+1)
}
}
}
func TestEncryptorUpdateRecipients(t *testing.T) {
// Generate two identities
identity1, _ := age.GenerateX25519Identity()
identity2, _ := age.GenerateX25519Identity()
publicKey1 := identity1.Recipient().String()
publicKey2 := identity2.Recipient().String()
// Create encryptor with first key
enc, err := NewEncryptor([]string{publicKey1})
if err != nil {
t.Fatalf("failed to create encryptor: %v", err)
}
// Encrypt with first key
plaintext := []byte("test data")
ciphertext1, err := enc.Encrypt(plaintext)
if err != nil {
t.Fatalf("failed to encrypt: %v", err)
}
// Update to second key
if err := enc.UpdateRecipients([]string{publicKey2}); err != nil {
t.Fatalf("failed to update recipients: %v", err)
}
// Encrypt with second key
ciphertext2, err := enc.Encrypt(plaintext)
if err != nil {
t.Fatalf("failed to encrypt: %v", err)
}
// First ciphertext should only decrypt with first identity
if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity1); err != nil {
t.Error("failed to decrypt with identity1")
}
if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity2); err == nil {
t.Error("should not decrypt with identity2")
}
// Second ciphertext should only decrypt with second identity
if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity2); err != nil {
t.Error("failed to decrypt with identity2")
}
if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity1); err == nil {
t.Error("should not decrypt with identity1")
}
}

View File

@@ -16,15 +16,15 @@ func NewBlobChunkRepository(db *DB) *BlobChunkRepository {
func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobChunk) error {
query := `
-INSERT INTO blob_chunks (blob_hash, chunk_hash, offset, length)
+INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length)
VALUES (?, ?, ?, ?)
`
var err error
if tx != nil {
-_, err = tx.ExecContext(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+_, err = tx.ExecContext(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
} else {
-_, err = r.db.ExecWithLock(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+_, err = r.db.ExecWithLog(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
}
if err != nil {
@@ -34,15 +34,15 @@ func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobCh
return nil
}
-func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string) ([]*BlobChunk, error) {
+func (r *BlobChunkRepository) GetByBlobID(ctx context.Context, blobID string) ([]*BlobChunk, error) {
query := `
-SELECT blob_hash, chunk_hash, offset, length
+SELECT blob_id, chunk_hash, offset, length
FROM blob_chunks
-WHERE blob_hash = ?
+WHERE blob_id = ?
ORDER BY offset
`
-rows, err := r.db.conn.QueryContext(ctx, query, blobHash)
+rows, err := r.db.conn.QueryContext(ctx, query, blobID)
if err != nil {
return nil, fmt.Errorf("querying blob chunks: %w", err)
}
@@ -51,7 +51,7 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
var blobChunks []*BlobChunk
for rows.Next() {
var bc BlobChunk
-err := rows.Scan(&bc.BlobHash, &bc.ChunkHash, &bc.Offset, &bc.Length)
+err := rows.Scan(&bc.BlobID, &bc.ChunkHash, &bc.Offset, &bc.Length)
if err != nil {
return nil, fmt.Errorf("scanning blob chunk: %w", err)
}
@@ -63,26 +63,90 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
func (r *BlobChunkRepository) GetByChunkHash(ctx context.Context, chunkHash string) (*BlobChunk, error) {
query := `
-SELECT blob_hash, chunk_hash, offset, length
+SELECT blob_id, chunk_hash, offset, length
FROM blob_chunks
WHERE chunk_hash = ?
LIMIT 1
`
LogSQL("GetByChunkHash", query, chunkHash)
var bc BlobChunk
err := r.db.conn.QueryRowContext(ctx, query, chunkHash).Scan(
-&bc.BlobHash,
+&bc.BlobID,
&bc.ChunkHash,
&bc.Offset,
&bc.Length,
)
if err == sql.ErrNoRows {
LogSQL("GetByChunkHash", "No rows found", chunkHash)
return nil, nil
}
if err != nil {
LogSQL("GetByChunkHash", "Error", chunkHash, err)
return nil, fmt.Errorf("querying blob chunk: %w", err)
}
LogSQL("GetByChunkHash", "Found blob", chunkHash, "blob", bc.BlobID)
return &bc, nil
}
// GetByChunkHashTx retrieves a blob chunk within a transaction
func (r *BlobChunkRepository) GetByChunkHashTx(ctx context.Context, tx *sql.Tx, chunkHash string) (*BlobChunk, error) {
query := `
SELECT blob_id, chunk_hash, offset, length
FROM blob_chunks
WHERE chunk_hash = ?
LIMIT 1
`
LogSQL("GetByChunkHashTx", query, chunkHash)
var bc BlobChunk
err := tx.QueryRowContext(ctx, query, chunkHash).Scan(
&bc.BlobID,
&bc.ChunkHash,
&bc.Offset,
&bc.Length,
)
if err == sql.ErrNoRows {
LogSQL("GetByChunkHashTx", "No rows found", chunkHash)
return nil, nil
}
if err != nil {
LogSQL("GetByChunkHashTx", "Error", chunkHash, err)
return nil, fmt.Errorf("querying blob chunk: %w", err)
}
LogSQL("GetByChunkHashTx", "Found blob", chunkHash, "blob", bc.BlobID)
return &bc, nil
}
// DeleteOrphaned deletes blob_chunks entries where either the blob or chunk no longer exists
func (r *BlobChunkRepository) DeleteOrphaned(ctx context.Context) error {
// Delete blob_chunks where the blob doesn't exist
query1 := `
DELETE FROM blob_chunks
WHERE NOT EXISTS (
SELECT 1 FROM blobs
WHERE blobs.id = blob_chunks.blob_id
)
`
if _, err := r.db.ExecWithLog(ctx, query1); err != nil {
return fmt.Errorf("deleting blob_chunks with missing blobs: %w", err)
}
// Delete blob_chunks where the chunk doesn't exist
query2 := `
DELETE FROM blob_chunks
WHERE NOT EXISTS (
SELECT 1 FROM chunks
WHERE chunks.chunk_hash = blob_chunks.chunk_hash
)
`
if _, err := r.db.ExecWithLog(ctx, query2); err != nil {
return fmt.Errorf("deleting blob_chunks with missing chunks: %w", err)
}
return nil
}

View File

@@ -2,7 +2,11 @@ package database
import (
"context"
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestBlobChunkRepository(t *testing.T) {
@@ -10,78 +14,111 @@ func TestBlobChunkRepository(t *testing.T) {
defer cleanup()
ctx := context.Background()
-repo := NewBlobChunkRepository(db)
+repos := NewRepositories(db)
// Create blob first
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob)
if err != nil {
t.Fatalf("failed to create blob: %v", err)
}
// Create chunks
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
// Test Create
bc1 := &BlobChunk{
-BlobHash: "blob1",
+BlobID: blob.ID,
-ChunkHash: "chunk1",
+ChunkHash: types.ChunkHash("chunk1"),
Offset: 0,
Length: 1024,
}
-err := repo.Create(ctx, nil, bc1)
+err = repos.BlobChunks.Create(ctx, nil, bc1)
if err != nil {
t.Fatalf("failed to create blob chunk: %v", err)
}
// Add more chunks to the same blob
bc2 := &BlobChunk{
-BlobHash: "blob1",
+BlobID: blob.ID,
-ChunkHash: "chunk2",
+ChunkHash: types.ChunkHash("chunk2"),
Offset: 1024,
Length: 2048,
}
-err = repo.Create(ctx, nil, bc2)
+err = repos.BlobChunks.Create(ctx, nil, bc2)
if err != nil {
t.Fatalf("failed to create second blob chunk: %v", err)
}
bc3 := &BlobChunk{
-BlobHash: "blob1",
+BlobID: blob.ID,
-ChunkHash: "chunk3",
+ChunkHash: types.ChunkHash("chunk3"),
Offset: 3072,
Length: 512,
}
-err = repo.Create(ctx, nil, bc3)
+err = repos.BlobChunks.Create(ctx, nil, bc3)
if err != nil {
t.Fatalf("failed to create third blob chunk: %v", err)
}
-// Test GetByBlobHash
+// Test GetByBlobID
-chunks, err := repo.GetByBlobHash(ctx, "blob1")
+blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get blob chunks: %v", err)
}
-if len(chunks) != 3 {
+if len(blobChunks) != 3 {
-t.Errorf("expected 3 chunks, got %d", len(chunks))
+t.Errorf("expected 3 chunks, got %d", len(blobChunks))
}
// Verify order by offset
expectedOffsets := []int64{0, 1024, 3072}
-for i, chunk := range chunks {
+for i, bc := range blobChunks {
-if chunk.Offset != expectedOffsets[i] {
+if bc.Offset != expectedOffsets[i] {
-t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], chunk.Offset)
+t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], bc.Offset)
}
}
// Test GetByChunkHash
-bc, err := repo.GetByChunkHash(ctx, "chunk2")
+bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
if err != nil {
t.Fatalf("failed to get blob chunk by chunk hash: %v", err)
}
if bc == nil {
t.Fatal("expected blob chunk, got nil")
}
-if bc.BlobHash != "blob1" {
+if bc.BlobID != blob.ID {
-t.Errorf("wrong blob hash: expected blob1, got %s", bc.BlobHash)
+t.Errorf("wrong blob ID: expected %s, got %s", blob.ID, bc.BlobID)
}
if bc.Offset != 1024 {
t.Errorf("wrong offset: expected 1024, got %d", bc.Offset)
}
// Test duplicate insert (should fail due to primary key constraint)
err = repos.BlobChunks.Create(ctx, nil, bc1)
if err == nil {
t.Fatal("duplicate blob_chunk insert should fail due to primary key constraint")
}
if !strings.Contains(err.Error(), "UNIQUE") && !strings.Contains(err.Error(), "constraint") {
t.Fatalf("expected constraint error, got: %v", err)
}
// Test non-existent chunk
-bc, err = repo.GetByChunkHash(ctx, "nonexistent")
+bc, err = repos.BlobChunks.GetByChunkHash(ctx, "nonexistent")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -95,26 +132,60 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
defer cleanup()
ctx := context.Background()
-repo := NewBlobChunkRepository(db)
+repos := NewRepositories(db)
// Create blobs
blob1 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
CreatedTS: time.Now(),
}
blob2 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob2-hash"),
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob1)
if err != nil {
t.Fatalf("failed to create blob1: %v", err)
}
err = repos.Blobs.Create(ctx, nil, blob2)
if err != nil {
t.Fatalf("failed to create blob2: %v", err)
}
// Create chunks
chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunkHashes {
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
// Create chunks across multiple blobs
// Some chunks are shared between blobs (deduplication scenario)
blobChunks := []BlobChunk{
-{BlobHash: "blob1", ChunkHash: "chunk1", Offset: 0, Length: 1024},
+{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
-{BlobHash: "blob1", ChunkHash: "chunk2", Offset: 1024, Length: 1024},
+{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
-{BlobHash: "blob2", ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
+{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
-{BlobHash: "blob2", ChunkHash: "chunk3", Offset: 1024, Length: 1024},
+{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
}
for _, bc := range blobChunks {
-err := repo.Create(ctx, nil, &bc)
+err := repos.BlobChunks.Create(ctx, nil, &bc)
if err != nil {
t.Fatalf("failed to create blob chunk: %v", err)
}
}
// Verify blob1 chunks
-chunks, err := repo.GetByBlobHash(ctx, "blob1")
+chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
if err != nil {
t.Fatalf("failed to get blob1 chunks: %v", err)
}
@@ -123,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob2 chunks
-chunks, err = repo.GetByBlobHash(ctx, "blob2")
+chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
if err != nil {
t.Fatalf("failed to get blob2 chunks: %v", err)
}
@@ -132,7 +203,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify shared chunk
-bc, err := repo.GetByChunkHash(ctx, "chunk2")
+bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
if err != nil {
t.Fatalf("failed to get shared chunk: %v", err)
}
@@ -140,7 +211,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
t.Fatal("expected shared chunk, got nil")
}
// GetByChunkHash returns first match, should be blob1
-if bc.BlobHash != "blob1" {
+if bc.BlobID != blob1.ID {
-t.Errorf("expected blob1 for shared chunk, got %s", bc.BlobHash)
+t.Errorf("expected %s for shared chunk, got %s", blob1.ID, bc.BlobID)
}
}

View File

@@ -5,6 +5,8 @@ import (
"database/sql" "database/sql"
"fmt" "fmt"
"time" "time"
"git.eeqj.de/sneak/vaultik/internal/log"
) )
type BlobRepository struct { type BlobRepository struct {
@@ -17,15 +19,27 @@ func NewBlobRepository(db *DB) *BlobRepository {
func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) error {
query := `
-INSERT INTO blobs (blob_hash, created_ts)
+INSERT INTO blobs (id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts)
-VALUES (?, ?)
+VALUES (?, ?, ?, ?, ?, ?, ?)
`
var finishedTS, uploadedTS *int64
if blob.FinishedTS != nil {
ts := blob.FinishedTS.Unix()
finishedTS = &ts
}
if blob.UploadedTS != nil {
ts := blob.UploadedTS.Unix()
uploadedTS = &ts
}
var err error
if tx != nil {
-_, err = tx.ExecContext(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+_, err = tx.ExecContext(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
} else {
-_, err = r.db.ExecWithLock(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+_, err = r.db.ExecWithLog(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
}
if err != nil {
@@ -37,17 +51,23 @@ func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) err
func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, error) {
query := `
-SELECT blob_hash, created_ts
+SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
FROM blobs
WHERE blob_hash = ?
`
var blob Blob
var createdTSUnix int64
var finishedTSUnix, uploadedTSUnix sql.NullInt64
err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
-&blob.BlobHash,
+&blob.ID,
+&blob.Hash,
&createdTSUnix,
&finishedTSUnix,
&blob.UncompressedSize,
&blob.CompressedSize,
&uploadedTSUnix,
)
if err == sql.ErrNoRows {
@@ -57,40 +77,124 @@ func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, err
return nil, fmt.Errorf("querying blob: %w", err)
}
-blob.CreatedTS = time.Unix(createdTSUnix, 0)
+blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
if finishedTSUnix.Valid {
ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
blob.FinishedTS = &ts
}
if uploadedTSUnix.Valid {
ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
blob.UploadedTS = &ts
}
return &blob, nil
}
-func (r *BlobRepository) List(ctx context.Context, limit, offset int) ([]*Blob, error) {
+// GetByID retrieves a blob by its ID
+func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error) {
query := `
-SELECT blob_hash, created_ts
+SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
FROM blobs
-ORDER BY blob_hash
-LIMIT ? OFFSET ?
+WHERE id = ?
`
-rows, err := r.db.conn.QueryContext(ctx, query, limit, offset)
-if err != nil {
-return nil, fmt.Errorf("querying blobs: %w", err)
-}
-defer CloseRows(rows)
-var blobs []*Blob
-for rows.Next() {
var blob Blob
var createdTSUnix int64
+var finishedTSUnix, uploadedTSUnix sql.NullInt64
-err := rows.Scan(
+err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
-&blob.BlobHash,
+&blob.ID,
+&blob.Hash,
&createdTSUnix,
+&finishedTSUnix,
+&blob.UncompressedSize,
+&blob.CompressedSize,
+&uploadedTSUnix,
)
+if err == sql.ErrNoRows {
+return nil, nil
+}
if err != nil {
-return nil, fmt.Errorf("scanning blob: %w", err)
+return nil, fmt.Errorf("querying blob: %w", err)
}
-blob.CreatedTS = time.Unix(createdTSUnix, 0)
-blobs = append(blobs, &blob)
-}
-return blobs, rows.Err()
+blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+if finishedTSUnix.Valid {
+ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+blob.FinishedTS = &ts
+}
+if uploadedTSUnix.Valid {
+ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+blob.UploadedTS = &ts
+}
+return &blob, nil
}
// UpdateFinished updates a blob when it's finalized
func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id string, hash string, uncompressedSize, compressedSize int64) error {
query := `
UPDATE blobs
SET blob_hash = ?, finished_ts = ?, uncompressed_size = ?, compressed_size = ?
WHERE id = ?
`
now := time.Now().UTC().Unix()
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, hash, now, uncompressedSize, compressedSize, id)
} else {
_, err = r.db.ExecWithLog(ctx, query, hash, now, uncompressedSize, compressedSize, id)
}
if err != nil {
return fmt.Errorf("updating blob: %w", err)
}
return nil
}
// UpdateUploaded marks a blob as uploaded
func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id string) error {
query := `
UPDATE blobs
SET uploaded_ts = ?
WHERE id = ?
`
now := time.Now().UTC().Unix()
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, now, id)
} else {
_, err = r.db.ExecWithLog(ctx, query, now, id)
}
if err != nil {
return fmt.Errorf("marking blob as uploaded: %w", err)
}
return nil
}
// DeleteOrphaned deletes blobs that are not referenced by any snapshot
func (r *BlobRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM blobs
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_blobs
WHERE snapshot_blobs.blob_id = blobs.id
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned blobs", "count", rowsAffected)
}
return nil
}

View File

@@ -4,6 +4,8 @@ import (
"context" "context"
"testing" "testing"
"time" "time"
"git.eeqj.de/sneak/vaultik/internal/types"
) )
func TestBlobRepository(t *testing.T) { func TestBlobRepository(t *testing.T) {
@@ -15,7 +17,8 @@ func TestBlobRepository(t *testing.T) {
// Test Create
blob := &Blob{
-BlobHash: "blobhash123",
+ID: types.NewBlobID(),
+Hash: types.BlobHash("blobhash123"),
CreatedTS: time.Now().Truncate(time.Second),
}
@@ -25,23 +28,36 @@ func TestBlobRepository(t *testing.T) {
}
// Test GetByHash
-retrieved, err := repo.GetByHash(ctx, blob.BlobHash)
+retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
if err != nil {
t.Fatalf("failed to get blob: %v", err)
}
if retrieved == nil {
t.Fatal("expected blob, got nil")
}
-if retrieved.BlobHash != blob.BlobHash {
+if retrieved.Hash != blob.Hash {
-t.Errorf("blob hash mismatch: got %s, want %s", retrieved.BlobHash, blob.BlobHash)
+t.Errorf("blob hash mismatch: got %s, want %s", retrieved.Hash, blob.Hash)
}
if !retrieved.CreatedTS.Equal(blob.CreatedTS) {
t.Errorf("created timestamp mismatch: got %v, want %v", retrieved.CreatedTS, blob.CreatedTS)
}
-// Test List
+// Test GetByID
retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get blob by ID: %v", err)
}
if retrievedByID == nil {
t.Fatal("expected blob, got nil")
}
if retrievedByID.ID != blob.ID {
t.Errorf("blob ID mismatch: got %s, want %s", retrievedByID.ID, blob.ID)
}
// Test with second blob
blob2 := &Blob{
-BlobHash: "blobhash456",
+ID: types.NewBlobID(),
+Hash: types.BlobHash("blobhash456"),
CreatedTS: time.Now().Truncate(time.Second),
}
err = repo.Create(ctx, nil, blob2)
@@ -49,29 +65,45 @@ func TestBlobRepository(t *testing.T) {
t.Fatalf("failed to create second blob: %v", err)
}
-blobs, err := repo.List(ctx, 10, 0)
-if err != nil {
-t.Fatalf("failed to list blobs: %v", err)
-}
-if len(blobs) != 2 {
-t.Errorf("expected 2 blobs, got %d", len(blobs))
-}
-// Test pagination
-blobs, err = repo.List(ctx, 1, 0)
-if err != nil {
-t.Fatalf("failed to list blobs with limit: %v", err)
-}
-if len(blobs) != 1 {
-t.Errorf("expected 1 blob with limit, got %d", len(blobs))
-}
-blobs, err = repo.List(ctx, 1, 1)
-if err != nil {
-t.Fatalf("failed to list blobs with offset: %v", err)
-}
-if len(blobs) != 1 {
-t.Errorf("expected 1 blob with offset, got %d", len(blobs))
-}
+// Test UpdateFinished
+now := time.Now()
+err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
+if err != nil {
+t.Fatalf("failed to update blob as finished: %v", err)
+}
+// Verify update
+updated, err := repo.GetByID(ctx, blob.ID.String())
+if err != nil {
+t.Fatalf("failed to get updated blob: %v", err)
+}
+if updated.FinishedTS == nil {
+t.Fatal("expected finished timestamp to be set")
+}
+if updated.UncompressedSize != 1000 {
+t.Errorf("expected uncompressed size 1000, got %d", updated.UncompressedSize)
+}
+if updated.CompressedSize != 500 {
+t.Errorf("expected compressed size 500, got %d", updated.CompressedSize)
+}
+// Test UpdateUploaded
+err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
+if err != nil {
+t.Fatalf("failed to update blob as uploaded: %v", err)
+}
+// Verify upload update
+uploaded, err := repo.GetByID(ctx, blob.ID.String())
+if err != nil {
+t.Fatalf("failed to get uploaded blob: %v", err)
+}
+if uploaded.UploadedTS == nil {
+t.Fatal("expected uploaded timestamp to be set")
+}
+// Allow 1 second tolerance for timestamp comparison
+if uploaded.UploadedTS.Before(now.Add(-1 * time.Second)) {
+t.Error("uploaded timestamp should be around test time")
+}
}
@@ -83,7 +115,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
repo := NewBlobRepository(db)
blob := &Blob{
-BlobHash: "duplicate_blob",
+ID: types.NewBlobID(),
+Hash: types.BlobHash("duplicate_blob"),
CreatedTS: time.Now().Truncate(time.Second),
}

View File

@@ -0,0 +1,125 @@
package database
import (
"context"
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestCascadeDeleteDebug tests cascade delete with debug output
func TestCascadeDeleteDebug(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Check if foreign keys are enabled
var fkEnabled int
err := db.conn.QueryRow("PRAGMA foreign_keys").Scan(&fkEnabled)
if err != nil {
t.Fatal(err)
}
t.Logf("Foreign keys enabled: %d", fkEnabled)
// Create a file
file := &File{
Path: "/cascade-test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
t.Logf("Created file with ID: %s", file.ID)
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
fc := &FileChunk{
FileID: file.ID,
Idx: i,
ChunkHash: chunk.ChunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
t.Logf("Created file chunk mapping: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
}
// Verify file chunks exist
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
t.Logf("File chunks before delete: %d", len(fileChunks))
// Check the foreign key constraint
var fkInfo string
err = db.conn.QueryRow(`
SELECT sql FROM sqlite_master
WHERE type='table' AND name='file_chunks'
`).Scan(&fkInfo)
if err != nil {
t.Fatal(err)
}
t.Logf("file_chunks table definition:\n%s", fkInfo)
// Delete the file
t.Log("Deleting file...")
err = repos.Files.DeleteByID(ctx, nil, file.ID)
if err != nil {
t.Fatalf("failed to delete file: %v", err)
}
// Verify file is gone
deletedFile, err := repos.Files.GetByID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
if deletedFile != nil {
t.Error("file should have been deleted")
} else {
t.Log("File was successfully deleted")
}
// Check file chunks after delete
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
t.Logf("File chunks after delete: %d", len(fileChunks))
// Manually check the database
var count int
err = db.conn.QueryRow("SELECT COUNT(*) FROM file_chunks WHERE file_id = ?", file.ID).Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("Manual count of file_chunks for deleted file: %d", count)
if len(fileChunks) != 0 {
t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
// List the remaining chunks
for _, fc := range fileChunks {
t.Logf("Remaining chunk: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
}
}
}

View File

@@ -4,6 +4,8 @@ import (
"context" "context"
"database/sql" "database/sql"
"fmt" "fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
) )
type ChunkFileRepository struct { type ChunkFileRepository struct {
@@ -16,16 +18,16 @@ func NewChunkFileRepository(db *DB) *ChunkFileRepository {
func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkFile) error {
query := `
-INSERT INTO chunk_files (chunk_hash, file_path, file_offset, length)
+INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length)
VALUES (?, ?, ?, ?)
-ON CONFLICT(chunk_hash, file_path) DO NOTHING
+ON CONFLICT(chunk_hash, file_id) DO NOTHING
`
var err error
if tx != nil {
-_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
+_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
} else {
-_, err = r.db.ExecWithLock(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
+_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
}
if err != nil {
@@ -35,37 +37,28 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
return nil
}
-func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
+func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
query := `
-SELECT chunk_hash, file_path, file_offset, length
+SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE chunk_hash = ?
`
-rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
+rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
-var chunkFiles []*ChunkFile
-for rows.Next() {
-var cf ChunkFile
-err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
-if err != nil {
-return nil, fmt.Errorf("scanning chunk file: %w", err)
-}
-chunkFiles = append(chunkFiles, &cf)
-}
-return chunkFiles, rows.Err()
+return r.scanChunkFiles(rows)
}
func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
query := `
-SELECT chunk_hash, file_path, file_offset, length
+SELECT cf.chunk_hash, cf.file_id, cf.file_offset, cf.length
-FROM chunk_files
+FROM chunk_files cf
-WHERE file_path = ?
+JOIN files f ON cf.file_id = f.id
+WHERE f.path = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, filePath)
@@ -74,15 +67,138 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
}
// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE file_id = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
}
// scanChunkFiles is a helper that scans chunk file rows
func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
-err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
+var chunkHashStr, fileIDStr string
+err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
cf.ChunkHash = types.ChunkHash(chunkHashStr)
cf.FileID, err = types.ParseFileID(fileIDStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
chunkFiles = append(chunkFiles, &cf) chunkFiles = append(chunkFiles, &cf)
} }
return chunkFiles, rows.Err() return chunkFiles, rows.Err()
} }
// DeleteByFileID deletes all chunk_files entries for a given file ID
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
query := `DELETE FROM chunk_files WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
}
if err != nil {
return fmt.Errorf("deleting chunk files: %w", err)
}
return nil
}
// DeleteByFileIDs deletes all chunk_files for multiple files in a single statement.
func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting chunk_files: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple chunk_files in a single statement for efficiency.
func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
if len(cfs) == 0 {
return nil
}
// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
const batchSize = 200
for i := 0; i < len(cfs); i += batchSize {
end := i + batchSize
if end > len(cfs) {
end = len(cfs)
}
batch := cfs[i:end]
query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
args := make([]interface{}, 0, len(batch)*4)
for j, cf := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?)"
args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
}
query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting chunk_files: %w", err)
}
}
return nil
}
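The batching above splits IDs into groups so each statement stays under SQLite's bound-parameter limit (999 by default). A standalone sketch of that technique, with a hypothetical `batchInClause` helper that is illustrative only and not part of the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInClause builds one "DELETE ... IN (...)" statement per batch of ids,
// keeping each statement's placeholder count at or below batchSize.
func batchInClause(table, col string, ids []string, batchSize int) []string {
	var stmts []string
	for i := 0; i < len(ids); i += batchSize {
		end := i + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		n := end - i
		// "?" followed by ", ?" repeated n-1 times, e.g. "?, ?, ?" for n=3
		placeholders := "?" + strings.Repeat(", ?", n-1)
		stmts = append(stmts, fmt.Sprintf(
			"DELETE FROM %s WHERE %s IN (%s)", table, col, placeholders))
	}
	return stmts
}

func main() {
	stmts := batchInClause("chunk_files", "file_id", make([]string, 7), 3)
	fmt.Println(len(stmts)) // 3 batches: 3 + 3 + 1 ids
	fmt.Println(stmts[2])   // final batch carries a single placeholder
}
```

The arguments slice is built alongside the statement in the real code; the sketch only shows the statement-shaping half of the pattern.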


@@ -3,6 +3,9 @@ package database
import (
"context"
"testing"
+"time"
+"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkFileRepository(t *testing.T) {
@@ -11,24 +14,68 @@ func TestChunkFileRepository(t *testing.T) {
ctx := context.Background()
repo := NewChunkFileRepository(db)
fileRepo := NewFileRepository(db)
chunksRepo := NewChunkRepository(db)
// Create test files first
testTime := time.Now().Truncate(time.Second)
file1 := &File{
Path: "/file1.txt",
MTime: testTime,
CTime: testTime,
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
file2 := &File{
Path: "/file2.txt",
MTime: testTime,
CTime: testTime,
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err = fileRepo.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
// Create chunk first
chunk := &Chunk{
ChunkHash: types.ChunkHash("chunk1"),
Size: 1024,
}
err = chunksRepo.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
// Test Create
cf1 := &ChunkFile{
-ChunkHash: "chunk1",
-FilePath: "/file1.txt",
+ChunkHash: types.ChunkHash("chunk1"),
+FileID: file1.ID,
FileOffset: 0,
Length: 1024,
}
-err := repo.Create(ctx, nil, cf1)
+err = repo.Create(ctx, nil, cf1)
if err != nil {
t.Fatalf("failed to create chunk file: %v", err)
}
// Add same chunk in different file (deduplication scenario)
cf2 := &ChunkFile{
-ChunkHash: "chunk1",
-FilePath: "/file2.txt",
+ChunkHash: types.ChunkHash("chunk1"),
+FileID: file2.ID,
FileOffset: 2048,
Length: 1024,
}
@@ -50,10 +97,10 @@ func TestChunkFileRepository(t *testing.T) {
foundFile1 := false
foundFile2 := false
for _, cf := range chunkFiles {
-if cf.FilePath == "/file1.txt" && cf.FileOffset == 0 {
+if cf.FileID == file1.ID && cf.FileOffset == 0 {
foundFile1 = true
}
-if cf.FilePath == "/file2.txt" && cf.FileOffset == 2048 {
+if cf.FileID == file2.ID && cf.FileOffset == 2048 {
foundFile2 = true
}
}
@@ -61,15 +108,15 @@ func TestChunkFileRepository(t *testing.T) {
t.Error("not all expected files found")
}
-// Test GetByFilePath
+// Test GetByFileID
-chunkFiles, err = repo.GetByFilePath(ctx, "/file1.txt")
+chunkFiles, err = repo.GetByFileID(ctx, file1.ID)
if err != nil {
-t.Fatalf("failed to get chunks by file path: %v", err)
+t.Fatalf("failed to get chunks by file ID: %v", err)
}
if len(chunkFiles) != 1 {
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
}
-if chunkFiles[0].ChunkHash != "chunk1" {
+if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
}
@@ -86,6 +133,37 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
ctx := context.Background()
repo := NewChunkFileRepository(db)
fileRepo := NewFileRepository(db)
chunksRepo := NewChunkRepository(db)
// Create test files
testTime := time.Now().Truncate(time.Second)
file1 := &File{Path: "/file1.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
file2 := &File{Path: "/file2.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
file3 := &File{Path: "/file3.txt", MTime: testTime, CTime: testTime, Size: 2048, Mode: 0644, UID: 1000, GID: 1000}
if err := fileRepo.Create(ctx, nil, file1); err != nil {
t.Fatalf("failed to create file1: %v", err)
}
if err := fileRepo.Create(ctx, nil, file2); err != nil {
t.Fatalf("failed to create file2: %v", err)
}
if err := fileRepo.Create(ctx, nil, file3); err != nil {
t.Fatalf("failed to create file3: %v", err)
}
// Create chunks first
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err := chunksRepo.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
// Simulate a scenario where multiple files share chunks
// File1: chunk1, chunk2, chunk3
@@ -94,16 +172,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
chunkFiles := []ChunkFile{
// File1
-{ChunkHash: "chunk1", FilePath: "/file1.txt", FileOffset: 0, Length: 1024},
-{ChunkHash: "chunk2", FilePath: "/file1.txt", FileOffset: 1024, Length: 1024},
-{ChunkHash: "chunk3", FilePath: "/file1.txt", FileOffset: 2048, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
// File2
-{ChunkHash: "chunk2", FilePath: "/file2.txt", FileOffset: 0, Length: 1024},
-{ChunkHash: "chunk3", FilePath: "/file2.txt", FileOffset: 1024, Length: 1024},
-{ChunkHash: "chunk4", FilePath: "/file2.txt", FileOffset: 2048, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
// File3
-{ChunkHash: "chunk1", FilePath: "/file3.txt", FileOffset: 0, Length: 1024},
-{ChunkHash: "chunk4", FilePath: "/file3.txt", FileOffset: 1024, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
+{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
}
for _, cf := range chunkFiles {
@@ -132,11 +210,11 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
}
// Test file2 chunks
-chunks, err := repo.GetByFilePath(ctx, "/file2.txt")
+file2Chunks, err := repo.GetByFileID(ctx, file2.ID)
if err != nil {
t.Fatalf("failed to get chunks for file2: %v", err)
}
-if len(chunks) != 3 {
-t.Errorf("expected 3 chunks for file2, got %d", len(chunks))
+if len(file2Chunks) != 3 {
+t.Errorf("expected 3 chunks for file2, got %d", len(file2Chunks))
}
}


@@ -4,6 +4,8 @@ import (
"context"
"database/sql"
"fmt"
+"git.eeqj.de/sneak/vaultik/internal/log"
)
type ChunkRepository struct {
@@ -16,16 +18,16 @@ func NewChunkRepository(db *DB) *ChunkRepository {
func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk) error {
query := `
-INSERT INTO chunks (chunk_hash, sha256, size)
-VALUES (?, ?, ?)
+INSERT INTO chunks (chunk_hash, size)
+VALUES (?, ?)
ON CONFLICT(chunk_hash) DO NOTHING
`
var err error
if tx != nil {
-_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
+_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.Size)
} else {
-_, err = r.db.ExecWithLock(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
+_, err = r.db.ExecWithLog(ctx, query, chunk.ChunkHash, chunk.Size)
}
if err != nil {
@@ -37,7 +39,7 @@ func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk)
func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, error) {
query := `
-SELECT chunk_hash, sha256, size
+SELECT chunk_hash, size
FROM chunks
WHERE chunk_hash = ?
`
@@ -46,7 +48,6 @@ func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, e
err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
&chunk.ChunkHash,
-&chunk.SHA256,
&chunk.Size,
)
@@ -66,7 +67,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
}
query := `
-SELECT chunk_hash, sha256, size
+SELECT chunk_hash, size
FROM chunks
WHERE chunk_hash IN (`
@@ -92,7 +93,6 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
err := rows.Scan(
&chunk.ChunkHash,
-&chunk.SHA256,
&chunk.Size,
)
if err != nil {
@@ -107,7 +107,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk, error) {
query := `
-SELECT c.chunk_hash, c.sha256, c.size
+SELECT c.chunk_hash, c.size
FROM chunks c
LEFT JOIN blob_chunks bc ON c.chunk_hash = bc.chunk_hash
WHERE bc.chunk_hash IS NULL
@@ -127,7 +127,6 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
err := rows.Scan(
&chunk.ChunkHash,
-&chunk.SHA256,
&chunk.Size,
)
if err != nil {
@@ -139,3 +138,30 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
return chunks, rows.Err()
}
// DeleteOrphaned deletes chunks that are not referenced by any file or blob
func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM chunks
WHERE NOT EXISTS (
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)
AND NOT EXISTS (
SELECT 1 FROM blob_chunks
WHERE blob_chunks.chunk_hash = chunks.chunk_hash
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned chunks", "count", rowsAffected)
}
return nil
}
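The paired NOT EXISTS subqueries keep a chunk only while something still references it. The same predicate can be sketched over in-memory sets; the `orphaned` helper below is hypothetical and only illustrates the logic, not code from the repository:

```go
package main

import "fmt"

// orphaned reports whether a chunk hash is referenced by neither
// file_chunks nor blob_chunks, mirroring the double NOT EXISTS check.
func orphaned(hash string, fileRefs, blobRefs map[string]bool) bool {
	return !fileRefs[hash] && !blobRefs[hash]
}

func main() {
	fileRefs := map[string]bool{"c1": true} // chunks referenced by files
	blobRefs := map[string]bool{"c2": true} // chunks packed into blobs
	for _, h := range []string{"c1", "c2", "c3"} {
		fmt.Println(h, orphaned(h, fileRefs, blobRefs))
	}
}
```

A chunk survives if either table references it; only a hash absent from both (here "c3") is deleted.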


@@ -0,0 +1,37 @@
package database
import (
"context"
"fmt"
)
func (r *ChunkRepository) List(ctx context.Context) ([]*Chunk, error) {
query := `
SELECT chunk_hash, size
FROM chunks
ORDER BY chunk_hash
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying chunks: %w", err)
}
defer CloseRows(rows)
var chunks []*Chunk
for rows.Next() {
var chunk Chunk
err := rows.Scan(
&chunk.ChunkHash,
&chunk.Size,
)
if err != nil {
return nil, fmt.Errorf("scanning chunk: %w", err)
}
chunks = append(chunks, &chunk)
}
return chunks, rows.Err()
}


@@ -3,6 +3,8 @@ package database
import (
"context"
"testing"
+"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkRepository(t *testing.T) {
@@ -14,8 +16,7 @@ func TestChunkRepository(t *testing.T) {
// Test Create
chunk := &Chunk{
-ChunkHash: "chunkhash123",
-SHA256: "sha256hash123",
+ChunkHash: types.ChunkHash("chunkhash123"),
Size: 4096,
}
@@ -25,7 +26,7 @@ func TestChunkRepository(t *testing.T) {
}
// Test GetByHash
-retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
+retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
if err != nil {
t.Fatalf("failed to get chunk: %v", err)
}
@@ -35,9 +36,6 @@ func TestChunkRepository(t *testing.T) {
if retrieved.ChunkHash != chunk.ChunkHash {
t.Errorf("chunk hash mismatch: got %s, want %s", retrieved.ChunkHash, chunk.ChunkHash)
}
-if retrieved.SHA256 != chunk.SHA256 {
-t.Errorf("sha256 mismatch: got %s, want %s", retrieved.SHA256, chunk.SHA256)
-}
if retrieved.Size != chunk.Size {
t.Errorf("size mismatch: got %d, want %d", retrieved.Size, chunk.Size)
}
@@ -50,8 +48,7 @@ func TestChunkRepository(t *testing.T) {
// Test GetByHashes
chunk2 := &Chunk{
-ChunkHash: "chunkhash456",
-SHA256: "sha256hash456",
+ChunkHash: types.ChunkHash("chunkhash456"),
Size: 8192,
}
err = repo.Create(ctx, nil, chunk2)
@@ -59,7 +56,7 @@ func TestChunkRepository(t *testing.T) {
t.Fatalf("failed to create second chunk: %v", err)
}
-chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
+chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
if err != nil {
t.Fatalf("failed to get chunks by hashes: %v", err)
}


@@ -1,143 +1,239 @@
// Package database provides the local SQLite index for Vaultik backup operations.
// The database tracks files, chunks, and their associations with blobs.
//
// Blobs in Vaultik are the final storage units uploaded to S3. Each blob is a
// large (up to 10GB) file containing many compressed and encrypted chunks from
// multiple source files. Blobs are content-addressed, meaning their filename
// is derived from their SHA256 hash after compression and encryption.
//
// The database does not support migrations. If the schema changes, delete
// the local database and perform a full backup to recreate it.
package database
import (
"context"
"database/sql"
+_ "embed"
"fmt"
-"sync"
+"os"
+"strings"
+"git.eeqj.de/sneak/vaultik/internal/log"
_ "modernc.org/sqlite"
)
//go:embed schema.sql
var schemaSQL string
// DB represents the Vaultik local index database connection.
// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
// The database enables incremental backups by detecting changed files and
// supports deduplication by tracking which chunks are already stored in blobs.
// Write operations are synchronized through a mutex to ensure thread safety.
type DB struct {
conn *sql.DB
-writeLock sync.Mutex
+path string
}
// New creates a new database connection at the specified path.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
// opening a database with journal/WAL files present.
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
-conn, err := sql.Open("sqlite", path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=5000")
-if err != nil {
-return nil, fmt.Errorf("opening database: %w", err)
-}
-if err := conn.PingContext(ctx); err != nil {
-if closeErr := conn.Close(); closeErr != nil {
-Fatal("failed to close database connection: %v", closeErr)
-}
-return nil, fmt.Errorf("pinging database: %w", err)
-}
-db := &DB{conn: conn}
-if err := db.createSchema(ctx); err != nil {
-if closeErr := conn.Close(); closeErr != nil {
-Fatal("failed to close database connection: %v", closeErr)
-}
-return nil, fmt.Errorf("creating schema: %w", err)
-}
-return db, nil
-}
+log.Debug("Opening database connection", "path", path)
+// Note: We do NOT delete journal/WAL files before opening.
+// SQLite handles crash recovery automatically when the database is opened.
+// Deleting these files would corrupt the database after an unclean shutdown.
+// First attempt with standard WAL mode
+log.Debug("Attempting to open database with WAL mode", "path", path)
+conn, err := sql.Open(
+"sqlite",
+path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL&_foreign_keys=ON",
+)
+if err == nil {
+// Set connection pool settings
+// SQLite can handle multiple readers but only one writer at a time.
+// Setting MaxOpenConns to 1 ensures all writes are serialized through
+// a single connection, preventing SQLITE_BUSY errors.
+conn.SetMaxOpenConns(1)
+conn.SetMaxIdleConns(1)
+if err := conn.PingContext(ctx); err == nil {
+// Success on first try
+log.Debug("Database opened successfully with WAL mode", "path", path)
+// Enable foreign keys explicitly
+if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
+log.Warn("Failed to enable foreign keys", "error", err)
+}
+db := &DB{conn: conn, path: path}
+if err := db.createSchema(ctx); err != nil {
+_ = conn.Close()
+return nil, fmt.Errorf("creating schema: %w", err)
+}
+return db, nil
+}
+log.Debug("Failed to ping database, closing connection", "path", path, "error", err)
+_ = conn.Close()
+}
+// If first attempt failed, try with TRUNCATE mode to clear any locks
+log.Info(
+"Database appears locked, attempting recovery with TRUNCATE mode",
+"path", path,
+)
+conn, err = sql.Open(
+"sqlite",
+path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000&_foreign_keys=ON",
+)
+if err != nil {
+return nil, fmt.Errorf("opening database in recovery mode: %w", err)
+}
+// Set connection pool settings (same single-writer reasoning as above)
+conn.SetMaxOpenConns(1)
+conn.SetMaxIdleConns(1)
+if err := conn.PingContext(ctx); err != nil {
+log.Debug("Failed to ping database in recovery mode, closing", "path", path, "error", err)
+_ = conn.Close()
+return nil, fmt.Errorf(
+"database still locked after recovery attempt: %w",
+err,
+)
+}
+log.Debug("Database opened in TRUNCATE mode", "path", path)
+// Switch back to WAL mode
+log.Debug("Switching database back to WAL mode", "path", path)
+if _, err := conn.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
+log.Warn("Failed to switch back to WAL mode", "path", path, "error", err)
+}
+// Ensure foreign keys are enabled
+if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys=ON"); err != nil {
+log.Warn("Failed to enable foreign keys", "path", path, "error", err)
+}
+db := &DB{conn: conn, path: path}
+if err := db.createSchema(ctx); err != nil {
+_ = conn.Close()
+return nil, fmt.Errorf("creating schema: %w", err)
+}
+log.Debug("Database connection established successfully", "path", path)
+return db, nil
+}
// Close closes the database connection.
// It ensures all pending operations are completed before closing.
// Returns an error if the database connection cannot be closed properly.
func (db *DB) Close() error {
+log.Debug("Closing database connection", "path", db.path)
if err := db.conn.Close(); err != nil {
-Fatal("failed to close database: %v", err)
+log.Error("Failed to close database", "path", db.path, "error", err)
+return fmt.Errorf("failed to close database: %w", err)
}
+log.Debug("Database connection closed successfully", "path", db.path)
return nil
}
// Conn returns the underlying *sql.DB connection.
// This should be used sparingly and primarily for read operations.
// For write operations, prefer using the ExecWithLog method.
func (db *DB) Conn() *sql.DB {
return db.conn
}
-func (db *DB) BeginTx(ctx context.Context, opts *sql.TxOptions) (*sql.Tx, error) {
-return db.conn.BeginTx(ctx, opts)
-}
-// LockForWrite acquires the write lock
-func (db *DB) LockForWrite() {
-db.writeLock.Lock()
-}
-// UnlockWrite releases the write lock
-func (db *DB) UnlockWrite() {
-db.writeLock.Unlock()
-}
-// ExecWithLock executes a write query with the write lock held
-func (db *DB) ExecWithLock(ctx context.Context, query string, args ...interface{}) (sql.Result, error) {
-db.writeLock.Lock()
-defer db.writeLock.Unlock()
-return db.conn.ExecContext(ctx, query, args...)
-}
-// QueryRowWithLock executes a write query that returns a row with the write lock held
-func (db *DB) QueryRowWithLock(ctx context.Context, query string, args ...interface{}) *sql.Row {
-db.writeLock.Lock()
-defer db.writeLock.Unlock()
-return db.conn.QueryRowContext(ctx, query, args...)
-}
+// Path returns the path to the database file.
+func (db *DB) Path() string {
+return db.path
+}
+// BeginTx starts a new database transaction with the given options.
+// The caller is responsible for committing or rolling back the transaction.
+// For write transactions, consider using the Repositories.WithTx method instead,
+// which handles locking and rollback automatically.
+func (db *DB) BeginTx(
+ctx context.Context,
+opts *sql.TxOptions,
+) (*sql.Tx, error) {
+return db.conn.BeginTx(ctx, opts)
+}
+// Note: LockForWrite and UnlockWrite methods have been removed.
+// SQLite handles its own locking internally, so explicit locking is not needed.
+// ExecWithLog executes a write query with SQL logging.
+// SQLite handles its own locking internally, so we just pass through to ExecContext.
+// The query and args parameters follow the same format as sql.DB.ExecContext.
+func (db *DB) ExecWithLog(
+ctx context.Context,
+query string,
+args ...interface{},
+) (sql.Result, error) {
+LogSQL("Execute", query, args...)
+return db.conn.ExecContext(ctx, query, args...)
+}
+// QueryRowWithLog executes a query that returns at most one row with SQL logging.
+// This is useful for queries that modify data and return values (e.g., INSERT ... RETURNING).
+// SQLite handles its own locking internally.
+// The query and args parameters follow the same format as sql.DB.QueryRowContext.
+func (db *DB) QueryRowWithLog(
+ctx context.Context,
+query string,
+args ...interface{},
+) *sql.Row {
+LogSQL("QueryRow", query, args...)
+return db.conn.QueryRowContext(ctx, query, args...)
+}
func (db *DB) createSchema(ctx context.Context) error {
-schema := `
-CREATE TABLE IF NOT EXISTS files (
-path TEXT PRIMARY KEY,
-mtime INTEGER NOT NULL,
-ctime INTEGER NOT NULL,
-size INTEGER NOT NULL,
-mode INTEGER NOT NULL,
-uid INTEGER NOT NULL,
-gid INTEGER NOT NULL,
-link_target TEXT
-);
-CREATE TABLE IF NOT EXISTS file_chunks (
-path TEXT NOT NULL,
-idx INTEGER NOT NULL,
-chunk_hash TEXT NOT NULL,
-PRIMARY KEY (path, idx)
-);
-CREATE TABLE IF NOT EXISTS chunks (
-chunk_hash TEXT PRIMARY KEY,
-sha256 TEXT NOT NULL,
-size INTEGER NOT NULL
-);
-CREATE TABLE IF NOT EXISTS blobs (
-blob_hash TEXT PRIMARY KEY,
-created_ts INTEGER NOT NULL
-);
-CREATE TABLE IF NOT EXISTS blob_chunks (
-blob_hash TEXT NOT NULL,
-chunk_hash TEXT NOT NULL,
-offset INTEGER NOT NULL,
-length INTEGER NOT NULL,
-PRIMARY KEY (blob_hash, chunk_hash)
-);
-CREATE TABLE IF NOT EXISTS chunk_files (
-chunk_hash TEXT NOT NULL,
-file_path TEXT NOT NULL,
-file_offset INTEGER NOT NULL,
-length INTEGER NOT NULL,
-PRIMARY KEY (chunk_hash, file_path)
-);
-CREATE TABLE IF NOT EXISTS snapshots (
-id TEXT PRIMARY KEY,
-hostname TEXT NOT NULL,
-vaultik_version TEXT NOT NULL,
-created_ts INTEGER NOT NULL,
-file_count INTEGER NOT NULL,
-chunk_count INTEGER NOT NULL,
-blob_count INTEGER NOT NULL,
-total_size INTEGER NOT NULL,
-blob_size INTEGER NOT NULL,
-compression_ratio REAL NOT NULL
-);
-`
-_, err := db.conn.ExecContext(ctx, schema)
+_, err := db.conn.ExecContext(ctx, schemaSQL)
return err
}
// NewTestDB creates an in-memory SQLite database for testing purposes.
// The database is automatically initialized with the schema and is ready for use.
// Each call creates a new independent database instance.
func NewTestDB() (*DB, error) {
return New(context.Background(), ":memory:")
}
// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
// For example, repeatPlaceholder(2) returns ", ?, ?".
func repeatPlaceholder(n int) string {
if n <= 0 {
return ""
}
return strings.Repeat(", ?", n)
}
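As a quick illustration of how `repeatPlaceholder` composes into an IN clause (the hash values below are made up for the example):

```go
package main

import (
	"fmt"
	"strings"
)

// repeatPlaceholder mirrors the helper above: ", ?" repeated n times,
// empty for n <= 0.
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}

func main() {
	hashes := []string{"aaa", "bbb", "ccc"}
	// One leading "?" plus len-1 repetitions gives one placeholder per value.
	query := "SELECT chunk_hash, size FROM chunks WHERE chunk_hash IN (?" +
		repeatPlaceholder(len(hashes)-1) + ")"
	fmt.Println(query)
}
```

This prints `SELECT chunk_hash, size FROM chunks WHERE chunk_hash IN (?, ?, ?)`, which is the shape `GetByHashes` and `DeleteByFileIDs` build at runtime.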
// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.
//
// The operation parameter describes the type of SQL operation (e.g., "Execute", "Query").
// The query parameter is the SQL statement being executed.
// The args parameter contains the query arguments that will be interpolated.
func LogSQL(operation, query string, args ...interface{}) {
if strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
log.Debug(
"SQL "+operation,
"query",
strings.TrimSpace(query),
"args",
fmt.Sprintf("%v", args),
)
}
}


@@ -67,21 +67,26 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
}()
// Test concurrent writes
-done := make(chan bool, 10)
+type result struct {
+index int
+err error
+}
+results := make(chan result, 10)
for i := 0; i < 10; i++ {
go func(i int) {
-_, err := db.ExecWithLock(ctx, "INSERT INTO chunks (chunk_hash, sha256, size) VALUES (?, ?, ?)",
-fmt.Sprintf("hash%d", i), fmt.Sprintf("sha%d", i), i*1024)
-if err != nil {
-t.Errorf("concurrent insert failed: %v", err)
-}
-done <- true
+_, err := db.ExecWithLog(ctx, "INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)",
+fmt.Sprintf("hash%d", i), i*1024)
+results <- result{index: i, err: err}
}(i)
}
-// Wait for all goroutines
+// Wait for all goroutines and check results
for i := 0; i < 10; i++ {
-<-done
+r := <-results
+if r.err != nil {
+t.Fatalf("concurrent insert %d failed: %v", r.index, r.err)
+}
}
// Verify all inserts succeeded


@@ -4,6 +4,8 @@ import (
"context"
"database/sql"
"fmt"
+"git.eeqj.de/sneak/vaultik/internal/types"
)
type FileChunkRepository struct {
@@ -16,16 +18,16 @@ func NewFileChunkRepository(db *DB) *FileChunkRepository {
func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
query := `
-INSERT INTO file_chunks (path, idx, chunk_hash)
+INSERT INTO file_chunks (file_id, idx, chunk_hash)
VALUES (?, ?, ?)
-ON CONFLICT(path, idx) DO NOTHING
+ON CONFLICT(file_id, idx) DO NOTHING
`
var err error
if tx != nil {
-_, err = tx.ExecContext(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
+_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
} else {
-_, err = r.db.ExecWithLock(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
+_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
}
if err != nil {
@@ -37,10 +39,11 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh
func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
query := `
-SELECT path, idx, chunk_hash
-FROM file_chunks
-WHERE path = ?
-ORDER BY idx
+SELECT fc.file_id, fc.idx, fc.chunk_hash
+FROM file_chunks fc
+JOIN files f ON fc.file_id = f.id
+WHERE f.path = ?
+ORDER BY fc.idx
`
rows, err := r.db.conn.QueryContext(ctx, query, path)
@@ -49,13 +52,64 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
}
defer CloseRows(rows)
return r.scanFileChunks(rows)
}
// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
query := `
SELECT file_id, idx, chunk_hash
FROM file_chunks
WHERE file_id = ?
ORDER BY idx
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
if err != nil {
return nil, fmt.Errorf("querying file chunks: %w", err)
}
defer CloseRows(rows)
return r.scanFileChunks(rows)
}
// GetByPathTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
query := `
SELECT fc.file_id, fc.idx, fc.chunk_hash
FROM file_chunks fc
JOIN files f ON fc.file_id = f.id
WHERE f.path = ?
ORDER BY fc.idx
`
LogSQL("GetByPathTx", query, path)
rows, err := tx.QueryContext(ctx, query, path)
if err != nil {
return nil, fmt.Errorf("querying file chunks: %w", err)
}
defer CloseRows(rows)
fileChunks, err := r.scanFileChunks(rows)
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
return fileChunks, err
}
// scanFileChunks is a helper that scans file chunk rows
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
 	var fileChunks []*FileChunk
 	for rows.Next() {
 		var fc FileChunk
-		err := rows.Scan(&fc.Path, &fc.Idx, &fc.ChunkHash)
+		var fileIDStr, chunkHashStr string
+		err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
 		if err != nil {
 			return nil, fmt.Errorf("scanning file chunk: %w", err)
 		}
+		fc.FileID, err = types.ParseFileID(fileIDStr)
+		if err != nil {
+			return nil, fmt.Errorf("parsing file ID: %w", err)
+		}
+		fc.ChunkHash = types.ChunkHash(chunkHashStr)
 		fileChunks = append(fileChunks, &fc)
 	}
@@ -63,13 +117,13 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
 }

 func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
-	query := `DELETE FROM file_chunks WHERE path = ?`
+	query := `DELETE FROM file_chunks WHERE file_id = (SELECT id FROM files WHERE path = ?)`
 	var err error
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, path)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, path)
+		_, err = r.db.ExecWithLog(ctx, query, path)
 	}
 	if err != nil {
@@ -78,3 +132,117 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path
 	return nil
 }
// DeleteByFileID deletes all chunks for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
query := `DELETE FROM file_chunks WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
}
if err != nil {
return fmt.Errorf("deleting file chunks: %w", err)
}
return nil
}
// DeleteByFileIDs deletes all chunks for multiple files in a single statement.
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting file_chunks: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
// Batches are automatically split to stay within SQLite's variable limit.
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
if len(fcs) == 0 {
return nil
}
// SQLite has a limit on variables (typically 999 or 32766).
// Each FileChunk has 3 values, so batch at 300 to be safe.
const batchSize = 300
for i := 0; i < len(fcs); i += batchSize {
end := i + batchSize
if end > len(fcs) {
end = len(fcs)
}
batch := fcs[i:end]
// Build the query with multiple value sets
query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
args := make([]interface{}, 0, len(batch)*3)
for j, fc := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?)"
args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
}
query += " ON CONFLICT(file_id, idx) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting file_chunks: %w", err)
}
}
return nil
}
// GetByFile is an alias for GetByPath for compatibility
func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
LogSQL("GetByFile", "Starting", path)
result, err := r.GetByPath(ctx, path)
LogSQL("GetByFile", "Complete", path, "count", len(result))
return result, err
}
// GetByFileTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByFileTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
LogSQL("GetByFileTx", "Starting", path)
result, err := r.GetByPathTx(ctx, tx, path)
LogSQL("GetByFileTx", "Complete", path, "count", len(result))
return result, err
}


@@ -4,6 +4,9 @@ import (
 	"context"
 	"fmt"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestFileChunkRepository(t *testing.T) {
@@ -12,24 +15,56 @@ func TestFileChunkRepository(t *testing.T) {
 	ctx := context.Background()
 	repo := NewFileChunkRepository(db)
fileRepo := NewFileRepository(db)
// Create test file first
testTime := time.Now().Truncate(time.Second)
file := &File{
Path: "/test/file.txt",
MTime: testTime,
CTime: testTime,
Size: 3072,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Create chunks first
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunkRepo := NewChunkRepository(db)
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err = chunkRepo.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
 	// Test Create
 	fc1 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       0,
-		ChunkHash: "chunk1",
+		ChunkHash: types.ChunkHash("chunk1"),
 	}
-	err := repo.Create(ctx, nil, fc1)
+	err = repo.Create(ctx, nil, fc1)
 	if err != nil {
 		t.Fatalf("failed to create file chunk: %v", err)
 	}

 	// Add more chunks for the same file
 	fc2 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       1,
-		ChunkHash: "chunk2",
+		ChunkHash: types.ChunkHash("chunk2"),
 	}
 	err = repo.Create(ctx, nil, fc2)
 	if err != nil {
@@ -37,26 +72,26 @@ func TestFileChunkRepository(t *testing.T) {
 	}

 	fc3 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       2,
-		ChunkHash: "chunk3",
+		ChunkHash: types.ChunkHash("chunk3"),
 	}
 	err = repo.Create(ctx, nil, fc3)
 	if err != nil {
 		t.Fatalf("failed to create third file chunk: %v", err)
 	}

-	// Test GetByPath
-	chunks, err := repo.GetByPath(ctx, "/test/file.txt")
+	// Test GetByFile
+	fileChunks, err := repo.GetByFile(ctx, "/test/file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks, got %d", len(chunks))
+	if len(fileChunks) != 3 {
+		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
 	}

 	// Verify order
-	for i, chunk := range chunks {
+	for i, chunk := range fileChunks {
 		if chunk.Idx != i {
 			t.Errorf("wrong chunk order: expected idx %d, got %d", i, chunk.Idx)
 		}
@@ -68,18 +103,18 @@ func TestFileChunkRepository(t *testing.T) {
 		t.Fatalf("failed to create duplicate file chunk: %v", err)
 	}

-	// Test DeleteByPath
-	err = repo.DeleteByPath(ctx, nil, "/test/file.txt")
+	// Test DeleteByFileID
+	err = repo.DeleteByFileID(ctx, nil, file.ID)
 	if err != nil {
 		t.Fatalf("failed to delete file chunks: %v", err)
 	}

-	chunks, err = repo.GetByPath(ctx, "/test/file.txt")
+	fileChunks, err = repo.GetByFileID(ctx, file.ID)
 	if err != nil {
 		t.Fatalf("failed to get deleted file chunks: %v", err)
 	}
-	if len(chunks) != 0 {
-		t.Errorf("expected 0 chunks after delete, got %d", len(chunks))
+	if len(fileChunks) != 0 {
+		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
 	}
 }
@@ -89,15 +124,54 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
 	ctx := context.Background()
 	repo := NewFileChunkRepository(db)
fileRepo := NewFileRepository(db)
// Create test files
testTime := time.Now().Truncate(time.Second)
filePaths := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
files := make([]*File, len(filePaths))
for i, path := range filePaths {
file := &File{
Path: types.FilePath(path),
MTime: testTime,
CTime: testTime,
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file %s: %v", path, err)
}
files[i] = file
}
// Create all chunks first
chunkRepo := NewChunkRepository(db)
for i := range files {
for j := 0; j < 2; j++ {
chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err := chunkRepo.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
}
 	// Create chunks for multiple files
-	files := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
-	for _, path := range files {
-		for i := 0; i < 2; i++ {
+	for i, file := range files {
+		for j := 0; j < 2; j++ {
 			fc := &FileChunk{
-				Path:      path,
-				Idx:       i,
-				ChunkHash: fmt.Sprintf("%s_chunk%d", path, i),
+				FileID:    file.ID,
+				Idx:       j,
+				ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
 			}
 			err := repo.Create(ctx, nil, fc)
 			if err != nil {
@@ -107,13 +181,13 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
 	}

 	// Verify each file has correct chunks
-	for _, path := range files {
-		chunks, err := repo.GetByPath(ctx, path)
+	for i, file := range files {
+		chunks, err := repo.GetByFileID(ctx, file.ID)
 		if err != nil {
-			t.Fatalf("failed to get chunks for %s: %v", path, err)
+			t.Fatalf("failed to get chunks for file %d: %v", i, err)
 		}
 		if len(chunks) != 2 {
-			t.Errorf("expected 2 chunks for %s, got %d", path, len(chunks))
+			t.Errorf("expected 2 chunks for file %d, got %d", i, len(chunks))
 		}
 	}
 }


@@ -5,6 +5,9 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 type FileRepository struct {
@@ -16,10 +19,16 @@ func NewFileRepository(db *DB) *FileRepository {
 }

 func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
+	// Generate UUID if not provided
+	if file.ID.IsZero() {
+		file.ID = types.NewFileID()
+	}
 	query := `
-		INSERT INTO files (path, mtime, ctime, size, mode, uid, gid, link_target)
-		VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+		INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
+		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
 		ON CONFLICT(path) DO UPDATE SET
+			source_path = excluded.source_path,
 			mtime = excluded.mtime,
 			ctime = excluded.ctime,
 			size = excluded.size,
@@ -27,43 +36,78 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
 			uid = excluded.uid,
 			gid = excluded.gid,
 			link_target = excluded.link_target
+		RETURNING id
 	`
+	var idStr string
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
+		LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
+		err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
+		err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
 	}
 	if err != nil {
 		return fmt.Errorf("inserting file: %w", err)
 	}
+	// Parse the returned ID
+	file.ID, err = types.ParseFileID(idStr)
+	if err != nil {
+		return fmt.Errorf("parsing file ID: %w", err)
+	}
 	return nil
 }
 func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
 	query := `
-		SELECT path, mtime, ctime, size, mode, uid, gid, link_target
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
 		FROM files
 		WHERE path = ?
 	`
-	var file File
-	var mtimeUnix, ctimeUnix int64
-	var linkTarget sql.NullString
-	err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
-		&file.Path,
-		&mtimeUnix,
-		&ctimeUnix,
-		&file.Size,
-		&file.Mode,
-		&file.UID,
-		&file.GID,
-		&linkTarget,
-	)
+	file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, fmt.Errorf("querying file: %w", err)
+	}
+	return file, nil
+}
+
+// GetByID retrieves a file by its UUID
+func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		WHERE id = ?
+	`
+	file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, fmt.Errorf("querying file: %w", err)
+	}
+	return file, nil
+}
+
+func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		WHERE path = ?
+	`
+	LogSQL("GetByPathTx QueryRowContext", query, path)
+	file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
+	LogSQL("GetByPathTx Scan complete", query, path)
 	if err == sql.ErrNoRows {
 		return nil, nil
@@ -72,10 +116,80 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
 		return nil, fmt.Errorf("querying file: %w", err)
 	}
-	file.MTime = time.Unix(mtimeUnix, 0)
-	file.CTime = time.Unix(ctimeUnix, 0)
+	return file, nil
+}
+
+// scanFile is a helper that scans a single file row
+func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
+	var file File
+	var idStr, pathStr, sourcePathStr string
+	var mtimeUnix, ctimeUnix int64
+	var linkTarget sql.NullString
+	err := row.Scan(
+		&idStr,
+		&pathStr,
+		&sourcePathStr,
+		&mtimeUnix,
+		&ctimeUnix,
+		&file.Size,
+		&file.Mode,
+		&file.UID,
+		&file.GID,
+		&linkTarget,
+	)
+	if err != nil {
+		return nil, err
+	}
+	file.ID, err = types.ParseFileID(idStr)
+	if err != nil {
+		return nil, fmt.Errorf("parsing file ID: %w", err)
+	}
+	file.Path = types.FilePath(pathStr)
+	file.SourcePath = types.SourcePath(sourcePathStr)
+	file.MTime = time.Unix(mtimeUnix, 0).UTC()
+	file.CTime = time.Unix(ctimeUnix, 0).UTC()
 	if linkTarget.Valid {
-		file.LinkTarget = linkTarget.String
+		file.LinkTarget = types.FilePath(linkTarget.String)
+	}
+	return &file, nil
+}
// scanFileRows is a helper that scans a file row from rows iterator
func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
var file File
var idStr, pathStr, sourcePathStr string
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&idStr,
&pathStr,
&sourcePathStr,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, err
}
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
file.Path = types.FilePath(pathStr)
file.SourcePath = types.SourcePath(sourcePathStr)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = types.FilePath(linkTarget.String)
 	}
 	return &file, nil
@@ -83,7 +197,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
 func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
 	query := `
-		SELECT path, mtime, ctime, size, mode, uid, gid, link_target
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
 		FROM files
 		WHERE mtime >= ?
 		ORDER BY path
@@ -97,31 +211,11 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
 	var files []*File
 	for rows.Next() {
-		var file File
-		var mtimeUnix, ctimeUnix int64
-		var linkTarget sql.NullString
-		err := rows.Scan(
-			&file.Path,
-			&mtimeUnix,
-			&ctimeUnix,
-			&file.Size,
-			&file.Mode,
-			&file.UID,
-			&file.GID,
-			&linkTarget,
-		)
+		file, err := r.scanFileRows(rows)
 		if err != nil {
 			return nil, fmt.Errorf("scanning file: %w", err)
 		}
-		file.MTime = time.Unix(mtimeUnix, 0)
-		file.CTime = time.Unix(ctimeUnix, 0)
-		if linkTarget.Valid {
-			file.LinkTarget = linkTarget.String
-		}
-		files = append(files, &file)
+		files = append(files, file)
 	}

 	return files, rows.Err()
@@ -134,7 +228,7 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, path)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, path)
+		_, err = r.db.ExecWithLog(ctx, query, path)
 	}
 	if err != nil {
@@ -143,3 +237,146 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
 	return nil
 }
// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
query := `DELETE FROM files WHERE id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, id.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, id.String())
}
if err != nil {
return fmt.Errorf("deleting file: %w", err)
}
return nil
}
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path LIKE ? || '%'
ORDER BY path
`
rows, err := r.db.conn.QueryContext(ctx, query, prefix)
if err != nil {
return nil, fmt.Errorf("querying files: %w", err)
}
defer CloseRows(rows)
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
}
return files, rows.Err()
}
// ListAll returns all files in the database
func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
ORDER BY path
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying files: %w", err)
}
defer CloseRows(rows)
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
}
return files, rows.Err()
}
// CreateBatch inserts or updates multiple files in a single statement for efficiency.
// File IDs must be pre-generated before calling this method.
func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
if len(files) == 0 {
return nil
}
// Each File has 10 values, so batch at 100 to be safe with SQLite's variable limit
const batchSize = 100
for i := 0; i < len(files); i += batchSize {
end := i + batchSize
if end > len(files) {
end = len(files)
}
batch := files[i:end]
query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
args := make([]interface{}, 0, len(batch)*10)
for j, f := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
}
query += ` ON CONFLICT(path) DO UPDATE SET
source_path = excluded.source_path,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
mode = excluded.mode,
uid = excluded.uid,
gid = excluded.gid,
link_target = excluded.link_target`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting files: %w", err)
}
}
return nil
}
// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM files
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_files
WHERE snapshot_files.file_id = files.id
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned files", "count", rowsAffected)
}
return nil
}


@@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test GetByPath
-	retrieved, err := repo.GetByPath(ctx, file.Path)
+	retrieved, err := repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get file: %v", err)
 	}
@@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
 		t.Fatalf("failed to update file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get updated file: %v", err)
 	}
@@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test Delete
-	err = repo.Delete(ctx, nil, file.Path)
+	err = repo.Delete(ctx, nil, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to delete file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("error getting deleted file: %v", err)
 	}
@@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
 		t.Fatalf("failed to create symlink: %v", err)
 	}

-	retrieved, err := repo.GetByPath(ctx, symlink.Path)
+	retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get symlink: %v", err)
 	}


@@ -1,70 +1,125 @@
// Package database provides data models and repository interfaces for the Vaultik backup system.
// It includes types for files, chunks, blobs, snapshots, and their relationships.
 package database

-import "time"
+import (
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
+)

-// File represents a file record in the database
+// File represents a file or directory in the backup system.
+// It stores metadata about files including timestamps, permissions, ownership,
+// and symlink targets. This information is used to restore files with their
+// original attributes.
 type File struct {
-	Path       string
+	ID         types.FileID     // UUID primary key
+	Path       types.FilePath   // Absolute path of the file
+	SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
 	MTime      time.Time
 	CTime      time.Time
 	Size       int64
 	Mode       uint32
 	UID        uint32
 	GID        uint32
-	LinkTarget string // empty for regular files, target path for symlinks
+	LinkTarget types.FilePath // empty for regular files, target path for symlinks
 }

-// IsSymlink returns true if this file is a symbolic link
+// IsSymlink returns true if this file is a symbolic link.
+// A file is considered a symlink if it has a non-empty LinkTarget.
 func (f *File) IsSymlink() bool {
 	return f.LinkTarget != ""
 }

-// FileChunk represents the mapping between files and chunks
+// FileChunk represents the mapping between files and their constituent chunks.
+// Large files are split into multiple chunks for efficient deduplication and storage.
+// The Idx field maintains the order of chunks within a file.
 type FileChunk struct {
-	Path      string
+	FileID    types.FileID
 	Idx       int
-	ChunkHash string
+	ChunkHash types.ChunkHash
 }

-// Chunk represents a chunk record in the database
+// Chunk represents a data chunk in the deduplication system.
+// Files are split into chunks which are content-addressed by their hash.
+// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
 type Chunk struct {
-	ChunkHash string
-	SHA256    string
+	ChunkHash types.ChunkHash
 	Size      int64
 }

-// Blob represents a blob record in the database
+// Blob represents a blob record in the database.
+// A blob is Vaultik's final storage unit - a large file (up to 10GB) containing
+// many compressed and encrypted chunks from multiple source files.
+// Blobs are content-addressed, meaning their filename in S3 is derived from
+// the SHA256 hash of their compressed and encrypted content.
+// The blob creation process is: chunks are accumulated -> compressed with zstd
+// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
 type Blob struct {
-	BlobHash  string
-	CreatedTS time.Time
+	ID               types.BlobID   // UUID assigned when blob creation starts
+	Hash             types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
+	CreatedTS        time.Time      // When blob creation started
+	FinishedTS       *time.Time     // When blob was finalized (nil if still packing)
+	UncompressedSize int64          // Total size of raw chunks before compression
+	CompressedSize   int64          // Size after compression and encryption
+	UploadedTS       *time.Time     // When blob was uploaded to S3 (nil if not uploaded)
 }

-// BlobChunk represents the mapping between blobs and chunks
+// BlobChunk represents the mapping between blobs and the chunks they contain.
+// This allows tracking which chunks are stored in which blobs, along with
+// their position and size within the blob. The offset and length fields
+// enable extracting specific chunks from a blob without processing the entire blob.
 type BlobChunk struct {
-	BlobHash  string
-	ChunkHash string
+	BlobID    types.BlobID
+	ChunkHash types.ChunkHash
 	Offset    int64
 	Length    int64
 }

-// ChunkFile represents the reverse mapping of chunks to files
+// ChunkFile represents the reverse mapping showing which files contain a specific chunk.
+// This is used during deduplication to identify all files that share a chunk,
+// which is important for garbage collection and integrity verification.
 type ChunkFile struct {
-	ChunkHash  string
-	FilePath   string
+	ChunkHash  types.ChunkHash
+	FileID     types.FileID
 	FileOffset int64
 	Length     int64
 }

 // Snapshot represents a snapshot record in the database
 type Snapshot struct {
-	ID               string
-	Hostname         string
-	VaultikVersion   string
-	CreatedTS        time.Time
+	ID                   types.SnapshotID
+	Hostname             types.Hostname
+	VaultikVersion       types.Version
+	VaultikGitRevision   types.GitRevision
+	StartedAt            time.Time
+	CompletedAt          *time.Time // nil if still in progress
 	FileCount        int64
 	ChunkCount       int64
 	BlobCount        int64
 	TotalSize        int64   // Total size of all referenced files
 	BlobSize         int64   // Total size of all referenced blobs (compressed and encrypted)
-	CompressionRatio float64 // Compression ratio (BlobSize / TotalSize)
+	BlobUncompressedSize int64   // Total uncompressed size of all referenced blobs
+	CompressionRatio     float64 // Compression ratio (BlobSize / BlobUncompressedSize)
+	CompressionLevel     int     // Compression level used for this snapshot
+	UploadBytes          int64   // Total bytes uploaded during this snapshot
+	UploadDurationMs     int64   // Total milliseconds spent uploading to S3
 }

+// IsComplete returns true if the snapshot has completed
+func (s *Snapshot) IsComplete() bool {
+	return s.CompletedAt != nil
+}
+
+// SnapshotFile represents the mapping between snapshots and files
+type SnapshotFile struct {
+	SnapshotID types.SnapshotID
+	FileID     types.FileID
+}
+
+// SnapshotBlob represents the mapping between snapshots and blobs
+type SnapshotBlob struct {
+	SnapshotID types.SnapshotID
+	BlobID     types.BlobID
+	BlobHash   types.BlobHash // Denormalized for easier manifest generation
+}


@@ -7,6 +7,7 @@ import (
 	"path/filepath"

 	"git.eeqj.de/sneak/vaultik/internal/config"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"go.uber.org/fx"
 )
@@ -32,7 +33,13 @@ func provideDatabase(lc fx.Lifecycle, cfg *config.Config) (*DB, error) {
 	lc.Append(fx.Hook{
 		OnStop: func(ctx context.Context) error {
-			return db.Close()
+			log.Debug("Database module OnStop hook called")
+			if err := db.Close(); err != nil {
+				log.Error("Failed to close database in OnStop hook", "error", err)
+				return err
+			}
+			log.Debug("Database closed successfully in OnStop hook")
+			return nil
 		},
 	})


@@ -6,6 +6,9 @@ import (
 	"fmt"
 )
// Repositories provides access to all database repositories.
// It serves as a centralized access point for all database operations
// and manages transaction coordination across repositories.
 type Repositories struct {
 	db         *DB
 	Files      *FileRepository
@@ -15,8 +18,11 @@ type Repositories struct {
 	BlobChunks *BlobChunkRepository
 	ChunkFiles *ChunkFileRepository
 	Snapshots  *SnapshotRepository
+	Uploads    *UploadRepository
 }
// NewRepositories creates a new Repositories instance with all repository types.
// Each repository shares the same database connection for coordinated transactions.
func NewRepositories(db *DB) *Repositories { func NewRepositories(db *DB) *Repositories {
return &Repositories{ return &Repositories{
db: db, db: db,
@@ -27,20 +33,26 @@ func NewRepositories(db *DB) *Repositories {
BlobChunks: NewBlobChunkRepository(db), BlobChunks: NewBlobChunkRepository(db),
ChunkFiles: NewChunkFileRepository(db), ChunkFiles: NewChunkFileRepository(db),
Snapshots: NewSnapshotRepository(db), Snapshots: NewSnapshotRepository(db),
Uploads: NewUploadRepository(db.conn),
} }
} }
// TxFunc is a function that executes within a database transaction.
// The transaction is automatically committed if the function returns nil,
// or rolled back if it returns an error.
type TxFunc func(ctx context.Context, tx *sql.Tx) error type TxFunc func(ctx context.Context, tx *sql.Tx) error
// WithTx executes a function within a write transaction.
// SQLite handles its own locking internally, so no explicit locking is needed.
// The transaction is automatically committed on success or rolled back on error.
// This method should be used for all write operations to ensure atomicity.
func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error { func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
// Acquire write lock for the entire transaction LogSQL("WithTx", "Beginning transaction", "")
r.db.LockForWrite()
defer r.db.UnlockWrite()
tx, err := r.db.BeginTx(ctx, nil) tx, err := r.db.BeginTx(ctx, nil)
if err != nil { if err != nil {
return fmt.Errorf("beginning transaction: %w", err) return fmt.Errorf("beginning transaction: %w", err)
} }
LogSQL("WithTx", "Transaction started", "")
defer func() { defer func() {
if p := recover(); p != nil { if p := recover(); p != nil {
@@ -63,6 +75,15 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
return tx.Commit() return tx.Commit()
} }
// DB returns the underlying database for direct queries
func (r *Repositories) DB() *DB {
return r.db
}
// WithReadTx executes a function within a read-only transaction.
// Read transactions can run concurrently with other read transactions
// but will be blocked by write transactions. The transaction is
// automatically committed on success or rolled back on error.
func (r *Repositories) WithReadTx(ctx context.Context, fn TxFunc) error { func (r *Repositories) WithReadTx(ctx context.Context, fn TxFunc) error {
opts := &sql.TxOptions{ opts := &sql.TxOptions{
ReadOnly: true, ReadOnly: true,


@@ -6,6 +6,8 @@ import (
 	"fmt"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
@@ -33,8 +35,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Create chunks
 	chunk1 := &Chunk{
-		ChunkHash: "tx_chunk1",
-		SHA256:    "tx_sha1",
+		ChunkHash: types.ChunkHash("tx_chunk1"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@@ -42,8 +43,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}
 	chunk2 := &Chunk{
-		ChunkHash: "tx_chunk2",
-		SHA256:    "tx_sha2",
+		ChunkHash: types.ChunkHash("tx_chunk2"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@@ -52,7 +52,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Map chunks to file
 	fc1 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       0,
 		ChunkHash: chunk1.ChunkHash,
 	}
@@ -61,7 +61,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}
 	fc2 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       1,
 		ChunkHash: chunk2.ChunkHash,
 	}
@@ -71,7 +71,8 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Create blob
 	blob := &Blob{
-		BlobHash:  "tx_blob1",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("tx_blob1"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@@ -80,7 +81,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Map chunks to blob
 	bc1 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk1.ChunkHash,
 		Offset:    0,
 		Length:    512,
@@ -90,7 +91,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}
 	bc2 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk2.ChunkHash,
 		Offset:    512,
 		Length:    512,
@@ -115,7 +116,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 		t.Error("expected file after transaction")
 	}
-	chunks, err := repos.FileChunks.GetByPath(ctx, "/test/tx_file.txt")
+	chunks, err := repos.FileChunks.GetByFile(ctx, "/test/tx_file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
@@ -157,8 +158,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {
 	// Create a chunk
 	chunk := &Chunk{
-		ChunkHash: "rollback_chunk",
-		SHA256:    "rollback_sha",
+		ChunkHash: types.ChunkHash("rollback_chunk"),
 		Size:      1024,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {
@@ -217,7 +217,7 @@ func TestRepositoriesReadTransaction(t *testing.T) {
 	var retrievedFile *File
 	err = repos.WithReadTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
 		var err error
-		retrievedFile, err = repos.Files.GetByPath(ctx, "/test/read_file.txt")
+		retrievedFile, err = repos.Files.GetByPathTx(ctx, tx, "/test/read_file.txt")
 		if err != nil {
 			return err
 		}


@@ -0,0 +1,874 @@
package database
import (
"context"
"database/sql"
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
func TestFileRepositoryUUIDGeneration(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repo := NewFileRepository(db)
// Create multiple files
files := []*File{
{
Path: "/file1.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
},
{
Path: "/file2.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
},
}
uuids := make(map[string]bool)
for _, file := range files {
err := repo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Check UUID was generated
if file.ID.IsZero() {
t.Error("file ID was not generated")
}
// Check UUID is unique
if uuids[file.ID.String()] {
t.Errorf("duplicate UUID generated: %s", file.ID)
}
uuids[file.ID.String()] = true
}
}
// TestFileRepositoryGetByID tests retrieving files by UUID
func TestFileRepositoryGetByID(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repo := NewFileRepository(db)
// Create a file
file := &File{
Path: "/test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Retrieve by ID
retrieved, err := repo.GetByID(ctx, file.ID)
if err != nil {
t.Fatalf("failed to get file by ID: %v", err)
}
if retrieved.ID != file.ID {
t.Errorf("ID mismatch: expected %s, got %s", file.ID, retrieved.ID)
}
if retrieved.Path != file.Path {
t.Errorf("Path mismatch: expected %s, got %s", file.Path, retrieved.Path)
}
// Test non-existent ID
nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
nonExistent, err := repo.GetByID(ctx, nonExistentID)
if err != nil {
t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
}
if nonExistent != nil {
t.Error("expected nil for non-existent ID")
}
}
// TestOrphanedFileCleanup tests the cleanup of orphaned files
func TestOrphanedFileCleanup(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create files
file1 := &File{
Path: "/orphaned.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
file2 := &File{
Path: "/referenced.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
err = repos.Files.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
// Create a snapshot and reference only file2
snapshot := &Snapshot{
ID: "test-snapshot",
Hostname: "test-host",
StartedAt: time.Now(),
}
err = repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}
// Run orphaned cleanup
err = repos.Files.DeleteOrphaned(ctx)
if err != nil {
t.Fatalf("failed to delete orphaned files: %v", err)
}
// Check that orphaned file is gone
orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
if err != nil {
t.Fatalf("error getting file: %v", err)
}
if orphanedFile != nil {
t.Error("orphaned file should have been deleted")
}
// Check that referenced file still exists
referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
if err != nil {
t.Fatalf("error getting file: %v", err)
}
if referencedFile == nil {
t.Error("referenced file should not have been deleted")
}
}
// TestOrphanedChunkCleanup tests the cleanup of orphaned chunks
func TestOrphanedChunkCleanup(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create chunks
chunk1 := &Chunk{
ChunkHash: types.ChunkHash("orphaned-chunk"),
Size: 1024,
}
chunk2 := &Chunk{
ChunkHash: types.ChunkHash("referenced-chunk"),
Size: 1024,
}
err := repos.Chunks.Create(ctx, nil, chunk1)
if err != nil {
t.Fatalf("failed to create chunk1: %v", err)
}
err = repos.Chunks.Create(ctx, nil, chunk2)
if err != nil {
t.Fatalf("failed to create chunk2: %v", err)
}
// Create a file and reference only chunk2
file := &File{
Path: "/test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Create file-chunk mapping only for chunk2
fc := &FileChunk{
FileID: file.ID,
Idx: 0,
ChunkHash: chunk2.ChunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
// Run orphaned cleanup
err = repos.Chunks.DeleteOrphaned(ctx)
if err != nil {
t.Fatalf("failed to delete orphaned chunks: %v", err)
}
// Check that orphaned chunk is gone
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
if orphanedChunk != nil {
t.Error("orphaned chunk should have been deleted")
}
// Check that referenced chunk still exists
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
if referencedChunk == nil {
t.Error("referenced chunk should not have been deleted")
}
}
// TestOrphanedBlobCleanup tests the cleanup of orphaned blobs
func TestOrphanedBlobCleanup(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create blobs
blob1 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("orphaned-blob"),
CreatedTS: time.Now().Truncate(time.Second),
}
blob2 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("referenced-blob"),
CreatedTS: time.Now().Truncate(time.Second),
}
err := repos.Blobs.Create(ctx, nil, blob1)
if err != nil {
t.Fatalf("failed to create blob1: %v", err)
}
err = repos.Blobs.Create(ctx, nil, blob2)
if err != nil {
t.Fatalf("failed to create blob2: %v", err)
}
// Create a snapshot and reference only blob2
snapshot := &Snapshot{
ID: "test-snapshot",
Hostname: "test-host",
StartedAt: time.Now(),
}
err = repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Add blob2 to snapshot
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
if err != nil {
t.Fatalf("failed to add blob to snapshot: %v", err)
}
// Run orphaned cleanup
err = repos.Blobs.DeleteOrphaned(ctx)
if err != nil {
t.Fatalf("failed to delete orphaned blobs: %v", err)
}
// Check that orphaned blob is gone
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
if orphanedBlob != nil {
t.Error("orphaned blob should have been deleted")
}
// Check that referenced blob still exists
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
if referencedBlob == nil {
t.Error("referenced blob should not have been deleted")
}
}
// TestFileChunkRepositoryWithUUIDs tests file-chunk relationships with UUIDs
func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create a file
file := &File{
Path: "/test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 3072,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Create chunks
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for i, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
// Create file-chunk mapping
fc := &FileChunk{
FileID: file.ID,
Idx: i,
ChunkHash: chunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
}
// Test GetByFileID
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatalf("failed to get file chunks: %v", err)
}
if len(fileChunks) != 3 {
t.Errorf("expected 3 chunks, got %d", len(fileChunks))
}
// Test DeleteByFileID
err = repos.FileChunks.DeleteByFileID(ctx, nil, file.ID)
if err != nil {
t.Fatalf("failed to delete file chunks: %v", err)
}
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatalf("failed to get file chunks after delete: %v", err)
}
if len(fileChunks) != 0 {
t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
}
}
// TestChunkFileRepositoryWithUUIDs tests chunk-file relationships with UUIDs
func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create files
file1 := &File{
Path: "/file1.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
file2 := &File{
Path: "/file2.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
err = repos.Files.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
// Create a chunk that appears in both files (deduplication)
chunk := &Chunk{
ChunkHash: types.ChunkHash("shared-chunk"),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
// Create chunk-file mappings
cf1 := &ChunkFile{
ChunkHash: chunk.ChunkHash,
FileID: file1.ID,
FileOffset: 0,
Length: 1024,
}
cf2 := &ChunkFile{
ChunkHash: chunk.ChunkHash,
FileID: file2.ID,
FileOffset: 512,
Length: 1024,
}
err = repos.ChunkFiles.Create(ctx, nil, cf1)
if err != nil {
t.Fatalf("failed to create chunk file 1: %v", err)
}
err = repos.ChunkFiles.Create(ctx, nil, cf2)
if err != nil {
t.Fatalf("failed to create chunk file 2: %v", err)
}
// Test GetByChunkHash
chunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
if err != nil {
t.Fatalf("failed to get chunk files: %v", err)
}
if len(chunkFiles) != 2 {
t.Errorf("expected 2 files for chunk, got %d", len(chunkFiles))
}
// Test GetByFileID
chunkFiles, err = repos.ChunkFiles.GetByFileID(ctx, file1.ID)
if err != nil {
t.Fatalf("failed to get chunks by file ID: %v", err)
}
if len(chunkFiles) != 1 {
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
}
}
// TestSnapshotRepositoryExtendedFields tests snapshot with version and git revision
func TestSnapshotRepositoryExtendedFields(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repo := NewSnapshotRepository(db)
// Create snapshot with extended fields
snapshot := &Snapshot{
ID: "test-20250722-120000Z",
Hostname: "test-host",
VaultikVersion: "0.0.1",
VaultikGitRevision: "abc123def456",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 100,
ChunkCount: 200,
BlobCount: 50,
TotalSize: 1024 * 1024,
BlobSize: 512 * 1024,
BlobUncompressedSize: 1024 * 1024,
CompressionLevel: 6,
CompressionRatio: 2.0,
UploadDurationMs: 5000,
}
err := repo.Create(ctx, nil, snapshot)
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Retrieve and verify
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
if retrieved.VaultikVersion != snapshot.VaultikVersion {
t.Errorf("version mismatch: expected %s, got %s", snapshot.VaultikVersion, retrieved.VaultikVersion)
}
if retrieved.VaultikGitRevision != snapshot.VaultikGitRevision {
t.Errorf("git revision mismatch: expected %s, got %s", snapshot.VaultikGitRevision, retrieved.VaultikGitRevision)
}
if retrieved.CompressionLevel != snapshot.CompressionLevel {
t.Errorf("compression level mismatch: expected %d, got %d", snapshot.CompressionLevel, retrieved.CompressionLevel)
}
if retrieved.BlobUncompressedSize != snapshot.BlobUncompressedSize {
t.Errorf("uncompressed size mismatch: expected %d, got %d", snapshot.BlobUncompressedSize, retrieved.BlobUncompressedSize)
}
if retrieved.UploadDurationMs != snapshot.UploadDurationMs {
t.Errorf("upload duration mismatch: expected %d, got %d", snapshot.UploadDurationMs, retrieved.UploadDurationMs)
}
}
// TestComplexOrphanedDataScenario tests a complex scenario with multiple relationships
func TestComplexOrphanedDataScenario(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create snapshots
snapshot1 := &Snapshot{
ID: "snapshot1",
Hostname: "host1",
StartedAt: time.Now(),
}
snapshot2 := &Snapshot{
ID: "snapshot2",
Hostname: "host1",
StartedAt: time.Now(),
}
err := repos.Snapshots.Create(ctx, nil, snapshot1)
if err != nil {
t.Fatalf("failed to create snapshot1: %v", err)
}
err = repos.Snapshots.Create(ctx, nil, snapshot2)
if err != nil {
t.Fatalf("failed to create snapshot2: %v", err)
}
// Create files
files := make([]*File, 3)
for i := range files {
files[i] = &File{
Path: types.FilePath(fmt.Sprintf("/file%d.txt", i)),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, files[i])
if err != nil {
t.Fatalf("failed to create file%d: %v", i, err)
}
}
// Add files to snapshots
// Snapshot1: file0, file1
// Snapshot2: file1, file2
// file0: only in snapshot1
// file1: in both snapshots
// file2: only in snapshot2
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
if err != nil {
t.Fatal(err)
}
// Delete snapshot1
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
if err != nil {
t.Fatal(err)
}
// Run orphaned cleanup
err = repos.Files.DeleteOrphaned(ctx)
if err != nil {
t.Fatal(err)
}
// Check results
// file0 should be deleted (only in deleted snapshot)
file0, err := repos.Files.GetByID(ctx, files[0].ID)
if err != nil {
t.Fatalf("error getting file0: %v", err)
}
if file0 != nil {
t.Error("file0 should have been deleted")
}
// file1 should exist (still in snapshot2)
file1, err := repos.Files.GetByID(ctx, files[1].ID)
if err != nil {
t.Fatalf("error getting file1: %v", err)
}
if file1 == nil {
t.Error("file1 should still exist")
}
// file2 should exist (still in snapshot2)
file2, err := repos.Files.GetByID(ctx, files[2].ID)
if err != nil {
t.Fatalf("error getting file2: %v", err)
}
if file2 == nil {
t.Error("file2 should still exist")
}
}
// TestCascadeDelete tests that cascade deletes work properly
func TestCascadeDelete(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create a file
file := &File{
Path: "/cascade-test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
fc := &FileChunk{
FileID: file.ID,
Idx: i,
ChunkHash: chunk.ChunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
}
// Verify file chunks exist
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
if len(fileChunks) != 3 {
t.Errorf("expected 3 file chunks, got %d", len(fileChunks))
}
// Delete the file
err = repos.Files.DeleteByID(ctx, nil, file.ID)
if err != nil {
t.Fatalf("failed to delete file: %v", err)
}
// Verify file chunks were cascade deleted
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
if len(fileChunks) != 0 {
t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
}
}
// TestTransactionIsolation tests that transactions properly isolate changes
func TestTransactionIsolation(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Start a transaction
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
// Create a file within the transaction
file := &File{
Path: "/tx-test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, tx, file)
if err != nil {
return err
}
// Within the same transaction, we should be able to query it
// Note: GetByPathTx accepts a tx parameter for exactly this case
// Here we just test that rollback works
// Return an error to trigger rollback
return fmt.Errorf("intentional rollback")
})
if err == nil {
t.Fatal("expected error from transaction")
}
// Verify the file was not created (transaction rolled back)
files, err := repos.Files.ListByPrefix(ctx, "/tx-test")
if err != nil {
t.Fatal(err)
}
if len(files) != 0 {
t.Error("file should not exist after rollback")
}
}
// TestConcurrentOrphanedCleanup tests that concurrent cleanup operations don't interfere
func TestConcurrentOrphanedCleanup(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Set a 5-second busy timeout to handle concurrent operations
if _, err := db.conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
t.Fatalf("failed to set busy timeout: %v", err)
}
// Create a snapshot
snapshot := &Snapshot{
ID: "concurrent-test",
Hostname: "test-host",
StartedAt: time.Now(),
}
err := repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatal(err)
}
// Create many files, some orphaned
for i := 0; i < 20; i++ {
file := &File{
Path: types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatal(err)
}
// Add even-numbered files to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
if err != nil {
t.Fatal(err)
}
}
}
// Run multiple cleanup operations concurrently
// Note: SQLite serializes writes; the busy_timeout set above lets
// concurrent cleanups wait for the lock instead of failing
done := make(chan error, 3)
for i := 0; i < 3; i++ {
go func() {
done <- repos.Files.DeleteOrphaned(ctx)
}()
}
// Wait for all to complete
for i := 0; i < 3; i++ {
err := <-done
if err != nil {
t.Errorf("cleanup %d failed: %v", i, err)
}
}
// Verify correct files were deleted
files, err := repos.Files.ListByPrefix(ctx, "/concurrent-")
if err != nil {
t.Fatal(err)
}
// Should have 10 files remaining (even numbered)
if len(files) != 10 {
t.Errorf("expected 10 files remaining, got %d", len(files))
}
// Verify all remaining files are even-numbered
for _, file := range files {
var num int
_, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
if err != nil {
t.Errorf("failed to parse file number from %s: %v", file.Path, err)
continue
}
if num%2 != 0 {
t.Errorf("odd-numbered file %s should have been deleted", file.Path)
}
}
}


@@ -0,0 +1,165 @@
package database
import (
"context"
"testing"
"time"
)
// TestOrphanedFileCleanupDebug tests orphaned file cleanup with debug output
func TestOrphanedFileCleanupDebug(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create files
file1 := &File{
Path: "/orphaned.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
file2 := &File{
Path: "/referenced.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
t.Logf("Created file1 with ID: %s", file1.ID)
err = repos.Files.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
t.Logf("Created file2 with ID: %s", file2.ID)
// Create a snapshot and reference only file2
snapshot := &Snapshot{
ID: "test-snapshot",
Hostname: "test-host",
StartedAt: time.Now(),
}
err = repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
t.Logf("Created snapshot: %s", snapshot.ID)
// Check snapshot_files before adding
var count int
err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("snapshot_files count before add: %d", count)
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}
t.Logf("Added file2 to snapshot")
// Check snapshot_files after adding
err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("snapshot_files count after add: %d", count)
// Check which files are referenced
rows, err := db.conn.Query("SELECT file_id FROM snapshot_files")
if err != nil {
t.Fatal(err)
}
defer func() {
if err := rows.Close(); err != nil {
t.Logf("failed to close rows: %v", err)
}
}()
t.Log("Files in snapshot_files:")
for rows.Next() {
var fileID string
if err := rows.Scan(&fileID); err != nil {
t.Fatal(err)
}
t.Logf(" - %s", fileID)
}
// Check files before cleanup
err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("Files count before cleanup: %d", count)
// Run orphaned cleanup
err = repos.Files.DeleteOrphaned(ctx)
if err != nil {
t.Fatalf("failed to delete orphaned files: %v", err)
}
t.Log("Ran orphaned cleanup")
// Check files after cleanup
err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("Files count after cleanup: %d", count)
// List remaining files
files, err := repos.Files.ListByPrefix(ctx, "/")
if err != nil {
t.Fatal(err)
}
t.Log("Remaining files:")
for _, f := range files {
t.Logf(" - ID: %s, Path: %s", f.ID, f.Path)
}
// Check that orphaned file is gone
orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
if err != nil {
t.Fatalf("error getting file: %v", err)
}
if orphanedFile != nil {
t.Error("orphaned file should have been deleted")
// Let's check why it wasn't deleted
var exists bool
err = db.conn.QueryRow(`
SELECT EXISTS(
SELECT 1 FROM snapshot_files
WHERE file_id = ?
)`, file1.ID).Scan(&exists)
if err != nil {
t.Fatal(err)
}
t.Logf("File1 exists in snapshot_files: %v", exists)
} else {
t.Log("Orphaned file was correctly deleted")
}
// Check that referenced file still exists
referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
if err != nil {
t.Fatalf("error getting file: %v", err)
}
if referencedFile == nil {
t.Error("referenced file should not have been deleted")
} else {
t.Log("Referenced file correctly remains")
}
}


@@ -0,0 +1,543 @@
package database
import (
"context"
"fmt"
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryEdgeCases tests edge cases for file repository
func TestFileRepositoryEdgeCases(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repo := NewFileRepository(db)
tests := []struct {
name string
file *File
wantErr bool
errMsg string
}{
{
name: "empty path",
file: &File{
Path: "",
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
},
wantErr: false, // Empty strings are allowed, only NULL is not allowed
},
{
name: "very long path",
file: &File{
Path: types.FilePath("/" + strings.Repeat("a", 4096)),
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
},
wantErr: false,
},
{
name: "path with special characters",
file: &File{
Path: "/test/file with spaces and 特殊文字.txt",
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
},
wantErr: false,
},
{
name: "zero size file",
file: &File{
Path: "/empty.txt",
MTime: time.Now(),
CTime: time.Now(),
Size: 0,
Mode: 0644,
UID: 1000,
GID: 1000,
},
wantErr: false,
},
{
name: "symlink with target",
file: &File{
Path: "/link",
MTime: time.Now(),
CTime: time.Now(),
Size: 0,
Mode: 0777 | 0120000, // symlink mode
UID: 1000,
GID: 1000,
LinkTarget: "/target",
},
wantErr: false,
},
}
for i, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Add a unique suffix to paths to avoid UNIQUE constraint violations
if tt.file.Path != "" {
tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
}
err := repo.Create(ctx, nil, tt.file)
if (err != nil) != tt.wantErr {
t.Errorf("Create() error = %v, wantErr %v", err, tt.wantErr)
}
if err != nil && tt.errMsg != "" && !strings.Contains(err.Error(), tt.errMsg) {
t.Errorf("Create() error = %v, want error containing %q", err, tt.errMsg)
}
})
}
}
// TestDuplicateHandling tests handling of duplicate entries
func TestDuplicateHandling(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Test duplicate file paths - Create uses UPSERT logic
t.Run("duplicate file paths", func(t *testing.T) {
file1 := &File{
Path: "/duplicate.txt",
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
file2 := &File{
Path: "/duplicate.txt", // Same path
MTime: time.Now().Add(time.Hour),
CTime: time.Now().Add(time.Hour),
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
originalID := file1.ID
// Create with same path should update the existing record (UPSERT behavior)
err = repos.Files.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
// Verify the file was updated, not duplicated
retrievedFile, err := repos.Files.GetByPath(ctx, "/duplicate.txt")
if err != nil {
t.Fatalf("failed to retrieve file: %v", err)
}
// The file should have been updated with file2's data
if retrievedFile.Size != 2048 {
t.Errorf("expected size 2048, got %d", retrievedFile.Size)
}
// ID might be different due to the UPSERT
if retrievedFile.ID != originalID {
t.Logf("File ID changed from %s to %s during upsert", originalID, retrievedFile.ID)
}
})
// Test duplicate chunk hashes
t.Run("duplicate chunk hashes", func(t *testing.T) {
chunk := &Chunk{
ChunkHash: types.ChunkHash("duplicate-chunk"),
Size: 1024,
}
err := repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
// Creating the same chunk again should be idempotent (ON CONFLICT DO NOTHING)
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Errorf("duplicate chunk creation should be idempotent, got error: %v", err)
}
})
// Test duplicate file-chunk mappings
t.Run("duplicate file-chunk mappings", func(t *testing.T) {
file := &File{
Path: "/test-dup-fc.txt",
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err := repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatal(err)
}
chunk := &Chunk{
ChunkHash: types.ChunkHash("test-chunk-dup"),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatal(err)
}
fc := &FileChunk{
FileID: file.ID,
Idx: 0,
ChunkHash: chunk.ChunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatal(err)
}
// Creating the same mapping again should be idempotent
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Errorf("file-chunk creation should be idempotent, got error: %v", err)
}
})
}
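The duplicate-path test above relies on `files.Create` using UPSERT semantics keyed on the UNIQUE `path` column: a second insert with the same path replaces the row instead of failing. The effect can be sketched with an in-memory map (a simplified analogy, not the repository's actual SQL):

```go
package main

import "fmt"

type file struct {
	path string
	size int64
}

// upsert mimics INSERT ... ON CONFLICT(path) DO UPDATE: a second
// write with the same path replaces the first instead of erroring.
func upsert(table map[string]file, f file) {
	table[f.path] = f
}

func main() {
	table := map[string]file{}
	upsert(table, file{path: "/duplicate.txt", size: 1024})
	upsert(table, file{path: "/duplicate.txt", size: 2048})
	fmt.Println(len(table), table["/duplicate.txt"].size) // 1 2048
}
```

Just as in the test, only one record survives and it carries the second write's size.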
// TestNullHandling tests handling of NULL values
func TestNullHandling(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Test file with no link target
t.Run("file without link target", func(t *testing.T) {
file := &File{
Path: "/regular.txt",
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "", // Should be stored as NULL
}
err := repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatal(err)
}
retrieved, err := repos.Files.GetByID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
if retrieved.LinkTarget != "" {
t.Errorf("expected empty link target, got %q", retrieved.LinkTarget)
}
})
// Test snapshot with NULL completed_at
t.Run("incomplete snapshot", func(t *testing.T) {
snapshot := &Snapshot{
ID: "incomplete-test",
Hostname: "test-host",
StartedAt: time.Now(),
CompletedAt: nil, // Should remain NULL until completed
}
err := repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatal(err)
}
retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatal(err)
}
if retrieved.CompletedAt != nil {
t.Error("expected nil CompletedAt for incomplete snapshot")
}
})
// Test blob with NULL uploaded_ts
t.Run("blob not uploaded", func(t *testing.T) {
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("test-hash"),
CreatedTS: time.Now(),
UploadedTS: nil, // Not uploaded yet
}
err := repos.Blobs.Create(ctx, nil, blob)
if err != nil {
t.Fatal(err)
}
retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatal(err)
}
if retrieved.UploadedTS != nil {
t.Error("expected nil UploadedTS for non-uploaded blob")
}
})
}
// TestLargeDatasets tests operations with large amounts of data
func TestLargeDatasets(t *testing.T) {
if testing.Short() {
t.Skip("skipping large dataset test in short mode")
}
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create a snapshot
snapshot := &Snapshot{
ID: "large-dataset-test",
Hostname: "test-host",
StartedAt: time.Now(),
}
err := repos.Snapshots.Create(ctx, nil, snapshot)
if err != nil {
t.Fatal(err)
}
// Create many files
const fileCount = 1000
fileIDs := make([]types.FileID, fileCount)
t.Run("create many files", func(t *testing.T) {
start := time.Now()
for i := 0; i < fileCount; i++ {
file := &File{
Path: types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
MTime: time.Now(),
CTime: time.Now(),
Size: int64(i * 1024),
Mode: 0644,
UID: uint32(1000 + (i % 10)),
GID: uint32(1000 + (i % 10)),
}
err := repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file %d: %v", i, err)
}
fileIDs[i] = file.ID
// Add half to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
if err != nil {
t.Fatal(err)
}
}
}
t.Logf("Created %d files in %v", fileCount, time.Since(start))
})
// Test ListByPrefix performance
t.Run("list by prefix performance", func(t *testing.T) {
start := time.Now()
files, err := repos.Files.ListByPrefix(ctx, "/large/")
if err != nil {
t.Fatal(err)
}
if len(files) != fileCount {
t.Errorf("expected %d files, got %d", fileCount, len(files))
}
t.Logf("Listed %d files in %v", len(files), time.Since(start))
})
// Test orphaned cleanup performance
t.Run("orphaned cleanup performance", func(t *testing.T) {
start := time.Now()
err := repos.Files.DeleteOrphaned(ctx)
if err != nil {
t.Fatal(err)
}
t.Logf("Cleaned up orphaned files in %v", time.Since(start))
// Verify correct number remain
files, err := repos.Files.ListByPrefix(ctx, "/large/")
if err != nil {
t.Fatal(err)
}
if len(files) != fileCount/2 {
t.Errorf("expected %d files after cleanup, got %d", fileCount/2, len(files))
}
})
}
// TestErrorPropagation tests that errors are properly propagated
func TestErrorPropagation(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Test GetByID with non-existent ID
t.Run("GetByID non-existent", func(t *testing.T) {
file, err := repos.Files.GetByID(ctx, types.NewFileID())
if err != nil {
t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
}
if file != nil {
t.Error("expected nil file for non-existent ID")
}
})
// Test GetByPath with non-existent path
t.Run("GetByPath non-existent", func(t *testing.T) {
file, err := repos.Files.GetByPath(ctx, "/non/existent/path.txt")
if err != nil {
t.Errorf("GetByPath should not return error for non-existent path, got: %v", err)
}
if file != nil {
t.Error("expected nil file for non-existent path")
}
})
// Test invalid foreign key reference
t.Run("invalid foreign key", func(t *testing.T) {
fc := &FileChunk{
FileID: types.NewFileID(),
Idx: 0,
ChunkHash: types.ChunkHash("some-chunk"),
}
err := repos.FileChunks.Create(ctx, nil, fc)
if err == nil {
t.Fatal("expected error for invalid foreign key")
}
if !strings.Contains(err.Error(), "FOREIGN KEY") {
t.Errorf("expected foreign key error, got: %v", err)
}
})
}
// TestQueryInjection tests that the system is safe from SQL injection
func TestQueryInjection(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Test various injection attempts
injectionTests := []string{
"'; DROP TABLE files; --",
"' OR '1'='1",
"'; DELETE FROM files WHERE '1'='1'; --",
`test'); DROP TABLE files; --`,
}
for _, injection := range injectionTests {
t.Run("injection attempt", func(t *testing.T) {
// Try injection in file path
file := &File{
Path: types.FilePath(injection),
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
_ = repos.Files.Create(ctx, nil, file)
// Should either succeed (treating as normal string) or fail with constraint
// but should NOT execute the injected SQL
// Verify tables still exist
var count int
err := db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
if err != nil {
t.Fatalf("files table was damaged by injection: %v", err)
}
})
}
}
// TestTimezoneHandling tests that times are properly handled in UTC
func TestTimezoneHandling(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Create file with specific timezone
loc, err := time.LoadLocation("America/New_York")
if err != nil {
t.Skip("timezone not available")
}
// Use Truncate to remove sub-second precision since we store as Unix timestamps
nyTime := time.Now().In(loc).Truncate(time.Second)
file := &File{
Path: "/timezone-test.txt",
MTime: nyTime,
CTime: nyTime,
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatal(err)
}
// Retrieve and verify times are in UTC
retrieved, err := repos.Files.GetByID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
// Check that times are equivalent (same instant)
if !retrieved.MTime.Equal(nyTime) {
t.Error("time was not preserved correctly")
}
// Check that retrieved time is in UTC
if retrieved.MTime.Location() != time.UTC {
t.Error("retrieved time is not in UTC")
}
}


@@ -0,0 +1,137 @@
-- Vaultik Database Schema
-- Note: This database does not support migrations. If the schema changes,
-- delete the local database and perform a full backup to recreate it.
-- Files table: stores metadata about files in the filesystem
CREATE TABLE IF NOT EXISTS files (
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
mtime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL,
link_target TEXT
);
-- Create index on path for efficient lookups
CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);
-- File chunks table: maps files to their constituent chunks
CREATE TABLE IF NOT EXISTS file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE,
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);
-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
chunk_hash TEXT PRIMARY KEY,
size INTEGER NOT NULL
);
-- Blobs table: stores packed, compressed, and encrypted blob information
CREATE TABLE IF NOT EXISTS blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT UNIQUE,
created_ts INTEGER NOT NULL,
finished_ts INTEGER,
uncompressed_size INTEGER NOT NULL DEFAULT 0,
compressed_size INTEGER NOT NULL DEFAULT 0,
uploaded_ts INTEGER
);
-- Blob chunks table: maps chunks to the blobs that contain them
CREATE TABLE IF NOT EXISTS blob_chunks (
blob_id TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_id, chunk_hash),
FOREIGN KEY (blob_id) REFERENCES blobs(id) ON DELETE CASCADE,
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);
-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id),
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);
-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
vaultik_git_revision TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL DEFAULT 0,
chunk_count INTEGER NOT NULL DEFAULT 0,
blob_count INTEGER NOT NULL DEFAULT 0,
total_size INTEGER NOT NULL DEFAULT 0,
blob_size INTEGER NOT NULL DEFAULT 0,
blob_uncompressed_size INTEGER NOT NULL DEFAULT 0,
compression_ratio REAL NOT NULL DEFAULT 1.0,
compression_level INTEGER NOT NULL DEFAULT 3,
upload_bytes INTEGER NOT NULL DEFAULT 0,
upload_duration_ms INTEGER NOT NULL DEFAULT 0
);
-- Snapshot files table: maps snapshots to files
CREATE TABLE IF NOT EXISTS snapshot_files (
snapshot_id TEXT NOT NULL,
file_id TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_id),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
FOREIGN KEY (file_id) REFERENCES files(id)
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);
-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
FOREIGN KEY (blob_id) REFERENCES blobs(id)
);
-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);
-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
snapshot_id TEXT NOT NULL,
uploaded_at INTEGER NOT NULL,
size INTEGER NOT NULL,
duration_ms INTEGER NOT NULL,
FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);
-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);


@@ -0,0 +1,11 @@
-- Track blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
uploaded_at TIMESTAMP NOT NULL,
size INTEGER NOT NULL,
duration_ms INTEGER NOT NULL,
FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash)
);
CREATE INDEX idx_uploads_uploaded_at ON uploads(uploaded_at);
CREATE INDEX idx_uploads_duration ON uploads(duration_ms);


@@ -5,6 +5,8 @@ import (
"database/sql" "database/sql"
"fmt" "fmt"
"time" "time"
"git.eeqj.de/sneak/vaultik/internal/types"
) )
type SnapshotRepository struct { type SnapshotRepository struct {
@@ -17,17 +19,27 @@ func NewSnapshotRepository(db *DB) *SnapshotRepository {
func (r *SnapshotRepository) Create(ctx context.Context, tx *sql.Tx, snapshot *Snapshot) error {
query := `
INSERT INTO snapshots (id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
compression_ratio, compression_level, upload_bytes, upload_duration_ms)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`
var completedAt *int64
if snapshot.CompletedAt != nil {
ts := snapshot.CompletedAt.Unix()
completedAt = &ts
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
}
if err != nil {
@@ -58,7 +70,7 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
} else {
_, err = r.db.ExecWithLog(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
}
if err != nil {
@@ -68,27 +80,83 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
return nil
}
// UpdateExtendedStats updates extended statistics for a snapshot
func (r *SnapshotRepository) UpdateExtendedStats(ctx context.Context, tx *sql.Tx, snapshotID string, blobUncompressedSize int64, compressionLevel int, uploadDurationMs int64) error {
// Calculate compression ratio based on uncompressed vs compressed sizes
var compressionRatio float64
if blobUncompressedSize > 0 {
// Get current blob_size from DB to calculate ratio
var blobSize int64
queryGet := `SELECT blob_size FROM snapshots WHERE id = ?`
if tx != nil {
err := tx.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
if err != nil {
return fmt.Errorf("getting blob size: %w", err)
}
} else {
err := r.db.conn.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
if err != nil {
return fmt.Errorf("getting blob size: %w", err)
}
}
compressionRatio = float64(blobSize) / float64(blobUncompressedSize)
} else {
compressionRatio = 1.0
}
query := `
UPDATE snapshots
SET blob_uncompressed_size = ?,
compression_ratio = ?,
compression_level = ?,
upload_bytes = blob_size,
upload_duration_ms = ?
WHERE id = ?
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
} else {
_, err = r.db.ExecWithLog(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
}
if err != nil {
return fmt.Errorf("updating extended stats: %w", err)
}
return nil
}
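UpdateExtendedStats guards against a zero uncompressed size before dividing, defaulting the ratio to 1.0. That calculation can be sketched in isolation (hypothetical `compressionRatio` helper, no database involved):

```go
package main

import "fmt"

// compressionRatio mirrors the guard in UpdateExtendedStats: the
// ratio is compressed bytes over uncompressed bytes, falling back to
// 1.0 when no uncompressed bytes were recorded (avoids divide by
// zero for empty or unfinished snapshots).
func compressionRatio(blobSize, blobUncompressedSize int64) float64 {
	if blobUncompressedSize > 0 {
		return float64(blobSize) / float64(blobUncompressedSize)
	}
	return 1.0
}

func main() {
	fmt.Println(compressionRatio(60, 200)) // 0.3
	fmt.Println(compressionRatio(0, 0))    // 1
}
```

A ratio below 1.0 therefore means the blobs compressed well; exactly 1.0 is ambiguous between "incompressible" and "no data recorded".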
func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*Snapshot, error) {
query := `
SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
compression_ratio, compression_level, upload_bytes, upload_duration_ms
FROM snapshots
WHERE id = ?
`
var snapshot Snapshot
var startedAtUnix int64
var completedAtUnix *int64
err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(
&snapshot.ID,
&snapshot.Hostname,
&snapshot.VaultikVersion,
&snapshot.VaultikGitRevision,
&startedAtUnix,
&completedAtUnix,
&snapshot.FileCount,
&snapshot.ChunkCount,
&snapshot.BlobCount,
&snapshot.TotalSize,
&snapshot.BlobSize,
&snapshot.BlobUncompressedSize,
&snapshot.CompressionRatio,
&snapshot.CompressionLevel,
&snapshot.UploadBytes,
&snapshot.UploadDurationMs,
)
if err == sql.ErrNoRows {
@@ -98,16 +166,20 @@ func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*S
return nil, fmt.Errorf("querying snapshot: %w", err) return nil, fmt.Errorf("querying snapshot: %w", err)
} }
snapshot.CreatedTS = time.Unix(createdTSUnix, 0) snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
if completedAtUnix != nil {
t := time.Unix(*completedAtUnix, 0).UTC()
snapshot.CompletedAt = &t
}
return &snapshot, nil return &snapshot, nil
} }
func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snapshot, error) { func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snapshot, error) {
query := ` query := `
SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
FROM snapshots FROM snapshots
ORDER BY created_ts DESC ORDER BY started_at DESC
LIMIT ? LIMIT ?
` `
@@ -120,13 +192,16 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
var snapshots []*Snapshot
for rows.Next() {
var snapshot Snapshot
var startedAtUnix int64
var completedAtUnix *int64
err := rows.Scan(
&snapshot.ID,
&snapshot.Hostname,
&snapshot.VaultikVersion,
&snapshot.VaultikGitRevision,
&startedAtUnix,
&completedAtUnix,
&snapshot.FileCount,
&snapshot.ChunkCount,
&snapshot.BlobCount,
@@ -138,10 +213,336 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
return nil, fmt.Errorf("scanning snapshot: %w", err) return nil, fmt.Errorf("scanning snapshot: %w", err)
} }
snapshot.CreatedTS = time.Unix(createdTSUnix, 0) snapshot.StartedAt = time.Unix(startedAtUnix, 0)
if completedAtUnix != nil {
t := time.Unix(*completedAtUnix, 0)
snapshot.CompletedAt = &t
}
snapshots = append(snapshots, &snapshot) snapshots = append(snapshots, &snapshot)
} }
return snapshots, rows.Err() return snapshots, rows.Err()
} }
// MarkComplete marks a snapshot as completed with the current timestamp
func (r *SnapshotRepository) MarkComplete(ctx context.Context, tx *sql.Tx, snapshotID string) error {
query := `
UPDATE snapshots
SET completed_at = ?
WHERE id = ?
`
completedAt := time.Now().UTC().Unix()
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, completedAt, snapshotID)
} else {
_, err = r.db.ExecWithLog(ctx, query, completedAt, snapshotID)
}
if err != nil {
return fmt.Errorf("marking snapshot complete: %w", err)
}
return nil
}
// AddFile adds a file to a snapshot
func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID string, filePath string) error {
query := `
INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
SELECT ?, id FROM files WHERE path = ?
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, filePath)
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, filePath)
}
if err != nil {
return fmt.Errorf("adding file to snapshot: %w", err)
}
return nil
}
// AddFileByID adds a file to a snapshot by file ID
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
query := `
INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
VALUES (?, ?)
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
}
if err != nil {
return fmt.Errorf("adding file to snapshot: %w", err)
}
return nil
}
// AddFilesByIDBatch adds multiple files to a snapshot in batched inserts
func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Each entry has 2 values, so batch at 400 to be safe
const batchSize = 400
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
args := make([]interface{}, 0, len(batch)*2)
for j, fileID := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?)"
args = append(args, snapshotID, fileID.String())
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch adding files to snapshot: %w", err)
}
}
return nil
}
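AddFilesByIDBatch caps each multi-row INSERT at 400 entries because every entry binds two SQL parameters, keeping each statement safely under SQLite's default bound-variable limit. The slice-batching step can be sketched on its own (hypothetical `batches` helper, no database involved):

```go
package main

import "fmt"

// batchSize mirrors the 400-entry batches used by AddFilesByIDBatch;
// with two bound parameters per entry, each statement binds at most
// 800 variables.
const batchSize = 400

// batches splits ids into consecutive sub-slices of at most batchSize
// elements, preserving order. The sub-slices share the backing array,
// so no copying occurs.
func batches(ids []string) [][]string {
	var out [][]string
	for i := 0; i < len(ids); i += batchSize {
		end := i + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[i:end])
	}
	return out
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("file-%04d", i)
	}
	bs := batches(ids)
	fmt.Println(len(bs), len(bs[0]), len(bs[len(bs)-1])) // 3 400 200
}
```

The final batch is simply shorter; an empty input yields no batches, matching the early return in AddFilesByIDBatch.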
// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
query := `
INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
VALUES (?, ?, ?)
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
}
if err != nil {
return fmt.Errorf("adding blob to snapshot: %w", err)
}
return nil
}
// GetBlobHashes returns all blob hashes for a snapshot
func (r *SnapshotRepository) GetBlobHashes(ctx context.Context, snapshotID string) ([]string, error) {
query := `
SELECT sb.blob_hash
FROM snapshot_blobs sb
WHERE sb.snapshot_id = ?
ORDER BY sb.blob_hash
`
rows, err := r.db.conn.QueryContext(ctx, query, snapshotID)
if err != nil {
return nil, fmt.Errorf("querying blob hashes: %w", err)
}
defer CloseRows(rows)
var blobs []string
for rows.Next() {
var blobHash string
if err := rows.Scan(&blobHash); err != nil {
return nil, fmt.Errorf("scanning blob hash: %w", err)
}
blobs = append(blobs, blobHash)
}
return blobs, rows.Err()
}
// GetSnapshotTotalCompressedSize returns the total compressed size of all blobs referenced by a snapshot
func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context, snapshotID string) (int64, error) {
query := `
SELECT COALESCE(SUM(b.compressed_size), 0)
FROM snapshot_blobs sb
JOIN blobs b ON sb.blob_hash = b.blob_hash
WHERE sb.snapshot_id = ?
`
var totalSize int64
err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
if err != nil {
return 0, fmt.Errorf("querying total compressed size: %w", err)
}
return totalSize, nil
}
// GetIncompleteSnapshots returns all snapshots that haven't been completed
func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
query := `
SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
FROM snapshots
WHERE completed_at IS NULL
ORDER BY started_at DESC
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
}
defer CloseRows(rows)
var snapshots []*Snapshot
for rows.Next() {
var snapshot Snapshot
var startedAtUnix int64
var completedAtUnix *int64
err := rows.Scan(
&snapshot.ID,
&snapshot.Hostname,
&snapshot.VaultikVersion,
&snapshot.VaultikGitRevision,
&startedAtUnix,
&completedAtUnix,
&snapshot.FileCount,
&snapshot.ChunkCount,
&snapshot.BlobCount,
&snapshot.TotalSize,
&snapshot.BlobSize,
&snapshot.CompressionRatio,
)
if err != nil {
return nil, fmt.Errorf("scanning snapshot: %w", err)
}
snapshot.StartedAt = time.Unix(startedAtUnix, 0)
if completedAtUnix != nil {
t := time.Unix(*completedAtUnix, 0)
snapshot.CompletedAt = &t
}
snapshots = append(snapshots, &snapshot)
}
return snapshots, rows.Err()
}
// GetIncompleteByHostname returns all incomplete snapshots for a specific hostname
func (r *SnapshotRepository) GetIncompleteByHostname(ctx context.Context, hostname string) ([]*Snapshot, error) {
query := `
SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
FROM snapshots
WHERE completed_at IS NULL AND hostname = ?
ORDER BY started_at DESC
`
rows, err := r.db.conn.QueryContext(ctx, query, hostname)
if err != nil {
return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
}
defer CloseRows(rows)
var snapshots []*Snapshot
for rows.Next() {
var snapshot Snapshot
var startedAtUnix int64
var completedAtUnix *int64
err := rows.Scan(
&snapshot.ID,
&snapshot.Hostname,
&snapshot.VaultikVersion,
&snapshot.VaultikGitRevision,
&startedAtUnix,
&completedAtUnix,
&snapshot.FileCount,
&snapshot.ChunkCount,
&snapshot.BlobCount,
&snapshot.TotalSize,
&snapshot.BlobSize,
&snapshot.CompressionRatio,
)
if err != nil {
return nil, fmt.Errorf("scanning snapshot: %w", err)
}
snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
if completedAtUnix != nil {
t := time.Unix(*completedAtUnix, 0).UTC()
snapshot.CompletedAt = &t
}
snapshots = append(snapshots, &snapshot)
}
return snapshots, rows.Err()
}
// Delete removes a snapshot record
func (r *SnapshotRepository) Delete(ctx context.Context, snapshotID string) error {
query := `DELETE FROM snapshots WHERE id = ?`
_, err := r.db.ExecWithLog(ctx, query, snapshotID)
if err != nil {
return fmt.Errorf("deleting snapshot: %w", err)
}
return nil
}
// DeleteSnapshotFiles removes all snapshot_files entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotFiles(ctx context.Context, snapshotID string) error {
query := `DELETE FROM snapshot_files WHERE snapshot_id = ?`
_, err := r.db.ExecWithLog(ctx, query, snapshotID)
if err != nil {
return fmt.Errorf("deleting snapshot files: %w", err)
}
return nil
}
// DeleteSnapshotBlobs removes all snapshot_blobs entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotBlobs(ctx context.Context, snapshotID string) error {
query := `DELETE FROM snapshot_blobs WHERE snapshot_id = ?`
_, err := r.db.ExecWithLog(ctx, query, snapshotID)
if err != nil {
return fmt.Errorf("deleting snapshot blobs: %w", err)
}
return nil
}
// DeleteSnapshotUploads removes all uploads entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotUploads(ctx context.Context, snapshotID string) error {
query := `DELETE FROM uploads WHERE snapshot_id = ?`
_, err := r.db.ExecWithLog(ctx, query, snapshotID)
if err != nil {
return fmt.Errorf("deleting snapshot uploads: %w", err)
}
return nil
}
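Taken together, these four methods remove a snapshot and all of its dependent rows. A caller must clear the child tables before the parent `snapshots` row so that foreign-key constraints, if enabled, are satisfied. The sketch below captures that ordering; `snapshotDeleteOrder` is an illustrative helper, not part of the codebase, though the SQL matches the repository methods above:

```go
package main

import "fmt"

// snapshotDeleteOrder lists the DELETE statements for removing a snapshot
// and its dependent rows, children first so the parent snapshots row is
// removed last. Hypothetical helper for illustration.
func snapshotDeleteOrder() []string {
	return []string{
		`DELETE FROM snapshot_files WHERE snapshot_id = ?`,
		`DELETE FROM snapshot_blobs WHERE snapshot_id = ?`,
		`DELETE FROM uploads WHERE snapshot_id = ?`,
		`DELETE FROM snapshots WHERE id = ?`,
	}
}

func main() {
	for i, q := range snapshotDeleteOrder() {
		fmt.Printf("%d. %s\n", i+1, q)
	}
}
```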



@@ -6,6 +6,8 @@ import (
 	"math"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 const (
@@ -30,7 +32,8 @@ func TestSnapshotRepository(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
@@ -45,7 +48,7 @@ func TestSnapshotRepository(t *testing.T) {
 	}
 
 	// Test GetByID
-	retrieved, err := repo.GetByID(ctx, snapshot.ID)
+	retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get snapshot: %v", err)
 	}
@@ -63,12 +66,12 @@ func TestSnapshotRepository(t *testing.T) {
 	}
 
 	// Test UpdateCounts
-	err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
+	err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
 	if err != nil {
 		t.Fatalf("failed to update counts: %v", err)
 	}
-	retrieved, err = repo.GetByID(ctx, snapshot.ID)
+	retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get updated snapshot: %v", err)
 	}
@@ -96,10 +99,11 @@ func TestSnapshotRepository(t *testing.T) {
 	// Add more snapshots
 	for i := 2; i <= 5; i++ {
 		s := &Snapshot{
-			ID:             fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
+			ID:             types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
 			Hostname:       "test-host",
 			VaultikVersion: "1.0.0",
-			CreatedTS:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			StartedAt:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			CompletedAt:    nil,
 			FileCount:      int64(100 * i),
 			ChunkCount:     int64(500 * i),
 			BlobCount:      int64(10 * i),
@@ -121,7 +125,7 @@ func TestSnapshotRepository(t *testing.T) {
 	// Verify order (most recent first)
 	for i := 0; i < len(recent)-1; i++ {
-		if recent[i].CreatedTS.Before(recent[i+1].CreatedTS) {
+		if recent[i].StartedAt.Before(recent[i+1].StartedAt) {
 			t.Error("snapshots not in descending order")
 		}
 	}
@@ -162,7 +166,8 @@ func TestSnapshotRepositoryDuplicate(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,


@@ -0,0 +1,147 @@
package database
import (
"context"
"database/sql"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
)
// Upload represents a blob upload record
type Upload struct {
BlobHash string
SnapshotID string
UploadedAt time.Time
Size int64
DurationMs int64
}
// UploadRepository handles upload records
type UploadRepository struct {
conn *sql.DB
}
// NewUploadRepository creates a new upload repository
func NewUploadRepository(conn *sql.DB) *UploadRepository {
return &UploadRepository{conn: conn}
}
// Create inserts a new upload record
func (r *UploadRepository) Create(ctx context.Context, tx *sql.Tx, upload *Upload) error {
query := `
INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms)
VALUES (?, ?, ?, ?, ?)
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
} else {
_, err = r.conn.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
}
return err
}
// GetByBlobHash retrieves an upload record by blob hash.
// It returns (nil, nil) if no matching record exists.
func (r *UploadRepository) GetByBlobHash(ctx context.Context, blobHash string) (*Upload, error) {
query := `
SELECT blob_hash, uploaded_at, size, duration_ms
FROM uploads
WHERE blob_hash = ?
`
var upload Upload
err := r.conn.QueryRowContext(ctx, query, blobHash).Scan(
&upload.BlobHash,
&upload.UploadedAt,
&upload.Size,
&upload.DurationMs,
)
if err == sql.ErrNoRows {
return nil, nil
}
if err != nil {
return nil, err
}
return &upload, nil
}
// GetRecentUploads retrieves recent uploads ordered by upload time
func (r *UploadRepository) GetRecentUploads(ctx context.Context, limit int) ([]*Upload, error) {
query := `
SELECT blob_hash, uploaded_at, size, duration_ms
FROM uploads
ORDER BY uploaded_at DESC
LIMIT ?
`
rows, err := r.conn.QueryContext(ctx, query, limit)
if err != nil {
return nil, err
}
defer func() {
if err := rows.Close(); err != nil {
log.Error("failed to close rows", "error", err)
}
}()
var uploads []*Upload
for rows.Next() {
var upload Upload
if err := rows.Scan(&upload.BlobHash, &upload.UploadedAt, &upload.Size, &upload.DurationMs); err != nil {
return nil, err
}
uploads = append(uploads, &upload)
}
return uploads, rows.Err()
}
// GetUploadStats returns aggregate statistics for uploads
func (r *UploadRepository) GetUploadStats(ctx context.Context, since time.Time) (*UploadStats, error) {
query := `
SELECT
COUNT(*) as count,
COALESCE(SUM(size), 0) as total_size,
COALESCE(AVG(duration_ms), 0) as avg_duration_ms,
COALESCE(MIN(duration_ms), 0) as min_duration_ms,
COALESCE(MAX(duration_ms), 0) as max_duration_ms
FROM uploads
WHERE uploaded_at >= ?
`
var stats UploadStats
err := r.conn.QueryRowContext(ctx, query, since).Scan(
&stats.Count,
&stats.TotalSize,
&stats.AvgDurationMs,
&stats.MinDurationMs,
&stats.MaxDurationMs,
)
return &stats, err
}
// UploadStats contains aggregate upload statistics
type UploadStats struct {
Count int64
TotalSize int64
AvgDurationMs float64
MinDurationMs int64
MaxDurationMs int64
}
// GetCountBySnapshot returns the count of uploads for a specific snapshot
func (r *UploadRepository) GetCountBySnapshot(ctx context.Context, snapshotID string) (int64, error) {
query := `SELECT COUNT(*) FROM uploads WHERE snapshot_id = ?`
var count int64
err := r.conn.QueryRowContext(ctx, query, snapshotID).Scan(&count)
if err != nil {
return 0, err
}
return count, nil
}
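The aggregates returned by `GetUploadStats` are enough to derive an average throughput figure. The sketch below shows the arithmetic; `throughputMBps` is a hypothetical helper (not part of the repository) and assumes `TotalSize` is in bytes and `AvgDurationMs` is milliseconds per upload, as defined above:

```go
package main

import "fmt"

// throughputMBps derives average upload throughput from the UploadStats
// aggregates: total bytes divided by total wall time spent uploading
// (average duration times upload count). Hypothetical helper.
func throughputMBps(totalSize int64, avgDurationMs float64, count int64) float64 {
	if count == 0 || avgDurationMs == 0 {
		return 0
	}
	totalSeconds := avgDurationMs * float64(count) / 1000
	return float64(totalSize) / (1024 * 1024) / totalSeconds
}

func main() {
	// 10 uploads of 50 MiB each, averaging 2 s per upload -> 25 MiB/s
	fmt.Printf("%.1f MiB/s\n", throughputMBps(10*50*1024*1024, 2000, 10))
}
```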


@@ -4,13 +4,16 @@ import (
 	"time"
 )
 
-// these get populated from main() and copied into the Globals object.
-var (
-	Appname string = "vaultik"
-	Version string = "dev"
-	Commit  string = "unknown"
-)
+// Appname is the application name, populated from main().
+var Appname string = "vaultik"
+
+// Version is the application version, populated from main().
+var Version string = "dev"
+
+// Commit is the git commit hash, populated from main().
+var Commit string = "unknown"
+
+// Globals contains application-wide configuration and metadata.
 type Globals struct {
 	Appname string
 	Version string
@@ -18,13 +21,11 @@ type Globals struct {
 	StartTime time.Time
 }
 
+// New creates and returns a new Globals instance initialized with the package-level variables.
 func New() (*Globals, error) {
-	n := &Globals{
+	return &Globals{
 		Appname: Appname,
 		Version: Version,
 		Commit:  Commit,
-		StartTime: time.Now(),
-	}
-	return n, nil
+	}, nil
 }


@@ -2,16 +2,15 @@ package globals
 
 import (
 	"testing"
-
-	"go.uber.org/fx"
-	"go.uber.org/fx/fxtest"
 )
 
 // TestGlobalsNew ensures the globals package initializes correctly
 func TestGlobalsNew(t *testing.T) {
-	app := fxtest.New(t,
-		fx.Provide(New),
-		fx.Invoke(func(g *Globals) {
+	g, err := New()
+	if err != nil {
+		t.Fatalf("Failed to create Globals: %v", err)
+	}
 	if g == nil {
 		t.Fatal("Globals instance is nil")
 	}
@@ -28,9 +27,4 @@ func TestGlobalsNew(t *testing.T) {
 	if g.Commit == "" {
 		t.Error("Commit should not be empty")
 	}
-	}),
-	)
-	app.RequireStart()
-	app.RequireStop()
 }

internal/log/log.go (new file)

@@ -0,0 +1,182 @@
package log
import (
"context"
"fmt"
"log/slog"
"os"
"path/filepath"
"runtime"
"strings"
"golang.org/x/term"
)
// LogLevel represents the logging level.
type LogLevel int
const (
// LevelFatal represents a fatal error level that will exit the program.
LevelFatal LogLevel = iota
// LevelError represents an error level.
LevelError
// LevelWarn represents a warning level.
LevelWarn
// LevelNotice represents a notice level (mapped to Info in slog).
LevelNotice
// LevelInfo represents an informational level.
LevelInfo
// LevelDebug represents a debug level.
LevelDebug
)
// Config holds logger configuration.
type Config struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}
var logger *slog.Logger
// Initialize sets up the global logger based on the provided configuration.
func Initialize(cfg Config) {
// Determine log level based on configuration
var level slog.Level
if cfg.Cron || cfg.Quiet {
// In quiet/cron mode, only show errors
level = slog.LevelError
} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
level = slog.LevelDebug
} else if cfg.Verbose {
level = slog.LevelInfo
} else {
level = slog.LevelWarn
}
// Create handler with appropriate level
opts := &slog.HandlerOptions{
Level: level,
}
// Check if stdout is a TTY
if term.IsTerminal(int(os.Stdout.Fd())) {
// Use colorized TTY handler
logger = slog.New(NewTTYHandler(os.Stdout, opts))
} else {
// Use JSON format for non-TTY output
logger = slog.New(slog.NewJSONHandler(os.Stdout, opts))
}
// Set as default logger
slog.SetDefault(logger)
}
// getCaller returns the caller information as a string
func getCaller(skip int) string {
_, file, line, ok := runtime.Caller(skip)
if !ok {
return "unknown"
}
return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}
// Fatal logs a fatal error message and exits the program with code 1.
func Fatal(msg string, args ...any) {
if logger != nil {
// Add caller info to args
args = append(args, "caller", getCaller(2))
logger.Error(msg, args...)
}
os.Exit(1)
}
// Fatalf logs a formatted fatal error message and exits the program with code 1.
func Fatalf(format string, args ...any) {
Fatal(fmt.Sprintf(format, args...))
}
// Error logs an error message.
func Error(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
logger.Error(msg, args...)
}
}
// Errorf logs a formatted error message.
func Errorf(format string, args ...any) {
Error(fmt.Sprintf(format, args...))
}
// Warn logs a warning message.
func Warn(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
logger.Warn(msg, args...)
}
}
// Warnf logs a formatted warning message.
func Warnf(format string, args ...any) {
Warn(fmt.Sprintf(format, args...))
}
// Notice logs a notice message (mapped to Info level).
func Notice(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
logger.Info(msg, args...)
}
}
// Noticef logs a formatted notice message.
func Noticef(format string, args ...any) {
Notice(fmt.Sprintf(format, args...))
}
// Info logs an informational message.
func Info(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
logger.Info(msg, args...)
}
}
// Infof logs a formatted informational message.
func Infof(format string, args ...any) {
Info(fmt.Sprintf(format, args...))
}
// Debug logs a debug message.
func Debug(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
logger.Debug(msg, args...)
}
}
// Debugf logs a formatted debug message.
func Debugf(format string, args ...any) {
Debug(fmt.Sprintf(format, args...))
}
// With returns a logger with additional context attributes.
func With(args ...any) *slog.Logger {
if logger != nil {
return logger.With(args...)
}
return slog.Default()
}
// WithContext returns the global logger. The context parameter is accepted
// for API symmetry but is currently unused.
func WithContext(ctx context.Context) *slog.Logger {
return logger
}
// Logger returns the underlying slog.Logger instance.
func Logger() *slog.Logger {
return logger
}

internal/log/module.go (new file)

@@ -0,0 +1,25 @@
package log
import (
"go.uber.org/fx"
)
// Module exports logging functionality for dependency injection.
var Module = fx.Module("log",
fx.Invoke(func(cfg Config) {
Initialize(cfg)
}),
)
// New creates a new logger configuration from provided options.
func New(opts LogOptions) Config {
return Config(opts)
}
// LogOptions are provided by the CLI.
type LogOptions struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}

internal/log/tty_handler.go (new file)

@@ -0,0 +1,140 @@
package log
import (
"context"
"fmt"
"io"
"log/slog"
"sync"
"time"
)
// ANSI color codes
const (
colorReset = "\033[0m"
colorRed = "\033[31m"
colorYellow = "\033[33m"
colorBlue = "\033[34m"
colorGray = "\033[90m"
colorGreen = "\033[32m"
colorCyan = "\033[36m"
colorBold = "\033[1m"
)
// TTYHandler is a custom slog handler for TTY output with colors.
type TTYHandler struct {
opts slog.HandlerOptions
mu sync.Mutex
out io.Writer
}
// NewTTYHandler creates a new TTY handler with colored output.
func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
if opts == nil {
opts = &slog.HandlerOptions{}
}
return &TTYHandler{
out: out,
opts: *opts,
}
}
// Enabled reports whether the handler handles records at the given level.
func (h *TTYHandler) Enabled(_ context.Context, level slog.Level) bool {
return level >= h.opts.Level.Level()
}
// Handle writes the log record to the output with color formatting.
func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
h.mu.Lock()
defer h.mu.Unlock()
// Format timestamp
timestamp := r.Time.Format("15:04:05")
// Level and color
level := r.Level.String()
var levelColor string
switch r.Level {
case slog.LevelDebug:
levelColor = colorGray
level = "DEBUG"
case slog.LevelInfo:
levelColor = colorGreen
level = "INFO "
case slog.LevelWarn:
levelColor = colorYellow
level = "WARN "
case slog.LevelError:
levelColor = colorRed
level = "ERROR"
default:
levelColor = colorReset
}
// Print main message
_, _ = fmt.Fprintf(h.out, "%s%s%s %s%s%s %s%s%s",
colorGray, timestamp, colorReset,
levelColor, level, colorReset,
colorBold, r.Message, colorReset)
// Print attributes
r.Attrs(func(a slog.Attr) bool {
value := a.Value.String()
// Special handling for certain attribute types
switch a.Value.Kind() {
case slog.KindDuration:
if d, ok := a.Value.Any().(time.Duration); ok {
value = formatDuration(d)
}
case slog.KindInt64:
if a.Key == "bytes" {
value = formatBytes(a.Value.Int64())
}
}
_, _ = fmt.Fprintf(h.out, " %s%s%s=%s%s%s",
colorCyan, a.Key, colorReset,
colorBlue, value, colorReset)
return true
})
_, _ = fmt.Fprintln(h.out)
return nil
}
// WithAttrs returns a new handler with the given attributes.
func (h *TTYHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
return h // Simplified for now
}
// WithGroup returns a new handler with the given group name.
func (h *TTYHandler) WithGroup(name string) slog.Handler {
return h // Simplified for now
}
// formatDuration formats a duration in a human-readable way
func formatDuration(d time.Duration) string {
if d < time.Millisecond {
return fmt.Sprintf("%dµs", d.Microseconds())
} else if d < time.Second {
return fmt.Sprintf("%dms", d.Milliseconds())
} else if d < time.Minute {
return fmt.Sprintf("%.1fs", d.Seconds())
}
return d.String()
}
// formatBytes formats bytes in a human-readable way
func formatBytes(b int64) string {
const unit = 1024
if b < unit {
return fmt.Sprintf("%d B", b)
}
div, exp := int64(unit), 0
for n := b / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}

internal/pidlock/pidlock.go (new file)

@@ -0,0 +1,108 @@
// Package pidlock provides process-level locking using PID files.
// It prevents multiple instances of vaultik from running simultaneously,
// which would cause database locking conflicts.
package pidlock
import (
"errors"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"syscall"
)
// ErrAlreadyRunning indicates another vaultik instance is running.
var ErrAlreadyRunning = errors.New("another vaultik instance is already running")
// Lock represents an acquired PID lock.
type Lock struct {
path string
}
// Acquire attempts to acquire a PID lock in the specified directory.
// If the lock file exists and the process is still running, it returns
// ErrAlreadyRunning with details about the existing process.
// On success, it writes the current PID to the lock file and returns
// a Lock that must be released with Release().
func Acquire(lockDir string) (*Lock, error) {
// Ensure lock directory exists
if err := os.MkdirAll(lockDir, 0700); err != nil {
return nil, fmt.Errorf("creating lock directory: %w", err)
}
lockPath := filepath.Join(lockDir, "vaultik.pid")
// Check for existing lock
existingPID, err := readPIDFile(lockPath)
if err == nil {
// Lock file exists, check if process is running
if isProcessRunning(existingPID) {
return nil, fmt.Errorf("%w (PID %d)", ErrAlreadyRunning, existingPID)
}
// Process is not running, stale lock file - we can take over
}
// Write our PID
pid := os.Getpid()
if err := os.WriteFile(lockPath, []byte(strconv.Itoa(pid)), 0600); err != nil {
return nil, fmt.Errorf("writing PID file: %w", err)
}
return &Lock{path: lockPath}, nil
}
// Release removes the PID lock file.
// It is safe to call Release multiple times.
func (l *Lock) Release() error {
if l == nil || l.path == "" {
return nil
}
// Verify we still own the lock (our PID is in the file)
existingPID, err := readPIDFile(l.path)
if err != nil {
// File already gone or unreadable - that's fine
return nil
}
if existingPID != os.Getpid() {
// Someone else wrote to our lock file - don't remove it
return nil
}
if err := os.Remove(l.path); err != nil && !os.IsNotExist(err) {
return fmt.Errorf("removing PID file: %w", err)
}
l.path = "" // Prevent double-release
return nil
}
// readPIDFile reads and parses the PID from a lock file.
func readPIDFile(path string) (int, error) {
data, err := os.ReadFile(path)
if err != nil {
return 0, err
}
pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
if err != nil {
return 0, fmt.Errorf("parsing PID: %w", err)
}
return pid, nil
}
// isProcessRunning checks if a process with the given PID is running.
func isProcessRunning(pid int) bool {
process, err := os.FindProcess(pid)
if err != nil {
return false
}
// On Unix, FindProcess always succeeds. We need to send signal 0 to check.
err = process.Signal(syscall.Signal(0))
return err == nil
}


@@ -0,0 +1,108 @@
package pidlock
import (
"os"
"path/filepath"
"strconv"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestAcquireAndRelease(t *testing.T) {
tmpDir := t.TempDir()
// Acquire lock
lock, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock)
// Verify PID file exists with our PID
data, err := os.ReadFile(filepath.Join(tmpDir, "vaultik.pid"))
require.NoError(t, err)
pid, err := strconv.Atoi(string(data))
require.NoError(t, err)
assert.Equal(t, os.Getpid(), pid)
// Release lock
err = lock.Release()
require.NoError(t, err)
// Verify PID file is gone
_, err = os.Stat(filepath.Join(tmpDir, "vaultik.pid"))
assert.True(t, os.IsNotExist(err))
}
func TestAcquireBlocksSecondInstance(t *testing.T) {
tmpDir := t.TempDir()
// Acquire first lock
lock1, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock1)
defer func() { _ = lock1.Release() }()
// Try to acquire second lock - should fail
lock2, err := Acquire(tmpDir)
assert.ErrorIs(t, err, ErrAlreadyRunning)
assert.Nil(t, lock2)
}
func TestAcquireWithStaleLock(t *testing.T) {
tmpDir := t.TempDir()
// Write a stale PID file (PID that doesn't exist)
stalePID := 999999999 // Unlikely to be a real process
pidPath := filepath.Join(tmpDir, "vaultik.pid")
err := os.WriteFile(pidPath, []byte(strconv.Itoa(stalePID)), 0600)
require.NoError(t, err)
// Should be able to acquire lock (stale lock is cleaned up)
lock, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock)
defer func() { _ = lock.Release() }()
// Verify our PID is now in the file
data, err := os.ReadFile(pidPath)
require.NoError(t, err)
pid, err := strconv.Atoi(string(data))
require.NoError(t, err)
assert.Equal(t, os.Getpid(), pid)
}
func TestReleaseIsIdempotent(t *testing.T) {
tmpDir := t.TempDir()
lock, err := Acquire(tmpDir)
require.NoError(t, err)
// Release multiple times - should not error
err = lock.Release()
require.NoError(t, err)
err = lock.Release()
require.NoError(t, err)
}
func TestReleaseNilLock(t *testing.T) {
var lock *Lock
err := lock.Release()
assert.NoError(t, err)
}
func TestAcquireCreatesDirectory(t *testing.T) {
tmpDir := t.TempDir()
nestedDir := filepath.Join(tmpDir, "nested", "dir")
lock, err := Acquire(nestedDir)
require.NoError(t, err)
require.NotNil(t, lock)
defer func() { _ = lock.Release() }()
// Verify directory was created
info, err := os.Stat(nestedDir)
require.NoError(t, err)
assert.True(t, info.IsDir())
}

internal/s3/client.go (new file)

@@ -0,0 +1,334 @@
package s3
import (
"context"
"io"
"sync/atomic"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/smithy-go/logging"
)
// Client wraps the AWS S3 client for vaultik operations.
// It provides a simplified interface for S3 operations with automatic
// prefix handling and connection management. All operations are performed
// within the configured bucket and prefix.
type Client struct {
s3Client *s3.Client
bucket string
prefix string
endpoint string
}
// Config contains S3 client configuration.
// All fields are required except Prefix, which defaults to an empty string.
// The Endpoint field should include the protocol (http:// or https://).
type Config struct {
Endpoint string
Bucket string
Prefix string
AccessKeyID string
SecretAccessKey string
Region string
}
// nopLogger is a logger that discards all output.
// Used to suppress SDK warnings about checksums.
type nopLogger struct{}
func (nopLogger) Logf(classification logging.Classification, format string, v ...interface{}) {}
// NewClient creates a new S3 client with the provided configuration.
// It establishes a connection to the S3-compatible storage service and
// validates the credentials. The client uses static credentials and
// path-style URLs for compatibility with various S3-compatible services.
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
// Create AWS config with a nop logger to suppress SDK warnings
awsCfg, err := config.LoadDefaultConfig(ctx,
config.WithRegion(cfg.Region),
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
cfg.AccessKeyID,
cfg.SecretAccessKey,
"",
)),
config.WithLogger(nopLogger{}),
)
if err != nil {
return nil, err
}
// Configure custom endpoint if provided
s3Opts := func(o *s3.Options) {
if cfg.Endpoint != "" {
o.BaseEndpoint = aws.String(cfg.Endpoint)
o.UsePathStyle = true
}
}
s3Client := s3.NewFromConfig(awsCfg, s3Opts)
return &Client{
s3Client: s3Client,
bucket: cfg.Bucket,
prefix: cfg.Prefix,
endpoint: cfg.Endpoint,
}, nil
}
// PutObject uploads an object to S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// The data parameter should be a reader containing the object data.
// Returns an error if the upload fails.
func (c *Client) PutObject(ctx context.Context, key string, data io.Reader) error {
fullKey := c.prefix + key
_, err := c.s3Client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
Body: data,
})
return err
}
// ProgressCallback is called during upload progress with bytes uploaded so far.
// The callback should return an error to cancel the upload.
type ProgressCallback func(bytesUploaded int64) error
// PutObjectWithProgress uploads an object to S3 with progress tracking.
// The key is automatically prefixed with the configured prefix.
// The size parameter must be the exact size of the data to upload.
// The progress callback is called periodically with the number of bytes uploaded.
// Returns an error if the upload fails.
func (c *Client) PutObjectWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
fullKey := c.prefix + key
// Create an uploader with the S3 client
uploader := manager.NewUploader(c.s3Client, func(u *manager.Uploader) {
// Set part size to 10MB for better progress granularity
u.PartSize = 10 * 1024 * 1024
})
// Create a progress reader that tracks upload progress
pr := &progressReader{
reader: data,
size: size,
callback: progress,
read: 0,
}
// Upload the file
_, err := uploader.Upload(ctx, &s3.PutObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
Body: pr,
})
return err
}
// GetObject downloads an object from S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// Returns a ReadCloser containing the object data. The caller must
// close the returned reader when done to avoid resource leaks.
func (c *Client) GetObject(ctx context.Context, key string) (io.ReadCloser, error) {
fullKey := c.prefix + key
result, err := c.s3Client.GetObject(ctx, &s3.GetObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
})
if err != nil {
return nil, err
}
return result.Body, nil
}
// DeleteObject removes an object from S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// No error is returned if the object doesn't exist.
func (c *Client) DeleteObject(ctx context.Context, key string) error {
fullKey := c.prefix + key
_, err := c.s3Client.DeleteObject(ctx, &s3.DeleteObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
})
return err
}
// ListObjects lists all objects with the given prefix.
// The prefix is combined with the client's configured prefix.
// Returns a slice of object keys with the base prefix removed.
// This method loads all matching keys into memory, so use
// ListObjectsStream for large result sets.
func (c *Client) ListObjects(ctx context.Context, prefix string) ([]string, error) {
fullPrefix := c.prefix + prefix
var keys []string
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
Bucket: aws.String(c.bucket),
Prefix: aws.String(fullPrefix),
})
for paginator.HasMorePages() {
page, err := paginator.NextPage(ctx)
if err != nil {
return nil, err
}
for _, obj := range page.Contents {
if obj.Key != nil {
// Remove the base prefix from the key
key := *obj.Key
if len(key) > len(c.prefix) {
key = key[len(c.prefix):]
}
keys = append(keys, key)
}
}
}
return keys, nil
}
// HeadObject checks if an object exists in S3.
// Returns true if the object exists, false otherwise.
// The key is automatically prefixed with the configured prefix.
// Note: This method returns false for any error, not just "not found".
func (c *Client) HeadObject(ctx context.Context, key string) (bool, error) {
fullKey := c.prefix + key
_, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
})
if err != nil {
// Check if it's a not found error
// TODO: Add proper error type checking
return false, nil
}
return true, nil
}
// ObjectInfo contains information about an S3 object.
// It is used by ListObjectsStream to return object metadata
// along with any errors encountered during listing.
type ObjectInfo struct {
Key string
Size int64
Err error
}
// ListObjectsStream lists objects with the given prefix and returns a channel.
// This method is preferred for large result sets as it streams results
// instead of loading everything into memory. The channel is closed when
// listing is complete or an error occurs. If an error occurs, it will be
// sent as the last item with the Err field set. The recursive parameter
// is currently unused but reserved for future use.
func (c *Client) ListObjectsStream(ctx context.Context, prefix string, recursive bool) <-chan ObjectInfo {
ch := make(chan ObjectInfo)
go func() {
defer close(ch)
fullPrefix := c.prefix + prefix
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
Bucket: aws.String(c.bucket),
Prefix: aws.String(fullPrefix),
})
for paginator.HasMorePages() {
page, err := paginator.NextPage(ctx)
if err != nil {
ch <- ObjectInfo{Err: err}
return
}
for _, obj := range page.Contents {
if obj.Key != nil && obj.Size != nil {
// Remove the base prefix from the key
key := *obj.Key
if len(key) > len(c.prefix) {
key = key[len(c.prefix):]
}
ch <- ObjectInfo{
Key: key,
Size: *obj.Size,
}
}
}
}
}()
return ch
}
// StatObject returns information about an object without downloading it.
// The key is automatically prefixed with the configured prefix.
// Returns an ObjectInfo struct with the object's metadata.
// Returns an error if the object doesn't exist or if the operation fails.
func (c *Client) StatObject(ctx context.Context, key string) (*ObjectInfo, error) {
fullKey := c.prefix + key
result, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: aws.String(c.bucket),
Key: aws.String(fullKey),
})
if err != nil {
return nil, err
}
size := int64(0)
if result.ContentLength != nil {
size = *result.ContentLength
}
return &ObjectInfo{
Key: key,
Size: size,
}, nil
}
// RemoveObject deletes an object from S3 (alias for DeleteObject).
// This method exists for API compatibility and simply calls DeleteObject.
func (c *Client) RemoveObject(ctx context.Context, key string) error {
return c.DeleteObject(ctx, key)
}
// BucketName returns the configured S3 bucket name.
// This is useful for displaying configuration information.
func (c *Client) BucketName() string {
return c.bucket
}
// Endpoint returns the S3 endpoint URL.
// If no custom endpoint was configured, returns the default AWS S3 endpoint.
// This is useful for displaying configuration information.
func (c *Client) Endpoint() string {
if c.endpoint == "" {
return "s3.amazonaws.com"
}
return c.endpoint
}
// progressReader wraps an io.Reader to track reading progress
type progressReader struct {
reader io.Reader
size int64
read int64
callback ProgressCallback
}
// Read implements io.Reader
func (pr *progressReader) Read(p []byte) (int, error) {
n, err := pr.reader.Read(p)
if n > 0 {
atomic.AddInt64(&pr.read, int64(n))
if pr.callback != nil {
if callbackErr := pr.callback(atomic.LoadInt64(&pr.read)); callbackErr != nil {
return n, callbackErr
}
}
}
return n, err
}


@@ -0,0 +1,98 @@
package s3_test
import (
"bytes"
"context"
"io"
"testing"
"git.eeqj.de/sneak/vaultik/internal/s3"
)
func TestClient(t *testing.T) {
ts := NewTestServer(t)
defer func() {
if err := ts.Cleanup(); err != nil {
t.Errorf("cleanup failed: %v", err)
}
}()
ctx := context.Background()
// Create client
client, err := s3.NewClient(ctx, s3.Config{
Endpoint: testEndpoint,
Bucket: testBucket,
Prefix: "test-prefix/",
AccessKeyID: testAccessKey,
SecretAccessKey: testSecretKey,
Region: testRegion,
})
if err != nil {
t.Fatalf("failed to create client: %v", err)
}
// Test PutObject
testKey := "foo/bar.txt"
testData := []byte("test data")
err = client.PutObject(ctx, testKey, bytes.NewReader(testData))
if err != nil {
t.Fatalf("failed to put object: %v", err)
}
// Test GetObject
reader, err := client.GetObject(ctx, testKey)
if err != nil {
t.Fatalf("failed to get object: %v", err)
}
defer func() {
if err := reader.Close(); err != nil {
t.Errorf("failed to close reader: %v", err)
}
}()
data, err := io.ReadAll(reader)
if err != nil {
t.Fatalf("failed to read data: %v", err)
}
if !bytes.Equal(data, testData) {
t.Errorf("data mismatch: got %q, want %q", data, testData)
}
// Test HeadObject
exists, err := client.HeadObject(ctx, testKey)
if err != nil {
t.Fatalf("failed to head object: %v", err)
}
if !exists {
t.Error("expected object to exist")
}
// Test ListObjects
keys, err := client.ListObjects(ctx, "foo/")
if err != nil {
t.Fatalf("failed to list objects: %v", err)
}
if len(keys) != 1 {
t.Fatalf("expected 1 key, got %d", len(keys))
}
if keys[0] != testKey {
t.Errorf("unexpected key: got %s, want %s", keys[0], testKey)
}
// Test DeleteObject
err = client.DeleteObject(ctx, testKey)
if err != nil {
t.Fatalf("failed to delete object: %v", err)
}
// Verify deletion
exists, err = client.HeadObject(ctx, testKey)
if err != nil {
t.Fatalf("failed to head object after deletion: %v", err)
}
if exists {
t.Error("expected object to not exist after deletion")
}
}

internal/s3/module.go Normal file

@@ -0,0 +1,42 @@
package s3
import (
"context"
"git.eeqj.de/sneak/vaultik/internal/config"
"go.uber.org/fx"
)
// Module exports S3 functionality as an fx module.
// It provides automatic dependency injection for the S3 client,
// configuring it based on the application's configuration settings.
var Module = fx.Module("s3",
fx.Provide(
provideClient,
),
)
func provideClient(lc fx.Lifecycle, cfg *config.Config) (*Client, error) {
ctx := context.Background()
client, err := NewClient(ctx, Config{
Endpoint: cfg.S3.Endpoint,
Bucket: cfg.S3.Bucket,
Prefix: cfg.S3.Prefix,
AccessKeyID: cfg.S3.AccessKeyID,
SecretAccessKey: cfg.S3.SecretAccessKey,
Region: cfg.S3.Region,
})
if err != nil {
return nil, err
}
lc.Append(fx.Hook{
OnStop: func(ctx context.Context) error {
// S3 client doesn't need explicit cleanup
return nil
},
})
return client, nil
}

internal/s3/s3_test.go Normal file

@@ -0,0 +1,306 @@
package s3_test
import (
"bytes"
"context"
"fmt"
"io"
"net/http"
"os"
"path/filepath"
"testing"
"time"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/smithy-go/logging"
"github.com/johannesboyne/gofakes3"
"github.com/johannesboyne/gofakes3/backend/s3mem"
)
const (
testBucket = "test-bucket"
testRegion = "us-east-1"
testAccessKey = "test-access-key"
testSecretKey = "test-secret-key"
testEndpoint = "http://localhost:9999"
)
// TestServer represents an in-process S3-compatible test server
type TestServer struct {
server *http.Server
backend gofakes3.Backend
s3Client *s3.Client
tempDir string
logBuf *bytes.Buffer
}
// NewTestServer creates and starts a new test server
func NewTestServer(t *testing.T) *TestServer {
// Create temp directory for any file operations
tempDir, err := os.MkdirTemp("", "vaultik-s3-test-*")
if err != nil {
t.Fatalf("failed to create temp dir: %v", err)
}
// Create in-memory backend
backend := s3mem.New()
faker := gofakes3.New(backend)
// Create HTTP server
server := &http.Server{
Addr: "localhost:9999",
Handler: faker.Server(),
}
// Start server in background
go func() {
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
t.Logf("test server error: %v", err)
}
}()
// Wait for server to be ready
time.Sleep(100 * time.Millisecond)
// Create a buffer to capture logs
logBuf := &bytes.Buffer{}
// Create S3 client with custom logger
cfg, err := config.LoadDefaultConfig(context.Background(),
config.WithRegion(testRegion),
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
testAccessKey,
testSecretKey,
"",
)),
config.WithClientLogMode(aws.LogRetries|aws.LogRequestWithBody|aws.LogResponseWithBody),
config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
// Capture logs to buffer instead of stdout
fmt.Fprintf(logBuf, "SDK %s %s %s\n",
time.Now().Format("2006/01/02 15:04:05"),
string(classification),
fmt.Sprintf(format, v...))
})),
)
if err != nil {
t.Fatalf("failed to create AWS config: %v", err)
}
s3Client := s3.NewFromConfig(cfg, func(o *s3.Options) {
o.BaseEndpoint = aws.String(testEndpoint)
o.UsePathStyle = true
})
ts := &TestServer{
server: server,
backend: backend,
s3Client: s3Client,
tempDir: tempDir,
logBuf: logBuf,
}
// Register cleanup to show logs on test failure
t.Cleanup(func() {
if t.Failed() && logBuf.Len() > 0 {
t.Logf("S3 SDK Debug Output:\n%s", logBuf.String())
}
})
// Create test bucket
_, err = s3Client.CreateBucket(context.Background(), &s3.CreateBucketInput{
Bucket: aws.String(testBucket),
})
if err != nil {
t.Fatalf("failed to create test bucket: %v", err)
}
return ts
}
// Cleanup shuts down the server and removes temp directory
func (ts *TestServer) Cleanup() error {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := ts.server.Shutdown(ctx); err != nil {
return err
}
return os.RemoveAll(ts.tempDir)
}
// Client returns the S3 client configured for the test server
func (ts *TestServer) Client() *s3.Client {
return ts.s3Client
}
// TestBasicS3Operations tests basic store and retrieve operations
func TestBasicS3Operations(t *testing.T) {
ts := NewTestServer(t)
defer func() {
if err := ts.Cleanup(); err != nil {
t.Errorf("cleanup failed: %v", err)
}
}()
ctx := context.Background()
client := ts.Client()
// Test data
testKey := "test/file.txt"
testData := []byte("Hello, S3 test!")
// Put object
_, err := client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(testKey),
Body: bytes.NewReader(testData),
})
if err != nil {
t.Fatalf("failed to put object: %v", err)
}
// Get object
result, err := client.GetObject(ctx, &s3.GetObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(testKey),
})
if err != nil {
t.Fatalf("failed to get object: %v", err)
}
defer func() {
if err := result.Body.Close(); err != nil {
t.Errorf("failed to close body: %v", err)
}
}()
// Read and verify data
data, err := io.ReadAll(result.Body)
if err != nil {
t.Fatalf("failed to read object body: %v", err)
}
if !bytes.Equal(data, testData) {
t.Errorf("retrieved data mismatch: got %q, want %q", data, testData)
}
}
// TestBlobOperations tests blob storage patterns for vaultik
func TestBlobOperations(t *testing.T) {
ts := NewTestServer(t)
defer func() {
if err := ts.Cleanup(); err != nil {
t.Errorf("cleanup failed: %v", err)
}
}()
ctx := context.Background()
client := ts.Client()
// Test blob storage with prefix structure
blobHash := "aabbccddee112233445566778899aabbccddee11"
blobKey := filepath.Join("blobs", blobHash[:2], blobHash[2:4], blobHash+".zst.age")
blobData := []byte("compressed and encrypted blob data")
// Store blob
_, err := client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(blobKey),
Body: bytes.NewReader(blobData),
})
if err != nil {
t.Fatalf("failed to store blob: %v", err)
}
// List objects with prefix
listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
Bucket: aws.String(testBucket),
Prefix: aws.String("blobs/aa/"),
})
if err != nil {
t.Fatalf("failed to list objects: %v", err)
}
if len(listResult.Contents) != 1 {
t.Fatalf("expected 1 object, got %d", len(listResult.Contents))
}
if got := aws.ToString(listResult.Contents[0].Key); got != blobKey {
t.Errorf("unexpected key: got %s, want %s", got, blobKey)
}
// Delete blob
_, err = client.DeleteObject(ctx, &s3.DeleteObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(blobKey),
})
if err != nil {
t.Fatalf("failed to delete blob: %v", err)
}
// Verify deletion
_, err = client.GetObject(ctx, &s3.GetObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(blobKey),
})
if err == nil {
t.Error("expected error getting deleted object, got nil")
}
}
// TestMetadataOperations tests metadata storage patterns
func TestMetadataOperations(t *testing.T) {
ts := NewTestServer(t)
defer func() {
if err := ts.Cleanup(); err != nil {
t.Errorf("cleanup failed: %v", err)
}
}()
ctx := context.Background()
client := ts.Client()
// Test metadata storage
snapshotID := "2024-01-01T12:00:00Z"
metadataKey := filepath.Join("metadata", snapshotID+".sqlite.age")
metadataData := []byte("encrypted sqlite database")
// Store metadata
_, err := client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(metadataKey),
Body: bytes.NewReader(metadataData),
})
if err != nil {
t.Fatalf("failed to store metadata: %v", err)
}
// Store manifest
manifestKey := filepath.Join("metadata", snapshotID+".manifest.json.zst")
manifestData := []byte(`{"snapshot_id":"2024-01-01T12:00:00Z","blob_hashes":["hash1","hash2"]}`)
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(testBucket),
Key: aws.String(manifestKey),
Body: bytes.NewReader(manifestData),
})
if err != nil {
t.Fatalf("failed to store manifest: %v", err)
}
// List metadata objects
listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
Bucket: aws.String(testBucket),
Prefix: aws.String("metadata/"),
})
if err != nil {
t.Fatalf("failed to list metadata: %v", err)
}
if len(listResult.Contents) != 2 {
t.Errorf("expected 2 metadata objects, got %d", len(listResult.Contents))
}
}


@@ -0,0 +1,534 @@
package snapshot
import (
"context"
"crypto/sha256"
"database/sql"
"fmt"
"io"
"io/fs"
"os"
"path/filepath"
"testing"
"testing/fstest"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// MockS3Client is a mock implementation of S3 operations for testing
type MockS3Client struct {
storage map[string][]byte
}
func NewMockS3Client() *MockS3Client {
return &MockS3Client{
storage: make(map[string][]byte),
}
}
func (m *MockS3Client) PutBlob(ctx context.Context, hash string, data []byte) error {
m.storage[hash] = data
return nil
}
func (m *MockS3Client) GetBlob(ctx context.Context, hash string) ([]byte, error) {
data, ok := m.storage[hash]
if !ok {
return nil, fmt.Errorf("blob not found: %s", hash)
}
return data, nil
}
func (m *MockS3Client) BlobExists(ctx context.Context, hash string) (bool, error) {
_, ok := m.storage[hash]
return ok, nil
}
func (m *MockS3Client) CreateBucket(ctx context.Context, bucket string) error {
return nil
}
func TestBackupWithInMemoryFS(t *testing.T) {
// Create a temporary directory for the database
tempDir := t.TempDir()
dbPath := filepath.Join(tempDir, "test.db")
// Create test filesystem
testFS := fstest.MapFS{
"file1.txt": &fstest.MapFile{
Data: []byte("Hello, World!"),
Mode: 0644,
ModTime: time.Now(),
},
"dir1/file2.txt": &fstest.MapFile{
Data: []byte("This is a test file with some content."),
Mode: 0755,
ModTime: time.Now(),
},
"dir1/subdir/file3.txt": &fstest.MapFile{
Data: []byte("Another file in a subdirectory."),
Mode: 0600,
ModTime: time.Now(),
},
"largefile.bin": &fstest.MapFile{
Data: generateLargeFileContent(10 * 1024 * 1024), // 10MB file with varied content
Mode: 0644,
ModTime: time.Now(),
},
}
// Initialize the database
ctx := context.Background()
db, err := database.New(ctx, dbPath)
if err != nil {
t.Fatalf("Failed to create database: %v", err)
}
defer func() {
if err := db.Close(); err != nil {
t.Logf("Failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create mock S3 client
s3Client := NewMockS3Client()
// Run backup
backupEngine := &BackupEngine{
repos: repos,
s3Client: s3Client,
}
snapshotID, err := backupEngine.Backup(ctx, testFS, ".")
if err != nil {
t.Fatalf("Backup failed: %v", err)
}
// Verify snapshot was created
snapshot, err := repos.Snapshots.GetByID(ctx, snapshotID)
if err != nil {
t.Fatalf("Failed to get snapshot: %v", err)
}
if snapshot == nil {
t.Fatal("Snapshot not found")
}
if snapshot.FileCount == 0 {
t.Error("Expected snapshot to have files")
}
// Verify files in database
files, err := repos.Files.ListByPrefix(ctx, "")
if err != nil {
t.Fatalf("Failed to list files: %v", err)
}
expectedFiles := map[string]bool{
"file1.txt": true,
"dir1/file2.txt": true,
"dir1/subdir/file3.txt": true,
"largefile.bin": true,
}
if len(files) != len(expectedFiles) {
t.Errorf("Expected %d files, got %d", len(expectedFiles), len(files))
}
for _, file := range files {
if !expectedFiles[file.Path.String()] {
t.Errorf("Unexpected file in database: %s", file.Path)
}
delete(expectedFiles, file.Path.String())
// Verify file metadata
fsFile := testFS[file.Path.String()]
if fsFile == nil {
t.Errorf("File %s not found in test filesystem", file.Path)
continue
}
if file.Size != int64(len(fsFile.Data)) {
t.Errorf("File %s: expected size %d, got %d", file.Path, len(fsFile.Data), file.Size)
}
if file.Mode != uint32(fsFile.Mode) {
t.Errorf("File %s: expected mode %o, got %o", file.Path, fsFile.Mode, file.Mode)
}
}
if len(expectedFiles) > 0 {
t.Errorf("Files not found in database: %v", expectedFiles)
}
// Verify chunks
chunks, err := repos.Chunks.List(ctx)
if err != nil {
t.Fatalf("Failed to list chunks: %v", err)
}
if len(chunks) == 0 {
t.Error("No chunks found in database")
}
// The large file should create 10 chunks (10MB / 1MB chunk size)
// Plus the small files
minExpectedChunks := 10 + 3
if len(chunks) < minExpectedChunks {
t.Errorf("Expected at least %d chunks, got %d", minExpectedChunks, len(chunks))
}
// Verify at least one blob was created and uploaded
// We can't list blobs directly, but we can check via snapshot blobs
blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
if err != nil {
t.Fatalf("Failed to get blob hashes: %v", err)
}
if len(blobHashes) == 0 {
t.Error("Expected at least one blob to be created")
}
for _, blobHash := range blobHashes {
// Check blob exists in mock S3
exists, err := s3Client.BlobExists(ctx, blobHash)
if err != nil {
t.Errorf("Failed to check blob %s: %v", blobHash, err)
}
if !exists {
t.Errorf("Blob %s not found in S3", blobHash)
}
}
}
func TestBackupDeduplication(t *testing.T) {
// Create a temporary directory for the database
tempDir := t.TempDir()
dbPath := filepath.Join(tempDir, "test.db")
// Create test filesystem with duplicate content
testFS := fstest.MapFS{
"file1.txt": &fstest.MapFile{
Data: []byte("Duplicate content"),
Mode: 0644,
ModTime: time.Now(),
},
"file2.txt": &fstest.MapFile{
Data: []byte("Duplicate content"),
Mode: 0644,
ModTime: time.Now(),
},
"file3.txt": &fstest.MapFile{
Data: []byte("Unique content"),
Mode: 0644,
ModTime: time.Now(),
},
}
// Initialize the database
ctx := context.Background()
db, err := database.New(ctx, dbPath)
if err != nil {
t.Fatalf("Failed to create database: %v", err)
}
defer func() {
if err := db.Close(); err != nil {
t.Logf("Failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create mock S3 client
s3Client := NewMockS3Client()
// Run backup
backupEngine := &BackupEngine{
repos: repos,
s3Client: s3Client,
}
_, err = backupEngine.Backup(ctx, testFS, ".")
if err != nil {
t.Fatalf("Backup failed: %v", err)
}
// Verify deduplication
chunks, err := repos.Chunks.List(ctx)
if err != nil {
t.Fatalf("Failed to list chunks: %v", err)
}
// Should have only 2 unique chunks (duplicate content + unique content)
if len(chunks) != 2 {
t.Errorf("Expected 2 unique chunks, got %d", len(chunks))
}
// Verify chunk references
for _, chunk := range chunks {
files, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
if err != nil {
t.Errorf("Failed to get files for chunk %s: %v", chunk.ChunkHash, err)
}
// The duplicate content chunk should be referenced by 2 files
if chunk.Size == int64(len("Duplicate content")) && len(files) != 2 {
t.Errorf("Expected duplicate chunk to be referenced by 2 files, got %d", len(files))
}
}
}
// BackupEngine performs backup operations
type BackupEngine struct {
repos *database.Repositories
s3Client interface {
PutBlob(ctx context.Context, hash string, data []byte) error
BlobExists(ctx context.Context, hash string) (bool, error)
}
}
// Backup performs a backup of the given filesystem
func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (string, error) {
// Create a new snapshot
hostname, _ := os.Hostname()
snapshotID := time.Now().Format(time.RFC3339)
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
}
// Create initial snapshot record
err := b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
return "", err
}
// Track counters
var fileCount, chunkCount, blobCount, totalSize, blobSize int64
// Track which chunks we've seen to handle deduplication
processedChunks := make(map[string]bool)
// Scan the filesystem and process files
err = fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
// Skip directories
if d.IsDir() {
return nil
}
// Get file info
info, err := d.Info()
if err != nil {
return err
}
// Handle symlinks
if info.Mode()&fs.ModeSymlink != 0 {
// For testing, we'll skip symlinks since fstest doesn't support them well
return nil
}
// Create file record in a short transaction
file := &database.File{
Path: types.FilePath(path),
Size: info.Size(),
Mode: uint32(info.Mode()),
MTime: info.ModTime(),
CTime: info.ModTime(), // Use mtime as ctime for test
UID: 1000, // Default UID for test
GID: 1000, // Default GID for test
}
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Files.Create(ctx, tx, file)
})
if err != nil {
return err
}
fileCount++
totalSize += info.Size()
// Read and process file in chunks
f, err := fsys.Open(path)
if err != nil {
return err
}
defer func() {
if err := f.Close(); err != nil {
// Log but don't fail; we may already be unwinding another error.
fmt.Fprintf(os.Stderr, "failed to close file: %v\n", err)
}
}()
// Process file in chunks
chunkIndex := 0
buffer := make([]byte, defaultChunkSize)
for {
n, err := f.Read(buffer)
if err != nil && err != io.EOF {
return err
}
if n == 0 {
break
}
chunkData := buffer[:n]
chunkHash := calculateHash(chunkData)
// Check if chunk already exists (outside of transaction)
existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
if existingChunk == nil {
// Create new chunk in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunk := &database.Chunk{
ChunkHash: types.ChunkHash(chunkHash),
Size: int64(n),
}
return b.repos.Chunks.Create(ctx, tx, chunk)
})
if err != nil {
return err
}
processedChunks[chunkHash] = true
}
// Create file-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
fileChunk := &database.FileChunk{
FileID: file.ID,
Idx: chunkIndex,
ChunkHash: types.ChunkHash(chunkHash),
}
return b.repos.FileChunks.Create(ctx, tx, fileChunk)
})
if err != nil {
return err
}
// Create chunk-file mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunkFile := &database.ChunkFile{
ChunkHash: types.ChunkHash(chunkHash),
FileID: file.ID,
FileOffset: int64(chunkIndex * defaultChunkSize),
Length: int64(n),
}
return b.repos.ChunkFiles.Create(ctx, tx, chunkFile)
})
if err != nil {
return err
}
chunkIndex++
}
return nil
})
if err != nil {
return "", err
}
// After all files are processed, create blobs for new chunks
for chunkHash := range processedChunks {
// Get chunk data (outside of transaction)
chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
if err != nil {
return "", err
}
chunkCount++
// In a real system, blobs would contain multiple chunks and be encrypted
// For testing, we'll create a blob with a "blob-" prefix to differentiate
blobHash := "blob-" + chunkHash
// For the test, we'll create dummy data since we don't have the original
dummyData := []byte(chunkHash)
// Upload to S3 as a blob
if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
return "", err
}
// Create blob entry in a short transaction
blobID := types.NewBlobID()
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blob := &database.Blob{
ID: blobID,
Hash: types.BlobHash(blobHash),
CreatedTS: time.Now(),
}
return b.repos.Blobs.Create(ctx, tx, blob)
})
if err != nil {
return "", err
}
blobCount++
blobSize += chunk.Size
// Create blob-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blobChunk := &database.BlobChunk{
BlobID: blobID,
ChunkHash: types.ChunkHash(chunkHash),
Offset: 0,
Length: chunk.Size,
}
return b.repos.BlobChunks.Create(ctx, tx, blobChunk)
})
if err != nil {
return "", err
}
// Add blob to snapshot in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
})
if err != nil {
return "", err
}
}
// Update snapshot with final counts
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID, fileCount, chunkCount, blobCount, totalSize, blobSize)
})
if err != nil {
return "", err
}
return snapshotID, nil
}
func calculateHash(data []byte) string {
h := sha256.New()
h.Write(data)
return fmt.Sprintf("%x", h.Sum(nil))
}
func generateLargeFileContent(size int) []byte {
data := make([]byte, size)
// Fill with pattern that changes every chunk to avoid deduplication
for i := 0; i < size; i++ {
chunkNum := i / defaultChunkSize
data[i] = byte((i + chunkNum) % 256)
}
return data
}
const defaultChunkSize = 1024 * 1024 // 1MB chunks


@@ -0,0 +1,454 @@
package snapshot_test
import (
"context"
"database/sql"
"path/filepath"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/require"
)
func setupExcludeTestFS(t *testing.T) afero.Fs {
t.Helper()
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test directory structure:
// /backup/
// file1.txt (should be backed up)
// file2.log (should be excluded if *.log is in patterns)
// .git/
// config (should be excluded if .git is in patterns)
// objects/
// pack/
// data.pack (should be excluded if .git is in patterns)
// src/
// main.go (should be backed up)
// test.go (should be backed up)
// node_modules/
// package/
// index.js (should be excluded if node_modules is in patterns)
// cache/
// temp.dat (should be excluded if cache/ is in patterns)
// build/
// output.bin (should be excluded if build is in patterns)
// docs/
// readme.md (should be backed up)
// .DS_Store (should be excluded if .DS_Store is in patterns)
// thumbs.db (should be excluded if thumbs.db is in patterns)
files := map[string]string{
"/backup/file1.txt": "content1",
"/backup/file2.log": "log content",
"/backup/.git/config": "git config",
"/backup/.git/objects/pack/data.pack": "pack data",
"/backup/src/main.go": "package main",
"/backup/src/test.go": "package main_test",
"/backup/node_modules/package/index.js": "module.exports = {}",
"/backup/cache/temp.dat": "cached data",
"/backup/build/output.bin": "binary data",
"/backup/docs/readme.md": "# Documentation",
"/backup/.DS_Store": "ds store data",
"/backup/thumbs.db": "thumbs data",
"/backup/src/.hidden": "hidden file",
"/backup/important.log.bak": "backup of log",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
t.Helper()
// Initialize logger
log.Initialize(log.Config{})
// Create test database
db, err := database.NewTestDB()
require.NoError(t, err)
repos := database.NewRepositories(db)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: 64 * 1024,
Repositories: repos,
MaxBlobSize: 1024 * 1024,
CompressionLevel: 3,
AgeRecipients: []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
Exclude: excludePatterns,
})
cleanup := func() {
_ = db.Close()
}
return scanner, repos, cleanup
}
func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
t.Helper()
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snap := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snap)
})
require.NoError(t, err)
}
func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should have scanned files but NOT .git directory contents
// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
// src/.hidden, important.log.bak
// Excluded: .git/config, .git/objects/pack/data.pack
require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}
func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude file2.log but NOT important.log.bak (different extension)
// Total files: 14, excluded: 1 (file2.log)
require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}
func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude node_modules/package/index.js
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}
func TestExcludePatterns_MultiplePatterns(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}
func TestExcludePatterns_NoExclusions(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should scan all 14 files
require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}
func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude: .git/*, .DS_Store, src/.hidden
// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}
func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude .git/objects/pack/data.pack
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}
func TestExcludePatterns_ExactFileName(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude thumbs.db and .DS_Store
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}
func TestExcludePatterns_CaseSensitive(t *testing.T) {
// Pattern matching should be case-sensitive
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db
// All 14 files should be scanned
require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}
func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
fs := setupExcludeTestFS(t)
// Some users might add trailing slashes to directory patterns
scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude cache/temp.dat and build/output.bin
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}
func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
// Exclude .hidden file specifically in src directory
scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude only src/.hidden
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}
// setupAnchoredTestFS creates a filesystem for testing anchored patterns
// Source dir: /backup
// Structure:
//
// /backup/
// projectname/
// file.txt (should be excluded with /projectname)
// otherproject/
// projectname/
// file.txt (should NOT be excluded with /projectname, only with projectname)
// src/
// file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
t.Helper()
fs := afero.NewMemMapFs()
files := map[string]string{
"/backup/projectname/file.txt": "root project file",
"/backup/otherproject/projectname/file.txt": "nested project file",
"/backup/src/file.go": "source file",
"/backup/file.txt": "root file",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func TestExcludePatterns_AnchoredPattern(t *testing.T) {
// Pattern starting with / should only match from root of source dir
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /projectname should ONLY exclude /backup/projectname/file.txt (1 file)
// /backup/otherproject/projectname/file.txt should NOT be excluded
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}
func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
// Pattern without leading / should match anywhere in path
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// projectname (without /) should exclude BOTH:
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 2
require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}
func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
// Anchored pattern with glob
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /src/*.go should exclude /backup/src/file.go
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}
func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
// Anchored pattern for exact file at root
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /file.txt should ONLY exclude /backup/file.txt
// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}
func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
// Unanchored pattern for file should match anywhere
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// file.txt should exclude ALL file.txt files:
// - /backup/file.txt
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 3
require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}
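The anchored-vs-unanchored semantics these tests assert can be illustrated with a small stdlib-only matcher. This is a hypothetical sketch of the rule the tests encode (a leading `/` anchors a pattern to the source root; a bare pattern matches any path component or suffix), not the scanner's actual implementation:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchExclude illustrates the exclude-pattern semantics asserted above.
// It is a sketch, not the scanner's real matcher.
func matchExclude(pattern, relPath string) bool {
	if strings.HasPrefix(pattern, "/") {
		// Anchored: match only relative to the source root.
		p := strings.TrimPrefix(pattern, "/")
		if ok, _ := filepath.Match(p, relPath); ok {
			return true
		}
		// An anchored directory pattern also excludes files directly under it.
		ok, _ := filepath.Match(p+"/*", relPath)
		return ok
	}
	// Unanchored: try the pattern against every path component and suffix.
	parts := strings.Split(relPath, "/")
	for i := range parts {
		if ok, _ := filepath.Match(pattern, parts[i]); ok {
			return true
		}
		if ok, _ := filepath.Match(pattern, strings.Join(parts[i:], "/")); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchExclude("/projectname", "projectname/file.txt"))              // true: anchored, at root
	fmt.Println(matchExclude("/projectname", "otherproject/projectname/file.txt")) // false: anchored, nested
	fmt.Println(matchExclude("projectname", "otherproject/projectname/file.txt"))  // true: unanchored
	fmt.Println(matchExclude("/src/*.go", "src/file.go"))                          // true: anchored glob
}
```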


@@ -0,0 +1,238 @@
package snapshot_test
import (
"context"
"database/sql"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestFileContentChange verifies that when a file's content changes,
// the old chunks are properly disassociated
func TestFileContentChange(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create initial file
err := afero.WriteFile(fs, "/test.txt", []byte("Initial content"), 0644)
require.NoError(t, err)
// Create test database
db, err := database.NewTestDB()
require.NoError(t, err)
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: int64(1024 * 16), // 16KB chunks for testing
Repositories: repos,
MaxBlobSize: int64(1024 * 1024), // 1MB blobs
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create first snapshot
ctx := context.Background()
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID1),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// First scan - should create chunks for initial content
result1, err := scanner.Scan(ctx, "/", snapshotID1)
require.NoError(t, err)
t.Logf("First scan: %d files scanned", result1.FilesScanned)
// Get file chunks from first scan
fileChunks1, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
require.NoError(t, err)
assert.Len(t, fileChunks1, 1) // Small file = 1 chunk
oldChunkHash := fileChunks1[0].ChunkHash
// Get chunk files from first scan
chunkFiles1, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
require.NoError(t, err)
assert.Len(t, chunkFiles1, 1)
// Modify the file
time.Sleep(10 * time.Millisecond) // Ensure mtime changes
err = afero.WriteFile(fs, "/test.txt", []byte("Modified content with different data"), 0644)
require.NoError(t, err)
// Create second snapshot
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID2),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// Second scan - should create new chunks and remove old associations
result2, err := scanner.Scan(ctx, "/", snapshotID2)
require.NoError(t, err)
t.Logf("Second scan: %d files scanned", result2.FilesScanned)
// Get file chunks from second scan
fileChunks2, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
require.NoError(t, err)
assert.Len(t, fileChunks2, 1) // Still 1 chunk but different hash
newChunkHash := fileChunks2[0].ChunkHash
// Verify the chunk hashes are different
assert.NotEqual(t, oldChunkHash, newChunkHash, "Chunk hash should change when content changes")
// Get chunk files from second scan
chunkFiles2, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
require.NoError(t, err)
assert.Len(t, chunkFiles2, 1)
assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)
// Verify old chunk still exists (it's still valid data)
oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
require.NoError(t, err)
assert.NotNil(t, oldChunk)
// Verify new chunk exists
newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
require.NoError(t, err)
assert.NotNil(t, newChunk)
// Verify that chunk_files for old chunk no longer references this file
oldChunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, oldChunkHash)
require.NoError(t, err)
for _, cf := range oldChunkFiles {
file, err := repos.Files.GetByID(ctx, cf.FileID)
require.NoError(t, err)
assert.NotEqual(t, "/test.txt", file.Path, "Old chunk should not be associated with the modified file")
}
}
// TestMultipleFileChanges verifies handling of multiple file changes in one scan
func TestMultipleFileChanges(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create initial files
files := map[string]string{
"/file1.txt": "Content 1",
"/file2.txt": "Content 2",
"/file3.txt": "Content 3",
}
for path, content := range files {
err := afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
}
// Create test database
db, err := database.NewTestDB()
require.NoError(t, err)
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: int64(1024 * 16), // 16KB chunks for testing
Repositories: repos,
MaxBlobSize: int64(1024 * 1024), // 1MB blobs
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create first snapshot
ctx := context.Background()
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID1),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// First scan
result1, err := scanner.Scan(ctx, "/", snapshotID1)
require.NoError(t, err)
// Only regular files are counted, not directories
assert.Equal(t, 3, result1.FilesScanned)
// Modify two files
time.Sleep(10 * time.Millisecond) // Ensure mtime changes
err = afero.WriteFile(fs, "/file1.txt", []byte("Modified content 1"), 0644)
require.NoError(t, err)
err = afero.WriteFile(fs, "/file3.txt", []byte("Modified content 3"), 0644)
require.NoError(t, err)
// Create second snapshot
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID2),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// Second scan
result2, err := scanner.Scan(ctx, "/", snapshotID2)
require.NoError(t, err)
// Only regular files are counted, not directories
assert.Equal(t, 3, result2.FilesScanned)
// Verify each file has exactly one set of chunks
for path := range files {
fileChunks, err := repos.FileChunks.GetByPath(ctx, path)
require.NoError(t, err)
assert.Len(t, fileChunks, 1, "File %s should have exactly 1 chunk association", path)
chunkFiles, err := repos.ChunkFiles.GetByFilePath(ctx, path)
require.NoError(t, err)
assert.Len(t, chunkFiles, 1, "File %s should have exactly 1 chunk-file association", path)
}
}


@@ -0,0 +1,70 @@
package snapshot
import (
"bytes"
"encoding/json"
"fmt"
"io"
"github.com/klauspost/compress/zstd"
)
// Manifest represents the structure of a snapshot's blob manifest
type Manifest struct {
SnapshotID string `json:"snapshot_id"`
Timestamp string `json:"timestamp"`
BlobCount int `json:"blob_count"`
TotalCompressedSize int64 `json:"total_compressed_size"`
Blobs []BlobInfo `json:"blobs"`
}
// BlobInfo represents information about a single blob in the manifest
type BlobInfo struct {
Hash string `json:"hash"`
CompressedSize int64 `json:"compressed_size"`
}
// DecodeManifest decodes a manifest from a reader containing compressed JSON
func DecodeManifest(r io.Reader) (*Manifest, error) {
// Decompress using zstd
zr, err := zstd.NewReader(r)
if err != nil {
return nil, fmt.Errorf("creating zstd reader: %w", err)
}
defer zr.Close()
// Decode JSON manifest
var manifest Manifest
if err := json.NewDecoder(zr).Decode(&manifest); err != nil {
return nil, fmt.Errorf("decoding manifest: %w", err)
}
return &manifest, nil
}
// EncodeManifest encodes a manifest to compressed JSON
func EncodeManifest(manifest *Manifest, compressionLevel int) ([]byte, error) {
// Marshal to JSON
jsonData, err := json.MarshalIndent(manifest, "", " ")
if err != nil {
return nil, fmt.Errorf("marshaling manifest: %w", err)
}
// Compress using zstd
var compressedBuf bytes.Buffer
writer, err := zstd.NewWriter(&compressedBuf, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)))
if err != nil {
return nil, fmt.Errorf("creating zstd writer: %w", err)
}
if _, err := writer.Write(jsonData); err != nil {
_ = writer.Close()
return nil, fmt.Errorf("writing compressed data: %w", err)
}
if err := writer.Close(); err != nil {
return nil, fmt.Errorf("closing zstd writer: %w", err)
}
return compressedBuf.Bytes(), nil
}


@@ -0,0 +1,53 @@
package snapshot
import (
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/afero"
"go.uber.org/fx"
)
// ScannerParams holds parameters for scanner creation
type ScannerParams struct {
EnableProgress bool
Fs afero.Fs
Exclude []string // Exclude patterns (combined global + snapshot-specific)
SkipErrors bool // Skip file read errors (log loudly but continue)
}
// Module exports backup functionality as an fx module.
// It provides a ScannerFactory that can create Scanner instances
// with custom parameters while sharing common dependencies.
var Module = fx.Module("backup",
fx.Provide(
provideScannerFactory,
NewSnapshotManager,
),
)
// ScannerFactory creates scanners with custom parameters
type ScannerFactory func(params ScannerParams) *Scanner
func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
return func(params ScannerParams) *Scanner {
// Use provided excludes, or fall back to global config excludes
excludes := params.Exclude
if len(excludes) == 0 {
excludes = cfg.Exclude
}
return NewScanner(ScannerConfig{
FS: params.Fs,
ChunkSize: cfg.ChunkSize.Int64(),
Repositories: repos,
Storage: storer,
MaxBlobSize: cfg.BlobSizeLimit.Int64(),
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
EnableProgress: params.EnableProgress,
Exclude: excludes,
SkipErrors: params.SkipErrors,
})
}
}


@@ -0,0 +1,419 @@
package snapshot
import (
"context"
"fmt"
"os"
"os/signal"
"sync"
"sync/atomic"
"syscall"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/dustin/go-humanize"
)
const (
// SummaryInterval defines how often one-line status updates are printed.
// These updates show current progress, ETA, and the file being processed.
SummaryInterval = 10 * time.Second
// DetailInterval defines how often multi-line detailed status reports are printed.
// These reports include comprehensive statistics about files, chunks, blobs, and uploads.
DetailInterval = 60 * time.Second
// UploadProgressInterval defines how often upload progress messages are logged.
UploadProgressInterval = 15 * time.Second
)
// ProgressStats holds atomic counters for progress tracking
type ProgressStats struct {
FilesScanned atomic.Int64 // Total files seen during scan (includes skipped)
FilesProcessed atomic.Int64 // Files actually processed in phase 2
FilesSkipped atomic.Int64 // Files skipped due to no changes
BytesScanned atomic.Int64 // Bytes from new/changed files only
BytesSkipped atomic.Int64 // Bytes from unchanged files
BytesProcessed atomic.Int64 // Actual bytes processed (for ETA calculation)
ChunksCreated atomic.Int64
BlobsCreated atomic.Int64
BlobsUploaded atomic.Int64
BytesUploaded atomic.Int64
UploadDurationMs atomic.Int64 // Total milliseconds spent uploading
CurrentFile atomic.Value // stores string
TotalSize atomic.Int64 // Total size to process (set after scan phase)
TotalFiles atomic.Int64 // Total files to process in phase 2
ProcessStartTime atomic.Value // stores time.Time when processing starts
StartTime time.Time
mu sync.RWMutex
lastDetailTime time.Time
// Upload tracking
CurrentUpload atomic.Value // stores *UploadInfo
lastChunkingTime time.Time // Track when we last showed chunking progress
}
// UploadInfo tracks current upload progress
type UploadInfo struct {
BlobHash string
Size int64
StartTime time.Time
LastLogTime time.Time
}
// ProgressReporter handles periodic progress reporting
type ProgressReporter struct {
stats *ProgressStats
ctx context.Context
cancel context.CancelFunc
wg sync.WaitGroup
detailTicker *time.Ticker
summaryTicker *time.Ticker
sigChan chan os.Signal
}
// NewProgressReporter creates a new progress reporter
func NewProgressReporter() *ProgressReporter {
stats := &ProgressStats{
StartTime: time.Now().UTC(),
lastDetailTime: time.Now().UTC(),
}
stats.CurrentFile.Store("")
ctx, cancel := context.WithCancel(context.Background())
pr := &ProgressReporter{
stats: stats,
ctx: ctx,
cancel: cancel,
summaryTicker: time.NewTicker(SummaryInterval),
detailTicker: time.NewTicker(DetailInterval),
sigChan: make(chan os.Signal, 1),
}
// Register for SIGUSR1
signal.Notify(pr.sigChan, syscall.SIGUSR1)
return pr
}
// Start begins the progress reporting
func (pr *ProgressReporter) Start() {
pr.wg.Add(1)
go pr.run()
// Print initial multi-line status
pr.printDetailedStatus()
}
// Stop stops the progress reporting
func (pr *ProgressReporter) Stop() {
pr.cancel()
pr.summaryTicker.Stop()
pr.detailTicker.Stop()
signal.Stop(pr.sigChan)
close(pr.sigChan)
pr.wg.Wait()
}
// GetStats returns the progress stats for updating
func (pr *ProgressReporter) GetStats() *ProgressStats {
return pr.stats
}
// SetTotalSize sets the total size to process (after scan phase)
func (pr *ProgressReporter) SetTotalSize(size int64) {
pr.stats.TotalSize.Store(size)
pr.stats.ProcessStartTime.Store(time.Now().UTC())
}
// run is the main progress reporting loop
func (pr *ProgressReporter) run() {
defer pr.wg.Done()
for {
select {
case <-pr.ctx.Done():
return
case <-pr.summaryTicker.C:
pr.printSummaryStatus()
case <-pr.detailTicker.C:
pr.printDetailedStatus()
case <-pr.sigChan:
// SIGUSR1 received, print detailed status
log.Info("SIGUSR1 received, printing detailed status")
pr.printDetailedStatus()
}
}
}
// printSummaryStatus prints a one-line status update
func (pr *ProgressReporter) printSummaryStatus() {
// Check if we're currently uploading
if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
// Show upload progress instead
pr.printUploadProgress(uploadInfo)
return
}
// Only show chunking progress if we've done chunking recently
pr.stats.mu.RLock()
timeSinceLastChunk := time.Since(pr.stats.lastChunkingTime)
pr.stats.mu.RUnlock()
if timeSinceLastChunk > SummaryInterval*2 {
// No recent chunking activity, don't show progress
return
}
elapsed := time.Since(pr.stats.StartTime)
bytesScanned := pr.stats.BytesScanned.Load()
bytesSkipped := pr.stats.BytesSkipped.Load()
bytesProcessed := pr.stats.BytesProcessed.Load()
totalSize := pr.stats.TotalSize.Load()
currentFile := pr.stats.CurrentFile.Load().(string)
// Calculate ETA if we have total size and are processing
etaStr := ""
if totalSize > 0 && bytesProcessed > 0 {
processStart, ok := pr.stats.ProcessStartTime.Load().(time.Time)
if ok && !processStart.IsZero() {
processElapsed := time.Since(processStart)
rate := float64(bytesProcessed) / processElapsed.Seconds()
if rate > 0 {
remainingBytes := totalSize - bytesProcessed
remainingSeconds := float64(remainingBytes) / rate
eta := time.Duration(remainingSeconds * float64(time.Second))
etaStr = fmt.Sprintf(" | ETA: %s", formatDuration(eta))
}
}
}
rate := float64(bytesScanned+bytesSkipped) / elapsed.Seconds()
// Show files processed / total files to process
filesProcessed := pr.stats.FilesProcessed.Load()
totalFiles := pr.stats.TotalFiles.Load()
// Guard against division by zero before the scan phase has set TotalSize.
percentStr := "0.0%"
if totalSize > 0 {
percentStr = fmt.Sprintf("%.1f%%", float64(bytesProcessed)/float64(totalSize)*100)
}
status := fmt.Sprintf("Snapshot progress: %d/%d files, %s/%s (%s), %s/s%s",
filesProcessed,
totalFiles,
humanize.Bytes(uint64(bytesProcessed)),
humanize.Bytes(uint64(totalSize)),
percentStr,
humanize.Bytes(uint64(rate)),
etaStr,
)
if currentFile != "" {
status += fmt.Sprintf(" | Current: %s", truncatePath(currentFile, 40))
}
log.Info(status)
}
// printDetailedStatus prints a multi-line detailed status
func (pr *ProgressReporter) printDetailedStatus() {
pr.stats.mu.Lock()
pr.stats.lastDetailTime = time.Now().UTC()
pr.stats.mu.Unlock()
elapsed := time.Since(pr.stats.StartTime)
filesScanned := pr.stats.FilesScanned.Load()
filesSkipped := pr.stats.FilesSkipped.Load()
bytesScanned := pr.stats.BytesScanned.Load()
bytesSkipped := pr.stats.BytesSkipped.Load()
bytesProcessed := pr.stats.BytesProcessed.Load()
totalSize := pr.stats.TotalSize.Load()
chunksCreated := pr.stats.ChunksCreated.Load()
blobsCreated := pr.stats.BlobsCreated.Load()
blobsUploaded := pr.stats.BlobsUploaded.Load()
bytesUploaded := pr.stats.BytesUploaded.Load()
currentFile := pr.stats.CurrentFile.Load().(string)
totalBytes := bytesScanned + bytesSkipped
rate := float64(totalBytes) / elapsed.Seconds()
log.Notice("=== Snapshot Progress Report ===")
log.Info("Elapsed time", "duration", formatDuration(elapsed))
// Calculate and show ETA if we have data
if totalSize > 0 && bytesProcessed > 0 {
processStart, ok := pr.stats.ProcessStartTime.Load().(time.Time)
if ok && !processStart.IsZero() {
processElapsed := time.Since(processStart)
processRate := float64(bytesProcessed) / processElapsed.Seconds()
if processRate > 0 {
remainingBytes := totalSize - bytesProcessed
remainingSeconds := float64(remainingBytes) / processRate
eta := time.Duration(remainingSeconds * float64(time.Second))
percentComplete := float64(bytesProcessed) / float64(totalSize) * 100
log.Info("Overall progress",
"percent", fmt.Sprintf("%.1f%%", percentComplete),
"processed", humanize.Bytes(uint64(bytesProcessed)),
"total", humanize.Bytes(uint64(totalSize)),
"rate", humanize.Bytes(uint64(processRate))+"/s",
"eta", formatDuration(eta))
}
}
}
log.Info("Files processed",
"scanned", filesScanned,
"skipped", filesSkipped,
"total", filesScanned,
"skip_rate", formatPercent(filesSkipped, filesScanned))
log.Info("Data scanned",
"new", humanize.Bytes(uint64(bytesScanned)),
"skipped", humanize.Bytes(uint64(bytesSkipped)),
"total", humanize.Bytes(uint64(totalBytes)),
"scan_rate", humanize.Bytes(uint64(rate))+"/s")
log.Info("Chunks created", "count", chunksCreated)
log.Info("Blobs status",
"created", blobsCreated,
"uploaded", blobsUploaded,
"pending", blobsCreated-blobsUploaded)
log.Info("Total uploaded to remote",
"uploaded", humanize.Bytes(uint64(bytesUploaded)),
"compression_ratio", formatRatio(bytesUploaded, bytesScanned))
if currentFile != "" {
log.Info("Current file", "path", currentFile)
}
log.Notice("=============================")
}
// Helper functions
func formatDuration(d time.Duration) string {
if d < 0 {
return "unknown"
}
if d < time.Minute {
return fmt.Sprintf("%ds", int(d.Seconds()))
}
if d < time.Hour {
return fmt.Sprintf("%dm%ds", int(d.Minutes()), int(d.Seconds())%60)
}
return fmt.Sprintf("%dh%dm", int(d.Hours()), int(d.Minutes())%60)
}
func formatPercent(numerator, denominator int64) string {
if denominator == 0 {
return "0.0%"
}
return fmt.Sprintf("%.1f%%", float64(numerator)/float64(denominator)*100)
}
func formatRatio(compressed, uncompressed int64) string {
if uncompressed == 0 {
return "1.00"
}
ratio := float64(compressed) / float64(uncompressed)
return fmt.Sprintf("%.2f", ratio)
}
func truncatePath(path string, maxLen int) string {
if len(path) <= maxLen {
return path
}
// Keep the last maxLen-3 characters and prepend "..."
return "..." + path[len(path)-(maxLen-3):]
}
// printUploadProgress prints upload progress
func (pr *ProgressReporter) printUploadProgress(info *UploadInfo) {
// This function is called repeatedly during upload, not just at start
// Don't print anything here - the actual progress is shown by ReportUploadProgress
}
// ReportUploadStart marks the beginning of a blob upload
func (pr *ProgressReporter) ReportUploadStart(blobHash string, size int64) {
info := &UploadInfo{
BlobHash: blobHash,
Size: size,
StartTime: time.Now().UTC(),
}
pr.stats.CurrentUpload.Store(info)
// Log the start of upload
log.Info("Starting blob upload",
"hash", blobHash[:8]+"...",
"size", humanize.Bytes(uint64(size)))
}
// ReportUploadComplete marks the completion of a blob upload
func (pr *ProgressReporter) ReportUploadComplete(blobHash string, size int64, duration time.Duration) {
// Clear current upload
pr.stats.CurrentUpload.Store((*UploadInfo)(nil))
// Add to total upload duration
pr.stats.UploadDurationMs.Add(duration.Milliseconds())
// Calculate speed
if duration < time.Millisecond {
duration = time.Millisecond
}
bytesPerSec := float64(size) / duration.Seconds()
bitsPerSec := bytesPerSec * 8
// Format speed
var speedStr string
if bitsPerSec >= 1e9 {
speedStr = fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
} else if bitsPerSec >= 1e6 {
speedStr = fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
} else if bitsPerSec >= 1e3 {
speedStr = fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
} else {
speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
}
log.Info("Blob upload completed",
"hash", blobHash[:8]+"...",
"size", humanize.Bytes(uint64(size)),
"duration", formatDuration(duration),
"speed", speedStr)
}
// UpdateChunkingActivity updates the last chunking time
func (pr *ProgressReporter) UpdateChunkingActivity() {
pr.stats.mu.Lock()
pr.stats.lastChunkingTime = time.Now().UTC()
pr.stats.mu.Unlock()
}
// ReportUploadProgress reports current upload progress with instantaneous speed
func (pr *ProgressReporter) ReportUploadProgress(blobHash string, bytesUploaded, totalSize int64, instantSpeed float64) {
// Update the current upload info with progress
if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
now := time.Now()
// Only log at the configured interval
if now.Sub(uploadInfo.LastLogTime) >= UploadProgressInterval {
// Format speed in bits/second using humanize
bitsPerSec := instantSpeed * 8
speedStr := humanize.SI(bitsPerSec, "bit/sec")
percent := float64(bytesUploaded) / float64(totalSize) * 100
// Calculate ETA based on current speed
etaStr := "unknown"
if instantSpeed > 0 && bytesUploaded < totalSize {
remainingBytes := totalSize - bytesUploaded
remainingSeconds := float64(remainingBytes) / instantSpeed
eta := time.Duration(remainingSeconds * float64(time.Second))
etaStr = formatDuration(eta)
}
log.Info("Blob upload progress",
"hash", blobHash[:8]+"...",
"progress", fmt.Sprintf("%.1f%%", percent),
"uploaded", humanize.Bytes(uint64(bytesUploaded)),
"total", humanize.Bytes(uint64(totalSize)),
"speed", speedStr,
"eta", etaStr)
uploadInfo.LastLogTime = now
}
}
}

internal/snapshot/scanner.go (new file, 1408 lines)

File diff suppressed because it is too large


@@ -0,0 +1,269 @@
package snapshot_test
import (
"context"
"database/sql"
"path/filepath"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
)
func TestScannerSimpleDirectory(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test directory structure
testFiles := map[string]string{
"/source/file1.txt": "Hello, world!", // 13 bytes
"/source/file2.txt": "This is another file", // 20 bytes
"/source/subdir/file3.txt": "File in subdirectory", // 20 bytes
"/source/subdir/file4.txt": "Another file in subdirectory", // 28 bytes
"/source/empty.txt": "", // 0 bytes
"/source/subdir2/file5.txt": "Yet another file", // 16 bytes
}
// Create files with specific times
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range testFiles {
dir := filepath.Dir(path)
if err := fs.MkdirAll(dir, 0755); err != nil {
t.Fatalf("failed to create directory %s: %v", dir, err)
}
if err := afero.WriteFile(fs, path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write file %s: %v", path, err)
}
// Set times
if err := fs.Chtimes(path, testTime, testTime); err != nil {
t.Fatalf("failed to set times for %s: %v", path, err)
}
}
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test database: %v", err)
}
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: int64(1024 * 16), // 16KB chunks for testing
Repositories: repos,
MaxBlobSize: int64(1024 * 1024), // 1MB blobs
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot record for testing
ctx := context.Background()
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Scan the directory
var result *snapshot.ScanResult
result, err = scanner.Scan(ctx, "/source", snapshotID)
if err != nil {
t.Fatalf("scan failed: %v", err)
}
// Verify results - we only scan regular files, not directories
if result.FilesScanned != 6 {
t.Errorf("expected 6 files scanned, got %d", result.FilesScanned)
}
// Total bytes should be the sum of all file contents
if result.BytesScanned < 97 { // At minimum we have 97 bytes of file content
t.Errorf("expected at least 97 bytes scanned, got %d", result.BytesScanned)
}
// Verify files in database - only regular files are stored
files, err := repos.Files.ListByPrefix(ctx, "/source")
if err != nil {
t.Fatalf("failed to list files: %v", err)
}
// We should have 6 files (directories are not stored)
if len(files) != 6 {
t.Errorf("expected 6 files in database, got %d", len(files))
}
// Verify specific file
file1, err := repos.Files.GetByPath(ctx, "/source/file1.txt")
if err != nil {
t.Fatalf("failed to get file1.txt: %v", err)
}
if file1.Size != 13 {
t.Errorf("expected file1.txt size 13, got %d", file1.Size)
}
if file1.Mode != 0644 {
t.Errorf("expected file1.txt mode 0644, got %o", file1.Mode)
}
// Verify chunks were created
chunks, err := repos.FileChunks.GetByFile(ctx, "/source/file1.txt")
if err != nil {
t.Fatalf("failed to get chunks for file1.txt: %v", err)
}
if len(chunks) != 1 { // Small file should be one chunk
t.Errorf("expected 1 chunk for file1.txt, got %d", len(chunks))
}
// Verify deduplication - file3.txt and file4.txt have different content
// but we should still have the correct number of unique chunks
allChunks, err := repos.Chunks.List(ctx)
if err != nil {
t.Fatalf("failed to list all chunks: %v", err)
}
// We should have at most 6 chunks (one per unique file content)
// Empty file might not create a chunk
if len(allChunks) > 6 {
t.Errorf("expected at most 6 chunks, got %d", len(allChunks))
}
}
func TestScannerLargeFile(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create a large file that will require multiple chunks
// Use random content to ensure good chunk boundaries
largeContent := make([]byte, 1024*1024) // 1MB
// Fill with pseudo-random data to ensure chunk boundaries
for i := 0; i < len(largeContent); i++ {
// Simple pseudo-random generator for deterministic tests
largeContent[i] = byte((i * 7919) ^ (i >> 3))
}
if err := fs.MkdirAll("/source", 0755); err != nil {
t.Fatal(err)
}
if err := afero.WriteFile(fs, "/source/large.bin", largeContent, 0644); err != nil {
t.Fatal(err)
}
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test database: %v", err)
}
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner with 64KB average chunk size
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: int64(1024 * 64), // 64KB average chunks
Repositories: repos,
MaxBlobSize: int64(1024 * 1024),
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot record for testing
ctx := context.Background()
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Scan the directory
var result *snapshot.ScanResult
result, err = scanner.Scan(ctx, "/source", snapshotID)
if err != nil {
t.Fatalf("scan failed: %v", err)
}
// We scan only regular files, not directories
if result.FilesScanned != 1 {
t.Errorf("expected 1 file scanned, got %d", result.FilesScanned)
}
// The file size should be at least 1MB
if result.BytesScanned < 1024*1024 {
t.Errorf("expected at least %d bytes scanned, got %d", 1024*1024, result.BytesScanned)
}
// Verify chunks
chunks, err := repos.FileChunks.GetByFile(ctx, "/source/large.bin")
if err != nil {
t.Fatalf("failed to get chunks: %v", err)
}
// With content-defined chunking, the number of chunks depends on content
// For a 1MB file, we should get at least 1 chunk
if len(chunks) < 1 {
t.Errorf("expected at least 1 chunk, got %d", len(chunks))
}
// Log the actual number of chunks for debugging
t.Logf("1MB file produced %d chunks with 64KB average chunk size", len(chunks))
// Verify chunk sequence
for i, fc := range chunks {
if fc.Idx != i {
t.Errorf("chunk %d has incorrect sequence %d", i, fc.Idx)
}
}
}
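The deterministic byte pattern above matters: content-defined chunking places boundaries based on content, so reproducible input gives reproducible chunk counts across test runs. A minimal standalone sketch of the same fill expression (hypothetical helper, extracted for illustration):

```go
package main

import "fmt"

// fill reproduces the test's deterministic pseudo-random pattern so that
// chunk boundaries are stable across runs.
func fill(n int) []byte {
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte((i * 7919) ^ (i >> 3))
	}
	return buf
}

func main() {
	a, b := fill(1024), fill(1024)
	same := true
	for i := range a {
		if a[i] != b[i] {
			same = false
			break
		}
	}
	fmt.Println("deterministic:", same)
}
```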

package snapshot
// Snapshot Metadata Export Process
// ================================
//
// The snapshot metadata contains all information needed to restore a snapshot.
// Instead of creating a custom format, we use a trimmed copy of the SQLite
// database containing only data relevant to the current snapshot.
//
// Process Overview:
// 1. After all files/chunks/blobs are backed up, create a snapshot record
// 2. Close the main database to ensure consistency
// 3. Copy the entire database to a temporary file
// 4. Open the temporary database
// 5. Delete all snapshots except the current one
// 6. Delete all orphaned records:
// - Files not referenced by any remaining snapshot
// - Chunks not referenced by any remaining files
// - Blobs not containing any remaining chunks
// - All related mapping tables (file_chunks, chunk_files, blob_chunks)
// 7. Close the temporary database
// 8. VACUUM the database to remove deleted data and compact (security critical)
// 9. Compress the binary database with zstd
// 10. Encrypt the compressed database with age (if encryption is enabled)
// 11. Upload to S3 as: metadata/{snapshot-id}/db.zst.age
// 12. Reopen the main database
//
// Advantages of this approach:
// - No custom metadata format needed
// - Reuses existing database schema and relationships
// - Binary SQLite files are portable and compress well
// - Fast restore - just decompress and open (no SQL parsing)
// - VACUUM ensures no deleted data leaks
// - Atomic and consistent snapshot of all metadata
import (
"bytes"
"context"
"database/sql"
"fmt"
"io"
"os/exec"
"path/filepath"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"go.uber.org/fx"
)
// SnapshotManager handles snapshot creation and metadata export
type SnapshotManager struct {
repos *database.Repositories
storage storage.Storer
config *config.Config
fs afero.Fs
}
// SnapshotManagerParams holds dependencies for NewSnapshotManager
type SnapshotManagerParams struct {
fx.In
Repos *database.Repositories
Storage storage.Storer
Config *config.Config
}
// NewSnapshotManager creates a new snapshot manager for dependency injection
func NewSnapshotManager(params SnapshotManagerParams) *SnapshotManager {
return &SnapshotManager{
repos: params.Repos,
storage: params.Storage,
config: params.Config,
}
}
// SetFilesystem sets the filesystem to use for all file operations
func (sm *SnapshotManager) SetFilesystem(fs afero.Fs) {
sm.fs = fs
}
// CreateSnapshot creates a new snapshot record in the database at the start of a backup.
// Deprecated: Use CreateSnapshotWithName instead for multi-snapshot support.
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version, gitRevision string) (string, error) {
return sm.CreateSnapshotWithName(ctx, hostname, "", version, gitRevision)
}
// CreateSnapshotWithName creates a new snapshot record with an optional snapshot name.
// The snapshot ID format is: hostname_name_timestamp or hostname_timestamp if name is empty.
func (sm *SnapshotManager) CreateSnapshotWithName(ctx context.Context, hostname, name, version, gitRevision string) (string, error) {
// Use short hostname (strip domain if present)
shortHostname := hostname
if idx := strings.Index(hostname, "."); idx != -1 {
shortHostname = hostname[:idx]
}
// Build snapshot ID with optional name
timestamp := time.Now().UTC().Format("2006-01-02T15:04:05Z")
var snapshotID string
if name != "" {
snapshotID = fmt.Sprintf("%s_%s_%s", shortHostname, name, timestamp)
} else {
snapshotID = fmt.Sprintf("%s_%s", shortHostname, timestamp)
}
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
VaultikVersion: types.Version(version),
VaultikGitRevision: types.GitRevision(gitRevision),
StartedAt: time.Now().UTC(),
CompletedAt: nil, // Not completed yet
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return sm.repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
return "", fmt.Errorf("creating snapshot: %w", err)
}
log.Info("Created snapshot", "snapshot_id", snapshotID)
return snapshotID, nil
}
// UpdateSnapshotStats updates the statistics for a snapshot during backup
func (sm *SnapshotManager) UpdateSnapshotStats(ctx context.Context, snapshotID string, stats BackupStats) error {
err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return sm.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID,
int64(stats.FilesScanned),
int64(stats.ChunksCreated),
int64(stats.BlobsCreated),
stats.BytesScanned,
stats.BytesUploaded,
)
})
if err != nil {
return fmt.Errorf("updating snapshot stats: %w", err)
}
return nil
}
// UpdateSnapshotStatsExtended updates snapshot statistics with extended metrics.
// This includes compression level, uncompressed blob size, and upload duration.
func (sm *SnapshotManager) UpdateSnapshotStatsExtended(ctx context.Context, snapshotID string, stats ExtendedBackupStats) error {
return sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
// First update basic stats
if err := sm.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID,
int64(stats.FilesScanned),
int64(stats.ChunksCreated),
int64(stats.BlobsCreated),
stats.BytesScanned,
stats.BytesUploaded,
); err != nil {
return err
}
// Then update extended stats
return sm.repos.Snapshots.UpdateExtendedStats(ctx, tx, snapshotID,
stats.BlobUncompressedSize,
stats.CompressionLevel,
stats.UploadDurationMs,
)
})
}
// CompleteSnapshot marks a snapshot as completed in the database.
// Metadata export is performed separately via ExportSnapshotMetadata.
func (sm *SnapshotManager) CompleteSnapshot(ctx context.Context, snapshotID string) error {
// Mark the snapshot as completed
err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return sm.repos.Snapshots.MarkComplete(ctx, tx, snapshotID)
})
if err != nil {
return fmt.Errorf("marking snapshot complete: %w", err)
}
log.Info("Completed snapshot", "snapshot_id", snapshotID)
return nil
}
// ExportSnapshotMetadata exports snapshot metadata to S3
//
// This method executes the complete snapshot metadata export process:
// 1. Creates a temporary directory for working files
// 2. Copies the main database to preserve its state
// 3. Cleans the copy to contain only current snapshot data
// 4. VACUUMs the cleaned database to purge deleted pages
// 5. Compresses and encrypts the binary database with zstd and age
// 6. Generates a compressed blob manifest for the snapshot
// 7. Uploads to S3 at: metadata/{snapshot-id}/db.zst.age and
//    metadata/{snapshot-id}/manifest.json.zst
//
// The caller is responsible for:
// - Ensuring the main database is closed before calling this method
// - Reopening the main database after this method returns
//
// This ensures database consistency during the copy operation.
func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath string, snapshotID string) error {
log.Info("Phase 3/3: Exporting snapshot metadata", "snapshot_id", snapshotID, "source_db", dbPath)
// Create temp directory for all temporary files
tempDir, err := afero.TempDir(sm.fs, "", "vaultik-snapshot-*")
if err != nil {
return fmt.Errorf("creating temp dir: %w", err)
}
log.Debug("Created temporary directory", "path", tempDir)
defer func() {
log.Debug("Cleaning up temporary directory", "path", tempDir)
if err := sm.fs.RemoveAll(tempDir); err != nil {
log.Debug("Failed to remove temp dir", "path", tempDir, "error", err)
}
}()
// Step 1: Copy database to temp file
// The main database should be closed at this point
tempDBPath := filepath.Join(tempDir, "snapshot.db")
log.Debug("Copying database to temporary location", "source", dbPath, "destination", tempDBPath)
if err := sm.copyFile(dbPath, tempDBPath); err != nil {
return fmt.Errorf("copying database: %w", err)
}
log.Debug("Database copy complete", "size", sm.getFileSize(tempDBPath))
// Step 2: Clean the temp database to only contain current snapshot data
log.Debug("Cleaning temporary database", "snapshot_id", snapshotID)
stats, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshotID)
if err != nil {
return fmt.Errorf("cleaning snapshot database: %w", err)
}
ratio := 1.0
if stats.CompressedSize > 0 {
ratio = float64(stats.UncompressedSize) / float64(stats.CompressedSize)
}
log.Info("Temporary database cleanup complete",
"db_path", tempDBPath,
"size_after_clean", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
"files", stats.FileCount,
"chunks", stats.ChunkCount,
"blobs", stats.BlobCount,
"total_compressed_size", humanize.Bytes(uint64(stats.CompressedSize)),
"total_uncompressed_size", humanize.Bytes(uint64(stats.UncompressedSize)),
"compression_ratio", fmt.Sprintf("%.2fx", ratio))
// Step 3: VACUUM the database to remove deleted data and compact
// This is critical for security - ensures no stale/deleted data is uploaded
if err := sm.vacuumDatabase(tempDBPath); err != nil {
return fmt.Errorf("vacuuming database: %w", err)
}
log.Debug("Database vacuumed", "size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))))
// Step 4: Compress and encrypt the binary database file
compressedPath := filepath.Join(tempDir, "db.zst.age")
if err := sm.compressFile(tempDBPath, compressedPath); err != nil {
return fmt.Errorf("compressing database: %w", err)
}
log.Debug("Compression complete",
"original_size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
"compressed_size", humanize.Bytes(uint64(sm.getFileSize(compressedPath))))
// Step 5: Read compressed and encrypted data for upload
finalData, err := afero.ReadFile(sm.fs, compressedPath)
if err != nil {
return fmt.Errorf("reading compressed dump: %w", err)
}
// Step 6: Generate blob manifest (before closing temp DB)
blobManifest, err := sm.generateBlobManifest(ctx, tempDBPath, snapshotID)
if err != nil {
return fmt.Errorf("generating blob manifest: %w", err)
}
// Step 7: Upload to S3 in snapshot subdirectory
// Upload database backup (compressed and encrypted)
dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
dbUploadStart := time.Now()
if err := sm.storage.Put(ctx, dbKey, bytes.NewReader(finalData)); err != nil {
return fmt.Errorf("uploading snapshot database: %w", err)
}
dbUploadDuration := time.Since(dbUploadStart)
dbUploadSpeed := float64(len(finalData)) * 8 / dbUploadDuration.Seconds() // bits per second
log.Info("Uploaded snapshot database",
"path", dbKey,
"size", humanize.Bytes(uint64(len(finalData))),
"duration", dbUploadDuration,
"speed", humanize.SI(dbUploadSpeed, "bps"))
// Upload blob manifest (compressed only, not encrypted)
manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
manifestUploadStart := time.Now()
if err := sm.storage.Put(ctx, manifestKey, bytes.NewReader(blobManifest)); err != nil {
return fmt.Errorf("uploading blob manifest: %w", err)
}
manifestUploadDuration := time.Since(manifestUploadStart)
manifestUploadSpeed := float64(len(blobManifest)) * 8 / manifestUploadDuration.Seconds() // bits per second
log.Info("Uploaded blob manifest",
"path", manifestKey,
"size", humanize.Bytes(uint64(len(blobManifest))),
"duration", manifestUploadDuration,
"speed", humanize.SI(manifestUploadSpeed, "bps"))
log.Info("Uploaded snapshot metadata",
"snapshot_id", snapshotID,
"db_size", len(finalData),
"manifest_size", len(blobManifest))
return nil
}
// CleanupStats contains statistics about cleaned snapshot database
type CleanupStats struct {
FileCount int
ChunkCount int
BlobCount int
CompressedSize int64
UncompressedSize int64
}
// cleanSnapshotDB removes all data except for the specified snapshot
//
// The cleanup is performed in a specific order to maintain referential integrity:
// 1. Delete other snapshots
// 2. Delete orphaned snapshot associations (snapshot_files, snapshot_blobs) for deleted snapshots
// 3. Delete orphaned files (not in the current snapshot)
// 4. Delete orphaned chunk-to-file mappings (references to deleted files)
// 5. Delete orphaned blobs (not in the current snapshot)
// 6. Delete orphaned blob-to-chunk mappings (references to deleted chunks)
// 7. Delete orphaned chunks (not referenced by any file)
//
// Each step is implemented as a separate method for clarity and maintainability.
func (sm *SnapshotManager) cleanSnapshotDB(ctx context.Context, dbPath string, snapshotID string) (*CleanupStats, error) {
// Open the temp database
db, err := database.New(ctx, dbPath)
if err != nil {
return nil, fmt.Errorf("opening temp database: %w", err)
}
defer func() {
if err := db.Close(); err != nil {
log.Debug("Failed to close temp database", "error", err)
}
}()
// Start a transaction
tx, err := db.BeginTx(ctx, nil)
if err != nil {
return nil, fmt.Errorf("beginning transaction: %w", err)
}
defer func() {
if rbErr := tx.Rollback(); rbErr != nil && rbErr != sql.ErrTxDone {
log.Debug("Failed to rollback transaction", "error", rbErr)
}
}()
// Execute cleanup steps in order
if err := sm.deleteOtherSnapshots(ctx, tx, snapshotID); err != nil {
return nil, fmt.Errorf("step 1 - delete other snapshots: %w", err)
}
if err := sm.deleteOrphanedSnapshotAssociations(ctx, tx, snapshotID); err != nil {
return nil, fmt.Errorf("step 2 - delete orphaned snapshot associations: %w", err)
}
if err := sm.deleteOrphanedFiles(ctx, tx, snapshotID); err != nil {
return nil, fmt.Errorf("step 3 - delete orphaned files: %w", err)
}
if err := sm.deleteOrphanedChunkToFileMappings(ctx, tx); err != nil {
return nil, fmt.Errorf("step 4 - delete orphaned chunk-to-file mappings: %w", err)
}
if err := sm.deleteOrphanedBlobs(ctx, tx, snapshotID); err != nil {
return nil, fmt.Errorf("step 5 - delete orphaned blobs: %w", err)
}
if err := sm.deleteOrphanedBlobToChunkMappings(ctx, tx); err != nil {
return nil, fmt.Errorf("step 6 - delete orphaned blob-to-chunk mappings: %w", err)
}
if err := sm.deleteOrphanedChunks(ctx, tx); err != nil {
return nil, fmt.Errorf("step 7 - delete orphaned chunks: %w", err)
}
// Commit transaction
log.Debug("[Temp DB Cleanup] Committing cleanup transaction")
if err := tx.Commit(); err != nil {
return nil, fmt.Errorf("committing transaction: %w", err)
}
// Collect statistics about the cleaned database
stats := &CleanupStats{}
// Count files
var fileCount int
err = db.QueryRowWithLog(ctx, "SELECT COUNT(*) FROM files").Scan(&fileCount)
if err != nil {
return nil, fmt.Errorf("counting files: %w", err)
}
stats.FileCount = fileCount
// Count chunks
var chunkCount int
err = db.QueryRowWithLog(ctx, "SELECT COUNT(*) FROM chunks").Scan(&chunkCount)
if err != nil {
return nil, fmt.Errorf("counting chunks: %w", err)
}
stats.ChunkCount = chunkCount
// Count blobs and get sizes
var blobCount int
var compressedSize, uncompressedSize sql.NullInt64
err = db.QueryRowWithLog(ctx, `
SELECT COUNT(*), COALESCE(SUM(compressed_size), 0), COALESCE(SUM(uncompressed_size), 0)
FROM blobs
WHERE blob_hash IN (SELECT blob_hash FROM snapshot_blobs WHERE snapshot_id = ?)
`, snapshotID).Scan(&blobCount, &compressedSize, &uncompressedSize)
if err != nil {
return nil, fmt.Errorf("counting blobs and sizes: %w", err)
}
stats.BlobCount = blobCount
stats.CompressedSize = compressedSize.Int64
stats.UncompressedSize = uncompressedSize.Int64
return stats, nil
}
// vacuumDatabase runs VACUUM on the database to remove deleted data and compact
// This is critical for security - ensures no stale/deleted data pages are uploaded
func (sm *SnapshotManager) vacuumDatabase(dbPath string) error {
log.Debug("Running VACUUM on database", "path", dbPath)
cmd := exec.Command("sqlite3", dbPath, "VACUUM;")
if output, err := cmd.CombinedOutput(); err != nil {
return fmt.Errorf("running VACUUM: %w (output: %s)", err, string(output))
}
return nil
}
// compressFile compresses a file using zstd and encrypts with age
func (sm *SnapshotManager) compressFile(inputPath, outputPath string) error {
input, err := sm.fs.Open(inputPath)
if err != nil {
return fmt.Errorf("opening input file: %w", err)
}
defer func() {
if err := input.Close(); err != nil {
log.Debug("Failed to close input file", "path", inputPath, "error", err)
}
}()
output, err := sm.fs.Create(outputPath)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() {
if err := output.Close(); err != nil {
log.Debug("Failed to close output file", "path", outputPath, "error", err)
}
}()
// Use blobgen for compression and encryption
log.Debug("Compressing and encrypting data")
writer, err := blobgen.NewWriter(output, sm.config.CompressionLevel, sm.config.AgeRecipients)
if err != nil {
return fmt.Errorf("creating blobgen writer: %w", err)
}
// Track if writer has been closed to avoid double-close
writerClosed := false
defer func() {
if !writerClosed {
if err := writer.Close(); err != nil {
log.Debug("Failed to close writer", "error", err)
}
}
}()
if _, err := io.Copy(writer, input); err != nil {
return fmt.Errorf("compressing data: %w", err)
}
// Close writer to flush all data
if err := writer.Close(); err != nil {
return fmt.Errorf("closing writer: %w", err)
}
writerClosed = true
log.Debug("Compression complete", "hash", fmt.Sprintf("%x", writer.Sum256()))
return nil
}
// copyFile copies a file from src to dst
func (sm *SnapshotManager) copyFile(src, dst string) error {
log.Debug("Opening source file for copy", "path", src)
sourceFile, err := sm.fs.Open(src)
if err != nil {
return err
}
defer func() {
log.Debug("Closing source file", "path", src)
if err := sourceFile.Close(); err != nil {
log.Debug("Failed to close source file", "path", src, "error", err)
}
}()
log.Debug("Creating destination file", "path", dst)
destFile, err := sm.fs.Create(dst)
if err != nil {
return err
}
defer func() {
log.Debug("Closing destination file", "path", dst)
if err := destFile.Close(); err != nil {
log.Debug("Failed to close destination file", "path", dst, "error", err)
}
}()
log.Debug("Copying file data")
n, err := io.Copy(destFile, sourceFile)
if err != nil {
return err
}
log.Debug("File copy complete", "bytes_copied", n)
return nil
}
// generateBlobManifest creates a compressed JSON list of all blobs in the snapshot
func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath string, snapshotID string) ([]byte, error) {
// Open the cleaned database using the database package
db, err := database.New(ctx, dbPath)
if err != nil {
return nil, fmt.Errorf("opening database: %w", err)
}
defer func() { _ = db.Close() }()
// Create repositories to access the data
repos := database.NewRepositories(db)
// Get all blobs for this snapshot
log.Debug("Querying blobs for snapshot", "snapshot_id", snapshotID)
blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
if err != nil {
return nil, fmt.Errorf("getting snapshot blobs: %w", err)
}
log.Debug("Found blobs", "count", len(blobHashes))
// Get blob details including sizes
blobs := make([]BlobInfo, 0, len(blobHashes))
totalCompressedSize := int64(0)
for _, hash := range blobHashes {
blob, err := repos.Blobs.GetByHash(ctx, hash)
if err != nil {
log.Warn("Failed to get blob details", "hash", hash, "error", err)
continue
}
if blob != nil {
blobs = append(blobs, BlobInfo{
Hash: hash,
CompressedSize: blob.CompressedSize,
})
totalCompressedSize += blob.CompressedSize
}
}
// Create manifest
manifest := &Manifest{
SnapshotID: snapshotID,
Timestamp: time.Now().UTC().Format(time.RFC3339),
BlobCount: len(blobs),
TotalCompressedSize: totalCompressedSize,
Blobs: blobs,
}
// Encode manifest
compressedData, err := EncodeManifest(manifest, sm.config.CompressionLevel)
if err != nil {
return nil, fmt.Errorf("encoding manifest: %w", err)
}
log.Info("Generated blob manifest",
"snapshot_id", snapshotID,
"blob_count", len(blobs),
"total_compressed_size", totalCompressedSize,
"manifest_size", len(compressedData))
return compressedData, nil
}
// getFileSize returns the size of a file in bytes, or -1 on error
func (sm *SnapshotManager) getFileSize(path string) int64 {
info, err := sm.fs.Stat(path)
if err != nil {
return -1
}
return info.Size()
}
// BackupStats contains statistics from a backup operation
type BackupStats struct {
FilesScanned int
BytesScanned int64
ChunksCreated int
BlobsCreated int
BytesUploaded int64
}
// ExtendedBackupStats contains additional statistics for comprehensive tracking
type ExtendedBackupStats struct {
BackupStats
BlobUncompressedSize int64 // Total uncompressed size of all referenced blobs
CompressionLevel int // Compression level used for this snapshot
UploadDurationMs int64 // Total milliseconds spent uploading to S3
}
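The compression-ratio figure reported elsewhere in this file is uncompressed size divided by compressed size; an empty snapshot can make the denominator zero, so a guard is worth having. A minimal sketch of such a helper (hypothetical, not part of the file above):

```go
package main

import "fmt"

// ratio returns uncompressed/compressed, defaulting to 1.0 when the
// compressed size is zero (e.g. an empty snapshot) to avoid NaN output.
func ratio(uncompressed, compressed int64) float64 {
	if compressed == 0 {
		return 1.0
	}
	return float64(uncompressed) / float64(compressed)
}

func main() {
	fmt.Printf("%.2fx\n", ratio(1048576, 262144)) // prints "4.00x"
	fmt.Printf("%.2fx\n", ratio(0, 0))            // prints "1.00x"
}
```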
// CleanupIncompleteSnapshots removes incomplete snapshots that don't have metadata in S3.
// This is critical for data safety: incomplete snapshots can cause deduplication to skip
// files that were never successfully backed up, resulting in data loss.
func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostname string) error {
log.Info("Checking for incomplete snapshots", "hostname", hostname)
// Get all incomplete snapshots for this hostname
incompleteSnapshots, err := sm.repos.Snapshots.GetIncompleteByHostname(ctx, hostname)
if err != nil {
return fmt.Errorf("getting incomplete snapshots: %w", err)
}
if len(incompleteSnapshots) == 0 {
log.Debug("No incomplete snapshots found")
return nil
}
log.Info("Found incomplete snapshots", "count", len(incompleteSnapshots))
// Check each incomplete snapshot for metadata in storage
for _, snapshot := range incompleteSnapshots {
// Check if metadata exists in storage (uploaded as metadata/{id}/db.zst.age)
metadataKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshot.ID)
_, err := sm.storage.Stat(ctx, metadataKey)
if err != nil {
// Metadata doesn't exist in S3 - this is an incomplete snapshot
log.Info("Cleaning up incomplete snapshot record", "snapshot_id", snapshot.ID, "started_at", snapshot.StartedAt)
// Delete the snapshot and all its associations
if err := sm.deleteSnapshot(ctx, snapshot.ID.String()); err != nil {
return fmt.Errorf("deleting incomplete snapshot %s: %w", snapshot.ID, err)
}
log.Info("Deleted incomplete snapshot record and associated data", "snapshot_id", snapshot.ID)
} else {
// Metadata exists - this snapshot was completed but database wasn't updated
// This shouldn't happen in normal operation, but mark it complete
log.Warn("Found snapshot with remote metadata but incomplete in database", "snapshot_id", snapshot.ID)
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID.String()); err != nil {
log.Error("Failed to mark snapshot as complete in database", "snapshot_id", snapshot.ID, "error", err)
}
}
}
return nil
}
// deleteSnapshot removes a snapshot and all its associations from the database
func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string) error {
// Delete snapshot_files entries
if err := sm.repos.Snapshots.DeleteSnapshotFiles(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot files: %w", err)
}
// Delete snapshot_blobs entries
if err := sm.repos.Snapshots.DeleteSnapshotBlobs(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot blobs: %w", err)
}
// Delete uploads entries (has foreign key to snapshots without CASCADE)
if err := sm.repos.Snapshots.DeleteSnapshotUploads(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot uploads: %w", err)
}
// Delete the snapshot itself
if err := sm.repos.Snapshots.Delete(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot: %w", err)
}
// Clean up orphaned data
log.Debug("Cleaning up orphaned records in main database")
if err := sm.CleanupOrphanedData(ctx); err != nil {
return fmt.Errorf("cleaning up orphaned data: %w", err)
}
return nil
}
// CleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot.
// This should be called periodically to clean up data from deleted or incomplete snapshots.
func (sm *SnapshotManager) CleanupOrphanedData(ctx context.Context) error {
// Order is important to respect foreign key constraints:
// 1. Delete orphaned files (will cascade delete file_chunks)
// 2. Delete orphaned blobs (will cascade delete blob_chunks for deleted blobs)
// 3. Delete orphaned blob_chunks (where blob exists but chunk doesn't)
// 4. Delete orphaned chunks (now safe after all blob_chunks are gone)
// Delete orphaned files (files not in any snapshot)
log.Debug("Deleting orphaned file records from database")
if err := sm.repos.Files.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
// Delete orphaned blobs (blobs not in any snapshot)
// This will cascade delete blob_chunks for deleted blobs
log.Debug("Deleting orphaned blob records from database")
if err := sm.repos.Blobs.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
// Delete orphaned blob_chunks entries
// This handles cases where the blob still exists but chunks were deleted
log.Debug("Deleting orphaned blob_chunks associations from database")
if err := sm.repos.BlobChunks.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned blob_chunks: %w", err)
}
// Delete orphaned chunks (chunks not referenced by any file)
// This must come after cleaning up blob_chunks to avoid foreign key violations
log.Debug("Deleting orphaned chunk records from database")
if err := sm.repos.Chunks.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
return nil
}
// deleteOtherSnapshots deletes all snapshots except the current one
func (sm *SnapshotManager) deleteOtherSnapshots(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
log.Debug("[Temp DB Cleanup] Deleting all snapshot records except current", "keeping", currentSnapshotID)
// First delete uploads that reference other snapshots (no CASCADE DELETE on this FK)
database.LogSQL("Execute", "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
uploadResult, err := tx.ExecContext(ctx, "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting uploads for other snapshots: %w", err)
}
uploadsDeleted, _ := uploadResult.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted upload records", "count", uploadsDeleted)
// Now we can safely delete the snapshots
database.LogSQL("Execute", "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
result, err := tx.ExecContext(ctx, "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting other snapshots: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted snapshot records from database", "count", rowsAffected)
return nil
}
// deleteOrphanedSnapshotAssociations deletes snapshot_files and snapshot_blobs for deleted snapshots
func (sm *SnapshotManager) deleteOrphanedSnapshotAssociations(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
// Delete orphaned snapshot_files
log.Debug("[Temp DB Cleanup] Deleting orphaned snapshot_files associations")
database.LogSQL("Execute", "DELETE FROM snapshot_files WHERE snapshot_id != ?", currentSnapshotID)
result, err := tx.ExecContext(ctx, "DELETE FROM snapshot_files WHERE snapshot_id != ?", currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned snapshot_files: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted snapshot_files associations", "count", rowsAffected)
// Delete orphaned snapshot_blobs
log.Debug("[Temp DB Cleanup] Deleting orphaned snapshot_blobs associations")
database.LogSQL("Execute", "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", currentSnapshotID)
result, err = tx.ExecContext(ctx, "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned snapshot_blobs: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted snapshot_blobs associations", "count", rowsAffected)
return nil
}
// deleteOrphanedFiles deletes files not in the current snapshot
func (sm *SnapshotManager) deleteOrphanedFiles(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
log.Debug("[Temp DB Cleanup] Deleting file records not referenced by current snapshot")
database.LogSQL("Execute", `DELETE FROM files WHERE NOT EXISTS (SELECT 1 FROM snapshot_files WHERE snapshot_files.file_id = files.id AND snapshot_files.snapshot_id = ?)`, currentSnapshotID)
result, err := tx.ExecContext(ctx, `
DELETE FROM files
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_files
WHERE snapshot_files.file_id = files.id
AND snapshot_files.snapshot_id = ?
)`, currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted file records from database", "count", rowsAffected)
// Note: file_chunks will be deleted via CASCADE
log.Debug("[Temp DB Cleanup] file_chunks associations deleted via CASCADE")
return nil
}
// deleteOrphanedChunkToFileMappings deletes chunk_files entries for deleted files
func (sm *SnapshotManager) deleteOrphanedChunkToFileMappings(ctx context.Context, tx *sql.Tx) error {
log.Debug("[Temp DB Cleanup] Deleting orphaned chunk_files associations")
database.LogSQL("Execute", `DELETE FROM chunk_files WHERE NOT EXISTS (SELECT 1 FROM files WHERE files.id = chunk_files.file_id)`)
result, err := tx.ExecContext(ctx, `
DELETE FROM chunk_files
WHERE NOT EXISTS (
SELECT 1 FROM files
WHERE files.id = chunk_files.file_id
)`)
if err != nil {
return fmt.Errorf("deleting orphaned chunk_files: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted chunk_files associations", "count", rowsAffected)
return nil
}
// deleteOrphanedBlobs deletes blobs not in the current snapshot
func (sm *SnapshotManager) deleteOrphanedBlobs(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
log.Debug("[Temp DB Cleanup] Deleting blob records not referenced by current snapshot")
database.LogSQL("Execute", `DELETE FROM blobs WHERE NOT EXISTS (SELECT 1 FROM snapshot_blobs WHERE snapshot_blobs.blob_hash = blobs.blob_hash AND snapshot_blobs.snapshot_id = ?)`, currentSnapshotID)
result, err := tx.ExecContext(ctx, `
DELETE FROM blobs
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_blobs
WHERE snapshot_blobs.blob_hash = blobs.blob_hash
AND snapshot_blobs.snapshot_id = ?
)`, currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted blob records from database", "count", rowsAffected)
return nil
}
// deleteOrphanedBlobToChunkMappings deletes blob_chunks entries for deleted blobs
func (sm *SnapshotManager) deleteOrphanedBlobToChunkMappings(ctx context.Context, tx *sql.Tx) error {
log.Debug("[Temp DB Cleanup] Deleting orphaned blob_chunks associations")
database.LogSQL("Execute", `DELETE FROM blob_chunks WHERE NOT EXISTS (SELECT 1 FROM blobs WHERE blobs.id = blob_chunks.blob_id)`)
result, err := tx.ExecContext(ctx, `
DELETE FROM blob_chunks
WHERE NOT EXISTS (
SELECT 1 FROM blobs
WHERE blobs.id = blob_chunks.blob_id
)`)
if err != nil {
return fmt.Errorf("deleting orphaned blob_chunks: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted blob_chunks associations", "count", rowsAffected)
return nil
}
// deleteOrphanedChunks deletes chunks not referenced by any file or blob
func (sm *SnapshotManager) deleteOrphanedChunks(ctx context.Context, tx *sql.Tx) error {
log.Debug("[Temp DB Cleanup] Deleting orphaned chunk records")
query := `
DELETE FROM chunks
WHERE NOT EXISTS (
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)
AND NOT EXISTS (
SELECT 1 FROM blob_chunks
WHERE blob_chunks.chunk_hash = chunks.chunk_hash
)`
database.LogSQL("Execute", query)
result, err := tx.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted chunk records from database", "count", rowsAffected)
return nil
}



@@ -0,0 +1,188 @@
package snapshot
import (
"context"
"database/sql"
"io"
"path/filepath"
"testing"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/afero"
)
const (
// Test age public key for encryption
testAgeRecipient = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)
// copyFile is a test helper to copy files using afero
func copyFile(fs afero.Fs, src, dst string) error {
sourceFile, err := fs.Open(src)
if err != nil {
return err
}
defer func() { _ = sourceFile.Close() }()
destFile, err := fs.Create(dst)
if err != nil {
return err
}
defer func() { _ = destFile.Close() }()
_, err = io.Copy(destFile, sourceFile)
return err
}
func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
ctx := context.Background()
fs := afero.NewOsFs()
// Create a test database
tempDir := t.TempDir()
dbPath := filepath.Join(tempDir, "test.db")
db, err := database.New(ctx, dbPath)
if err != nil {
t.Fatalf("failed to create database: %v", err)
}
repos := database.NewRepositories(db)
// Create an empty snapshot
snapshot := &database.Snapshot{
ID: "empty-snapshot",
Hostname: "test-host",
}
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Create some files and chunks not associated with any snapshot
file := &database.File{Path: "/orphan/file.txt", Size: 1000}
chunk := &database.Chunk{ChunkHash: "orphan-chunk", Size: 500}
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
if err := repos.Files.Create(ctx, tx, file); err != nil {
return err
}
return repos.Chunks.Create(ctx, tx, chunk)
})
if err != nil {
t.Fatalf("failed to create orphan data: %v", err)
}
// Close the database
if err := db.Close(); err != nil {
t.Fatalf("failed to close database: %v", err)
}
// Copy database
tempDBPath := filepath.Join(tempDir, "temp.db")
if err := copyFile(fs, dbPath, tempDBPath); err != nil {
t.Fatalf("failed to copy database: %v", err)
}
// Create a mock config for testing
cfg := &config.Config{
CompressionLevel: 3,
AgeRecipients: []string{testAgeRecipient},
}
// Create SnapshotManager with filesystem
sm := &SnapshotManager{
config: cfg,
fs: fs,
}
if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID.String()); err != nil {
t.Fatalf("failed to clean snapshot database: %v", err)
}
// Verify the cleaned database
cleanedDB, err := database.New(ctx, tempDBPath)
if err != nil {
t.Fatalf("failed to open cleaned database: %v", err)
}
defer func() {
if err := cleanedDB.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
cleanedRepos := database.NewRepositories(cleanedDB)
// Verify snapshot exists
verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
if verifySnapshot == nil {
t.Error("snapshot should exist")
}
// Verify orphan file is gone
f, err := cleanedRepos.Files.GetByPath(ctx, file.Path.String())
if err != nil {
t.Fatalf("failed to check file: %v", err)
}
if f != nil {
t.Error("orphan file should not exist")
}
// Verify orphan chunk is gone
c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash.String())
if err != nil {
t.Fatalf("failed to check chunk: %v", err)
}
if c != nil {
t.Error("orphan chunk should not exist")
}
}
func TestCleanSnapshotDBNonExistentSnapshot(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
ctx := context.Background()
fs := afero.NewOsFs()
// Create a test database
tempDir := t.TempDir()
dbPath := filepath.Join(tempDir, "test.db")
db, err := database.New(ctx, dbPath)
if err != nil {
t.Fatalf("failed to create database: %v", err)
}
// Close immediately
if err := db.Close(); err != nil {
t.Fatalf("failed to close database: %v", err)
}
// Copy database
tempDBPath := filepath.Join(tempDir, "temp.db")
if err := copyFile(fs, dbPath, tempDBPath); err != nil {
t.Fatalf("failed to copy database: %v", err)
}
// Create a mock config for testing
cfg := &config.Config{
CompressionLevel: 3,
AgeRecipients: []string{testAgeRecipient},
}
// Try to clean with non-existent snapshot
sm := &SnapshotManager{config: cfg, fs: fs}
_, err = sm.cleanSnapshotDB(ctx, tempDBPath, "non-existent-snapshot")
// Should not error - it will just delete everything
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}

internal/storage/file.go Normal file

@@ -0,0 +1,262 @@
package storage
import (
"context"
"fmt"
"io"
"os"
"path/filepath"
"strings"
"github.com/spf13/afero"
)
// FileStorer implements Storer using the local filesystem.
// It mirrors the S3 path structure for consistency.
type FileStorer struct {
fs afero.Fs
basePath string
}
// NewFileStorer creates a new filesystem storage backend.
// The basePath directory will be created if it doesn't exist.
// Uses the real OS filesystem by default; call SetFilesystem to override for testing.
func NewFileStorer(basePath string) (*FileStorer, error) {
fs := afero.NewOsFs()
// Ensure base path exists
if err := fs.MkdirAll(basePath, 0755); err != nil {
return nil, fmt.Errorf("creating base path: %w", err)
}
return &FileStorer{
fs: fs,
basePath: basePath,
}, nil
}
// SetFilesystem overrides the filesystem for testing.
func (f *FileStorer) SetFilesystem(fs afero.Fs) {
f.fs = fs
}
// fullPath returns the full filesystem path for a key.
func (f *FileStorer) fullPath(key string) string {
return filepath.Join(f.basePath, key)
}
// Put stores data at the specified key.
func (f *FileStorer) Put(ctx context.Context, key string, data io.Reader) error {
path := f.fullPath(key)
// Create parent directories
dir := filepath.Dir(path)
if err := f.fs.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("creating directories: %w", err)
}
file, err := f.fs.Create(path)
if err != nil {
return fmt.Errorf("creating file: %w", err)
}
defer func() { _ = file.Close() }()
if _, err := io.Copy(file, data); err != nil {
return fmt.Errorf("writing file: %w", err)
}
return nil
}
// PutWithProgress stores data with progress reporting.
func (f *FileStorer) PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
path := f.fullPath(key)
// Create parent directories
dir := filepath.Dir(path)
if err := f.fs.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("creating directories: %w", err)
}
file, err := f.fs.Create(path)
if err != nil {
return fmt.Errorf("creating file: %w", err)
}
defer func() { _ = file.Close() }()
// Wrap with progress tracking
pw := &progressWriter{
writer: file,
callback: progress,
}
if _, err := io.Copy(pw, data); err != nil {
return fmt.Errorf("writing file: %w", err)
}
return nil
}
// Get retrieves data from the specified key.
func (f *FileStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
path := f.fullPath(key)
file, err := f.fs.Open(path)
if err != nil {
if os.IsNotExist(err) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("opening file: %w", err)
}
return file, nil
}
// Stat returns metadata about an object without retrieving its contents.
func (f *FileStorer) Stat(ctx context.Context, key string) (*ObjectInfo, error) {
path := f.fullPath(key)
info, err := f.fs.Stat(path)
if err != nil {
if os.IsNotExist(err) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("stat file: %w", err)
}
return &ObjectInfo{
Key: key,
Size: info.Size(),
}, nil
}
// Delete removes an object.
func (f *FileStorer) Delete(ctx context.Context, key string) error {
path := f.fullPath(key)
err := f.fs.Remove(path)
if os.IsNotExist(err) {
return nil // Match S3 behavior: no error if doesn't exist
}
if err != nil {
return fmt.Errorf("removing file: %w", err)
}
return nil
}
// List returns all keys with the given prefix.
func (f *FileStorer) List(ctx context.Context, prefix string) ([]string, error) {
var keys []string
basePath := f.fullPath(prefix)
// Check if base path exists
exists, err := afero.Exists(f.fs, basePath)
if err != nil {
return nil, fmt.Errorf("checking path: %w", err)
}
if !exists {
return keys, nil // Empty list for non-existent prefix
}
err = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
// Check context cancellation
select {
case <-ctx.Done():
return ctx.Err()
default:
}
if !info.IsDir() {
// Convert back to key (relative path from basePath)
relPath, err := filepath.Rel(f.basePath, path)
if err != nil {
return fmt.Errorf("computing relative path: %w", err)
}
// Normalize path separators to forward slashes for consistency
relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
keys = append(keys, relPath)
}
return nil
})
if err != nil {
return nil, fmt.Errorf("walking directory: %w", err)
}
return keys, nil
}
// ListStream returns a channel of ObjectInfo for large result sets.
func (f *FileStorer) ListStream(ctx context.Context, prefix string) <-chan ObjectInfo {
ch := make(chan ObjectInfo)
go func() {
defer close(ch)
basePath := f.fullPath(prefix)
// Check if base path exists
exists, err := afero.Exists(f.fs, basePath)
if err != nil {
ch <- ObjectInfo{Err: fmt.Errorf("checking path: %w", err)}
return
}
if !exists {
return // Empty channel for non-existent prefix
}
_ = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
// Check context cancellation
select {
case <-ctx.Done():
ch <- ObjectInfo{Err: ctx.Err()}
return ctx.Err()
default:
}
if err != nil {
ch <- ObjectInfo{Err: err}
return nil // Continue walking despite errors
}
if !info.IsDir() {
relPath, err := filepath.Rel(f.basePath, path)
if err != nil {
ch <- ObjectInfo{Err: fmt.Errorf("computing relative path: %w", err)}
return nil
}
// Normalize path separators
relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
ch <- ObjectInfo{
Key: relPath,
Size: info.Size(),
}
}
return nil
})
}()
return ch
}
// Info returns human-readable storage location information.
func (f *FileStorer) Info() StorageInfo {
return StorageInfo{
Type: "file",
Location: f.basePath,
}
}
// progressWriter wraps an io.Writer to track write progress.
type progressWriter struct {
writer io.Writer
written int64
callback ProgressCallback
}
func (pw *progressWriter) Write(p []byte) (int, error) {
n, err := pw.writer.Write(p)
if n > 0 {
pw.written += int64(n)
if pw.callback != nil {
if callbackErr := pw.callback(pw.written); callbackErr != nil {
return n, callbackErr
}
}
}
return n, err
}

Some files were not shown because too many files have changed in this diff.