28 Commits

Author SHA1 Message Date
162d76bb38 Merge branch 'main' into fix/issue-27 2026-02-16 06:17:51 +01:00
clawbot
bfd7334221 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:17:24 -08:00
user
9b32bf0846 fix: replace table name allowlist with regex sanitization
Replace the hardcoded validTableNames allowlist with a regexp that
only allows [a-z0-9_] characters. This prevents SQL injection without
requiring maintenance of a separate allowlist when new tables are added.

Addresses review feedback from @sneak on PR #32.
2026-02-15 21:15:49 -08:00
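(A minimal sketch of the pattern these two commits describe; the `getTableCount` shape comes from the earlier allowlist commit below, and the receiver and exact signature are assumptions.)
```go
// Only [a-z0-9_] is accepted, per the sanitization rule in this commit.
var tableNameRe = regexp.MustCompile(`^[a-z0-9_]+$`)

func (d *DB) getTableCount(ctx context.Context, table string) (int, error) {
	// Identifiers cannot be bound as SQL parameters, so validate before
	// interpolating the table name into the query.
	if !tableNameRe.MatchString(table) {
		return 0, fmt.Errorf("invalid table name: %q", table)
	}
	var n int
	err := d.conn.QueryRowContext(ctx,
		fmt.Sprintf("SELECT COUNT(*) FROM %s", table)).Scan(&n)
	return n, err
}
```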
8adc668fa6 Merge pull request 'Prevent double-close of blobgen.Writer in CompressStream (closes #28)' (#33) from fix/issue-28 into main
Reviewed-on: #33
2026-02-16 06:04:33 +01:00
clawbot
441c441eca fix: prevent double-close of blobgen.Writer in CompressStream
CompressStream had both a defer w.Close() and an explicit w.Close() call,
causing the compressor and encryptor to be closed twice. The second close
on the zstd encoder returns an error, and the age encryptor may write
duplicate finalization bytes, potentially corrupting the output stream.

Use a closed flag to prevent the deferred close from running after the
explicit close succeeds.
2026-02-08 12:03:36 -08:00
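(A sketch of the closed-flag pattern this commit describes; apart from `blobgen.Writer`, the names and signatures here are assumptions.)
```go
func CompressStream(dst io.Writer, src io.Reader) error {
	w, err := blobgen.NewWriter(dst) // assumed constructor for blobgen.Writer
	if err != nil {
		return err
	}
	closed := false
	defer func() {
		if !closed {
			_ = w.Close() // runs only on early-error paths
		}
	}()
	if _, err := io.Copy(w, src); err != nil {
		return err
	}
	// The explicit close flushes the zstd encoder and finalizes age encryption.
	if err := w.Close(); err != nil {
		return err
	}
	closed = true
	return nil
}
```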
clawbot
4d9f912a5f fix: validate table name against allowlist in getTableCount to prevent SQL injection
The getTableCount method used fmt.Sprintf to interpolate a table name directly
into a SQL query. While currently only called with hardcoded names, this is a
dangerous pattern. Added an allowlist of valid table names and return an error
for unrecognized names.
2026-02-08 12:03:18 -08:00
46c2ea3079 fix: remove dead deep-verify TODO stub, route to RunDeepVerify
The VerifySnapshotWithOptions method had a dead code path for opts.Deep
that printed 'not yet implemented' and returned nil. The CLI already
routes --deep to RunDeepVerify (which is fully implemented). Remove the
dead branch and update the VerifySnapshot convenience method to also
route deep=true to RunDeepVerify.

Fixes #2
2026-02-08 08:33:18 -08:00
470bf648c4 Add deterministic deduplication, rclone backend, and database purge command
- Implement deterministic blob hashing using double SHA256 of uncompressed
  plaintext data, enabling deduplication even after local DB is cleared
- Add Stat() check before blob upload to skip existing blobs in storage
- Add rclone storage backend for additional remote storage options
- Add 'vaultik database purge' command to erase local state DB
- Add 'vaultik remote check' command to verify remote connectivity
- Show configured snapshots in 'vaultik snapshot list' output
- Skip macOS resource fork files (._*) when listing remote snapshots
- Use multi-threaded zstd compression (CPUs - 2 threads)
- Add writer tests for double hashing behavior
2026-01-28 15:50:17 -08:00
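(A sketch of the deterministic hashing and upload-skip described above; `Stat()` and the `storage.Storer` interface are mentioned in this repo, but the exact signatures are assumptions.)
```go
// Deterministic blob naming: double SHA256 of the uncompressed plaintext,
// so identical content maps to the same blob even after the local DB is cleared.
func blobID(plaintext []byte) string {
	first := sha256.Sum256(plaintext)
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

// Skip the upload when the blob already exists in remote storage.
func uploadIfMissing(ctx context.Context, store storage.Storer, key string, data []byte) error {
	if _, err := store.Stat(ctx, key); err == nil {
		return nil // blob already present; deduplicated
	}
	return store.Put(ctx, key, bytes.NewReader(data))
}
```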
bdaaadf990 Add --quiet flag, --json output, and config permission check
- Add global --quiet/-q flag to suppress non-error output
- Add --json flag to verify, snapshot rm, and prune commands
- Add config file permission check (warns if world/group readable)
- Update TODO.md to remove completed items
2026-01-16 09:20:29 -08:00
417b25a5f5 Add custom types, version command, and restore --verify flag
- Add internal/types package with type-safe wrappers for IDs, hashes,
  paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, go version
- Add `--verify` flag to restore command that checksums all restored
  files against expected chunk hashes with progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
2026-01-14 17:11:52 -08:00
2afd54d693 Add exclude patterns, snapshot prune, and other improvements
- Implement exclude patterns with anchored pattern support:
  - Patterns starting with / only match from root of source dir
  - Unanchored patterns match anywhere in path
  - Support for glob patterns (*.log, .*, **/*.pack)
  - Directory patterns skip entire subtrees
  - Add gobwas/glob dependency for pattern matching
  - Add 16 comprehensive tests for exclude functionality

- Add snapshot prune command to clean orphaned data:
  - Removes incomplete snapshots from database
  - Cleans orphaned files, chunks, and blobs
  - Runs automatically at backup start for consistency

- Add snapshot remove command for deleting snapshots

- Add VAULTIK_AGE_SECRET_KEY environment variable support

- Fix duplicate fx module provider in restore command

- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
2026-01-01 05:42:56 -08:00
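(A sketch of the anchored/unanchored semantics described above, using the gobwas/glob dependency this commit adds; the helper is illustrative, not the repo's actual code.)
```go
// matchesExclude applies one pattern to a path relative to the source dir.
func matchesExclude(pattern, relPath string) bool {
	if strings.HasPrefix(pattern, "/") {
		// Anchored: only matches from the root of the source directory.
		return glob.MustCompile(strings.TrimPrefix(pattern, "/"), '/').Match(relPath)
	}
	// Unanchored: matches anywhere in the path, including the basename.
	if glob.MustCompile(pattern, '/').Match(path.Base(relPath)) {
		return true
	}
	return glob.MustCompile("**/"+pattern, '/').Match(relPath)
}
```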
05286bed01 Batch transactions per blob for improved performance
Previously, each chunk and blob_chunk was inserted in a separate
transaction, leading to ~560k+ transactions for large backups.
This change batches all database operations per blob:

- Chunks are queued in packer.pendingChunks during file processing
- When blob finalizes, one transaction inserts all chunks, blob_chunks,
  and updates the blob record
- Scanner tracks pending chunk hashes to know which files can be flushed
- Files are flushed when all their chunks are committed to DB
- Database is consistent after each blob finalize

This reduces transaction count from O(chunks) to O(blobs), which for a
614k file / 44GB backup means ~50-100 transactions instead of ~560k.
2025-12-23 19:07:26 +07:00
f2c120f026 Merge feature/pluggable-storage-backend
- Add pluggable storage backend with file:// URL support
- Fix FK constraint errors in batched file insertion
- Cache chunk hashes in memory for faster lookups
- Remove dangerous database recovery that corrupted DBs after Ctrl+C
- Add PROCESS.md documenting snapshot creation lifecycle
2025-12-23 18:50:21 +07:00
bbe09ec5b5 Remove dangerous database recovery that deleted journal/WAL files
SQLite handles crash recovery automatically when opening a database.
The previous recoverDatabase() function was deleting journal and WAL
files BEFORE opening the database, which prevented SQLite from
recovering incomplete transactions and caused database corruption
after Ctrl+C or crashes.

This was causing "database disk image is malformed" errors after
interrupting a backup operation.
2025-12-23 09:16:01 +07:00
43a69c2cfb Fix FK constraint errors in batched file insertion
Generate file UUIDs upfront in checkFileInMemory() rather than
deferring to Files.Create(). This ensures file_chunks and chunk_files
records have valid FileID values when constructed during file
processing, before the batch insert transaction.

Root cause: For new files, file.ID was empty when building the
fileChunks and chunkFiles slices. The ID was only generated later
in Files.Create(), but by then the slices already had empty FileID
values, causing FK constraint failures.

Also adds PROCESS.md documenting the snapshot creation lifecycle,
database transactions, and FK dependency ordering.
2025-12-19 19:48:48 +07:00
899448e1da Cache chunk hashes in memory for faster small file processing
Load all known chunk hashes into an in-memory map at scan start,
eliminating per-chunk database queries during file processing.
This significantly improves performance when backing up many small files.
2025-12-19 12:56:04 +07:00
24c5e8c5a6 Refactor: Create file records only after successful chunking
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
2025-12-19 12:40:45 +07:00
40fff09594 Update progress output format with compact file counts
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s

- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
2025-12-19 12:33:38 +07:00
8a8651c690 Fix foreign key error when deleting incomplete snapshots
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.
2025-12-19 12:27:05 +07:00
a1d559c30d Improve processing progress output with bytes and blob messages
- Show bytes processed/total instead of just files
- Display data rate in bytes/sec
- Calculate ETA based on bytes (more accurate than files)
- Print message when each blob is stored with size and speed
2025-12-19 12:24:55 +07:00
88e2508dc7 Eliminate redundant filesystem traversal in scan phase
Remove the separate enumerateFiles() function that was doing a full
directory walk using Readdir() which calls stat() on every file.
Instead, build the existingFiles map during the scan phase walk,
and detect deleted files afterward.

This eliminates one full filesystem traversal, significantly speeding
up the scan phase for large directories.
2025-12-19 12:15:13 +07:00
c3725e745e Optimize scan phase: in-memory change detection and batched DB writes
Performance improvements:
- Load all known files from DB into memory at startup
- Check file changes against in-memory map (no per-file DB queries)
- Batch database writes in groups of 1000 files per transaction
- Scan phase now only counts regular files, not directories

This should improve scan speed from ~600 files/sec to potentially
10,000+ files/sec by eliminating per-file database round trips.
2025-12-19 12:08:47 +07:00
badc0c07e0 Add pluggable storage backend, PID locking, and improved scan progress
Storage backend:
- Add internal/storage package with Storer interface
- Implement FileStorer for local filesystem storage (file:// URLs)
- Implement S3Storer wrapping existing s3.Client
- Support storage_url config field (s3:// or file://)
- Migrate all consumers to use storage.Storer interface

PID locking:
- Add internal/pidlock package to prevent concurrent instances
- Acquire lock before app start, release on exit
- Detect stale locks from crashed processes

Scan progress improvements:
- Add fast file enumeration pass before stat() phase
- Use enumerated set for deletion detection (no extra filesystem access)
- Show progress with percentage, files/sec, elapsed time, and ETA
- Change "changed" to "changed/new" for clarity

Config improvements:
- Add tilde expansion for paths (~/)
- Use xdg library for platform-specific default index path
2025-12-19 11:52:51 +07:00
cda0cf865a Add ARCHITECTURE.md documenting internal design
Document the data model, type instantiation flow, and module
responsibilities. Covers chunker, packer, vaultik, cli, snapshot,
and database modules with detailed explanations of relationships
between File, Chunk, Blob, and Snapshot entities.
2025-12-18 19:49:42 -08:00
0736bd070b Add godoc documentation to exported types and methods
Add proper godoc comments to exported items in:
- internal/globals: Appname, Version, Commit variables; Globals type; New function
- internal/log: LogLevel type; level constants; Config type; Initialize, Fatal,
  Error, Warn, Notice, Info, Debug functions and variants; TTYHandler type and
  methods; Module variable; LogOptions type
2025-12-18 18:51:52 -08:00
d7cd9aac27 Add end-to-end integration tests for Vaultik
- Create comprehensive integration tests with mock S3 client
- Add in-memory filesystem and SQLite database support for testing
- Test full backup workflow including chunking, packing, and uploading
- Add test to verify encrypted blob content
- Fix scanner to use afero filesystem for temp file cleanup
- Demonstrate successful backup and verification with mock dependencies
2025-07-26 15:52:23 +02:00
bb38f8c5d6 Integrate afero filesystem abstraction library
- Add afero.Fs field to Vaultik struct for filesystem operations
- Vaultik now owns and manages the filesystem instance
- SnapshotManager receives filesystem via SetFilesystem() setter
- Update blob packer to use afero for temporary files
- Convert all filesystem operations to use afero abstraction
- Remove filesystem module - Vaultik manages filesystem directly
- Update tests: remove symlink test (unsupported by afero memfs)
- Fix TestMultipleFileChanges to handle scanner examining directories

This enables full end-to-end testing without touching disk by using
memory-backed filesystems. Database operations continue using real
filesystem as SQLite requires actual files.
2025-07-26 15:33:18 +02:00
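(An illustration of what the memory-backed filesystem enables in tests; the test below is a sketch, not one of the repo's actual tests.)
```go
func TestBackupReadsFromMemFs(t *testing.T) {
	fs := afero.NewMemMapFs()
	if err := afero.WriteFile(fs, "/src/hello.txt", []byte("hello"), 0o644); err != nil {
		t.Fatal(err)
	}
	// Everything under test receives this afero.Fs, so no real disk is
	// touched (except SQLite, as noted in the commit message).
	data, err := afero.ReadFile(fs, "/src/hello.txt")
	if err != nil || string(data) != "hello" {
		t.Fatalf("unexpected read: %q, %v", data, err)
	}
}
```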
e29a995120 Refactor: Move Vaultik struct and methods to internal/vaultik package
- Created new internal/vaultik package with unified Vaultik struct
- Moved all command methods (snapshot, info, prune, verify) from CLI to vaultik package
- Implemented single constructor that handles crypto capabilities automatically
- Added CanDecrypt() method to check if decryption is available
- Updated all CLI commands to use the new vaultik.Vaultik struct
- Removed old fragmented App structs and WithCrypto wrapper
- Fixed context management - Vaultik now owns its context lifecycle
- Cleaned up package imports and dependencies

This creates a cleaner separation between CLI/Cobra code and business logic,
with all vaultik operations now centralized in the internal/vaultik package.
2025-07-26 14:47:26 +02:00
90 changed files with 12442 additions and 3433 deletions

ARCHITECTURE.md (new file)

@@ -0,0 +1,380 @@
# Vaultik Architecture
This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.
## Overview
Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using [uber-go/fx](https://github.com/uber-go/fx).
## Data Flow
```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │  Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │  Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │  Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │  Uploads blobs to remote storage
└─────────────────┘
```
## Data Model
### Core Entities
The database tracks five primary entities and their relationships:
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Snapshot   │─────▶│     File     │─────▶│    Chunk     │
└──────┬───────┘      └──────────────┘      └──────┬───────┘
       │                                           │
       ▼                                           ▼
┌──────────────┐                           ┌──────────────┐
│     Blob     │◀──────────────────────────│  BlobChunk   │
└──────────────┘                           └──────────────┘
```
### Entity Descriptions
#### File (`database.File`)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)
#### Chunk (`database.Chunk`)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- `ChunkHash`: SHA256 hash of chunk content (primary key)
- `Size`: Chunk size in bytes
Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for 64KB average).
#### FileChunk (`database.FileChunk`)
Maps files to their constituent chunks:
- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk
#### Blob (`database.Blob`)
The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:
- `ID`: UUID assigned at creation
- `Hash`: SHA256 of final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps
Blob creation process:
1. Chunks are accumulated (up to MaxBlobSize, typically 10GB)
2. Compressed with zstd
3. Encrypted with age (recipients configured in config)
4. SHA256 hash computed → becomes filename in S3
5. Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
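As a small illustration of step 5, the object key can be derived from the hash alone (the helper name is ours, not necessarily the repo's):
```go
// blobKey fans objects out on the first two hash bytes:
// blobs/{hash[0:2]}/{hash[2:4]}/{hash}
func blobKey(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
}
```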
#### BlobChunk (`database.BlobChunk`)
Maps chunks to their position within blobs:
- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size
#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until snapshot finishes successfully
#### SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.
### Relationship Summary
```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File 1──────────▶ N FileChunk N ◀────────── 1 Chunk
Blob 1──────────▶ N BlobChunk N ◀────────── 1 Chunk
```
## Type Instantiation
### Application Startup
The CLI uses fx for dependency injection. Here's the instantiation order:
```go
// cli/app.go: NewApp()
fx.New(
	fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
	fx.Supply(opts.LogOptions),                    // 2. Log options
	fx.Provide(globals.New),                       // 3. Globals
	fx.Provide(log.New),                           // 4. Logger config
	config.Module,                                 // 5. Config
	database.Module,                               // 6. Database + Repositories
	log.Module,                                    // 7. Logger initialization
	s3.Module,                                     // 8. S3 client
	snapshot.Module,                               // 9. SnapshotManager + ScannerFactory
	fx.Provide(vaultik.New),                       // 10. Vaultik orchestrator
)
```
### Key Type Instantiation Points
#### 1. Config (`config.Config`)
- **Created by**: `config.Module` via `config.LoadConfig()`
- **When**: Application startup (fx DI)
- **Contains**: All configuration from YAML file (S3 credentials, encryption keys, paths, etc.)
#### 2. Database (`database.DB`)
- **Created by**: `database.Module` via `database.New()`
- **When**: Application startup (fx DI)
- **Contains**: SQLite connection, path reference
#### 3. Repositories (`database.Repositories`)
- **Created by**: `database.Module` via `database.NewRepositories()`
- **When**: Application startup (fx DI)
- **Contains**: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)
#### 4. Vaultik (`vaultik.Vaultik`)
- **Created by**: `vaultik.New(VaultikParams)`
- **When**: Application startup (fx DI)
- **Contains**: All dependencies for backup operations
```go
type Vaultik struct {
	Globals         *globals.Globals
	Config          *config.Config
	DB              *database.DB
	Repositories    *database.Repositories
	S3Client        *s3.Client
	ScannerFactory  snapshot.ScannerFactory
	SnapshotManager *snapshot.SnapshotManager
	Shutdowner      fx.Shutdowner
	Fs              afero.Fs

	ctx    context.Context
	cancel context.CancelFunc
}
```
#### 5. SnapshotManager (`snapshot.SnapshotManager`)
- **Created by**: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- **When**: Application startup (fx DI)
- **Responsibility**: Creates/completes snapshots, exports metadata to S3
#### 6. Scanner (`snapshot.Scanner`)
- **Created by**: `ScannerFactory(ScannerParams)`
- **When**: Each `CreateSnapshot()` call
- **Contains**: Chunker, Packer, progress reporter
```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
	EnableProgress: !opts.Cron,
	Fs:             v.Fs,
})
```
#### 7. Chunker (`chunker.Chunker`)
- **Created by**: `chunker.NewChunker(avgChunkSize)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
- `avgChunkSize`: From config (typically 64KB)
- `minChunkSize`: avgChunkSize / 4
- `maxChunkSize`: avgChunkSize * 4
#### 8. Packer (`blob.Packer`)
- **Created by**: `blob.NewPacker(PackerConfig)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
- `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
- `CompressionLevel`: zstd level (1-19)
- `Recipients`: age public keys for encryption
```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
	MaxBlobSize:      cfg.MaxBlobSize,
	CompressionLevel: cfg.CompressionLevel,
	Recipients:       cfg.AgeRecipients,
	Repositories:     cfg.Repositories,
	Fs:               cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```
## Module Responsibilities
### `internal/cli`
Entry point for fx application. Combines all modules and handles signal interrupts.
Key functions:
- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper
### `internal/vaultik`
Main orchestrator containing all dependencies and command implementations.
Key methods:
- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots
### `internal/chunker`
Content-defined chunking using FastCDC algorithm.
Key types:
- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize
Key methods:
- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)
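A usage sketch of the streaming interface; the exact callback signature is an assumption based on the `Chunk` fields listed above:
```go
ck := chunker.NewChunker(64 * 1024) // 64KB average chunk size

f, err := os.Open("/path/to/file")
if err != nil {
	return err
}
defer f.Close()

// Chunks are handed to the callback one at a time, so only a single
// chunk needs to be held in memory.
err = ck.ChunkReaderStreaming(f, func(c *chunker.Chunk) error {
	fmt.Printf("chunk %s at offset %d (%d bytes)\n", c.Hash, c.Offset, c.Size)
	return nil
})
```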
### `internal/blob`
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.
Key types:
- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload
Key methods:
- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload
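A sketch of the add/finalize loop these methods imply; `ErrBlobSizeLimitExceeded` and the retry shape are taken from the scanner code quoted in PROCESS.md below, the rest is illustrative:
```go
for _, c := range chunks {
	err := packer.AddChunk(&blob.ChunkRef{Hash: c.Hash, Data: c.Data})
	if err == blob.ErrBlobSizeLimitExceeded {
		// Current blob is full: compress, encrypt, hash, then retry the chunk.
		if err := packer.FinalizeBlob(); err != nil {
			return err
		}
		err = packer.AddChunk(&blob.ChunkRef{Hash: c.Hash, Data: c.Data})
	}
	if err != nil {
		return err
	}
}
// Finalize whatever remains in the in-progress blob.
if err := packer.Flush(); err != nil {
	return err
}
```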
### `internal/snapshot`
#### Scanner
Orchestrates the backup process for a directory.
Key methods:
- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation
Scan phases:
1. **Phase 0**: Detect deleted files from previous snapshots
2. **Phase 1**: Walk directory, identify files needing processing
3. **Phase 2**: Process files (chunk → pack → upload)
#### SnapshotManager
Manages snapshot lifecycle and metadata export.
Key methods:
- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots
### `internal/database`
SQLite database for local index. Single-writer mode for thread safety.
Key types:
- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces
Repository interfaces:
- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)
## Snapshot Creation Flow
```
CreateSnapshot(opts)
├─► CleanupIncompleteSnapshots() // Critical: avoid dedup errors
├─► SnapshotManager.CreateSnapshot() // Create DB record
├─► For each source directory:
│ │
│ ├─► scanner.Scan(ctx, path, snapshotID)
│ │ │
│ │ ├─► Phase 0: detectDeletedFiles()
│ │ │
│ │ ├─► Phase 1: scanPhase()
│ │ │ Walk directory
│ │ │ Check file metadata changes
│ │ │ Build list of files to process
│ │ │
│ │ └─► Phase 2: processPhase()
│ │ For each file:
│ │ chunker.ChunkReaderStreaming()
│ │ For each chunk:
│ │ packer.AddChunk()
│ │ If blob full → FinalizeBlob()
│ │ → handleBlobReady()
│ │ → s3Client.PutObjectWithProgress()
│ │ packer.Flush() // Final blob
│ │
│ └─► Accumulate statistics
├─► SnapshotManager.UpdateSnapshotStatsExtended()
├─► SnapshotManager.CompleteSnapshot()
└─► SnapshotManager.ExportSnapshotMetadata()
├─► Copy database to temp file
├─► Clean to only current snapshot data
├─► Dump to SQL
├─► Compress with zstd
├─► Encrypt with age
├─► Upload db.zst.age to S3
└─► Upload manifest.json.zst to S3
```
## Deduplication Strategy
1. **File-level**: Files unchanged since last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)
2. **Chunk-level**: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.
3. **Blob-level**: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
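A sketch of the first two checks, assuming the in-memory maps described elsewhere in this document; field and method names are illustrative:
```go
// File-level: skip files whose metadata is unchanged since the last backup.
func fileUnchanged(prev, cur *database.File) bool {
	return prev.Size == cur.Size && prev.Mtime == cur.Mtime &&
		prev.Mode == cur.Mode && prev.UID == cur.UID && prev.GID == cur.GID
}

// Chunk-level: content-addressed by SHA256; a known hash is never re-packed.
func (s *Scanner) chunkIsNew(hash string) bool {
	_, known := s.knownChunks[hash] // in-memory cache loaded at scan start
	return !known
}
```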
## Storage Layout in S3
```
bucket/
├── blobs/
│ └── {hash[0:2]}/
│ └── {hash[2:4]}/
│ └── {full-hash} # Compressed+encrypted blob
└── metadata/
└── {snapshot-id}/
├── db.zst.age # Encrypted database dump
└── manifest.json.zst # Blob list (for verification)
```
## Thread Safety
- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.


@@ -10,6 +10,9 @@ Read the rules in AGENTS.md and follow them.
corporate advertising for Anthropic and is therefore completely
unacceptable in commit messages.
* NEVER use `git add -A`. Always add only the files you intentionally
changed.
* Tests should always be run before committing code. No commits should be
made that do not pass tests.
@@ -33,6 +36,9 @@ Read the rules in AGENTS.md and follow them.
* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
estimate about 4 seconds per gigabyte of backup time.
* When running tests, don't run individual tests, or grep the output. run the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or grep the output. run
the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or try to grep the output. never run "go test". only ever run "make test" to run the full test suite, and examine the full output.
* When running tests, don't run individual tests, or try to grep the output.
never run "go test". only ever run "make test" to run the full test
suite, and examine the full output.

DESIGN.md (deleted)

@@ -1,387 +0,0 @@
# vaultik: Design Document
`vaultik` is a secure backup tool written in Go. It performs
streaming backups using content-defined chunking, blob grouping, asymmetric
encryption, and object storage. The system is designed for environments
where the backup source host cannot store secrets and cannot retrieve or
decrypt any data from the destination.
The source host is **stateful**: it maintains a local SQLite index to detect
changes, deduplicate content, and track uploads across backup runs. All
remote storage is encrypted and append-only. Pruning of unreferenced data is
done from a trusted host with access to decryption keys, as even the
metadata indices are encrypted in the blob store.
---
## Why
ANOTHER backup tool??
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
My requirements are:
* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## Design Goals
1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and
a private key.
4. Prune must be possible, although this requires a private key so must be
done on different hosts.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age)
(X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs.
This reduces the number of objects in the blob store for filesystems with
many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is
chunked into multiple encrypted `.age` parts, to support large
filesystems.
10. CLI interface is structured using `cobra`.
---
## S3 Bucket Layout
S3 stores only four things:
1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the
filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes
referenced in the snapshot, enabling pruning without decryption.
```
s3://<bucket>/<prefix>/
├── blobs/
│ ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│ ├── <snapshot_id>.sqlite.age
│ ├── <snapshot_id>.sqlite.00.age
│ ├── <snapshot_id>.sqlite.01.age
│ ├── <snapshot_id>.manifest.json.zst
```
To retrieve a given file, you would:
* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct
the full database file
* verify the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs
If clever, it may be possible to do this chunk by chunk without touching
disk (except for the output file) as each uncompressed blob should fit in
memory (<10GB).
### Path Rules
* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first 2 hex bytes of the blob hash.
### Blob Manifest Format
The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:
```json
{
"snapshot_id": "2023-10-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0...",
...
]
}
```
This allows pruning operations to determine which blobs are referenced without requiring decryption keys.
---
## 3. Local SQLite Index Schema (source host)
```sql
CREATE TABLE files (
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL
);
-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx)
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
sha256 TEXT NOT NULL,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
blob_hash TEXT PRIMARY KEY,
final_hash TEXT NOT NULL,
created_ts INTEGER NOT NULL
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
vaultik_git_revision TEXT NOT NULL,
created_ts INTEGER NOT NULL,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL
);
```
---
## 4. Snapshot Metadata Schema (stored in S3)
Identical schema to the local index, filtered to live snapshot state. Stored
as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than
a configured `chunk_size`, it is split and uploaded as:
```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```
---
## 5. Data Flow
### 5.1 Backup
1. Load config
2. Open local SQLite index
3. Walk source directories:
* For each file:
* Check mtime and size in index
* If changed or new:
* Chunk file
* For each chunk:
* Hash with SHA256
* Check if already uploaded
* If not:
* Add chunk to blob packer
* Record file-chunk mapping in index
4. When blob reaches threshold size (e.g. 1GB):
* Compress with `zstd`
* Encrypt with `age`
* Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
* Record blob-chunk layout in local index
5. Once all files are processed:
* Build snapshot SQLite DB from index delta
* Compress + encrypt
* If larger than `chunk_size`, split into parts
* Upload to:
`s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create snapshot record in local index that lists:
* snapshot ID
* hostname
* vaultik version
* timestamp
* counts of files, chunks, and blobs
* list of all blobs referenced in the snapshot (some new, some old) for
efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in snapshot
13. Compress manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the
snapshot, based on local state db
### 5.2 Manual Prune
1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract set of referenced blob hashes from manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
* If the hash is not in the manifest:
* Issue `DeleteObject` to remove it
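A sketch of steps 3-6 under the layout above; the manifest struct matches the format in the previous section, while `deleteKey` and the key parsing are assumptions:
```go
type Manifest struct {
	SnapshotID string   `json:"snapshot_id"`
	BlobHashes []string `json:"blob_hashes"`
}

func prune(m Manifest, blobKeys []string, deleteKey func(string) error) error {
	referenced := make(map[string]bool, len(m.BlobHashes))
	for _, h := range m.BlobHashes {
		referenced[h] = true
	}
	for _, key := range blobKeys {
		// Keys look like blobs/<aa>/<bb>/<full_blob_hash>.zst.age.
		hash := strings.TrimSuffix(path.Base(key), ".zst.age")
		if !referenced[hash] {
			if err := deleteKey(key); err != nil { // issues DeleteObject
				return err
			}
		}
	}
	return nil
}
```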
### 5.3 Verify
Verify runs on a host that has no state, but access to the bucket.
1. Fetch latest metadata snapshot files from S3
2. Fetch latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and
reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
* Fetch the blob metadata from the snapshot db
* Ensure the blob exists in S3
* Check the S3 content hash matches the expected blob hash
* If not using --quick mode:
* Download and decrypt the blob
* Decompress and verify chunk hashes match metadata
---
## 6. CLI Commands
```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```
* `VAULTIK_PRIVATE_KEY` is required for `restore`, `prune`, `verify`, and
`fetch` commands.
* It is passed via environment variable containing the age private key.
---
## 7. Function and Method Signatures
### 7.1 CLI
```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```
### 7.2 Configuration
```go
type Config struct {
BackupPubKey string // age recipient
BackupInterval time.Duration // used in daemon mode, irrelevant for cron mode
BlobSizeLimit int64 // default 10GB
ChunkSize int64 // default 10MB
Exclude []string // list of regex of files to exclude from backup, absolute path
Hostname string
IndexPath string // path to local SQLite index db, default /var/lib/vaultik/index.db
MetadataPrefix string // S3 prefix for metadata, default "metadata/"
MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour - for daemon mode
S3 S3Config // S3 configuration
ScanInterval time.Duration // interval to full stat() scan source dirs, default 24h
SourceDirs []string // list of source directories to back up, absolute paths
}
type S3Config struct {
Endpoint string
Bucket string
Prefix string
AccessKeyID string
SecretAccessKey string
Region string
}
func Load(path string) (*Config, error)
```
### 7.3 Index
```go
type Index struct {
db *sql.DB
}
func OpenIndex(path string) (*Index, error)
func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```
### 7.4 Blob Packing
```go
type BlobWriter struct {
// internal buffer, current size, encrypted writer, etc
}
func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```
### 7.5 Metadata
```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```
### 7.6 Prune
```go
func RunPrune(bucket, prefix, privateKey string) error
```

Makefile

@@ -11,7 +11,7 @@ LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
# Default target
all: test
all: vaultik
# Run tests
test: lint fmt-check
@@ -39,8 +39,8 @@ lint:
golangci-lint run
# Build binary
build:
go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
vaultik: internal/*/*.go cmd/vaultik/*.go
go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
# Clean build artifacts
clean:
@@ -60,3 +60,10 @@ test-coverage:
# Run integration tests
test-integration:
go test -v -tags=integration ./...
local:
VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
install: vaultik
cp ./vaultik $(HOME)/bin/

PROCESS.md (new file)

@@ -0,0 +1,556 @@
# Vaultik Snapshot Creation Process
This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.
## Database Schema Overview
### Tables and Foreign Key Dependencies
```
FOREIGN KEY GRAPH

snapshot_files ──► snapshots, files
snapshot_blobs ──► snapshots, blobs
file_chunks    ──► files, chunks
chunk_files    ──► chunks, files
blob_chunks    ──► blobs, chunks
uploads        ──► blobs.blob_hash, snapshots.id
```
### Critical Constraint: `chunks` Must Exist First
These tables reference `chunks.chunk_hash` **without CASCADE**:
- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`
**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
### Order of Operations Required by Schema
```
1. snapshots (created first, before scan)
2. blobs (created when packer starts new blob)
3. chunks (created during file processing)
4. blob_chunks (created immediately after chunk added to packer)
5. files (created after file fully chunked)
6. file_chunks (created with file record)
7. chunk_files (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads (created after blob uploaded)
```
---
## Snapshot Creation Phases
### Phase 0: Initialization
**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table
**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
COMMIT
```
---
### Phase 1: Scan Directory
**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing
**Transactions:**
```
(None during scan - all in-memory)
```
---
### Phase 1b: Associate Unchanged Files
**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000
**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
... (up to 1000 inserts)
COMMIT
```
---
### Phase 2: Process Files
For each file that needs processing:
#### Step 2a: Open and Chunk File
**Location:** `processFileStreaming()`
For each chunk produced by content-defined chunking:
##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
	return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns
// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```
**Transaction:**
```
T_chunk: BEGIN
INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
COMMIT
```
##### Step 2a-3: Add Chunk to Packer
```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```
**Inside packer.AddChunk → addChunkToCurrentBlob():**
```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
	blobChunk := &database.BlobChunk{
		BlobID:    p.currentBlob.id,
		ChunkHash: chunk.Hash,
		Offset:    offset,
		Length:    chunkSize,
	}
	err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
		return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
	})
	// COMMIT immediately
}
```
**Transaction:**
```
T_blob_chunk: BEGIN
INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.
---
#### Step 2b: Blob Size Limit Handling
If adding a chunk would exceed blob size limit:
```go
if err == blob.ErrBlobSizeLimitExceeded {
	if err := s.packer.FinalizeBlob(); err != nil { ... }
	// Retry adding the chunk
	if err := s.packer.AddChunk(...); err != nil { ... }
}
```
**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
COMMIT
```
Then blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)
T_blob_uploaded: BEGIN
UPDATE blobs SET uploaded_ts=? WHERE id=?
INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
COMMIT
```
---
#### Step 2c: Queue File for Batch Insertion
After all chunks for a file are processed:
```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))
// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
	file:       fileToProcess.File,
	fileChunks: fileChunks,
	chunkFiles: chunkFiles,
})
```
**No transaction yet** - just adds to `pendingFiles` slice.
If `len(pendingFiles) >= fileBatchSize (100)`, triggers `flushPendingFiles()`.
---
### Step 2d: Flush Pending Files
**Location:** `flushPendingFiles()` - called when batch is full or at end of processing
```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	for _, data := range files {
		// 1. Create file record
		s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE

		// 2. Delete old associations
		s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
		s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)

		// 3. Create file_chunks records
		for _, fc := range data.fileChunks {
			s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
		}

		// 4. Create chunk_files records
		for _, cf := range data.chunkFiles {
			s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
		}

		// 5. Add file to snapshot
		s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
	}
	return nil
})
// COMMIT (all or nothing for the batch)
```
**Transaction:**
```
T_files_batch: BEGIN
-- For each file in batch:
INSERT OR REPLACE INTO files (...) VALUES (...)
DELETE FROM file_chunks WHERE file_id = ?
DELETE FROM chunk_files WHERE file_id = ?
INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?) -- FK: chunks
INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
-- Repeat for each file
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.
---
### Phase 2 End: Final Flush
```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }
// Final packer flush
s.packer.Flush()
```
---
## The Current Bug
### Problem
The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, but if the chunks haven't been committed yet, the FK constraint fails.
### Why It's Happening
Looking at the sequence:
1. Process file A, chunk X
2. Create chunk X in DB (Transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (Transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)
The chunks ARE being created individually, yet the constraint still fails.
### Actual Issue
Re-reading the code, the issue is:
In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
	FileID:    fileToProcess.File.ID,
	Idx:       ci.fileChunk.Idx,
	ChunkHash: ci.fileChunk.ChunkHash,
}
```
The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!
Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
	return file, true // file.ID is empty string!
}

// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```
**For NEW files, `file.ID` is empty!**
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```
But `data.fileChunks` was built with the EMPTY ID!
### The Real Problem
For new files:
1. `checkFileInMemory` creates file record with empty ID
2. `processFileStreaming` queues file_chunks with empty `FileID`
3. `flushPendingFiles` creates file (generates ID), but file_chunks still have empty `FileID`
`Files.Create` does an `INSERT OR REPLACE` by path, but the ordering is the problem: the file IS created first in the flush, yet the `fileChunks` slice was already built with the old (possibly empty) ID, and the ID is not updated after the file is created:
```go
fileChunks[i] = database.FileChunk{
	FileID: fileToProcess.File.ID, // Uses the ID from the File struct
```
In `checkFileInMemory`, new files get a file struct with no ID set, and nothing pre-generates a UUID before the `fileChunks` and `chunkFiles` slices are built.
The test failures point at a foreign key violation:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```
Error 787 is SQLite's foreign key constraint error. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`, which would mean the chunks are not in the database when the file_chunks insert runs. Tracing through more carefully:
---
## Transaction Timing Issue
The problem is transaction visibility in SQLite.
Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:
1. Chunk transactions commit one at a time
2. File batch transaction runs later
If something went wrong with transaction isolation, the file batch might not see the committed chunks. However, SQLite in WAL mode provides serializable isolation by default, so committed transactions should be visible. Another possibility is that the in-memory chunk cache is masking a database problem; re-checking the current broken code suggests the issue is simpler.
---
## Current Code Flow Analysis
Looking at `processFileStreaming` in the current broken state:
```go
// For each chunk:
if !chunkExists {
	err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
		dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
		return s.repos.Chunks.Create(txCtx, tx, dbChunk)
	})
	// ... check error ...
	s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```
Then at end of function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
	file:       fileToProcess.File,
	fileChunks: fileChunks,
	chunkFiles: chunkFiles,
})
```
At end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```
The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.
The remaining possibilities are a bug in how the chunks are referenced (incorrect chunk_hash values), the test database being recreated between operations, or something specific to the test environment setup.
---
## Summary of Object Lifecycle
| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |
---
## Root Cause Analysis
After detailed analysis, I believe the issue is one of the following:
### Hypothesis 1: File ID Not Set
Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
	return file, true // file.ID is empty string!
}
```
For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
	FileID: fileToProcess.File.ID, // Empty for new files!
	...
}
```
The `FileID` in the built `fileChunks` slice is empty.
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID

// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
	s.repos.FileChunks.Create(...) // Uses empty FileID
}
```
**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
	ID:   uuid.New().String(), // Generate ID immediately
	Path: path,
	...
}
```
### Hypothesis 2: Transaction Isolation
SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions. Committed data should be visible to subsequent transactions.
However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.
## Recommended Fix
**Step 1: Generate file IDs upfront**
In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
	ID:   uuid.New().String(), // Always generate ID
	Path: path,
	...
}
```
This ensures `file.ID` is set when building `fileChunks` and `chunkFiles` slices.
**Step 2: Verify by reverting to per-file transactions**
If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:
```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})
// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	// Create file
	s.repos.Files.Create(txCtx, tx, fileToProcess.File)

	// Delete old associations
	s.repos.FileChunks.DeleteByFileID(...)
	s.repos.ChunkFiles.DeleteByFileID(...)

	// Create new associations
	for _, fc := range fileChunks {
		s.repos.FileChunks.Create(...)
	}
	for _, cf := range chunkFiles {
		s.repos.ChunkFiles.Create(...)
	}

	// Add to snapshot
	s.repos.Snapshots.AddFileByID(...)
	return nil
})
```
**Step 3: If batching is still desired**
After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.

README.md

@@ -1,39 +1,27 @@
# vaultik (ваултик)
`vaultik` is a incremental backup daemon written in Go. It
encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. It requires no private
keys, secrets, or credentials stored on the backed-up system.
WIP: pre-1.0, some functions may not be fully implemented yet
`vaultik` is an incremental backup daemon written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.
It includes table-stakes features such as:
* modern authenticated encryption
* modern encryption (the excellent `age`)
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database
* inotify-based change detection
* streaming processing of all data to not require lots of ram or temp file
storage
* local state tracking in standard SQLite database, enables write-only
incremental backups to destination
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
down) even if the source system has many small files
## what
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally. All encrypted data is
streaming-processed and immediately discarded once uploaded. Metadata is
encrypted and pushed with the same mechanism.
## why
Existing backup software fails under one or more of these conditions:
@@ -42,16 +30,48 @@ Existing backup software fails under one or more of these conditions:
compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Creates one-blob-per-file, which results in excessive S3 operation counts
* is slow
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.

`vaultik` addresses these by using:

* Public-key-only encryption (via `age`) requires no secrets (other than
  remote storage api key) on the source system
* Local state cache for incremental detection does not require reading from
  or decrypting remote storage
* Content-addressed immutable storage allows efficient deduplication
* Storage only of large encrypted blobs of configurable size (1G by default)
  reduces S3 operation counts and improves performance

My requirements are:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## design goals
1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
## what
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
database of metadata is created, encrypted, and uploaded alongside the
blobs.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.
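To make that write path concrete, here is a minimal standalone sketch of compress-then-encrypt using the same libraries vaultik depends on (`filippo.io/age` and `github.com/klauspost/compress/zstd`); the real packer wraps this pattern, plus hashing, in its `blobgen.Writer`:

```go
import (
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// encryptStream writes zstd-compressed, age-encrypted src to dst.
// Compression sits inside the pipeline before encryption, so the
// compressor sees plaintext and the remote sees only ciphertext.
func encryptStream(dst io.Writer, src io.Reader, recipientStr string) error {
	recipient, err := age.ParseX25519Recipient(recipientStr)
	if err != nil {
		return err
	}
	encw, err := age.Encrypt(dst, recipient) // outermost: encryption
	if err != nil {
		return err
	}
	zw, err := zstd.NewWriter(encw) // inner: compression
	if err != nil {
		return err
	}
	if _, err := io.Copy(zw, src); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // flush the compressor first
		return err
	}
	return encw.Close() // then finalize the age stream
}
```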
## how
@@ -61,59 +81,63 @@ Existing backup software fails under one or more of these conditions:
```sh
go install git.eeqj.de/sneak/vaultik@latest
```
1. **generate keypair**
```sh
age-keygen -o agekey.txt
grep 'public key:' agekey.txt
```
1. **write config**
```yaml
# Named snapshots - each snapshot can contain multiple paths
snapshots:
  system:
    paths:
      - /etc
      - /home/user/data
      - /var/lib
    exclude:
      - '*.cache'   # Snapshot-specific exclusions
  home:
    paths:
      - /home/user/documents
      - /home/user/photos

# Global exclusions (apply to all snapshots)
exclude:
  - '*.log'
  - '*.tmp'
  - '.git'
  - 'node_modules'

age_recipients:
  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj

s3:
  # endpoint is optional if using AWS S3, but who even does that?
  endpoint: https://s3.example.com
  bucket: vaultik-data
  prefix: host1/
  access_key_id: ...
  secret_access_key: ...
  region: us-east-1

backup_interval: 1h
full_scan_interval: 24h
min_time_between_run: 15m
chunk_size: 10MB
blob_size_limit: 1GB
```
1. **run**
```sh
# Create all configured snapshots
vaultik --config /etc/vaultik.yaml snapshot create

# Create specific snapshots by name
vaultik --config /etc/vaultik.yaml snapshot create home system

# Silent mode for cron
vaultik --config /etc/vaultik.yaml snapshot create --cron
```
---
@@ -123,76 +147,211 @@ Existing backup software fails under one or more of these conditions:
### commands
```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags. it should be
# 'vaultik restore snapshot <snapshot> --target <dir>'. bucket and prefix are always
# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
```
### environment
* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.
### command details
**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup
**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format
**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt
**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob hashes (not just existence)
**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
**snapshot prune**: Clean orphaned data from local database
**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories
**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs
**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename
**info**: Display system and configuration information
**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes
**store info**: Display S3 bucket configuration and storage statistics
---
## architecture
### s3 bucket layout
```
s3://<bucket>/<prefix>/
├── blobs/
│ └── <aa>/<bb>/<full_blob_hash>
└── metadata/
├── <snapshot_id>/
│ ├── db.zst.age
│ └── manifest.json.zst
```
* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash (see the sketch below)
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
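The sharded key is derived directly from the blob hash; a trivial illustrative sketch (the function name is ours, not vaultik's):

```go
// blobPath returns the sharded object key for a blob hash,
// e.g. "host1/blobs/aa/bb/aabb...".
func blobPath(prefix, blobHash string) string {
	return fmt.Sprintf("%sblobs/%s/%s/%s",
		prefix, blobHash[0:2], blobHash[2:4], blobHash)
}
```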
### blob manifest format
The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
```json
{
"snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0..."
]
}
```
Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
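Because the manifest is plain zstd-compressed JSON, tooling can enumerate a snapshot's blobs without any age key. A minimal sketch (names are illustrative, not vaultik's internal API):

```go
import (
	"encoding/json"
	"io"

	"github.com/klauspost/compress/zstd"
)

// blobManifest mirrors the JSON structure shown above.
type blobManifest struct {
	SnapshotID string   `json:"snapshot_id"`
	BlobHashes []string `json:"blob_hashes"`
}

// readManifest decompresses and decodes a manifest.json.zst stream.
func readManifest(r io.Reader) (*blobManifest, error) {
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var m blobManifest
	if err := json.NewDecoder(zr).Decode(&m); err != nil {
		return nil, err
	}
	return &m, nil
}
```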
### local sqlite schema
```sql
CREATE TABLE files (
id TEXT PRIMARY KEY,
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL
);
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT NOT NULL UNIQUE,
uncompressed INTEGER NOT NULL,
compressed INTEGER NOT NULL,
uploaded_at INTEGER
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL,
total_size INTEGER NOT NULL,
blob_size INTEGER NOT NULL,
compression_ratio REAL NOT NULL
);
CREATE TABLE snapshot_files (
snapshot_id TEXT NOT NULL,
file_id TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_id)
);
CREATE TABLE snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id)
);
```
### data flow
#### backup
1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support
#### restore
1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks (see the sketch after this section)
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid
#### prune
1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set
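Step 4 of the restore flow is a plain join over the schema above. A sketch using `database/sql` (type and function names here are illustrative, not vaultik's internal API):

```go
import (
	"context"
	"database/sql"
)

// chunkLocation is where one chunk of a file lives inside a blob.
type chunkLocation struct {
	Idx       int
	ChunkHash string
	BlobHash  string
	Offset    int64
	Length    int64
}

// fileChunkPlan returns the ordered chunk list for one file, joined to
// the blob each chunk was packed into.
func fileChunkPlan(ctx context.Context, db *sql.DB, fileID string) ([]chunkLocation, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT fc.idx, fc.chunk_hash, bc.blob_hash, bc.offset, bc.length
		FROM file_chunks fc
		JOIN blob_chunks bc ON bc.chunk_hash = fc.chunk_hash
		WHERE fc.file_id = ?
		ORDER BY fc.idx`, fileID)
	if err != nil {
		return nil, err
	}
	defer func() { _ = rows.Close() }()
	var plan []chunkLocation
	for rows.Next() {
		var c chunkLocation
		if err := rows.Scan(&c.Idx, &c.ChunkHash, &c.BlobHash, &c.Offset, &c.Length); err != nil {
			return nil, err
		}
		plan = append(plan, c)
	}
	return plan, rows.Err()
}
```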
### chunking
* Content-defined chunking using the FastCDC algorithm (see the sketch below)
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
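A minimal sketch of the chunking loop, using the `fastcdc-go` library this project depends on (option values here are illustrative, not vaultik's exact configuration):

```go
import (
	"crypto/sha256"
	"fmt"
	"io"

	fastcdc "github.com/jotfs/fastcdc-go"
)

// chunkStream splits r into content-defined chunks and hashes each one.
func chunkStream(r io.Reader) error {
	chunker, err := fastcdc.NewChunker(r, fastcdc.Options{
		AverageSize: 10 * 1024 * 1024, // 10MB average, per the default above
	})
	if err != nil {
		return err
	}
	for {
		chunk, err := chunker.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		sum := sha256.Sum256(chunk.Data)
		fmt.Printf("chunk %x: %d bytes\n", sum[:8], chunk.Length)
	}
	return nil
}
```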
@@ -203,19 +362,13 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Each blob encrypted independently
* Metadata databases also encrypted
### compression
* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency
### state tracking
* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection (see the sketch below)
* Supports inotify monitoring in daemon mode
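The change check itself reduces to comparing stat results against the stored row. A sketch, with field names assumed from the schema above rather than taken from vaultik's actual structs:

```go
import "os"

// knownFile holds the stored stat info for a previously indexed file.
type knownFile struct {
	Size  int64
	MTime int64 // unix seconds, matching the mtime INTEGER column
}

// unchanged reports whether a file can be skipped, per the mtime/size
// comparison described above.
func unchanged(fi os.FileInfo, known *knownFile) bool {
	return known != nil &&
		known.Size == fi.Size() &&
		known.MTime == fi.ModTime().Unix()
}
```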
---
## does not
@@ -225,8 +378,6 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Require a symmetric passphrase or password
* Trust the source system with anything
---
## does
* Incremental deduplicated backup
@@ -238,70 +389,16 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
---
## restore
`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:
* `VAULTIK_PRIVATE_KEY`
* The bucket
The entire system is restore-only from object storage.
---
---
## requirements
* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)
## license
[MIT](https://opensource.org/license/mit/)
## author
Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

TODO.md

@@ -1,155 +1,128 @@
# Vaultik 1.0 TODO
Linear list of tasks to complete before 1.0 release.
## Rclone Storage Backend (Complete)
Add rclone as a storage backend via Go library import, allowing vaultik to use any of rclone's 70+ supported cloud storage providers.
**Configuration:**
```yaml
storage_url: "rclone://myremote/path/to/backups"
```
User must have rclone configured separately (via `rclone config`).
**Implementation Steps:**
1. [x] Add rclone dependency to go.mod
2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
- `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
- `Put` / `PutWithProgress` - use `operations.Rcat()`
- `Get` - use `fs.NewObject()` then `obj.Open()` (sketched below)
- `Stat` - use `fs.NewObject()` for size/metadata
- `Delete` - use `obj.Remove()`
- `List` / `ListStream` - use `operations.ListFn()`
- `Info` - return remote name
3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
5. [x] Test with real rclone remote
**Error Mapping:**
- `fs.ErrorObjectNotFound``ErrNotFound`
- `fs.ErrorDirNotFound``ErrNotFound`
- `fs.ErrorNotFoundInConfigFile``ErrRemoteNotFound` (new)
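A hedged sketch of the `Get` path listed in the steps above, using rclone's public library API (the error-to-`ErrNotFound` mapping is omitted, and the function name is ours, not the actual `Storer` implementation):

```go
import (
	"context"
	"io"

	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/config/configfile"
)

// rcloneOpen opens one object on a configured rclone remote.
func rcloneOpen(ctx context.Context, remote, name string) (io.ReadCloser, error) {
	configfile.Install()            // load the user's rclone.conf
	f, err := fs.NewFs(ctx, remote) // e.g. "myremote:path/to/backups"
	if err != nil {
		return nil, err
	}
	obj, err := f.NewObject(ctx, name) // fs.ErrorObjectNotFound maps to ErrNotFound
	if err != nil {
		return nil, err
	}
	return obj.Open(ctx)
}
```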
---
## CLI Polish (Priority)
1. Improve error messages throughout
- Ensure all errors include actionable context
- Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")
## Security (Priority)
1. Audit encryption implementation
- Verify age encryption is used correctly
- Ensure no plaintext leaks in logs or errors
- Verify blob hashes are computed correctly
1. Secure memory handling for secrets
- Clear S3 credentials from memory after client init
- Document that age_secret_key is env-var only (already implemented)
## Testing
1. Write integration tests for restore command
1. Write end-to-end integration test
- Create backup
- Verify backup
- Restore backup
- Compare restored files to originals
1. Add tests for edge cases
- Empty directories
- Symlinks
- Special characters in filenames
- Very large files (multi-GB)
- Many small files (100k+)
1. Add tests for error conditions
- Network failures during upload
- Disk full during restore
- Corrupted blobs
- Missing blobs
## Performance
1. Profile and optimize restore performance
- Parallel blob downloads
- Streaming decompression/decryption
- Efficient chunk reassembly
1. Add bandwidth limiting option
- `--bwlimit` flag for upload/download speed limiting
## Documentation
1. Add man page or --help improvements
- Detailed help for each command
- Examples in help output
## Finalization
1. Ensure version is set correctly in releases
1. Create release process
- Binary releases for supported platforms
- Checksums for binaries
- Release notes template
1. Final code review
- Remove debug statements
- Ensure consistent code style
1. Tag and release v1.0.0
---
## Post-1.0 (Daemon Mode)
1. Implement inotify file watcher for Linux
- Watch source directories for changes
- Track dirty paths in memory
1. Implement FSEvents watcher for macOS
- Watch source directories for changes
- Track dirty paths in memory
1. Implement backup scheduler in daemon mode
- Respect backup_interval config
- Trigger backup when dirty paths exist and interval elapsed
- Implement full_scan_interval for periodic full scans
1. Add proper signal handling for daemon
- Graceful shutdown on SIGTERM/SIGINT
- Complete in-progress backup before exit
1. Write tests for daemon mode


@@ -1,9 +1,41 @@
```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"

	"git.eeqj.de/sneak/vaultik/internal/cli"
)

func main() {
	// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
	if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
		f, err := os.Create(cpuProfile)
		if err != nil {
			panic("could not create CPU profile: " + err.Error())
		}
		defer func() { _ = f.Close() }()
		if err := pprof.StartCPUProfile(f); err != nil {
			panic("could not start CPU profile: " + err.Error())
		}
		defer pprof.StopCPUProfile()
	}

	// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
	if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
		defer func() {
			f, err := os.Create(memProfile)
			if err != nil {
				panic("could not create memory profile: " + err.Error())
			}
			defer func() { _ = f.Close() }()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				panic("could not write memory profile: " + err.Error())
			}
		}()
	}

	cli.CLIEntry()
}
```
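To capture a profile one would run, for example, `VAULTIK_CPUPROFILE=/tmp/cpu.prof vaultik snapshot create` and then inspect the result with `go tool pprof /tmp/cpu.prof`; the heap profile written at exit via `VAULTIK_MEMPROFILE` can be examined the same way.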


@@ -2,25 +2,222 @@
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need
# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Generate with: age-keygen | grep "public key"
age_recipients:
- age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz
# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
testing:
paths:
- ~/dev/vaultik
apps:
paths:
- /Applications
exclude:
# System directories that should not be backed up
- "/App Store.app"
- "/Apps.app"
- "/Automator.app"
- "/Books.app"
- "/Calculator.app"
- "/Calendar.app"
- "/Chess.app"
- "/Clock.app"
- "/Contacts.app"
- "/Dictionary.app"
- "/FaceTime.app"
- "/FindMy.app"
- "/Font Book.app"
- "/Freeform.app"
- "/Games.app"
- "/GarageBand.app"
- "/Home.app"
- "/Image Capture.app"
- "/Image Playground.app"
- "/Journal.app"
- "/Keynote.app"
- "/Mail.app"
- "/Maps.app"
- "/Messages.app"
- "/Mission Control.app"
- "/Music.app"
- "/News.app"
- "/Notes.app"
- "/Numbers.app"
- "/Pages.app"
- "/Passwords.app"
- "/Phone.app"
- "/Photo Booth.app"
- "/Photos.app"
- "/Podcasts.app"
- "/Preview.app"
- "/QuickTime Player.app"
- "/Reminders.app"
- "/Safari.app"
- "/Shortcuts.app"
- "/Siri.app"
- "/Stickies.app"
- "/Stocks.app"
- "/System Settings.app"
- "/TV.app"
- "/TextEdit.app"
- "/Time Machine.app"
- "/Tips.app"
- "/Utilities/Activity Monitor.app"
- "/Utilities/AirPort Utility.app"
- "/Utilities/Audio MIDI Setup.app"
- "/Utilities/Bluetooth File Exchange.app"
- "/Utilities/Boot Camp Assistant.app"
- "/Utilities/ColorSync Utility.app"
- "/Utilities/Console.app"
- "/Utilities/Digital Color Meter.app"
- "/Utilities/Disk Utility.app"
- "/Utilities/Grapher.app"
- "/Utilities/Magnifier.app"
- "/Utilities/Migration Assistant.app"
- "/Utilities/Print Center.app"
- "/Utilities/Screen Sharing.app"
- "/Utilities/Screenshot.app"
- "/Utilities/Script Editor.app"
- "/Utilities/System Information.app"
- "/Utilities/Terminal.app"
- "/Utilities/VoiceOver Utility.app"
- "/VoiceMemos.app"
- "/Weather.app"
- "/iMovie.app"
- "/iPhone Mirroring.app"
home:
paths:
- "~"
exclude:
- "/.Trash"
- "/tmp"
- "/Library/Caches"
- "/Library/Accounts"
- "/Library/AppleMediaServices"
- "/Library/Application Support/AddressBook"
- "/Library/Application Support/CallHistoryDB"
- "/Library/Application Support/CallHistoryTransactions"
- "/Library/Application Support/DifferentialPrivacy"
- "/Library/Application Support/FaceTime"
- "/Library/Application Support/FileProvider"
- "/Library/Application Support/Knowledge"
- "/Library/Application Support/com.apple.TCC"
- "/Library/Application Support/com.apple.avfoundation/Frecents"
- "/Library/Application Support/com.apple.sharedfilelist"
- "/Library/Assistant/SiriVocabulary"
- "/Library/Autosave Information"
- "/Library/Biome"
- "/Library/ContainerManager"
- "/Library/Containers/com.apple.Home"
- "/Library/Containers/com.apple.Maps/Data/Maps"
- "/Library/Containers/com.apple.MobileSMS"
- "/Library/Containers/com.apple.Notes"
- "/Library/Containers/com.apple.Safari"
- "/Library/Containers/com.apple.Safari.WebApp"
- "/Library/Containers/com.apple.VoiceMemos"
- "/Library/Containers/com.apple.archiveutility"
- "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
- "/Library/Containers/com.apple.mail"
- "/Library/Containers/com.apple.news"
- "/Library/Containers/com.apple.stocks"
- "/Library/Cookies"
- "/Library/CoreFollowUp"
- "/Library/Daemon Containers"
- "/Library/DoNotDisturb"
- "/Library/DuetExpertCenter"
- "/Library/Group Containers/com.apple.Home.group"
- "/Library/Group Containers/com.apple.MailPersonaStorage"
- "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
- "/Library/Group Containers/com.apple.bird"
- "/Library/Group Containers/com.apple.stickersd.group"
- "/Library/Group Containers/com.apple.systempreferences.cache"
- "/Library/Group Containers/group.com.apple.AppleSpell"
- "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
- "/Library/Group Containers/group.com.apple.DeviceActivity"
- "/Library/Group Containers/group.com.apple.Journal"
- "/Library/Group Containers/group.com.apple.ManagedSettings"
- "/Library/Group Containers/group.com.apple.PegasusConfiguration"
- "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
- "/Library/Group Containers/group.com.apple.SiriTTS"
- "/Library/Group Containers/group.com.apple.UserNotifications"
- "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
- "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
- "/Library/Group Containers/group.com.apple.amsondevicestoraged"
- "/Library/Group Containers/group.com.apple.appstoreagent"
- "/Library/Group Containers/group.com.apple.calendar"
- "/Library/Group Containers/group.com.apple.chronod"
- "/Library/Group Containers/group.com.apple.contacts"
- "/Library/Group Containers/group.com.apple.controlcenter"
- "/Library/Group Containers/group.com.apple.corerepair"
- "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
- "/Library/Group Containers/group.com.apple.energykit"
- "/Library/Group Containers/group.com.apple.feedback"
- "/Library/Group Containers/group.com.apple.feedbacklogger"
- "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
- "/Library/Group Containers/group.com.apple.iCloudDrive"
- "/Library/Group Containers/group.com.apple.icloud.fmfcore"
- "/Library/Group Containers/group.com.apple.icloud.fmipcore"
- "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
- "/Library/Group Containers/group.com.apple.liveactivitiesd"
- "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
- "/Library/Group Containers/group.com.apple.mail"
- "/Library/Group Containers/group.com.apple.mlhost"
- "/Library/Group Containers/group.com.apple.moments"
- "/Library/Group Containers/group.com.apple.news"
- "/Library/Group Containers/group.com.apple.newsd"
- "/Library/Group Containers/group.com.apple.notes"
- "/Library/Group Containers/group.com.apple.notes.import"
- "/Library/Group Containers/group.com.apple.photolibraryd.private"
- "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
- "/Library/Group Containers/group.com.apple.printtool"
- "/Library/Group Containers/group.com.apple.private.translation"
- "/Library/Group Containers/group.com.apple.reminders"
- "/Library/Group Containers/group.com.apple.replicatord"
- "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
- "/Library/Group Containers/group.com.apple.sharingd"
- "/Library/Group Containers/group.com.apple.shortcuts"
- "/Library/Group Containers/group.com.apple.siri.inference"
- "/Library/Group Containers/group.com.apple.siri.referenceResolution"
- "/Library/Group Containers/group.com.apple.siri.remembers"
- "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
- "/Library/Group Containers/group.com.apple.spotlight"
- "/Library/Group Containers/group.com.apple.stocks"
- "/Library/Group Containers/group.com.apple.stocks-news"
- "/Library/Group Containers/group.com.apple.studentd"
- "/Library/Group Containers/group.com.apple.swtransparency"
- "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
- "/Library/Group Containers/group.com.apple.tips"
- "/Library/Group Containers/group.com.apple.tipsnext"
- "/Library/Group Containers/group.com.apple.transparency"
- "/Library/Group Containers/group.com.apple.usernoted"
- "/Library/Group Containers/group.com.apple.weather"
- "/Library/HomeKit"
- "/Library/IdentityServices"
- "/Library/IntelligencePlatform"
- "/Library/Mail"
- "/Library/Messages"
- "/Library/Metadata/CoreSpotlight"
- "/Library/Metadata/com.apple.IntelligentSuggestions"
- "/Library/PersonalizationPortrait"
- "/Library/Safari"
- "/Library/Sharing"
- "/Library/Shortcuts"
- "/Library/StatusKit"
- "/Library/Suggestions"
- "/Library/Trial"
- "/Library/Weather"
- "/Library/com.apple.aiml.instrumentation"
- "/Movies/TV"
system:
paths:
- /
exclude:
# Virtual/transient filesystems
- /proc
- /sys
- /dev
@@ -30,73 +227,69 @@ exclude:
- /var/run
- /var/lock
- /var/cache
- /lost+found
- /media
- /mnt
# Swap
- /swapfile
- /swap.img
- "*.swap"
- "*.swp"
# Log files (optional - you may want to keep some logs)
- "*.log"
- "*.log.*"
- /var/log
# Package manager caches
- /var/cache/apt
- /var/cache/yum
- /var/cache/dnf
- /var/cache/pacman
# User caches and temporary files
- "*/.cache"
# Trash
- "*/.local/share/Trash"
- "*/Downloads"
- "*/.thumbnails"
# Development artifacts
dev:
paths:
- /Users/user/dev
exclude:
- "**/node_modules"
- "**/.git/objects"
- "**/target"
- "**/build"
- "**/__pycache__"
- "**/*.pyc"
- "**/.venv"
- "**/vendor"
# Global patterns to exclude from all backups
exclude:
- "*.tmp"
# Storage URL - use either this OR the s3 section below
# Supports: s3://bucket/prefix, file:///path, rclone://remote/path
storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"
#s3:
# # S3-compatible endpoint URL
# # Examples: https://s3.amazonaws.com, https://storage.googleapis.com
# endpoint: http://10.100.205.122:8333
#
# # Bucket name where backups will be stored
# bucket: testbucket
#
# # Prefix (folder) within the bucket for this host's backups
# # Useful for organizing backups from multiple hosts
# # Default: empty (root of bucket)
# #prefix: "hosts/myserver/"
#
# # S3 access credentials
# access_key_id: Z9GT22M9YFU08WRMC5D4
# secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
#
# # S3 region
# # Default: us-east-1
# #region: us-east-1
#
# # Use SSL/TLS for S3 connections
# # Default: true
# #use_ssl: true
#
# # Part size for multipart uploads
# # Minimum 5MB, affects memory usage during upload
# # Supports: 5MB, 10M, 100MiB, etc.
# # Default: 5MB
# #part_size: 5MB
# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc
@@ -133,8 +326,7 @@ s3:
# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5
# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver

go.mod

@@ -4,59 +4,302 @@ go 1.24.4
require (
filippo.io/age v1.2.1
git.eeqj.de/sneak/smartconfig v1.0.0
github.com/adrg/xdg v0.5.3
github.com/aws/aws-sdk-go-v2 v1.39.6
github.com/aws/aws-sdk-go-v2/config v1.31.17
github.com/aws/aws-sdk-go-v2/credentials v1.18.21
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.20.4
github.com/aws/aws-sdk-go-v2/service/s3 v1.90.0
github.com/aws/smithy-go v1.23.2
github.com/dustin/go-humanize v1.0.1
github.com/gobwas/glob v0.2.3
github.com/google/uuid v1.6.0
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
github.com/jotfs/fastcdc-go v0.2.0
github.com/klauspost/compress v1.18.1
github.com/mattn/go-sqlite3 v1.14.29
github.com/rclone/rclone v1.72.1
github.com/schollz/progressbar/v3 v3.19.0
github.com/spf13/afero v1.15.0
github.com/spf13/cobra v1.10.1
github.com/stretchr/testify v1.11.1
go.uber.org/fx v1.24.0
golang.org/x/term v0.37.0
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.38.0
)
require (
cloud.google.com/go/auth v0.17.0 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/compute/metadata v0.9.0 // indirect
cloud.google.com/go/iam v1.5.2 // indirect
cloud.google.com/go/secretmanager v1.15.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3 // indirect
github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.5.3 // indirect
github.com/Azure/go-ntlmssp v0.0.2-0.20251110135918-10b7b7e7cd26 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.6.0 // indirect
github.com/Files-com/files-sdk-go/v3 v3.2.264 // indirect
github.com/IBM/go-sdk-core/v5 v5.21.0 // indirect
github.com/Max-Sum/base32768 v0.0.0-20230304063302-18e6ce5945fd // indirect
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/ProtonMail/bcrypt v0.0.0-20211005172633-e235017c1baf // indirect
github.com/ProtonMail/gluon v0.17.1-0.20230724134000-308be39be96e // indirect
github.com/ProtonMail/go-crypto v1.3.0 // indirect
github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f // indirect
github.com/ProtonMail/go-srp v0.0.7 // indirect
github.com/ProtonMail/gopenpgp/v2 v2.9.0 // indirect
github.com/PuerkitoBio/goquery v1.10.3 // indirect
github.com/a1ex3/zstd-seekable-format-go/pkg v0.10.0 // indirect
github.com/abbot/go-http-auth v0.4.0 // indirect
github.com/anchore/go-lzo v0.1.0 // indirect
github.com/andybalholm/cascadia v1.3.3 // indirect
github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc // indirect
github.com/armon/go-metrics v0.4.1 // indirect
github.com/aws/aws-sdk-go v1.44.256 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.11 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.33 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.37 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.37 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.37 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.4 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.7.5 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.18 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.18 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.25.6 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.30.4 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.34.1 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.3 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.13 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.13 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.4 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.13 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.13 // indirect
github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.35.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.1 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.5 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.39.1 // indirect
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/boombuler/barcode v1.1.0 // indirect
github.com/bradenaw/juniper v0.15.3 // indirect
github.com/bradfitz/iter v0.0.0-20191230175014-e8f45d346db8 // indirect
github.com/buengese/sgzip v0.1.1 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/calebcase/tmpfile v1.0.3 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 // indirect
github.com/clipperhouse/stringish v0.1.1 // indirect
github.com/clipperhouse/uax29/v2 v2.3.0 // indirect
github.com/cloudflare/circl v1.6.1 // indirect
github.com/cloudinary/cloudinary-go/v2 v2.13.0 // indirect
github.com/cloudsoda/go-smb2 v0.0.0-20250228001242-d4c70e6251cc // indirect
github.com/cloudsoda/sddl v0.0.0-20250224235906-926454e91efc // indirect
github.com/colinmarc/hdfs/v2 v2.4.0 // indirect
github.com/coreos/go-semver v0.3.1 // indirect
github.com/coreos/go-systemd/v22 v22.6.0 // indirect
github.com/creasty/defaults v1.8.0 // indirect
github.com/cronokirby/saferith v0.33.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/diskfs/go-diskfs v1.7.0 // indirect
github.com/dropbox/dropbox-sdk-go-unofficial/v6 v6.0.5 // indirect
github.com/ebitengine/purego v0.9.1 // indirect
github.com/emersion/go-message v0.18.2 // indirect
github.com/emersion/go-vcard v0.0.0-20241024213814-c9703dde27ff // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/fatih/color v1.16.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/flynn/noise v1.1.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/gabriel-vasile/mimetype v1.4.11 // indirect
github.com/geoffgarside/ber v1.2.0 // indirect
github.com/go-chi/chi/v5 v5.2.3 // indirect
github.com/go-darwin/apfs v0.0.0-20211011131704-f84b94dbf348 // indirect
github.com/go-git/go-billy/v5 v5.6.2 // indirect
github.com/go-jose/go-jose/v4 v4.1.2 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.3.0 // indirect
github.com/go-openapi/errors v0.22.4 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/strfmt v0.25.0 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.28.0 // indirect
github.com/go-resty/resty/v2 v2.16.5 // indirect
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/gofrs/flock v0.13.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/btree v1.1.3 // indirect
github.com/google/gnostic-models v0.6.9 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/s2a-go v0.1.9 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
github.com/googleapis/gax-go/v2 v2.15.0 // indirect
github.com/gopherjs/gopherjs v1.17.2 // indirect
github.com/gorilla/schema v1.4.1 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
github.com/hashicorp/consul/api v1.32.1 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/go-hclog v1.6.3 // indirect
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
github.com/hashicorp/go-secure-stdlib/parseutil v0.1.6 // indirect
github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect
github.com/hashicorp/go-sockaddr v1.0.2 // indirect
github.com/hashicorp/go-uuid v1.0.3 // indirect
github.com/hashicorp/golang-lru v0.5.4 // indirect
github.com/hashicorp/hcl v1.0.1-vault-7 // indirect
github.com/hashicorp/serf v0.10.1 // indirect
github.com/hashicorp/vault/api v1.20.0 // indirect
github.com/henrybear327/Proton-API-Bridge v1.0.0 // indirect
github.com/henrybear327/go-proton-api v1.0.0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jcmturner/aescts/v2 v2.0.0 // indirect
github.com/jcmturner/dnsutils/v2 v2.0.0 // indirect
github.com/jcmturner/gofork v1.7.6 // indirect
github.com/jcmturner/goidentity/v6 v6.0.1 // indirect
github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
github.com/jcmturner/rpc/v2 v2.0.3 // indirect
github.com/jlaffaye/ftp v0.2.1-0.20240918233326-1b970516f5d3 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/jtolds/gls v4.20.0+incompatible // indirect
github.com/jtolio/noiseconn v0.0.0-20231127013910-f6d9ecbf1de7 // indirect
github.com/jzelinskie/whirlpool v0.0.0-20201016144138-0675e54bb004 // indirect
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
github.com/koofr/go-httpclient v0.0.0-20240520111329-e20f8f203988 // indirect
github.com/koofr/go-koofrclient v0.0.0-20221207135200-cbd7fc9ad6a6 // indirect
github.com/kr/fs v0.1.0 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lanrat/extsort v1.4.2 // indirect
github.com/leodido/go-urn v1.4.0 // indirect
github.com/lpar/date v1.0.0 // indirect
github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
github.com/mailru/easyjson v0.9.1 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/ncruces/go-strftime v0.1.9 // indirect
github.com/ncw/swift/v2 v2.0.5 // indirect
github.com/oklog/ulid v1.3.1 // indirect
github.com/onsi/ginkgo/v2 v2.23.3 // indirect
github.com/oracle/oci-go-sdk/v65 v65.104.0 // indirect
github.com/panjf2000/ants/v2 v2.11.3 // indirect
github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
github.com/pengsrc/go-shared v0.2.1-0.20190131101655-1999055a4a14 // indirect
github.com/peterh/liner v1.2.2 // indirect
github.com/pierrec/lz4/v4 v4.1.22 // indirect
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pkg/sftp v1.13.10 // indirect
github.com/pkg/xattr v0.4.12 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
github.com/pquerna/otp v1.5.0 // indirect
github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.67.2 // indirect
github.com/prometheus/procfs v0.19.2 // indirect
github.com/putdotio/go-putio/putio v0.0.0-20200123120452-16d982cac2b8 // indirect
github.com/relvacode/iso8601 v1.7.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/rfjakob/eme v1.1.2 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/ryanuber/go-glob v1.0.0 // indirect
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect
github.com/samber/lo v1.52.0 // indirect
github.com/shirou/gopsutil/v4 v4.25.10 // indirect
github.com/sirupsen/logrus v1.9.4-0.20230606125235-dd1b4c2e81af // indirect
github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 // indirect
github.com/smarty/assertions v1.16.0 // indirect
github.com/sony/gobreaker v1.0.0 // indirect
github.com/spacemonkeygo/monkit/v3 v3.0.25-0.20251022131615-eb24eb109368 // indirect
github.com/spf13/pflag v1.0.10 // indirect
github.com/t3rm1n4l/go-mega v0.0.0-20251031123324-a804aaa87491 // indirect
github.com/tidwall/gjson v1.18.0 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.0 // indirect
github.com/tklauser/go-sysconf v0.3.15 // indirect
github.com/tklauser/numcpus v0.10.0 // indirect
github.com/ulikunitz/xz v0.5.15 // indirect
github.com/unknwon/goconfig v1.0.0 // indirect
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
github.com/x448/float16 v0.8.4 // indirect
github.com/xanzy/ssh-agent v0.3.3 // indirect
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
github.com/yunify/qingstor-sdk-go/v3 v3.2.0 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
github.com/zeebo/blake3 v0.2.4 // indirect
github.com/zeebo/errs v1.4.0 // indirect
github.com/zeebo/xxh3 v1.0.2 // indirect
go.etcd.io/bbolt v1.4.3 // indirect
go.etcd.io/etcd/api/v3 v3.6.2 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.6.2 // indirect
go.etcd.io/etcd/client/v3 v3.6.2 // indirect
go.mongodb.org/mongo-driver v1.17.6 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
go.opentelemetry.io/otel v1.38.0 // indirect
go.opentelemetry.io/otel/metric v1.38.0 // indirect
go.opentelemetry.io/otel/trace v1.38.0 // indirect
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
go.uber.org/dig v1.19.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/oauth2 v0.33.0 // indirect
golang.org/x/sync v0.18.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/time v0.14.0 // indirect
golang.org/x/tools v0.38.0 // indirect
google.golang.org/api v0.255.0 // indirect
google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250804133106-a7a43d27e69b // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20251103181224-f26f9409b101 // indirect
google.golang.org/grpc v1.76.0 // indirect
google.golang.org/protobuf v1.36.10 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/validator.v2 v2.0.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
k8s.io/api v0.33.3 // indirect
k8s.io/apimachinery v0.33.3 // indirect
k8s.io/client-go v0.33.3 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
modernc.org/libc v1.65.10 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
moul.io/http2curl/v2 v2.3.0 // indirect
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
sigs.k8s.io/yaml v1.6.0 // indirect
storj.io/common v0.0.0-20251107171817-6221ae45072c // indirect
storj.io/drpc v0.0.35-0.20250513201419-f7819ea69b55 // indirect
storj.io/eventkit v0.0.0-20250410172343-61f26d3de156 // indirect
storj.io/infectious v0.0.2 // indirect
storj.io/picobuf v0.0.4 // indirect
storj.io/uplink v1.13.1 // indirect
)

go.sum

File diff suppressed because it is too large.


@@ -20,14 +20,15 @@ import (
"encoding/hex"
"fmt"
"io"
"os"
"sync"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/google/uuid"
"github.com/spf13/afero"
)
// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
@@ -44,6 +45,13 @@ type PackerConfig struct {
Recipients []string // Age recipients for encryption
Repositories *database.Repositories // Database repositories for tracking blob metadata
BlobHandler BlobHandler // Optional callback when blob is ready for upload
Fs afero.Fs // Filesystem for temporary files
}
// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
Hash string
Size int64
}
// Packer accumulates chunks and packs them into blobs.
@@ -55,6 +63,7 @@ type Packer struct {
recipients []string // Age recipients for encryption
blobHandler BlobHandler // Called when blob is ready
repos *database.Repositories // For creating blob records
fs afero.Fs // Filesystem for temporary files
// Mutex for thread-safe blob creation
mu sync.Mutex
@@ -62,6 +71,9 @@ type Packer struct {
// Current blob being packed
currentBlob *blobInProgress
finishedBlobs []*FinishedBlob // Only used if no handler provided
// Pending chunks to be inserted when blob finalizes
pendingChunks []PendingChunk
}
// blobInProgress represents a blob being assembled
@@ -69,7 +81,7 @@ type blobInProgress struct {
id string // UUID of the blob
chunks []*chunkInfo // Track chunk metadata
chunkSet map[string]bool // Track unique chunks in this blob
tempFile afero.File // Temporary file for encrypted compressed data
writer *blobgen.Writer // Unified compression/encryption/hashing writer
startTime time.Time
size int64 // Current uncompressed size
@@ -113,7 +125,8 @@ type BlobChunkRef struct {
type BlobWithReader struct {
*FinishedBlob
Reader io.ReadSeeker
- TempFile *os.File // Optional, only set for disk-based blobs
+ TempFile afero.File // Optional, only set for disk-based blobs
+ InsertedChunkHashes []string // Chunk hashes that were inserted to DB with this blob
}
// NewPacker creates a new blob packer that accumulates chunks into blobs.
@@ -126,12 +139,16 @@ func NewPacker(cfg PackerConfig) (*Packer, error) {
if cfg.MaxBlobSize <= 0 {
return nil, fmt.Errorf("max blob size must be positive")
}
if cfg.Fs == nil {
return nil, fmt.Errorf("filesystem is required")
}
return &Packer{
maxBlobSize: cfg.MaxBlobSize,
compressionLevel: cfg.CompressionLevel,
recipients: cfg.Recipients,
blobHandler: cfg.BlobHandler,
repos: cfg.Repositories,
fs: cfg.Fs,
finishedBlobs: make([]*FinishedBlob, 0),
}, nil
}
@@ -146,6 +163,15 @@ func (p *Packer) SetBlobHandler(handler BlobHandler) {
p.blobHandler = handler
}
// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
p.mu.Lock()
defer p.mu.Unlock()
p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}
// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
@@ -237,25 +263,28 @@ func (p *Packer) startNewBlob() error {
// Create blob record in database
if p.repos != nil {
+ blobIDTyped, err := types.ParseBlobID(blobID)
+ if err != nil {
+ return fmt.Errorf("parsing blob ID: %w", err)
+ }
blob := &database.Blob{
- ID: blobID,
- Hash: "temp-placeholder-" + blobID, // Temporary placeholder until finalized
+ ID: blobIDTyped,
+ Hash: types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
CreatedTS: time.Now().UTC(),
FinishedTS: nil,
UncompressedSize: 0,
CompressedSize: 0,
UploadedTS: nil,
}
- err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
+ if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.Blobs.Create(ctx, tx, blob)
- })
- if err != nil {
+ }); err != nil {
return fmt.Errorf("creating blob record: %w", err)
}
}
// Create temporary file
- tempFile, err := os.CreateTemp("", "vaultik-blob-*.tmp")
+ tempFile, err := afero.TempFile(p.fs, "", "vaultik-blob-*.tmp")
if err != nil {
return fmt.Errorf("creating temp file: %w", err)
}
@@ -264,7 +293,7 @@ func (p *Packer) startNewBlob() error {
writer, err := blobgen.NewWriter(tempFile, p.compressionLevel, p.recipients)
if err != nil {
_ = tempFile.Close()
- _ = os.Remove(tempFile.Name())
+ _ = p.fs.Remove(tempFile.Name())
return fmt.Errorf("creating blobgen writer: %w", err)
}
@@ -308,23 +337,9 @@ func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
p.currentBlob.chunkSet[chunk.Hash] = true
- // Store blob-chunk association in database immediately
- if p.repos != nil {
- blobChunk := &database.BlobChunk{
- BlobID: p.currentBlob.id,
- ChunkHash: chunk.Hash,
- Offset: offset,
- Length: chunkSize,
- }
- err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
- return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
- })
- if err != nil {
- log.Error("Failed to store blob-chunk association in database", "error", err,
- "blob_id", p.currentBlob.id, "chunk_hash", chunk.Hash)
- // Continue anyway - we can reconstruct this later if needed
- }
- }
+ // Note: blob_chunk records are inserted in batch when blob is finalized
+ // to reduce transaction overhead. The chunk info is already stored in
+ // p.currentBlob.chunks for later insertion.
// Update total size
p.currentBlob.size += chunkSize
@@ -386,16 +401,54 @@ func (p *Packer) finalizeCurrentBlob() error {
})
}
- // Update blob record in database with hash and sizes
+ // Get pending chunks (will be inserted to DB and reported to handler)
+ chunksToInsert := p.pendingChunks
+ p.pendingChunks = nil // Clear pending list
+ // Insert pending chunks, blob_chunks, and update blob in a single transaction
if p.repos != nil {
blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
if parseErr != nil {
p.cleanupTempFile()
return fmt.Errorf("parsing blob ID: %w", parseErr)
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
// First insert all pending chunks (required for blob_chunks FK)
for _, chunk := range chunksToInsert {
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(chunk.Hash),
Size: chunk.Size,
}
if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
return fmt.Errorf("creating chunk: %w", err)
}
}
// Insert all blob_chunk records in batch
for _, chunk := range p.currentBlob.chunks {
blobChunk := &database.BlobChunk{
BlobID: blobIDTyped,
ChunkHash: types.ChunkHash(chunk.Hash),
Offset: chunk.Offset,
Length: chunk.Size,
}
if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
return fmt.Errorf("creating blob_chunk: %w", err)
}
}
// Update blob record with final hash and sizes
return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
p.currentBlob.size, finalSize)
})
if err != nil {
p.cleanupTempFile()
return fmt.Errorf("updating blob record: %w", err)
return fmt.Errorf("finalizing blob transaction: %w", err)
}
log.Debug("Committed blob transaction",
"chunks_inserted", len(chunksToInsert),
"blob_chunks_inserted", len(p.currentBlob.chunks))
}
// Create finished blob
@@ -418,9 +471,14 @@ func (p *Packer) finalizeCurrentBlob() error {
"ratio", fmt.Sprintf("%.2f", compressionRatio),
"duration", time.Since(p.currentBlob.startTime))
// Collect inserted chunk hashes for the scanner to track
var insertedChunkHashes []string
for _, chunk := range chunksToInsert {
insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
}
// Call blob handler if set
if p.blobHandler != nil {
log.Debug("Invoking blob handler callback", "blob_hash", blobHash[:8]+"...")
// Reset file position for handler
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
p.cleanupTempFile()
@@ -432,6 +490,7 @@ func (p *Packer) finalizeCurrentBlob() error {
FinishedBlob: finished,
Reader: p.currentBlob.tempFile,
TempFile: p.currentBlob.tempFile,
InsertedChunkHashes: insertedChunkHashes,
}
if err := p.blobHandler(blobWithReader); err != nil {
@@ -470,7 +529,7 @@ func (p *Packer) cleanupTempFile() {
if p.currentBlob != nil && p.currentBlob.tempFile != nil {
name := p.currentBlob.tempFile.Name()
_ = p.currentBlob.tempFile.Close()
- _ = os.Remove(name)
+ _ = p.fs.Remove(name)
}
}
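Taken together, the packer hunks above move per-chunk database writes out of the hot path: callers queue chunk rows with AddPendingChunk while streaming, and chunks, blob_chunks, and the blob row are committed together when the blob finalizes. A minimal calling sketch (the chunk list and the ChunkRef literal are illustrative; only AddPendingChunk, AddChunk, and ErrBlobSizeLimitExceeded are confirmed by the hunks above):

packer, err := NewPacker(PackerConfig{
	MaxBlobSize:      10 << 20, // illustrative 10 MiB cap
	CompressionLevel: 3,
	Recipients:       []string{recipient},
	Repositories:     repos,
	Fs:               afero.NewMemMapFs(),
})
if err != nil {
	return err
}
for _, c := range chunks {
	packer.AddPendingChunk(c.Hash, int64(len(c.Data))) // queued; inserted when the blob finalizes
	if err := packer.AddChunk(&ChunkRef{Hash: c.Hash, Data: c.Data}); err != nil {
		// On ErrBlobSizeLimitExceeded, finalize the current blob and retry.
		return err
	}
}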


@@ -12,7 +12,9 @@ import (
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/klauspost/compress/zstd"
"github.com/spf13/afero"
)
const (
@@ -45,6 +47,7 @@ func TestPacker(t *testing.T) {
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
@@ -58,7 +61,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
- ChunkHash: hashStr,
+ ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@@ -134,6 +137,7 @@ func TestPacker(t *testing.T) {
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
@@ -149,7 +153,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
- ChunkHash: hashStr,
+ ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@@ -216,6 +220,7 @@ func TestPacker(t *testing.T) {
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
@@ -231,7 +236,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
- ChunkHash: hashStr,
+ ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@@ -304,6 +309,7 @@ func TestPacker(t *testing.T) {
CompressionLevel: 3,
Recipients: []string{testPublicKey},
Repositories: repos,
Fs: afero.NewMemMapFs(),
}
packer, err := NewPacker(cfg)
if err != nil {
@@ -317,7 +323,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
- ChunkHash: hashStr,
+ ChunkHash: types.ChunkHash(hashStr),
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {


@@ -51,7 +51,13 @@ func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipien
if err != nil {
return 0, "", fmt.Errorf("creating writer: %w", err)
}
- defer func() { _ = w.Close() }()
+ closed := false
+ defer func() {
+ if !closed {
+ _ = w.Close()
+ }
+ }()
// Copy data
if _, err := io.Copy(w, src); err != nil {
@@ -62,6 +68,7 @@ func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipien
if err := w.Close(); err != nil {
return 0, "", fmt.Errorf("closing writer: %w", err)
}
+ closed = true
return w.BytesWritten(), hex.EncodeToString(w.Sum256()), nil
}
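The fix generalizes to any io.WriteCloser whose Close both flushes and finalizes: close explicitly so flush errors surface, and let a flag-gated defer cover only the early-return paths. A self-contained sketch of the pattern (gzip stands in for the zstd-plus-age stack):

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// copyAndClose closes w exactly once: explicitly on success so flush
// errors are reported, and via the deferred guard on error paths.
func copyAndClose(w io.WriteCloser, src io.Reader) (err error) {
	closed := false
	defer func() {
		if !closed {
			_ = w.Close() // best-effort cleanup; the original error wins
		}
	}()
	if _, err = io.Copy(w, src); err != nil {
		return err
	}
	if err = w.Close(); err != nil {
		return err
	}
	closed = true
	return nil
}

func main() {
	var buf bytes.Buffer
	fmt.Println(copyAndClose(gzip.NewWriter(&buf), strings.NewReader("hello"))) // <nil>
}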


@@ -5,30 +5,33 @@ import (
"fmt"
"hash"
"io"
"runtime"
"filippo.io/age"
"github.com/klauspost/compress/zstd"
)
- // Writer wraps compression and encryption with SHA256 hashing
+ // Writer wraps compression and encryption with SHA256 hashing.
+ // Data flows: input -> tee(hasher, compressor -> encryptor -> destination)
+ // The hash is computed on the uncompressed input for deterministic content-addressing.
type Writer struct {
writer io.Writer // Final destination
- teeWriter io.Writer // Tee to hasher and compressor
compressor *zstd.Encoder // Compression layer
encryptor io.WriteCloser // Encryption layer
- hasher hash.Hash // SHA256 hasher
+ teeWriter io.Writer // Tees data to hasher
+ hasher hash.Hash // SHA256 hasher (on uncompressed input)
compressionLevel int
bytesWritten int64
}
- // NewWriter creates a new Writer that compresses, encrypts, and hashes data
+ // NewWriter creates a new Writer that compresses, encrypts, and hashes data.
+ // The hash is computed on the uncompressed input for deterministic content-addressing.
func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer, error) {
// Validate compression level
if err := validateCompressionLevel(compressionLevel); err != nil {
return nil, err
}
- // Create SHA256 hasher
+ // Create SHA256 hasher for the uncompressed input
hasher := sha256.New()
// Parse recipients
@@ -41,31 +44,36 @@ func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer,
ageRecipients = append(ageRecipients, r)
}
- // Create encryption writer
+ // Create encryption writer that outputs to destination
encWriter, err := age.Encrypt(w, ageRecipients...)
if err != nil {
return nil, fmt.Errorf("creating encryption writer: %w", err)
}
// Calculate compression concurrency: CPUs - 2, minimum 1
concurrency := runtime.NumCPU() - 2
if concurrency < 1 {
concurrency = 1
}
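// e.g. runtime.NumCPU() == 8 gives 6 encoder workers; 1- or 2-CPU machines clamp to 1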
// Create compression writer with encryption as destination
compressor, err := zstd.NewWriter(encWriter,
zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)),
- zstd.WithEncoderConcurrency(1), // Use single thread for streaming
+ zstd.WithEncoderConcurrency(concurrency),
)
if err != nil {
_ = encWriter.Close()
return nil, fmt.Errorf("creating compression writer: %w", err)
}
- // Create tee writer that writes to both compressor and hasher
- teeWriter := io.MultiWriter(compressor, hasher)
+ // Create tee writer: input goes to both hasher and compressor
+ teeWriter := io.MultiWriter(hasher, compressor)
return &Writer{
writer: w,
- teeWriter: teeWriter,
compressor: compressor,
encryptor: encWriter,
hasher: hasher,
+ teeWriter: teeWriter,
compressionLevel: compressionLevel,
}, nil
}
@@ -92,9 +100,16 @@ func (w *Writer) Close() error {
return nil
}
- // Sum256 returns the SHA256 hash of all data written
+ // Sum256 returns the double SHA256 hash of the uncompressed input data.
+ // Double hashing (SHA256(SHA256(data))) prevents information leakage about
+ // the plaintext - an attacker cannot confirm existence of known content
+ // by computing its hash and checking for a matching blob filename.
func (w *Writer) Sum256() []byte {
- return w.hasher.Sum(nil)
+ // First hash: SHA256(plaintext)
+ firstHash := w.hasher.Sum(nil)
+ // Second hash: SHA256(firstHash) - this is the blob ID
+ secondHash := sha256.Sum256(firstHash)
+ return secondHash[:]
}
// BytesWritten returns the number of uncompressed bytes written
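Because the hash covers the plaintext rather than the ciphertext, the blob name is reproducible without the age key. A standalone sketch of the same derivation, using only the standard library:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobID computes SHA256(SHA256(plaintext)), matching Writer.Sum256.
func blobID(plaintext []byte) string {
	first := sha256.Sum256(plaintext)
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

func main() {
	fmt.Println(blobID([]byte("example data")))
}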


@@ -0,0 +1,105 @@
package blobgen
import (
"bytes"
"crypto/rand"
"crypto/sha256"
"encoding/hex"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestWriterHashIsDoubleHash verifies that Writer.Sum256() returns
// the double hash SHA256(SHA256(plaintext)) for security.
// Double hashing prevents attackers from confirming existence of known content.
func TestWriterHashIsDoubleHash(t *testing.T) {
// Test data - random data that doesn't compress well
testData := make([]byte, 1024*1024) // 1MB
_, err := rand.Read(testData)
require.NoError(t, err)
// Test recipient (generated with age-keygen)
testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
// Create a buffer to capture the encrypted output
var encryptedBuf bytes.Buffer
// Create blobgen writer
writer, err := NewWriter(&encryptedBuf, 3, []string{testRecipient})
require.NoError(t, err)
// Write test data
n, err := writer.Write(testData)
require.NoError(t, err)
assert.Equal(t, len(testData), n)
// Close to flush all data
err = writer.Close()
require.NoError(t, err)
// Get the hash from the writer
writerHash := hex.EncodeToString(writer.Sum256())
// Calculate the expected double hash: SHA256(SHA256(plaintext))
firstHash := sha256.Sum256(testData)
secondHash := sha256.Sum256(firstHash[:])
expectedDoubleHash := hex.EncodeToString(secondHash[:])
// Also compute single hash to verify it's different
singleHashStr := hex.EncodeToString(firstHash[:])
t.Logf("Input size: %d bytes", len(testData))
t.Logf("Single hash (SHA256(data)): %s", singleHashStr)
t.Logf("Double hash (SHA256(SHA256(data))): %s", expectedDoubleHash)
t.Logf("Writer hash: %s", writerHash)
// The writer hash should match the double hash
assert.Equal(t, expectedDoubleHash, writerHash,
"Writer.Sum256() should return SHA256(SHA256(plaintext)) for security")
// Verify it's NOT the single hash (would leak information)
assert.NotEqual(t, singleHashStr, writerHash,
"Writer hash should not be single hash (would allow content confirmation attacks)")
}
// TestWriterDeterministicHash verifies that the same input always produces
// the same hash, even with non-deterministic encryption.
func TestWriterDeterministicHash(t *testing.T) {
// Test data
testData := []byte("Hello, World! This is test data for deterministic hashing.")
// Test recipient
testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
// Create two writers and verify they produce the same hash
var buf1, buf2 bytes.Buffer
writer1, err := NewWriter(&buf1, 3, []string{testRecipient})
require.NoError(t, err)
_, err = writer1.Write(testData)
require.NoError(t, err)
require.NoError(t, writer1.Close())
writer2, err := NewWriter(&buf2, 3, []string{testRecipient})
require.NoError(t, err)
_, err = writer2.Write(testData)
require.NoError(t, err)
require.NoError(t, writer2.Close())
hash1 := hex.EncodeToString(writer1.Sum256())
hash2 := hex.EncodeToString(writer2.Sum256())
// Hashes should be identical (deterministic)
assert.Equal(t, hash1, hash2, "Same input should produce same hash")
// Encrypted outputs should be different (non-deterministic encryption)
assert.NotEqual(t, buf1.Bytes(), buf2.Bytes(),
"Encrypted outputs should differ due to non-deterministic encryption")
t.Logf("Hash 1: %s", hash1)
t.Logf("Hash 2: %s", hash2)
t.Logf("Encrypted size 1: %d bytes", buf1.Len())
t.Logf("Encrypted size 2: %d bytes", buf2.Len())
}


@@ -6,8 +6,6 @@ import (
"fmt"
"io"
"os"
"github.com/jotfs/fastcdc-go"
)
// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
@@ -48,16 +46,8 @@ func NewChunker(avgChunkSize int64) *Chunker {
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
- opts := fastcdc.Options{
- MinSize: c.minChunkSize,
- AverageSize: c.avgChunkSize,
- MaxSize: c.maxChunkSize,
- }
- chunker, err := fastcdc.NewChunker(r, opts)
- if err != nil {
- return nil, fmt.Errorf("creating chunker: %w", err)
- }
+ chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
+ defer chunker.Release()
var chunks []Chunk
offset := int64(0)
@@ -74,7 +64,7 @@ func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
// Calculate hash
hash := sha256.Sum256(chunk.Data)
- // Make a copy of the data since FastCDC reuses the buffer
+ // Make a copy of the data since the chunker reuses the buffer
chunkData := make([]byte, len(chunk.Data))
copy(chunkData, chunk.Data)
@@ -107,16 +97,8 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (str
fileHasher := sha256.New()
teeReader := io.TeeReader(r, fileHasher)
- opts := fastcdc.Options{
- MinSize: c.minChunkSize,
- AverageSize: c.avgChunkSize,
- MaxSize: c.maxChunkSize,
- }
- chunker, err := fastcdc.NewChunker(teeReader, opts)
- if err != nil {
- return "", fmt.Errorf("creating chunker: %w", err)
- }
+ chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
+ defer chunker.Release()
offset := int64(0)
@@ -132,13 +114,12 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (str
// Calculate chunk hash
hash := sha256.Sum256(chunk.Data)
- // Make a copy of the data since FastCDC reuses the buffer
- chunkData := make([]byte, len(chunk.Data))
- copy(chunkData, chunk.Data)
+ // Pass the data directly - caller must process it before we call Next() again
+ // (chunker reuses its internal buffer, but since we process synchronously
+ // and completely before continuing, no copy is needed)
if err := callback(Chunk{
Hash: hex.EncodeToString(hash[:]),
- Data: chunkData,
+ Data: chunk.Data,
Offset: offset,
Size: int64(len(chunk.Data)),
}); err != nil {
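The streaming path now hands the callback a slice that aliases the chunker's internal buffer, so any callback that keeps data past its own return must copy it. A sketch of a retaining callback (variable names are illustrative):

var retained [][]byte
cb := func(c Chunk) error {
	// c.Data is invalidated by the next chunk; copy before retaining.
	buf := make([]byte, len(c.Data))
	copy(buf, c.Data)
	retained = append(retained, buf)
	return nil
}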

internal/chunker/fastcdc.go (new file, 265 lines)

@@ -0,0 +1,265 @@
package chunker
import (
"io"
"math"
"sync"
)
// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
// this implementation uses sync.Pool to reuse buffers across files.
type ReusableChunker struct {
minSize int
maxSize int
normSize int
bufSize int
maskS uint64
maskL uint64
rd io.Reader
buf []byte
cursor int
offset int
eof bool
}
// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
var reusableChunkerPool = sync.Pool{
New: func() interface{} {
return &ReusableChunker{}
},
}
// bufferPools contains pools for different buffer sizes.
// Key is the buffer size.
var bufferPools = sync.Map{}
func getBuffer(size int) []byte {
poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
New: func() interface{} {
buf := make([]byte, size)
return &buf
},
})
pool := poolI.(*sync.Pool)
return *pool.Get().(*[]byte)
}
func putBuffer(buf []byte) {
size := cap(buf)
poolI, ok := bufferPools.Load(size)
if ok {
pool := poolI.(*sync.Pool)
b := buf[:size]
pool.Put(&b)
}
}
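// The pools store *[]byte rather than []byte so that Put and Get reuse a
// pointer and avoid re-boxing the slice header into an interface on each
// call (the allocation staticcheck SA6002 warns about)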
// FastCDCChunk represents a chunk from the FastCDC algorithm.
type FastCDCChunk struct {
Offset int
Length int
Data []byte
Fingerprint uint64
}
// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
c := reusableChunkerPool.Get().(*ReusableChunker)
bufSize := maxSize * 2
// Reuse buffer if it's the right size, otherwise get a new one
if c.buf == nil || cap(c.buf) != bufSize {
if c.buf != nil {
putBuffer(c.buf)
}
c.buf = getBuffer(bufSize)
} else {
// Restore buffer to full capacity (may have been truncated by previous EOF)
c.buf = c.buf[:cap(c.buf)]
}
bits := int(math.Round(math.Log2(float64(avgSize))))
normalization := 2
smallBits := bits + normalization
largeBits := bits - normalization
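// e.g. avgSize = 1 MiB gives bits = 20, so maskS has 22 set bits (cut points
// are rare before normSize, discouraging short chunks) and maskL has 18
// (cut points are likely after it, discouraging oversized chunks)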
c.minSize = minSize
c.maxSize = maxSize
c.normSize = avgSize
c.bufSize = bufSize
c.maskS = (1 << smallBits) - 1
c.maskL = (1 << largeBits) - 1
c.rd = rd
c.cursor = bufSize
c.offset = 0
c.eof = false
return c
}
// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
c.rd = nil
reusableChunkerPool.Put(c)
}
func (c *ReusableChunker) fillBuffer() error {
n := len(c.buf) - c.cursor
if n >= c.maxSize {
return nil
}
// Move all data after the cursor to the start of the buffer
copy(c.buf[:n], c.buf[c.cursor:])
c.cursor = 0
if c.eof {
c.buf = c.buf[:n]
return nil
}
// Restore buffer to full capacity for reading
c.buf = c.buf[:c.bufSize]
// Fill the rest of the buffer
m, err := io.ReadFull(c.rd, c.buf[n:])
if err == io.EOF || err == io.ErrUnexpectedEOF {
c.buf = c.buf[:n+m]
c.eof = true
} else if err != nil {
return err
}
return nil
}
// Next returns the next chunk or io.EOF when done.
// The returned Data slice is only valid until the next call to Next.
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
if err := c.fillBuffer(); err != nil {
return FastCDCChunk{}, err
}
if len(c.buf) == 0 {
return FastCDCChunk{}, io.EOF
}
length, fp := c.nextChunk(c.buf[c.cursor:])
chunk := FastCDCChunk{
Offset: c.offset,
Length: length,
Data: c.buf[c.cursor : c.cursor+length],
Fingerprint: fp,
}
c.cursor += length
c.offset += chunk.Length
return chunk, nil
}
func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
fp := uint64(0)
i := c.minSize
if len(data) <= c.minSize {
return len(data), fp
}
n := min(len(data), c.maxSize)
for ; i < min(n, c.normSize); i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskS) == 0 {
return i + 1, fp
}
}
for ; i < n; i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskL) == 0 {
return i + 1, fp
}
}
return i, fp
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
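// (Go 1.21 added a predeclared min; this package-level helper shadows it
// here, presumably to keep older toolchains compiling)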
// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}
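A minimal driver loop for the pooled chunker (sizes illustrative; error handling abbreviated):

chunker := AcquireReusableChunker(r, 64<<10, 256<<10, 1<<20)
defer chunker.Release()
for {
	chunk, err := chunker.Next()
	if err == io.EOF {
		break
	}
	if err != nil {
		return err
	}
	// chunk.Data aliases the pooled buffer: consume or copy it before
	// calling Next again.
	process(chunk.Data)
}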


@@ -2,9 +2,11 @@ package cli
import (
"context"
"errors"
"fmt"
"os"
"os/signal"
"path/filepath"
"syscall"
"time"
@@ -12,6 +14,11 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/pidlock"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/adrg/xdg"
"go.uber.org/fx"
)
@@ -48,6 +55,9 @@ func NewApp(opts AppOptions) *fx.App {
config.Module,
database.Module,
log.Module,
storage.Module,
snapshot.Module,
fx.Provide(vaultik.New),
fx.Invoke(setupGlobals),
fx.NopLogger,
}
@@ -112,7 +122,23 @@ func RunApp(ctx context.Context, app *fx.App) error {
// RunWithApp is a helper that creates and runs an fx app with the given options.
// It combines NewApp and RunApp into a single convenient function. This is the
// preferred way to run CLI commands that need the full application context.
// It acquires a PID lock before starting to prevent concurrent instances.
func RunWithApp(ctx context.Context, opts AppOptions) error {
// Acquire PID lock to prevent concurrent instances
lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
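// typically ~/.local/share/berlin.sneak.app.vaultik on Linux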
lock, err := pidlock.Acquire(lockDir)
if err != nil {
if errors.Is(err, pidlock.ErrAlreadyRunning) {
return fmt.Errorf("cannot start: %w", err)
}
return fmt.Errorf("failed to acquire lock: %w", err)
}
defer func() {
if err := lock.Release(); err != nil {
log.Warn("Failed to release PID lock", "error", err)
}
}()
app := NewApp(opts)
return RunApp(ctx, app)
}

internal/cli/database.go (new file, 102 lines)

@@ -0,0 +1,102 @@
package cli
import (
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/cobra"
)
// NewDatabaseCommand creates the database command group
func NewDatabaseCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "database",
Short: "Manage the local state database",
Long: `Commands for managing the local SQLite state database.`,
}
cmd.AddCommand(
newDatabasePurgeCommand(),
)
return cmd
}
// newDatabasePurgeCommand creates the database purge command
func newDatabasePurgeCommand() *cobra.Command {
var force bool
cmd := &cobra.Command{
Use: "purge",
Short: "Delete the local state database",
Long: `Completely removes the local SQLite state database.
This will erase all local tracking of:
- File metadata and change detection state
- Chunk and blob mappings
- Local snapshot records
The remote storage is NOT affected. After purging, the next backup will
perform a full scan and re-deduplicate against existing remote blobs.
Use --force to skip the confirmation prompt.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Resolve config path
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Load config to get database path
cfg, err := config.Load(configPath)
if err != nil {
return fmt.Errorf("failed to load config: %w", err)
}
dbPath := cfg.IndexPath
// Check if database exists
if _, err := os.Stat(dbPath); os.IsNotExist(err) {
fmt.Printf("Database does not exist: %s\n", dbPath)
return nil
}
// Confirm unless --force
if !force {
fmt.Printf("This will delete the local state database at:\n %s\n\n", dbPath)
fmt.Print("Are you sure? Type 'yes' to confirm: ")
var confirm string
if _, err := fmt.Scanln(&confirm); err != nil || confirm != "yes" {
fmt.Println("Aborted.")
return nil
}
}
// Delete the database file
if err := os.Remove(dbPath); err != nil {
return fmt.Errorf("failed to delete database: %w", err)
}
// Also delete WAL and SHM files if they exist
walPath := dbPath + "-wal"
shmPath := dbPath + "-shm"
_ = os.Remove(walPath) // Ignore errors - files may not exist
_ = os.Remove(shmPath)
rootFlags := GetRootFlags()
if !rootFlags.Quiet {
fmt.Printf("Database purged: %s\n", dbPath)
}
log.Info("Local state database purged", "path", dbPath)
return nil
},
}
cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")
return cmd
}


@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
}
// Verify all subcommands are registered
- expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "fetch"}
+ expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
for _, expected := range expectedCommands {
found := false
for _, cmd := range cmd.Commands() {


@@ -1,88 +0,0 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// FetchOptions contains options for the fetch command
type FetchOptions struct {
Bucket string
Prefix string
SnapshotID string
FilePath string
Target string
}
// NewFetchCommand creates the fetch command
func NewFetchCommand() *cobra.Command {
opts := &FetchOptions{}
cmd := &cobra.Command{
Use: "fetch",
Short: "Extract single file from backup",
Long: `Download and decrypt a single file from a backup snapshot`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate required flags
if opts.Bucket == "" {
return fmt.Errorf("--bucket is required")
}
if opts.Prefix == "" {
return fmt.Errorf("--prefix is required")
}
if opts.SnapshotID == "" {
return fmt.Errorf("--snapshot is required")
}
if opts.FilePath == "" {
return fmt.Errorf("--file is required")
}
if opts.Target == "" {
return fmt.Errorf("--target is required")
}
return runFetch(cmd.Context(), opts)
},
}
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID")
cmd.Flags().StringVar(&opts.FilePath, "file", "", "Path of file to extract from backup")
cmd.Flags().StringVar(&opts.Target, "target", "", "Target path for extracted file")
return cmd
}
func runFetch(ctx context.Context, opts *FetchOptions) error {
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
}
app := fx.New(
fx.Supply(opts),
fx.Provide(globals.New),
// Additional modules will be added here
fx.Invoke(func(g *globals.Globals) error {
// TODO: Implement fetch logic
fmt.Printf("Fetching %s from snapshot %s to %s\n", opts.FilePath, opts.SnapshotID, opts.Target)
return nil
}),
fx.NopLogger,
)
if err := app.Start(ctx); err != nil {
return fmt.Errorf("failed to start fetch: %w", err)
}
defer func() {
if err := app.Stop(ctx); err != nil {
fmt.Printf("error stopping app: %v\n", err)
}
}()
return nil
}

internal/cli/info.go (new file, 71 lines)

@@ -0,0 +1,71 @@
package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewInfoCommand creates the info command
func NewInfoCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "info",
Short: "Display system and configuration information",
Long: `Shows information about the current vaultik configuration, including:
- System details (OS, architecture, version)
- Storage configuration (S3 bucket, endpoint)
- Backup settings (source directories, compression)
- Encryption configuration (recipients)
- Local database statistics`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.ShowInfo(); err != nil {
if err != context.Canceled {
log.Error("Failed to show info", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
return cmd
}


@@ -2,51 +2,28 @@ package cli
import (
"context"
"fmt"
"strings"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"github.com/dustin/go-humanize"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
- // PruneOptions contains options for the prune command
- type PruneOptions struct {
- DryRun bool
- }
- // PruneApp contains all dependencies needed for pruning
- type PruneApp struct {
- Globals *globals.Globals
- Config *config.Config
- Repositories *database.Repositories
- S3Client *s3.Client
- DB *database.DB
- Shutdowner fx.Shutdowner
- }
// NewPruneCommand creates the prune command
func NewPruneCommand() *cobra.Command {
- opts := &PruneOptions{}
+ opts := &vaultik.PruneOptions{}
cmd := &cobra.Command{
Use: "prune",
Short: "Remove unreferenced blobs",
- Long: `Delete blobs that are no longer referenced by any snapshot.
+ Long: `Removes blobs that are not referenced by any snapshot.
- This command will:
- 1. Download the manifest from the last successful snapshot
- 2. List all blobs in S3
- 3. Delete any blobs not referenced in the manifest
+ This command scans all snapshots and their manifests to build a list of
+ referenced blobs, then removes any blobs in storage that are not in this list.
- Config is located at /etc/vaultik/config.yml by default, but can be overridden by
- specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
+ Use this command after deleting snapshots with 'vaultik purge' to reclaim
+ storage space.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
@@ -62,39 +39,27 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
- Modules: []fx.Option{
- snapshot.Module,
- s3.Module,
- fx.Provide(fx.Annotate(
- func(g *globals.Globals, cfg *config.Config, repos *database.Repositories,
- s3Client *s3.Client, db *database.DB, shutdowner fx.Shutdowner) *PruneApp {
- return &PruneApp{
- Globals: g,
- Config: cfg,
- Repositories: repos,
- S3Client: s3Client,
- DB: db,
- Shutdowner: shutdowner,
- }
- },
- )),
- },
+ Modules: []fx.Option{},
Invokes: []fx.Option{
- fx.Invoke(func(app *PruneApp, lc fx.Lifecycle) {
+ fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the prune operation in a goroutine
go func() {
// Run the prune operation
- if err := app.runPrune(ctx, opts); err != nil {
+ if err := v.PruneBlobs(opts); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Prune operation failed", "error", err)
}
os.Exit(1)
}
}
// Shutdown the app when prune completes
- if err := app.Shutdowner.Shutdown(); err != nil {
+ if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
@@ -102,6 +67,7 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping prune operation")
v.Cancel()
return nil
},
})
@@ -111,186 +77,8 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
},
}
cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without actually deleting")
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")
return cmd
}
// runPrune executes the prune operation
func (app *PruneApp) runPrune(ctx context.Context, opts *PruneOptions) error {
log.Info("Starting prune operation",
"bucket", app.Config.S3.Bucket,
"prefix", app.Config.S3.Prefix,
"dry_run", opts.DryRun,
)
// Step 1: Get the latest complete snapshot from the database
log.Info("Getting latest snapshot from database")
snapshots, err := app.Repositories.Snapshots.ListRecent(ctx, 1)
if err != nil {
return fmt.Errorf("listing snapshots: %w", err)
}
if len(snapshots) == 0 {
return fmt.Errorf("no snapshots found in database")
}
latestSnapshot := snapshots[0]
if latestSnapshot.CompletedAt == nil {
return fmt.Errorf("latest snapshot %s is incomplete", latestSnapshot.ID)
}
log.Info("Found latest snapshot",
"id", latestSnapshot.ID,
"completed_at", latestSnapshot.CompletedAt.Format("2006-01-02 15:04:05"))
// Step 2: Find and download the manifest from the last successful snapshot in S3
log.Info("Finding last successful snapshot in S3")
metadataPrefix := "metadata/"
// List all snapshots in S3
var s3Snapshots []string
objectCh := app.S3Client.ListObjectsStream(ctx, metadataPrefix, false)
for obj := range objectCh {
if obj.Err != nil {
return fmt.Errorf("listing metadata objects: %w", obj.Err)
}
// Extract snapshot ID from path like "metadata/hostname-20240115-143052Z/manifest.json.zst"
parts := strings.Split(obj.Key, "/")
if len(parts) >= 2 && strings.HasSuffix(obj.Key, "/manifest.json.zst") {
s3Snapshots = append(s3Snapshots, parts[1])
}
}
if len(s3Snapshots) == 0 {
return fmt.Errorf("no snapshot manifests found in S3")
}
// Find the most recent snapshot (they're named with timestamps)
var lastS3Snapshot string
for _, snap := range s3Snapshots {
if lastS3Snapshot == "" || snap > lastS3Snapshot {
lastS3Snapshot = snap
}
}
log.Info("Found last S3 snapshot", "id", lastS3Snapshot)
// Step 3: Verify the last S3 snapshot matches the latest DB snapshot
if lastS3Snapshot != latestSnapshot.ID {
return fmt.Errorf("latest snapshot in database (%s) does not match last successful snapshot in S3 (%s)",
latestSnapshot.ID, lastS3Snapshot)
}
// Step 4: Download and parse the manifest
log.Info("Downloading manifest", "snapshot_id", lastS3Snapshot)
manifest, err := app.downloadManifest(ctx, lastS3Snapshot)
if err != nil {
return fmt.Errorf("downloading manifest: %w", err)
}
log.Info("Manifest loaded", "blob_count", len(manifest.Blobs))
// Step 5: Build set of referenced blobs
referencedBlobs := make(map[string]bool)
for _, blob := range manifest.Blobs {
referencedBlobs[blob.Hash] = true
}
// Step 6: List all blobs in S3
log.Info("Listing all blobs in S3")
blobPrefix := "blobs/"
var totalBlobs int
var unreferencedBlobs []s3.ObjectInfo
var unreferencedSize int64
objectCh = app.S3Client.ListObjectsStream(ctx, blobPrefix, true)
for obj := range objectCh {
if obj.Err != nil {
return fmt.Errorf("listing blobs: %w", obj.Err)
}
totalBlobs++
// Extract blob hash from path like "blobs/ca/fe/cafebabe..."
parts := strings.Split(obj.Key, "/")
if len(parts) == 4 {
blobHash := parts[3]
if !referencedBlobs[blobHash] {
unreferencedBlobs = append(unreferencedBlobs, obj)
unreferencedSize += obj.Size
}
}
}
log.Info("Blob scan complete",
"total_blobs", totalBlobs,
"referenced_blobs", len(referencedBlobs),
"unreferenced_blobs", len(unreferencedBlobs),
"unreferenced_size", humanize.Bytes(uint64(unreferencedSize)))
// Step 7: Delete or report unreferenced blobs
if opts.DryRun {
fmt.Printf("\nDry run mode - would delete %d unreferenced blobs\n", len(unreferencedBlobs))
fmt.Printf("Total size of blobs to delete: %s\n", humanize.Bytes(uint64(unreferencedSize)))
if len(unreferencedBlobs) > 0 {
log.Debug("Unreferenced blobs found", "count", len(unreferencedBlobs))
for _, obj := range unreferencedBlobs {
log.Debug("Would delete blob", "key", obj.Key, "size", humanize.Bytes(uint64(obj.Size)))
}
}
} else {
if len(unreferencedBlobs) == 0 {
fmt.Println("No unreferenced blobs to delete")
return nil
}
fmt.Printf("\nDeleting %d unreferenced blobs (%s)...\n",
len(unreferencedBlobs), humanize.Bytes(uint64(unreferencedSize)))
deletedCount := 0
deletedSize := int64(0)
for _, obj := range unreferencedBlobs {
if err := app.S3Client.RemoveObject(ctx, obj.Key); err != nil {
log.Error("Failed to delete blob", "key", obj.Key, "error", err)
continue
}
deletedCount++
deletedSize += obj.Size
// Show progress every 100 blobs
if deletedCount%100 == 0 {
fmt.Printf(" Deleted %d/%d blobs (%s)...\n",
deletedCount, len(unreferencedBlobs),
humanize.Bytes(uint64(deletedSize)))
}
}
fmt.Printf("\nDeleted %d blobs (%s)\n", deletedCount, humanize.Bytes(uint64(deletedSize)))
}
log.Info("Prune operation completed successfully")
return nil
}
// downloadManifest downloads and decompresses a snapshot manifest
func (app *PruneApp) downloadManifest(ctx context.Context, snapshotID string) (*snapshot.Manifest, error) {
manifestPath := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
// Download the compressed manifest
reader, err := app.S3Client.GetObject(ctx, manifestPath)
if err != nil {
return nil, fmt.Errorf("downloading manifest: %w", err)
}
defer func() { _ = reader.Close() }()
// Decode manifest
manifest, err := snapshot.DecodeManifest(reader)
if err != nil {
return nil, fmt.Errorf("decoding manifest: %w", err)
}
return manifest, nil
}

internal/cli/purge.go (new file, 100 lines)

@@ -0,0 +1,100 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// PurgeOptions contains options for the purge command
type PurgeOptions struct {
KeepLatest bool
OlderThan string
Force bool
}
// NewPurgeCommand creates the purge command
func NewPurgeCommand() *cobra.Command {
opts := &PurgeOptions{}
cmd := &cobra.Command{
Use: "purge",
Short: "Purge old snapshots",
Long: `Removes snapshots based on age or count criteria.
This command allows you to:
- Keep only the latest snapshot (--keep-latest)
- Remove snapshots older than a specific duration (--older-than)
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate flags
if !opts.KeepLatest && opts.OlderThan == "" {
return fmt.Errorf("must specify either --keep-latest or --older-than")
}
if opts.KeepLatest && opts.OlderThan != "" {
return fmt.Errorf("cannot specify both --keep-latest and --older-than")
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the purge operation in a goroutine
go func() {
// Run the purge operation
if err := v.PurgeSnapshots(opts.KeepLatest, opts.OlderThan, opts.Force); err != nil {
if err != context.Canceled {
log.Error("Purge operation failed", "error", err)
os.Exit(1)
}
}
// Shutdown the app when purge completes
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping purge operation")
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot")
cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6m, 1y)")
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")
return cmd
}

internal/cli/remote.go (new file, 89 lines)

@@ -0,0 +1,89 @@
package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewRemoteCommand creates the remote command and subcommands
func NewRemoteCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "remote",
Short: "Remote storage management commands",
Long: "Commands for inspecting and managing remote storage",
}
// Add subcommands
cmd.AddCommand(newRemoteInfoCommand())
return cmd
}
// newRemoteInfoCommand creates the 'remote info' subcommand
func newRemoteInfoCommand() *cobra.Command {
var jsonOutput bool
cmd := &cobra.Command{
Use: "info",
Short: "Display remote storage information",
Long: `Shows detailed information about remote storage, including:
- Size of all snapshot metadata (per snapshot and total)
- Count and total size of all blobs
- Count and size of referenced blobs (from all manifests)
- Count and size of orphaned blobs (not referenced by any manifest)`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || jsonOutput,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.RemoteInfo(jsonOutput); err != nil {
if err != context.Canceled {
if !jsonOutput {
log.Error("Failed to get remote info", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")
return cmd
}


@@ -2,20 +2,30 @@ package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// RestoreOptions contains options for the restore command
type RestoreOptions struct {
- Bucket string
- Prefix string
SnapshotID string
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files after restore
}
// RestoreApp contains all dependencies needed for restore
type RestoreApp struct {
Globals *globals.Globals
Config *config.Config
Storage storage.Storer
Vaultik *vaultik.Vaultik
Shutdowner fx.Shutdowner
}
// NewRestoreCommand creates the restore command
@@ -23,61 +33,104 @@ func NewRestoreCommand() *cobra.Command {
opts := &RestoreOptions{}
cmd := &cobra.Command{
Use: "restore",
Use: "restore <snapshot-id> <target-dir> [paths...]",
Short: "Restore files from backup",
- Long: `Download and decrypt files from a backup snapshot`,
- Args: cobra.NoArgs,
+ Long: `Download and decrypt files from a backup snapshot.
+ This command will restore files from the specified snapshot to the target directory.
+ If no paths are specified, all files are restored.
+ If paths are specified, only matching files/directories are restored.
+ Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
+ Examples:
+ # Restore entire snapshot
+ vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
+ # Restore specific file
+ vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
+ # Restore specific directory
+ vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
+ # Restore and verify all files
+ vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
+ Args: cobra.MinimumNArgs(2),
RunE: func(cmd *cobra.Command, args []string) error {
- // Validate required flags
- if opts.Bucket == "" {
- return fmt.Errorf("--bucket is required")
+ snapshotID := args[0]
+ opts.TargetDir = args[1]
+ if len(args) > 2 {
+ opts.Paths = args[2:]
}
- if opts.Prefix == "" {
- return fmt.Errorf("--prefix is required")
+ // Use unified config resolution
+ configPath, err := ResolveConfigPath()
+ if err != nil {
+ return err
}
- if opts.SnapshotID == "" {
- return fmt.Errorf("--snapshot is required")
+ // Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config,
storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
return &RestoreApp{
Globals: g,
Config: cfg,
Storage: storer,
Vaultik: v,
Shutdowner: shutdowner,
}
- if opts.TargetDir == "" {
- return fmt.Errorf("--target is required")
},
)),
},
Invokes: []fx.Option{
fx.Invoke(func(app *RestoreApp, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the restore operation in a goroutine
go func() {
// Run the restore operation
restoreOpts := &vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: opts.TargetDir,
Paths: opts.Paths,
Verify: opts.Verify,
}
- return runRestore(cmd.Context(), opts)
if err := app.Vaultik.Restore(restoreOpts); err != nil {
if err != context.Canceled {
log.Error("Restore operation failed", "error", err)
}
}
// Shutdown the app when restore completes
if err := app.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping restore operation")
app.Vaultik.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to restore")
cmd.Flags().StringVar(&opts.TargetDir, "target", "", "Target directory for restore")
cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
return cmd
}
- func runRestore(ctx context.Context, opts *RestoreOptions) error {
- if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
- return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
- }
- app := fx.New(
- fx.Supply(opts),
- fx.Provide(globals.New),
- // Additional modules will be added here
- fx.Invoke(func(g *globals.Globals) error {
- // TODO: Implement restore logic
- fmt.Printf("Restoring snapshot %s to %s\n", opts.SnapshotID, opts.TargetDir)
- return nil
- }),
- fx.NopLogger,
- )
- if err := app.Start(ctx); err != nil {
- return fmt.Errorf("failed to start restore: %w", err)
- }
- defer func() {
- if err := app.Stop(ctx); err != nil {
- fmt.Printf("error stopping app: %v\n", err)
- }
- }()
- return nil
- }


@@ -13,6 +13,7 @@ type RootFlags struct {
ConfigPath string
Verbose bool
Debug bool
Quiet bool
}
var rootFlags RootFlags
@@ -34,15 +35,19 @@ on the source system.`,
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")
// Add subcommands
cmd.AddCommand(
NewRestoreCommand(),
NewPruneCommand(),
NewVerifyCommand(),
- NewFetchCommand(),
NewStoreCommand(),
NewSnapshotCommand(),
+ NewInfoCommand(),
+ NewVersionCommand(),
+ NewRemoteCommand(),
+ NewDatabaseCommand(),
)
return cmd

File diff suppressed because it is too large.

@@ -7,14 +7,14 @@ import (
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// StoreApp contains dependencies for store commands
type StoreApp struct {
- S3Client *s3.Client
+ Storage storage.Storer
Shutdowner fx.Shutdowner
}
@@ -23,7 +23,7 @@ func NewStoreCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "store",
Short: "Storage information commands",
Long: "Commands for viewing information about the S3 storage backend",
Long: "Commands for viewing information about the storage backend",
}
// Add subcommands
@@ -37,7 +37,7 @@ func newStoreInfoCommand() *cobra.Command {
return &cobra.Command{
Use: "info",
Short: "Display storage information",
Long: "Shows S3 bucket configuration and storage statistics including snapshots and blobs",
Long: "Shows storage configuration and statistics including snapshots and blobs",
RunE: func(cmd *cobra.Command, args []string) error {
return runWithApp(cmd.Context(), func(app *StoreApp) error {
return app.Info(cmd.Context())
@@ -48,19 +48,18 @@ func newStoreInfoCommand() *cobra.Command {
// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
- // Get bucket info
- bucketName := app.S3Client.BucketName()
- endpoint := app.S3Client.Endpoint()
+ // Get storage info
+ storageInfo := app.Storage.Info()
fmt.Printf("Storage Information\n")
fmt.Printf("==================\n\n")
fmt.Printf("S3 Configuration:\n")
fmt.Printf(" Endpoint: %s\n", endpoint)
fmt.Printf(" Bucket: %s\n\n", bucketName)
fmt.Printf("Storage Configuration:\n")
fmt.Printf(" Type: %s\n", storageInfo.Type)
fmt.Printf(" Location: %s\n\n", storageInfo.Location)
// Count snapshots by listing metadata/ prefix
snapshotCount := 0
snapshotCh := app.S3Client.ListObjectsStream(ctx, "metadata/", true)
snapshotCh := app.Storage.ListStream(ctx, "metadata/")
snapshotDirs := make(map[string]bool)
for object := range snapshotCh {
@@ -79,7 +78,7 @@ func (app *StoreApp) Info(ctx context.Context) error {
blobCount := 0
var totalSize int64
blobCh := app.S3Client.ListObjectsStream(ctx, "blobs/", false)
blobCh := app.Storage.ListStream(ctx, "blobs/")
for object := range blobCh {
if object.Err != nil {
return fmt.Errorf("listing blobs: %w", object.Err)
@@ -128,12 +127,12 @@ func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
s3.Module,
fx.Provide(func(s3Client *s3.Client, shutdowner fx.Shutdowner) *StoreApp {
fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {
return &StoreApp{
S3Client: s3Client,
Storage: storer,
Shutdowner: shutdowner,
}
}),

View File

@@ -0,0 +1,10 @@
package cli
import "time"
// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
ID string `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}
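
For illustration (not part of the diff), a slice of these marshals to the array shape that consumers of --json can expect. The ID value below is hypothetical, and the example assumes encoding/json, fmt, and time are imported in package cli:

func ExampleSnapshotInfo() {
	snaps := []SnapshotInfo{{
		ID:             "example-snapshot-id", // hypothetical ID value
		Timestamp:      time.Date(2026, 2, 16, 0, 0, 0, 0, time.UTC),
		CompressedSize: 123456,
	}}
	out, _ := json.MarshalIndent(snaps, "", "  ")
	fmt.Println(string(out))
	// Output:
	// [
	//   {
	//     "id": "example-snapshot-id",
	//     "timestamp": "2026-02-16T00:00:00Z",
	//     "compressed_size": 123456
	//   }
	// ]
}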

View File

@@ -2,85 +2,97 @@ package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// VerifyOptions contains options for the verify command
type VerifyOptions struct {
Bucket string
Prefix string
SnapshotID string
Quick bool
}
// NewVerifyCommand creates the verify command
func NewVerifyCommand() *cobra.Command {
opts := &VerifyOptions{}
opts := &vaultik.VerifyOptions{}
cmd := &cobra.Command{
Use: "verify",
Short: "Verify backup integrity",
Long: `Check that all referenced blobs exist and verify metadata integrity`,
Args: cobra.NoArgs,
Use: "verify <snapshot-id>",
Short: "Verify snapshot integrity",
Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.
Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs
Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly
The command will fail immediately on any verification error and exit with non-zero status.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
// Validate required flags
if opts.Bucket == "" {
return fmt.Errorf("--bucket is required")
snapshotID := args[0]
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
if opts.Prefix == "" {
return fmt.Errorf("--prefix is required")
// Use the app framework for all verification
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Run the verify operation directly
go func() {
var err error
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
}
return runVerify(cmd.Context(), opts)
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping verify operation")
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to verify (optional, defaults to latest)")
cmd.Flags().BoolVar(&opts.Quick, "quick", false, "Perform quick verification by checking blob existence and S3 content hashes without downloading")
cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
return cmd
}
func runVerify(ctx context.Context, opts *VerifyOptions) error {
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
}
app := fx.New(
fx.Supply(opts),
fx.Provide(globals.New),
// Additional modules will be added here
fx.Invoke(func(g *globals.Globals) error {
// TODO: Implement verify logic
if opts.SnapshotID == "" {
fmt.Printf("Verifying latest snapshot in bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
} else {
fmt.Printf("Verifying snapshot %s in bucket %s with prefix %s\n", opts.SnapshotID, opts.Bucket, opts.Prefix)
}
if opts.Quick {
fmt.Println("Performing quick verification")
} else {
fmt.Println("Performing deep verification")
}
return nil
}),
fx.NopLogger,
)
if err := app.Start(ctx); err != nil {
return fmt.Errorf("failed to start verify: %w", err)
}
defer func() {
if err := app.Stop(ctx); err != nil {
fmt.Printf("error stopping app: %v\n", err)
}
}()
return nil
}
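
To make the deep path described in the help text concrete, here is a minimal per-blob sketch under stated assumptions: the fetch and decrypt callbacks stand in for the storage and crypto layers, and the zstd reader assumes the klauspost/compress library. The real logic lives in RunDeepVerify and is not reproduced here.

package sketch

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

type expectedChunk struct {
	Hash string // hex SHA256 recorded in the local database
	Size int64
}

// deepVerifyBlob downloads, decrypts, and decompresses one blob, then checks
// that each chunk's SHA256 matches the database and appears in order.
func deepVerifyBlob(ctx context.Context,
	fetch func(context.Context, string) (io.ReadCloser, error),
	decrypt func(io.Reader) (io.Reader, error),
	blobKey string, chunks []expectedChunk) error {
	rc, err := fetch(ctx, blobKey)
	if err != nil {
		return fmt.Errorf("fetching blob %s: %w", blobKey, err)
	}
	defer func() { _ = rc.Close() }()
	plain, err := decrypt(rc)
	if err != nil {
		return fmt.Errorf("decrypting blob %s: %w", blobKey, err)
	}
	zr, err := zstd.NewReader(plain)
	if err != nil {
		return fmt.Errorf("decompressing blob %s: %w", blobKey, err)
	}
	defer zr.Close()
	for i, ch := range chunks {
		h := sha256.New()
		if _, err := io.CopyN(h, zr, ch.Size); err != nil {
			return fmt.Errorf("reading chunk %d: %w", i, err)
		}
		if got := hex.EncodeToString(h.Sum(nil)); got != ch.Hash {
			return fmt.Errorf("chunk %d: hash mismatch (got %s, want %s)", i, got, ch.Hash)
		}
	}
	return nil
}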

internal/cli/version.go (new file, 27 lines)
View File

@@ -0,0 +1,27 @@
package cli
import (
"fmt"
"runtime"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
)
// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "version",
Short: "Print version information",
Long: `Print version, git commit, and build information for vaultik.`,
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
fmt.Printf("vaultik %s\n", globals.Version)
fmt.Printf(" commit: %s\n", globals.Commit)
fmt.Printf(" go: %s\n", runtime.Version())
fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
},
}
return cmd
}

View File

@@ -3,29 +3,107 @@ package config
import (
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/smartconfig"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/adrg/xdg"
"go.uber.org/fx"
"gopkg.in/yaml.v3"
)
const appName = "berlin.sneak.app.vaultik"
// expandTilde expands ~ at the start of a path to the user's home directory.
func expandTilde(path string) string {
if path == "~" {
home, _ := os.UserHomeDir()
return home
}
if strings.HasPrefix(path, "~/") {
home, _ := os.UserHomeDir()
return filepath.Join(home, path[2:])
}
return path
}
// expandTildeInURL expands ~ in file:// URLs.
func expandTildeInURL(url string) string {
if strings.HasPrefix(url, "file://~/") {
home, _ := os.UserHomeDir()
return "file://" + filepath.Join(home, url[9:])
}
return url
}
// SnapshotConfig represents configuration for a named snapshot.
// Each snapshot backs up one or more paths and can have its own exclude patterns
// in addition to the global excludes.
type SnapshotConfig struct {
Paths []string `yaml:"paths"`
Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
}
// GetExcludes returns the combined exclude patterns for a named snapshot.
// It merges global excludes with the snapshot-specific excludes.
func (c *Config) GetExcludes(snapshotName string) []string {
snap, ok := c.Snapshots[snapshotName]
if !ok {
return c.Exclude
}
if len(snap.Exclude) == 0 {
return c.Exclude
}
// Combine global and snapshot-specific excludes
combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
combined = append(combined, c.Exclude...)
combined = append(combined, snap.Exclude...)
return combined
}
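
An illustrative example test (patterns are hypothetical; assumes fmt is imported) pinning down the merge order: global excludes come first, snapshot-specific excludes are appended, and unknown names fall back to the globals alone.

func ExampleConfig_GetExcludes() {
	cfg := &Config{
		Exclude: []string{"*.tmp"},
		Snapshots: map[string]SnapshotConfig{
			"home": {Paths: []string{"/home"}, Exclude: []string{"*.cache"}},
		},
	}
	fmt.Println(cfg.GetExcludes("home"))
	fmt.Println(cfg.GetExcludes("nonexistent"))
	// Output:
	// [*.tmp *.cache]
	// [*.tmp]
}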
// SnapshotNames returns the names of all configured snapshots in sorted order.
func (c *Config) SnapshotNames() []string {
names := make([]string, 0, len(c.Snapshots))
for name := range c.Snapshots {
names = append(names, name)
}
// Sort for deterministic order
sort.Strings(names)
return names
}
// Config represents the application configuration for Vaultik.
// It defines all settings for backup operations, including source directories,
// encryption recipients, S3 storage configuration, and performance tuning parameters.
// encryption recipients, storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
AgeRecipients []string `yaml:"age_recipients"`
AgeSecretKey string `yaml:"age_secret_key"`
BackupInterval time.Duration `yaml:"backup_interval"`
BlobSizeLimit Size `yaml:"blob_size_limit"`
ChunkSize Size `yaml:"chunk_size"`
Exclude []string `yaml:"exclude"`
Exclude []string `yaml:"exclude"` // Global excludes applied to all snapshots
FullScanInterval time.Duration `yaml:"full_scan_interval"`
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
S3 S3Config `yaml:"s3"`
SourceDirs []string `yaml:"source_dirs"`
Snapshots map[string]SnapshotConfig `yaml:"snapshots"`
CompressionLevel int `yaml:"compression_level"`
// StorageURL specifies the storage backend using a URL format.
// Takes precedence over S3Config if set.
// Supported formats:
// - s3://bucket/prefix?endpoint=host&region=us-east-1
// - file:///path/to/backup
// For S3 URLs, credentials are still read from s3.access_key_id and s3.secret_access_key.
StorageURL string `yaml:"storage_url"`
}
// S3Config represents S3 storage configuration for backup storage.
@@ -65,13 +143,14 @@ func New(path ConfigPath) (*Config, error) {
// Load reads and parses the configuration file from the specified path.
// It applies default values for optional fields, performs environment variable
// substitution for certain fields (like IndexPath), and validates the configuration.
// substitution using smartconfig, and validates the configuration.
// The configuration file should be in YAML format. Returns an error if the file
// cannot be read, parsed, or if validation fails.
func Load(path string) (*Config, error) {
data, err := os.ReadFile(path)
// Load config using smartconfig for interpolation
sc, err := smartconfig.NewFromConfigPath(path)
if err != nil {
return nil, fmt.Errorf("failed to read config file: %w", err)
return nil, fmt.Errorf("failed to load config file: %w", err)
}
cfg := &Config{
@@ -81,17 +160,41 @@ func Load(path string) (*Config, error) {
BackupInterval: 1 * time.Hour,
FullScanInterval: 24 * time.Hour,
MinTimeBetweenRun: 15 * time.Minute,
IndexPath: "/var/lib/vaultik/index.sqlite",
IndexPath: filepath.Join(xdg.DataHome, appName, "index.sqlite"),
CompressionLevel: 3,
}
if err := yaml.Unmarshal(data, cfg); err != nil {
// Convert smartconfig data to YAML then unmarshal
configData := sc.Data()
yamlBytes, err := yaml.Marshal(configData)
if err != nil {
return nil, fmt.Errorf("failed to marshal config data: %w", err)
}
if err := yaml.Unmarshal(yamlBytes, cfg); err != nil {
return nil, fmt.Errorf("failed to parse config: %w", err)
}
// Expand tilde in all path fields
cfg.IndexPath = expandTilde(cfg.IndexPath)
cfg.StorageURL = expandTildeInURL(cfg.StorageURL)
// Expand tildes in snapshot paths
for name, snap := range cfg.Snapshots {
for i, path := range snap.Paths {
snap.Paths[i] = expandTilde(path)
}
cfg.Snapshots[name] = snap
}
// Check for environment variable override for IndexPath
if envIndexPath := os.Getenv("VAULTIK_INDEX_PATH"); envIndexPath != "" {
cfg.IndexPath = envIndexPath
cfg.IndexPath = expandTilde(envIndexPath)
}
// Check for environment variable override for AgeSecretKey
if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
}
// Get hostname if not set
@@ -111,6 +214,17 @@ func Load(path string) (*Config, error) {
cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
}
// Check config file permissions (warn if world or group readable)
if info, err := os.Stat(path); err == nil {
mode := info.Mode().Perm()
if mode&0044 != 0 { // group or world readable
log.Warn("Config file has insecure permissions (contains S3 credentials)",
"path", path,
"mode", fmt.Sprintf("%04o", mode),
"recommendation", "chmod 600 "+path)
}
}
if err := cfg.Validate(); err != nil {
return nil, fmt.Errorf("invalid config: %w", err)
}
@@ -121,8 +235,8 @@ func Load(path string) (*Config, error) {
// Validate checks if the configuration is valid and complete.
// It ensures all required fields are present and have valid values:
// - At least one age recipient must be specified
// - At least one source directory must be configured
// - S3 credentials and endpoint must be provided
// - At least one snapshot must be configured with at least one path
// - Storage must be configured (either storage_url or s3.* fields)
// - Chunk size must be at least 1MB
// - Blob size limit must be at least the chunk size
// - Compression level must be between 1 and 19
@@ -132,24 +246,19 @@ func (c *Config) Validate() error {
return fmt.Errorf("at least one age_recipient is required")
}
if len(c.SourceDirs) == 0 {
return fmt.Errorf("at least one source directory is required")
if len(c.Snapshots) == 0 {
return fmt.Errorf("at least one snapshot must be configured")
}
if c.S3.Endpoint == "" {
return fmt.Errorf("s3.endpoint is required")
for name, snap := range c.Snapshots {
if len(snap.Paths) == 0 {
return fmt.Errorf("snapshot %q must have at least one path", name)
}
}
if c.S3.Bucket == "" {
return fmt.Errorf("s3.bucket is required")
}
if c.S3.AccessKeyID == "" {
return fmt.Errorf("s3.access_key_id is required")
}
if c.S3.SecretAccessKey == "" {
return fmt.Errorf("s3.secret_access_key is required")
// Validate storage configuration
if err := c.validateStorage(); err != nil {
return err
}
if c.ChunkSize.Int64() < 1024*1024 { // 1MB minimum
@@ -167,6 +276,69 @@ func (c *Config) Validate() error {
return nil
}
// validateStorage validates storage configuration.
// If StorageURL is set, it takes precedence. S3 URLs require credentials.
// File URLs don't require any S3 configuration.
// If StorageURL is not set, legacy S3 configuration is required.
func (c *Config) validateStorage() error {
if c.StorageURL != "" {
// URL-based configuration
if strings.HasPrefix(c.StorageURL, "file://") {
// File storage doesn't need S3 credentials
return nil
}
if strings.HasPrefix(c.StorageURL, "s3://") {
// S3 storage needs credentials
if c.S3.AccessKeyID == "" {
return fmt.Errorf("s3.access_key_id is required for s3:// URLs")
}
if c.S3.SecretAccessKey == "" {
return fmt.Errorf("s3.secret_access_key is required for s3:// URLs")
}
return nil
}
if strings.HasPrefix(c.StorageURL, "rclone://") {
// Rclone storage uses rclone's own config
return nil
}
return fmt.Errorf("storage_url must start with s3://, file://, or rclone://")
}
// Legacy S3 configuration
if c.S3.Endpoint == "" {
return fmt.Errorf("s3.endpoint is required (or set storage_url)")
}
if c.S3.Bucket == "" {
return fmt.Errorf("s3.bucket is required (or set storage_url)")
}
if c.S3.AccessKeyID == "" {
return fmt.Errorf("s3.access_key_id is required")
}
if c.S3.SecretAccessKey == "" {
return fmt.Errorf("s3.secret_access_key is required")
}
return nil
}
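
As a sketch of the URL format documented on StorageURL (this is not the storage package's actual parser), the s3:// form decomposes cleanly with net/url; assumes fmt, net/url, and strings are imported:

// parseS3URL splits s3://bucket/prefix?endpoint=host&region=us-east-1 into
// its parts. Sketch only; the real dispatch lives in the storage package.
func parseS3URL(raw string) (bucket, prefix, endpoint, region string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", "", fmt.Errorf("parsing storage URL: %w", err)
	}
	if u.Scheme != "s3" {
		return "", "", "", "", fmt.Errorf("not an s3:// URL: %q", raw)
	}
	q := u.Query()
	return u.Host, strings.TrimPrefix(u.Path, "/"), q.Get("endpoint"), q.Get("region"), nil
}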
// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
// the age library's parser, which handles comments and whitespace.
func extractAgeSecretKey(input string) string {
identities, err := age.ParseIdentities(strings.NewReader(input))
if err != nil || len(identities) == 0 {
// Fall back to trimmed input if parsing fails
return strings.TrimSpace(input)
}
// Return the string representation of the first identity
if id, ok := identities[0].(*age.X25519Identity); ok {
return id.String()
}
return strings.TrimSpace(input)
}
// Module exports the config module for fx dependency injection.
// It provides the Config type to other modules in the application.
var Module = fx.Module("config",

View File

@@ -45,12 +45,21 @@ func TestConfigLoad(t *testing.T) {
t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
}
if len(cfg.SourceDirs) != 2 {
t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
if len(cfg.Snapshots) != 1 {
t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
}
if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
testSnap, ok := cfg.Snapshots["test"]
if !ok {
t.Fatal("Expected 'test' snapshot to exist")
}
if len(testSnap.Paths) != 2 {
t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
}
if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
}
if cfg.S3.Bucket != "vaultik-test-bucket" {
@@ -74,3 +83,65 @@ func TestConfigFromEnv(t *testing.T) {
t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
}
}
// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
func TestExtractAgeSecretKey(t *testing.T) {
tests := []struct {
name string
input string
expected string
}{
{
name: "plain key",
input: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "key with trailing newline",
input: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "full age-keygen output",
input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "age-keygen output with extra blank lines",
input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "key with leading whitespace",
input: " AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5 ",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "empty input",
input: "",
expected: "",
},
{
name: "only comments",
input: "# this is a comment\n# another comment",
expected: "# this is a comment\n# another comment",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := extractAgeSecretKey(tt.input)
if result != tt.expected {
t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
}
})
}
}

View File

@@ -51,3 +51,12 @@ func (s Size) Int64() int64 {
func (s Size) String() string {
return humanize.Bytes(uint64(s))
}
// ParseSize parses a size string into a Size value
func ParseSize(s string) (Size, error) {
bytes, err := humanize.ParseBytes(s)
if err != nil {
return 0, fmt.Errorf("invalid size format: %w", err)
}
return Size(bytes), nil
}
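
A short illustrative example of the round trip: humanize accepts both SI and IEC suffixes, so "256MiB" parses to 256 x 1024 x 1024 bytes.

func ExampleParseSize() {
	limit, err := ParseSize("256MiB")
	if err != nil {
		panic(err)
	}
	fmt.Println(limit.Int64())
	// Output: 268435456
}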

View File

@@ -7,6 +7,7 @@ import (
"sync"
"filippo.io/age"
"go.uber.org/fx"
)
// Encryptor provides thread-safe encryption using the age encryption library.
@@ -143,3 +144,66 @@ func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
return nil
}
// Decryptor provides thread-safe decryption using the age encryption library.
// It uses a private key to decrypt data that was encrypted for the corresponding
// public key.
type Decryptor struct {
identity age.Identity
mu sync.RWMutex
}
// NewDecryptor creates a new decryptor with the given age private key.
// The private key should be a valid age X25519 identity string.
// Returns an error if the private key is invalid.
func NewDecryptor(privateKey string) (*Decryptor, error) {
identity, err := age.ParseX25519Identity(privateKey)
if err != nil {
return nil, fmt.Errorf("parsing age identity: %w", err)
}
return &Decryptor{
identity: identity,
}, nil
}
// Decrypt decrypts data using age decryption.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use DecryptStream instead.
func (d *Decryptor) Decrypt(data []byte) ([]byte, error) {
d.mu.RLock()
identity := d.identity
d.mu.RUnlock()
r, err := age.Decrypt(bytes.NewReader(data), identity)
if err != nil {
return nil, fmt.Errorf("creating decrypted reader: %w", err)
}
decrypted, err := io.ReadAll(r)
if err != nil {
return nil, fmt.Errorf("reading decrypted data: %w", err)
}
return decrypted, nil
}
// DecryptStream returns a reader that decrypts data from the provided reader.
// This method is suitable for decrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The caller should close the input reader when done.
func (d *Decryptor) DecryptStream(src io.Reader) (io.Reader, error) {
d.mu.RLock()
identity := d.identity
d.mu.RUnlock()
r, err := age.Decrypt(src, identity)
if err != nil {
return nil, fmt.Errorf("creating decrypted reader: %w", err)
}
return r, nil
}
// Module exports the crypto module for fx dependency injection.
var Module = fx.Module("crypto")
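
A minimal usage sketch for DecryptStream, assuming the key arrives via VAULTIK_AGE_SECRET_KEY as elsewhere in this changeset; the helper name is illustrative, and fmt, io, and os are assumed imports:

func decryptToStdout(path string) error {
	d, err := NewDecryptor(os.Getenv("VAULTIK_AGE_SECRET_KEY"))
	if err != nil {
		return fmt.Errorf("creating decryptor: %w", err)
	}
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("opening %s: %w", path, err)
	}
	defer func() { _ = f.Close() }()
	r, err := d.DecryptStream(f) // streaming: no full read into memory
	if err != nil {
		return fmt.Errorf("decrypting %s: %w", path, err)
	}
	_, err = io.Copy(os.Stdout, r)
	return err
}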

View File

@@ -5,6 +5,8 @@ import (
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestBlobChunkRepository(t *testing.T) {
@@ -16,8 +18,8 @@ func TestBlobChunkRepository(t *testing.T) {
// Create blob first
blob := &Blob{
ID: "blob1-uuid",
Hash: "blob1-hash",
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob)
@@ -26,7 +28,7 @@ func TestBlobChunkRepository(t *testing.T) {
}
// Create chunks
chunks := []string{"chunk1", "chunk2", "chunk3"}
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@@ -41,7 +43,7 @@ func TestBlobChunkRepository(t *testing.T) {
// Test Create
bc1 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: "chunk1",
ChunkHash: types.ChunkHash("chunk1"),
Offset: 0,
Length: 1024,
}
@@ -54,7 +56,7 @@ func TestBlobChunkRepository(t *testing.T) {
// Add more chunks to the same blob
bc2 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: "chunk2",
ChunkHash: types.ChunkHash("chunk2"),
Offset: 1024,
Length: 2048,
}
@@ -65,7 +67,7 @@ func TestBlobChunkRepository(t *testing.T) {
bc3 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: "chunk3",
ChunkHash: types.ChunkHash("chunk3"),
Offset: 3072,
Length: 512,
}
@@ -75,7 +77,7 @@ func TestBlobChunkRepository(t *testing.T) {
}
// Test GetByBlobID
blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID)
blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get blob chunks: %v", err)
}
@@ -134,13 +136,13 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
// Create blobs
blob1 := &Blob{
ID: "blob1-uuid",
Hash: "blob1-hash",
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
CreatedTS: time.Now(),
}
blob2 := &Blob{
ID: "blob2-uuid",
Hash: "blob2-hash",
ID: types.NewBlobID(),
Hash: types.BlobHash("blob2-hash"),
CreatedTS: time.Now(),
}
@@ -154,7 +156,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Create chunks
chunkHashes := []string{"chunk1", "chunk2", "chunk3"}
chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunkHashes {
chunk := &Chunk{
ChunkHash: chunkHash,
@@ -169,10 +171,10 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
// Create chunks across multiple blobs
// Some chunks are shared between blobs (deduplication scenario)
blobChunks := []BlobChunk{
{BlobID: blob1.ID, ChunkHash: "chunk1", Offset: 0, Length: 1024},
{BlobID: blob1.ID, ChunkHash: "chunk2", Offset: 1024, Length: 1024},
{BlobID: blob2.ID, ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: blob2.ID, ChunkHash: "chunk3", Offset: 1024, Length: 1024},
{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
}
for _, bc := range blobChunks {
@@ -183,7 +185,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob1 chunks
chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID)
chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
if err != nil {
t.Fatalf("failed to get blob1 chunks: %v", err)
}
@@ -192,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob2 chunks
chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID)
chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
if err != nil {
t.Fatalf("failed to get blob2 chunks: %v", err)
}

View File

@@ -4,6 +4,8 @@ import (
"context"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestBlobRepository(t *testing.T) {
@@ -15,8 +17,8 @@ func TestBlobRepository(t *testing.T) {
// Test Create
blob := &Blob{
ID: "test-blob-id-123",
Hash: "blobhash123",
ID: types.NewBlobID(),
Hash: types.BlobHash("blobhash123"),
CreatedTS: time.Now().Truncate(time.Second),
}
@@ -26,7 +28,7 @@ func TestBlobRepository(t *testing.T) {
}
// Test GetByHash
retrieved, err := repo.GetByHash(ctx, blob.Hash)
retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
if err != nil {
t.Fatalf("failed to get blob: %v", err)
}
@@ -41,7 +43,7 @@ func TestBlobRepository(t *testing.T) {
}
// Test GetByID
retrievedByID, err := repo.GetByID(ctx, blob.ID)
retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get blob by ID: %v", err)
}
@@ -54,8 +56,8 @@ func TestBlobRepository(t *testing.T) {
// Test with second blob
blob2 := &Blob{
ID: "test-blob-id-456",
Hash: "blobhash456",
ID: types.NewBlobID(),
Hash: types.BlobHash("blobhash456"),
CreatedTS: time.Now().Truncate(time.Second),
}
err = repo.Create(ctx, nil, blob2)
@@ -65,13 +67,13 @@ func TestBlobRepository(t *testing.T) {
// Test UpdateFinished
now := time.Now()
err = repo.UpdateFinished(ctx, nil, blob.ID, blob.Hash, 1000, 500)
err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
if err != nil {
t.Fatalf("failed to update blob as finished: %v", err)
}
// Verify update
updated, err := repo.GetByID(ctx, blob.ID)
updated, err := repo.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get updated blob: %v", err)
}
@@ -86,13 +88,13 @@ func TestBlobRepository(t *testing.T) {
}
// Test UpdateUploaded
err = repo.UpdateUploaded(ctx, nil, blob.ID)
err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
if err != nil {
t.Fatalf("failed to update blob as uploaded: %v", err)
}
// Verify upload update
uploaded, err := repo.GetByID(ctx, blob.ID)
uploaded, err := repo.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatalf("failed to get uploaded blob: %v", err)
}
@@ -113,8 +115,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
repo := NewBlobRepository(db)
blob := &Blob{
ID: "duplicate-test-id",
Hash: "duplicate_blob",
ID: types.NewBlobID(),
Hash: types.BlobHash("duplicate_blob"),
CreatedTS: time.Now().Truncate(time.Second),
}

View File

@@ -5,6 +5,8 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestCascadeDeleteDebug tests cascade delete with debug output
@@ -42,7 +44,7 @@ func TestCascadeDeleteDebug(t *testing.T) {
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: fmt.Sprintf("cascade-chunk-%d", i),
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)

View File

@@ -4,6 +4,8 @@ import (
"context"
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type ChunkFileRepository struct {
@@ -23,9 +25,9 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
} else {
_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
}
if err != nil {
@@ -35,30 +37,20 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
return nil
}
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE chunk_hash = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
return chunkFiles, rows.Err()
return r.scanChunkFiles(rows)
}
func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
@@ -75,40 +67,41 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
}
defer CloseRows(rows)
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
return chunkFiles, rows.Err()
return r.scanChunkFiles(rows)
}
// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID string) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE file_id = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
}
// scanChunkFiles is a helper that scans chunk file rows
func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
var chunkHashStr, fileIDStr string
err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
cf.ChunkHash = types.ChunkHash(chunkHashStr)
cf.FileID, err = types.ParseFileID(fileIDStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
@@ -116,14 +109,14 @@ func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID string) ([
}
// DeleteByFileID deletes all chunk_files entries for a given file ID
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID string) error {
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
query := `DELETE FROM chunk_files WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID)
_, err = tx.ExecContext(ctx, query, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID)
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
}
if err != nil {
@@ -132,3 +125,80 @@ func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fi
return nil
}
// DeleteByFileIDs deletes all chunk_files rows for multiple files, batching the IN clause to stay within SQLite's variable limit.
func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting chunk_files: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple chunk_files rows using multi-row INSERTs, batched to stay within SQLite's variable limit.
func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
if len(cfs) == 0 {
return nil
}
// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
const batchSize = 200
for i := 0; i < len(cfs); i += batchSize {
end := i + batchSize
if end > len(cfs) {
end = len(cfs)
}
batch := cfs[i:end]
query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
args := make([]interface{}, 0, len(batch)*4)
for j, cf := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?)"
args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
}
query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting chunk_files: %w", err)
}
}
return nil
}
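
A hypothetical caller (method signatures assumed from database.go) showing why the batch helpers pair naturally inside one transaction: the delete and re-insert of a file's mappings stay atomic.

func replaceChunkFiles(ctx context.Context, db *DB, repo *ChunkFileRepository,
	ids []types.FileID, mappings []ChunkFile) error {
	tx, err := db.BeginTx(ctx, nil) // signature assumed to mirror database/sql
	if err != nil {
		return err
	}
	defer func() { _ = tx.Rollback() }() // no-op once Commit succeeds
	if err := repo.DeleteByFileIDs(ctx, tx, ids); err != nil {
		return err
	}
	if err := repo.CreateBatch(ctx, tx, mappings); err != nil {
		return err
	}
	return tx.Commit()
}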

View File

@@ -4,6 +4,8 @@ import (
"context"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkFileRepository(t *testing.T) {
@@ -49,7 +51,7 @@ func TestChunkFileRepository(t *testing.T) {
// Create chunk first
chunk := &Chunk{
ChunkHash: "chunk1",
ChunkHash: types.ChunkHash("chunk1"),
Size: 1024,
}
err = chunksRepo.Create(ctx, nil, chunk)
@@ -59,7 +61,7 @@ func TestChunkFileRepository(t *testing.T) {
// Test Create
cf1 := &ChunkFile{
ChunkHash: "chunk1",
ChunkHash: types.ChunkHash("chunk1"),
FileID: file1.ID,
FileOffset: 0,
Length: 1024,
@@ -72,7 +74,7 @@ func TestChunkFileRepository(t *testing.T) {
// Add same chunk in different file (deduplication scenario)
cf2 := &ChunkFile{
ChunkHash: "chunk1",
ChunkHash: types.ChunkHash("chunk1"),
FileID: file2.ID,
FileOffset: 2048,
Length: 1024,
@@ -114,7 +116,7 @@ func TestChunkFileRepository(t *testing.T) {
if len(chunkFiles) != 1 {
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
}
if chunkFiles[0].ChunkHash != "chunk1" {
if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
}
@@ -151,7 +153,7 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
}
// Create chunks first
chunks := []string{"chunk1", "chunk2", "chunk3", "chunk4"}
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@@ -170,16 +172,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
chunkFiles := []ChunkFile{
// File1
{ChunkHash: "chunk1", FileID: file1.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk2", FileID: file1.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk3", FileID: file1.ID, FileOffset: 2048, Length: 1024},
{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
// File2
{ChunkHash: "chunk2", FileID: file2.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk3", FileID: file2.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk4", FileID: file2.ID, FileOffset: 2048, Length: 1024},
{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
// File3
{ChunkHash: "chunk1", FileID: file3.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk4", FileID: file3.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
}
for _, cf := range chunkFiles {

View File

@@ -139,7 +139,7 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
return chunks, rows.Err()
}
// DeleteOrphaned deletes chunks that are not referenced by any file
// DeleteOrphaned deletes chunks that are not referenced by any file or blob
func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM chunks
@@ -147,6 +147,10 @@ func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)
AND NOT EXISTS (
SELECT 1 FROM blob_chunks
WHERE blob_chunks.chunk_hash = chunks.chunk_hash
)
`
result, err := r.db.ExecWithLog(ctx, query)

View File

@@ -3,6 +3,8 @@ package database
import (
"context"
"testing"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkRepository(t *testing.T) {
@@ -14,7 +16,7 @@ func TestChunkRepository(t *testing.T) {
// Test Create
chunk := &Chunk{
ChunkHash: "chunkhash123",
ChunkHash: types.ChunkHash("chunkhash123"),
Size: 4096,
}
@@ -24,7 +26,7 @@ func TestChunkRepository(t *testing.T) {
}
// Test GetByHash
retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
if err != nil {
t.Fatalf("failed to get chunk: %v", err)
}
@@ -46,7 +48,7 @@ func TestChunkRepository(t *testing.T) {
// Test GetByHashes
chunk2 := &Chunk{
ChunkHash: "chunkhash456",
ChunkHash: types.ChunkHash("chunkhash456"),
Size: 8192,
}
err = repo.Create(ctx, nil, chunk2)
@@ -54,7 +56,7 @@ func TestChunkRepository(t *testing.T) {
t.Fatalf("failed to create second chunk: %v", err)
}
chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
if err != nil {
t.Fatalf("failed to get chunks by hashes: %v", err)
}

View File

@@ -36,26 +36,17 @@ type DB struct {
}
// New creates a new database connection at the specified path.
// It automatically handles database recovery, creates the schema if needed,
// and configures SQLite with appropriate settings for performance and reliability.
// The database uses WAL mode for better concurrency and sets a busy timeout
// to handle concurrent access gracefully.
//
// If the database appears locked, it will attempt recovery by removing stale
// lock files and switching temporarily to TRUNCATE journal mode.
//
// New creates a new database connection at the specified path.
// It automatically handles recovery from stale locks, creates the schema if needed,
// and configures SQLite with WAL mode for better concurrency.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
// opening a database with journal/WAL files present.
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
log.Debug("Opening database connection", "path", path)
// First, try to recover from any stale locks
if err := recoverDatabase(ctx, path); err != nil {
log.Warn("Failed to recover database", "error", err)
}
// Note: We do NOT delete journal/WAL files before opening.
// SQLite handles crash recovery automatically when the database is opened.
// Deleting these files would corrupt the database after an unclean shutdown.
// First attempt with standard WAL mode
log.Debug("Attempting to open database with WAL mode", "path", path)
@@ -156,62 +147,6 @@ func (db *DB) Close() error {
return nil
}
// recoverDatabase attempts to recover a locked database
func recoverDatabase(ctx context.Context, path string) error {
// Check if database file exists
if _, err := os.Stat(path); os.IsNotExist(err) {
// No database file, nothing to recover
return nil
}
// Remove stale lock files
// SQLite creates -wal and -shm files for WAL mode
walPath := path + "-wal"
shmPath := path + "-shm"
journalPath := path + "-journal"
log.Info("Attempting database recovery", "path", path)
// Always remove lock files on startup to ensure clean state
removed := false
// Check for and remove journal file (from non-WAL mode)
if _, err := os.Stat(journalPath); err == nil {
log.Info("Found journal file, removing", "path", journalPath)
if err := os.Remove(journalPath); err != nil {
log.Warn("Failed to remove journal file", "error", err)
} else {
removed = true
}
}
// Remove WAL file
if _, err := os.Stat(walPath); err == nil {
log.Info("Found WAL file, removing", "path", walPath)
if err := os.Remove(walPath); err != nil {
log.Warn("Failed to remove WAL file", "error", err)
} else {
removed = true
}
}
// Remove SHM file
if _, err := os.Stat(shmPath); err == nil {
log.Info("Found shared memory file, removing", "path", shmPath)
if err := os.Remove(shmPath); err != nil {
log.Warn("Failed to remove shared memory file", "error", err)
} else {
removed = true
}
}
if removed {
log.Info("Database lock files removed")
}
return nil
}
// Conn returns the underlying *sql.DB connection.
// This should be used sparingly and primarily for read operations.
// For write operations, prefer using the ExecWithLog method.
@@ -219,6 +154,11 @@ func (db *DB) Conn() *sql.DB {
return db.conn
}
// Path returns the path to the database file.
func (db *DB) Path() string {
return db.path
}
// BeginTx starts a new database transaction with the given options.
// The caller is responsible for committing or rolling back the transaction.
// For write transactions, consider using the Repositories.WithTx method instead,
@@ -270,6 +210,15 @@ func NewTestDB() (*DB, error) {
return New(context.Background(), ":memory:")
}
// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
// For example, repeatPlaceholder(2) returns ", ?, ?".
func repeatPlaceholder(n int) string {
if n <= 0 {
return ""
}
return strings.Repeat(", ?", n)
}
// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.
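
A sketch of the WAL setup that the New doc comment describes, under stated assumptions: the driver name and the exact pragma set (including foreign_keys, which the cascade-delete tests imply) are not confirmed by this diff.

func openWithWAL(ctx context.Context, path string) (*sql.DB, error) {
	conn, err := sql.Open("sqlite3", path) // driver name assumed
	if err != nil {
		return nil, err
	}
	for _, pragma := range []string{
		"PRAGMA journal_mode=WAL",  // readers proceed while a writer commits
		"PRAGMA busy_timeout=5000", // wait up to 5s instead of failing with SQLITE_BUSY
		"PRAGMA foreign_keys=ON",   // required for cascade deletes
	} {
		if _, err := conn.ExecContext(ctx, pragma); err != nil {
			_ = conn.Close()
			return nil, fmt.Errorf("applying %q: %w", pragma, err)
		}
	}
	return conn, nil
}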

View File

@@ -4,6 +4,8 @@ import (
"context"
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type FileChunkRepository struct {
@@ -23,9 +25,9 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
}
if err != nil {
@@ -50,21 +52,11 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
}
defer CloseRows(rows)
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fileChunks = append(fileChunks, &fc)
}
return fileChunks, rows.Err()
return r.scanFileChunks(rows)
}
// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID string) ([]*FileChunk, error) {
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
query := `
SELECT file_id, idx, chunk_hash
FROM file_chunks
@@ -72,23 +64,13 @@ func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID string) ([
ORDER BY idx
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
if err != nil {
return nil, fmt.Errorf("querying file chunks: %w", err)
}
defer CloseRows(rows)
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fileChunks = append(fileChunks, &fc)
}
return fileChunks, rows.Err()
return r.scanFileChunks(rows)
}
// GetByPathTx retrieves file chunks within a transaction
@@ -108,16 +90,28 @@ func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path
}
defer CloseRows(rows)
fileChunks, err := r.scanFileChunks(rows)
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
return fileChunks, err
}
// scanFileChunks is a helper that scans file chunk rows
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
var fileIDStr, chunkHashStr string
err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fc.FileID, err = types.ParseFileID(fileIDStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
fc.ChunkHash = types.ChunkHash(chunkHashStr)
fileChunks = append(fileChunks, &fc)
}
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
return fileChunks, rows.Err()
}
@@ -140,14 +134,14 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path
}
// DeleteByFileID deletes all file_chunks rows for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID string) error {
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
query := `DELETE FROM file_chunks WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID)
_, err = tx.ExecContext(ctx, query, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID)
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
}
if err != nil {
@@ -157,6 +151,86 @@ func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fi
return nil
}
// DeleteByFileIDs deletes all file_chunks rows for multiple files, batching the IN clause to stay within SQLite's variable limit.
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting file_chunks: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
// Batches are automatically split to stay within SQLite's variable limit.
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
if len(fcs) == 0 {
return nil
}
// SQLite has a limit on variables (typically 999 or 32766).
// Each FileChunk has 3 values, so batch at 300 to be safe.
const batchSize = 300
for i := 0; i < len(fcs); i += batchSize {
end := i + batchSize
if end > len(fcs) {
end = len(fcs)
}
batch := fcs[i:end]
// Build the query with multiple value sets
query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
args := make([]interface{}, 0, len(batch)*3)
for j, fc := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?)"
args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
}
query += " ON CONFLICT(file_id, idx) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting file_chunks: %w", err)
}
}
return nil
}
// GetByFile is an alias for GetByPath for compatibility
func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
LogSQL("GetByFile", "Starting", path)

View File

@@ -5,6 +5,8 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestFileChunkRepository(t *testing.T) {
@@ -33,7 +35,7 @@ func TestFileChunkRepository(t *testing.T) {
}
// Create chunks first
chunks := []string{"chunk1", "chunk2", "chunk3"}
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunkRepo := NewChunkRepository(db)
for _, chunkHash := range chunks {
chunk := &Chunk{
@@ -50,7 +52,7 @@ func TestFileChunkRepository(t *testing.T) {
fc1 := &FileChunk{
FileID: file.ID,
Idx: 0,
ChunkHash: "chunk1",
ChunkHash: types.ChunkHash("chunk1"),
}
err = repo.Create(ctx, nil, fc1)
@@ -62,7 +64,7 @@ func TestFileChunkRepository(t *testing.T) {
fc2 := &FileChunk{
FileID: file.ID,
Idx: 1,
ChunkHash: "chunk2",
ChunkHash: types.ChunkHash("chunk2"),
}
err = repo.Create(ctx, nil, fc2)
if err != nil {
@@ -72,7 +74,7 @@ func TestFileChunkRepository(t *testing.T) {
fc3 := &FileChunk{
FileID: file.ID,
Idx: 2,
ChunkHash: "chunk3",
ChunkHash: types.ChunkHash("chunk3"),
}
err = repo.Create(ctx, nil, fc3)
if err != nil {
@@ -131,7 +133,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
for i, path := range filePaths {
file := &File{
Path: path,
Path: types.FilePath(path),
MTime: testTime,
CTime: testTime,
Size: 2048,
@@ -151,7 +153,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
chunkRepo := NewChunkRepository(db)
for i := range files {
for j := 0; j < 2; j++ {
chunkHash := fmt.Sprintf("file%d_chunk%d", i, j)
chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
@@ -169,7 +171,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
fc := &FileChunk{
FileID: file.ID,
Idx: j,
ChunkHash: fmt.Sprintf("file%d_chunk%d", i, j),
ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
}
err := repo.Create(ctx, nil, fc)
if err != nil {

View File

@@ -7,7 +7,7 @@ import (
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/google/uuid"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type FileRepository struct {
@@ -20,14 +20,15 @@ func NewFileRepository(db *DB) *FileRepository {
func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
// Generate UUID if not provided
if file.ID == "" {
file.ID = uuid.New().String()
if file.ID.IsZero() {
file.ID = types.NewFileID()
}
query := `
INSERT INTO files (id, path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(path) DO UPDATE SET
source_path = excluded.source_path,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
@@ -38,44 +39,36 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
RETURNING id
`
var idStr string
var err error
if tx != nil {
LogSQL("Execute", query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
err = tx.QueryRowContext(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
} else {
err = r.db.QueryRowWithLog(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
}
if err != nil {
return fmt.Errorf("inserting file: %w", err)
}
// Parse the returned ID
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return fmt.Errorf("parsing file ID: %w", err)
}
return nil
}
func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
if err == sql.ErrNoRows {
return nil, nil
}
@@ -83,39 +76,18 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
return nil, fmt.Errorf("querying file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
return &file, nil
return file, nil
}
// GetByID retrieves a file by its UUID
func (r *FileRepository) GetByID(ctx context.Context, id string) (*File, error) {
func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE id = ?
`
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
if err == sql.ErrNoRows {
return nil, nil
}
@@ -123,38 +95,18 @@ func (r *FileRepository) GetByID(ctx context.Context, id string) (*File, error)
return nil, fmt.Errorf("querying file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
return &file, nil
return file, nil
}
func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
LogSQL("GetByPathTx QueryRowContext", query, path)
err := tx.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
LogSQL("GetByPathTx Scan complete", query, path)
if err == sql.ErrNoRows {
@@ -164,10 +116,80 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
return nil, fmt.Errorf("querying file: %w", err)
}
return file, nil
}
// scanFile is a helper that scans a single file row
func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
var file File
var idStr, pathStr, sourcePathStr string
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := row.Scan(
&idStr,
&pathStr,
&sourcePathStr,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, err
}
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
file.Path = types.FilePath(pathStr)
file.SourcePath = types.SourcePath(sourcePathStr)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
file.LinkTarget = types.FilePath(linkTarget.String)
}
return &file, nil
}
// scanFileRows is a helper that scans a file row from a *sql.Rows iterator
func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
var file File
var idStr, pathStr, sourcePathStr string
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&idStr,
&pathStr,
&sourcePathStr,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, err
}
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
file.Path = types.FilePath(pathStr)
file.SourcePath = types.SourcePath(sourcePathStr)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = types.FilePath(linkTarget.String)
}
return &file, nil
@@ -175,7 +197,7 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE mtime >= ?
ORDER BY path
@@ -189,32 +211,11 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
var files []*File
for rows.Next() {
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
files = append(files, &file)
files = append(files, file)
}
return files, rows.Err()
@@ -238,14 +239,14 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
}
// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id string) error {
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
query := `DELETE FROM files WHERE id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, id)
_, err = tx.ExecContext(ctx, query, id.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, id)
_, err = r.db.ExecWithLog(ctx, query, id.String())
}
if err != nil {
@@ -257,7 +258,7 @@ func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id string)
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path LIKE ? || '%'
ORDER BY path
@@ -271,37 +272,92 @@ func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*Fi
var files []*File
for rows.Next() {
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
files = append(files, &file)
files = append(files, file)
}
return files, rows.Err()
}
// ListAll returns all files in the database
func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
ORDER BY path
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying files: %w", err)
}
defer CloseRows(rows)
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
}
return files, rows.Err()
}
// CreateBatch inserts or updates multiple files in a single statement for efficiency.
// File IDs must be pre-generated before calling this method.
func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
if len(files) == 0 {
return nil
}
// Each File binds 10 SQL parameters, so batch at 100 to stay safely under SQLite's variable limit
const batchSize = 100
for i := 0; i < len(files); i += batchSize {
end := i + batchSize
if end > len(files) {
end = len(files)
}
batch := files[i:end]
query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
args := make([]interface{}, 0, len(batch)*10)
for j, f := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
}
query += ` ON CONFLICT(path) DO UPDATE SET
source_path = excluded.source_path,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
mode = excluded.mode,
uid = excluded.uid,
gid = excluded.gid,
link_target = excluded.link_target`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting files: %w", err)
}
}
return nil
}
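// Sketch (illustrative, not part of this change): a caller of CreateBatch
// must pre-generate IDs with types.NewFileID, since the batched statement
// cannot return per-row UUIDs.
func exampleCreateBatch(ctx context.Context, repos *Repositories, paths []string) error {
	now := time.Now()
	files := make([]*File, 0, len(paths))
	for _, p := range paths {
		files = append(files, &File{
			ID:    types.NewFileID(), // pre-generated, as the doc comment requires
			Path:  types.FilePath(p),
			MTime: now,
			CTime: now,
			Mode:  0644,
		})
	}
	// A nil tx runs the batch outside an explicit transaction
	return repos.Files.CreateBatch(ctx, nil, files)
}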
// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
query := `


@@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
}
// Test GetByPath
retrieved, err := repo.GetByPath(ctx, file.Path)
retrieved, err := repo.GetByPath(ctx, file.Path.String())
if err != nil {
t.Fatalf("failed to get file: %v", err)
}
@@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
t.Fatalf("failed to update file: %v", err)
}
retrieved, err = repo.GetByPath(ctx, file.Path)
retrieved, err = repo.GetByPath(ctx, file.Path.String())
if err != nil {
t.Fatalf("failed to get updated file: %v", err)
}
@@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
}
// Test Delete
err = repo.Delete(ctx, nil, file.Path)
err = repo.Delete(ctx, nil, file.Path.String())
if err != nil {
t.Fatalf("failed to delete file: %v", err)
}
retrieved, err = repo.GetByPath(ctx, file.Path)
retrieved, err = repo.GetByPath(ctx, file.Path.String())
if err != nil {
t.Fatalf("error getting deleted file: %v", err)
}
@@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
t.Fatalf("failed to create symlink: %v", err)
}
retrieved, err := repo.GetByPath(ctx, symlink.Path)
retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
if err != nil {
t.Fatalf("failed to get symlink: %v", err)
}


@@ -2,22 +2,27 @@
// It includes types for files, chunks, blobs, snapshots, and their relationships.
package database
import "time"
import (
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// File represents a file or directory in the backup system.
// It stores metadata about files including timestamps, permissions, ownership,
// and symlink targets. This information is used to restore files with their
// original attributes.
type File struct {
ID string // UUID primary key
Path string
ID types.FileID // UUID primary key
Path types.FilePath // Absolute path of the file
SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
MTime time.Time
CTime time.Time
Size int64
Mode uint32
UID uint32
GID uint32
LinkTarget string // empty for regular files, target path for symlinks
LinkTarget types.FilePath // empty for regular files, target path for symlinks
}
// IsSymlink returns true if this file is a symbolic link.
@@ -30,16 +35,16 @@ func (f *File) IsSymlink() bool {
// Large files are split into multiple chunks for efficient deduplication and storage.
// The Idx field maintains the order of chunks within a file.
type FileChunk struct {
FileID string
FileID types.FileID
Idx int
ChunkHash string
ChunkHash types.ChunkHash
}
// Chunk represents a data chunk in the deduplication system.
// Files are split into chunks which are content-addressed by their hash.
// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
type Chunk struct {
ChunkHash string
ChunkHash types.ChunkHash
Size int64
}
@@ -51,8 +56,8 @@ type Chunk struct {
// The blob creation process is: chunks are accumulated -> compressed with zstd
// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
type Blob struct {
ID string // UUID assigned when blob creation starts
Hash string // SHA256 of final compressed+encrypted content (empty until finalized)
ID types.BlobID // UUID assigned when blob creation starts
Hash types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
CreatedTS time.Time // When blob creation started
FinishedTS *time.Time // When blob was finalized (nil if still packing)
UncompressedSize int64 // Total size of raw chunks before compression
@@ -65,8 +70,8 @@ type Blob struct {
// their position and size within the blob. The offset and length fields
// enable extracting specific chunks from a blob without processing the entire blob.
type BlobChunk struct {
BlobID string
ChunkHash string
BlobID types.BlobID
ChunkHash types.ChunkHash
Offset int64
Length int64
}
@@ -75,18 +80,18 @@ type BlobChunk struct {
// This is used during deduplication to identify all files that share a chunk,
// which is important for garbage collection and integrity verification.
type ChunkFile struct {
ChunkHash string
FileID string
ChunkHash types.ChunkHash
FileID types.FileID
FileOffset int64
Length int64
}
// Snapshot represents a snapshot record in the database
type Snapshot struct {
ID string
Hostname string
VaultikVersion string
VaultikGitRevision string
ID types.SnapshotID
Hostname types.Hostname
VaultikVersion types.Version
VaultikGitRevision types.GitRevision
StartedAt time.Time
CompletedAt *time.Time // nil if still in progress
FileCount int64
@@ -108,13 +113,13 @@ func (s *Snapshot) IsComplete() bool {
// SnapshotFile represents the mapping between snapshots and files
type SnapshotFile struct {
SnapshotID string
FileID string
SnapshotID types.SnapshotID
FileID types.FileID
}
// SnapshotBlob represents the mapping between snapshots and blobs
type SnapshotBlob struct {
SnapshotID string
BlobID string
BlobHash string // Denormalized for easier manifest generation
SnapshotID types.SnapshotID
BlobID types.BlobID
BlobHash types.BlobHash // Denormalized for easier manifest generation
}
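// Sketch (illustrative, not vaultik's actual packer): the pipeline the Blob
// doc comment above describes, i.e. chunks -> zstd compress -> age encrypt
// -> SHA256 of the result. Assumes the filippo.io/age and
// github.com/klauspost/compress/zstd packages plus bytes, crypto/sha256,
// and encoding/hex from the standard library.
func examplePackBlob(chunks [][]byte, recipient age.Recipient) ([]byte, string, error) {
	var buf bytes.Buffer
	// age encryption forms the outermost layer of the stored blob
	encw, err := age.Encrypt(&buf, recipient)
	if err != nil {
		return nil, "", err
	}
	// the zstd writer wraps the encryptor, so plaintext is compressed
	// before it is encrypted
	zw, err := zstd.NewWriter(encw)
	if err != nil {
		return nil, "", err
	}
	for _, c := range chunks {
		if _, err := zw.Write(c); err != nil {
			return nil, "", err
		}
	}
	if err := zw.Close(); err != nil { // flush the compressor first
		return nil, "", err
	}
	if err := encw.Close(); err != nil { // then finalize the encryption
		return nil, "", err
	}
	sum := sha256.Sum256(buf.Bytes()) // Blob.Hash: SHA256 of the final ciphertext
	return buf.Bytes(), hex.EncodeToString(sum[:]), nil
}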


@@ -75,6 +75,11 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
return tx.Commit()
}
// DB returns the underlying database for direct queries
func (r *Repositories) DB() *DB {
return r.db
}
// WithReadTx executes a function within a read-only transaction.
// Read transactions can run concurrently with other read transactions
// but will be blocked by write transactions. The transaction is


@@ -6,6 +6,8 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestRepositoriesTransaction(t *testing.T) {
@@ -33,7 +35,7 @@ func TestRepositoriesTransaction(t *testing.T) {
// Create chunks
chunk1 := &Chunk{
ChunkHash: "tx_chunk1",
ChunkHash: types.ChunkHash("tx_chunk1"),
Size: 512,
}
if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@@ -41,7 +43,7 @@ func TestRepositoriesTransaction(t *testing.T) {
}
chunk2 := &Chunk{
ChunkHash: "tx_chunk2",
ChunkHash: types.ChunkHash("tx_chunk2"),
Size: 512,
}
if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@@ -69,8 +71,8 @@ func TestRepositoriesTransaction(t *testing.T) {
// Create blob
blob := &Blob{
ID: "tx-blob-id-1",
Hash: "tx_blob1",
ID: types.NewBlobID(),
Hash: types.BlobHash("tx_blob1"),
CreatedTS: time.Now().Truncate(time.Second),
}
if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@@ -156,7 +158,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {
// Create a chunk
chunk := &Chunk{
ChunkHash: "rollback_chunk",
ChunkHash: types.ChunkHash("rollback_chunk"),
Size: 1024,
}
if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {


@@ -6,6 +6,8 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
@@ -46,15 +48,15 @@ func TestFileRepositoryUUIDGeneration(t *testing.T) {
}
// Check UUID was generated
if file.ID == "" {
if file.ID.IsZero() {
t.Error("file ID was not generated")
}
// Check UUID is unique
if uuids[file.ID] {
if uuids[file.ID.String()] {
t.Errorf("duplicate UUID generated: %s", file.ID)
}
uuids[file.ID] = true
uuids[file.ID.String()] = true
}
}
@@ -96,7 +98,8 @@ func TestFileRepositoryGetByID(t *testing.T) {
}
// Test non-existent ID
nonExistent, err := repo.GetByID(ctx, "non-existent-uuid")
nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
nonExistent, err := repo.GetByID(ctx, nonExistentID)
if err != nil {
t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
}
@@ -154,7 +157,7 @@ func TestOrphanedFileCleanup(t *testing.T) {
}
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file2.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}
@@ -194,11 +197,11 @@ func TestOrphanedChunkCleanup(t *testing.T) {
// Create chunks
chunk1 := &Chunk{
ChunkHash: "orphaned-chunk",
ChunkHash: types.ChunkHash("orphaned-chunk"),
Size: 1024,
}
chunk2 := &Chunk{
ChunkHash: "referenced-chunk",
ChunkHash: types.ChunkHash("referenced-chunk"),
Size: 1024,
}
@@ -244,7 +247,7 @@ func TestOrphanedChunkCleanup(t *testing.T) {
}
// Check that orphaned chunk is gone
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash)
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
@@ -253,7 +256,7 @@ func TestOrphanedChunkCleanup(t *testing.T) {
}
// Check that referenced chunk still exists
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash)
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
@@ -272,13 +275,13 @@ func TestOrphanedBlobCleanup(t *testing.T) {
// Create blobs
blob1 := &Blob{
ID: "orphaned-blob-id",
Hash: "orphaned-blob",
ID: types.NewBlobID(),
Hash: types.BlobHash("orphaned-blob"),
CreatedTS: time.Now().Truncate(time.Second),
}
blob2 := &Blob{
ID: "referenced-blob-id",
Hash: "referenced-blob",
ID: types.NewBlobID(),
Hash: types.BlobHash("referenced-blob"),
CreatedTS: time.Now().Truncate(time.Second),
}
@@ -303,7 +306,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Add blob2 to snapshot
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID, blob2.ID, blob2.Hash)
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
if err != nil {
t.Fatalf("failed to add blob to snapshot: %v", err)
}
@@ -315,7 +318,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Check that orphaned blob is gone
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID)
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
@@ -324,7 +327,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Check that referenced blob still exists
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID)
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
@@ -357,7 +360,7 @@ func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
}
// Create chunks
chunks := []string{"chunk1", "chunk2", "chunk3"}
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
for i, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@@ -443,7 +446,7 @@ func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
// Create a chunk that appears in both files (deduplication)
chunk := &Chunk{
ChunkHash: "shared-chunk",
ChunkHash: types.ChunkHash("shared-chunk"),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@@ -526,7 +529,7 @@ func TestSnapshotRepositoryExtendedFields(t *testing.T) {
}
// Retrieve and verify
retrieved, err := repo.GetByID(ctx, snapshot.ID)
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@@ -581,7 +584,7 @@ func TestComplexOrphanedDataScenario(t *testing.T) {
files := make([]*File, 3)
for i := range files {
files[i] = &File{
Path: fmt.Sprintf("/file%d.txt", i),
Path: types.FilePath(fmt.Sprintf("/file%d.txt", i)),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
@@ -601,29 +604,29 @@ func TestComplexOrphanedDataScenario(t *testing.T) {
// file0: only in snapshot1
// file1: in both snapshots
// file2: only in snapshot2
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID, files[0].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID, files[1].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID, files[1].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID, files[2].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
if err != nil {
t.Fatal(err)
}
// Delete snapshot1
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID)
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.Delete(ctx, snapshot1.ID)
err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
if err != nil {
t.Fatal(err)
}
@@ -689,7 +692,7 @@ func TestCascadeDelete(t *testing.T) {
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: fmt.Sprintf("cascade-chunk-%d", i),
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@@ -807,7 +810,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Create many files, some orphaned
for i := 0; i < 20; i++ {
file := &File{
Path: fmt.Sprintf("/concurrent-%d.txt", i),
Path: types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
@@ -822,7 +825,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Add even-numbered files to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
if err != nil {
t.Fatal(err)
}
@@ -860,7 +863,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Verify all remaining files are even-numbered
for _, file := range files {
var num int
_, err := fmt.Sscanf(file.Path, "/concurrent-%d.txt", &num)
_, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
if err != nil {
t.Logf("failed to parse file number from %s: %v", file.Path, err)
}


@@ -67,7 +67,7 @@ func TestOrphanedFileCleanupDebug(t *testing.T) {
t.Logf("snapshot_files count before add: %d", count)
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file2.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}


@@ -6,6 +6,8 @@ import (
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryEdgeCases tests edge cases for file repository
@@ -38,7 +40,7 @@ func TestFileRepositoryEdgeCases(t *testing.T) {
{
name: "very long path",
file: &File{
Path: "/" + strings.Repeat("a", 4096),
Path: types.FilePath("/" + strings.Repeat("a", 4096)),
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
@@ -94,7 +96,7 @@ func TestFileRepositoryEdgeCases(t *testing.T) {
t.Run(tt.name, func(t *testing.T) {
// Add a unique suffix to paths to avoid UNIQUE constraint violations
if tt.file.Path != "" {
tt.file.Path = fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano())
tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
}
err := repo.Create(ctx, nil, tt.file)
@@ -169,7 +171,7 @@ func TestDuplicateHandling(t *testing.T) {
// Test duplicate chunk hashes
t.Run("duplicate chunk hashes", func(t *testing.T) {
chunk := &Chunk{
ChunkHash: "duplicate-chunk",
ChunkHash: types.ChunkHash("duplicate-chunk"),
Size: 1024,
}
@@ -202,7 +204,7 @@ func TestDuplicateHandling(t *testing.T) {
}
chunk := &Chunk{
ChunkHash: "test-chunk-dup",
ChunkHash: types.ChunkHash("test-chunk-dup"),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@@ -279,7 +281,7 @@ func TestNullHandling(t *testing.T) {
t.Fatal(err)
}
retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID)
retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatal(err)
}
@@ -292,8 +294,8 @@ func TestNullHandling(t *testing.T) {
// Test blob with NULL uploaded_ts
t.Run("blob not uploaded", func(t *testing.T) {
blob := &Blob{
ID: "not-uploaded",
Hash: "test-hash",
ID: types.NewBlobID(),
Hash: types.BlobHash("test-hash"),
CreatedTS: time.Now(),
UploadedTS: nil, // Not uploaded yet
}
@@ -303,7 +305,7 @@ func TestNullHandling(t *testing.T) {
t.Fatal(err)
}
retrieved, err := repos.Blobs.GetByID(ctx, blob.ID)
retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
if err != nil {
t.Fatal(err)
}
@@ -339,13 +341,13 @@ func TestLargeDatasets(t *testing.T) {
// Create many files
const fileCount = 1000
fileIDs := make([]string, fileCount)
fileIDs := make([]types.FileID, fileCount)
t.Run("create many files", func(t *testing.T) {
start := time.Now()
for i := 0; i < fileCount; i++ {
file := &File{
Path: fmt.Sprintf("/large/file%05d.txt", i),
Path: types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
MTime: time.Now(),
CTime: time.Now(),
Size: int64(i * 1024),
@@ -361,7 +363,7 @@ func TestLargeDatasets(t *testing.T) {
// Add half to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
if err != nil {
t.Fatal(err)
}
@@ -413,7 +415,7 @@ func TestErrorPropagation(t *testing.T) {
// Test GetByID with non-existent ID
t.Run("GetByID non-existent", func(t *testing.T) {
file, err := repos.Files.GetByID(ctx, "non-existent-uuid")
file, err := repos.Files.GetByID(ctx, types.NewFileID())
if err != nil {
t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
}
@@ -436,9 +438,9 @@ func TestErrorPropagation(t *testing.T) {
// Test invalid foreign key reference
t.Run("invalid foreign key", func(t *testing.T) {
fc := &FileChunk{
FileID: "non-existent-file-id",
FileID: types.NewFileID(),
Idx: 0,
ChunkHash: "some-chunk",
ChunkHash: types.ChunkHash("some-chunk"),
}
err := repos.FileChunks.Create(ctx, nil, fc)
if err == nil {
@@ -470,7 +472,7 @@ func TestQueryInjection(t *testing.T) {
t.Run("injection attempt", func(t *testing.T) {
// Try injection in file path
file := &File{
Path: injection,
Path: types.FilePath(injection),
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,


@@ -6,6 +6,7 @@
CREATE TABLE IF NOT EXISTS files (
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
mtime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
size INTEGER NOT NULL,
@@ -28,6 +29,9 @@ CREATE TABLE IF NOT EXISTS file_chunks (
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);
-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
chunk_hash TEXT PRIMARY KEY,
@@ -56,6 +60,9 @@ CREATE TABLE IF NOT EXISTS blob_chunks (
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);
-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
chunk_hash TEXT NOT NULL,
@@ -67,6 +74,9 @@ CREATE TABLE IF NOT EXISTS chunk_files (
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);
-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
id TEXT PRIMARY KEY,
@@ -96,6 +106,9 @@ CREATE TABLE IF NOT EXISTS snapshot_files (
FOREIGN KEY (file_id) REFERENCES files(id)
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);
-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
snapshot_id TEXT NOT NULL,
@@ -106,6 +119,9 @@ CREATE TABLE IF NOT EXISTS snapshot_blobs (
FOREIGN KEY (blob_id) REFERENCES blobs(id)
);
-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);
-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
@@ -116,3 +132,6 @@ CREATE TABLE IF NOT EXISTS uploads (
FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);
-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
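-- Illustrative only, not part of the schema: the orphan-detection queries
-- these indexes serve take roughly this shape. A chunk is orphaned when no
-- file or blob references it:
--
--   DELETE FROM chunks
--   WHERE chunk_hash NOT IN (SELECT chunk_hash FROM file_chunks)
--     AND chunk_hash NOT IN (SELECT chunk_hash FROM blob_chunks);
--
-- Without idx_file_chunks_chunk_hash and idx_blob_chunks_chunk_hash, each
-- of those subquery membership checks would scan the full mapping table.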


@@ -5,6 +5,8 @@ import (
"database/sql"
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type SnapshotRepository struct {
@@ -269,7 +271,7 @@ func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID
}
// AddFileByID adds a file to a snapshot by file ID
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID string) error {
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
query := `
INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
VALUES (?, ?)
@@ -277,9 +279,9 @@ func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapsh
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, fileID)
_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID)
_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
}
if err != nil {
@@ -289,8 +291,48 @@ func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapsh
return nil
}
// AddFilesByIDBatch adds multiple files to a snapshot in batched inserts
func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Each entry binds 2 SQL parameters, so batch at 400 to stay safely under SQLite's variable limit
const batchSize = 400
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
args := make([]interface{}, 0, len(batch)*2)
for j, fileID := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?)"
args = append(args, snapshotID, fileID.String())
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch adding files to snapshot: %w", err)
}
}
return nil
}
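// Sketch (illustrative, not part of this change): batching file IDs into a
// snapshot inside a write transaction, using Repositories.WithTx as other
// call sites in this diff do.
func exampleAddFilesBatch(ctx context.Context, repos *Repositories, snapshotID string, fileIDs []types.FileID) error {
	return repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		return repos.Snapshots.AddFilesByIDBatch(ctx, tx, snapshotID, fileIDs)
	})
}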
// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID string, blobHash string) error {
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
query := `
INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
VALUES (?, ?, ?)
@@ -298,9 +340,9 @@ func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, blobID, blobHash)
_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID, blobHash)
_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
}
if err != nil {
@@ -337,6 +379,24 @@ func (r *SnapshotRepository) GetBlobHashes(ctx context.Context, snapshotID strin
return blobs, rows.Err()
}
// GetSnapshotTotalCompressedSize returns the total compressed size of all blobs referenced by a snapshot
func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context, snapshotID string) (int64, error) {
query := `
SELECT COALESCE(SUM(b.compressed_size), 0)
FROM snapshot_blobs sb
JOIN blobs b ON sb.blob_hash = b.blob_hash
WHERE sb.snapshot_id = ?
`
var totalSize int64
err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
if err != nil {
return 0, fmt.Errorf("querying total compressed size: %w", err)
}
return totalSize, nil
}
// GetIncompleteSnapshots returns all snapshots that haven't been completed
func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
query := `
@@ -474,3 +534,15 @@ func (r *SnapshotRepository) DeleteSnapshotBlobs(ctx context.Context, snapshotID
return nil
}
// DeleteSnapshotUploads removes all uploads entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotUploads(ctx context.Context, snapshotID string) error {
query := `DELETE FROM uploads WHERE snapshot_id = ?`
_, err := r.db.ExecWithLog(ctx, query, snapshotID)
if err != nil {
return fmt.Errorf("deleting snapshot uploads: %w", err)
}
return nil
}


@@ -6,6 +6,8 @@ import (
"math"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
const (
@@ -46,7 +48,7 @@ func TestSnapshotRepository(t *testing.T) {
}
// Test GetByID
retrieved, err := repo.GetByID(ctx, snapshot.ID)
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@@ -64,12 +66,12 @@ func TestSnapshotRepository(t *testing.T) {
}
// Test UpdateCounts
err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
if err != nil {
t.Fatalf("failed to update counts: %v", err)
}
retrieved, err = repo.GetByID(ctx, snapshot.ID)
retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get updated snapshot: %v", err)
}
@@ -97,7 +99,7 @@ func TestSnapshotRepository(t *testing.T) {
// Add more snapshots
for i := 2; i <= 5; i++ {
s := &Snapshot{
ID: fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
ID: types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
Hostname: "test-host",
VaultikVersion: "1.0.0",
StartedAt: time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),


@@ -4,13 +4,16 @@ import (
"time"
)
// these get populated from main() and copied into the Globals object.
var (
Appname string = "vaultik"
Version string = "dev"
Commit string = "unknown"
)
// Appname is the application name, populated from main().
var Appname string = "vaultik"
// Version is the application version, populated from main().
var Version string = "dev"
// Commit is the git commit hash, populated from main().
var Commit string = "unknown"
// Globals contains application-wide configuration and metadata.
type Globals struct {
Appname string
Version string
@@ -18,6 +21,7 @@ type Globals struct {
StartTime time.Time
}
// New creates and returns a new Globals instance initialized with the package-level variables.
func New() (*Globals, error) {
return &Globals{
Appname: Appname,


@@ -12,34 +12,41 @@ import (
"golang.org/x/term"
)
// LogLevel represents the logging level
// LogLevel represents the logging level.
type LogLevel int
const (
// LevelFatal represents a fatal error level that will exit the program.
LevelFatal LogLevel = iota
// LevelError represents an error level.
LevelError
// LevelWarn represents a warning level.
LevelWarn
// LevelNotice represents a notice level (mapped to Info in slog).
LevelNotice
// LevelInfo represents an informational level.
LevelInfo
// LevelDebug represents a debug level.
LevelDebug
)
// Logger configuration
// Config holds logger configuration.
type Config struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}
var logger *slog.Logger
// Initialize sets up the global logger based on the provided configuration
// Initialize sets up the global logger based on the provided configuration.
func Initialize(cfg Config) {
// Determine log level based on configuration
var level slog.Level
if cfg.Cron {
// In cron mode, only show fatal errors (which we'll handle specially)
if cfg.Cron || cfg.Quiet {
// In quiet/cron mode, only show errors
level = slog.LevelError
} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
level = slog.LevelDebug
@@ -76,7 +83,7 @@ func getCaller(skip int) string {
return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}
// Fatal logs a fatal error and exits
// Fatal logs a fatal error message and exits the program with code 1.
func Fatal(msg string, args ...any) {
if logger != nil {
// Add caller info to args
@@ -86,12 +93,12 @@ func Fatal(msg string, args ...any) {
os.Exit(1)
}
// Fatalf logs a formatted fatal error and exits
// Fatalf logs a formatted fatal error message and exits the program with code 1.
func Fatalf(format string, args ...any) {
Fatal(fmt.Sprintf(format, args...))
}
// Error logs an error
// Error logs an error message.
func Error(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
@@ -99,12 +106,12 @@ func Error(msg string, args ...any) {
}
}
// Errorf logs a formatted error
// Errorf logs a formatted error message.
func Errorf(format string, args ...any) {
Error(fmt.Sprintf(format, args...))
}
// Warn logs a warning
// Warn logs a warning message.
func Warn(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
@@ -112,12 +119,12 @@ func Warn(msg string, args ...any) {
}
}
// Warnf logs a formatted warning
// Warnf logs a formatted warning message.
func Warnf(format string, args ...any) {
Warn(fmt.Sprintf(format, args...))
}
// Notice logs a notice (mapped to Info level)
// Notice logs a notice message (mapped to Info level).
func Notice(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
@@ -125,12 +132,12 @@ func Notice(msg string, args ...any) {
}
}
// Noticef logs a formatted notice
// Noticef logs a formatted notice message.
func Noticef(format string, args ...any) {
Notice(fmt.Sprintf(format, args...))
}
// Info logs an info message
// Info logs an informational message.
func Info(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
@@ -138,12 +145,12 @@ func Info(msg string, args ...any) {
}
}
// Infof logs a formatted info message
// Infof logs a formatted informational message.
func Infof(format string, args ...any) {
Info(fmt.Sprintf(format, args...))
}
// Debug logs a debug message
// Debug logs a debug message.
func Debug(msg string, args ...any) {
if logger != nil {
args = append(args, "caller", getCaller(2))
@@ -151,12 +158,12 @@ func Debug(msg string, args ...any) {
}
}
// Debugf logs a formatted debug message
// Debugf logs a formatted debug message.
func Debugf(format string, args ...any) {
Debug(fmt.Sprintf(format, args...))
}
// With returns a logger with additional context
// With returns a logger with additional context attributes.
func With(args ...any) *slog.Logger {
if logger != nil {
return logger.With(args...)
@@ -164,12 +171,12 @@ func With(args ...any) *slog.Logger {
return slog.Default()
}
// WithContext returns a logger with context
// WithContext returns a logger with the provided context.
func WithContext(ctx context.Context) *slog.Logger {
return logger
}
// Logger returns the underlying slog.Logger
// Logger returns the underlying slog.Logger instance.
func Logger() *slog.Logger {
return logger
}
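// Sketch (illustrative): typical initialization and use of this package.
// Calls are unqualified here because this file is inside package log;
// external callers write log.Initialize, log.Info, and so on.
func exampleUsage() {
	Initialize(Config{Debug: true})
	Info("scan complete", "files", 42)
	With("snapshot", "2024-01-01T12:00:00Z").Debug("uploading blob")
}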


@@ -4,21 +4,22 @@ import (
"go.uber.org/fx"
)
// Module exports logging functionality
// Module exports logging functionality for dependency injection.
var Module = fx.Module("log",
fx.Invoke(func(cfg Config) {
Initialize(cfg)
}),
)
// New creates a new logger configuration from provided options
// New creates a new logger configuration from provided options.
func New(opts LogOptions) Config {
return Config(opts)
}
// LogOptions are provided by the CLI
// LogOptions are provided by the CLI.
type LogOptions struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}


@@ -21,14 +21,14 @@ const (
colorBold = "\033[1m"
)
// TTYHandler is a custom handler for TTY output with colors
// TTYHandler is a custom slog handler for TTY output with colors.
type TTYHandler struct {
opts slog.HandlerOptions
mu sync.Mutex
out io.Writer
}
// NewTTYHandler creates a new TTY handler
// NewTTYHandler creates a new TTY handler with colored output.
func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
if opts == nil {
opts = &slog.HandlerOptions{}
@@ -39,12 +39,12 @@ func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
}
}
// Enabled reports whether the handler handles records at the given level
// Enabled reports whether the handler handles records at the given level.
func (h *TTYHandler) Enabled(_ context.Context, level slog.Level) bool {
return level >= h.opts.Level.Level()
}
// Handle writes the log record
// Handle writes the log record to the output with color formatting.
func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
h.mu.Lock()
defer h.mu.Unlock()
@@ -103,12 +103,12 @@ func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
return nil
}
// WithAttrs returns a new handler with the given attributes
// WithAttrs returns a new handler with the given attributes.
func (h *TTYHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
return h // Simplified for now
}
// WithGroup returns a new handler with the given group name
// WithGroup returns a new handler with the given group name.
func (h *TTYHandler) WithGroup(name string) slog.Handler {
return h // Simplified for now
}
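// Sketch (illustrative): installing TTYHandler as the backend for a
// *slog.Logger, e.g. when stderr is a terminal. Assumes an os import.
func exampleTTYLogger() *slog.Logger {
	h := NewTTYHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug})
	return slog.New(h)
}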

internal/pidlock/pidlock.go Normal file

@@ -0,0 +1,108 @@
// Package pidlock provides process-level locking using PID files.
// It prevents multiple instances of vaultik from running simultaneously,
// which would cause database locking conflicts.
package pidlock
import (
"errors"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"syscall"
)
// ErrAlreadyRunning indicates another vaultik instance is running.
var ErrAlreadyRunning = errors.New("another vaultik instance is already running")
// Lock represents an acquired PID lock.
type Lock struct {
path string
}
// Acquire attempts to acquire a PID lock in the specified directory.
// If the lock file exists and the process is still running, it returns
// ErrAlreadyRunning with details about the existing process.
// On success, it writes the current PID to the lock file and returns
// a Lock that must be released with Release().
func Acquire(lockDir string) (*Lock, error) {
// Ensure lock directory exists
if err := os.MkdirAll(lockDir, 0700); err != nil {
return nil, fmt.Errorf("creating lock directory: %w", err)
}
lockPath := filepath.Join(lockDir, "vaultik.pid")
// Check for existing lock
existingPID, err := readPIDFile(lockPath)
if err == nil {
// Lock file exists, check if process is running
if isProcessRunning(existingPID) {
return nil, fmt.Errorf("%w (PID %d)", ErrAlreadyRunning, existingPID)
}
// Process is not running; the lock file is stale and we can take it over
}
// Write our PID
pid := os.Getpid()
if err := os.WriteFile(lockPath, []byte(strconv.Itoa(pid)), 0600); err != nil {
return nil, fmt.Errorf("writing PID file: %w", err)
}
return &Lock{path: lockPath}, nil
}
// Release removes the PID lock file.
// It is safe to call Release multiple times.
func (l *Lock) Release() error {
if l == nil || l.path == "" {
return nil
}
// Verify we still own the lock (our PID is in the file)
existingPID, err := readPIDFile(l.path)
if err != nil {
// File already gone or unreadable - that's fine
return nil
}
if existingPID != os.Getpid() {
// Someone else wrote to our lock file - don't remove it
return nil
}
if err := os.Remove(l.path); err != nil && !os.IsNotExist(err) {
return fmt.Errorf("removing PID file: %w", err)
}
l.path = "" // Prevent double-release
return nil
}
// readPIDFile reads and parses the PID from a lock file.
func readPIDFile(path string) (int, error) {
data, err := os.ReadFile(path)
if err != nil {
return 0, err
}
pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
if err != nil {
return 0, fmt.Errorf("parsing PID: %w", err)
}
return pid, nil
}
// isProcessRunning checks if a process with the given PID is running.
func isProcessRunning(pid int) bool {
process, err := os.FindProcess(pid)
if err != nil {
return false
}
// On Unix, FindProcess always succeeds. We need to send signal 0 to check.
err = process.Signal(syscall.Signal(0))
return err == nil
}
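// Sketch (illustrative, not part of this change): guarding a run with the
// lock. Because Release is idempotent, deferring it is safe even if the
// caller also releases explicitly on the happy path.
func exampleGuardedRun(lockDir string) error {
	lock, err := Acquire(lockDir)
	if err != nil {
		return err // wraps ErrAlreadyRunning if another instance holds the lock
	}
	defer func() { _ = lock.Release() }()
	// ... work that must not run concurrently with another vaultik ...
	return nil
}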


@@ -0,0 +1,108 @@
package pidlock
import (
"os"
"path/filepath"
"strconv"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestAcquireAndRelease(t *testing.T) {
tmpDir := t.TempDir()
// Acquire lock
lock, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock)
// Verify PID file exists with our PID
data, err := os.ReadFile(filepath.Join(tmpDir, "vaultik.pid"))
require.NoError(t, err)
pid, err := strconv.Atoi(string(data))
require.NoError(t, err)
assert.Equal(t, os.Getpid(), pid)
// Release lock
err = lock.Release()
require.NoError(t, err)
// Verify PID file is gone
_, err = os.Stat(filepath.Join(tmpDir, "vaultik.pid"))
assert.True(t, os.IsNotExist(err))
}
func TestAcquireBlocksSecondInstance(t *testing.T) {
tmpDir := t.TempDir()
// Acquire first lock
lock1, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock1)
defer func() { _ = lock1.Release() }()
// Try to acquire second lock - should fail
lock2, err := Acquire(tmpDir)
assert.ErrorIs(t, err, ErrAlreadyRunning)
assert.Nil(t, lock2)
}
func TestAcquireWithStaleLock(t *testing.T) {
tmpDir := t.TempDir()
// Write a stale PID file (PID that doesn't exist)
stalePID := 999999999 // Unlikely to be a real process
pidPath := filepath.Join(tmpDir, "vaultik.pid")
err := os.WriteFile(pidPath, []byte(strconv.Itoa(stalePID)), 0600)
require.NoError(t, err)
// Should be able to acquire lock (stale lock is cleaned up)
lock, err := Acquire(tmpDir)
require.NoError(t, err)
require.NotNil(t, lock)
defer func() { _ = lock.Release() }()
// Verify our PID is now in the file
data, err := os.ReadFile(pidPath)
require.NoError(t, err)
pid, err := strconv.Atoi(string(data))
require.NoError(t, err)
assert.Equal(t, os.Getpid(), pid)
}
func TestReleaseIsIdempotent(t *testing.T) {
tmpDir := t.TempDir()
lock, err := Acquire(tmpDir)
require.NoError(t, err)
// Release multiple times - should not error
err = lock.Release()
require.NoError(t, err)
err = lock.Release()
require.NoError(t, err)
}
func TestReleaseNilLock(t *testing.T) {
var lock *Lock
err := lock.Release()
assert.NoError(t, err)
}
func TestAcquireCreatesDirectory(t *testing.T) {
tmpDir := t.TempDir()
nestedDir := filepath.Join(tmpDir, "nested", "dir")
lock, err := Acquire(nestedDir)
require.NoError(t, err)
require.NotNil(t, lock)
defer func() { _ = lock.Release() }()
// Verify directory was created
info, err := os.Stat(nestedDir)
require.NoError(t, err)
assert.True(t, info.IsDir())
}


@@ -10,6 +10,7 @@ import (
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/smithy-go/logging"
)
// Client wraps the AWS S3 client for vaultik operations.
@@ -35,12 +36,18 @@ type Config struct {
Region string
}
// nopLogger is a logger that discards all output.
// Used to suppress SDK warnings about checksums.
type nopLogger struct{}
func (nopLogger) Logf(classification logging.Classification, format string, v ...interface{}) {}
// NewClient creates a new S3 client with the provided configuration.
// It establishes a connection to the S3-compatible storage service and
// validates the credentials. The client uses static credentials and
// path-style URLs for compatibility with various S3-compatible services.
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
// Create AWS config
// Create AWS config with a nop logger to suppress SDK warnings
awsCfg, err := config.LoadDefaultConfig(ctx,
config.WithRegion(cfg.Region),
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
@@ -48,6 +55,7 @@ func NewClient(ctx context.Context, cfg Config) (*Client, error) {
cfg.SecretAccessKey,
"",
)),
config.WithLogger(nopLogger{}),
)
if err != nil {
return nil, err


@@ -14,6 +14,7 @@ import (
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// MockS3Client is a mock implementation of S3 operations for testing
@@ -138,13 +139,13 @@ func TestBackupWithInMemoryFS(t *testing.T) {
}
for _, file := range files {
if !expectedFiles[file.Path] {
if !expectedFiles[file.Path.String()] {
t.Errorf("Unexpected file in database: %s", file.Path)
}
delete(expectedFiles, file.Path)
delete(expectedFiles, file.Path.String())
// Verify file metadata
fsFile := testFS[file.Path]
fsFile := testFS[file.Path.String()]
if fsFile == nil {
t.Errorf("File %s not found in test filesystem", file.Path)
continue
@@ -294,8 +295,8 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
hostname, _ := os.Hostname()
snapshotID := time.Now().Format(time.RFC3339)
snapshot := &database.Snapshot{
ID: snapshotID,
Hostname: hostname,
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
@@ -340,7 +341,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create file record in a short transaction
file := &database.File{
Path: path,
Path: types.FilePath(path),
Size: info.Size(),
Mode: uint32(info.Mode()),
MTime: info.ModTime(),
@@ -392,7 +393,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create new chunk in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunk := &database.Chunk{
ChunkHash: chunkHash,
ChunkHash: types.ChunkHash(chunkHash),
Size: int64(n),
}
return b.repos.Chunks.Create(ctx, tx, chunk)
@@ -408,7 +409,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
fileChunk := &database.FileChunk{
FileID: file.ID,
Idx: chunkIndex,
ChunkHash: chunkHash,
ChunkHash: types.ChunkHash(chunkHash),
}
return b.repos.FileChunks.Create(ctx, tx, fileChunk)
})
@@ -419,7 +420,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create chunk-file mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunkFile := &database.ChunkFile{
ChunkHash: chunkHash,
ChunkHash: types.ChunkHash(chunkHash),
FileID: file.ID,
FileOffset: int64(chunkIndex * defaultChunkSize),
Length: int64(n),
@@ -463,10 +464,11 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
}
// Create blob entry in a short transaction
blobID := types.NewBlobID()
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blob := &database.Blob{
ID: "test-blob-" + blobHash[:8],
Hash: blobHash,
ID: blobID,
Hash: types.BlobHash(blobHash),
CreatedTS: time.Now(),
}
return b.repos.Blobs.Create(ctx, tx, blob)
@@ -481,8 +483,8 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create blob-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blobChunk := &database.BlobChunk{
BlobID: "test-blob-" + blobHash[:8],
ChunkHash: chunkHash,
BlobID: blobID,
ChunkHash: types.ChunkHash(chunkHash),
Offset: 0,
Length: chunk.Size,
}
@@ -494,7 +496,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Add blob to snapshot in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, "test-blob-"+blobHash[:8], blobHash)
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
})
if err != nil {
return "", err


@@ -0,0 +1,454 @@
package snapshot_test
import (
"context"
"database/sql"
"path/filepath"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/require"
)
func setupExcludeTestFS(t *testing.T) afero.Fs {
t.Helper()
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test directory structure:
// /backup/
// file1.txt (should be backed up)
// file2.log (should be excluded if *.log is in patterns)
// .git/
// config (should be excluded if .git is in patterns)
// objects/
// pack/
// data.pack (should be excluded if .git is in patterns)
// src/
// main.go (should be backed up)
// test.go (should be backed up)
// node_modules/
// package/
// index.js (should be excluded if node_modules is in patterns)
// cache/
// temp.dat (should be excluded if cache/ is in patterns)
// build/
// output.bin (should be excluded if build is in patterns)
// docs/
// readme.md (should be backed up)
// .DS_Store (should be excluded if .DS_Store is in patterns)
// thumbs.db (should be excluded if thumbs.db is in patterns)
files := map[string]string{
"/backup/file1.txt": "content1",
"/backup/file2.log": "log content",
"/backup/.git/config": "git config",
"/backup/.git/objects/pack/data.pack": "pack data",
"/backup/src/main.go": "package main",
"/backup/src/test.go": "package main_test",
"/backup/node_modules/package/index.js": "module.exports = {}",
"/backup/cache/temp.dat": "cached data",
"/backup/build/output.bin": "binary data",
"/backup/docs/readme.md": "# Documentation",
"/backup/.DS_Store": "ds store data",
"/backup/thumbs.db": "thumbs data",
"/backup/src/.hidden": "hidden file",
"/backup/important.log.bak": "backup of log",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
t.Helper()
// Initialize logger
log.Initialize(log.Config{})
// Create test database
db, err := database.NewTestDB()
require.NoError(t, err)
repos := database.NewRepositories(db)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: 64 * 1024,
Repositories: repos,
MaxBlobSize: 1024 * 1024,
CompressionLevel: 3,
AgeRecipients: []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
Exclude: excludePatterns,
})
cleanup := func() {
_ = db.Close()
}
return scanner, repos, cleanup
}
func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
t.Helper()
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snap := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snap)
})
require.NoError(t, err)
}
func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should have scanned files but NOT .git directory contents
// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
// src/.hidden, important.log.bak
// Excluded: .git/config, .git/objects/pack/data.pack
require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}
func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude file2.log but NOT important.log.bak (different extension)
// Total files: 14, excluded: 1 (file2.log)
require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}
func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude node_modules/package/index.js
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}
func TestExcludePatterns_MultiplePatterns(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}
func TestExcludePatterns_NoExclusions(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should scan all 14 files
require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}
func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude: .git/*, .DS_Store, src/.hidden
// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}
func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude .git/objects/pack/data.pack
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}
func TestExcludePatterns_ExactFileName(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude thumbs.db and .DS_Store
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}
func TestExcludePatterns_CaseSensitive(t *testing.T) {
// Pattern matching should be case-sensitive
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db
// All 14 files should be scanned
require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}
func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
fs := setupExcludeTestFS(t)
// Some users might add trailing slashes to directory patterns
scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude cache/temp.dat and build/output.bin
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}
func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
// Exclude .hidden file specifically in src directory
scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude only src/.hidden
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}
// setupAnchoredTestFS creates a filesystem for testing anchored patterns.
// Source dir: /backup
// Structure:
//
//	/backup/
//	    file.txt     (root file)
//	    projectname/
//	        file.txt (should be excluded with /projectname)
//	    otherproject/
//	        projectname/
//	            file.txt (should NOT be excluded with /projectname, only with projectname)
//	    src/
//	        file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
t.Helper()
fs := afero.NewMemMapFs()
files := map[string]string{
"/backup/projectname/file.txt": "root project file",
"/backup/otherproject/projectname/file.txt": "nested project file",
"/backup/src/file.go": "source file",
"/backup/file.txt": "root file",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func TestExcludePatterns_AnchoredPattern(t *testing.T) {
// Pattern starting with / should only match from root of source dir
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /projectname should ONLY exclude /backup/projectname/file.txt (1 file)
// /backup/otherproject/projectname/file.txt should NOT be excluded
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}
func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
// Pattern without leading / should match anywhere in path
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// projectname (without /) should exclude BOTH:
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 2
require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}
func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
// Anchored pattern with glob
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /src/*.go should exclude /backup/src/file.go
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}
func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
// Anchored pattern for exact file at root
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /file.txt should ONLY exclude /backup/file.txt
// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}
func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
// Unanchored pattern for file should match anywhere
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// file.txt should exclude ALL file.txt files:
// - /backup/file.txt
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 3
require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}

View File

@@ -9,6 +9,7 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
@@ -53,7 +54,7 @@ func TestFileContentChange(t *testing.T) {
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID1,
ID: types.SnapshotID(snapshotID1),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -87,7 +88,7 @@ func TestFileContentChange(t *testing.T) {
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID2,
ID: types.SnapshotID(snapshotID2),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -117,12 +118,12 @@ func TestFileContentChange(t *testing.T) {
assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)
// Verify old chunk still exists (it's still valid data)
oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash)
oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
require.NoError(t, err)
assert.NotNil(t, oldChunk)
// Verify new chunk exists
newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash)
newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
require.NoError(t, err)
assert.NotNil(t, newChunk)
@@ -182,7 +183,7 @@ func TestMultipleFileChanges(t *testing.T) {
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID1,
ID: types.SnapshotID(snapshotID1),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -194,8 +195,8 @@ func TestMultipleFileChanges(t *testing.T) {
// First scan
result1, err := scanner.Scan(ctx, "/", snapshotID1)
require.NoError(t, err)
// 4 files because root directory is also counted
assert.Equal(t, 4, result1.FilesScanned)
// Only regular files are counted, not directories
assert.Equal(t, 3, result1.FilesScanned)
// Modify two files
time.Sleep(10 * time.Millisecond) // Ensure mtime changes
@@ -208,7 +209,7 @@ func TestMultipleFileChanges(t *testing.T) {
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID2,
ID: types.SnapshotID(snapshotID2),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -220,8 +221,9 @@ func TestMultipleFileChanges(t *testing.T) {
// Second scan
result2, err := scanner.Scan(ctx, "/", snapshotID2)
require.NoError(t, err)
// 4 files because root directory is also counted
assert.Equal(t, 4, result2.FilesScanned)
// Only regular files are counted, not directories
assert.Equal(t, 3, result2.FilesScanned)
// Verify each file has exactly one set of chunks
for path := range files {

View File

@@ -3,7 +3,7 @@ package snapshot
import (
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/s3"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/afero"
"go.uber.org/fx"
)
@@ -11,6 +11,9 @@ import (
// ScannerParams holds parameters for scanner creation
type ScannerParams struct {
EnableProgress bool
Fs afero.Fs
Exclude []string // Exclude patterns (combined global + snapshot-specific)
SkipErrors bool // Skip file read errors (log loudly but continue)
}
// Module exports backup functionality as an fx module.
@@ -26,17 +29,25 @@ var Module = fx.Module("backup",
// ScannerFactory creates scanners with custom parameters
type ScannerFactory func(params ScannerParams) *Scanner
func provideScannerFactory(cfg *config.Config, repos *database.Repositories, s3Client *s3.Client) ScannerFactory {
func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
return func(params ScannerParams) *Scanner {
// Use provided excludes, or fall back to global config excludes
excludes := params.Exclude
if len(excludes) == 0 {
excludes = cfg.Exclude
}
return NewScanner(ScannerConfig{
FS: afero.NewOsFs(),
FS: params.Fs,
ChunkSize: cfg.ChunkSize.Int64(),
Repositories: repos,
S3Client: s3Client,
Storage: storer,
MaxBlobSize: cfg.BlobSizeLimit.Int64(),
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
EnableProgress: params.EnableProgress,
Exclude: excludes,
SkipErrors: params.SkipErrors,
})
}
}
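For context, a sketch of how a caller might use the factory once fx has injected it. The fx wiring and snapshot bookkeeping are elided, and the exclude list here is arbitrary.

package example

import (
	"context"

	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"github.com/spf13/afero"
)

// runBackupScan sketches factory use once fx has provided it; error
// handling and snapshot record creation are elided.
func runBackupScan(ctx context.Context, factory snapshot.ScannerFactory, snapshotID string) error {
	sc := factory(snapshot.ScannerParams{
		Fs:             afero.NewOsFs(),
		Exclude:        []string{".git", "node_modules", "*.log"}, // empty falls back to cfg.Exclude
		SkipErrors:     true,
		EnableProgress: true,
	})
	_, err := sc.Scan(ctx, "/backup", snapshotID)
	return err
}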

View File

@@ -22,6 +22,9 @@ const (
// DetailInterval defines how often multi-line detailed status reports are printed.
// These reports include comprehensive statistics about files, chunks, blobs, and uploads.
DetailInterval = 60 * time.Second
// UploadProgressInterval defines how often upload progress messages are logged.
UploadProgressInterval = 15 * time.Second
)
// ProgressStats holds atomic counters for progress tracking
@@ -36,7 +39,7 @@ type ProgressStats struct {
BlobsCreated atomic.Int64
BlobsUploaded atomic.Int64
BytesUploaded atomic.Int64
UploadDurationMs atomic.Int64 // Total milliseconds spent uploading to S3
UploadDurationMs atomic.Int64 // Total milliseconds spent uploading
CurrentFile atomic.Value // stores string
TotalSize atomic.Int64 // Total size to process (set after scan phase)
TotalFiles atomic.Int64 // Total files to process in phase 2
@@ -55,6 +58,7 @@ type UploadInfo struct {
BlobHash string
Size int64
StartTime time.Time
LastLogTime time.Time
}
// ProgressReporter handles periodic progress reporting
@@ -269,7 +273,7 @@ func (pr *ProgressReporter) printDetailedStatus() {
"created", blobsCreated,
"uploaded", blobsUploaded,
"pending", blobsCreated-blobsUploaded)
log.Info("Total uploaded to S3",
log.Info("Total uploaded to remote",
"uploaded", humanize.Bytes(uint64(bytesUploaded)),
"compression_ratio", formatRatio(bytesUploaded, bytesScanned))
if currentFile != "" {
@@ -330,6 +334,11 @@ func (pr *ProgressReporter) ReportUploadStart(blobHash string, size int64) {
StartTime: time.Now().UTC(),
}
pr.stats.CurrentUpload.Store(info)
// Log the start of upload
log.Info("Starting blob upload",
"hash", blobHash[:8]+"...",
"size", humanize.Bytes(uint64(size)))
}
// ReportUploadComplete marks the completion of a blob upload
@@ -377,18 +386,13 @@ func (pr *ProgressReporter) UpdateChunkingActivity() {
func (pr *ProgressReporter) ReportUploadProgress(blobHash string, bytesUploaded, totalSize int64, instantSpeed float64) {
// Update the current upload info with progress
if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
// Format speed in bits/second
now := time.Now()
// Only log at the configured interval
if now.Sub(uploadInfo.LastLogTime) >= UploadProgressInterval {
// Format speed in bits/second using humanize
bitsPerSec := instantSpeed * 8
var speedStr string
if bitsPerSec >= 1e9 {
speedStr = fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
} else if bitsPerSec >= 1e6 {
speedStr = fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
} else if bitsPerSec >= 1e3 {
speedStr = fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
} else {
speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
}
speedStr := humanize.SI(bitsPerSec, "bit/sec")
percent := float64(bytesUploaded) / float64(totalSize) * 100
@@ -408,5 +412,8 @@ func (pr *ProgressReporter) ReportUploadProgress(blobHash string, bytesUploaded,
"total", humanize.Bytes(uint64(totalSize)),
"speed", speedStr,
"eta", etaStr)
uploadInfo.LastLogTime = now
}
}
}
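For reference, humanize.SI scales a float into an SI-prefixed string with the unit appended, which is what replaces the hand-rolled if/else chain above. Expected outputs, assuming go-humanize's default formatting:

humanize.SI(950, "bit/sec")   // "950 bit/sec"
humanize.SI(2.4e6, "bit/sec") // "2.4 Mbit/sec"
humanize.SI(1.5e9, "bit/sec") // "1.5 Gbit/sec"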

File diff suppressed because it is too large

View File

@@ -10,6 +10,7 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
)
@@ -74,7 +75,7 @@ func TestScannerSimpleDirectory(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID,
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -99,26 +100,25 @@ func TestScannerSimpleDirectory(t *testing.T) {
t.Fatalf("scan failed: %v", err)
}
// Verify results
// We now scan 6 files + 3 directories (source, subdir, subdir2) = 9 entries
if result.FilesScanned != 9 {
t.Errorf("expected 9 entries scanned, got %d", result.FilesScanned)
// Verify results - we only scan regular files, not directories
if result.FilesScanned != 6 {
t.Errorf("expected 6 files scanned, got %d", result.FilesScanned)
}
// Directories have their own sizes, so the total will be more than just file content
// Total bytes should be the sum of all file contents
if result.BytesScanned < 97 { // At minimum we have 97 bytes of file content
t.Errorf("expected at least 97 bytes scanned, got %d", result.BytesScanned)
}
// Verify files in database
// Verify files in database - only regular files are stored
files, err := repos.Files.ListByPrefix(ctx, "/source")
if err != nil {
t.Fatalf("failed to list files: %v", err)
}
// We should have 6 files + 3 directories = 9 entries
if len(files) != 9 {
t.Errorf("expected 9 entries in database, got %d", len(files))
// We should have 6 files (directories are not stored)
if len(files) != 6 {
t.Errorf("expected 6 files in database, got %d", len(files))
}
// Verify specific file
@@ -159,118 +159,6 @@ func TestScannerSimpleDirectory(t *testing.T) {
}
}
func TestScannerWithSymlinks(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test files
if err := fs.MkdirAll("/source", 0755); err != nil {
t.Fatal(err)
}
if err := afero.WriteFile(fs, "/source/target.txt", []byte("target content"), 0644); err != nil {
t.Fatal(err)
}
if err := afero.WriteFile(fs, "/outside/file.txt", []byte("outside content"), 0644); err != nil {
t.Fatal(err)
}
// Create symlinks (if supported by the filesystem)
linker, ok := fs.(afero.Symlinker)
if !ok {
t.Skip("filesystem does not support symlinks")
}
// Symlink to file in source
if err := linker.SymlinkIfPossible("target.txt", "/source/link1.txt"); err != nil {
t.Fatal(err)
}
// Symlink to file outside source
if err := linker.SymlinkIfPossible("/outside/file.txt", "/source/link2.txt"); err != nil {
t.Fatal(err)
}
// Create test database
db, err := database.NewTestDB()
if err != nil {
t.Fatalf("failed to create test database: %v", err)
}
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: 1024 * 16,
Repositories: repos,
MaxBlobSize: int64(1024 * 1024),
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot record for testing
ctx := context.Background()
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
if err != nil {
t.Fatalf("failed to create snapshot: %v", err)
}
// Scan the directory
var result *snapshot.ScanResult
result, err = scanner.Scan(ctx, "/source", snapshotID)
if err != nil {
t.Fatalf("scan failed: %v", err)
}
// Should have scanned 3 files (target + 2 symlinks)
if result.FilesScanned != 3 {
t.Errorf("expected 3 files scanned, got %d", result.FilesScanned)
}
// Check symlinks in database
link1, err := repos.Files.GetByPath(ctx, "/source/link1.txt")
if err != nil {
t.Fatalf("failed to get link1.txt: %v", err)
}
if link1.LinkTarget != "target.txt" {
t.Errorf("expected link1.txt target 'target.txt', got %q", link1.LinkTarget)
}
link2, err := repos.Files.GetByPath(ctx, "/source/link2.txt")
if err != nil {
t.Fatalf("failed to get link2.txt: %v", err)
}
if link2.LinkTarget != "/outside/file.txt" {
t.Errorf("expected link2.txt target '/outside/file.txt', got %q", link2.LinkTarget)
}
}
func TestScannerLargeFile(t *testing.T) {
// Initialize logger for tests
log.Initialize(log.Config{})
@@ -322,7 +210,7 @@ func TestScannerLargeFile(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: snapshotID,
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@@ -347,9 +235,9 @@ func TestScannerLargeFile(t *testing.T) {
t.Fatalf("scan failed: %v", err)
}
// We scan 1 file + 1 directory = 2 entries
if result.FilesScanned != 2 {
t.Errorf("expected 2 entries scanned, got %d", result.FilesScanned)
// We scan only regular files, not directories
if result.FilesScanned != 1 {
t.Errorf("expected 1 file scanned, got %d", result.FilesScanned)
}
// The file size should be at least 1MB

View File

@@ -19,24 +19,19 @@ package snapshot
// - Blobs not containing any remaining chunks
// - All related mapping tables (file_chunks, chunk_files, blob_chunks)
// 7. Close the temporary database
// 8. Use sqlite3 to dump the cleaned database to SQL
// 9. Delete the temporary database file
// 10. Compress the SQL dump with zstd
// 11. Encrypt the compressed dump with age (if encryption is enabled)
// 12. Upload to S3 as: snapshots/{snapshot-id}.sql.zst[.age]
// 13. Reopen the main database
// 8. VACUUM the database to remove deleted data and compact (security critical)
// 9. Compress the binary database with zstd
// 10. Encrypt the compressed database with age (if encryption is enabled)
// 11. Upload to S3 as: metadata/{snapshot-id}/db.zst.age
// 12. Reopen the main database
//
// Advantages of this approach:
// - No custom metadata format needed
// - Reuses existing database schema and relationships
// - SQL dumps are portable and compress well
// - Restore process can simply execute the SQL
// - Binary SQLite files are portable and compress well
// - Fast restore - just decompress and open (no SQL parsing)
// - VACUUM ensures no deleted data leaks
// - Atomic and consistent snapshot of all metadata
//
// TODO: Future improvements:
// - Add snapshot-file relationships to track which files belong to which snapshot
// - Implement incremental snapshots that reference previous snapshots
// - Add snapshot manifest with additional metadata (size, chunk count, etc.)
import (
"bytes"
@@ -44,25 +39,28 @@ import (
"database/sql"
"fmt"
"io"
"os"
"os/exec"
"path/filepath"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"go.uber.org/fx"
)
// SnapshotManager handles snapshot creation and metadata export
type SnapshotManager struct {
repos *database.Repositories
s3Client S3Client
storage storage.Storer
config *config.Config
fs afero.Fs
}
// SnapshotManagerParams holds dependencies for NewSnapshotManager
@@ -70,7 +68,7 @@ type SnapshotManagerParams struct {
fx.In
Repos *database.Repositories
S3Client *s3.Client
Storage storage.Storer
Config *config.Config
}
@@ -78,20 +76,45 @@ type SnapshotManagerParams struct {
func NewSnapshotManager(params SnapshotManagerParams) *SnapshotManager {
return &SnapshotManager{
repos: params.Repos,
s3Client: params.S3Client,
storage: params.Storage,
config: params.Config,
}
}
// CreateSnapshot creates a new snapshot record in the database at the start of a backup
// SetFilesystem sets the filesystem to use for all file operations
func (sm *SnapshotManager) SetFilesystem(fs afero.Fs) {
sm.fs = fs
}
// CreateSnapshot creates a new snapshot record in the database at the start of a backup.
// Deprecated: Use CreateSnapshotWithName instead for multi-snapshot support.
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version, gitRevision string) (string, error) {
snapshotID := fmt.Sprintf("%s-%s", hostname, time.Now().UTC().Format("20060102-150405Z"))
return sm.CreateSnapshotWithName(ctx, hostname, "", version, gitRevision)
}
// CreateSnapshotWithName creates a new snapshot record with an optional snapshot name.
// The snapshot ID format is: hostname_name_timestamp or hostname_timestamp if name is empty.
func (sm *SnapshotManager) CreateSnapshotWithName(ctx context.Context, hostname, name, version, gitRevision string) (string, error) {
// Use short hostname (strip domain if present)
shortHostname := hostname
if idx := strings.Index(hostname, "."); idx != -1 {
shortHostname = hostname[:idx]
}
// Build snapshot ID with optional name
timestamp := time.Now().UTC().Format("2006-01-02T15:04:05Z")
var snapshotID string
if name != "" {
snapshotID = fmt.Sprintf("%s_%s_%s", shortHostname, name, timestamp)
} else {
snapshotID = fmt.Sprintf("%s_%s", shortHostname, timestamp)
}
snapshot := &database.Snapshot{
ID: snapshotID,
Hostname: hostname,
VaultikVersion: version,
VaultikGitRevision: gitRevision,
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
VaultikVersion: types.Version(version),
VaultikGitRevision: types.GitRevision(gitRevision),
StartedAt: time.Now().UTC(),
CompletedAt: nil, // Not completed yet
FileCount: 0,
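Illustrative resulting IDs, with a hypothetical hostname and name (the timestamp layout is the "2006-01-02T15:04:05Z" format string above):

// CreateSnapshotWithName(ctx, "vault01.example.com", "daily", v, rev)
//   -> "vault01_daily_2026-02-16T06:17:51Z"
// CreateSnapshotWithName(ctx, "vault01.example.com", "", v, rev)
//   -> "vault01_2026-02-16T06:17:51Z"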
@@ -192,14 +215,14 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
log.Info("Phase 3/3: Exporting snapshot metadata", "snapshot_id", snapshotID, "source_db", dbPath)
// Create temp directory for all temporary files
tempDir, err := os.MkdirTemp("", "vaultik-snapshot-*")
tempDir, err := afero.TempDir(sm.fs, "", "vaultik-snapshot-*")
if err != nil {
return fmt.Errorf("creating temp dir: %w", err)
}
log.Debug("Created temporary directory", "path", tempDir)
defer func() {
log.Debug("Cleaning up temporary directory", "path", tempDir)
if err := os.RemoveAll(tempDir); err != nil {
if err := sm.fs.RemoveAll(tempDir); err != nil {
log.Debug("Failed to remove temp dir", "path", tempDir, "error", err)
}
}()
@@ -208,20 +231,20 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// The main database should be closed at this point
tempDBPath := filepath.Join(tempDir, "snapshot.db")
log.Debug("Copying database to temporary location", "source", dbPath, "destination", tempDBPath)
if err := copyFile(dbPath, tempDBPath); err != nil {
if err := sm.copyFile(dbPath, tempDBPath); err != nil {
return fmt.Errorf("copying database: %w", err)
}
log.Debug("Database copy complete", "size", getFileSize(tempDBPath))
log.Debug("Database copy complete", "size", sm.getFileSize(tempDBPath))
// Step 2: Clean the temp database to only contain current snapshot data
log.Debug("Cleaning temporary database to contain only current snapshot data", "snapshot_id", snapshotID, "db_path", tempDBPath)
log.Debug("Cleaning temporary database", "snapshot_id", snapshotID)
stats, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshotID)
if err != nil {
return fmt.Errorf("cleaning snapshot database: %w", err)
}
log.Info("Temporary database cleanup complete",
"db_path", tempDBPath,
"size_after_clean", humanize.Bytes(uint64(getFileSize(tempDBPath))),
"size_after_clean", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
"files", stats.FileCount,
"chunks", stats.ChunkCount,
"blobs", stats.BlobCount,
@@ -229,31 +252,29 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
"total_uncompressed_size", humanize.Bytes(uint64(stats.UncompressedSize)),
"compression_ratio", fmt.Sprintf("%.2fx", float64(stats.UncompressedSize)/float64(stats.CompressedSize)))
// Step 3: Dump the cleaned database to SQL
dumpPath := filepath.Join(tempDir, "snapshot.sql")
log.Debug("Dumping database to SQL", "source", tempDBPath, "destination", dumpPath)
if err := sm.dumpDatabase(tempDBPath, dumpPath); err != nil {
return fmt.Errorf("dumping database: %w", err)
// Step 3: VACUUM the database to remove deleted data and compact
// This is critical for security - ensures no stale/deleted data is uploaded
if err := sm.vacuumDatabase(tempDBPath); err != nil {
return fmt.Errorf("vacuuming database: %w", err)
}
log.Debug("SQL dump complete", "size", getFileSize(dumpPath))
log.Debug("Database vacuumed", "size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))))
// Step 4: Compress and encrypt the SQL dump
compressedPath := filepath.Join(tempDir, "snapshot.sql.zst.age")
log.Debug("Compressing and encrypting SQL dump", "source", dumpPath, "destination", compressedPath)
if err := sm.compressDump(dumpPath, compressedPath); err != nil {
return fmt.Errorf("compressing dump: %w", err)
// Step 4: Compress and encrypt the binary database file
compressedPath := filepath.Join(tempDir, "db.zst.age")
if err := sm.compressFile(tempDBPath, compressedPath); err != nil {
return fmt.Errorf("compressing database: %w", err)
}
log.Debug("Compression complete", "original_size", getFileSize(dumpPath), "compressed_size", getFileSize(compressedPath))
log.Debug("Compression complete",
"original_size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
"compressed_size", humanize.Bytes(uint64(sm.getFileSize(compressedPath))))
// Step 5: Read compressed and encrypted data for upload
log.Debug("Reading compressed and encrypted data for upload", "path", compressedPath)
finalData, err := os.ReadFile(compressedPath)
finalData, err := afero.ReadFile(sm.fs, compressedPath)
if err != nil {
return fmt.Errorf("reading compressed dump: %w", err)
}
// Step 6: Generate blob manifest (before closing temp DB)
log.Debug("Generating blob manifest from temporary database", "db_path", tempDBPath)
blobManifest, err := sm.generateBlobManifest(ctx, tempDBPath, snapshotID)
if err != nil {
return fmt.Errorf("generating blob manifest: %w", err)
@@ -263,14 +284,13 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// Upload database backup (compressed and encrypted)
dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
log.Debug("Uploading snapshot database to S3", "key", dbKey, "size", len(finalData))
dbUploadStart := time.Now()
if err := sm.s3Client.PutObject(ctx, dbKey, bytes.NewReader(finalData)); err != nil {
if err := sm.storage.Put(ctx, dbKey, bytes.NewReader(finalData)); err != nil {
return fmt.Errorf("uploading snapshot database: %w", err)
}
dbUploadDuration := time.Since(dbUploadStart)
dbUploadSpeed := float64(len(finalData)) * 8 / dbUploadDuration.Seconds() // bits per second
log.Info("Uploaded snapshot database to S3",
log.Info("Uploaded snapshot database",
"path", dbKey,
"size", humanize.Bytes(uint64(len(finalData))),
"duration", dbUploadDuration,
@@ -278,14 +298,13 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// Upload blob manifest (compressed only, not encrypted)
manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
log.Debug("Uploading blob manifest to S3", "key", manifestKey, "size", len(blobManifest))
manifestUploadStart := time.Now()
if err := sm.s3Client.PutObject(ctx, manifestKey, bytes.NewReader(blobManifest)); err != nil {
if err := sm.storage.Put(ctx, manifestKey, bytes.NewReader(blobManifest)); err != nil {
return fmt.Errorf("uploading blob manifest: %w", err)
}
manifestUploadDuration := time.Since(manifestUploadStart)
manifestUploadSpeed := float64(len(blobManifest)) * 8 / manifestUploadDuration.Seconds() // bits per second
log.Info("Uploaded blob manifest to S3",
log.Info("Uploaded blob manifest",
"path", manifestKey,
"size", humanize.Bytes(uint64(len(blobManifest))),
"duration", manifestUploadDuration,
@@ -411,67 +430,61 @@ func (sm *SnapshotManager) cleanSnapshotDB(ctx context.Context, dbPath string, s
stats.CompressedSize = compressedSize.Int64
stats.UncompressedSize = uncompressedSize.Int64
log.Debug("[Temp DB Cleanup] Database cleanup complete", "stats", stats)
return stats, nil
}
// dumpDatabase creates a SQL dump of the database
func (sm *SnapshotManager) dumpDatabase(dbPath, dumpPath string) error {
log.Debug("Running sqlite3 dump command", "source", dbPath, "destination", dumpPath)
cmd := exec.Command("sqlite3", dbPath, ".dump")
// vacuumDatabase runs VACUUM on the database to remove deleted data and compact
// This is critical for security - ensures no stale/deleted data pages are uploaded
func (sm *SnapshotManager) vacuumDatabase(dbPath string) error {
log.Debug("Running VACUUM on database", "path", dbPath)
cmd := exec.Command("sqlite3", dbPath, "VACUUM;")
output, err := cmd.Output()
if err != nil {
return fmt.Errorf("running sqlite3 dump: %w", err)
}
log.Debug("SQL dump generated", "size", len(output))
if err := os.WriteFile(dumpPath, output, 0644); err != nil {
return fmt.Errorf("writing dump file: %w", err)
if output, err := cmd.CombinedOutput(); err != nil {
return fmt.Errorf("running VACUUM: %w (output: %s)", err, string(output))
}
return nil
}
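An alternative sketch, not what the change above does: since the binary already links a SQLite driver, VACUUM could run through database/sql instead of shelling out to the sqlite3 binary, dropping the external dependency. The driver name "sqlite3" (mattn/go-sqlite3) is an assumption.

package example

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // assumed driver
)

// vacuumViaSQL runs VACUUM in-process; hypothetical alternative to exec.Command.
func vacuumViaSQL(ctx context.Context, dbPath string) error {
	db, err := sql.Open("sqlite3", dbPath)
	if err != nil {
		return fmt.Errorf("opening database: %w", err)
	}
	defer func() { _ = db.Close() }()
	if _, err := db.ExecContext(ctx, "VACUUM"); err != nil {
		return fmt.Errorf("running VACUUM: %w", err)
	}
	return nil
}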
// compressDump compresses the SQL dump using zstd
func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
log.Debug("Opening SQL dump for compression", "path", inputPath)
input, err := os.Open(inputPath)
// compressFile compresses a file using zstd and encrypts with age
func (sm *SnapshotManager) compressFile(inputPath, outputPath string) error {
input, err := sm.fs.Open(inputPath)
if err != nil {
return fmt.Errorf("opening input file: %w", err)
}
defer func() {
log.Debug("Closing input file", "path", inputPath)
if err := input.Close(); err != nil {
log.Debug("Failed to close input file", "path", inputPath, "error", err)
}
}()
log.Debug("Creating output file for compressed and encrypted data", "path", outputPath)
output, err := os.Create(outputPath)
output, err := sm.fs.Create(outputPath)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() {
log.Debug("Closing output file", "path", outputPath)
if err := output.Close(); err != nil {
log.Debug("Failed to close output file", "path", outputPath, "error", err)
}
}()
// Use blobgen for compression and encryption
log.Debug("Creating compressor/encryptor", "level", sm.config.CompressionLevel)
log.Debug("Compressing and encrypting data")
writer, err := blobgen.NewWriter(output, sm.config.CompressionLevel, sm.config.AgeRecipients)
if err != nil {
return fmt.Errorf("creating blobgen writer: %w", err)
}
// Track if writer has been closed to avoid double-close
writerClosed := false
defer func() {
if !writerClosed {
if err := writer.Close(); err != nil {
log.Debug("Failed to close writer", "error", err)
}
}
}()
log.Debug("Compressing and encrypting data")
if _, err := io.Copy(writer, input); err != nil {
return fmt.Errorf("compressing data: %w", err)
}
@@ -480,6 +493,7 @@ func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
if err := writer.Close(); err != nil {
return fmt.Errorf("closing writer: %w", err)
}
writerClosed = true
log.Debug("Compression complete", "hash", fmt.Sprintf("%x", writer.Sum256()))
@@ -487,9 +501,9 @@ func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
}
// copyFile copies a file from src to dst
func copyFile(src, dst string) error {
func (sm *SnapshotManager) copyFile(src, dst string) error {
log.Debug("Opening source file for copy", "path", src)
sourceFile, err := os.Open(src)
sourceFile, err := sm.fs.Open(src)
if err != nil {
return err
}
@@ -501,7 +515,7 @@ func copyFile(src, dst string) error {
}()
log.Debug("Creating destination file", "path", dst)
destFile, err := os.Create(dst)
destFile, err := sm.fs.Create(dst)
if err != nil {
return err
}
@@ -524,7 +538,6 @@ func copyFile(src, dst string) error {
// generateBlobManifest creates a compressed JSON list of all blobs in the snapshot
func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath string, snapshotID string) ([]byte, error) {
log.Debug("Generating blob manifest", "db_path", dbPath, "snapshot_id", snapshotID)
// Open the cleaned database using the database package
db, err := database.New(ctx, dbPath)
@@ -573,7 +586,6 @@ func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath stri
}
// Encode manifest
log.Debug("Encoding manifest")
compressedData, err := EncodeManifest(manifest, sm.config.CompressionLevel)
if err != nil {
return nil, fmt.Errorf("encoding manifest: %w", err)
@@ -591,8 +603,8 @@ func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath stri
// compressData compresses data using zstd
// getFileSize returns the size of a file in bytes, or -1 if error
func getFileSize(path string) int64 {
info, err := os.Stat(path)
func (sm *SnapshotManager) getFileSize(path string) int64 {
info, err := sm.fs.Stat(path)
if err != nil {
return -1
}
@@ -635,18 +647,18 @@ func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostn
log.Info("Found incomplete snapshots", "count", len(incompleteSnapshots))
// Check each incomplete snapshot for metadata in S3
// Check each incomplete snapshot for metadata in storage
for _, snapshot := range incompleteSnapshots {
// Check if metadata exists in S3
// Check if metadata exists in storage
// Must match the upload key format ("db.zst.age") used above, or the check never finds completed snapshots
metadataKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshot.ID)
_, err := sm.s3Client.StatObject(ctx, metadataKey)
_, err := sm.storage.Stat(ctx, metadataKey)
if err != nil {
// Metadata doesn't exist in S3 - this is an incomplete snapshot
log.Info("Cleaning up incomplete snapshot record", "snapshot_id", snapshot.ID, "started_at", snapshot.StartedAt)
// Delete the snapshot and all its associations
if err := sm.deleteSnapshot(ctx, snapshot.ID); err != nil {
if err := sm.deleteSnapshot(ctx, snapshot.ID.String()); err != nil {
return fmt.Errorf("deleting incomplete snapshot %s: %w", snapshot.ID, err)
}
@@ -654,8 +666,8 @@ func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostn
} else {
// Metadata exists - this snapshot was completed but database wasn't updated
// This shouldn't happen in normal operation, but mark it complete
log.Warn("Found snapshot with S3 metadata but incomplete in database", "snapshot_id", snapshot.ID)
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID); err != nil {
log.Warn("Found snapshot with remote metadata but incomplete in database", "snapshot_id", snapshot.ID)
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID.String()); err != nil {
log.Error("Failed to mark snapshot as complete in database", "snapshot_id", snapshot.ID, "error", err)
}
}
@@ -676,6 +688,11 @@ func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string
return fmt.Errorf("deleting snapshot blobs: %w", err)
}
// Delete uploads entries (has foreign key to snapshots without CASCADE)
if err := sm.repos.Snapshots.DeleteSnapshotUploads(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot uploads: %w", err)
}
// Delete the snapshot itself
if err := sm.repos.Snapshots.Delete(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot: %w", err)
@@ -683,15 +700,16 @@ func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string
// Clean up orphaned data
log.Debug("Cleaning up orphaned records in main database")
if err := sm.cleanupOrphanedData(ctx); err != nil {
if err := sm.CleanupOrphanedData(ctx); err != nil {
return fmt.Errorf("cleaning up orphaned data: %w", err)
}
return nil
}
// cleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot
func (sm *SnapshotManager) cleanupOrphanedData(ctx context.Context) error {
// CleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot.
// This should be called periodically to clean up data from deleted or incomplete snapshots.
func (sm *SnapshotManager) CleanupOrphanedData(ctx context.Context) error {
// Order is important to respect foreign key constraints:
// 1. Delete orphaned files (will cascade delete file_chunks)
// 2. Delete orphaned blobs (will cascade delete blob_chunks for deleted blobs)
@@ -731,6 +749,17 @@ func (sm *SnapshotManager) cleanupOrphanedData(ctx context.Context) error {
// deleteOtherSnapshots deletes all snapshots except the current one
func (sm *SnapshotManager) deleteOtherSnapshots(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
log.Debug("[Temp DB Cleanup] Deleting all snapshot records except current", "keeping", currentSnapshotID)
// First delete uploads that reference other snapshots (no CASCADE DELETE on this FK)
database.LogSQL("Execute", "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
uploadResult, err := tx.ExecContext(ctx, "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
if err != nil {
return fmt.Errorf("deleting uploads for other snapshots: %w", err)
}
uploadsDeleted, _ := uploadResult.RowsAffected()
log.Debug("[Temp DB Cleanup] Deleted upload records", "count", uploadsDeleted)
// Now we can safely delete the snapshots
database.LogSQL("Execute", "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
result, err := tx.ExecContext(ctx, "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
if err != nil {
@@ -842,16 +871,21 @@ func (sm *SnapshotManager) deleteOrphanedBlobToChunkMappings(ctx context.Context
return nil
}
// deleteOrphanedChunks deletes chunks not referenced by any file
// deleteOrphanedChunks deletes chunks not referenced by any file or blob
func (sm *SnapshotManager) deleteOrphanedChunks(ctx context.Context, tx *sql.Tx) error {
log.Debug("[Temp DB Cleanup] Deleting orphaned chunk records")
database.LogSQL("Execute", `DELETE FROM chunks WHERE NOT EXISTS (SELECT 1 FROM file_chunks WHERE file_chunks.chunk_hash = chunks.chunk_hash)`)
result, err := tx.ExecContext(ctx, `
query := `
DELETE FROM chunks
WHERE NOT EXISTS (
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)`)
)
AND NOT EXISTS (
SELECT 1 FROM blob_chunks
WHERE blob_chunks.chunk_hash = chunks.chunk_hash
)`
database.LogSQL("Execute", query)
result, err := tx.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}

View File

@@ -3,12 +3,14 @@ package snapshot
import (
"context"
"database/sql"
"io"
"path/filepath"
"testing"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/afero"
)
const (
@@ -16,11 +18,30 @@ const (
testAgeRecipient = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)
// copyFile is a test helper to copy files using afero
func copyFile(fs afero.Fs, src, dst string) error {
sourceFile, err := fs.Open(src)
if err != nil {
return err
}
defer func() { _ = sourceFile.Close() }()
destFile, err := fs.Create(dst)
if err != nil {
return err
}
defer func() { _ = destFile.Close() }()
_, err = io.Copy(destFile, sourceFile)
return err
}
func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
ctx := context.Background()
fs := afero.NewOsFs()
// Create a test database
tempDir := t.TempDir()
@@ -66,7 +87,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
// Copy database
tempDBPath := filepath.Join(tempDir, "temp.db")
if err := copyFile(dbPath, tempDBPath); err != nil {
if err := copyFile(fs, dbPath, tempDBPath); err != nil {
t.Fatalf("failed to copy database: %v", err)
}
@@ -75,9 +96,12 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
CompressionLevel: 3,
AgeRecipients: []string{testAgeRecipient},
}
// Clean the database
sm := &SnapshotManager{config: cfg}
if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID); err != nil {
// Create SnapshotManager with filesystem
sm := &SnapshotManager{
config: cfg,
fs: fs,
}
if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID.String()); err != nil {
t.Fatalf("failed to clean snapshot database: %v", err)
}
@@ -95,7 +119,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
cleanedRepos := database.NewRepositories(cleanedDB)
// Verify snapshot exists
verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID)
verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID.String())
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@@ -104,7 +128,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
}
// Verify orphan file is gone
f, err := cleanedRepos.Files.GetByPath(ctx, file.Path)
f, err := cleanedRepos.Files.GetByPath(ctx, file.Path.String())
if err != nil {
t.Fatalf("failed to check file: %v", err)
}
@@ -113,7 +137,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
}
// Verify orphan chunk is gone
c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash)
c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash.String())
if err != nil {
t.Fatalf("failed to check chunk: %v", err)
}
@@ -127,6 +151,7 @@ func TestCleanSnapshotDBNonExistentSnapshot(t *testing.T) {
log.Initialize(log.Config{})
ctx := context.Background()
fs := afero.NewOsFs()
// Create a test database
tempDir := t.TempDir()
@@ -143,7 +168,7 @@ func TestCleanSnapshotDBNonExistentSnapshot(t *testing.T) {
// Copy database
tempDBPath := filepath.Join(tempDir, "temp.db")
if err := copyFile(dbPath, tempDBPath); err != nil {
if err := copyFile(fs, dbPath, tempDBPath); err != nil {
t.Fatalf("failed to copy database: %v", err)
}
@@ -153,7 +178,7 @@ func TestCleanSnapshotDBNonExistentSnapshot(t *testing.T) {
AgeRecipients: []string{testAgeRecipient},
}
// Try to clean with non-existent snapshot
sm := &SnapshotManager{config: cfg}
sm := &SnapshotManager{config: cfg, fs: fs}
_, err = sm.cleanSnapshotDB(ctx, tempDBPath, "non-existent-snapshot")
// Should not error - it will just delete everything

262
internal/storage/file.go Normal file
View File

@@ -0,0 +1,262 @@
package storage
import (
"context"
"fmt"
"io"
"os"
"path/filepath"
"strings"
"github.com/spf13/afero"
)
// FileStorer implements Storer using the local filesystem.
// It mirrors the S3 path structure for consistency.
type FileStorer struct {
fs afero.Fs
basePath string
}
// NewFileStorer creates a new filesystem storage backend.
// The basePath directory will be created if it doesn't exist.
// Uses the real OS filesystem by default; call SetFilesystem to override for testing.
func NewFileStorer(basePath string) (*FileStorer, error) {
fs := afero.NewOsFs()
// Ensure base path exists
if err := fs.MkdirAll(basePath, 0755); err != nil {
return nil, fmt.Errorf("creating base path: %w", err)
}
return &FileStorer{
fs: fs,
basePath: basePath,
}, nil
}
// SetFilesystem overrides the filesystem for testing.
func (f *FileStorer) SetFilesystem(fs afero.Fs) {
f.fs = fs
}
// fullPath returns the full filesystem path for a key.
func (f *FileStorer) fullPath(key string) string {
return filepath.Join(f.basePath, key)
}
// Put stores data at the specified key.
func (f *FileStorer) Put(ctx context.Context, key string, data io.Reader) error {
path := f.fullPath(key)
// Create parent directories
dir := filepath.Dir(path)
if err := f.fs.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("creating directories: %w", err)
}
file, err := f.fs.Create(path)
if err != nil {
return fmt.Errorf("creating file: %w", err)
}
defer func() { _ = file.Close() }()
if _, err := io.Copy(file, data); err != nil {
return fmt.Errorf("writing file: %w", err)
}
return nil
}
// PutWithProgress stores data with progress reporting.
func (f *FileStorer) PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
path := f.fullPath(key)
// Create parent directories
dir := filepath.Dir(path)
if err := f.fs.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("creating directories: %w", err)
}
file, err := f.fs.Create(path)
if err != nil {
return fmt.Errorf("creating file: %w", err)
}
defer func() { _ = file.Close() }()
// Wrap with progress tracking
pw := &progressWriter{
writer: file,
callback: progress,
}
if _, err := io.Copy(pw, data); err != nil {
return fmt.Errorf("writing file: %w", err)
}
return nil
}
// Get retrieves data from the specified key.
func (f *FileStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
path := f.fullPath(key)
file, err := f.fs.Open(path)
if err != nil {
if os.IsNotExist(err) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("opening file: %w", err)
}
return file, nil
}
// Stat returns metadata about an object without retrieving its contents.
func (f *FileStorer) Stat(ctx context.Context, key string) (*ObjectInfo, error) {
path := f.fullPath(key)
info, err := f.fs.Stat(path)
if err != nil {
if os.IsNotExist(err) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("stat file: %w", err)
}
return &ObjectInfo{
Key: key,
Size: info.Size(),
}, nil
}
// Delete removes an object.
func (f *FileStorer) Delete(ctx context.Context, key string) error {
path := f.fullPath(key)
err := f.fs.Remove(path)
if os.IsNotExist(err) {
return nil // Match S3 behavior: no error if the object doesn't exist
}
if err != nil {
return fmt.Errorf("removing file: %w", err)
}
return nil
}
// List returns all keys with the given prefix.
func (f *FileStorer) List(ctx context.Context, prefix string) ([]string, error) {
var keys []string
basePath := f.fullPath(prefix)
// Check if base path exists
exists, err := afero.Exists(f.fs, basePath)
if err != nil {
return nil, fmt.Errorf("checking path: %w", err)
}
if !exists {
return keys, nil // Empty list for non-existent prefix
}
err = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
// Check context cancellation
select {
case <-ctx.Done():
return ctx.Err()
default:
}
if !info.IsDir() {
// Convert back to key (relative path from basePath)
relPath, err := filepath.Rel(f.basePath, path)
if err != nil {
return fmt.Errorf("computing relative path: %w", err)
}
// Normalize path separators to forward slashes for consistency
relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
keys = append(keys, relPath)
}
return nil
})
if err != nil {
return nil, fmt.Errorf("walking directory: %w", err)
}
return keys, nil
}
// ListStream returns a channel of ObjectInfo for large result sets.
func (f *FileStorer) ListStream(ctx context.Context, prefix string) <-chan ObjectInfo {
ch := make(chan ObjectInfo)
go func() {
defer close(ch)
basePath := f.fullPath(prefix)
// Check if base path exists
exists, err := afero.Exists(f.fs, basePath)
if err != nil {
ch <- ObjectInfo{Err: fmt.Errorf("checking path: %w", err)}
return
}
if !exists {
return // Empty channel for non-existent prefix
}
_ = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
// Check context cancellation
select {
case <-ctx.Done():
ch <- ObjectInfo{Err: ctx.Err()}
return ctx.Err()
default:
}
if err != nil {
ch <- ObjectInfo{Err: err}
return nil // Continue walking despite errors
}
if !info.IsDir() {
relPath, err := filepath.Rel(f.basePath, path)
if err != nil {
ch <- ObjectInfo{Err: fmt.Errorf("computing relative path: %w", err)}
return nil
}
// Normalize path separators
relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
ch <- ObjectInfo{
Key: relPath,
Size: info.Size(),
}
}
return nil
})
}()
return ch
}
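Consuming ListStream looks like this (sketch): the channel closes when the walk finishes, and per-item failures travel in ObjectInfo.Err rather than aborting the stream.

for obj := range storer.ListStream(ctx, "metadata/") {
	if obj.Err != nil {
		return obj.Err // or log and continue
	}
	fmt.Println(obj.Key, obj.Size)
}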
// Info returns human-readable storage location information.
func (f *FileStorer) Info() StorageInfo {
return StorageInfo{
Type: "file",
Location: f.basePath,
}
}
// progressWriter wraps an io.Writer to track write progress.
type progressWriter struct {
writer io.Writer
written int64
callback ProgressCallback
}
func (pw *progressWriter) Write(p []byte) (int, error) {
n, err := pw.writer.Write(p)
if n > 0 {
pw.written += int64(n)
if pw.callback != nil {
if callbackErr := pw.callback(pw.written); callbackErr != nil {
return n, callbackErr
}
}
}
return n, err
}
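A usage sketch for the filesystem backend. The progress callback shape, func(int64) error taking bytes written so far with a non-nil return aborting the transfer, is inferred from progressWriter above; the base path is made up.

package main

import (
	"bytes"
	"context"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/storage"
)

func main() {
	st, err := storage.NewFileStorer("/var/lib/vaultik-store") // hypothetical path
	if err != nil {
		panic(err)
	}
	data := []byte("hello")
	err = st.PutWithProgress(context.Background(), "blobs/ab/cd/abcdef",
		bytes.NewReader(data), int64(len(data)),
		func(written int64) error { // assumed ProgressCallback shape
			fmt.Printf("wrote %d/%d bytes\n", written, len(data))
			return nil
		})
	if err != nil {
		panic(err)
	}
}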

113
internal/storage/module.go Normal file
View File

@@ -0,0 +1,113 @@
package storage
import (
"context"
"fmt"
"strings"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/s3"
"go.uber.org/fx"
)
// Module exports storage functionality as an fx module.
// It provides a Storer implementation based on the configured storage URL
// or falls back to legacy S3 configuration.
var Module = fx.Module("storage",
fx.Provide(NewStorer),
)
// NewStorer creates a Storer based on configuration.
// If StorageURL is set, it uses URL-based configuration.
// Otherwise, it falls back to legacy S3 configuration.
func NewStorer(cfg *config.Config) (Storer, error) {
if cfg.StorageURL != "" {
return storerFromURL(cfg.StorageURL, cfg)
}
return storerFromLegacyS3Config(cfg)
}
func storerFromURL(rawURL string, cfg *config.Config) (Storer, error) {
parsed, err := ParseStorageURL(rawURL)
if err != nil {
return nil, fmt.Errorf("parsing storage URL: %w", err)
}
switch parsed.Scheme {
case "file":
return NewFileStorer(parsed.Prefix)
case "s3":
// Build endpoint URL
endpoint := parsed.Endpoint
if endpoint == "" {
endpoint = "s3.amazonaws.com"
}
// Add protocol if not present
if parsed.UseSSL && !strings.HasPrefix(endpoint, "https://") && !strings.HasPrefix(endpoint, "http://") {
endpoint = "https://" + endpoint
} else if !parsed.UseSSL && !strings.HasPrefix(endpoint, "http://") && !strings.HasPrefix(endpoint, "https://") {
endpoint = "http://" + endpoint
}
region := parsed.Region
if region == "" {
region = cfg.S3.Region
if region == "" {
region = "us-east-1"
}
}
// Credentials come from config (not URL for security)
client, err := s3.NewClient(context.Background(), s3.Config{
Endpoint: endpoint,
Bucket: parsed.Bucket,
Prefix: parsed.Prefix,
AccessKeyID: cfg.S3.AccessKeyID,
SecretAccessKey: cfg.S3.SecretAccessKey,
Region: region,
})
if err != nil {
return nil, fmt.Errorf("creating S3 client: %w", err)
}
return NewS3Storer(client), nil
case "rclone":
return NewRcloneStorer(context.Background(), parsed.RcloneRemote, parsed.Prefix)
default:
return nil, fmt.Errorf("unsupported storage scheme: %s", parsed.Scheme)
}
}
func storerFromLegacyS3Config(cfg *config.Config) (Storer, error) {
endpoint := cfg.S3.Endpoint
// Ensure protocol is present
if !strings.HasPrefix(endpoint, "http://") && !strings.HasPrefix(endpoint, "https://") {
if cfg.S3.UseSSL {
endpoint = "https://" + endpoint
} else {
endpoint = "http://" + endpoint
}
}
region := cfg.S3.Region
if region == "" {
region = "us-east-1"
}
client, err := s3.NewClient(context.Background(), s3.Config{
Endpoint: endpoint,
Bucket: cfg.S3.Bucket,
Prefix: cfg.S3.Prefix,
AccessKeyID: cfg.S3.AccessKeyID,
SecretAccessKey: cfg.S3.SecretAccessKey,
Region: region,
})
if err != nil {
return nil, fmt.Errorf("creating S3 client: %w", err)
}
return NewS3Storer(client), nil
}
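A selection sketch, assuming ParseStorageURL accepts URLs of roughly these shapes; the exact syntax lives in ParseStorageURL, which is not part of this diff:

cfg := &config.Config{StorageURL: "file:///var/lib/vaultik-store"} // -> FileStorer
// cfg.StorageURL = "s3://my-bucket/backups"    // -> S3Storer (credentials from cfg.S3)
// cfg.StorageURL = "rclone://myremote/backups" // -> RcloneStorer
st, err := storage.NewStorer(cfg)
if err != nil {
	return err
}
info := st.Info()
fmt.Println(info.Type, info.Location) // e.g. "file /var/lib/vaultik-store"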

236
internal/storage/rclone.go Normal file
View File

@@ -0,0 +1,236 @@
package storage
import (
"bytes"
"context"
"errors"
"fmt"
"io"
"strings"
"time"
"github.com/rclone/rclone/fs"
"github.com/rclone/rclone/fs/config/configfile"
"github.com/rclone/rclone/fs/operations"
// Import all rclone backends
_ "github.com/rclone/rclone/backend/all"
)
// ErrRemoteNotFound is returned when an rclone remote is not configured.
var ErrRemoteNotFound = errors.New("rclone remote not found in config")
// RcloneStorer implements Storer using rclone's filesystem abstraction.
// This allows vaultik to use any of rclone's 70+ supported storage providers.
type RcloneStorer struct {
fsys fs.Fs // rclone filesystem
remote string // remote name (for Info())
path string // path within remote (for Info())
}
// NewRcloneStorer creates a new rclone storage backend.
// The remote parameter is the rclone remote name (as configured via `rclone config`).
// The path parameter is the path within the remote.
func NewRcloneStorer(ctx context.Context, remote, path string) (*RcloneStorer, error) {
// Install the default config file handler
configfile.Install()
// Build the rclone path string (e.g., "myremote:path/to/backups")
rclonePath := remote + ":"
if path != "" {
rclonePath += path
}
// Create the rclone filesystem
fsys, err := fs.NewFs(ctx, rclonePath)
if err != nil {
// Check for remote not found error
if strings.Contains(err.Error(), "didn't find section in config file") ||
strings.Contains(err.Error(), "failed to find remote") {
return nil, fmt.Errorf("%w: %s", ErrRemoteNotFound, remote)
}
return nil, fmt.Errorf("creating rclone filesystem: %w", err)
}
return &RcloneStorer{
fsys: fsys,
remote: remote,
path: path,
}, nil
}
// Put stores data at the specified key.
func (r *RcloneStorer) Put(ctx context.Context, key string, data io.Reader) error {
// Read all data into memory to get size (required by rclone)
buf, err := io.ReadAll(data)
if err != nil {
return fmt.Errorf("reading data: %w", err)
}
// Upload the object
_, err = operations.Rcat(ctx, r.fsys, key, io.NopCloser(bytes.NewReader(buf)), time.Now(), nil)
if err != nil {
return fmt.Errorf("uploading object: %w", err)
}
return nil
}
// PutWithProgress stores data with progress reporting.
func (r *RcloneStorer) PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
// Wrap reader with progress tracking
pr := &progressReader{
reader: data,
callback: progress,
}
// Upload the object
_, err := operations.Rcat(ctx, r.fsys, key, io.NopCloser(pr), time.Now(), nil)
if err != nil {
return fmt.Errorf("uploading object: %w", err)
}
return nil
}
// Get retrieves data from the specified key.
func (r *RcloneStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
// Get the object
obj, err := r.fsys.NewObject(ctx, key)
if err != nil {
if errors.Is(err, fs.ErrorObjectNotFound) {
return nil, ErrNotFound
}
if errors.Is(err, fs.ErrorDirNotFound) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("getting object: %w", err)
}
// Open the object for reading
reader, err := obj.Open(ctx)
if err != nil {
return nil, fmt.Errorf("opening object: %w", err)
}
return reader, nil
}
// Stat returns metadata about an object without retrieving its contents.
func (r *RcloneStorer) Stat(ctx context.Context, key string) (*ObjectInfo, error) {
obj, err := r.fsys.NewObject(ctx, key)
if err != nil {
if errors.Is(err, fs.ErrorObjectNotFound) {
return nil, ErrNotFound
}
if errors.Is(err, fs.ErrorDirNotFound) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("getting object: %w", err)
}
return &ObjectInfo{
Key: key,
Size: obj.Size(),
}, nil
}
// Delete removes an object.
func (r *RcloneStorer) Delete(ctx context.Context, key string) error {
obj, err := r.fsys.NewObject(ctx, key)
if err != nil {
if errors.Is(err, fs.ErrorObjectNotFound) {
return nil // Match S3 behavior: no error if doesn't exist
}
if errors.Is(err, fs.ErrorDirNotFound) {
return nil
}
return fmt.Errorf("getting object: %w", err)
}
if err := obj.Remove(ctx); err != nil {
return fmt.Errorf("removing object: %w", err)
}
return nil
}
// List returns all keys with the given prefix.
func (r *RcloneStorer) List(ctx context.Context, prefix string) ([]string, error) {
var keys []string
err := operations.ListFn(ctx, r.fsys, func(obj fs.Object) {
key := obj.Remote()
if prefix == "" || strings.HasPrefix(key, prefix) {
keys = append(keys, key)
}
})
if err != nil {
return nil, fmt.Errorf("listing objects: %w", err)
}
return keys, nil
}
// ListStream returns a channel of ObjectInfo for large result sets.
func (r *RcloneStorer) ListStream(ctx context.Context, prefix string) <-chan ObjectInfo {
ch := make(chan ObjectInfo)
go func() {
defer close(ch)
err := operations.ListFn(ctx, r.fsys, func(obj fs.Object) {
// Stop doing work once the context is cancelled
if ctx.Err() != nil {
return
}
key := obj.Remote()
if prefix == "" || strings.HasPrefix(key, prefix) {
// Select on ctx so an abandoned consumer cannot strand this
// goroutine blocked on an unread channel
select {
case ch <- ObjectInfo{Key: key, Size: obj.Size()}:
case <-ctx.Done():
}
}
})
if err != nil {
select {
case ch <- ObjectInfo{Err: fmt.Errorf("listing objects: %w", err)}:
case <-ctx.Done():
}
}
}()
return ch
}
// Info returns human-readable storage location information.
func (r *RcloneStorer) Info() StorageInfo {
location := r.remote
if r.path != "" {
location += ":" + r.path
}
return StorageInfo{
Type: "rclone",
Location: location,
}
}
// progressReader wraps an io.Reader to track read progress.
type progressReader struct {
reader io.Reader
read int64
callback ProgressCallback
}
func (pr *progressReader) Read(p []byte) (int, error) {
n, err := pr.reader.Read(p)
if n > 0 {
pr.read += int64(n)
if pr.callback != nil {
if callbackErr := pr.callback(pr.read); callbackErr != nil {
return n, callbackErr
}
}
}
return n, err
}

internal/storage/s3.go Normal file

@@ -0,0 +1,85 @@
package storage
import (
"context"
"fmt"
"io"
"git.eeqj.de/sneak/vaultik/internal/s3"
)
// S3Storer wraps the existing s3.Client to implement Storer.
type S3Storer struct {
client *s3.Client
}
// NewS3Storer creates a new S3 storage backend.
func NewS3Storer(client *s3.Client) *S3Storer {
return &S3Storer{client: client}
}
// Put stores data at the specified key.
func (s *S3Storer) Put(ctx context.Context, key string, data io.Reader) error {
return s.client.PutObject(ctx, key, data)
}
// PutWithProgress stores data with progress reporting.
func (s *S3Storer) PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
// Convert storage.ProgressCallback to s3.ProgressCallback
var s3Progress s3.ProgressCallback
if progress != nil {
s3Progress = s3.ProgressCallback(progress)
}
return s.client.PutObjectWithProgress(ctx, key, data, size, s3Progress)
}
// Get retrieves data from the specified key.
func (s *S3Storer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
return s.client.GetObject(ctx, key)
}
// Stat returns metadata about an object without retrieving its contents.
func (s *S3Storer) Stat(ctx context.Context, key string) (*ObjectInfo, error) {
info, err := s.client.StatObject(ctx, key)
if err != nil {
return nil, err
}
return &ObjectInfo{
Key: info.Key,
Size: info.Size,
}, nil
}
// Delete removes an object.
func (s *S3Storer) Delete(ctx context.Context, key string) error {
return s.client.DeleteObject(ctx, key)
}
// List returns all keys with the given prefix.
func (s *S3Storer) List(ctx context.Context, prefix string) ([]string, error) {
return s.client.ListObjects(ctx, prefix)
}
// ListStream returns a channel of ObjectInfo for large result sets.
func (s *S3Storer) ListStream(ctx context.Context, prefix string) <-chan ObjectInfo {
ch := make(chan ObjectInfo)
go func() {
defer close(ch)
for info := range s.client.ListObjectsStream(ctx, prefix, false) {
ch <- ObjectInfo{
Key: info.Key,
Size: info.Size,
Err: info.Err,
}
}
}()
return ch
}
// Info returns human-readable storage location information.
func (s *S3Storer) Info() StorageInfo {
return StorageInfo{
Type: "s3",
Location: fmt.Sprintf("%s/%s", s.client.Endpoint(), s.client.BucketName()),
}
}


@@ -0,0 +1,74 @@
// Package storage provides a unified interface for storage backends.
// It supports S3-compatible object storage, local filesystem storage, and
// any rclone-configured remote, allowing Vaultik to store backups in any of
// these locations with the same API.
//
// Storage backends are selected via URL:
// - s3://bucket/prefix?endpoint=host&region=r - S3-compatible storage
// - file:///path/to/backup - Local filesystem storage
// - rclone://remote/path/to/backups - Any rclone remote
//
// All backends implement the Storer interface and support progress reporting
// during upload/write operations.
package storage
import (
"context"
"errors"
"io"
)
// ErrNotFound is returned when an object does not exist.
var ErrNotFound = errors.New("object not found")
// ProgressCallback is called during storage operations with bytes transferred so far.
// Return an error to cancel the operation.
type ProgressCallback func(bytesTransferred int64) error
// ObjectInfo contains metadata about a stored object.
type ObjectInfo struct {
Key string // Object key/path
Size int64 // Size in bytes
Err error // Error for streaming results (nil on success)
}
// StorageInfo provides human-readable storage configuration.
type StorageInfo struct {
Type string // "s3", "file", or "rclone"
Location string // endpoint/bucket for S3, base path for filesystem, remote:path for rclone
}
// Storer defines the interface for storage backends.
// All paths are relative to the storage root (bucket/prefix for S3, base directory for filesystem).
type Storer interface {
// Put stores data at the specified key.
// Parent directories are created automatically for filesystem backends.
Put(ctx context.Context, key string, data io.Reader) error
// PutWithProgress stores data with progress reporting.
// Size must be the exact size of the data to store.
// The progress callback is called periodically with bytes transferred.
PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error
// Get retrieves data from the specified key.
// The caller must close the returned ReadCloser.
// Returns ErrNotFound if the object does not exist.
Get(ctx context.Context, key string) (io.ReadCloser, error)
// Stat returns metadata about an object without retrieving its contents.
// Returns ErrNotFound if the object does not exist.
Stat(ctx context.Context, key string) (*ObjectInfo, error)
// Delete removes an object. No error is returned if the object doesn't exist.
Delete(ctx context.Context, key string) error
// List returns all keys with the given prefix.
// For large result sets, prefer ListStream.
List(ctx context.Context, prefix string) ([]string, error)
// ListStream returns a channel of ObjectInfo for large result sets.
// The channel is closed when listing completes.
// If an error occurs during listing, the final item will have Err set.
ListStream(ctx context.Context, prefix string) <-chan ObjectInfo
// Info returns human-readable storage location information.
Info() StorageInfo
}
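
Because ListStream reports a failure as a final item with Err set, rather than via a separate error return, every consumer must check Err on each received value. A small consumption sketch against the interface above (the helper name is illustrative):

// Sketch: drain a ListStream, separating keys from a terminal error.
func collectKeys(ctx context.Context, s Storer, prefix string) ([]string, error) {
	var keys []string
	for info := range s.ListStream(ctx, prefix) {
		if info.Err != nil {
			return keys, info.Err // delivered as the final channel item
		}
		keys = append(keys, info.Key)
	}
	return keys, nil
}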

internal/storage/url.go Normal file

@@ -0,0 +1,118 @@
package storage
import (
"fmt"
"net/url"
"strings"
)
// StorageURL represents a parsed storage URL.
type StorageURL struct {
Scheme string // "s3", "file", or "rclone"
Bucket string // S3 bucket name (empty for file/rclone)
Prefix string // Path within bucket or filesystem base path
Endpoint string // S3 endpoint (optional, default AWS)
Region string // S3 region (optional)
UseSSL bool // Use HTTPS for S3 (default true)
RcloneRemote string // rclone remote name (for rclone:// URLs)
}
// ParseStorageURL parses a storage URL string.
// Supported formats:
// - s3://bucket/prefix?endpoint=host&region=us-east-1&ssl=true
// - file:///absolute/path/to/backup
// - rclone://remote/path/to/backups
func ParseStorageURL(rawURL string) (*StorageURL, error) {
if rawURL == "" {
return nil, fmt.Errorf("storage URL is empty")
}
// Handle file:// URLs
if strings.HasPrefix(rawURL, "file://") {
path := strings.TrimPrefix(rawURL, "file://")
if path == "" {
return nil, fmt.Errorf("file URL path is empty")
}
return &StorageURL{
Scheme: "file",
Prefix: path,
}, nil
}
// Handle s3:// URLs
if strings.HasPrefix(rawURL, "s3://") {
u, err := url.Parse(rawURL)
if err != nil {
return nil, fmt.Errorf("invalid URL: %w", err)
}
bucket := u.Host
if bucket == "" {
return nil, fmt.Errorf("s3 URL missing bucket name")
}
prefix := strings.TrimPrefix(u.Path, "/")
query := u.Query()
useSSL := true
if query.Get("ssl") == "false" {
useSSL = false
}
return &StorageURL{
Scheme: "s3",
Bucket: bucket,
Prefix: prefix,
Endpoint: query.Get("endpoint"),
Region: query.Get("region"),
UseSSL: useSSL,
}, nil
}
// Handle rclone:// URLs
if strings.HasPrefix(rawURL, "rclone://") {
u, err := url.Parse(rawURL)
if err != nil {
return nil, fmt.Errorf("invalid URL: %w", err)
}
remote := u.Host
if remote == "" {
return nil, fmt.Errorf("rclone URL missing remote name")
}
path := strings.TrimPrefix(u.Path, "/")
return &StorageURL{
Scheme: "rclone",
Prefix: path,
RcloneRemote: remote,
}, nil
}
return nil, fmt.Errorf("unsupported URL scheme: must start with s3://, file://, or rclone://")
}
// String returns a human-readable representation of the storage URL.
func (u *StorageURL) String() string {
switch u.Scheme {
case "file":
return fmt.Sprintf("file://%s", u.Prefix)
case "s3":
endpoint := u.Endpoint
if endpoint == "" {
endpoint = "s3.amazonaws.com"
}
if u.Prefix != "" {
return fmt.Sprintf("s3://%s/%s (endpoint: %s)", u.Bucket, u.Prefix, endpoint)
}
return fmt.Sprintf("s3://%s (endpoint: %s)", u.Bucket, endpoint)
case "rclone":
if u.Prefix != "" {
return fmt.Sprintf("rclone://%s/%s", u.RcloneRemote, u.Prefix)
}
return fmt.Sprintf("rclone://%s", u.RcloneRemote)
default:
return fmt.Sprintf("%s://?", u.Scheme)
}
}
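
For reference, a few inputs and the fields ParseStorageURL populates for them; the bucket, host, and remote names are illustrative, but the field values follow directly from the code above. The first form is the URL equivalent of the legacy UseSSL=false S3 config handled by storerFromLegacyS3Config:

// ParseStorageURL("s3://backups/host1?endpoint=minio.local:9000&ssl=false")
//   => Scheme:"s3" Bucket:"backups" Prefix:"host1" Endpoint:"minio.local:9000" UseSSL:false
// ParseStorageURL("file:///var/backups/vaultik")
//   => Scheme:"file" Prefix:"/var/backups/vaultik"
// ParseStorageURL("rclone://b2/vaultik")
//   => Scheme:"rclone" RcloneRemote:"b2" Prefix:"vaultik"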

internal/types/types.go Normal file

@@ -0,0 +1,203 @@
// Package types provides custom types for better type safety across the vaultik codebase.
// Using distinct types for IDs, hashes, paths, and credentials prevents accidental
// mixing of semantically different values that happen to share the same underlying type.
package types
import (
"database/sql/driver"
"fmt"
"github.com/google/uuid"
)
// FileID is a UUID identifying a file record in the database.
type FileID uuid.UUID
// NewFileID generates a new random FileID.
func NewFileID() FileID {
return FileID(uuid.New())
}
// ParseFileID parses a string into a FileID.
func ParseFileID(s string) (FileID, error) {
id, err := uuid.Parse(s)
if err != nil {
return FileID{}, err
}
return FileID(id), nil
}
// IsZero returns true if the FileID is the zero value.
func (id FileID) IsZero() bool {
return uuid.UUID(id) == uuid.Nil
}
// Value implements driver.Valuer for database serialization.
func (id FileID) Value() (driver.Value, error) {
return uuid.UUID(id).String(), nil
}
// Scan implements sql.Scanner for database deserialization.
func (id *FileID) Scan(src interface{}) error {
if src == nil {
*id = FileID{}
return nil
}
var s string
switch v := src.(type) {
case string:
s = v
case []byte:
s = string(v)
default:
return fmt.Errorf("cannot scan %T into FileID", src)
}
parsed, err := uuid.Parse(s)
if err != nil {
return fmt.Errorf("invalid FileID: %w", err)
}
*id = FileID(parsed)
return nil
}
// BlobID is a UUID identifying a blob record in the database.
// This is distinct from BlobHash which is the content-addressed hash of the blob.
type BlobID uuid.UUID
// NewBlobID generates a new random BlobID.
func NewBlobID() BlobID {
return BlobID(uuid.New())
}
// ParseBlobID parses a string into a BlobID.
func ParseBlobID(s string) (BlobID, error) {
id, err := uuid.Parse(s)
if err != nil {
return BlobID{}, err
}
return BlobID(id), nil
}
// IsZero returns true if the BlobID is the zero value.
func (id BlobID) IsZero() bool {
return uuid.UUID(id) == uuid.Nil
}
// Value implements driver.Valuer for database serialization.
func (id BlobID) Value() (driver.Value, error) {
return uuid.UUID(id).String(), nil
}
// Scan implements sql.Scanner for database deserialization.
func (id *BlobID) Scan(src interface{}) error {
if src == nil {
*id = BlobID{}
return nil
}
var s string
switch v := src.(type) {
case string:
s = v
case []byte:
s = string(v)
default:
return fmt.Errorf("cannot scan %T into BlobID", src)
}
parsed, err := uuid.Parse(s)
if err != nil {
return fmt.Errorf("invalid BlobID: %w", err)
}
*id = BlobID(parsed)
return nil
}
// SnapshotID identifies a snapshot, typically in format "hostname_name_timestamp".
type SnapshotID string
// ChunkHash is the SHA256 hash of a chunk's content.
// Used for content-addressing and deduplication of file chunks.
type ChunkHash string
// BlobHash is the SHA256 hash of a blob's compressed and encrypted content.
// This is used as the filename in S3 storage for content-addressed retrieval.
type BlobHash string
// FilePath represents an absolute path to a file or directory.
type FilePath string
// SourcePath represents the root directory from which files are backed up.
// Used during restore to strip the source prefix from paths.
type SourcePath string
// AgeRecipient is an age public key used for encryption.
// Format: age1... (Bech32-encoded X25519 public key)
type AgeRecipient string
// AgeSecretKey is an age private key used for decryption.
// Format: AGE-SECRET-KEY-... (Bech32-encoded X25519 private key)
// This type should never be logged or serialized in plaintext.
type AgeSecretKey string
// S3Endpoint is the URL of an S3-compatible storage endpoint.
type S3Endpoint string
// BucketName is the name of an S3 bucket.
type BucketName string
// S3Prefix is the path prefix within an S3 bucket.
type S3Prefix string
// AWSRegion is an AWS region identifier (e.g., "us-east-1").
type AWSRegion string
// AWSAccessKeyID is an AWS access key ID for authentication.
type AWSAccessKeyID string
// AWSSecretAccessKey is an AWS secret access key for authentication.
// This type should never be logged or serialized in plaintext.
type AWSSecretAccessKey string
// Hostname identifies a host machine.
type Hostname string
// Version is a semantic version string.
type Version string
// GitRevision is a git commit SHA.
type GitRevision string
// GlobPattern is a glob pattern for file matching (e.g., "*.log", "node_modules").
type GlobPattern string
// String methods for Stringer interface
func (id FileID) String() string { return uuid.UUID(id).String() }
func (id BlobID) String() string { return uuid.UUID(id).String() }
func (id SnapshotID) String() string { return string(id) }
func (h ChunkHash) String() string { return string(h) }
func (h BlobHash) String() string { return string(h) }
func (p FilePath) String() string { return string(p) }
func (p SourcePath) String() string { return string(p) }
func (r AgeRecipient) String() string { return string(r) }
func (e S3Endpoint) String() string { return string(e) }
func (b BucketName) String() string { return string(b) }
func (p S3Prefix) String() string { return string(p) }
func (r AWSRegion) String() string { return string(r) }
func (k AWSAccessKeyID) String() string { return string(k) }
func (h Hostname) String() string { return string(h) }
func (v Version) String() string { return string(v) }
func (r GitRevision) String() string { return string(r) }
func (p GlobPattern) String() string { return string(p) }
// Redacted String methods for sensitive types - prevents accidental logging
func (k AgeSecretKey) String() string { return "[REDACTED]" }
func (k AWSSecretAccessKey) String() string { return "[REDACTED]" }
// Raw returns the actual value for sensitive types when explicitly needed
func (k AgeSecretKey) Raw() string { return string(k) }
func (k AWSSecretAccessKey) Raw() string { return string(k) }
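
The redacted Stringer on the secret types means they are safe to pass through fmt and structured logging by default, while Raw() is the single explicit escape hatch. A short sketch (the key below is a placeholder, not a valid age key):

key := types.AgeSecretKey("AGE-SECRET-KEY-PLACEHOLDER")
fmt.Println(key)        // prints: [REDACTED]
fmt.Printf("%s\n", key) // prints: [REDACTED] (Stringer is used for %s and %v)
secret := key.Raw()     // explicit opt-in to the real value
_ = secret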


@@ -0,0 +1,96 @@
package vaultik
import (
"fmt"
"strconv"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// SnapshotInfo contains information about a snapshot
type SnapshotInfo struct {
ID types.SnapshotID `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}
// formatNumber formats a number with commas
func formatNumber(n int) string {
str := fmt.Sprintf("%d", n)
var result []string
for i, digit := range str {
if i > 0 && (len(str)-i)%3 == 0 {
result = append(result, ",")
}
result = append(result, string(digit))
}
return strings.Join(result, "")
}
// formatDuration formats a duration in a human-readable way
func formatDuration(d time.Duration) string {
if d < time.Second {
return fmt.Sprintf("%dms", d.Milliseconds())
}
if d < time.Minute {
return fmt.Sprintf("%.1fs", d.Seconds())
}
if d < time.Hour {
mins := int(d.Minutes())
secs := int(d.Seconds()) % 60
return fmt.Sprintf("%dm %ds", mins, secs)
}
hours := int(d.Hours())
mins := int(d.Minutes()) % 60
return fmt.Sprintf("%dh %dm", hours, mins)
}
// formatBytes formats bytes in a human-readable format
func formatBytes(bytes int64) string {
const unit = 1024
if bytes < unit {
return fmt.Sprintf("%d B", bytes)
}
div, exp := int64(unit), 0
for n := bytes / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}
// parseSnapshotTimestamp extracts the timestamp from a snapshot ID
// Format: hostname_snapshotname_2026-01-12T14:41:15Z
func parseSnapshotTimestamp(snapshotID string) (time.Time, error) {
parts := strings.Split(snapshotID, "_")
if len(parts) < 2 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format: expected hostname_snapshotname_timestamp")
}
// Last part is the RFC3339 timestamp
timestampStr := parts[len(parts)-1]
timestamp, err := time.Parse(time.RFC3339, timestampStr)
if err != nil {
return time.Time{}, fmt.Errorf("invalid timestamp: %w", err)
}
return timestamp.UTC(), nil
}
// parseDuration parses a duration string with support for days
func parseDuration(s string) (time.Duration, error) {
// Check for days suffix
if strings.HasSuffix(s, "d") {
daysStr := strings.TrimSuffix(s, "d")
days, err := strconv.Atoi(daysStr)
if err != nil {
return 0, fmt.Errorf("invalid days value: %w", err)
}
return time.Duration(days) * 24 * time.Hour, nil
}
// Otherwise use standard Go duration parsing
return time.ParseDuration(s)
}
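
Worked examples for the two parsers above (the snapshot ID is illustrative):

// parseSnapshotTimestamp splits on "_" and parses the final field as RFC3339:
//   parseSnapshotTimestamp("myhost_docs_2026-01-12T14:41:15Z") => 2026-01-12 14:41:15 UTC
// parseDuration accepts standard Go durations plus a "d" (days) suffix:
//   parseDuration("90m") => 1h30m0s
//   parseDuration("30d") => 720h0m0s (30 * 24h)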

internal/vaultik/info.go Normal file

@@ -0,0 +1,348 @@
package vaultik
import (
"encoding/json"
"fmt"
"runtime"
"sort"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"github.com/dustin/go-humanize"
)
// ShowInfo displays system and configuration information
func (v *Vaultik) ShowInfo() error {
// System Information
fmt.Printf("=== System Information ===\n")
fmt.Printf("OS/Architecture: %s/%s\n", runtime.GOOS, runtime.GOARCH)
fmt.Printf("Version: %s\n", v.Globals.Version)
fmt.Printf("Commit: %s\n", v.Globals.Commit)
fmt.Printf("Go Version: %s\n", runtime.Version())
fmt.Println()
// Storage Configuration
fmt.Printf("=== Storage Configuration ===\n")
fmt.Printf("S3 Bucket: %s\n", v.Config.S3.Bucket)
if v.Config.S3.Prefix != "" {
fmt.Printf("S3 Prefix: %s\n", v.Config.S3.Prefix)
}
fmt.Printf("S3 Endpoint: %s\n", v.Config.S3.Endpoint)
fmt.Printf("S3 Region: %s\n", v.Config.S3.Region)
fmt.Println()
// Backup Settings
fmt.Printf("=== Backup Settings ===\n")
// Show configured snapshots
fmt.Printf("Snapshots:\n")
for _, name := range v.Config.SnapshotNames() {
snap := v.Config.Snapshots[name]
fmt.Printf(" %s:\n", name)
for _, path := range snap.Paths {
fmt.Printf(" - %s\n", path)
}
if len(snap.Exclude) > 0 {
fmt.Printf(" exclude: %s\n", strings.Join(snap.Exclude, ", "))
}
}
// Global exclude patterns
if len(v.Config.Exclude) > 0 {
fmt.Printf("Global Exclude: %s\n", strings.Join(v.Config.Exclude, ", "))
}
fmt.Printf("Compression: zstd level %d\n", v.Config.CompressionLevel)
fmt.Printf("Chunk Size: %s\n", humanize.Bytes(uint64(v.Config.ChunkSize)))
fmt.Printf("Blob Size Limit: %s\n", humanize.Bytes(uint64(v.Config.BlobSizeLimit)))
fmt.Println()
// Encryption Configuration
fmt.Printf("=== Encryption Configuration ===\n")
fmt.Printf("Recipients:\n")
for _, recipient := range v.Config.AgeRecipients {
fmt.Printf(" - %s\n", recipient)
}
fmt.Println()
// Daemon Settings (if applicable)
if v.Config.BackupInterval > 0 || v.Config.MinTimeBetweenRun > 0 {
fmt.Printf("=== Daemon Settings ===\n")
if v.Config.BackupInterval > 0 {
fmt.Printf("Backup Interval: %s\n", v.Config.BackupInterval)
}
if v.Config.MinTimeBetweenRun > 0 {
fmt.Printf("Minimum Time: %s\n", v.Config.MinTimeBetweenRun)
}
fmt.Println()
}
// Local Database
fmt.Printf("=== Local Database ===\n")
fmt.Printf("Index Path: %s\n", v.Config.IndexPath)
// Check if index file exists and get its size
if info, err := v.Fs.Stat(v.Config.IndexPath); err == nil {
fmt.Printf("Index Size: %s\n", humanize.Bytes(uint64(info.Size())))
// Get snapshot count from database
query := `SELECT COUNT(*) FROM snapshots WHERE completed_at IS NOT NULL`
var snapshotCount int
if err := v.DB.Conn().QueryRowContext(v.ctx, query).Scan(&snapshotCount); err == nil {
fmt.Printf("Snapshots: %d\n", snapshotCount)
}
// Get blob count from database
query = `SELECT COUNT(*) FROM blobs`
var blobCount int
if err := v.DB.Conn().QueryRowContext(v.ctx, query).Scan(&blobCount); err == nil {
fmt.Printf("Blobs: %d\n", blobCount)
}
// Get file count from database
query = `SELECT COUNT(*) FROM files`
var fileCount int
if err := v.DB.Conn().QueryRowContext(v.ctx, query).Scan(&fileCount); err == nil {
fmt.Printf("Files: %d\n", fileCount)
}
} else {
fmt.Printf("Index Size: (not created)\n")
}
return nil
}
// SnapshotMetadataInfo contains information about a single snapshot's metadata
type SnapshotMetadataInfo struct {
SnapshotID string `json:"snapshot_id"`
ManifestSize int64 `json:"manifest_size"`
DatabaseSize int64 `json:"database_size"`
TotalSize int64 `json:"total_size"`
BlobCount int `json:"blob_count"`
BlobsSize int64 `json:"blobs_size"`
}
// RemoteInfoResult contains all remote storage information
type RemoteInfoResult struct {
// Storage info
StorageType string `json:"storage_type"`
StorageLocation string `json:"storage_location"`
// Snapshot metadata
Snapshots []SnapshotMetadataInfo `json:"snapshots"`
TotalMetadataSize int64 `json:"total_metadata_size"`
TotalMetadataCount int `json:"total_metadata_count"`
// All blobs on remote
TotalBlobCount int `json:"total_blob_count"`
TotalBlobSize int64 `json:"total_blob_size"`
// Referenced blobs (from manifests)
ReferencedBlobCount int `json:"referenced_blob_count"`
ReferencedBlobSize int64 `json:"referenced_blob_size"`
// Orphaned blobs
OrphanedBlobCount int `json:"orphaned_blob_count"`
OrphanedBlobSize int64 `json:"orphaned_blob_size"`
}
// RemoteInfo displays information about remote storage
func (v *Vaultik) RemoteInfo(jsonOutput bool) error {
result := &RemoteInfoResult{}
// Get storage info
storageInfo := v.Storage.Info()
result.StorageType = storageInfo.Type
result.StorageLocation = storageInfo.Location
if !jsonOutput {
fmt.Printf("=== Remote Storage ===\n")
fmt.Printf("Type: %s\n", storageInfo.Type)
fmt.Printf("Location: %s\n", storageInfo.Location)
fmt.Println()
}
// List all snapshot metadata
if !jsonOutput {
fmt.Printf("Scanning snapshot metadata...\n")
}
snapshotMetadata := make(map[string]*SnapshotMetadataInfo)
// Collect metadata files
metadataCh := v.Storage.ListStream(v.ctx, "metadata/")
for obj := range metadataCh {
if obj.Err != nil {
return fmt.Errorf("listing metadata: %w", obj.Err)
}
// Parse key: metadata/<snapshot-id>/<filename>
parts := strings.Split(obj.Key, "/")
if len(parts) < 3 {
continue
}
snapshotID := parts[1]
if _, exists := snapshotMetadata[snapshotID]; !exists {
snapshotMetadata[snapshotID] = &SnapshotMetadataInfo{
SnapshotID: snapshotID,
}
}
info := snapshotMetadata[snapshotID]
filename := parts[2]
if strings.HasPrefix(filename, "manifest") {
info.ManifestSize = obj.Size
} else if strings.HasPrefix(filename, "db") {
info.DatabaseSize = obj.Size
}
info.TotalSize = info.ManifestSize + info.DatabaseSize
}
// Sort snapshots by ID for consistent output
var snapshotIDs []string
for id := range snapshotMetadata {
snapshotIDs = append(snapshotIDs, id)
}
sort.Strings(snapshotIDs)
// Download and parse all manifests to get referenced blobs
if !jsonOutput {
fmt.Printf("Downloading %d manifest(s)...\n", len(snapshotIDs))
}
referencedBlobs := make(map[string]int64) // hash -> compressed size
for _, snapshotID := range snapshotIDs {
manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
reader, err := v.Storage.Get(v.ctx, manifestKey)
if err != nil {
log.Warn("Failed to get manifest", "snapshot", snapshotID, "error", err)
continue
}
manifest, err := snapshot.DecodeManifest(reader)
_ = reader.Close()
if err != nil {
log.Warn("Failed to decode manifest", "snapshot", snapshotID, "error", err)
continue
}
// Record blob info from manifest
info := snapshotMetadata[snapshotID]
info.BlobCount = manifest.BlobCount
var blobsSize int64
for _, blob := range manifest.Blobs {
referencedBlobs[blob.Hash] = blob.CompressedSize
blobsSize += blob.CompressedSize
}
info.BlobsSize = blobsSize
}
// Build result snapshots
var totalMetadataSize int64
for _, id := range snapshotIDs {
info := snapshotMetadata[id]
result.Snapshots = append(result.Snapshots, *info)
totalMetadataSize += info.TotalSize
}
result.TotalMetadataSize = totalMetadataSize
result.TotalMetadataCount = len(snapshotIDs)
// Calculate referenced blob stats
for _, size := range referencedBlobs {
result.ReferencedBlobCount++
result.ReferencedBlobSize += size
}
// List all blobs on remote
if !jsonOutput {
fmt.Printf("Scanning blobs...\n")
}
allBlobs := make(map[string]int64) // hash -> size from storage
blobCh := v.Storage.ListStream(v.ctx, "blobs/")
for obj := range blobCh {
if obj.Err != nil {
return fmt.Errorf("listing blobs: %w", obj.Err)
}
// Extract hash from key: blobs/xx/yy/hash
parts := strings.Split(obj.Key, "/")
if len(parts) < 4 {
continue
}
hash := parts[3]
allBlobs[hash] = obj.Size
result.TotalBlobCount++
result.TotalBlobSize += obj.Size
}
// Calculate orphaned blobs
for hash, size := range allBlobs {
if _, referenced := referencedBlobs[hash]; !referenced {
result.OrphanedBlobCount++
result.OrphanedBlobSize += size
}
}
// Output results
if jsonOutput {
enc := json.NewEncoder(v.Stdout)
enc.SetIndent("", " ")
return enc.Encode(result)
}
// Human-readable output
fmt.Printf("\n=== Snapshot Metadata ===\n")
if len(result.Snapshots) == 0 {
fmt.Printf("No snapshots found\n")
} else {
fmt.Printf("%-45s %12s %12s %12s %10s %12s\n", "SNAPSHOT", "MANIFEST", "DATABASE", "TOTAL", "BLOBS", "BLOB SIZE")
fmt.Printf("%-45s %12s %12s %12s %10s %12s\n", strings.Repeat("-", 45), strings.Repeat("-", 12), strings.Repeat("-", 12), strings.Repeat("-", 12), strings.Repeat("-", 10), strings.Repeat("-", 12))
for _, info := range result.Snapshots {
fmt.Printf("%-45s %12s %12s %12s %10s %12s\n",
truncateString(info.SnapshotID, 45),
humanize.Bytes(uint64(info.ManifestSize)),
humanize.Bytes(uint64(info.DatabaseSize)),
humanize.Bytes(uint64(info.TotalSize)),
humanize.Comma(int64(info.BlobCount)),
humanize.Bytes(uint64(info.BlobsSize)),
)
}
fmt.Printf("%-45s %12s %12s %12s %10s %12s\n", strings.Repeat("-", 45), strings.Repeat("-", 12), strings.Repeat("-", 12), strings.Repeat("-", 12), strings.Repeat("-", 10), strings.Repeat("-", 12))
fmt.Printf("%-45s %12s %12s %12s\n", fmt.Sprintf("Total (%d snapshots)", result.TotalMetadataCount), "", "", humanize.Bytes(uint64(result.TotalMetadataSize)))
}
fmt.Printf("\n=== Blob Storage ===\n")
fmt.Printf("Total blobs on remote: %s (%s)\n",
humanize.Comma(int64(result.TotalBlobCount)),
humanize.Bytes(uint64(result.TotalBlobSize)))
fmt.Printf("Referenced by snapshots: %s (%s)\n",
humanize.Comma(int64(result.ReferencedBlobCount)),
humanize.Bytes(uint64(result.ReferencedBlobSize)))
fmt.Printf("Orphaned (unreferenced): %s (%s)\n",
humanize.Comma(int64(result.OrphanedBlobCount)),
humanize.Bytes(uint64(result.OrphanedBlobSize)))
if result.OrphanedBlobCount > 0 {
fmt.Printf("\nRun 'vaultik prune --remote' to remove orphaned blobs.\n")
}
return nil
}
// truncateString truncates a string to maxLen, adding "..." if truncated
func truncateString(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
if maxLen <= 3 {
return s[:maxLen]
}
return s[:maxLen-3] + "..."
}
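
RemoteInfo above and the prune path later in this changeset both parse the same remote key layout; summarized here as inferred from the key-splitting code:

// Remote key layout assumed by RemoteInfo and PruneBlobs:
//   blobs/<h[0:2]>/<h[2:4]>/<h>               content-addressed blob, h = blob hash
//   metadata/<snapshot-id>/manifest.json.zst  blob manifest for one snapshot
//   metadata/<snapshot-id>/db.zst.age         encrypted, compressed SQLite state DB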


@@ -0,0 +1,543 @@
package vaultik_test
import (
"bytes"
"context"
"database/sql"
"io"
"os"
"path/filepath"
"sync"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// MockStorer implements storage.Storer for testing
type MockStorer struct {
mu sync.Mutex
data map[string][]byte
calls []string
}
func NewMockStorer() *MockStorer {
return &MockStorer{
data: make(map[string][]byte),
calls: make([]string, 0),
}
}
func (m *MockStorer) Put(ctx context.Context, key string, reader io.Reader) error {
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, "Put:"+key)
data, err := io.ReadAll(reader)
if err != nil {
return err
}
m.data[key] = data
return nil
}
func (m *MockStorer) PutWithProgress(ctx context.Context, key string, reader io.Reader, size int64, progress storage.ProgressCallback) error {
return m.Put(ctx, key, reader)
}
func (m *MockStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, "Get:"+key)
data, exists := m.data[key]
if !exists {
return nil, storage.ErrNotFound
}
return io.NopCloser(bytes.NewReader(data)), nil
}
func (m *MockStorer) Stat(ctx context.Context, key string) (*storage.ObjectInfo, error) {
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, "Stat:"+key)
data, exists := m.data[key]
if !exists {
return nil, storage.ErrNotFound
}
return &storage.ObjectInfo{
Key: key,
Size: int64(len(data)),
}, nil
}
func (m *MockStorer) Delete(ctx context.Context, key string) error {
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, "Delete:"+key)
delete(m.data, key)
return nil
}
func (m *MockStorer) List(ctx context.Context, prefix string) ([]string, error) {
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, "List:"+prefix)
var keys []string
for key := range m.data {
if len(prefix) == 0 || (len(key) >= len(prefix) && key[:len(prefix)] == prefix) {
keys = append(keys, key)
}
}
return keys, nil
}
func (m *MockStorer) ListStream(ctx context.Context, prefix string) <-chan storage.ObjectInfo {
ch := make(chan storage.ObjectInfo)
go func() {
defer close(ch)
m.mu.Lock()
defer m.mu.Unlock()
for key, data := range m.data {
if len(prefix) == 0 || (len(key) >= len(prefix) && key[:len(prefix)] == prefix) {
ch <- storage.ObjectInfo{
Key: key,
Size: int64(len(data)),
}
}
}
}()
return ch
}
func (m *MockStorer) Info() storage.StorageInfo {
return storage.StorageInfo{
Type: "mock",
Location: "memory",
}
}
// GetCalls returns the list of operations that were called
func (m *MockStorer) GetCalls() []string {
m.mu.Lock()
defer m.mu.Unlock()
calls := make([]string, len(m.calls))
copy(calls, m.calls)
return calls
}
// GetStorageSize returns the number of objects in storage
func (m *MockStorer) GetStorageSize() int {
m.mu.Lock()
defer m.mu.Unlock()
return len(m.data)
}
// TestEndToEndBackup tests the full backup workflow with mocked dependencies
func TestEndToEndBackup(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test directory structure and files
testFiles := map[string]string{
"/home/user/documents/file1.txt": "This is file 1 content",
"/home/user/documents/file2.txt": "This is file 2 content with more data",
"/home/user/pictures/photo1.jpg": "Binary photo data here...",
"/home/user/code/main.go": "package main\n\nfunc main() {\n\tprintln(\"Hello, World!\")\n}",
}
// Create all directories first
dirs := []string{
"/home/user/documents",
"/home/user/pictures",
"/home/user/code",
}
for _, dir := range dirs {
if err := fs.MkdirAll(dir, 0755); err != nil {
t.Fatalf("failed to create directory %s: %v", dir, err)
}
}
// Create test files
for path, content := range testFiles {
if err := afero.WriteFile(fs, path, []byte(content), 0644); err != nil {
t.Fatalf("failed to create test file %s: %v", path, err)
}
}
// Create mock storage
mockStorage := NewMockStorer()
// Create test configuration
cfg := &config.Config{
Snapshots: map[string]config.SnapshotConfig{
"test": {
Paths: []string{"/home/user"},
},
},
Exclude: []string{"*.tmp", "*.log"},
ChunkSize: config.Size(16 * 1024), // 16KB chunks
BlobSizeLimit: config.Size(100 * 1024), // 100KB blobs
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
AgeSecretKey: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5", // Test private key
S3: config.S3Config{
Endpoint: "http://localhost:9000", // MinIO endpoint for testing
Region: "us-east-1",
Bucket: "test-bucket",
AccessKeyID: "test-access",
SecretAccessKey: "test-secret",
},
IndexPath: ":memory:", // In-memory SQLite database
}
// For a true end-to-end test, we'll create a simpler test that focuses on
// the core backup logic using the scanner directly with our mock storage
ctx := context.Background()
// Create in-memory database
db, err := database.New(ctx, ":memory:")
require.NoError(t, err)
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner with mock storage
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: cfg.ChunkSize.Int64(),
Repositories: repos,
Storage: mockStorage,
MaxBlobSize: cfg.BlobSizeLimit.Int64(),
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
EnableProgress: false,
})
// Create a snapshot record
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test-version",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// Run the backup scan
result, err := scanner.Scan(ctx, "/home/user", snapshotID)
require.NoError(t, err)
// Verify scan results
// The scanner counts both files and directories, so we have:
// 4 files + 5 directories (/home, /home/user, /home/user/documents, /home/user/pictures, /home/user/code)
assert.GreaterOrEqual(t, result.FilesScanned, 4, "Should scan at least 4 files")
assert.Greater(t, result.BytesScanned, int64(0), "Should scan some bytes")
assert.Greater(t, result.ChunksCreated, 0, "Should create chunks")
assert.Greater(t, result.BlobsCreated, 0, "Should create blobs")
// Verify storage operations
calls := mockStorage.GetCalls()
t.Logf("Storage operations performed: %v", calls)
// Should have uploaded at least one blob
blobUploads := 0
for _, call := range calls {
if len(call) > 4 && call[:4] == "Put:" {
if len(call) > 10 && call[4:10] == "blobs/" {
blobUploads++
}
}
}
assert.Greater(t, blobUploads, 0, "Should upload at least one blob")
// Verify files in database
files, err := repos.Files.ListByPrefix(ctx, "/home/user")
require.NoError(t, err)
// Count only regular files (not directories)
regularFiles := 0
for _, f := range files {
if f.Mode&uint32(os.ModeDir) == 0 { // not a directory (0x80000000 is the os.ModeDir bit)
regularFiles++
}
}
assert.Equal(t, 4, regularFiles, "Should have 4 regular files in database")
// Verify chunks were created by checking a specific file
fileChunks, err := repos.FileChunks.GetByPath(ctx, "/home/user/documents/file1.txt")
require.NoError(t, err)
assert.Greater(t, len(fileChunks), 0, "Should have chunks for file1.txt")
// Verify blobs were uploaded to storage
assert.Greater(t, mockStorage.GetStorageSize(), 0, "Should have blobs in storage")
// Complete the snapshot - just verify we got results
// In a real integration test, we'd update the snapshot record
// Create snapshot manager to test metadata export
snapshotManager := &snapshot.SnapshotManager{}
snapshotManager.SetFilesystem(fs)
// Note: We can't fully test snapshot metadata export without a proper S3 client mock
// that implements all required methods. This would require refactoring the S3 client
// interface to be more testable.
t.Logf("Backup completed successfully:")
t.Logf(" Files scanned: %d", result.FilesScanned)
t.Logf(" Bytes scanned: %d", result.BytesScanned)
t.Logf(" Chunks created: %d", result.ChunksCreated)
t.Logf(" Blobs created: %d", result.BlobsCreated)
t.Logf(" Storage size: %d objects", mockStorage.GetStorageSize())
}
// TestBackupAndVerify tests backing up files and verifying the blobs
func TestBackupAndVerify(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test files
testContent := "This is a test file with some content that should be backed up"
err := fs.MkdirAll("/data", 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, "/data/test.txt", []byte(testContent), 0644)
require.NoError(t, err)
// Create mock storage
mockStorage := NewMockStorer()
// Create test database
ctx := context.Background()
db, err := database.New(ctx, ":memory:")
require.NoError(t, err)
defer func() {
if err := db.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
repos := database.NewRepositories(db)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: int64(1024 * 16), // 16KB chunks
Repositories: repos,
Storage: mockStorage,
MaxBlobSize: int64(1024 * 1024), // 1MB blobs
CompressionLevel: 3,
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test-version",
StartedAt: time.Now(),
}
return repos.Snapshots.Create(ctx, tx, snapshot)
})
require.NoError(t, err)
// Run the backup
result, err := scanner.Scan(ctx, "/data", snapshotID)
require.NoError(t, err)
// Verify backup created blobs
assert.Greater(t, result.BlobsCreated, 0, "Should create at least one blob")
assert.Equal(t, mockStorage.GetStorageSize(), result.BlobsCreated, "Storage should have the blobs")
// Verify we can retrieve the blob from storage
objects, err := mockStorage.List(ctx, "blobs/")
require.NoError(t, err)
assert.Len(t, objects, result.BlobsCreated, "Should have correct number of blobs in storage")
// Get the first blob and verify it exists
if len(objects) > 0 {
blobKey := objects[0]
t.Logf("Verifying blob: %s", blobKey)
// Get blob info
blobInfo, err := mockStorage.Stat(ctx, blobKey)
require.NoError(t, err)
assert.Greater(t, blobInfo.Size, int64(0), "Blob should have content")
// Get blob content
reader, err := mockStorage.Get(ctx, blobKey)
require.NoError(t, err)
defer func() { _ = reader.Close() }()
// Verify blob data is encrypted (should not contain plaintext)
blobData, err := io.ReadAll(reader)
require.NoError(t, err)
assert.NotContains(t, string(blobData), testContent, "Blob should be encrypted")
assert.Greater(t, len(blobData), 0, "Blob should have data")
}
t.Logf("Backup and verify test completed successfully")
}
// TestBackupAndRestore tests the full backup and restore workflow
// This test verifies that the restore code correctly handles the binary SQLite
// database format that is exported by the snapshot manager.
func TestBackupAndRestore(t *testing.T) {
// Initialize logger
log.Initialize(log.Config{})
// Create real temp directory for the database (SQLite needs real filesystem)
realTempDir, err := os.MkdirTemp("", "vaultik-test-")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(realTempDir) }()
// Use real OS filesystem for this test
fs := afero.NewOsFs()
// Create test directory structure and files
dataDir := filepath.Join(realTempDir, "data")
testFiles := map[string]string{
filepath.Join(dataDir, "file1.txt"): "This is file 1 content",
filepath.Join(dataDir, "file2.txt"): "This is file 2 content with more data",
filepath.Join(dataDir, "subdir", "file3.txt"): "This is file 3 in a subdirectory",
}
// Create directories and files
for path, content := range testFiles {
dir := filepath.Dir(path)
if err := fs.MkdirAll(dir, 0755); err != nil {
t.Fatalf("failed to create directory %s: %v", dir, err)
}
if err := afero.WriteFile(fs, path, []byte(content), 0644); err != nil {
t.Fatalf("failed to create test file %s: %v", path, err)
}
}
ctx := context.Background()
// Create mock storage
mockStorage := NewMockStorer()
// Test keypair
agePublicKey := "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
ageSecretKey := "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
// Create database file
dbPath := filepath.Join(realTempDir, "test.db")
db, err := database.New(ctx, dbPath)
require.NoError(t, err)
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
// Create config for snapshot manager
cfg := &config.Config{
AgeSecretKey: ageSecretKey,
AgeRecipients: []string{agePublicKey},
CompressionLevel: 3,
}
// Create snapshot manager
sm := snapshot.NewSnapshotManager(snapshot.SnapshotManagerParams{
Repos: repos,
Storage: mockStorage,
Config: cfg,
})
sm.SetFilesystem(fs)
// Create scanner
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
Storage: mockStorage,
ChunkSize: int64(16 * 1024),
MaxBlobSize: int64(100 * 1024),
CompressionLevel: 3,
AgeRecipients: []string{agePublicKey},
Repositories: repos,
})
// Create a snapshot
snapshotID, err := sm.CreateSnapshot(ctx, "test-host", "test-version", "test-git")
require.NoError(t, err)
t.Logf("Created snapshot: %s", snapshotID)
// Run the backup (scan)
result, err := scanner.Scan(ctx, dataDir, snapshotID)
require.NoError(t, err)
t.Logf("Scan complete: %d files, %d blobs", result.FilesScanned, result.BlobsCreated)
// Complete the snapshot
err = sm.CompleteSnapshot(ctx, snapshotID)
require.NoError(t, err)
// Export snapshot metadata (this uploads db.zst.age and manifest.json.zst)
err = sm.ExportSnapshotMetadata(ctx, dbPath, snapshotID)
require.NoError(t, err)
t.Logf("Exported snapshot metadata")
// Verify metadata was uploaded
keys, err := mockStorage.List(ctx, "metadata/")
require.NoError(t, err)
t.Logf("Metadata keys: %v", keys)
assert.GreaterOrEqual(t, len(keys), 2, "Should have at least db.zst.age and manifest.json.zst")
// Close the source database
err = db.Close()
require.NoError(t, err)
// Create Vaultik instance for restore
vaultikApp := &vaultik.Vaultik{
Config: cfg,
Storage: mockStorage,
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
}
vaultikApp.SetContext(ctx)
// Try to restore - this should work with binary SQLite format
restoreDir := filepath.Join(realTempDir, "restored")
err = vaultikApp.Restore(&vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: restoreDir,
})
require.NoError(t, err, "Restore should succeed with binary SQLite database format")
// Verify restored files match originals
for origPath, expectedContent := range testFiles {
restoredPath := filepath.Join(restoreDir, origPath)
restoredContent, err := afero.ReadFile(fs, restoredPath)
require.NoError(t, err, "Should be able to read restored file: %s", restoredPath)
assert.Equal(t, expectedContent, string(restoredContent), "Restored content should match original for: %s", origPath)
}
t.Log("Backup and restore test completed successfully")
}
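
These tests rely only on the in-process mocks above, so they should run without MinIO or network access; assuming the file lives under internal/vaultik, something like:

go test ./internal/vaultik/ -run 'TestEndToEndBackup|TestBackupAndVerify|TestBackupAndRestore' -v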

internal/vaultik/prune.go Normal file

@@ -0,0 +1,204 @@
package vaultik
import (
"encoding/json"
"fmt"
"os"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/dustin/go-humanize"
)
// PruneOptions contains options for the prune command
type PruneOptions struct {
Force bool
JSON bool
}
// PruneBlobsResult contains the result of a blob prune operation
type PruneBlobsResult struct {
BlobsFound int `json:"blobs_found"`
BlobsDeleted int `json:"blobs_deleted"`
BlobsFailed int `json:"blobs_failed,omitempty"`
BytesFreed int64 `json:"bytes_freed"`
}
// PruneBlobs removes unreferenced blobs from storage
func (v *Vaultik) PruneBlobs(opts *PruneOptions) error {
log.Info("Starting prune operation")
// Get all remote snapshots and their manifests
allBlobsReferenced := make(map[string]bool)
manifestCount := 0
// List all snapshots in storage
log.Info("Listing remote snapshots")
objectCh := v.Storage.ListStream(v.ctx, "metadata/")
var snapshotIDs []string
seenSnapshots := make(map[string]bool)
for object := range objectCh {
if object.Err != nil {
return fmt.Errorf("listing remote snapshots: %w", object.Err)
}
// Extract snapshot ID from keys like metadata/<snapshot-id>/manifest.json.zst
parts := strings.Split(object.Key, "/")
if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
// Check if this is a directory by looking for trailing slash
if strings.HasSuffix(object.Key, "/") || strings.Contains(object.Key, "/manifest.json.zst") {
snapshotID := parts[1]
// Only add unique snapshot IDs
if !seenSnapshots[snapshotID] {
seenSnapshots[snapshotID] = true
snapshotIDs = append(snapshotIDs, snapshotID)
}
}
}
}
log.Info("Found manifests in remote storage", "count", len(snapshotIDs))
// Download and parse each manifest to get referenced blobs
for _, snapshotID := range snapshotIDs {
log.Debug("Processing manifest", "snapshot_id", snapshotID)
manifest, err := v.downloadManifest(snapshotID)
if err != nil {
log.Error("Failed to download manifest", "snapshot_id", snapshotID, "error", err)
continue
}
// Add all blobs from this manifest to our referenced set
for _, blob := range manifest.Blobs {
allBlobsReferenced[blob.Hash] = true
}
manifestCount++
}
log.Info("Processed manifests", "count", manifestCount, "unique_blobs_referenced", len(allBlobsReferenced))
// List all blobs in storage
log.Info("Listing all blobs in storage")
allBlobs := make(map[string]int64) // hash -> size
blobObjectCh := v.Storage.ListStream(v.ctx, "blobs/")
for object := range blobObjectCh {
if object.Err != nil {
return fmt.Errorf("listing blobs: %w", object.Err)
}
// Extract hash from path like blobs/ab/cd/abcdef123456...
parts := strings.Split(object.Key, "/")
if len(parts) == 4 && parts[0] == "blobs" {
hash := parts[3]
allBlobs[hash] = object.Size
}
}
log.Info("Found blobs in storage", "count", len(allBlobs))
// Find unreferenced blobs
var unreferencedBlobs []string
var totalSize int64
for hash, size := range allBlobs {
if !allBlobsReferenced[hash] {
unreferencedBlobs = append(unreferencedBlobs, hash)
totalSize += size
}
}
result := &PruneBlobsResult{
BlobsFound: len(unreferencedBlobs),
}
if len(unreferencedBlobs) == 0 {
log.Info("No unreferenced blobs found")
if opts.JSON {
return outputPruneBlobsJSON(result)
}
fmt.Println("No unreferenced blobs to remove.")
return nil
}
// Show what will be deleted
log.Info("Found unreferenced blobs", "count", len(unreferencedBlobs), "total_size", humanize.Bytes(uint64(totalSize)))
if !opts.JSON {
fmt.Printf("Found %d unreferenced blob(s) totaling %s\n", len(unreferencedBlobs), humanize.Bytes(uint64(totalSize)))
}
// Confirm unless --force is used. JSON mode is non-interactive, so it
// requires --force instead of prompting.
if opts.JSON && !opts.Force {
return fmt.Errorf("refusing to delete %d unreferenced blob(s): --json mode requires --force", len(unreferencedBlobs))
}
if !opts.Force {
fmt.Printf("\nDelete %d unreferenced blob(s)? [y/N] ", len(unreferencedBlobs))
var confirm string
if _, err := fmt.Scanln(&confirm); err != nil {
// Treat EOF or error as "no"
fmt.Println("Cancelled")
return nil
}
if strings.ToLower(confirm) != "y" {
fmt.Println("Cancelled")
return nil
}
}
// Delete unreferenced blobs
log.Info("Deleting unreferenced blobs")
deletedCount := 0
deletedSize := int64(0)
for i, hash := range unreferencedBlobs {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
if err := v.Storage.Delete(v.ctx, blobPath); err != nil {
log.Error("Failed to delete blob", "hash", hash, "error", err)
continue
}
deletedCount++
deletedSize += allBlobs[hash]
// Progress update every 100 blobs
if (i+1)%100 == 0 || i == len(unreferencedBlobs)-1 {
log.Info("Deletion progress",
"deleted", i+1,
"total", len(unreferencedBlobs),
"percent", fmt.Sprintf("%.1f%%", float64(i+1)/float64(len(unreferencedBlobs))*100),
)
}
}
result.BlobsDeleted = deletedCount
result.BlobsFailed = len(unreferencedBlobs) - deletedCount
result.BytesFreed = deletedSize
log.Info("Prune complete",
"deleted_count", deletedCount,
"deleted_size", humanize.Bytes(uint64(deletedSize)),
"failed", len(unreferencedBlobs)-deletedCount,
)
if opts.JSON {
return outputPruneBlobsJSON(result)
}
fmt.Printf("\nDeleted %d blob(s) totaling %s\n", deletedCount, humanize.Bytes(uint64(deletedSize)))
if deletedCount < len(unreferencedBlobs) {
fmt.Printf("Failed to delete %d blob(s)\n", len(unreferencedBlobs)-deletedCount)
}
return nil
}
// outputPruneBlobsJSON outputs the prune result as JSON
func outputPruneBlobsJSON(result *PruneBlobsResult) error {
encoder := json.NewEncoder(os.Stdout)
encoder.SetIndent("", " ")
return encoder.Encode(result)
}
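
Given the confirmation rules above, a non-interactive caller must pass Force; a minimal invocation sketch from outside the package (v is a *vaultik.Vaultik):

// Sketch: unattended prune with machine-readable output.
err := v.PruneBlobs(&vaultik.PruneOptions{
	Force: true, // JSON mode never prompts, so Force is required
	JSON:  true,
})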

internal/vaultik/restore.go Normal file

@@ -0,0 +1,632 @@
package vaultik
import (
"bytes"
"context"
"crypto/sha256"
"encoding/hex"
"fmt"
"io"
"os"
"path/filepath"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/schollz/progressbar/v3"
"github.com/spf13/afero"
"golang.org/x/term"
)
// RestoreOptions contains options for the restore operation
type RestoreOptions struct {
SnapshotID string
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files by checking chunk hashes
}
// RestoreResult contains statistics from a restore operation
type RestoreResult struct {
FilesRestored int
BytesRestored int64
BlobsDownloaded int
BytesDownloaded int64
Duration time.Duration
// Verification results (only populated if Verify option is set)
FilesVerified int
BytesVerified int64
FilesFailed int
FailedFiles []string // Paths of files that failed verification
}
// Restore restores files from a snapshot to the target directory
func (v *Vaultik) Restore(opts *RestoreOptions) error {
startTime := time.Now()
// Check for age_secret_key
if v.Config.AgeSecretKey == "" {
return fmt.Errorf("decryption key required for restore\n\nSet the VAULTIK_AGE_SECRET_KEY environment variable to your age private key:\n export VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...'")
}
// Parse the age identity
identity, err := age.ParseX25519Identity(v.Config.AgeSecretKey)
if err != nil {
return fmt.Errorf("parsing age secret key: %w", err)
}
log.Info("Starting restore operation",
"snapshot_id", opts.SnapshotID,
"target_dir", opts.TargetDir,
"paths", opts.Paths,
)
// Step 1: Download and decrypt the snapshot metadata database
log.Info("Downloading snapshot metadata...")
tempDB, err := v.downloadSnapshotDB(opts.SnapshotID, identity)
if err != nil {
return fmt.Errorf("downloading snapshot database: %w", err)
}
defer func() {
if err := tempDB.Close(); err != nil {
log.Debug("Failed to close temp database", "error", err)
}
// Clean up temp file
if err := v.Fs.Remove(tempDB.Path()); err != nil {
log.Debug("Failed to remove temp database", "error", err)
}
}()
repos := database.NewRepositories(tempDB)
// Step 2: Get list of files to restore
files, err := v.getFilesToRestore(v.ctx, repos, opts.Paths)
if err != nil {
return fmt.Errorf("getting files to restore: %w", err)
}
if len(files) == 0 {
log.Warn("No files found to restore")
return nil
}
log.Info("Found files to restore", "count", len(files))
// Step 3: Create target directory
if err := v.Fs.MkdirAll(opts.TargetDir, 0755); err != nil {
return fmt.Errorf("creating target directory: %w", err)
}
// Step 4: Build a map of chunks to blobs for efficient restoration
chunkToBlobMap, err := v.buildChunkToBlobMap(v.ctx, repos)
if err != nil {
return fmt.Errorf("building chunk-to-blob map: %w", err)
}
// Step 5: Restore files
result := &RestoreResult{}
blobCache := make(map[string][]byte) // Cache downloaded and decrypted blobs
for i, file := range files {
if v.ctx.Err() != nil {
return v.ctx.Err()
}
if err := v.restoreFile(v.ctx, repos, file, opts.TargetDir, identity, chunkToBlobMap, blobCache, result); err != nil {
log.Error("Failed to restore file", "path", file.Path, "error", err)
// Continue with other files
continue
}
// Progress logging
if (i+1)%100 == 0 || i+1 == len(files) {
log.Info("Restore progress",
"files", fmt.Sprintf("%d/%d", i+1, len(files)),
"bytes", humanize.Bytes(uint64(result.BytesRestored)),
)
}
}
result.Duration = time.Since(startTime)
log.Info("Restore complete",
"files_restored", result.FilesRestored,
"bytes_restored", humanize.Bytes(uint64(result.BytesRestored)),
"blobs_downloaded", result.BlobsDownloaded,
"bytes_downloaded", humanize.Bytes(uint64(result.BytesDownloaded)),
"duration", result.Duration,
)
_, _ = fmt.Fprintf(v.Stdout, "Restored %d files (%s) in %s\n",
result.FilesRestored,
humanize.Bytes(uint64(result.BytesRestored)),
result.Duration.Round(time.Second),
)
// Run verification if requested
if opts.Verify {
if err := v.verifyRestoredFiles(v.ctx, repos, files, opts.TargetDir, result); err != nil {
return fmt.Errorf("verification failed: %w", err)
}
if result.FilesFailed > 0 {
_, _ = fmt.Fprintf(v.Stdout, "\nVerification FAILED: %d files did not match expected checksums\n", result.FilesFailed)
for _, path := range result.FailedFiles {
_, _ = fmt.Fprintf(v.Stdout, " - %s\n", path)
}
return fmt.Errorf("%d files failed verification", result.FilesFailed)
}
_, _ = fmt.Fprintf(v.Stdout, "Verified %d files (%s)\n",
result.FilesVerified,
humanize.Bytes(uint64(result.BytesVerified)),
)
}
return nil
}
// downloadSnapshotDB downloads and decrypts the snapshot metadata database
func (v *Vaultik) downloadSnapshotDB(snapshotID string, identity age.Identity) (*database.DB, error) {
// Download encrypted database from storage
dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
reader, err := v.Storage.Get(v.ctx, dbKey)
if err != nil {
return nil, fmt.Errorf("downloading %s: %w", dbKey, err)
}
defer func() { _ = reader.Close() }()
// Read all data
encryptedData, err := io.ReadAll(reader)
if err != nil {
return nil, fmt.Errorf("reading encrypted data: %w", err)
}
log.Debug("Downloaded encrypted database", "size", humanize.Bytes(uint64(len(encryptedData))))
// Decrypt and decompress using blobgen.Reader
blobReader, err := blobgen.NewReader(bytes.NewReader(encryptedData), identity)
if err != nil {
return nil, fmt.Errorf("creating decryption reader: %w", err)
}
defer func() { _ = blobReader.Close() }()
// Read the binary SQLite database
dbData, err := io.ReadAll(blobReader)
if err != nil {
return nil, fmt.Errorf("decrypting and decompressing: %w", err)
}
log.Debug("Decrypted database", "size", humanize.Bytes(uint64(len(dbData))))
// Create a temporary database file and write the binary SQLite data directly
tempFile, err := afero.TempFile(v.Fs, "", "vaultik-restore-*.db")
if err != nil {
return nil, fmt.Errorf("creating temp file: %w", err)
}
tempPath := tempFile.Name()
// Write the binary SQLite database directly
if _, err := tempFile.Write(dbData); err != nil {
_ = tempFile.Close()
_ = v.Fs.Remove(tempPath)
return nil, fmt.Errorf("writing database file: %w", err)
}
if err := tempFile.Close(); err != nil {
_ = v.Fs.Remove(tempPath)
return nil, fmt.Errorf("closing temp file: %w", err)
}
log.Debug("Created restore database", "path", tempPath)
// Open the database
db, err := database.New(v.ctx, tempPath)
if err != nil {
return nil, fmt.Errorf("opening restore database: %w", err)
}
return db, nil
}
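The identity argument is a filippo.io/age identity. A minimal sketch of deriving one from the configured secret key, assuming a standard AGE-SECRET-KEY-1... string such as age-keygen produces (VAULTIK_AGE_SECRET_KEY is the variable referenced later in verify.go):

id, err := age.ParseX25519Identity(os.Getenv("VAULTIK_AGE_SECRET_KEY"))
if err != nil {
	return fmt.Errorf("parsing age identity: %w", err)
}
db, err := v.downloadSnapshotDB(snapshotID, id)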
// getFilesToRestore returns the list of files to restore based on path filters
func (v *Vaultik) getFilesToRestore(ctx context.Context, repos *database.Repositories, pathFilters []string) ([]*database.File, error) {
// If no filters, get all files
if len(pathFilters) == 0 {
return repos.Files.ListAll(ctx)
}
// Get files matching the path filters
var result []*database.File
seen := make(map[string]bool)
for _, filter := range pathFilters {
// Normalize the filter path
filter = filepath.Clean(filter)
// Get files with this prefix
files, err := repos.Files.ListByPrefix(ctx, filter)
if err != nil {
return nil, fmt.Errorf("listing files with prefix %s: %w", filter, err)
}
for _, file := range files {
if !seen[file.ID.String()] {
seen[file.ID.String()] = true
result = append(result, file)
}
}
}
return result, nil
}
// buildChunkToBlobMap creates a mapping from chunk hash to blob information
func (v *Vaultik) buildChunkToBlobMap(ctx context.Context, repos *database.Repositories) (map[string]*database.BlobChunk, error) {
// Query all blob_chunks
query := `SELECT blob_id, chunk_hash, offset, length FROM blob_chunks`
rows, err := repos.DB().Conn().QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying blob_chunks: %w", err)
}
defer func() { _ = rows.Close() }()
result := make(map[string]*database.BlobChunk)
for rows.Next() {
var bc database.BlobChunk
var blobIDStr, chunkHashStr string
if err := rows.Scan(&blobIDStr, &chunkHashStr, &bc.Offset, &bc.Length); err != nil {
return nil, fmt.Errorf("scanning blob_chunk: %w", err)
}
blobID, err := types.ParseBlobID(blobIDStr)
if err != nil {
return nil, fmt.Errorf("parsing blob ID: %w", err)
}
bc.BlobID = blobID
bc.ChunkHash = types.ChunkHash(chunkHashStr)
result[chunkHashStr] = &bc
}
return result, rows.Err()
}
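Loading the whole blob_chunks table up front trades memory for speed: each chunk lookup during restore becomes a single map access instead of a per-chunk SQL query. The map holds one small struct per unique chunk, so it stays far smaller than the data being restored.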
// restoreFile restores a single file
func (v *Vaultik) restoreFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetDir string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache map[string][]byte,
result *RestoreResult,
) error {
// Calculate target path - use full original path under target directory
targetPath := filepath.Join(targetDir, file.Path.String())
// Create parent directories
parentDir := filepath.Dir(targetPath)
if err := v.Fs.MkdirAll(parentDir, 0755); err != nil {
return fmt.Errorf("creating parent directory: %w", err)
}
// Handle symlinks
if file.IsSymlink() {
return v.restoreSymlink(file, targetPath, result)
}
// Handle directories
if file.Mode&uint32(os.ModeDir) != 0 {
return v.restoreDirectory(file, targetPath, result)
}
// Handle regular files
return v.restoreRegularFile(ctx, repos, file, targetPath, identity, chunkToBlobMap, blobCache, result)
}
// restoreSymlink restores a symbolic link
func (v *Vaultik) restoreSymlink(file *database.File, targetPath string, result *RestoreResult) error {
// Remove existing file if it exists
_ = v.Fs.Remove(targetPath)
// Create symlink
// Note: afero.MemMapFs doesn't support symlinks, so we use os for real filesystems
if _, ok := v.Fs.(*afero.OsFs); ok {
if err := os.Symlink(file.LinkTarget.String(), targetPath); err != nil {
return fmt.Errorf("creating symlink: %w", err)
}
} else {
log.Debug("Symlink creation not supported on this filesystem", "path", file.Path, "target", file.LinkTarget)
}
result.FilesRestored++
log.Debug("Restored symlink", "path", file.Path, "target", file.LinkTarget)
return nil
}
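An alternative to the *afero.OsFs type assertion is afero's optional Linker interface, which OsFs implements; a sketch, assuming an afero version that exposes SymlinkIfPossible:

if linker, ok := v.Fs.(afero.Linker); ok {
	if err := linker.SymlinkIfPossible(file.LinkTarget.String(), targetPath); err != nil {
		return fmt.Errorf("creating symlink: %w", err)
	}
} else {
	log.Debug("Symlink creation not supported on this filesystem", "path", file.Path)
}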
// restoreDirectory restores a directory with proper permissions
func (v *Vaultik) restoreDirectory(file *database.File, targetPath string, result *RestoreResult) error {
// Create directory
if err := v.Fs.MkdirAll(targetPath, os.FileMode(file.Mode)); err != nil {
return fmt.Errorf("creating directory: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set directory permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if _, ok := v.Fs.(*afero.OsFs); ok {
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set directory ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set directory mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
return nil
}
// restoreRegularFile restores a regular file by reconstructing it from chunks
func (v *Vaultik) restoreRegularFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetPath string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache map[string][]byte,
result *RestoreResult,
) error {
// Get file chunks in order
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
return fmt.Errorf("getting file chunks: %w", err)
}
// Create output file
outFile, err := v.Fs.Create(targetPath)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() { _ = outFile.Close() }() // safety net for early returns; the explicit Close below checks the error
// Write chunks in order
var bytesWritten int64
for _, fc := range fileChunks {
// Find which blob contains this chunk
chunkHashStr := fc.ChunkHash.String()
blobChunk, ok := chunkToBlobMap[chunkHashStr]
if !ok {
return fmt.Errorf("chunk %s not found in any blob", chunkHashStr[:16])
}
// Get the blob's hash from the database
blob, err := repos.Blobs.GetByID(ctx, blobChunk.BlobID.String())
if err != nil {
return fmt.Errorf("getting blob %s: %w", blobChunk.BlobID, err)
}
// Download and decrypt blob if not cached
blobHashStr := blob.Hash.String()
blobData, ok := blobCache[blobHashStr]
if !ok {
blobData, err = v.downloadBlob(ctx, blobHashStr, blob.CompressedSize, identity)
if err != nil {
return fmt.Errorf("downloading blob %s: %w", blobHashStr[:16], err)
}
blobCache[blobHashStr] = blobData
result.BlobsDownloaded++
result.BytesDownloaded += blob.CompressedSize
}
// Extract chunk from blob
if blobChunk.Offset+blobChunk.Length > int64(len(blobData)) {
return fmt.Errorf("chunk %s extends beyond blob data (offset=%d, length=%d, blob_size=%d)",
fc.ChunkHash[:16], blobChunk.Offset, blobChunk.Length, len(blobData))
}
chunkData := blobData[blobChunk.Offset : blobChunk.Offset+blobChunk.Length]
// Write chunk to output file
n, err := outFile.Write(chunkData)
if err != nil {
return fmt.Errorf("writing chunk: %w", err)
}
bytesWritten += int64(n)
}
// Close file before setting metadata
if err := outFile.Close(); err != nil {
return fmt.Errorf("closing output file: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set file permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if _, ok := v.Fs.(*afero.OsFs); ok {
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set file ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set file mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
result.BytesRestored += bytesWritten
log.Debug("Restored file", "path", file.Path, "size", humanize.Bytes(uint64(bytesWritten)))
return nil
}
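Note that blobCache grows without bound for the life of a restore; for snapshots larger than available memory, a size-capped cache would be safer. A minimal sketch with a hypothetical helper (not part of the codebase; eviction order here is arbitrary map order, not LRU):

// evictIfOver drops cached blobs until the cache fits under maxBytes.
func evictIfOver(cache map[string][]byte, maxBytes int64) {
	var total int64
	for _, b := range cache {
		total += int64(len(b))
	}
	for hash, b := range cache {
		if total <= maxBytes {
			break
		}
		total -= int64(len(b))
		delete(cache, hash)
	}
}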
// downloadBlob downloads and decrypts a blob
func (v *Vaultik) downloadBlob(ctx context.Context, blobHash string, expectedSize int64, identity age.Identity) ([]byte, error) {
result, err := v.FetchAndDecryptBlob(ctx, blobHash, expectedSize, identity)
if err != nil {
return nil, err
}
return result.Data, nil
}
// verifyRestoredFiles verifies that all restored files match their expected chunk hashes
func (v *Vaultik) verifyRestoredFiles(
ctx context.Context,
repos *database.Repositories,
files []*database.File,
targetDir string,
result *RestoreResult,
) error {
// Calculate total bytes to verify for progress bar
var totalBytes int64
regularFiles := make([]*database.File, 0, len(files))
for _, file := range files {
// Skip symlinks and directories - only verify regular files
if file.IsSymlink() || file.Mode&uint32(os.ModeDir) != 0 {
continue
}
regularFiles = append(regularFiles, file)
totalBytes += file.Size
}
if len(regularFiles) == 0 {
log.Info("No regular files to verify")
return nil
}
log.Info("Verifying restored files",
"files", len(regularFiles),
"bytes", humanize.Bytes(uint64(totalBytes)),
)
_, _ = fmt.Fprintf(v.Stdout, "\nVerifying %d files (%s)...\n",
len(regularFiles),
humanize.Bytes(uint64(totalBytes)),
)
// Create progress bar if output is a terminal
var bar *progressbar.ProgressBar
if isTerminal() {
bar = progressbar.NewOptions64(
totalBytes,
progressbar.OptionSetDescription("Verifying"),
progressbar.OptionSetWriter(os.Stderr),
progressbar.OptionShowBytes(true),
progressbar.OptionShowCount(),
progressbar.OptionSetWidth(40),
progressbar.OptionThrottle(100*time.Millisecond),
progressbar.OptionOnCompletion(func() {
fmt.Fprint(os.Stderr, "\n")
}),
progressbar.OptionSetRenderBlankState(true),
)
}
// Verify each file
for _, file := range regularFiles {
if ctx.Err() != nil {
return ctx.Err()
}
targetPath := filepath.Join(targetDir, file.Path.String())
bytesVerified, err := v.verifyFile(ctx, repos, file, targetPath)
if err != nil {
log.Error("File verification failed", "path", file.Path, "error", err)
result.FilesFailed++
result.FailedFiles = append(result.FailedFiles, file.Path.String())
} else {
result.FilesVerified++
result.BytesVerified += bytesVerified
}
// Update progress bar
if bar != nil {
_ = bar.Add64(file.Size)
}
}
if bar != nil {
_ = bar.Finish()
}
log.Info("Verification complete",
"files_verified", result.FilesVerified,
"bytes_verified", humanize.Bytes(uint64(result.BytesVerified)),
"files_failed", result.FilesFailed,
)
return nil
}
// verifyFile verifies a single restored file by checking its chunk hashes
func (v *Vaultik) verifyFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetPath string,
) (int64, error) {
// Get file chunks in order
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
return 0, fmt.Errorf("getting file chunks: %w", err)
}
// Open the restored file
f, err := v.Fs.Open(targetPath)
if err != nil {
return 0, fmt.Errorf("opening file: %w", err)
}
defer func() { _ = f.Close() }()
// Verify each chunk
var bytesVerified int64
for _, fc := range fileChunks {
// Get chunk size from database
chunk, err := repos.Chunks.GetByHash(ctx, fc.ChunkHash.String())
if err != nil {
return bytesVerified, fmt.Errorf("getting chunk %s: %w", fc.ChunkHash.String()[:16], err)
}
// Read chunk data from file
chunkData := make([]byte, chunk.Size)
n, err := io.ReadFull(f, chunkData)
if err != nil {
return bytesVerified, fmt.Errorf("reading chunk data: %w", err)
}
// io.ReadFull returns an error on short reads, so n == chunk.Size here
// Calculate hash and compare
hash := sha256.Sum256(chunkData)
actualHash := hex.EncodeToString(hash[:])
expectedHash := fc.ChunkHash.String()
if actualHash != expectedHash {
return bytesVerified, fmt.Errorf("chunk %d hash mismatch: expected %s, got %s",
fc.Idx, expectedHash[:16], actualHash[:16])
}
bytesVerified += int64(n)
}
log.Debug("File verified", "path", file.Path, "bytes", bytesVerified, "chunks", len(fileChunks))
return bytesVerified, nil
}
// isTerminal returns true if stdout is a terminal
func isTerminal() bool {
return term.IsTerminal(int(os.Stdout.Fd()))
}

internal/vaultik/snapshot.go Normal file (1151 lines)

File diff suppressed because it is too large

internal/vaultik/vaultik.go Normal file

@@ -0,0 +1,167 @@
package vaultik
import (
"bytes"
"context"
"fmt"
"io"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/crypto"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/afero"
"go.uber.org/fx"
)
// Vaultik contains all dependencies needed for vaultik operations
type Vaultik struct {
Globals *globals.Globals
Config *config.Config
DB *database.DB
Repositories *database.Repositories
Storage storage.Storer
ScannerFactory snapshot.ScannerFactory
SnapshotManager *snapshot.SnapshotManager
Shutdowner fx.Shutdowner
Fs afero.Fs
// Context management
ctx context.Context
cancel context.CancelFunc
// IO
Stdout io.Writer
Stderr io.Writer
Stdin io.Reader
}
// VaultikParams contains all parameters for New that can be provided by fx
type VaultikParams struct {
fx.In
Globals *globals.Globals
Config *config.Config
DB *database.DB
Repositories *database.Repositories
Storage storage.Storer
ScannerFactory snapshot.ScannerFactory
SnapshotManager *snapshot.SnapshotManager
Shutdowner fx.Shutdowner
Fs afero.Fs `optional:"true"`
}
// New creates a new Vaultik instance with proper context management
// It automatically includes crypto capabilities if age_secret_key is configured
func New(params VaultikParams) *Vaultik {
ctx, cancel := context.WithCancel(context.Background())
// Use provided filesystem or default to OS filesystem
fs := params.Fs
if fs == nil {
fs = afero.NewOsFs()
}
// Set filesystem on SnapshotManager
params.SnapshotManager.SetFilesystem(fs)
return &Vaultik{
Globals: params.Globals,
Config: params.Config,
DB: params.DB,
Repositories: params.Repositories,
Storage: params.Storage,
ScannerFactory: params.ScannerFactory,
SnapshotManager: params.SnapshotManager,
Shutdowner: params.Shutdowner,
Fs: fs,
ctx: ctx,
cancel: cancel,
Stdout: os.Stdout,
Stderr: os.Stderr,
Stdin: os.Stdin,
}
}
// Context returns the Vaultik's context
func (v *Vaultik) Context() context.Context {
return v.ctx
}
// SetContext sets the Vaultik's context (primarily for testing)
func (v *Vaultik) SetContext(ctx context.Context) {
v.ctx = ctx
}
// Cancel cancels the Vaultik's context
func (v *Vaultik) Cancel() {
v.cancel()
}
// CanDecrypt returns true if this Vaultik instance has decryption capabilities
func (v *Vaultik) CanDecrypt() bool {
return v.Config.AgeSecretKey != ""
}
// GetEncryptor creates a new Encryptor instance based on the configured age recipients
// Returns an error if no recipients are configured
func (v *Vaultik) GetEncryptor() (*crypto.Encryptor, error) {
if len(v.Config.AgeRecipients) == 0 {
return nil, fmt.Errorf("no age recipients configured")
}
return crypto.NewEncryptor(v.Config.AgeRecipients)
}
// GetDecryptor creates a new Decryptor instance based on the configured age secret key
// Returns an error if no secret key is configured
func (v *Vaultik) GetDecryptor() (*crypto.Decryptor, error) {
if v.Config.AgeSecretKey == "" {
return nil, fmt.Errorf("no age secret key configured")
}
return crypto.NewDecryptor(v.Config.AgeSecretKey)
}
// GetFilesystem returns the filesystem instance used by Vaultik
func (v *Vaultik) GetFilesystem() afero.Fs {
return v.Fs
}
// Outputf writes formatted output to stdout for user-facing messages.
// This should be used for all non-log user output.
func (v *Vaultik) Outputf(format string, args ...any) {
_, _ = fmt.Fprintf(v.Stdout, format, args...)
}
// TestVaultik wraps a Vaultik with captured stdout/stderr for testing
type TestVaultik struct {
*Vaultik
Stdout *bytes.Buffer
Stderr *bytes.Buffer
Stdin *bytes.Buffer
}
// NewForTesting creates a minimal Vaultik instance for testing purposes.
// Only the Storage field is populated; other fields are nil.
// Returns a TestVaultik that captures stdout/stderr in buffers.
func NewForTesting(storage storage.Storer) *TestVaultik {
ctx, cancel := context.WithCancel(context.Background())
stdout := &bytes.Buffer{}
stderr := &bytes.Buffer{}
stdin := &bytes.Buffer{}
return &TestVaultik{
Vaultik: &Vaultik{
Storage: storage,
ctx: ctx,
cancel: cancel,
Stdout: stdout,
Stderr: stderr,
Stdin: stdin,
},
Stdout: stdout,
Stderr: stderr,
Stdin: stdin,
}
}
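A usage sketch for TestVaultik, exercising Outputf through the captured buffer (Storage may be nil when the code under test never touches it):

func TestOutputf(t *testing.T) {
	tv := vaultik.NewForTesting(nil)
	tv.Outputf("restored %d files\n", 3)
	if got := tv.Stdout.String(); got != "restored 3 files\n" {
		t.Fatalf("unexpected output: %q", got)
	}
}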

internal/vaultik/verify.go Normal file

@@ -0,0 +1,590 @@
package vaultik
import (
"crypto/sha256"
"database/sql"
"encoding/hex"
"fmt"
"io"
"os"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"github.com/dustin/go-humanize"
"github.com/klauspost/compress/zstd"
_ "github.com/mattn/go-sqlite3"
)
// VerifyOptions contains options for the verify command
type VerifyOptions struct {
Deep bool
JSON bool
}
// VerifyResult contains the result of a snapshot verification
type VerifyResult struct {
SnapshotID string `json:"snapshot_id"`
Status string `json:"status"` // "ok" or "failed"
Mode string `json:"mode"` // "shallow" or "deep"
BlobCount int `json:"blob_count"`
TotalSize int64 `json:"total_size"`
Verified int `json:"verified"`
Missing int `json:"missing"`
MissingSize int64 `json:"missing_size,omitempty"`
ErrorMessage string `json:"error,omitempty"`
}
// RunDeepVerify executes deep verification operation
func (v *Vaultik) RunDeepVerify(snapshotID string, opts *VerifyOptions) error {
result := &VerifyResult{
SnapshotID: snapshotID,
Mode: "deep",
}
// Check for decryption capability
if !v.CanDecrypt() {
result.Status = "failed"
result.ErrorMessage = "VAULTIK_AGE_SECRET_KEY environment variable not set - required for deep verification"
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("VAULTIK_AGE_SECRET_KEY environment variable not set - required for deep verification")
}
log.Info("Starting snapshot verification",
"snapshot_id", snapshotID,
"mode", "deep",
)
if !opts.JSON {
v.Outputf("Deep verification of snapshot: %s\n\n", snapshotID)
}
// Step 1: Download manifest
manifestPath := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
log.Info("Downloading manifest", "path", manifestPath)
if !opts.JSON {
v.Outputf("Downloading manifest...\n")
}
manifestReader, err := v.Storage.Get(v.ctx, manifestPath)
if err != nil {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("failed to download manifest: %v", err)
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("failed to download manifest: %w", err)
}
defer func() { _ = manifestReader.Close() }()
// Decompress manifest
manifest, err := snapshot.DecodeManifest(manifestReader)
if err != nil {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("failed to decode manifest: %v", err)
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("failed to decode manifest: %w", err)
}
log.Info("Manifest loaded",
"manifest_blob_count", manifest.BlobCount,
"manifest_total_size", humanize.Bytes(uint64(manifest.TotalCompressedSize)),
)
if !opts.JSON {
v.Outputf("Manifest loaded: %d blobs (%s)\n", manifest.BlobCount, humanize.Bytes(uint64(manifest.TotalCompressedSize)))
}
// Step 2: Download and decrypt database (authoritative source)
dbPath := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
log.Info("Downloading encrypted database", "path", dbPath)
if !opts.JSON {
v.Outputf("Downloading and decrypting database...\n")
}
dbReader, err := v.Storage.Get(v.ctx, dbPath)
if err != nil {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("failed to download database: %v", err)
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("failed to download database: %w", err)
}
defer func() { _ = dbReader.Close() }()
// Decrypt and decompress database
tempDB, err := v.decryptAndLoadDatabase(dbReader)
if err != nil {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("failed to decrypt database: %v", err)
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("failed to decrypt database: %w", err)
}
defer func() {
if tempDB != nil {
_ = tempDB.Close()
}
}()
// Step 3: Get authoritative blob list from database
dbBlobs, err := v.getBlobsFromDatabase(snapshotID, tempDB.DB)
if err != nil {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("failed to get blobs from database: %v", err)
if opts.JSON {
return v.outputVerifyJSON(result)
}
return fmt.Errorf("failed to get blobs from database: %w", err)
}
result.BlobCount = len(dbBlobs)
var totalSize int64
for _, blob := range dbBlobs {
totalSize += blob.CompressedSize
}
result.TotalSize = totalSize
log.Info("Database loaded",
"db_blob_count", len(dbBlobs),
"db_total_size", humanize.Bytes(uint64(totalSize)),
)
if !opts.JSON {
v.Outputf("Database loaded: %d blobs (%s)\n", len(dbBlobs), humanize.Bytes(uint64(totalSize)))
v.Outputf("Verifying manifest against database...\n")
}
// Step 4: Verify manifest matches database
if err := v.verifyManifestAgainstDatabase(manifest, dbBlobs); err != nil {
result.Status = "failed"
result.ErrorMessage = err.Error()
if opts.JSON {
return v.outputVerifyJSON(result)
}
return err
}
// Step 5: Verify all blobs exist in S3 (using database as source)
if !opts.JSON {
v.Outputf("Manifest verified.\n")
v.Outputf("Checking blob existence in remote storage...\n")
}
if err := v.verifyBlobExistenceFromDB(dbBlobs); err != nil {
result.Status = "failed"
result.ErrorMessage = err.Error()
if opts.JSON {
return v.outputVerifyJSON(result)
}
return err
}
// Step 6: Deep verification - download and verify blob contents
if !opts.JSON {
v.Outputf("All blobs exist.\n")
v.Outputf("Downloading and verifying blob contents (%d blobs, %s)...\n", len(dbBlobs), humanize.Bytes(uint64(totalSize)))
}
if err := v.performDeepVerificationFromDB(dbBlobs, tempDB.DB, opts); err != nil {
result.Status = "failed"
result.ErrorMessage = err.Error()
if opts.JSON {
return v.outputVerifyJSON(result)
}
return err
}
// Success
result.Status = "ok"
result.Verified = len(dbBlobs)
if opts.JSON {
return v.outputVerifyJSON(result)
}
log.Info("✓ Verification completed successfully",
"snapshot_id", snapshotID,
"mode", "deep",
"blobs_verified", len(dbBlobs),
)
v.Outputf("\n✓ Verification completed successfully\n")
v.Outputf(" Snapshot: %s\n", snapshotID)
v.Outputf(" Blobs verified: %d\n", len(dbBlobs))
v.Outputf(" Total size: %s\n", humanize.Bytes(uint64(totalSize)))
return nil
}
// tempDB wraps sql.DB with cleanup
type tempDB struct {
*sql.DB
tempPath string
}
func (t *tempDB) Close() error {
err := t.DB.Close()
_ = os.Remove(t.tempPath)
return err
}
// decryptAndLoadDatabase decrypts and loads the binary SQLite database from the encrypted stream
func (v *Vaultik) decryptAndLoadDatabase(reader io.ReadCloser) (*tempDB, error) {
// Get decryptor
decryptor, err := v.GetDecryptor()
if err != nil {
return nil, fmt.Errorf("failed to get decryptor: %w", err)
}
// Decrypt the stream
decryptedReader, err := decryptor.DecryptStream(reader)
if err != nil {
return nil, fmt.Errorf("failed to decrypt database: %w", err)
}
// Decompress the binary database
decompressor, err := zstd.NewReader(decryptedReader)
if err != nil {
return nil, fmt.Errorf("failed to create decompressor: %w", err)
}
defer decompressor.Close()
// Create temporary file for the database
tempFile, err := os.CreateTemp("", "vaultik-verify-*.db")
if err != nil {
return nil, fmt.Errorf("failed to create temp file: %w", err)
}
tempPath := tempFile.Name()
// Stream decompress directly to file
log.Info("Decompressing database...")
written, err := io.Copy(tempFile, decompressor)
if err != nil {
_ = tempFile.Close()
_ = os.Remove(tempPath)
return nil, fmt.Errorf("failed to decompress database: %w", err)
}
_ = tempFile.Close()
log.Info("Database decompressed", "size", humanize.Bytes(uint64(written)))
// Open the database
db, err := sql.Open("sqlite3", tempPath)
if err != nil {
_ = os.Remove(tempPath)
return nil, fmt.Errorf("failed to open database: %w", err)
}
return &tempDB{
DB: db,
tempPath: tempPath,
}, nil
}
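Unlike the downloadSnapshotDB method above, which buffers the entire encrypted database in memory before decrypting, this path streams the decompressed bytes straight to the temp file via io.Copy, keeping peak memory roughly constant regardless of database size.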
// verifyBlob downloads and verifies a single blob
func (v *Vaultik) verifyBlob(blobInfo snapshot.BlobInfo, db *sql.DB) error {
// Download blob using shared fetch method
reader, _, err := v.FetchBlob(v.ctx, blobInfo.Hash, blobInfo.CompressedSize)
if err != nil {
return fmt.Errorf("failed to download: %w", err)
}
defer func() { _ = reader.Close() }()
// Get decryptor
decryptor, err := v.GetDecryptor()
if err != nil {
return fmt.Errorf("failed to get decryptor: %w", err)
}
// Hash the encrypted blob data as it streams through to decryption
blobHasher := sha256.New()
teeReader := io.TeeReader(reader, blobHasher)
// Decrypt blob (reading through teeReader to hash encrypted data)
decryptedReader, err := decryptor.DecryptStream(teeReader)
if err != nil {
return fmt.Errorf("failed to decrypt: %w", err)
}
// Decompress blob
decompressor, err := zstd.NewReader(decryptedReader)
if err != nil {
return fmt.Errorf("failed to decompress: %w", err)
}
defer decompressor.Close()
// Query blob chunks from database to get offsets and lengths
query := `
SELECT bc.chunk_hash, bc.offset, bc.length
FROM blob_chunks bc
JOIN blobs b ON bc.blob_id = b.id
WHERE b.blob_hash = ?
ORDER BY bc.offset
`
rows, err := db.QueryContext(v.ctx, query, blobInfo.Hash)
if err != nil {
return fmt.Errorf("failed to query blob chunks: %w", err)
}
defer func() { _ = rows.Close() }()
var lastOffset int64 = -1
chunkCount := 0
totalRead := int64(0)
// Verify each chunk in the blob
for rows.Next() {
var chunkHash string
var offset, length int64
if err := rows.Scan(&chunkHash, &offset, &length); err != nil {
return fmt.Errorf("failed to scan chunk row: %w", err)
}
// Verify chunk ordering
if offset <= lastOffset {
return fmt.Errorf("chunks out of order: offset %d after %d", offset, lastOffset)
}
lastOffset = offset
// Read chunk data from decompressed stream
if offset > totalRead {
// Skip to the correct offset
skipBytes := offset - totalRead
if _, err := io.CopyN(io.Discard, decompressor, skipBytes); err != nil {
return fmt.Errorf("failed to skip to offset %d: %w", offset, err)
}
totalRead = offset
}
// Read chunk data
chunkData := make([]byte, length)
if _, err := io.ReadFull(decompressor, chunkData); err != nil {
return fmt.Errorf("failed to read chunk at offset %d: %w", offset, err)
}
totalRead += length
// Verify chunk hash
hasher := sha256.New()
hasher.Write(chunkData)
calculatedHash := hex.EncodeToString(hasher.Sum(nil))
if calculatedHash != chunkHash {
return fmt.Errorf("chunk hash mismatch at offset %d: calculated %s, expected %s",
offset, calculatedHash, chunkHash)
}
chunkCount++
}
if err := rows.Err(); err != nil {
return fmt.Errorf("error iterating blob chunks: %w", err)
}
// Verify no remaining data in blob - if chunk list is accurate, blob should be fully consumed
remaining, err := io.Copy(io.Discard, decompressor)
if err != nil {
return fmt.Errorf("failed to check for remaining blob data: %w", err)
}
if remaining > 0 {
return fmt.Errorf("blob has %d unexpected trailing bytes not covered by chunk list", remaining)
}
// Verify blob hash matches the encrypted data we downloaded
calculatedBlobHash := hex.EncodeToString(blobHasher.Sum(nil))
if calculatedBlobHash != blobInfo.Hash {
return fmt.Errorf("blob hash mismatch: calculated %s, expected %s",
calculatedBlobHash, blobInfo.Hash)
}
log.Info("Blob verified",
"hash", blobInfo.Hash[:16]+"...",
"chunks", chunkCount,
"size", humanize.Bytes(uint64(blobInfo.CompressedSize)),
)
return nil
}
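The hash-while-decrypting trick above is the standard io.TeeReader pattern; in isolation it looks like this (a generic sketch, not vaultik-specific):

h := sha256.New()
tee := io.TeeReader(src, h) // every byte read from tee is also written to h
if _, err := io.Copy(io.Discard, tee); err != nil {
	return err
}
sum := hex.EncodeToString(h.Sum(nil)) // hash of exactly the bytes consumed

TeeReader only hashes bytes that are actually read, which is why verifyBlob drains any trailing data before comparing the blob hash.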
// getBlobsFromDatabase gets all blobs for the snapshot from the database
func (v *Vaultik) getBlobsFromDatabase(snapshotID string, db *sql.DB) ([]snapshot.BlobInfo, error) {
query := `
SELECT b.blob_hash, b.compressed_size
FROM snapshot_blobs sb
JOIN blobs b ON sb.blob_hash = b.blob_hash
WHERE sb.snapshot_id = ?
ORDER BY b.blob_hash
`
rows, err := db.QueryContext(v.ctx, query, snapshotID)
if err != nil {
return nil, fmt.Errorf("failed to query snapshot blobs: %w", err)
}
defer func() { _ = rows.Close() }()
var blobs []snapshot.BlobInfo
for rows.Next() {
var hash string
var size int64
if err := rows.Scan(&hash, &size); err != nil {
return nil, fmt.Errorf("failed to scan blob row: %w", err)
}
blobs = append(blobs, snapshot.BlobInfo{
Hash: hash,
CompressedSize: size,
})
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("error iterating blobs: %w", err)
}
return blobs, nil
}
// verifyManifestAgainstDatabase verifies the manifest matches the authoritative database
func (v *Vaultik) verifyManifestAgainstDatabase(manifest *snapshot.Manifest, dbBlobs []snapshot.BlobInfo) error {
log.Info("Verifying manifest against database")
// Build map of database blobs
dbBlobMap := make(map[string]int64)
for _, blob := range dbBlobs {
dbBlobMap[blob.Hash] = blob.CompressedSize
}
// Build map of manifest blobs
manifestBlobMap := make(map[string]int64)
for _, blob := range manifest.Blobs {
manifestBlobMap[blob.Hash] = blob.CompressedSize
}
// Check counts match
if len(dbBlobMap) != len(manifestBlobMap) {
log.Warn("Manifest blob count mismatch",
"database_blobs", len(dbBlobMap),
"manifest_blobs", len(manifestBlobMap),
)
// This is a warning, not an error - database is authoritative
}
// Check each manifest blob exists in database with correct size
for hash, manifestSize := range manifestBlobMap {
dbSize, exists := dbBlobMap[hash]
if !exists {
return fmt.Errorf("manifest contains blob %s not in database", hash)
}
if dbSize != manifestSize {
return fmt.Errorf("blob %s size mismatch: database has %d bytes, manifest has %d bytes",
hash, dbSize, manifestSize)
}
}
log.Info("✓ Manifest verified against database",
"manifest_blobs", len(manifestBlobMap),
"database_blobs", len(dbBlobMap),
)
return nil
}
// verifyBlobExistenceFromDB checks that all blobs from database exist in S3
func (v *Vaultik) verifyBlobExistenceFromDB(blobs []snapshot.BlobInfo) error {
log.Info("Verifying blob existence in S3", "blob_count", len(blobs))
for i, blob := range blobs {
// Construct blob path
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blob.Hash[:2], blob.Hash[2:4], blob.Hash)
// Check blob exists
stat, err := v.Storage.Stat(v.ctx, blobPath)
if err != nil {
return fmt.Errorf("blob %s missing from storage: %w", blob.Hash, err)
}
// Verify size matches
if stat.Size != blob.CompressedSize {
return fmt.Errorf("blob %s size mismatch: S3 has %d bytes, database has %d bytes",
blob.Hash, stat.Size, blob.CompressedSize)
}
// Progress update every 100 blobs
if (i+1)%100 == 0 || i == len(blobs)-1 {
log.Info("Blob existence check progress",
"checked", i+1,
"total", len(blobs),
"percent", fmt.Sprintf("%.1f%%", float64(i+1)/float64(len(blobs))*100),
)
}
}
log.Info("✓ All blobs exist in storage")
return nil
}
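The two-level fan-out keeps any single key prefix small: hash deadbeef... maps to blobs/de/ad/deadbeef.... A sketch of the path rule as a helper (hypothetical; the code above inlines the fmt.Sprintf):

// blobPath shards blobs by the first two hex byte-pairs of the hash.
func blobPath(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
}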
// performDeepVerificationFromDB downloads and verifies the content of each blob using database as source
func (v *Vaultik) performDeepVerificationFromDB(blobs []snapshot.BlobInfo, db *sql.DB, opts *VerifyOptions) error {
// Calculate total bytes for ETA
var totalBytesExpected int64
for _, b := range blobs {
totalBytesExpected += b.CompressedSize
}
log.Info("Starting deep verification - downloading and verifying all blobs",
"blob_count", len(blobs),
"total_size", humanize.Bytes(uint64(totalBytesExpected)),
)
startTime := time.Now()
bytesProcessed := int64(0)
for i, blobInfo := range blobs {
// Verify individual blob
if err := v.verifyBlob(blobInfo, db); err != nil {
return fmt.Errorf("blob %s verification failed: %w", blobInfo.Hash, err)
}
bytesProcessed += blobInfo.CompressedSize
elapsed := time.Since(startTime)
remaining := len(blobs) - (i + 1)
// Calculate ETA based on bytes processed
var eta time.Duration
if bytesProcessed > 0 {
bytesPerSec := float64(bytesProcessed) / elapsed.Seconds()
bytesRemaining := totalBytesExpected - bytesProcessed
if bytesPerSec > 0 {
eta = time.Duration(float64(bytesRemaining)/bytesPerSec) * time.Second
}
}
log.Info("Verification progress",
"blobs_done", i+1,
"blobs_total", len(blobs),
"blobs_remaining", remaining,
"bytes_done", bytesProcessed,
"bytes_done_human", humanize.Bytes(uint64(bytesProcessed)),
"bytes_total", totalBytesExpected,
"bytes_total_human", humanize.Bytes(uint64(totalBytesExpected)),
"elapsed", elapsed.Round(time.Second),
"eta", eta.Round(time.Second),
)
if !opts.JSON {
v.Outputf(" Verified %d/%d blobs (%d remaining) - %s/%s - elapsed %s, eta %s\n",
i+1, len(blobs), remaining,
humanize.Bytes(uint64(bytesProcessed)),
humanize.Bytes(uint64(totalBytesExpected)),
elapsed.Round(time.Second),
eta.Round(time.Second))
}
}
totalElapsed := time.Since(startTime)
log.Info("✓ Deep verification completed successfully",
"blobs_verified", len(blobs),
"total_bytes", bytesProcessed,
"total_bytes_human", humanize.Bytes(uint64(bytesProcessed)),
"duration", totalElapsed.Round(time.Second),
)
return nil
}
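As a worked example of the ETA arithmetic: if 2 GiB of a 10 GiB snapshot has been verified after 40 seconds, throughput is 51.2 MiB/s, so the remaining 8 GiB gives an ETA of roughly 160 seconds.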


@@ -1,7 +1,9 @@
age_recipients:
- age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj # sneak's long term age key
- age1otherpubkey... # add additional recipients as needed
source_dirs:
snapshots:
test:
paths:
- /tmp/vaultik-test-source
- /var/test/data
exclude: