Compare commits

..

No commits in common. "main" and "feature/pluggable-storage-backend" have entirely different histories.

68 changed files with 1785 additions and 5078 deletions

View File

@ -10,9 +10,6 @@ Read the rules in AGENTS.md and follow them.
corporate advertising for Anthropic and is therefore completely
unacceptable in commit messages.
* NEVER use `git add -A`. Always add only the files you intentionally
changed.
* Tests should always be run before committing code. No commits should be
made that do not pass tests.
@ -36,9 +33,6 @@ Read the rules in AGENTS.md and follow them.
* When testing over 2.5Gbit/s ethernet to an S3 server backed by a 2000MB/s SSD,
estimate about 4 seconds per gigabyte of backup time.
* When running tests, don't run individual tests, or grep the output. run
the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or grep the output. run the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or try to grep the output.
never run "go test". only ever run "make test" to run the full test
suite, and examine the full output.
* When running tests, don't run individual tests, or try to grep the output. never run "go test". only ever run "make test" to run the full test suite, and examine the full output.

387
DESIGN.md Normal file
View File

@ -0,0 +1,387 @@
# vaultik: Design Document
`vaultik` is a secure backup tool written in Go. It performs
streaming backups using content-defined chunking, blob grouping, asymmetric
encryption, and object storage. The system is designed for environments
where the backup source host cannot store secrets and cannot retrieve or
decrypt any data from the destination.
The source host is **stateful**: it maintains a local SQLite index to detect
changes, deduplicate content, and track uploads across backup runs. All
remote storage is encrypted and append-only. Pruning of unreferenced data is
done from a trusted host with access to decryption keys, as even the
metadata indices are encrypted in the blob store.
---
## Why
ANOTHER backup tool??
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
My requirements are:
* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## Design Goals
1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and
a private key.
4. Prune must be possible, although this requires a private key and so must
be done on a separate, trusted host.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age)
(X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs.
This reduces the number of objects in the blob store for filesystems with
many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is
chunked into multiple encrypted `.age` parts, to support large
filesystems.
10. CLI interface is structured using `cobra`.
---
## S3 Bucket Layout
S3 stores only four things:
1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the
filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes
referenced in the snapshot, enabling pruning without decryption.
```
s3://<bucket>/<prefix>/
├── blobs/
│ ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│   ├── <snapshot_id>.sqlite.age
│   ├── <snapshot_id>.sqlite.00.age
│   ├── <snapshot_id>.sqlite.01.age
│   ├── <snapshot_id>.hash.age
│   ├── <snapshot_id>.manifest.json.zst
```
To retrieve a given file, you would:
* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct
the full database file
* verify the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs
With some cleverness, it may be possible to do this chunk by chunk without
touching disk (except for the output file), as each uncompressed blob should
fit in memory (<10GB).
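A rough Go sketch of the retrieval steps above (not the actual vaultik code): it assumes the snapshot DB has already been decrypted and queried into an ordered list of chunk references, and that a hypothetical `fetchBlob` helper returns the raw encrypted blob object from S3. It uses `filippo.io/age` and `github.com/klauspost/compress/zstd`, matching the formats described here.
```go
package restore

import (
	"fmt"
	"io"
	"os"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// chunkRef describes where one chunk of the target file lives.
type chunkRef struct {
	BlobHash string // blob object containing the chunk
	Offset   int64  // offset of the chunk inside the *decompressed* blob
	Length   int64  // chunk length in bytes
}

// reassembleFile streams each chunk into the output file in order.
// fetchBlob is assumed to return the raw encrypted blob object from S3.
func reassembleFile(chunks []chunkRef, identity age.Identity,
	fetchBlob func(blobHash string) (io.ReadCloser, error), outPath string) error {

	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()

	for _, c := range chunks {
		enc, err := fetchBlob(c.BlobHash)
		if err != nil {
			return fmt.Errorf("fetch blob %s: %w", c.BlobHash, err)
		}
		// Decrypt with the age private key, then decompress with zstd.
		plain, err := age.Decrypt(enc, identity)
		if err != nil {
			_ = enc.Close()
			return fmt.Errorf("decrypt blob %s: %w", c.BlobHash, err)
		}
		zr, err := zstd.NewReader(plain)
		if err != nil {
			_ = enc.Close()
			return fmt.Errorf("zstd reader: %w", err)
		}
		// Skip to the chunk's offset in the decompressed stream, then copy
		// exactly Length bytes into the output file.
		_, err = io.CopyN(io.Discard, zr, c.Offset)
		if err == nil {
			_, err = io.CopyN(out, zr, c.Length)
		}
		zr.Close()
		_ = enc.Close()
		if err != nil {
			return fmt.Errorf("copy chunk from blob %s: %w", c.BlobHash, err)
		}
	}
	return nil
}
```
A real implementation would group chunks by blob so the same blob is not fetched and decrypted repeatedly.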
### Path Rules
* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first 2 hex bytes of the blob hash.
### Blob Manifest Format
The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:
```json
{
"snapshot_id": "2023-10-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0...",
...
]
}
```
This allows pruning operations to determine which blobs are referenced without requiring decryption keys.
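A minimal sketch of how a prune tool might read this manifest; the struct fields simply mirror the JSON keys above, and using `github.com/klauspost/compress/zstd` for decompression is an assumption, not a statement about the actual implementation.
```go
package prune

import (
	"encoding/json"
	"io"

	"github.com/klauspost/compress/zstd"
)

type blobManifest struct {
	SnapshotID string   `json:"snapshot_id"`
	BlobHashes []string `json:"blob_hashes"`
}

// readManifest decompresses and decodes <snapshot_id>.manifest.json.zst.
func readManifest(r io.Reader) (*blobManifest, error) {
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var m blobManifest
	if err := json.NewDecoder(zr).Decode(&m); err != nil {
		return nil, err
	}
	return &m, nil
}
```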
---
## 3. Local SQLite Index Schema (source host)
```sql
CREATE TABLE files (
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL
);
-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx)
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
sha256 TEXT NOT NULL,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
blob_hash TEXT PRIMARY KEY,
final_hash TEXT NOT NULL,
created_ts INTEGER NOT NULL
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
vaultik_git_revision TEXT NOT NULL,
created_ts INTEGER NOT NULL,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL
);
```
---
## 4. Snapshot Metadata Schema (stored in S3)
Identical schema to the local index, filtered to live snapshot state. Stored
as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than
a configured `chunk_size`, it is split and uploaded as:
```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```
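A sketch of the splitting step, under the assumption that the already compressed and encrypted metadata stream is simply cut into fixed-size pieces that a restorer concatenates before decrypting; `uploadPart` is a hypothetical stand-in for the S3 upload.
```go
package metadata

import (
	"fmt"
	"io"
	"os"
)

// splitAndUpload cuts the already compressed+encrypted metadata file into
// chunkSize pieces named metadata/<snapshot_id>.sqlite.00.age, .01.age, ...
func splitAndUpload(path, snapshotID string, chunkSize int64,
	uploadPart func(key string, data []byte) error) error {

	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, chunkSize)
	for seq := 0; ; seq++ {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			key := fmt.Sprintf("metadata/%s.sqlite.%02d.age", snapshotID, seq)
			if uerr := uploadPart(key, buf[:n]); uerr != nil {
				return uerr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // last (possibly short) part written
		}
		if err != nil {
			return err
		}
	}
}
```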
---
## 5. Data Flow
### 5.1 Backup
1. Load config
2. Open local SQLite index
3. Walk source directories:
* For each file:
* Check mtime and size in index
* If changed or new:
* Chunk file
* For each chunk:
* Hash with SHA256
* Check if already uploaded
* If not:
* Add chunk to blob packer
* Record file-chunk mapping in index
4. When blob reaches threshold size (e.g. 1GB):
* Compress with `zstd`
* Encrypt with `age`
* Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
* Record blob-chunk layout in local index
5. Once all files are processed:
* Build snapshot SQLite DB from index delta
* Compress + encrypt
* If larger than `chunk_size`, split into parts
* Upload to:
`s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create snapshot record in local index that lists:
* snapshot ID
* hostname
* vaultik version
* timestamp
* counts of files, chunks, and blobs
* list of all blobs referenced in the snapshot (some new, some old) for
efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in snapshot
13. Compress manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the
snapshot, based on local state db
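A condensed sketch of steps 3-4 above. `Index`, `Chunker`, and `Packer` are illustrative interfaces, not the real vaultik types; the walk skips unchanged files and feeds only unseen chunks to the packer, which is assumed to handle compression, encryption, and upload when the size threshold is hit.
```go
package backup

import (
	"io/fs"
	"os"
	"path/filepath"
)

// Chunk is one content-defined chunk of a file.
type Chunk struct {
	Hash string // SHA256 of the chunk data
	Data []byte
}

// Index answers "has this file changed since the last run?" from the local
// SQLite index.
type Index interface {
	Unchanged(path string, mtime, size int64) bool
}

// Chunker splits a file into content-defined chunks.
type Chunker interface {
	Chunks(f *os.File) ([]Chunk, error)
}

// Packer buffers chunks and flushes (compress + encrypt + upload) whenever
// the blob size threshold is reached.
type Packer interface {
	Seen(hash string) bool // already uploaded in a previous run?
	Add(c Chunk) error
}

func walkAndPack(root string, idx Index, ck Chunker, pk Packer) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		// Step 3: skip files whose mtime and size are unchanged.
		if idx.Unchanged(path, info.ModTime().Unix(), info.Size()) {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		chunks, err := ck.Chunks(f)
		if err != nil {
			return err
		}
		for _, c := range chunks {
			if pk.Seen(c.Hash) {
				continue // deduplicated: chunk already in a stored blob
			}
			if err := pk.Add(c); err != nil {
				return err
			}
		}
		return nil
	})
}
```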
### 5.2 Manual Prune
1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract set of referenced blob hashes from manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
* If the hash is not in the manifest:
* Issue `DeleteObject` to remove it
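A sketch of this prune loop using the `minio-go` client (one plausible choice; the actual implementation may differ). `referenced` is assumed to be the set of blob hashes taken from the decompressed manifest.
```go
package prune

import (
	"context"
	"fmt"
	"path"
	"strings"

	"github.com/minio/minio-go/v7"
)

// pruneBlobs deletes every blob object whose hash is not in the referenced
// set taken from the latest snapshot's manifest.
func pruneBlobs(ctx context.Context, c *minio.Client, bucket, prefix string,
	referenced map[string]bool, dryRun bool) error {

	opts := minio.ListObjectsOptions{Prefix: prefix + "blobs/", Recursive: true}
	for obj := range c.ListObjects(ctx, bucket, opts) {
		if obj.Err != nil {
			return obj.Err
		}
		// blobs/<aa>/<bb>/<hash>.zst.age -> <hash>
		hash := strings.TrimSuffix(path.Base(obj.Key), ".zst.age")
		if referenced[hash] {
			continue // still referenced by the latest snapshot
		}
		if dryRun {
			fmt.Printf("would delete %s\n", obj.Key)
			continue
		}
		if err := c.RemoveObject(ctx, bucket, obj.Key, minio.RemoveObjectOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```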
### 5.3 Verify
Verify runs on a host that has no local state but does have access to the bucket.
1. Fetch latest metadata snapshot files from S3
2. Fetch latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and
reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
* Fetch the blob metadata from the snapshot db
* Ensure the blob exists in S3
* Check the S3 content hash matches the expected blob hash
* If not using --quick mode:
* Download and decrypt the blob
* Decompress and verify chunk hashes match metadata
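A sketch of the `--quick` path: confirm each referenced blob exists without downloading it. It assumes the blob hashes have already been read from the decrypted snapshot DB and that objects follow the key layout above; checking the S3 content hash is omitted here.
```go
package verify

import (
	"context"
	"fmt"

	"github.com/minio/minio-go/v7"
)

// quickVerify confirms every referenced blob exists in the bucket.
func quickVerify(ctx context.Context, c *minio.Client, bucket, prefix string,
	blobHashes []string) error {

	for _, h := range blobHashes {
		key := fmt.Sprintf("%sblobs/%s/%s/%s.zst.age", prefix, h[:2], h[2:4], h)
		if _, err := c.StatObject(ctx, bucket, key, minio.StatObjectOptions{}); err != nil {
			return fmt.Errorf("blob %s missing or unreadable: %w", h, err)
		}
	}
	return nil
}
```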
---
## 6. CLI Commands
```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```
* `VAULTIK_PRIVATE_KEY` is required for the `restore`, `prune`, `verify`, and
`fetch` commands.
* It is an environment variable containing the age private key.
---
## 7. Function and Method Signatures
### 7.1 CLI
```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```
### 7.2 Configuration
```go
type Config struct {
BackupPubKey string // age recipient
BackupInterval time.Duration // used in daemon mode, irrelevant for cron mode
BlobSizeLimit int64 // default 10GB
ChunkSize int64 // default 10MB
Exclude []string // list of regex of files to exclude from backup, absolute path
Hostname string
IndexPath string // path to local SQLite index db, default /var/lib/vaultik/index.db
MetadataPrefix string // S3 prefix for metadata, default "metadata/"
MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour - for daemon mode
S3 S3Config // S3 configuration
ScanInterval time.Duration // interval to full stat() scan source dirs, default 24h
SourceDirs []string // list of source directories to back up, absolute paths
}
type S3Config struct {
Endpoint string
Bucket string
Prefix string
AccessKeyID string
SecretAccessKey string
Region string
}
func Load(path string) (*Config, error)
```
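A plausible `Load` implementation (a sketch only), assuming the `Config` struct carries YAML tags omitted in the listing above; the defaults mirror the comments in the struct.
```go
package config

import (
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

// Load reads the YAML config into Config, applying the defaults noted in the
// struct comments above.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	cfg := &Config{
		BlobSizeLimit:     10 << 30, // 10GB
		ChunkSize:         10 << 20, // 10MB
		IndexPath:         "/var/lib/vaultik/index.db",
		MetadataPrefix:    "metadata/",
		MinTimeBetweenRun: time.Hour,
		ScanInterval:      24 * time.Hour,
	}
	if err := yaml.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```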
### 7.3 Index
```go
type Index struct {
db *sql.DB
}
func OpenIndex(path string) (*Index, error)
func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```
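A sketch of how `LookupFile` could be implemented against the section 3 schema; the `Index` type is re-declared here so the example is self-contained, and the exact queries are illustrative.
```go
package index

import "database/sql"

type Index struct {
	db *sql.DB
}

// LookupFile returns the stored chunk hashes for path and true when the
// recorded mtime and size match (i.e. the file is unchanged); otherwise it
// returns false so the caller re-chunks the file.
func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error) {
	var id string
	var storedMtime, storedSize int64
	err := ix.db.QueryRow(
		`SELECT id, mtime, size FROM files WHERE path = ?`, path,
	).Scan(&id, &storedMtime, &storedSize)
	if err == sql.ErrNoRows {
		return nil, false, nil // new file
	}
	if err != nil {
		return nil, false, err
	}
	if storedMtime != mtime || storedSize != size {
		return nil, false, nil // changed file
	}

	rows, err := ix.db.Query(
		`SELECT chunk_hash FROM file_chunks WHERE file_id = ? ORDER BY idx`, id)
	if err != nil {
		return nil, false, err
	}
	defer rows.Close()

	var hashes []string
	for rows.Next() {
		var h string
		if err := rows.Scan(&h); err != nil {
			return nil, false, err
		}
		hashes = append(hashes, h)
	}
	return hashes, true, rows.Err()
}
```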
### 7.4 Blob Packing
```go
type BlobWriter struct {
// internal buffer, current size, encrypted writer, etc
}
func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```
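A hypothetical, in-memory version of `BlobWriter` to show the shape of this API; the real packer streams data and records per-chunk offsets, and hashing the ciphertext to name the blob is an assumption here.
```go
package blob

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// BlobWriter buffers chunks in memory for this sketch; the real packer
// streams data and records per-chunk offsets for blob_chunks.
type BlobWriter struct {
	buf       bytes.Buffer
	recipient age.Recipient
}

func NewBlobWriter(recipient age.Recipient) *BlobWriter {
	return &BlobWriter{recipient: recipient}
}

func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error {
	_, err := bw.buf.Write(chunk)
	return err
}

// Flush compresses then encrypts the buffered chunks and returns the SHA256
// of the final encrypted blob (assumed here to be the blob's object name).
func (bw *BlobWriter) Flush() (string, error) {
	var out bytes.Buffer // holds the encrypted blob, ready for upload
	h := sha256.New()

	encW, err := age.Encrypt(io.MultiWriter(&out, h), bw.recipient)
	if err != nil {
		return "", err
	}
	zw, err := zstd.NewWriter(encW)
	if err != nil {
		return "", err
	}
	if _, err := zw.Write(bw.buf.Bytes()); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	if err := encW.Close(); err != nil {
		return "", err
	}
	bw.buf.Reset()
	return hex.EncodeToString(h.Sum(nil)), nil
}
```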
### 7.5 Metadata
```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```
### 7.6 Prune
```go
func RunPrune(bucket, prefix, privateKey string) error
```

View File

@ -11,7 +11,7 @@ LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
# Default target
all: vaultik
all: test
# Run tests
test: lint fmt-check
@ -39,8 +39,8 @@ lint:
golangci-lint run
# Build binary
vaultik: internal/*/*.go cmd/vaultik/*.go
go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
build:
go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
# Clean build artifacts
clean:
@ -59,8 +59,4 @@ test-coverage:
# Run integration tests
test-integration:
go test -v -tags=integration ./...
local:
VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
go test -v -tags=integration ./...

View File

@ -1,556 +0,0 @@
# Vaultik Snapshot Creation Process
This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.
## Database Schema Overview
### Tables and Foreign Key Dependencies
```
┌─────────────────────────────────────────────────────────────────────────┐
│ FOREIGN KEY GRAPH │
│ │
│ snapshots ◄────── snapshot_files ────────► files │
│ │ │ │
│ └───────── snapshot_blobs ────────► blobs │ │
│ │ │ │
│ │ ├──► file_chunks ◄── chunks│
│ │ │ ▲ │
│ │ └──► chunk_files ────┘ │
│ │ │
│ └──► blob_chunks ─────────────┘│
│ │
│ uploads ───────► blobs.blob_hash │
│ └──────────► snapshots.id │
└─────────────────────────────────────────────────────────────────────────┘
```
### Critical Constraint: `chunks` Must Exist First
These tables reference `chunks.chunk_hash` **without CASCADE**:
- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`
**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
### Order of Operations Required by Schema
```
1. snapshots (created first, before scan)
2. blobs (created when packer starts new blob)
3. chunks (created during file processing)
4. blob_chunks (created immediately after chunk added to packer)
5. files (created after file fully chunked)
6. file_chunks (created with file record)
7. chunk_files (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads (created after blob uploaded)
```
---
## Snapshot Creation Phases
### Phase 0: Initialization
**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table
**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
COMMIT
```
---
### Phase 1: Scan Directory
**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing
**Transactions:**
```
(None during scan - all in-memory)
```
---
### Phase 1b: Associate Unchanged Files
**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000
**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
... (up to 1000 inserts)
COMMIT
```
---
### Phase 2: Process Files
For each file that needs processing:
#### Step 2a: Open and Chunk File
**Location:** `processFileStreaming()`
For each chunk produced by content-defined chunking:
##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns
// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```
**Transaction:**
```
T_chunk: BEGIN
INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
COMMIT
```
##### Step 2a-3: Add Chunk to Packer
```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```
**Inside packer.AddChunk → addChunkToCurrentBlob():**
```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
blobChunk := &database.BlobChunk{
BlobID: p.currentBlob.id,
ChunkHash: chunk.Hash,
Offset: offset,
Length: chunkSize,
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
})
// COMMIT immediately
}
```
**Transaction:**
```
T_blob_chunk: BEGIN
INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.
---
#### Step 2b: Blob Size Limit Handling
If adding a chunk would exceed blob size limit:
```go
if err == blob.ErrBlobSizeLimitExceeded {
if err := s.packer.FinalizeBlob(); err != nil { ... }
// Retry adding the chunk
if err := s.packer.AddChunk(...); err != nil { ... }
}
```
**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
COMMIT
```
Then blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)
T_blob_uploaded: BEGIN
UPDATE blobs SET uploaded_ts=? WHERE id=?
INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
COMMIT
```
---
#### Step 2c: Queue File for Batch Insertion
After all chunks for a file are processed:
```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))
// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
})
```
**No transaction yet** - just adds to `pendingFiles` slice.
If `len(pendingFiles) >= fileBatchSize (100)`, triggers `flushPendingFiles()`.
---
### Step 2d: Flush Pending Files
**Location:** `flushPendingFiles()` - called when batch is full or at end of processing
```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
for _, data := range files {
// 1. Create file record
s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE
// 2. Delete old associations
s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)
// 3. Create file_chunks records
for _, fc := range data.fileChunks {
s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
}
// 4. Create chunk_files records
for _, cf := range data.chunkFiles {
s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
}
// 5. Add file to snapshot
s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
}
return nil
})
// COMMIT (all or nothing for the batch)
```
**Transaction:**
```
T_files_batch: BEGIN
-- For each file in batch:
INSERT OR REPLACE INTO files (...) VALUES (...)
DELETE FROM file_chunks WHERE file_id = ?
DELETE FROM chunk_files WHERE file_id = ?
INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?) -- FK: chunks
INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
-- Repeat for each file
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.
---
### Phase 2 End: Final Flush
```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }
// Final packer flush
s.packer.Flush()
```
---
## The Current Bug
### Problem
The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, but if the chunks haven't been committed yet, the FK constraint fails.
### Why It's Happening
Looking at the sequence:
1. Process file A, chunk X
2. Create chunk X in DB (Transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (Transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)
The chunks ARE being created individually. But something is going wrong.
### Actual Issue
Wait - let me re-read the code. The issue is:
In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID,
Idx: ci.fileChunk.Idx,
ChunkHash: ci.fileChunk.ChunkHash,
}
```
The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!
Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
return file, true // file.ID is empty string!
}
// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```
**For NEW files, `file.ID` is empty!**
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```
But `data.fileChunks` was built with the EMPTY ID!
### The Real Problem
For new files:
1. `checkFileInMemory` creates file record with empty ID
2. `processFileStreaming` queues file_chunks with empty `FileID`
3. `flushPendingFiles` creates file (generates ID), but file_chunks still have empty `FileID`
Wait, but `Files.Create` should be INSERT OR REPLACE by path, and the file struct should get updated... Let me check.
Actually, looking more carefully at the code path - the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID. The ID isn't updated after the file is created.
Hmm, but looking at the current code:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```
And in `checkFileInMemory` for new files, we create a file struct but don't set the ID. However, looking at the database repository, `Files.Create` should be doing `INSERT OR REPLACE` and the ID should be pre-generated...
Let me check if IDs are being generated. Looking at the File struct usage, it seems like UUIDs should be generated somewhere...
Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```
Error 787 is SQLite's foreign key constraint error. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.
So the chunks ARE NOT in the database when we try to insert file_chunks. Let me trace through more carefully...
---
## Transaction Timing Issue
The problem is transaction visibility in SQLite.
Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:
1. Chunk transactions commit one at a time
2. File batch transaction runs later
If chunks are being inserted but something goes wrong with transaction isolation, the file batch might not see them.
But actually SQLite in WAL mode should have SERIALIZABLE isolation by default, so committed transactions should be visible.
Let me check if the in-memory cache is masking a database problem...
Actually, wait. Let me re-check the current broken code more carefully. The issue might be simpler.
---
## Current Code Flow Analysis
Looking at `processFileStreaming` in the current broken state:
```go
// For each chunk:
if !chunkExists {
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// ... check error ...
s.addKnownChunk(chunk.Hash)
}
// ... add to packer (creates blob_chunks) ...
// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```
Then at end of function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
})
```
At end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```
The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.
Unless... there's a bug in how the chunks are being referenced. Let me check if the chunk_hash values are correct.
Or... maybe the test database is being recreated between operations somehow?
Actually, let me check the test setup. Maybe the issue is specific to the test environment.
---
## Summary of Object Lifecycle
| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |
---
## Root Cause Analysis
After detailed analysis, I believe the issue is one of the following:
### Hypothesis 1: File ID Not Set
Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
return file, true // file.ID is empty string!
}
```
For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID, // Empty for new files!
...
}
```
The `FileID` in the built `fileChunks` slice is empty.
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
s.repos.FileChunks.Create(...) // Uses empty FileID
}
```
**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
ID: uuid.New().String(), // Generate ID immediately
Path: path,
...
}
```
### Hypothesis 2: Transaction Isolation
SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions. Committed data should be visible to subsequent transactions.
However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.
## Recommended Fix
**Step 1: Generate file IDs upfront**
In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
ID: uuid.New().String(), // Always generate ID
Path: path,
...
}
```
This ensures `file.ID` is set when building `fileChunks` and `chunkFiles` slices.
**Step 2: Verify by reverting to per-file transactions**
If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:
```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})
// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
// Create file
s.repos.Files.Create(txCtx, tx, fileToProcess.File)
// Delete old associations
s.repos.FileChunks.DeleteByFileID(...)
s.repos.ChunkFiles.DeleteByFileID(...)
// Create new associations
for _, fc := range fileChunks {
s.repos.FileChunks.Create(...)
}
for _, cf := range chunkFiles {
s.repos.ChunkFiles.Create(...)
}
// Add to snapshot
s.repos.Snapshots.AddFileByID(...)
return nil
})
```
**Step 3: If batching is still desired**
After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.

433
README.md
View File

@ -1,27 +1,39 @@
# vaultik (ваултик)
WIP: pre-1.0, some functions may not be fully implemented yet
`vaultik` is an incremental backup daemon written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.
`vaultik` is an incremental backup daemon written in Go. It
encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. It requires no private
keys, secrets, or credentials stored on the backed-up system.
It includes table-stakes features such as:
* modern encryption (the excellent `age`)
* modern authenticated encryption
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in a standard SQLite database, enabling write-only
incremental backups to the destination
* local state tracking in standard SQLite database
* inotify-based change detection
* streaming processing of all data, so it does not require large amounts of
RAM or temp file storage
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
down) even if the source system has many small files
## what
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally. All encrypted data is
streaming-processed and immediately discarded once uploaded. Metadata is
encrypted and pushed with the same mechanism.
## why
Existing backup software fails under one or more of these conditions:
@ -30,48 +42,16 @@ Existing backup software fails under one or more of these conditions:
compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Creates one-blob-per-file, which results in excessive S3 operation counts
* is slow
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
`vaultik` addresses these by using:
My requirements are:
* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## design goals
1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
## what
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
database of metadata is created, encrypted, and uploaded alongside the
blobs.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.
* Public-key-only encryption (via `age`) requires no secrets (other than the
remote storage API key) on the source system
* Local state cache for incremental detection does not require reading from
or decrypting remote storage
* Content-addressed immutable storage allows efficient deduplication
* Storage only of large encrypted blobs of configurable size (1G by default)
reduces S3 operation counts and improves performance
## how
@ -81,63 +61,59 @@ passphrase is needed or stored locally.
go install git.eeqj.de/sneak/vaultik@latest
```
1. **generate keypair**
2. **generate keypair**
```sh
age-keygen -o agekey.txt
grep 'public key:' agekey.txt
```
1. **write config**
3. **write config**
```yaml
# Named snapshots - each snapshot can contain multiple paths
snapshots:
system:
paths:
- /etc
- /var/lib
exclude:
- '*.cache' # Snapshot-specific exclusions
home:
paths:
- /home/user/documents
- /home/user/photos
# Global exclusions (apply to all snapshots)
source_dirs:
- /etc
- /home/user/data
exclude:
- '*.log'
- '*.tmp'
- '.git'
- 'node_modules'
age_recipients:
- age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
age_recipient: age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
s3:
# endpoint is optional if using AWS S3, but who even does that?
endpoint: https://s3.example.com
bucket: vaultik-data
prefix: host1/
access_key_id: ...
secret_access_key: ...
region: us-east-1
backup_interval: 1h
full_scan_interval: 24h
min_time_between_run: 15m
backup_interval: 1h # only used in daemon mode, not for --cron mode
full_scan_interval: 24h # normally we use inotify to mark dirty, but
# every 24h we do a full stat() scan
min_time_between_run: 15m # again, only for daemon mode
#index_path: /var/lib/vaultik/index.sqlite
chunk_size: 10MB
blob_size_limit: 1GB
blob_size_limit: 10GB
```
1. **run**
4. **run**
```sh
# Create all configured snapshots
vaultik --config /etc/vaultik.yaml snapshot create
```
# Create specific snapshots by name
vaultik --config /etc/vaultik.yaml snapshot create home system
```sh
vaultik --config /etc/vaultik.yaml snapshot create --cron # silent unless error
```
# Silent mode for cron
vaultik --config /etc/vaultik.yaml snapshot create --cron
```sh
vaultik --config /etc/vaultik.yaml snapshot daemon # runs continuously in foreground, uses inotify to detect changes
# TODO
* make sure daemon mode does not make a snapshot if no files have
changed, even if the backup_interval has passed
* in daemon mode, if enough time has passed since the last snapshot and we
get an inotify event, we should schedule the next snapshot creation for 10
minutes after the mark-dirty event.
```
---
@ -147,211 +123,76 @@ passphrase is needed or stored locally.
### commands
```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot create [--cron] [--daemon]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] store info
# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags. it should be
# 'vaultik restore snapshot <snapshot> --target <dir>'. bucket and prefix are always
# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
```
### environment
* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.
* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file. If set, config file path doesn't need to be specified on the command line.
### command details
**snapshot create**: Perform incremental backup of configured snapshots
**snapshot create**: Perform incremental backup of configured directories
* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup
**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format
**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)
**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt
**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
**snapshot prune**: Clean orphaned data from local database
**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories
**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs
**info**: Display system and configuration information
**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob hashes (not just existence)
**store info**: Display S3 bucket configuration and storage statistics
**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure
**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs
**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename
**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes
---
## architecture
### s3 bucket layout
```
s3://<bucket>/<prefix>/
├── blobs/
│ └── <aa>/<bb>/<full_blob_hash>
└── metadata/
├── <snapshot_id>/
│ ├── db.zst.age
│ └── manifest.json.zst
```
* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
### blob manifest format
The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
```json
{
"snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0..."
]
}
```
Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
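A small sketch of parsing this ID format (assuming neither hostname nor snapshot name contains underscores):
```go
package snapshot

import (
	"fmt"
	"strings"
	"time"
)

// parseSnapshotID splits "<hostname>_<snapshot-name>_<timestamp>".
func parseSnapshotID(id string) (hostname, name string, ts time.Time, err error) {
	parts := strings.SplitN(id, "_", 3)
	if len(parts) != 3 {
		return "", "", time.Time{}, fmt.Errorf("malformed snapshot id: %q", id)
	}
	ts, err = time.Parse(time.RFC3339, parts[2])
	if err != nil {
		return "", "", time.Time{}, err
	}
	return parts[0], parts[1], ts, nil
}
```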
### local sqlite schema
```sql
CREATE TABLE files (
id TEXT PRIMARY KEY,
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL
);
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT NOT NULL UNIQUE,
uncompressed INTEGER NOT NULL,
compressed INTEGER NOT NULL,
uploaded_at INTEGER
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL,
total_size INTEGER NOT NULL,
blob_size INTEGER NOT NULL,
compression_ratio REAL NOT NULL
);
CREATE TABLE snapshot_files (
snapshot_id TEXT NOT NULL,
file_id TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_id)
);
CREATE TABLE snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id)
);
```
### data flow
#### backup
1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support
#### restore
1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid
#### prune
1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set
### chunking
* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
@ -362,13 +203,19 @@ CREATE TABLE snapshot_blobs (
* Each blob encrypted independently
* Metadata databases also encrypted
### compression
### storage
* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency
* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
---
### state tracking
* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode
## does not
@ -378,6 +225,8 @@ CREATE TABLE snapshot_blobs (
* Require a symmetric passphrase or password
* Trust the source system with anything
---
## does
* Incremental deduplicated backup
@ -389,16 +238,70 @@ CREATE TABLE snapshot_blobs (
---
## requirements
## restore
* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)
`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:
## license
* `VAULTIK_PRIVATE_KEY`
* The bucket
The entire system is restore-only from object storage.
---
## features
### daemon mode
* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)
### cron mode
* Single backup run
* Silent output unless errors
* Ideal for scheduled backups
### metadata integrity
* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems
### exclusion patterns
* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk
## prune
Run `vaultik prune` on a machine with the private key. It:
* Downloads the most recent snapshot
* Decrypts metadata
* Lists referenced blobs
* Deletes any blob in the bucket not referenced
This enables garbage collection from immutable storage.
---
## LICENSE
[MIT](https://opensource.org/license/mit/)
---
## requirements
* Go 1.24.4 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)
## author
Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

86
TODO-verify.md Normal file
View File

@ -0,0 +1,86 @@
# TODO: Implement Verify Command
## Overview
Implement the `verify` command to check snapshot integrity. Both shallow and deep verification require the age_secret_key from config to decrypt the database index.
## Implementation Steps
### 1. Update Config Structure
- Add `AgeSecretKey string` field to the Config struct in `internal/config/config.go`
- Add corresponding `age_secret_key` YAML tag
- Ensure the field is properly loaded from config file
### 2. Remove Command Line Flags
- Remove --bucket, --prefix, and --snapshot flags from:
- `internal/cli/verify.go`
- `internal/cli/restore.go`
- `internal/cli/fetch.go`
- Update all commands to use bucket/prefix from config instead of flags
- Update verify command to take snapshot ID as first positional argument
### 3. Implement Shallow Verification
**Requires age_secret_key from config**
1. Download from S3:
- `metadata/{snapshot-id}/manifest.json.zst`
- `metadata/{snapshot-id}/db.zst.age`
2. Process files:
- Decompress manifest (not encrypted)
- Decrypt db.zst.age using age_secret_key
- Decompress decrypted database
- Load SQLite database from dump
3. Verify integrity:
- Query snapshot_blobs table for all blobs in this snapshot
- Compare DB blob list against manifest blob list
- **FAIL IMMEDIATELY** if lists don't match exactly
4. For each blob in manifest:
- Use S3 HeadObject to check existence
- **FAIL IMMEDIATELY** if blob is missing
- Verify blob hash matches filename
- **FAIL IMMEDIATELY** if hash mismatch
5. Only report success if ALL checks pass
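An illustrative sketch of the manifest-versus-database comparison in step 3, assuming the decrypted snapshot database is open as a `database/sql` handle:
```go
package verify

import (
	"database/sql"
	"fmt"
)

// compareBlobLists fails on any mismatch between the snapshot_blobs table
// and the blob hashes listed in the manifest.
func compareBlobLists(db *sql.DB, snapshotID string, manifestHashes []string) error {
	rows, err := db.Query(
		`SELECT blob_hash FROM snapshot_blobs WHERE snapshot_id = ?`, snapshotID)
	if err != nil {
		return err
	}
	defer rows.Close()

	inDB := make(map[string]bool)
	for rows.Next() {
		var h string
		if err := rows.Scan(&h); err != nil {
			return err
		}
		inDB[h] = true
	}
	if err := rows.Err(); err != nil {
		return err
	}

	inManifest := make(map[string]bool, len(manifestHashes))
	for _, h := range manifestHashes {
		inManifest[h] = true
		if !inDB[h] {
			return fmt.Errorf("blob %s listed in manifest but not in snapshot DB", h)
		}
	}
	for h := range inDB {
		if !inManifest[h] {
			return fmt.Errorf("blob %s in snapshot DB but missing from manifest", h)
		}
	}
	return nil
}
```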
### 4. Implement Deep Verification
**Requires age_secret_key from config**
1. Run all shallow verification first (fail on any error)
2. For each blob referenced in snapshot:
- Download blob from S3
- Decrypt using age_secret_key (streaming)
- Decompress (streaming)
- Parse blob structure to extract chunks
3. For each chunk in blob:
- Calculate SHA256 of chunk data
- Query database for expected chunk hash
- **FAIL IMMEDIATELY** if calculated != expected
- Verify chunks are ordered correctly by offset
- **FAIL IMMEDIATELY** if chunks out of order
4. Progress reporting:
- Show blob-by-blob progress
- Show chunk verification within each blob
- But continue only if no errors
5. Only report success if ALL blobs and ALL chunks verify
### 5. Error Handling
- **FAIL IMMEDIATELY** if age_secret_key missing from config
- **FAIL IMMEDIATELY** on decryption failure
- **FAIL IMMEDIATELY** on any verification mismatch
- Use log.Fatal() or return error to ensure non-zero exit code
- Provide clear error messages indicating exactly what failed
## Success Criteria
- Verify command exits with code 0 only if ALL checks pass
- Any failure results in non-zero exit code
- Clear error messages for each failure type
- Progress reporting during verification
- Works with remote-only snapshots (not in local DB)

208
TODO.md
View File

@ -1,97 +1,155 @@
# Vaultik 1.0 TODO
# Implementation TODO
Linear list of tasks to complete before 1.0 release.
## Proposed: Store and Snapshot Commands
## CLI Polish (Priority)
### Overview
Reorganize commands to provide better visibility into stored data and snapshots.
1. Improve error messages throughout
- Ensure all errors include actionable context
- Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")
### Command Structure
## Security (Priority)
#### `vaultik store` - Storage information commands
- `vaultik store info`
- Lists S3 bucket configuration
- Shows total number of snapshots (from metadata/ listing)
- Shows total number of blobs (from blobs/ listing)
- Shows total size of all blobs
- **No decryption required** - uses S3 listing only
1. Audit encryption implementation
- Verify age encryption is used correctly
- Ensure no plaintext leaks in logs or errors
- Verify blob hashes are computed correctly
#### `vaultik snapshot` - Snapshot management commands
- `vaultik snapshot create [path]`
- Renamed from `vaultik backup`
- Same functionality as current backup command
- `vaultik snapshot list [--json]`
- Lists all snapshots with:
- Snapshot ID
- Creation timestamp (parsed from snapshot ID)
- Compressed size (sum of referenced blob sizes from manifest)
- **No decryption required** - uses blob manifests only
- `--json` flag outputs in JSON format instead of table
- `vaultik snapshot purge`
- Requires one of:
- `--keep-latest` - keeps only the most recent snapshot
- `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
- Removes snapshot metadata and runs pruning to clean up unreferenced blobs
- Shows what would be deleted and requires confirmation
1. Secure memory handling for secrets
- Clear S3 credentials from memory after client init
- Document that age_secret_key is env-var only (already implemented)
- `vaultik snapshot verify [--deep] <snapshot-id>`
- Basic mode: Verifies all blobs referenced in manifest exist in S3
- `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
- **Stub implementation for now**
## Testing
### Implementation Notes
1. Write integration tests for restore command
1. **No Decryption Required**: All commands work with unencrypted blob manifests
2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
4. **Size Calculations**: Sum blob sizes from S3 object metadata
5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `2024-01-15-143052-hostname`)
6. **S3 Metadata**: Only used for `snapshot verify` command
1. Write end-to-end integration test
- Create backup
- Verify backup
- Restore backup
- Compare restored files to originals
### Benefits
- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations
1. Add tests for edge cases
- Empty directories
- Symlinks
- Special characters in filenames
- Very large files (multi-GB)
- Many small files (100k+)
## Chunking and Hashing
1. ~~Implement content-defined chunking~~ (done with FastCDC)
1. ~~Create streaming chunk processor~~ (done in chunker)
1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
1. ~~Add configurable chunk size parameters~~ (done in scanner)
1. ~~Write tests for chunking consistency~~ (done)
1. Add tests for error conditions
- Network failures during upload
- Disk full during restore
- Corrupted blobs
- Missing blobs
## Compression and Encryption
1. ~~Implement compression~~ (done with zlib in blob packer)
1. ~~Integrate age encryption library~~ (done in crypto package)
1. ~~Create Encryptor type for public key encryption~~ (done)
1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
1. ~~Write tests for compression and encryption~~ (done)
## Blob Packing
1. ~~Implement BlobWriter with size limits~~ (done in packer)
1. ~~Add chunk accumulation and flushing~~ (done)
1. ~~Create blob hash calculation~~ (done)
1. ~~Implement proper error handling and rollback~~ (done with transactions)
1. ~~Write tests for blob packing scenarios~~ (done)
## S3 Operations
1. ~~Integrate MinIO client library~~ (done in s3 package)
1. ~~Implement S3Client wrapper type~~ (done)
1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
1. ~~Implement retry logic~~ (handled by MinIO client)
1. ~~Write tests using MinIO container~~ (done with testcontainers)
## Backup Command - Basic
1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
1. Add file change detection using index
1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
1. Implement blob upload coordination to S3
1. Add progress reporting to stderr
1. Write integration tests for backup
## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations
## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore
## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata (see the mark-and-sweep sketch after this task list)
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios
## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification
## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command
## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior
## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode
## Performance
1. Profile and optimize restore performance
   - Parallel blob downloads
   - Streaming decompression/decryption
   - Efficient chunk reassembly
1. Add bandwidth limiting option
   - `--bwlimit` flag for upload/download speed limiting
## Documentation
1. Add man page or --help improvements
   - Detailed help for each command
   - Examples in help output
## Final Polish
1. Ensure version is set correctly in releases
1. Create release process
   - Binary releases for supported platforms
   - Checksums for binaries
   - Release notes template
1. Final code review
   - Remove debug statements
   - Ensure consistent code style
1. Tag and release v1.0.0
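Returning to the Prune Command section above: a minimal mark-and-sweep sketch of that flow. `SnapshotStore` and `BlobStore` are hypothetical stand-ins for the decrypted snapshot metadata and the S3 blob listing, not actual vaultik types:

```go
// Sketch of prune: collect every blob referenced by any remaining snapshot,
// list all blobs in the bucket, and delete the ones nothing references.
package example

import "context"

// Hypothetical interfaces standing in for the decrypted snapshot metadata
// and the S3 blob layer; the real types will differ.
type SnapshotStore interface {
	ListSnapshots(ctx context.Context) ([]string, error)
	ReferencedBlobs(ctx context.Context, snapshotID string) ([]string, error)
}

type BlobStore interface {
	ListBlobs(ctx context.Context) ([]string, error)
	DeleteBlob(ctx context.Context, hash string) error
}

// prune returns the blob hashes that were (or, with dryRun, would be) deleted.
func prune(ctx context.Context, snaps SnapshotStore, blobs BlobStore, dryRun bool) ([]string, error) {
	referenced := make(map[string]bool)
	snapshots, err := snaps.ListSnapshots(ctx)
	if err != nil {
		return nil, err
	}
	for _, id := range snapshots {
		hashes, err := snaps.ReferencedBlobs(ctx, id)
		if err != nil {
			return nil, err
		}
		for _, h := range hashes {
			referenced[h] = true // mark
		}
	}
	all, err := blobs.ListBlobs(ctx)
	if err != nil {
		return nil, err
	}
	var deleted []string
	for _, h := range all {
		if referenced[h] {
			continue
		}
		if !dryRun { // sweep
			if err := blobs.DeleteBlob(ctx, h); err != nil {
				return deleted, err
			}
		}
		deleted = append(deleted, h)
	}
	return deleted, nil
}
```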
---
## Post-1.0 (Daemon Mode)
1. Implement inotify file watcher for Linux
- Watch source directories for changes
- Track dirty paths in memory
1. Implement FSEvents watcher for macOS
- Watch source directories for changes
- Track dirty paths in memory
1. Implement backup scheduler in daemon mode
- Respect backup_interval config
- Trigger backup when dirty paths exist and interval elapsed
- Implement full_scan_interval for periodic full scans
1. Add proper signal handling for daemon
- Graceful shutdown on SIGTERM/SIGINT
- Complete in-progress backup before exit
1. Write tests for daemon mode
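One possible shape for the watcher items above, sketched with the `fsnotify` package (inotify on Linux); it is not a current dependency, so treat the import as an assumption. fsnotify watches are per-directory rather than recursive, so a real daemon would also add subdirectories as it discovers them:

```go
// Sketch: watch directories and record dirty paths in memory until the
// scheduler drains them at the next backup interval.
package example

import (
	"sync"

	"github.com/fsnotify/fsnotify"
)

type DirtyTracker struct {
	mu    sync.Mutex
	paths map[string]struct{}
}

func (d *DirtyTracker) Mark(path string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.paths == nil {
		d.paths = make(map[string]struct{})
	}
	d.paths[path] = struct{}{}
}

// Drain returns and clears the accumulated dirty paths.
func (d *DirtyTracker) Drain() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := make([]string, 0, len(d.paths))
	for p := range d.paths {
		out = append(out, p)
	}
	d.paths = make(map[string]struct{})
	return out
}

// watch marks a path dirty on any filesystem event until stop is closed.
func watch(dirs []string, tracker *DirtyTracker, stop <-chan struct{}) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer func() { _ = watcher.Close() }()
	for _, dir := range dirs {
		if err := watcher.Add(dir); err != nil {
			return err
		}
	}
	for {
		select {
		case ev, ok := <-watcher.Events:
			if !ok {
				return nil
			}
			tracker.Mark(ev.Name)
		case err, ok := <-watcher.Errors:
			if !ok {
				return nil
			}
			_ = err // log and continue in a real implementation
		case <-stop:
			return nil
		}
	}
}
```

The scheduler item above would then call `Drain` once `backup_interval` has elapsed and feed those paths into the normal scan.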
## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline

View File

@ -1,41 +1,9 @@
package main
import (
"os"
"runtime"
"runtime/pprof"
"git.eeqj.de/sneak/vaultik/internal/cli"
)
func main() {
// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
f, err := os.Create(cpuProfile)
if err != nil {
panic("could not create CPU profile: " + err.Error())
}
defer func() { _ = f.Close() }()
if err := pprof.StartCPUProfile(f); err != nil {
panic("could not start CPU profile: " + err.Error())
}
defer pprof.StopCPUProfile()
}
// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
defer func() {
f, err := os.Create(memProfile)
if err != nil {
panic("could not create memory profile: " + err.Error())
}
defer func() { _ = f.Close() }()
runtime.GC() // get up-to-date statistics
if err := pprof.WriteHeapProfile(f); err != nil {
panic("could not write memory profile: " + err.Error())
}
}()
}
cli.CLIEntry()
}

View File

@ -2,210 +2,96 @@
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need
# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Age recipient public key for encryption
# This is REQUIRED - backups are encrypted to this public key
# Generate with: age-keygen | grep "public key"
age_recipients:
- age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz
age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
apps:
paths:
- /Applications
home:
paths:
- "~"
exclude:
- "/.Trash"
- "/tmp"
- "/Library/Caches"
- "/Library/Accounts"
- "/Library/AppleMediaServices"
- "/Library/Application Support/AddressBook"
- "/Library/Application Support/CallHistoryDB"
- "/Library/Application Support/CallHistoryTransactions"
- "/Library/Application Support/DifferentialPrivacy"
- "/Library/Application Support/FaceTime"
- "/Library/Application Support/FileProvider"
- "/Library/Application Support/Knowledge"
- "/Library/Application Support/com.apple.TCC"
- "/Library/Application Support/com.apple.avfoundation/Frecents"
- "/Library/Application Support/com.apple.sharedfilelist"
- "/Library/Assistant/SiriVocabulary"
- "/Library/Autosave Information"
- "/Library/Biome"
- "/Library/ContainerManager"
- "/Library/Containers/com.apple.Home"
- "/Library/Containers/com.apple.Maps/Data/Maps"
- "/Library/Containers/com.apple.MobileSMS"
- "/Library/Containers/com.apple.Notes"
- "/Library/Containers/com.apple.Safari"
- "/Library/Containers/com.apple.Safari.WebApp"
- "/Library/Containers/com.apple.VoiceMemos"
- "/Library/Containers/com.apple.archiveutility"
- "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
- "/Library/Containers/com.apple.mail"
- "/Library/Containers/com.apple.news"
- "/Library/Containers/com.apple.stocks"
- "/Library/Cookies"
- "/Library/CoreFollowUp"
- "/Library/Daemon Containers"
- "/Library/DoNotDisturb"
- "/Library/DuetExpertCenter"
- "/Library/Group Containers/com.apple.Home.group"
- "/Library/Group Containers/com.apple.MailPersonaStorage"
- "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
- "/Library/Group Containers/com.apple.bird"
- "/Library/Group Containers/com.apple.stickersd.group"
- "/Library/Group Containers/com.apple.systempreferences.cache"
- "/Library/Group Containers/group.com.apple.AppleSpell"
- "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
- "/Library/Group Containers/group.com.apple.DeviceActivity"
- "/Library/Group Containers/group.com.apple.Journal"
- "/Library/Group Containers/group.com.apple.ManagedSettings"
- "/Library/Group Containers/group.com.apple.PegasusConfiguration"
- "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
- "/Library/Group Containers/group.com.apple.SiriTTS"
- "/Library/Group Containers/group.com.apple.UserNotifications"
- "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
- "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
- "/Library/Group Containers/group.com.apple.amsondevicestoraged"
- "/Library/Group Containers/group.com.apple.appstoreagent"
- "/Library/Group Containers/group.com.apple.calendar"
- "/Library/Group Containers/group.com.apple.chronod"
- "/Library/Group Containers/group.com.apple.contacts"
- "/Library/Group Containers/group.com.apple.controlcenter"
- "/Library/Group Containers/group.com.apple.corerepair"
- "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
- "/Library/Group Containers/group.com.apple.energykit"
- "/Library/Group Containers/group.com.apple.feedback"
- "/Library/Group Containers/group.com.apple.feedbacklogger"
- "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
- "/Library/Group Containers/group.com.apple.iCloudDrive"
- "/Library/Group Containers/group.com.apple.icloud.fmfcore"
- "/Library/Group Containers/group.com.apple.icloud.fmipcore"
- "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
- "/Library/Group Containers/group.com.apple.liveactivitiesd"
- "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
- "/Library/Group Containers/group.com.apple.mail"
- "/Library/Group Containers/group.com.apple.mlhost"
- "/Library/Group Containers/group.com.apple.moments"
- "/Library/Group Containers/group.com.apple.news"
- "/Library/Group Containers/group.com.apple.newsd"
- "/Library/Group Containers/group.com.apple.notes"
- "/Library/Group Containers/group.com.apple.notes.import"
- "/Library/Group Containers/group.com.apple.photolibraryd.private"
- "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
- "/Library/Group Containers/group.com.apple.printtool"
- "/Library/Group Containers/group.com.apple.private.translation"
- "/Library/Group Containers/group.com.apple.reminders"
- "/Library/Group Containers/group.com.apple.replicatord"
- "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
- "/Library/Group Containers/group.com.apple.sharingd"
- "/Library/Group Containers/group.com.apple.shortcuts"
- "/Library/Group Containers/group.com.apple.siri.inference"
- "/Library/Group Containers/group.com.apple.siri.referenceResolution"
- "/Library/Group Containers/group.com.apple.siri.remembers"
- "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
- "/Library/Group Containers/group.com.apple.spotlight"
- "/Library/Group Containers/group.com.apple.stocks"
- "/Library/Group Containers/group.com.apple.stocks-news"
- "/Library/Group Containers/group.com.apple.studentd"
- "/Library/Group Containers/group.com.apple.swtransparency"
- "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
- "/Library/Group Containers/group.com.apple.tips"
- "/Library/Group Containers/group.com.apple.tipsnext"
- "/Library/Group Containers/group.com.apple.transparency"
- "/Library/Group Containers/group.com.apple.usernoted"
- "/Library/Group Containers/group.com.apple.weather"
- "/Library/HomeKit"
- "/Library/IdentityServices"
- "/Library/IntelligencePlatform"
- "/Library/Mail"
- "/Library/Messages"
- "/Library/Metadata/CoreSpotlight"
- "/Library/Metadata/com.apple.IntelligentSuggestions"
- "/Library/PersonalizationPortrait"
- "/Library/Safari"
- "/Library/Sharing"
- "/Library/Shortcuts"
- "/Library/StatusKit"
- "/Library/Suggestions"
- "/Library/Trial"
- "/Library/Weather"
- "/Library/com.apple.aiml.instrumentation"
- "/Movies/TV"
system:
paths:
- /
exclude:
# Virtual/transient filesystems
- /proc
- /sys
- /dev
- /run
- /tmp
- /var/tmp
- /var/run
- /var/lock
- /var/cache
- /media
- /mnt
# Swap
- /swapfile
- /swap.img
# Package manager caches
- /var/cache/apt
- /var/cache/yum
- /var/cache/dnf
- /var/cache/pacman
# Trash
- "*/.local/share/Trash"
dev:
paths:
- /Users/user/dev
exclude:
- "**/node_modules"
- "**/target"
- "**/build"
- "**/__pycache__"
- "**/*.pyc"
- "**/.venv"
- "**/vendor"
# List of directories to backup
# These paths will be scanned recursively for files to backup
# Use absolute paths
source_dirs:
- /
# - /home
# - /etc
# - /var
# Global patterns to exclude from all backups
# Patterns to exclude from backup
# Uses glob patterns to match file paths
# Paths are matched as absolute paths
exclude:
- "*.tmp"
# System directories that should not be backed up
- /proc
- /sys
- /dev
- /run
- /tmp
- /var/tmp
- /var/run
- /var/lock
- /var/cache
- /lost+found
- /media
- /mnt
# Swap files
- /swapfile
- /swap.img
- "*.swap"
- "*.swp"
# Log files (optional - you may want to keep some logs)
- "*.log"
- "*.log.*"
- /var/log
# Package manager caches
- /var/cache/apt
- /var/cache/yum
- /var/cache/dnf
- /var/cache/pacman
# User caches and temporary files
- "*/.cache"
- "*/.local/share/Trash"
- "*/Downloads"
- "*/.thumbnails"
# Development artifacts
- "**/node_modules"
- "**/.git/objects"
- "**/target"
- "**/build"
- "**/__pycache__"
- "**/*.pyc"
# Large files you might not want to backup
- "*.iso"
- "*.img"
- "*.vmdk"
- "*.vdi"
- "*.qcow2"
# S3-compatible storage configuration
s3:
# S3-compatible endpoint URL
# Examples: https://s3.amazonaws.com, https://storage.googleapis.com
endpoint: http://10.100.205.122:8333
endpoint: https://s3.example.com
# Bucket name where backups will be stored
bucket: testbucket
bucket: my-backup-bucket
# Prefix (folder) within the bucket for this host's backups
# Useful for organizing backups from multiple hosts
# Default: empty (root of bucket)
#prefix: "hosts/myserver/"
# S3 access credentials
access_key_id: Z9GT22M9YFU08WRMC5D4
secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
access_key_id: your-access-key
secret_access_key: your-secret-key
# S3 region
# Default: us-east-1
#region: us-east-1
# Use SSL/TLS for S3 connections
# Default: true
#use_ssl: true
# Part size for multipart uploads
# Minimum 5MB, affects memory usage during upload
# Supports: 5MB, 10M, 100MiB, etc.
@ -247,8 +133,8 @@ s3:
# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5
#compression_level: 3
# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver
#hostname: myserver

9
go.mod
View File

@ -5,7 +5,6 @@ go 1.24.4
require (
filippo.io/age v1.2.1
git.eeqj.de/sneak/smartconfig v1.0.0
github.com/adrg/xdg v0.5.3
github.com/aws/aws-sdk-go-v2 v1.36.6
github.com/aws/aws-sdk-go-v2/config v1.29.18
github.com/aws/aws-sdk-go-v2/credentials v1.17.71
@ -13,12 +12,10 @@ require (
github.com/aws/aws-sdk-go-v2/service/s3 v1.84.1
github.com/aws/smithy-go v1.22.4
github.com/dustin/go-humanize v1.0.1
github.com/gobwas/glob v0.2.3
github.com/google/uuid v1.6.0
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
github.com/jotfs/fastcdc-go v0.2.0
github.com/klauspost/compress v1.18.0
github.com/mattn/go-sqlite3 v1.14.29
github.com/schollz/progressbar/v3 v3.19.0
github.com/spf13/afero v1.14.0
github.com/spf13/cobra v1.9.1
github.com/stretchr/testify v1.10.0
@ -40,6 +37,7 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.4.2 // indirect
github.com/adrg/xdg v0.5.3 // indirect
github.com/armon/go-metrics v0.4.1 // indirect
github.com/aws/aws-sdk-go v1.44.256 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.11 // indirect
@ -101,7 +99,7 @@ require (
github.com/mailru/easyjson v0.7.7 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
github.com/mattn/go-sqlite3 v1.14.29 // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
@ -112,7 +110,6 @@ require (
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/ryanuber/go-glob v1.0.0 // indirect
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
github.com/spf13/pflag v1.0.6 // indirect

14
go.sum
View File

@ -98,8 +98,6 @@ github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UF
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/cevatbarisyilmaz/ara v0.0.4 h1:SGH10hXpBJhhTlObuZzTuFn1rrdmjQImITXnZVPSodc=
github.com/cevatbarisyilmaz/ara v0.0.4/go.mod h1:BfFOxnUd6Mj6xmcvRxHN3Sr21Z1T3U2MYkYOmoQe4Ts=
github.com/chengxilo/virtualterm v1.0.4 h1:Z6IpERbRVlfB8WkOmtbHiDbBANU7cimRIof7mk9/PwM=
github.com/chengxilo/virtualterm v1.0.4/go.mod h1:DyxxBZz/x1iqJjFxTFcr6/x+jSpqN0iwWCOK1q10rlY=
github.com/circonus-labs/circonus-gometrics v2.3.1+incompatible/go.mod h1:nmEj6Dob7S7YxXgwXpfOuvO54S+tGdZdw9fuRZt25Ag=
github.com/circonus-labs/circonusllhist v0.1.3/go.mod h1:kMXHVDlOchFAehlya5ePtbp5jckzBHf4XRpQvBOLI+I=
github.com/coreos/go-semver v0.3.1 h1:yi21YpKnrx1gt5R+la8n5WgS0kCrsPp33dmEyHReZr4=
@ -151,8 +149,6 @@ github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1v
github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
github.com/go-test/deep v1.0.2 h1:onZX1rnHT3Wv6cqNgYyFOOlgVKJrksuCMCRvJStbMYw=
github.com/go-test/deep v1.0.2/go.mod h1:wGDj63lr65AM2AQyKZd/NYHGb0R+1RLqB8NKt3aSFNA=
github.com/gobwas/glob v0.2.3 h1:A4xDbljILXROh+kObIiy5kIaPYD8e96x1tgBhUI5J+Y=
github.com/gobwas/glob v0.2.3/go.mod h1:d3Ez4x06l9bZtSvzIay5+Yzi0fmZzPgnTbPcKjJAkT8=
github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
@ -251,6 +247,8 @@ github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668 h1:+Mn8Sj5V
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668/go.mod h1:t6osVdP++3g4v2awHz4+HFccij23BbdT1rX3W7IijqQ=
github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY=
github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y=
github.com/jotfs/fastcdc-go v0.2.0 h1:WHYIGk3k9NumGWfp4YMsemEcx/s4JKpGAa6tpCpHJOo=
github.com/jotfs/fastcdc-go v0.2.0/go.mod h1:PGFBIloiASFbiKnkCd/hmHXxngxYDYtisyurJ/zyDNM=
github.com/json-iterator/go v1.1.6/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU=
github.com/json-iterator/go v1.1.9/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4=
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
@ -291,8 +289,6 @@ github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27k
github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-runewidth v0.0.16 h1:E5ScNMtiwvlvB5paMFdw9p4kSQzbXFikJ5SQO6TULQc=
github.com/mattn/go-runewidth v0.0.16/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/mattn/go-sqlite3 v1.14.29 h1:1O6nRLJKvsi1H2Sj0Hzdfojwt8GiGKm+LOfLaBFaouQ=
github.com/mattn/go-sqlite3 v1.14.29/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0=
@ -301,8 +297,6 @@ github.com/miekg/dns v1.1.41 h1:WMszZWJG0XmzbK9FEmzH2TVcqYzFesusSIB41b8KHxY=
github.com/miekg/dns v1.1.41/go.mod h1:p6aan82bvRIyn+zDIv9xYNUpwa73JcSh9BKwknJysuI=
github.com/mitchellh/cli v1.0.0/go.mod h1:hNIlj7HEI86fIcpObd7a0FcrxTWetlwJDGcceTlRvqc=
github.com/mitchellh/cli v1.1.0/go.mod h1:xcISNoH86gajksDmfB23e/pu+B+GeFRMYmoHXxx3xhI=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-wordwrap v1.0.0/go.mod h1:ZXFpozHsX6DPmq2I0TCekCxypsnAUbP2oI0UX1GXzOo=
@ -355,8 +349,6 @@ github.com/redis/go-redis/v9 v9.8.0 h1:q3nRvjrlge/6UD7eTu/DSg2uYiU2mCL0G/uzBWqhi
github.com/redis/go-redis/v9 v9.8.0/go.mod h1:huWgSWd8mW6+m0VPhJjSSQ+d6Nh1VICQ6Q5lHuCH/Iw=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII=
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
@ -366,8 +358,6 @@ github.com/ryanuber/go-glob v1.0.0 h1:iQh3xXAumdQ+4Ufa5b25cRpC5TYKlno6hsv6Cb3pkB
github.com/ryanuber/go-glob v1.0.0/go.mod h1:807d1WSdnB0XRJzKNil9Om6lcp/3a0v4qIHxIXzX/Yc=
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 h1:GHRpF1pTW19a8tTFrMLUcfWwyC0pnifVo2ClaLq+hP8=
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46/go.mod h1:uAQ5PCi+MFsC7HjREoAz1BU+Mq60+05gifQSsHSDG/8=
github.com/schollz/progressbar/v3 v3.19.0 h1:Ea18xuIRQXLAUidVDox3AbwfUhD0/1IvohyTutOIFoc=
github.com/schollz/progressbar/v3 v3.19.0/go.mod h1:IsO3lpbaGuzh8zIMzgY3+J8l4C8GjO0Y9S69eFvNsec=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529 h1:nn5Wsu0esKSJiIVhscUtVbo7ada43DJhG55ua/hjS5I=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc=
github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo=

View File

@ -26,7 +26,6 @@ import (
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/google/uuid"
"github.com/spf13/afero"
)
@ -48,12 +47,6 @@ type PackerConfig struct {
Fs afero.Fs // Filesystem for temporary files
}
// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
Hash string
Size int64
}
// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
@ -71,9 +64,6 @@ type Packer struct {
// Current blob being packed
currentBlob *blobInProgress
finishedBlobs []*FinishedBlob // Only used if no handler provided
// Pending chunks to be inserted when blob finalizes
pendingChunks []PendingChunk
}
// blobInProgress represents a blob being assembled
@ -124,9 +114,8 @@ type BlobChunkRef struct {
// BlobWithReader wraps a FinishedBlob with its data reader
type BlobWithReader struct {
*FinishedBlob
Reader io.ReadSeeker
TempFile afero.File // Optional, only set for disk-based blobs
InsertedChunkHashes []string // Chunk hashes that were inserted to DB with this blob
Reader io.ReadSeeker
TempFile afero.File // Optional, only set for disk-based blobs
}
// NewPacker creates a new blob packer that accumulates chunks into blobs.
@ -163,15 +152,6 @@ func (p *Packer) SetBlobHandler(handler BlobHandler) {
p.blobHandler = handler
}
// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
p.mu.Lock()
defer p.mu.Unlock()
p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}
// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
@ -263,22 +243,19 @@ func (p *Packer) startNewBlob() error {
// Create blob record in database
if p.repos != nil {
blobIDTyped, err := types.ParseBlobID(blobID)
if err != nil {
return fmt.Errorf("parsing blob ID: %w", err)
}
blob := &database.Blob{
ID: blobIDTyped,
Hash: types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
ID: blobID,
Hash: "temp-placeholder-" + blobID, // Temporary placeholder until finalized
CreatedTS: time.Now().UTC(),
FinishedTS: nil,
UncompressedSize: 0,
CompressedSize: 0,
UploadedTS: nil,
}
if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.Blobs.Create(ctx, tx, blob)
}); err != nil {
})
if err != nil {
return fmt.Errorf("creating blob record: %w", err)
}
}
@ -337,9 +314,23 @@ func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
p.currentBlob.chunkSet[chunk.Hash] = true
// Note: blob_chunk records are inserted in batch when blob is finalized
// to reduce transaction overhead. The chunk info is already stored in
// p.currentBlob.chunks for later insertion.
// Store blob-chunk association in database immediately
if p.repos != nil {
blobChunk := &database.BlobChunk{
BlobID: p.currentBlob.id,
ChunkHash: chunk.Hash,
Offset: offset,
Length: chunkSize,
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
})
if err != nil {
log.Error("Failed to store blob-chunk association in database", "error", err,
"blob_id", p.currentBlob.id, "chunk_hash", chunk.Hash)
// Continue anyway - we can reconstruct this later if needed
}
}
// Update total size
p.currentBlob.size += chunkSize
@ -401,54 +392,16 @@ func (p *Packer) finalizeCurrentBlob() error {
})
}
// Get pending chunks (will be inserted to DB and reported to handler)
chunksToInsert := p.pendingChunks
p.pendingChunks = nil // Clear pending list
// Insert pending chunks, blob_chunks, and update blob in a single transaction
// Update blob record in database with hash and sizes
if p.repos != nil {
blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
if parseErr != nil {
p.cleanupTempFile()
return fmt.Errorf("parsing blob ID: %w", parseErr)
}
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
// First insert all pending chunks (required for blob_chunks FK)
for _, chunk := range chunksToInsert {
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(chunk.Hash),
Size: chunk.Size,
}
if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
return fmt.Errorf("creating chunk: %w", err)
}
}
// Insert all blob_chunk records in batch
for _, chunk := range p.currentBlob.chunks {
blobChunk := &database.BlobChunk{
BlobID: blobIDTyped,
ChunkHash: types.ChunkHash(chunk.Hash),
Offset: chunk.Offset,
Length: chunk.Size,
}
if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
return fmt.Errorf("creating blob_chunk: %w", err)
}
}
// Update blob record with final hash and sizes
return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
p.currentBlob.size, finalSize)
})
if err != nil {
p.cleanupTempFile()
return fmt.Errorf("finalizing blob transaction: %w", err)
return fmt.Errorf("updating blob record: %w", err)
}
log.Debug("Committed blob transaction",
"chunks_inserted", len(chunksToInsert),
"blob_chunks_inserted", len(p.currentBlob.chunks))
}
// Create finished blob
@ -471,12 +424,6 @@ func (p *Packer) finalizeCurrentBlob() error {
"ratio", fmt.Sprintf("%.2f", compressionRatio),
"duration", time.Since(p.currentBlob.startTime))
// Collect inserted chunk hashes for the scanner to track
var insertedChunkHashes []string
for _, chunk := range chunksToInsert {
insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
}
// Call blob handler if set
if p.blobHandler != nil {
// Reset file position for handler
@ -487,10 +434,9 @@ func (p *Packer) finalizeCurrentBlob() error {
// Create a blob reader that includes the data stream
blobWithReader := &BlobWithReader{
FinishedBlob: finished,
Reader: p.currentBlob.tempFile,
TempFile: p.currentBlob.tempFile,
InsertedChunkHashes: insertedChunkHashes,
FinishedBlob: finished,
Reader: p.currentBlob.tempFile,
TempFile: p.currentBlob.tempFile,
}
if err := p.blobHandler(blobWithReader); err != nil {

View File

@ -12,7 +12,6 @@ import (
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/klauspost/compress/zstd"
"github.com/spf13/afero"
)
@ -61,7 +60,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
ChunkHash: hashStr,
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@ -153,7 +152,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
ChunkHash: hashStr,
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@ -236,7 +235,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
ChunkHash: hashStr,
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
@ -323,7 +322,7 @@ func TestPacker(t *testing.T) {
// Create chunk in database first
dbChunk := &database.Chunk{
ChunkHash: types.ChunkHash(hashStr),
ChunkHash: hashStr,
Size: int64(len(data)),
}
err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {

View File

@ -6,6 +6,8 @@ import (
"fmt"
"io"
"os"
"github.com/jotfs/fastcdc-go"
)
// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
@ -46,8 +48,16 @@ func NewChunker(avgChunkSize int64) *Chunker {
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
defer chunker.Release()
opts := fastcdc.Options{
MinSize: c.minChunkSize,
AverageSize: c.avgChunkSize,
MaxSize: c.maxChunkSize,
}
chunker, err := fastcdc.NewChunker(r, opts)
if err != nil {
return nil, fmt.Errorf("creating chunker: %w", err)
}
var chunks []Chunk
offset := int64(0)
@ -64,7 +74,7 @@ func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
// Calculate hash
hash := sha256.Sum256(chunk.Data)
// Make a copy of the data since the chunker reuses the buffer
// Make a copy of the data since FastCDC reuses the buffer
chunkData := make([]byte, len(chunk.Data))
copy(chunkData, chunk.Data)
@ -97,8 +107,16 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (str
fileHasher := sha256.New()
teeReader := io.TeeReader(r, fileHasher)
chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
defer chunker.Release()
opts := fastcdc.Options{
MinSize: c.minChunkSize,
AverageSize: c.avgChunkSize,
MaxSize: c.maxChunkSize,
}
chunker, err := fastcdc.NewChunker(teeReader, opts)
if err != nil {
return "", fmt.Errorf("creating chunker: %w", err)
}
offset := int64(0)
@ -114,12 +132,13 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (str
// Calculate chunk hash
hash := sha256.Sum256(chunk.Data)
// Pass the data directly - caller must process it before we call Next() again
// (chunker reuses its internal buffer, but since we process synchronously
// and completely before continuing, no copy is needed)
// Make a copy of the data since FastCDC reuses the buffer
chunkData := make([]byte, len(chunk.Data))
copy(chunkData, chunk.Data)
if err := callback(Chunk{
Hash: hex.EncodeToString(hash[:]),
Data: chunk.Data,
Data: chunkData,
Offset: offset,
Size: int64(len(chunk.Data)),
}); err != nil {

View File

@ -1,265 +0,0 @@
package chunker
import (
"io"
"math"
"sync"
)
// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
// this implementation uses sync.Pool to reuse buffers across files.
type ReusableChunker struct {
minSize int
maxSize int
normSize int
bufSize int
maskS uint64
maskL uint64
rd io.Reader
buf []byte
cursor int
offset int
eof bool
}
// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
var reusableChunkerPool = sync.Pool{
New: func() interface{} {
return &ReusableChunker{}
},
}
// bufferPools contains pools for different buffer sizes.
// Key is the buffer size.
var bufferPools = sync.Map{}
func getBuffer(size int) []byte {
poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
New: func() interface{} {
buf := make([]byte, size)
return &buf
},
})
pool := poolI.(*sync.Pool)
return *pool.Get().(*[]byte)
}
func putBuffer(buf []byte) {
size := cap(buf)
poolI, ok := bufferPools.Load(size)
if ok {
pool := poolI.(*sync.Pool)
b := buf[:size]
pool.Put(&b)
}
}
// FastCDCChunk represents a chunk from the FastCDC algorithm.
type FastCDCChunk struct {
Offset int
Length int
Data []byte
Fingerprint uint64
}
// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
c := reusableChunkerPool.Get().(*ReusableChunker)
bufSize := maxSize * 2
// Reuse buffer if it's the right size, otherwise get a new one
if c.buf == nil || cap(c.buf) != bufSize {
if c.buf != nil {
putBuffer(c.buf)
}
c.buf = getBuffer(bufSize)
} else {
// Restore buffer to full capacity (may have been truncated by previous EOF)
c.buf = c.buf[:cap(c.buf)]
}
bits := int(math.Round(math.Log2(float64(avgSize))))
normalization := 2
smallBits := bits + normalization
largeBits := bits - normalization
c.minSize = minSize
c.maxSize = maxSize
c.normSize = avgSize
c.bufSize = bufSize
c.maskS = (1 << smallBits) - 1
c.maskL = (1 << largeBits) - 1
c.rd = rd
c.cursor = bufSize
c.offset = 0
c.eof = false
return c
}
// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
c.rd = nil
reusableChunkerPool.Put(c)
}
func (c *ReusableChunker) fillBuffer() error {
n := len(c.buf) - c.cursor
if n >= c.maxSize {
return nil
}
// Move all data after the cursor to the start of the buffer
copy(c.buf[:n], c.buf[c.cursor:])
c.cursor = 0
if c.eof {
c.buf = c.buf[:n]
return nil
}
// Restore buffer to full capacity for reading
c.buf = c.buf[:c.bufSize]
// Fill the rest of the buffer
m, err := io.ReadFull(c.rd, c.buf[n:])
if err == io.EOF || err == io.ErrUnexpectedEOF {
c.buf = c.buf[:n+m]
c.eof = true
} else if err != nil {
return err
}
return nil
}
// Next returns the next chunk or io.EOF when done.
// The returned Data slice is only valid until the next call to Next.
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
if err := c.fillBuffer(); err != nil {
return FastCDCChunk{}, err
}
if len(c.buf) == 0 {
return FastCDCChunk{}, io.EOF
}
length, fp := c.nextChunk(c.buf[c.cursor:])
chunk := FastCDCChunk{
Offset: c.offset,
Length: length,
Data: c.buf[c.cursor : c.cursor+length],
Fingerprint: fp,
}
c.cursor += length
c.offset += chunk.Length
return chunk, nil
}
func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
fp := uint64(0)
i := c.minSize
if len(data) <= c.minSize {
return len(data), fp
}
n := min(len(data), c.maxSize)
for ; i < min(n, c.normSize); i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskS) == 0 {
return i + 1, fp
}
}
for ; i < n; i++ {
fp = (fp << 1) + table[data[i]]
if (fp & c.maskL) == 0 {
return i + 1, fp
}
}
return i, fp
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}

View File

@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
}
// Verify all subcommands are registered
expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "fetch"}
for _, expected := range expectedCommands {
found := false
for _, cmd := range cmd.Commands() {

138
internal/cli/fetch.go Normal file
View File

@ -0,0 +1,138 @@
package cli
import (
"context"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// FetchOptions contains options for the fetch command
type FetchOptions struct {
}
// FetchApp contains all dependencies needed for fetch
type FetchApp struct {
Globals *globals.Globals
Config *config.Config
Repositories *database.Repositories
Storage storage.Storer
DB *database.DB
Shutdowner fx.Shutdowner
}
// NewFetchCommand creates the fetch command
func NewFetchCommand() *cobra.Command {
opts := &FetchOptions{}
cmd := &cobra.Command{
Use: "fetch <snapshot-id> <file-path> <target-path>",
Short: "Extract single file from backup",
Long: `Download and decrypt a single file from a backup snapshot.
This command extracts a specific file from the snapshot and saves it to the target path.
The age_secret_key must be configured in the config file for decryption.`,
Args: cobra.ExactArgs(3),
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
filePath := args[1]
targetPath := args[2]
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
},
Modules: []fx.Option{
snapshot.Module,
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config, repos *database.Repositories,
storer storage.Storer, db *database.DB, shutdowner fx.Shutdowner) *FetchApp {
return &FetchApp{
Globals: g,
Config: cfg,
Repositories: repos,
Storage: storer,
DB: db,
Shutdowner: shutdowner,
}
},
)),
},
Invokes: []fx.Option{
fx.Invoke(func(app *FetchApp, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the fetch operation in a goroutine
go func() {
// Run the fetch operation
if err := app.runFetch(ctx, snapshotID, filePath, targetPath, opts); err != nil {
if err != context.Canceled {
log.Error("Fetch operation failed", "error", err)
}
}
// Shutdown the app when fetch completes
if err := app.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping fetch operation")
return nil
},
})
}),
},
})
},
}
return cmd
}
// runFetch executes the fetch operation
func (app *FetchApp) runFetch(ctx context.Context, snapshotID, filePath, targetPath string, opts *FetchOptions) error {
// Check for age_secret_key
if app.Config.AgeSecretKey == "" {
return fmt.Errorf("age_secret_key missing from config - required for fetch")
}
log.Info("Starting fetch operation",
"snapshot_id", snapshotID,
"file_path", filePath,
"target_path", targetPath,
"bucket", app.Config.S3.Bucket,
"prefix", app.Config.S3.Prefix,
)
// TODO: Implement fetch logic
// 1. Download and decrypt database from S3
// 2. Find the file metadata and chunk list
// 3. Download and decrypt only the necessary blobs
// 4. Reconstruct the file from chunks
// 5. Write file to target path with proper metadata
fmt.Printf("Fetching %s from snapshot %s to %s\n", filePath, snapshotID, targetPath)
fmt.Println("TODO: Implement fetch logic")
return nil
}

View File

@ -36,7 +36,6 @@ func NewInfoCommand() *cobra.Command {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{

View File

@ -19,10 +19,10 @@ func NewPruneCommand() *cobra.Command {
Short: "Remove unreferenced blobs",
Long: `Removes blobs that are not referenced by any snapshot.
This command scans all snapshots and their manifests to build a list of
This command scans all snapshots and their manifests to build a list of
referenced blobs, then removes any blobs in storage that are not in this list.
Use this command after deleting snapshots with 'vaultik purge' to reclaim
Use this command after deleting snapshots with 'vaultik purge' to reclaim
storage space.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
@ -39,7 +39,6 @@ storage space.`,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -51,9 +50,7 @@ storage space.`,
// Run the prune operation
if err := v.PruneBlobs(opts); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Prune operation failed", "error", err)
}
log.Error("Prune operation failed", "error", err)
os.Exit(1)
}
}
@ -78,7 +75,6 @@ storage space.`,
}
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")
return cmd
}

View File

@ -56,7 +56,6 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{

View File

@ -2,12 +2,14 @@ package cli
import (
"context"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
@ -15,17 +17,16 @@ import (
// RestoreOptions contains options for the restore command
type RestoreOptions struct {
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files after restore
}
// RestoreApp contains all dependencies needed for restore
type RestoreApp struct {
Globals *globals.Globals
Config *config.Config
Storage storage.Storer
Vaultik *vaultik.Vaultik
Shutdowner fx.Shutdowner
Globals *globals.Globals
Config *config.Config
Repositories *database.Repositories
Storage storage.Storer
DB *database.DB
Shutdowner fx.Shutdowner
}
// NewRestoreCommand creates the restore command
@ -33,35 +34,16 @@ func NewRestoreCommand() *cobra.Command {
opts := &RestoreOptions{}
cmd := &cobra.Command{
Use: "restore <snapshot-id> <target-dir> [paths...]",
Use: "restore <snapshot-id> <target-dir>",
Short: "Restore files from backup",
Long: `Download and decrypt files from a backup snapshot.
This command will restore files from the specified snapshot to the target directory.
If no paths are specified, all files are restored.
If paths are specified, only matching files/directories are restored.
Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
Examples:
# Restore entire snapshot
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
# Restore specific file
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
# Restore specific directory
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
# Restore and verify all files
vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
Args: cobra.MinimumNArgs(2),
This command will restore all files from the specified snapshot to the target directory.
The age_secret_key must be configured in the config file for decryption.`,
Args: cobra.ExactArgs(2),
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
opts.TargetDir = args[1]
if len(args) > 2 {
opts.Paths = args[2:]
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
@ -76,18 +58,19 @@ Examples:
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
snapshot.Module,
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config,
storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
func(g *globals.Globals, cfg *config.Config, repos *database.Repositories,
storer storage.Storer, db *database.DB, shutdowner fx.Shutdowner) *RestoreApp {
return &RestoreApp{
Globals: g,
Config: cfg,
Storage: storer,
Vaultik: v,
Shutdowner: shutdowner,
Globals: g,
Config: cfg,
Repositories: repos,
Storage: storer,
DB: db,
Shutdowner: shutdowner,
}
},
)),
@ -99,13 +82,7 @@ Examples:
// Start the restore operation in a goroutine
go func() {
// Run the restore operation
restoreOpts := &vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: opts.TargetDir,
Paths: opts.Paths,
Verify: opts.Verify,
}
if err := app.Vaultik.Restore(restoreOpts); err != nil {
if err := app.runRestore(ctx, snapshotID, opts); err != nil {
if err != context.Canceled {
log.Error("Restore operation failed", "error", err)
}
@ -120,7 +97,6 @@ Examples:
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping restore operation")
app.Vaultik.Cancel()
return nil
},
})
@ -130,7 +106,31 @@ Examples:
},
}
cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
return cmd
}
// runRestore executes the restore operation
func (app *RestoreApp) runRestore(ctx context.Context, snapshotID string, opts *RestoreOptions) error {
// Check for age_secret_key
if app.Config.AgeSecretKey == "" {
return fmt.Errorf("age_secret_key missing from config - required for restore")
}
log.Info("Starting restore operation",
"snapshot_id", snapshotID,
"target_dir", opts.TargetDir,
"bucket", app.Config.S3.Bucket,
"prefix", app.Config.S3.Prefix,
)
// TODO: Implement restore logic
// 1. Download and decrypt database from S3
// 2. Download and decrypt blobs
// 3. Reconstruct files from chunks
// 4. Write files to target directory with proper metadata
fmt.Printf("Restoring snapshot %s to %s\n", snapshotID, opts.TargetDir)
fmt.Println("TODO: Implement restore logic")
return nil
}

View File

@ -13,7 +13,6 @@ type RootFlags struct {
ConfigPath string
Verbose bool
Debug bool
Quiet bool
}
var rootFlags RootFlags
@ -35,17 +34,16 @@ on the source system.`,
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")
// Add subcommands
cmd.AddCommand(
NewRestoreCommand(),
NewPruneCommand(),
NewVerifyCommand(),
NewFetchCommand(),
NewStoreCommand(),
NewSnapshotCommand(),
NewInfoCommand(),
NewVersionCommand(),
)
return cmd

View File

@ -24,8 +24,6 @@ func NewSnapshotCommand() *cobra.Command {
cmd.AddCommand(newSnapshotListCommand())
cmd.AddCommand(newSnapshotPurgeCommand())
cmd.AddCommand(newSnapshotVerifyCommand())
cmd.AddCommand(newSnapshotRemoveCommand())
cmd.AddCommand(newSnapshotPruneCommand())
return cmd
}
@ -35,19 +33,14 @@ func newSnapshotCreateCommand() *cobra.Command {
opts := &vaultik.SnapshotCreateOptions{}
cmd := &cobra.Command{
Use: "create [snapshot-names...]",
Short: "Create new snapshots",
Long: `Creates new snapshots of the configured directories.
Use: "create",
Short: "Create a new snapshot",
Long: `Creates a new snapshot of the configured directories.
If snapshot names are provided, only those snapshots are created.
If no names are provided, all configured snapshots are created.
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.ArbitraryArgs,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Pass snapshot names from args
opts.Snapshots = args
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
@ -62,7 +55,6 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Cron: opts.Cron,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -101,7 +93,6 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")
return cmd
}
@ -128,7 +119,6 @@ func newSnapshotListCommand() *cobra.Command {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -196,7 +186,6 @@ func newSnapshotPurgeCommand() *cobra.Command {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -236,7 +225,7 @@ func newSnapshotPurgeCommand() *cobra.Command {
// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
opts := &vaultik.VerifyOptions{}
var deep bool
cmd := &cobra.Command{
Use: "verify <snapshot-id>",
@ -258,7 +247,6 @@ func newSnapshotVerifyCommand() *cobra.Command {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -266,11 +254,9 @@ func newSnapshotVerifyCommand() *cobra.Command {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.VerifySnapshotWithOptions(snapshotID, opts); err != nil {
if err := v.VerifySnapshot(snapshotID, deep); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
log.Error("Verification failed", "error", err)
os.Exit(1)
}
}
@ -291,133 +277,7 @@ func newSnapshotVerifyCommand() *cobra.Command {
},
}
cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Download and verify blob hashes")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
return cmd
}
// newSnapshotRemoveCommand creates the 'snapshot remove' subcommand
func newSnapshotRemoveCommand() *cobra.Command {
opts := &vaultik.RemoveOptions{}
cmd := &cobra.Command{
Use: "remove <snapshot-id>",
Aliases: []string{"rm"},
Short: "Remove a snapshot and its orphaned blobs",
Long: `Removes a snapshot and any blobs that are no longer referenced by other snapshots.
This command downloads manifests from all other snapshots to determine which blobs
are still in use, then deletes any blobs that would become orphaned.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if _, err := v.RemoveSnapshot(snapshotID, opts); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Failed to remove snapshot", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVarP(&opts.Force, "force", "f", false, "Skip confirmation prompt")
cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without deleting")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output deletion stats as JSON")
return cmd
}
// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "prune",
Short: "Remove orphaned data from local database",
Long: `Removes orphaned files, chunks, and blobs from the local database.
This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if _, err := v.PruneDatabase(); err != nil {
if err != context.Canceled {
log.Error("Failed to prune database", "error", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
return cmd
}
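
All three snapshot subcommands share the same fx lifecycle wiring: the operation runs in a goroutine started from OnStart, fx.Shutdowner stops the app when it finishes, and OnStop cancels in-flight work. A minimal sketch of that pattern, with runOperation standing in for the real verify/remove/prune calls:

```go
package main

import (
	"context"
	"log"

	"go.uber.org/fx"
)

// runOperation stands in for the real work (verify, remove, prune).
// It should honor ctx cancellation.
func runOperation(ctx context.Context) error {
	return nil
}

func main() {
	app := fx.New(
		fx.Invoke(func(lc fx.Lifecycle, shutdowner fx.Shutdowner) {
			ctx, cancel := context.WithCancel(context.Background())
			lc.Append(fx.Hook{
				OnStart: func(context.Context) error {
					go func() {
						if err := runOperation(ctx); err != nil && err != context.Canceled {
							log.Println("operation failed:", err)
						}
						// Ask fx to begin shutdown once the work is done.
						if err := shutdowner.Shutdown(); err != nil {
							log.Println("shutdown failed:", err)
						}
					}()
					return nil
				},
				OnStop: func(context.Context) error {
					cancel() // propagate cancellation to in-flight work
					return nil
				},
			})
		}),
	)
	app.Run()
}
```
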

View File

@ -127,7 +127,6 @@ func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{
fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {

View File

@ -49,7 +49,6 @@ The command will fail immediately on any verification error and exit with non-ze
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
},
Modules: []fx.Option{},
Invokes: []fx.Option{
@ -62,14 +61,12 @@ The command will fail immediately on any verification error and exit with non-ze
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
err = v.VerifySnapshot(snapshotID, false)
}
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
log.Error("Verification failed", "error", err)
os.Exit(1)
}
}
@ -92,7 +89,6 @@ The command will fail immediately on any verification error and exit with non-ze
}
cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
return cmd
}

View File

@ -1,27 +0,0 @@
package cli
import (
"fmt"
"runtime"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
)
// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "version",
Short: "Print version information",
Long: `Print version, git commit, and build information for vaultik.`,
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
fmt.Printf("vaultik %s\n", globals.Version)
fmt.Printf(" commit: %s\n", globals.Commit)
fmt.Printf(" go: %s\n", runtime.Version())
fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
},
}
return cmd
}

View File

@ -4,13 +4,10 @@ import (
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/smartconfig"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/adrg/xdg"
"go.uber.org/fx"
"gopkg.in/yaml.v3"
@ -40,62 +37,24 @@ func expandTildeInURL(url string) string {
return url
}
// SnapshotConfig represents configuration for a named snapshot.
// Each snapshot backs up one or more paths and can have its own exclude patterns
// in addition to the global excludes.
type SnapshotConfig struct {
Paths []string `yaml:"paths"`
Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
}
// GetExcludes returns the combined exclude patterns for a named snapshot.
// It merges global excludes with the snapshot-specific excludes.
func (c *Config) GetExcludes(snapshotName string) []string {
snap, ok := c.Snapshots[snapshotName]
if !ok {
return c.Exclude
}
if len(snap.Exclude) == 0 {
return c.Exclude
}
// Combine global and snapshot-specific excludes
combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
combined = append(combined, c.Exclude...)
combined = append(combined, snap.Exclude...)
return combined
}
// SnapshotNames returns the names of all configured snapshots in sorted order.
func (c *Config) SnapshotNames() []string {
names := make([]string, 0, len(c.Snapshots))
for name := range c.Snapshots {
names = append(names, name)
}
// Sort for deterministic order
sort.Strings(names)
return names
}
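
For reference, the merge semantics of GetExcludes above: per-snapshot patterns are appended to the global list, and a snapshot with no excludes of its own falls back to the globals. A small self-contained illustration (types trimmed to the fields involved):

```go
package main

import "fmt"

type SnapshotConfig struct {
	Paths   []string
	Exclude []string
}

type Config struct {
	Exclude   []string // global excludes
	Snapshots map[string]SnapshotConfig
}

// getExcludes mirrors the behavior of Config.GetExcludes above.
func (c *Config) getExcludes(name string) []string {
	snap, ok := c.Snapshots[name]
	if !ok || len(snap.Exclude) == 0 {
		return c.Exclude
	}
	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
	combined = append(combined, c.Exclude...)
	return append(combined, snap.Exclude...)
}

func main() {
	cfg := &Config{
		Exclude: []string{"*.tmp"},
		Snapshots: map[string]SnapshotConfig{
			"home": {Paths: []string{"/home"}, Exclude: []string{"**/.cache"}},
		},
	}
	fmt.Println(cfg.getExcludes("home"))  // [*.tmp **/.cache]
	fmt.Println(cfg.getExcludes("other")) // [*.tmp] (falls back to globals)
}
```
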
// Config represents the application configuration for Vaultik.
// It defines all settings for backup operations, including source directories,
// encryption recipients, storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
AgeRecipients []string `yaml:"age_recipients"`
AgeSecretKey string `yaml:"age_secret_key"`
BackupInterval time.Duration `yaml:"backup_interval"`
BlobSizeLimit Size `yaml:"blob_size_limit"`
ChunkSize Size `yaml:"chunk_size"`
Exclude []string `yaml:"exclude"` // Global excludes applied to all snapshots
FullScanInterval time.Duration `yaml:"full_scan_interval"`
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
S3 S3Config `yaml:"s3"`
Snapshots map[string]SnapshotConfig `yaml:"snapshots"`
CompressionLevel int `yaml:"compression_level"`
AgeRecipients []string `yaml:"age_recipients"`
AgeSecretKey string `yaml:"age_secret_key"`
BackupInterval time.Duration `yaml:"backup_interval"`
BlobSizeLimit Size `yaml:"blob_size_limit"`
ChunkSize Size `yaml:"chunk_size"`
Exclude []string `yaml:"exclude"`
FullScanInterval time.Duration `yaml:"full_scan_interval"`
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
S3 S3Config `yaml:"s3"`
SourceDirs []string `yaml:"source_dirs"`
CompressionLevel int `yaml:"compression_level"`
// StorageURL specifies the storage backend using a URL format.
// Takes precedence over S3Config if set.
@ -178,13 +137,8 @@ func Load(path string) (*Config, error) {
// Expand tilde in all path fields
cfg.IndexPath = expandTilde(cfg.IndexPath)
cfg.StorageURL = expandTildeInURL(cfg.StorageURL)
// Expand tildes in snapshot paths
for name, snap := range cfg.Snapshots {
for i, path := range snap.Paths {
snap.Paths[i] = expandTilde(path)
}
cfg.Snapshots[name] = snap
for i, dir := range cfg.SourceDirs {
cfg.SourceDirs[i] = expandTilde(dir)
}
// Check for environment variable override for IndexPath
@ -192,11 +146,6 @@ func Load(path string) (*Config, error) {
cfg.IndexPath = expandTilde(envIndexPath)
}
// Check for environment variable override for AgeSecretKey
if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
}
// Get hostname if not set
if cfg.Hostname == "" {
hostname, err := os.Hostname()
@ -214,17 +163,6 @@ func Load(path string) (*Config, error) {
cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
}
// Check config file permissions (warn if world or group readable)
if info, err := os.Stat(path); err == nil {
mode := info.Mode().Perm()
if mode&0044 != 0 { // group or world readable
log.Warn("Config file has insecure permissions (contains S3 credentials)",
"path", path,
"mode", fmt.Sprintf("%04o", mode),
"recommendation", "chmod 600 "+path)
}
}
if err := cfg.Validate(); err != nil {
return nil, fmt.Errorf("invalid config: %w", err)
}
@ -235,7 +173,7 @@ func Load(path string) (*Config, error) {
// Validate checks if the configuration is valid and complete.
// It ensures all required fields are present and have valid values:
// - At least one age recipient must be specified
// - At least one snapshot must be configured with at least one path
// - At least one source directory must be configured
// - Storage must be configured (either storage_url or s3.* fields)
// - Chunk size must be at least 1MB
// - Blob size limit must be at least the chunk size
@ -246,14 +184,8 @@ func (c *Config) Validate() error {
return fmt.Errorf("at least one age_recipient is required")
}
if len(c.Snapshots) == 0 {
return fmt.Errorf("at least one snapshot must be configured")
}
for name, snap := range c.Snapshots {
if len(snap.Paths) == 0 {
return fmt.Errorf("snapshot %q must have at least one path", name)
}
if len(c.SourceDirs) == 0 {
return fmt.Errorf("at least one source directory is required")
}
// Validate storage configuration
@ -320,21 +252,6 @@ func (c *Config) validateStorage() error {
return nil
}
// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
// the age library's parser, which handles comments and whitespace.
func extractAgeSecretKey(input string) string {
identities, err := age.ParseIdentities(strings.NewReader(input))
if err != nil || len(identities) == 0 {
// Fall back to trimmed input if parsing fails
return strings.TrimSpace(input)
}
// Return the string representation of the first identity
if id, ok := identities[0].(*age.X25519Identity); ok {
return id.String()
}
return strings.TrimSpace(input)
}
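
extractAgeSecretKey leans on age.ParseIdentities, which skips the comment and blank lines that age-keygen emits. A runnable sketch of that round trip, using a freshly generated identity instead of a real key:

```go
package main

import (
	"fmt"
	"strings"

	"filippo.io/age"
)

func main() {
	// Generate a throwaway identity and wrap it in age-keygen-style output.
	id, err := age.GenerateX25519Identity()
	if err != nil {
		panic(err)
	}
	keygenOutput := fmt.Sprintf(
		"# created: 2025-01-14T12:00:00Z\n# public key: %s\n%s\n",
		id.Recipient(), id,
	)

	// ParseIdentities tolerates comments and whitespace and returns the identities.
	ids, err := age.ParseIdentities(strings.NewReader(keygenOutput))
	if err != nil || len(ids) == 0 {
		fmt.Println(strings.TrimSpace(keygenOutput)) // fallback, as in extractAgeSecretKey
		return
	}
	if x, ok := ids[0].(*age.X25519Identity); ok {
		fmt.Println(x.String()) // bare AGE-SECRET-KEY-... line
	}
}
```
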
// Module exports the config module for fx dependency injection.
// It provides the Config type to other modules in the application.
var Module = fx.Module("config",

View File

@ -45,21 +45,12 @@ func TestConfigLoad(t *testing.T) {
t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
}
if len(cfg.Snapshots) != 1 {
t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
if len(cfg.SourceDirs) != 2 {
t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
}
testSnap, ok := cfg.Snapshots["test"]
if !ok {
t.Fatal("Expected 'test' snapshot to exist")
}
if len(testSnap.Paths) != 2 {
t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
}
if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
}
if cfg.S3.Bucket != "vaultik-test-bucket" {
@ -83,65 +74,3 @@ func TestConfigFromEnv(t *testing.T) {
t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
}
}
// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
func TestExtractAgeSecretKey(t *testing.T) {
tests := []struct {
name string
input string
expected string
}{
{
name: "plain key",
input: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "key with trailing newline",
input: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "full age-keygen output",
input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "age-keygen output with extra blank lines",
input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "key with leading whitespace",
input: " AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5 ",
expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
},
{
name: "empty input",
input: "",
expected: "",
},
{
name: "only comments",
input: "# this is a comment\n# another comment",
expected: "# this is a comment\n# another comment",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := extractAgeSecretKey(tt.input)
if result != tt.expected {
t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
}
})
}
}

View File

@ -5,8 +5,6 @@ import (
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestBlobChunkRepository(t *testing.T) {
@ -18,8 +16,8 @@ func TestBlobChunkRepository(t *testing.T) {
// Create blob first
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
ID: "blob1-uuid",
Hash: "blob1-hash",
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob)
@ -28,7 +26,7 @@ func TestBlobChunkRepository(t *testing.T) {
}
// Create chunks
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunks := []string{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@ -43,7 +41,7 @@ func TestBlobChunkRepository(t *testing.T) {
// Test Create
bc1 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: types.ChunkHash("chunk1"),
ChunkHash: "chunk1",
Offset: 0,
Length: 1024,
}
@ -56,7 +54,7 @@ func TestBlobChunkRepository(t *testing.T) {
// Add more chunks to the same blob
bc2 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: types.ChunkHash("chunk2"),
ChunkHash: "chunk2",
Offset: 1024,
Length: 2048,
}
@ -67,7 +65,7 @@ func TestBlobChunkRepository(t *testing.T) {
bc3 := &BlobChunk{
BlobID: blob.ID,
ChunkHash: types.ChunkHash("chunk3"),
ChunkHash: "chunk3",
Offset: 3072,
Length: 512,
}
@ -77,7 +75,7 @@ func TestBlobChunkRepository(t *testing.T) {
}
// Test GetByBlobID
blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID)
if err != nil {
t.Fatalf("failed to get blob chunks: %v", err)
}
@ -136,13 +134,13 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
// Create blobs
blob1 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob1-hash"),
ID: "blob1-uuid",
Hash: "blob1-hash",
CreatedTS: time.Now(),
}
blob2 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blob2-hash"),
ID: "blob2-uuid",
Hash: "blob2-hash",
CreatedTS: time.Now(),
}
@ -156,7 +154,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Create chunks
chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunkHashes := []string{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunkHashes {
chunk := &Chunk{
ChunkHash: chunkHash,
@ -171,10 +169,10 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
// Create chunks across multiple blobs
// Some chunks are shared between blobs (deduplication scenario)
blobChunks := []BlobChunk{
{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
{BlobID: blob1.ID, ChunkHash: "chunk1", Offset: 0, Length: 1024},
{BlobID: blob1.ID, ChunkHash: "chunk2", Offset: 1024, Length: 1024},
{BlobID: blob2.ID, ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: blob2.ID, ChunkHash: "chunk3", Offset: 1024, Length: 1024},
}
for _, bc := range blobChunks {
@ -185,7 +183,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob1 chunks
chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID)
if err != nil {
t.Fatalf("failed to get blob1 chunks: %v", err)
}
@ -194,7 +192,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob2 chunks
chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID)
if err != nil {
t.Fatalf("failed to get blob2 chunks: %v", err)
}

View File

@ -4,8 +4,6 @@ import (
"context"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestBlobRepository(t *testing.T) {
@ -17,8 +15,8 @@ func TestBlobRepository(t *testing.T) {
// Test Create
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blobhash123"),
ID: "test-blob-id-123",
Hash: "blobhash123",
CreatedTS: time.Now().Truncate(time.Second),
}
@ -28,7 +26,7 @@ func TestBlobRepository(t *testing.T) {
}
// Test GetByHash
retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
retrieved, err := repo.GetByHash(ctx, blob.Hash)
if err != nil {
t.Fatalf("failed to get blob: %v", err)
}
@ -43,7 +41,7 @@ func TestBlobRepository(t *testing.T) {
}
// Test GetByID
retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
retrievedByID, err := repo.GetByID(ctx, blob.ID)
if err != nil {
t.Fatalf("failed to get blob by ID: %v", err)
}
@ -56,8 +54,8 @@ func TestBlobRepository(t *testing.T) {
// Test with second blob
blob2 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("blobhash456"),
ID: "test-blob-id-456",
Hash: "blobhash456",
CreatedTS: time.Now().Truncate(time.Second),
}
err = repo.Create(ctx, nil, blob2)
@ -67,13 +65,13 @@ func TestBlobRepository(t *testing.T) {
// Test UpdateFinished
now := time.Now()
err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
err = repo.UpdateFinished(ctx, nil, blob.ID, blob.Hash, 1000, 500)
if err != nil {
t.Fatalf("failed to update blob as finished: %v", err)
}
// Verify update
updated, err := repo.GetByID(ctx, blob.ID.String())
updated, err := repo.GetByID(ctx, blob.ID)
if err != nil {
t.Fatalf("failed to get updated blob: %v", err)
}
@ -88,13 +86,13 @@ func TestBlobRepository(t *testing.T) {
}
// Test UpdateUploaded
err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
err = repo.UpdateUploaded(ctx, nil, blob.ID)
if err != nil {
t.Fatalf("failed to update blob as uploaded: %v", err)
}
// Verify upload update
uploaded, err := repo.GetByID(ctx, blob.ID.String())
uploaded, err := repo.GetByID(ctx, blob.ID)
if err != nil {
t.Fatalf("failed to get uploaded blob: %v", err)
}
@ -115,8 +113,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
repo := NewBlobRepository(db)
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("duplicate_blob"),
ID: "duplicate-test-id",
Hash: "duplicate_blob",
CreatedTS: time.Now().Truncate(time.Second),
}

View File

@ -5,8 +5,6 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestCascadeDeleteDebug tests cascade delete with debug output
@ -44,7 +42,7 @@ func TestCascadeDeleteDebug(t *testing.T) {
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
ChunkHash: fmt.Sprintf("cascade-chunk-%d", i),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)

View File

@ -4,8 +4,6 @@ import (
"context"
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type ChunkFileRepository struct {
@ -25,9 +23,9 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
} else {
_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
}
if err != nil {
@ -37,20 +35,30 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
return nil
}
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE chunk_hash = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
return chunkFiles, rows.Err()
}
func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
@ -67,41 +75,40 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
return chunkFiles, rows.Err()
}
// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID string) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE file_id = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
return r.scanChunkFiles(rows)
}
// scanChunkFiles is a helper that scans chunk file rows
func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
var chunkHashStr, fileIDStr string
err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
cf.ChunkHash = types.ChunkHash(chunkHashStr)
cf.FileID, err = types.ParseFileID(fileIDStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
@ -109,14 +116,14 @@ func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, erro
}
// DeleteByFileID deletes all chunk_files entries for a given file ID
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID string) error {
query := `DELETE FROM chunk_files WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID.String())
_, err = tx.ExecContext(ctx, query, fileID)
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
_, err = r.db.ExecWithLog(ctx, query, fileID)
}
if err != nil {
@ -125,80 +132,3 @@ func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fi
return nil
}
// DeleteByFileIDs deletes all chunk_files for multiple files in a single statement.
func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting chunk_files: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple chunk_files in a single statement for efficiency.
func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
if len(cfs) == 0 {
return nil
}
// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
const batchSize = 200
for i := 0; i < len(cfs); i += batchSize {
end := i + batchSize
if end > len(cfs) {
end = len(cfs)
}
batch := cfs[i:end]
query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
args := make([]interface{}, 0, len(batch)*4)
for j, cf := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?)"
args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
}
query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting chunk_files: %w", err)
}
}
return nil
}
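
The batch helpers in this file all build one multi-row INSERT per batch, sized so the bound-argument count stays under SQLite's variable limit, and rely on ON CONFLICT ... DO NOTHING for idempotence. A stripped-down sketch of just the statement construction (no database required):

```go
package main

import (
	"fmt"
	"strings"
)

type chunkFileRow struct {
	ChunkHash  string
	FileID     string
	FileOffset int64
	Length     int64
}

// buildBatchInsert mirrors the statement CreateBatch builds for one batch.
func buildBatchInsert(rows []chunkFileRow) (string, []interface{}) {
	var sb strings.Builder
	sb.WriteString("INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES ")
	args := make([]interface{}, 0, len(rows)*4)
	for i, r := range rows {
		if i > 0 {
			sb.WriteString(", ")
		}
		sb.WriteString("(?, ?, ?, ?)")
		args = append(args, r.ChunkHash, r.FileID, r.FileOffset, r.Length)
	}
	sb.WriteString(" ON CONFLICT(chunk_hash, file_id) DO NOTHING")
	return sb.String(), args
}

func main() {
	query, args := buildBatchInsert([]chunkFileRow{
		{ChunkHash: "chunk1", FileID: "file-a", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk2", FileID: "file-a", FileOffset: 1024, Length: 2048},
	})
	fmt.Println(query)
	fmt.Println(len(args), "bound arguments") // 8: well under SQLite's limit
}
```
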

View File

@ -4,8 +4,6 @@ import (
"context"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkFileRepository(t *testing.T) {
@ -51,7 +49,7 @@ func TestChunkFileRepository(t *testing.T) {
// Create chunk first
chunk := &Chunk{
ChunkHash: types.ChunkHash("chunk1"),
ChunkHash: "chunk1",
Size: 1024,
}
err = chunksRepo.Create(ctx, nil, chunk)
@ -61,7 +59,7 @@ func TestChunkFileRepository(t *testing.T) {
// Test Create
cf1 := &ChunkFile{
ChunkHash: types.ChunkHash("chunk1"),
ChunkHash: "chunk1",
FileID: file1.ID,
FileOffset: 0,
Length: 1024,
@ -74,7 +72,7 @@ func TestChunkFileRepository(t *testing.T) {
// Add same chunk in different file (deduplication scenario)
cf2 := &ChunkFile{
ChunkHash: types.ChunkHash("chunk1"),
ChunkHash: "chunk1",
FileID: file2.ID,
FileOffset: 2048,
Length: 1024,
@ -116,7 +114,7 @@ func TestChunkFileRepository(t *testing.T) {
if len(chunkFiles) != 1 {
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
}
if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
if chunkFiles[0].ChunkHash != "chunk1" {
t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
}
@ -153,7 +151,7 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
}
// Create chunks first
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
chunks := []string{"chunk1", "chunk2", "chunk3", "chunk4"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@ -172,16 +170,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
chunkFiles := []ChunkFile{
// File1
{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
{ChunkHash: "chunk1", FileID: file1.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk2", FileID: file1.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk3", FileID: file1.ID, FileOffset: 2048, Length: 1024},
// File2
{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
{ChunkHash: "chunk2", FileID: file2.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk3", FileID: file2.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk4", FileID: file2.ID, FileOffset: 2048, Length: 1024},
// File3
{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk1", FileID: file3.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk4", FileID: file3.ID, FileOffset: 1024, Length: 1024},
}
for _, cf := range chunkFiles {

View File

@ -3,8 +3,6 @@ package database
import (
"context"
"testing"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestChunkRepository(t *testing.T) {
@ -16,7 +14,7 @@ func TestChunkRepository(t *testing.T) {
// Test Create
chunk := &Chunk{
ChunkHash: types.ChunkHash("chunkhash123"),
ChunkHash: "chunkhash123",
Size: 4096,
}
@ -26,7 +24,7 @@ func TestChunkRepository(t *testing.T) {
}
// Test GetByHash
retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
if err != nil {
t.Fatalf("failed to get chunk: %v", err)
}
@ -48,7 +46,7 @@ func TestChunkRepository(t *testing.T) {
// Test GetByHashes
chunk2 := &Chunk{
ChunkHash: types.ChunkHash("chunkhash456"),
ChunkHash: "chunkhash456",
Size: 8192,
}
err = repo.Create(ctx, nil, chunk2)
@ -56,7 +54,7 @@ func TestChunkRepository(t *testing.T) {
t.Fatalf("failed to create second chunk: %v", err)
}
chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
if err != nil {
t.Fatalf("failed to get chunks by hashes: %v", err)
}

View File

@ -36,17 +36,26 @@ type DB struct {
}
// New creates a new database connection at the specified path.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
// opening a database with journal/WAL files present.
// It automatically handles database recovery, creates the schema if needed,
// and configures SQLite with appropriate settings for performance and reliability.
// The database uses WAL mode for better concurrency and sets a busy timeout
// to handle concurrent access gracefully.
//
// If the database appears locked, it will attempt recovery by removing stale
// lock files and switching temporarily to TRUNCATE journal mode.
//
// New creates a new database connection at the specified path.
// It automatically handles recovery from stale locks, creates the schema if needed,
// and configures SQLite with WAL mode for better concurrency.
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
log.Debug("Opening database connection", "path", path)
// Note: We do NOT delete journal/WAL files before opening.
// SQLite handles crash recovery automatically when the database is opened.
// Deleting these files would corrupt the database after an unclean shutdown.
// First, try to recover from any stale locks
if err := recoverDatabase(ctx, path); err != nil {
log.Warn("Failed to recover database", "error", err)
}
// First attempt with standard WAL mode
log.Debug("Attempting to open database with WAL mode", "path", path)
@ -147,6 +156,62 @@ func (db *DB) Close() error {
return nil
}
// recoverDatabase attempts to recover a locked database
func recoverDatabase(ctx context.Context, path string) error {
// Check if database file exists
if _, err := os.Stat(path); os.IsNotExist(err) {
// No database file, nothing to recover
return nil
}
// Remove stale lock files
// SQLite creates -wal and -shm files for WAL mode
walPath := path + "-wal"
shmPath := path + "-shm"
journalPath := path + "-journal"
log.Info("Attempting database recovery", "path", path)
// Always remove lock files on startup to ensure clean state
removed := false
// Check for and remove journal file (from non-WAL mode)
if _, err := os.Stat(journalPath); err == nil {
log.Info("Found journal file, removing", "path", journalPath)
if err := os.Remove(journalPath); err != nil {
log.Warn("Failed to remove journal file", "error", err)
} else {
removed = true
}
}
// Remove WAL file
if _, err := os.Stat(walPath); err == nil {
log.Info("Found WAL file, removing", "path", walPath)
if err := os.Remove(walPath); err != nil {
log.Warn("Failed to remove WAL file", "error", err)
} else {
removed = true
}
}
// Remove SHM file
if _, err := os.Stat(shmPath); err == nil {
log.Info("Found shared memory file, removing", "path", shmPath)
if err := os.Remove(shmPath); err != nil {
log.Warn("Failed to remove shared memory file", "error", err)
} else {
removed = true
}
}
if removed {
log.Info("Database lock files removed")
}
return nil
}
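
For context on the WAL and busy-timeout settings described above: with database/sql they usually come down to a pair of PRAGMAs issued right after opening. A minimal sketch assuming the mattn/go-sqlite3 driver (the driver vaultik actually uses is not shown in this diff):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

func open(ctx context.Context, path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	// WAL mode allows concurrent readers while a writer is active.
	if _, err := db.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
		_ = db.Close()
		return nil, err
	}
	// Wait up to 5s for locks instead of failing immediately with SQLITE_BUSY.
	if _, err := db.ExecContext(ctx, "PRAGMA busy_timeout=5000"); err != nil {
		_ = db.Close()
		return nil, err
	}
	return db, nil
}

func main() {
	db, err := open(context.Background(), ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	fmt.Println("database opened with WAL mode")
}
```
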
// Conn returns the underlying *sql.DB connection.
// This should be used sparingly and primarily for read operations.
// For write operations, prefer using the ExecWithLog method.
@ -154,11 +219,6 @@ func (db *DB) Conn() *sql.DB {
return db.conn
}
// Path returns the path to the database file.
func (db *DB) Path() string {
return db.path
}
// BeginTx starts a new database transaction with the given options.
// The caller is responsible for committing or rolling back the transaction.
// For write transactions, consider using the Repositories.WithTx method instead,
@ -210,15 +270,6 @@ func NewTestDB() (*DB, error) {
return New(context.Background(), ":memory:")
}
// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
// For example, repeatPlaceholder(2) returns ", ?, ?".
func repeatPlaceholder(n int) string {
if n <= 0 {
return ""
}
return strings.Repeat(", ?", n)
}
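
repeatPlaceholder exists solely to build IN (...) clauses with the right number of bind variables. A tiny sketch of how it pairs with a batch of IDs:

```go
package main

import (
	"fmt"
	"strings"
)

// repeatPlaceholder returns ", ?" repeated n times, as in the helper above.
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}

func main() {
	ids := []string{"a", "b", "c"}
	query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(ids)-1) + ")"
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	fmt.Println(query) // DELETE FROM file_chunks WHERE file_id IN (?, ?, ?)
	fmt.Println(args)
}
```
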
// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.

View File

@ -4,8 +4,6 @@ import (
"context"
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type FileChunkRepository struct {
@ -25,9 +23,9 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
_, err = tx.ExecContext(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
} else {
_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
_, err = r.db.ExecWithLog(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
}
if err != nil {
@ -52,11 +50,21 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
}
defer CloseRows(rows)
return r.scanFileChunks(rows)
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fileChunks = append(fileChunks, &fc)
}
return fileChunks, rows.Err()
}
// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID string) ([]*FileChunk, error) {
query := `
SELECT file_id, idx, chunk_hash
FROM file_chunks
@ -64,13 +72,23 @@ func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.File
ORDER BY idx
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
if err != nil {
return nil, fmt.Errorf("querying file chunks: %w", err)
}
defer CloseRows(rows)
return r.scanFileChunks(rows)
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fileChunks = append(fileChunks, &fc)
}
return fileChunks, rows.Err()
}
// GetByPathTx retrieves file chunks within a transaction
@ -90,28 +108,16 @@ func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path
}
defer CloseRows(rows)
fileChunks, err := r.scanFileChunks(rows)
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
return fileChunks, err
}
// scanFileChunks is a helper that scans file chunk rows
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
var fileIDStr, chunkHashStr string
err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fc.FileID, err = types.ParseFileID(fileIDStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
fc.ChunkHash = types.ChunkHash(chunkHashStr)
fileChunks = append(fileChunks, &fc)
}
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
return fileChunks, rows.Err()
}
@ -134,14 +140,14 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path
}
// DeleteByFileID deletes all chunks for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID string) error {
query := `DELETE FROM file_chunks WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID.String())
_, err = tx.ExecContext(ctx, query, fileID)
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
_, err = r.db.ExecWithLog(ctx, query, fileID)
}
if err != nil {
@ -151,86 +157,6 @@ func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fi
return nil
}
// DeleteByFileIDs deletes all chunks for multiple files in a single statement.
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Batch at 500 to stay within SQLite's variable limit
const batchSize = 500
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
args := make([]interface{}, len(batch))
for j, id := range batch {
args[j] = id.String()
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch deleting file_chunks: %w", err)
}
}
return nil
}
// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
// Batches are automatically split to stay within SQLite's variable limit.
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
if len(fcs) == 0 {
return nil
}
// SQLite has a limit on variables (typically 999 or 32766).
// Each FileChunk has 3 values, so batch at 300 to be safe.
const batchSize = 300
for i := 0; i < len(fcs); i += batchSize {
end := i + batchSize
if end > len(fcs) {
end = len(fcs)
}
batch := fcs[i:end]
// Build the query with multiple value sets
query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
args := make([]interface{}, 0, len(batch)*3)
for j, fc := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?)"
args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
}
query += " ON CONFLICT(file_id, idx) DO NOTHING"
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting file_chunks: %w", err)
}
}
return nil
}
// GetByFile is an alias for GetByPath for compatibility
func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
LogSQL("GetByFile", "Starting", path)

View File

@ -5,8 +5,6 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestFileChunkRepository(t *testing.T) {
@ -35,7 +33,7 @@ func TestFileChunkRepository(t *testing.T) {
}
// Create chunks first
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunks := []string{"chunk1", "chunk2", "chunk3"}
chunkRepo := NewChunkRepository(db)
for _, chunkHash := range chunks {
chunk := &Chunk{
@ -52,7 +50,7 @@ func TestFileChunkRepository(t *testing.T) {
fc1 := &FileChunk{
FileID: file.ID,
Idx: 0,
ChunkHash: types.ChunkHash("chunk1"),
ChunkHash: "chunk1",
}
err = repo.Create(ctx, nil, fc1)
@ -64,7 +62,7 @@ func TestFileChunkRepository(t *testing.T) {
fc2 := &FileChunk{
FileID: file.ID,
Idx: 1,
ChunkHash: types.ChunkHash("chunk2"),
ChunkHash: "chunk2",
}
err = repo.Create(ctx, nil, fc2)
if err != nil {
@ -74,7 +72,7 @@ func TestFileChunkRepository(t *testing.T) {
fc3 := &FileChunk{
FileID: file.ID,
Idx: 2,
ChunkHash: types.ChunkHash("chunk3"),
ChunkHash: "chunk3",
}
err = repo.Create(ctx, nil, fc3)
if err != nil {
@ -133,7 +131,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
for i, path := range filePaths {
file := &File{
Path: types.FilePath(path),
Path: path,
MTime: testTime,
CTime: testTime,
Size: 2048,
@ -153,7 +151,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
chunkRepo := NewChunkRepository(db)
for i := range files {
for j := 0; j < 2; j++ {
chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
chunkHash := fmt.Sprintf("file%d_chunk%d", i, j)
chunk := &Chunk{
ChunkHash: chunkHash,
Size: 1024,
@ -171,7 +169,7 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
fc := &FileChunk{
FileID: file.ID,
Idx: j,
ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
ChunkHash: fmt.Sprintf("file%d_chunk%d", i, j),
}
err := repo.Create(ctx, nil, fc)
if err != nil {

View File

@ -7,7 +7,7 @@ import (
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/google/uuid"
)
type FileRepository struct {
@ -20,15 +20,14 @@ func NewFileRepository(db *DB) *FileRepository {
func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
// Generate UUID if not provided
if file.ID.IsZero() {
file.ID = types.NewFileID()
if file.ID == "" {
file.ID = uuid.New().String()
}
query := `
INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
INSERT INTO files (id, path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(path) DO UPDATE SET
source_path = excluded.source_path,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
@ -39,36 +38,44 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
RETURNING id
`
var idStr string
var err error
if tx != nil {
LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
LogSQL("Execute", query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
err = tx.QueryRowContext(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
} else {
err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
err = r.db.QueryRowWithLog(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
}
if err != nil {
return fmt.Errorf("inserting file: %w", err)
}
// Parse the returned ID
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return fmt.Errorf("parsing file ID: %w", err)
}
return nil
}
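
Create relies on SQLite's upsert form with RETURNING to insert-or-update a row and read back its id in one round trip. A minimal sketch of that pattern against database/sql, assuming the mattn/go-sqlite3 driver and a SQLite new enough (3.35+) for RETURNING; the table here is simplified:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if _, err := db.ExecContext(ctx,
		`CREATE TABLE files (id TEXT PRIMARY KEY, path TEXT UNIQUE, size INTEGER)`); err != nil {
		panic(err)
	}

	upsert := `
		INSERT INTO files (id, path, size) VALUES (?, ?, ?)
		ON CONFLICT(path) DO UPDATE SET size = excluded.size
		RETURNING id`

	var id string
	// First insert creates the row and keeps its id.
	if err := db.QueryRowContext(ctx, upsert, "uuid-1", "/etc/hosts", 100).Scan(&id); err != nil {
		panic(err)
	}
	// Second call conflicts on path: the row is updated, the existing id comes back.
	if err := db.QueryRowContext(ctx, upsert, "uuid-2", "/etc/hosts", 200).Scan(&id); err != nil {
		panic(err)
	}
	fmt.Println(id) // uuid-1
}
```
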
func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err == sql.ErrNoRows {
return nil, nil
}
@ -76,18 +83,39 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
return nil, fmt.Errorf("querying file: %w", err)
}
return file, nil
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
return &file, nil
}
// GetByID retrieves a file by its UUID
func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
func (r *FileRepository) GetByID(ctx context.Context, id string) (*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE id = ?
`
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err == sql.ErrNoRows {
return nil, nil
}
@ -95,18 +123,38 @@ func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, e
return nil, fmt.Errorf("querying file: %w", err)
}
return file, nil
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
return &file, nil
}
func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
LogSQL("GetByPathTx QueryRowContext", query, path)
file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
err := tx.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
LogSQL("GetByPathTx Scan complete", query, path)
if err == sql.ErrNoRows {
@ -116,80 +164,10 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
return nil, fmt.Errorf("querying file: %w", err)
}
return file, nil
}
// scanFile is a helper that scans a single file row
func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
var file File
var idStr, pathStr, sourcePathStr string
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := row.Scan(
&idStr,
&pathStr,
&sourcePathStr,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, err
}
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
file.Path = types.FilePath(pathStr)
file.SourcePath = types.SourcePath(sourcePathStr)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = types.FilePath(linkTarget.String)
}
return &file, nil
}
// scanFileRows is a helper that scans a file row from rows iterator
func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
var file File
var idStr, pathStr, sourcePathStr string
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&idStr,
&pathStr,
&sourcePathStr,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, err
}
file.ID, err = types.ParseFileID(idStr)
if err != nil {
return nil, fmt.Errorf("parsing file ID: %w", err)
}
file.Path = types.FilePath(pathStr)
file.SourcePath = types.SourcePath(sourcePathStr)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = types.FilePath(linkTarget.String)
file.LinkTarget = linkTarget.String
}
return &file, nil
@ -197,7 +175,7 @@ func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE mtime >= ?
ORDER BY path
@ -211,11 +189,32 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
files = append(files, &file)
}
return files, rows.Err()
@ -239,14 +238,14 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
}
// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id string) error {
query := `DELETE FROM files WHERE id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, id.String())
_, err = tx.ExecContext(ctx, query, id)
} else {
_, err = r.db.ExecWithLog(ctx, query, id.String())
_, err = r.db.ExecWithLog(ctx, query, id)
}
if err != nil {
@ -258,7 +257,7 @@ func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.Fi
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path LIKE ? || '%'
ORDER BY path
@ -272,98 +271,43 @@ func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*Fi
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
files = append(files, &file)
}
return files, rows.Err()
}
// ListAll returns all files in the database
func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
query := `
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
ORDER BY path
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying files: %w", err)
}
defer CloseRows(rows)
var files []*File
for rows.Next() {
file, err := r.scanFileRows(rows)
if err != nil {
return nil, fmt.Errorf("scanning file: %w", err)
}
files = append(files, file)
}
return files, rows.Err()
}
// CreateBatch inserts or updates multiple files in a single statement for efficiency.
// File IDs must be pre-generated before calling this method.
func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
if len(files) == 0 {
return nil
}
// Each File has 10 values, so batch at 100 to be safe with SQLite's variable limit
const batchSize = 100
for i := 0; i < len(files); i += batchSize {
end := i + batchSize
if end > len(files) {
end = len(files)
}
batch := files[i:end]
query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
args := make([]interface{}, 0, len(batch)*10)
for j, f := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
}
query += ` ON CONFLICT(path) DO UPDATE SET
source_path = excluded.source_path,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
mode = excluded.mode,
uid = excluded.uid,
gid = excluded.gid,
link_target = excluded.link_target`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch inserting files: %w", err)
}
}
return nil
}
// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM files
DELETE FROM files
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_files
SELECT 1 FROM snapshot_files
WHERE snapshot_files.file_id = files.id
)
`

View File

@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
}
// Test GetByPath
retrieved, err := repo.GetByPath(ctx, file.Path.String())
retrieved, err := repo.GetByPath(ctx, file.Path)
if err != nil {
t.Fatalf("failed to get file: %v", err)
}
@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
t.Fatalf("failed to update file: %v", err)
}
retrieved, err = repo.GetByPath(ctx, file.Path.String())
retrieved, err = repo.GetByPath(ctx, file.Path)
if err != nil {
t.Fatalf("failed to get updated file: %v", err)
}
@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
}
// Test Delete
err = repo.Delete(ctx, nil, file.Path.String())
err = repo.Delete(ctx, nil, file.Path)
if err != nil {
t.Fatalf("failed to delete file: %v", err)
}
retrieved, err = repo.GetByPath(ctx, file.Path.String())
retrieved, err = repo.GetByPath(ctx, file.Path)
if err != nil {
t.Fatalf("error getting deleted file: %v", err)
}
@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
t.Fatalf("failed to create symlink: %v", err)
}
retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
retrieved, err := repo.GetByPath(ctx, symlink.Path)
if err != nil {
t.Fatalf("failed to get symlink: %v", err)
}

View File

@ -2,27 +2,22 @@
// It includes types for files, chunks, blobs, snapshots, and their relationships.
package database
import (
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
import "time"
// File represents a file or directory in the backup system.
// It stores metadata about files including timestamps, permissions, ownership,
// and symlink targets. This information is used to restore files with their
// original attributes.
type File struct {
ID types.FileID // UUID primary key
Path types.FilePath // Absolute path of the file
SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
ID string // UUID primary key
Path string
MTime time.Time
CTime time.Time
Size int64
Mode uint32
UID uint32
GID uint32
LinkTarget types.FilePath // empty for regular files, target path for symlinks
LinkTarget string // empty for regular files, target path for symlinks
}
// IsSymlink returns true if this file is a symbolic link.
@ -35,16 +30,16 @@ func (f *File) IsSymlink() bool {
// Large files are split into multiple chunks for efficient deduplication and storage.
// The Idx field maintains the order of chunks within a file.
type FileChunk struct {
FileID types.FileID
FileID string
Idx int
ChunkHash types.ChunkHash
ChunkHash string
}
// Chunk represents a data chunk in the deduplication system.
// Files are split into chunks which are content-addressed by their hash.
// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
type Chunk struct {
ChunkHash types.ChunkHash
ChunkHash string
Size int64
}
@ -56,13 +51,13 @@ type Chunk struct {
// The blob creation process is: chunks are accumulated -> compressed with zstd
// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
type Blob struct {
ID types.BlobID // UUID assigned when blob creation starts
Hash types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
CreatedTS time.Time // When blob creation started
FinishedTS *time.Time // When blob was finalized (nil if still packing)
UncompressedSize int64 // Total size of raw chunks before compression
CompressedSize int64 // Size after compression and encryption
UploadedTS *time.Time // When blob was uploaded to S3 (nil if not uploaded)
ID string // UUID assigned when blob creation starts
Hash string // SHA256 of final compressed+encrypted content (empty until finalized)
CreatedTS time.Time // When blob creation started
FinishedTS *time.Time // When blob was finalized (nil if still packing)
UncompressedSize int64 // Total size of raw chunks before compression
CompressedSize int64 // Size after compression and encryption
UploadedTS *time.Time // When blob was uploaded to S3 (nil if not uploaded)
}
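
The blob pipeline described in the comment above (chunks accumulated, compressed with zstd, encrypted with age, hashed, then uploaded under the hash) can be sketched with nested writers; this illustration uses the klauspost/compress zstd encoder as a stand-in for whatever zstd binding vaultik uses, and a throwaway age identity:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// packBlob concatenates chunks, compresses, encrypts, and hashes the result.
func packBlob(chunks [][]byte, recipient age.Recipient) (blob []byte, hash string, err error) {
	var buf bytes.Buffer
	encWriter, err := age.Encrypt(&buf, recipient) // encryption is the outer layer
	if err != nil {
		return nil, "", err
	}
	zw, err := zstd.NewWriter(encWriter) // compress before encrypting
	if err != nil {
		return nil, "", err
	}
	for _, c := range chunks {
		if _, err := zw.Write(c); err != nil {
			return nil, "", err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, "", err
	}
	if err := encWriter.Close(); err != nil {
		return nil, "", err
	}
	sum := sha256.Sum256(buf.Bytes())
	return buf.Bytes(), hex.EncodeToString(sum[:]), nil
}

func main() {
	id, err := age.GenerateX25519Identity()
	if err != nil {
		panic(err)
	}
	blob, hash, err := packBlob([][]byte{[]byte("chunk-a"), []byte("chunk-b")}, id.Recipient())
	if err != nil {
		panic(err)
	}
	fmt.Printf("blob of %d bytes would be uploaded as %s\n", len(blob), hash)
}
```
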
// BlobChunk represents the mapping between blobs and the chunks they contain.
@ -70,8 +65,8 @@ type Blob struct {
// their position and size within the blob. The offset and length fields
// enable extracting specific chunks from a blob without processing the entire blob.
type BlobChunk struct {
BlobID types.BlobID
ChunkHash types.ChunkHash
BlobID string
ChunkHash string
Offset int64
Length int64
}
@ -80,18 +75,18 @@ type BlobChunk struct {
// This is used during deduplication to identify all files that share a chunk,
// which is important for garbage collection and integrity verification.
type ChunkFile struct {
ChunkHash types.ChunkHash
FileID types.FileID
ChunkHash string
FileID string
FileOffset int64
Length int64
}
// Snapshot represents a snapshot record in the database
type Snapshot struct {
ID types.SnapshotID
Hostname types.Hostname
VaultikVersion types.Version
VaultikGitRevision types.GitRevision
ID string
Hostname string
VaultikVersion string
VaultikGitRevision string
StartedAt time.Time
CompletedAt *time.Time // nil if still in progress
FileCount int64
@ -113,13 +108,13 @@ func (s *Snapshot) IsComplete() bool {
// SnapshotFile represents the mapping between snapshots and files
type SnapshotFile struct {
SnapshotID types.SnapshotID
FileID types.FileID
SnapshotID string
FileID string
}
// SnapshotBlob represents the mapping between snapshots and blobs
type SnapshotBlob struct {
SnapshotID types.SnapshotID
BlobID types.BlobID
BlobHash types.BlobHash // Denormalized for easier manifest generation
SnapshotID string
BlobID string
BlobHash string // Denormalized for easier manifest generation
}
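
The comment on Blob above spells out the packing pipeline: chunks are accumulated, compressed with zstd, encrypted with age, hashed, and uploaded under the hash. A minimal sketch of that wiring, assuming the github.com/klauspost/compress/zstd encoder and filippo.io/age; this is illustrative only, not the packer implementation touched by this diff:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// packBlob sketches the pipeline from the Blob comment:
// raw chunks -> zstd -> age -> SHA-256 of the final compressed+encrypted bytes.
func packBlob(chunks [][]byte, recipient age.Recipient) ([]byte, string, error) {
	var buf bytes.Buffer
	hasher := sha256.New()

	// age writes to both the buffer and the hasher, so the hash covers the
	// final blob bytes exactly as they would be uploaded (matching Blob.Hash).
	encWriter, err := age.Encrypt(io.MultiWriter(&buf, hasher), recipient)
	if err != nil {
		return nil, "", err
	}
	zw, err := zstd.NewWriter(encWriter)
	if err != nil {
		return nil, "", err
	}
	for _, c := range chunks {
		if _, err := zw.Write(c); err != nil {
			return nil, "", err
		}
	}
	if err := zw.Close(); err != nil { // flush the compressor
		return nil, "", err
	}
	if err := encWriter.Close(); err != nil { // finalize the age stream
		return nil, "", err
	}
	return buf.Bytes(), hex.EncodeToString(hasher.Sum(nil)), nil
}

func main() {
	// Recipient string taken from the test fixtures elsewhere in this diff.
	r, err := age.ParseX25519Recipient("age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p")
	if err != nil {
		panic(err)
	}
	_, hash, err := packBlob([][]byte{[]byte("chunk one"), []byte("chunk two")}, r)
	fmt.Println(hash, err)
}
```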

View File

@ -75,11 +75,6 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
return tx.Commit()
}
// DB returns the underlying database for direct queries
func (r *Repositories) DB() *DB {
return r.db
}
// WithReadTx executes a function within a read-only transaction.
// Read transactions can run concurrently with other read transactions
// but will be blocked by write transactions. The transaction is

View File

@ -6,8 +6,6 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
func TestRepositoriesTransaction(t *testing.T) {
@ -35,7 +33,7 @@ func TestRepositoriesTransaction(t *testing.T) {
// Create chunks
chunk1 := &Chunk{
ChunkHash: types.ChunkHash("tx_chunk1"),
ChunkHash: "tx_chunk1",
Size: 512,
}
if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@ -43,7 +41,7 @@ func TestRepositoriesTransaction(t *testing.T) {
}
chunk2 := &Chunk{
ChunkHash: types.ChunkHash("tx_chunk2"),
ChunkHash: "tx_chunk2",
Size: 512,
}
if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@ -71,8 +69,8 @@ func TestRepositoriesTransaction(t *testing.T) {
// Create blob
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("tx_blob1"),
ID: "tx-blob-id-1",
Hash: "tx_blob1",
CreatedTS: time.Now().Truncate(time.Second),
}
if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@ -158,7 +156,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {
// Create a chunk
chunk := &Chunk{
ChunkHash: types.ChunkHash("rollback_chunk"),
ChunkHash: "rollback_chunk",
Size: 1024,
}
if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {

View File

@ -6,8 +6,6 @@ import (
"fmt"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
@ -48,15 +46,15 @@ func TestFileRepositoryUUIDGeneration(t *testing.T) {
}
// Check UUID was generated
if file.ID.IsZero() {
if file.ID == "" {
t.Error("file ID was not generated")
}
// Check UUID is unique
if uuids[file.ID.String()] {
if uuids[file.ID] {
t.Errorf("duplicate UUID generated: %s", file.ID)
}
uuids[file.ID.String()] = true
uuids[file.ID] = true
}
}
@ -98,8 +96,7 @@ func TestFileRepositoryGetByID(t *testing.T) {
}
// Test non-existent ID
nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
nonExistent, err := repo.GetByID(ctx, nonExistentID)
nonExistent, err := repo.GetByID(ctx, "non-existent-uuid")
if err != nil {
t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
}
@ -157,7 +154,7 @@ func TestOrphanedFileCleanup(t *testing.T) {
}
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}
@ -197,11 +194,11 @@ func TestOrphanedChunkCleanup(t *testing.T) {
// Create chunks
chunk1 := &Chunk{
ChunkHash: types.ChunkHash("orphaned-chunk"),
ChunkHash: "orphaned-chunk",
Size: 1024,
}
chunk2 := &Chunk{
ChunkHash: types.ChunkHash("referenced-chunk"),
ChunkHash: "referenced-chunk",
Size: 1024,
}
@ -247,7 +244,7 @@ func TestOrphanedChunkCleanup(t *testing.T) {
}
// Check that orphaned chunk is gone
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash)
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
@ -256,7 +253,7 @@ func TestOrphanedChunkCleanup(t *testing.T) {
}
// Check that referenced chunk still exists
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash)
if err != nil {
t.Fatalf("error getting chunk: %v", err)
}
@ -275,13 +272,13 @@ func TestOrphanedBlobCleanup(t *testing.T) {
// Create blobs
blob1 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("orphaned-blob"),
ID: "orphaned-blob-id",
Hash: "orphaned-blob",
CreatedTS: time.Now().Truncate(time.Second),
}
blob2 := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("referenced-blob"),
ID: "referenced-blob-id",
Hash: "referenced-blob",
CreatedTS: time.Now().Truncate(time.Second),
}
@ -306,7 +303,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Add blob2 to snapshot
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID, blob2.ID, blob2.Hash)
if err != nil {
t.Fatalf("failed to add blob to snapshot: %v", err)
}
@ -318,7 +315,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Check that orphaned blob is gone
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID)
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
@ -327,7 +324,7 @@ func TestOrphanedBlobCleanup(t *testing.T) {
}
// Check that referenced blob still exists
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID)
if err != nil {
t.Fatalf("error getting blob: %v", err)
}
@ -360,7 +357,7 @@ func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
}
// Create chunks
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
chunks := []string{"chunk1", "chunk2", "chunk3"}
for i, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
@ -446,7 +443,7 @@ func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
// Create a chunk that appears in both files (deduplication)
chunk := &Chunk{
ChunkHash: types.ChunkHash("shared-chunk"),
ChunkHash: "shared-chunk",
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@ -529,7 +526,7 @@ func TestSnapshotRepositoryExtendedFields(t *testing.T) {
}
// Retrieve and verify
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
retrieved, err := repo.GetByID(ctx, snapshot.ID)
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@ -584,7 +581,7 @@ func TestComplexOrphanedDataScenario(t *testing.T) {
files := make([]*File, 3)
for i := range files {
files[i] = &File{
Path: types.FilePath(fmt.Sprintf("/file%d.txt", i)),
Path: fmt.Sprintf("/file%d.txt", i),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
@ -604,29 +601,29 @@ func TestComplexOrphanedDataScenario(t *testing.T) {
// file0: only in snapshot1
// file1: in both snapshots
// file2: only in snapshot2
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID, files[0].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID, files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID, files[1].ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID, files[2].ID)
if err != nil {
t.Fatal(err)
}
// Delete snapshot1
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID)
if err != nil {
t.Fatal(err)
}
err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
err = repos.Snapshots.Delete(ctx, snapshot1.ID)
if err != nil {
t.Fatal(err)
}
@ -692,7 +689,7 @@ func TestCascadeDelete(t *testing.T) {
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
ChunkHash: fmt.Sprintf("cascade-chunk-%d", i),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@ -810,7 +807,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Create many files, some orphaned
for i := 0; i < 20; i++ {
file := &File{
Path: types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
Path: fmt.Sprintf("/concurrent-%d.txt", i),
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
@ -825,7 +822,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Add even-numbered files to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file.ID)
if err != nil {
t.Fatal(err)
}
@ -863,7 +860,7 @@ func TestConcurrentOrphanedCleanup(t *testing.T) {
// Verify all remaining files are even-numbered
for _, file := range files {
var num int
_, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
_, err := fmt.Sscanf(file.Path, "/concurrent-%d.txt", &num)
if err != nil {
t.Logf("failed to parse file number from %s: %v", file.Path, err)
}

View File

@ -67,7 +67,7 @@ func TestOrphanedFileCleanupDebug(t *testing.T) {
t.Logf("snapshot_files count before add: %d", count)
// Add file2 to snapshot
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file2.ID)
if err != nil {
t.Fatalf("failed to add file to snapshot: %v", err)
}

View File

@ -6,8 +6,6 @@ import (
"strings"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// TestFileRepositoryEdgeCases tests edge cases for file repository
@ -40,7 +38,7 @@ func TestFileRepositoryEdgeCases(t *testing.T) {
{
name: "very long path",
file: &File{
Path: types.FilePath("/" + strings.Repeat("a", 4096)),
Path: "/" + strings.Repeat("a", 4096),
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,
@ -96,7 +94,7 @@ func TestFileRepositoryEdgeCases(t *testing.T) {
t.Run(tt.name, func(t *testing.T) {
// Add a unique suffix to paths to avoid UNIQUE constraint violations
if tt.file.Path != "" {
tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
tt.file.Path = fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano())
}
err := repo.Create(ctx, nil, tt.file)
@ -171,7 +169,7 @@ func TestDuplicateHandling(t *testing.T) {
// Test duplicate chunk hashes
t.Run("duplicate chunk hashes", func(t *testing.T) {
chunk := &Chunk{
ChunkHash: types.ChunkHash("duplicate-chunk"),
ChunkHash: "duplicate-chunk",
Size: 1024,
}
@ -204,7 +202,7 @@ func TestDuplicateHandling(t *testing.T) {
}
chunk := &Chunk{
ChunkHash: types.ChunkHash("test-chunk-dup"),
ChunkHash: "test-chunk-dup",
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
@ -281,7 +279,7 @@ func TestNullHandling(t *testing.T) {
t.Fatal(err)
}
retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID)
if err != nil {
t.Fatal(err)
}
@ -294,8 +292,8 @@ func TestNullHandling(t *testing.T) {
// Test blob with NULL uploaded_ts
t.Run("blob not uploaded", func(t *testing.T) {
blob := &Blob{
ID: types.NewBlobID(),
Hash: types.BlobHash("test-hash"),
ID: "not-uploaded",
Hash: "test-hash",
CreatedTS: time.Now(),
UploadedTS: nil, // Not uploaded yet
}
@ -305,7 +303,7 @@ func TestNullHandling(t *testing.T) {
t.Fatal(err)
}
retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
retrieved, err := repos.Blobs.GetByID(ctx, blob.ID)
if err != nil {
t.Fatal(err)
}
@ -341,13 +339,13 @@ func TestLargeDatasets(t *testing.T) {
// Create many files
const fileCount = 1000
fileIDs := make([]types.FileID, fileCount)
fileIDs := make([]string, fileCount)
t.Run("create many files", func(t *testing.T) {
start := time.Now()
for i := 0; i < fileCount; i++ {
file := &File{
Path: types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
Path: fmt.Sprintf("/large/file%05d.txt", i),
MTime: time.Now(),
CTime: time.Now(),
Size: int64(i * 1024),
@ -363,7 +361,7 @@ func TestLargeDatasets(t *testing.T) {
// Add half to snapshot
if i%2 == 0 {
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID, file.ID)
if err != nil {
t.Fatal(err)
}
@ -415,7 +413,7 @@ func TestErrorPropagation(t *testing.T) {
// Test GetByID with non-existent ID
t.Run("GetByID non-existent", func(t *testing.T) {
file, err := repos.Files.GetByID(ctx, types.NewFileID())
file, err := repos.Files.GetByID(ctx, "non-existent-uuid")
if err != nil {
t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
}
@ -438,9 +436,9 @@ func TestErrorPropagation(t *testing.T) {
// Test invalid foreign key reference
t.Run("invalid foreign key", func(t *testing.T) {
fc := &FileChunk{
FileID: types.NewFileID(),
FileID: "non-existent-file-id",
Idx: 0,
ChunkHash: types.ChunkHash("some-chunk"),
ChunkHash: "some-chunk",
}
err := repos.FileChunks.Create(ctx, nil, fc)
if err == nil {
@ -472,7 +470,7 @@ func TestQueryInjection(t *testing.T) {
t.Run("injection attempt", func(t *testing.T) {
// Try injection in file path
file := &File{
Path: types.FilePath(injection),
Path: injection,
MTime: time.Now(),
CTime: time.Now(),
Size: 1024,

View File

@ -6,7 +6,6 @@
CREATE TABLE IF NOT EXISTS files (
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
mtime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
size INTEGER NOT NULL,
@ -29,9 +28,6 @@ CREATE TABLE IF NOT EXISTS file_chunks (
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);
-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
chunk_hash TEXT PRIMARY KEY,
@ -60,9 +56,6 @@ CREATE TABLE IF NOT EXISTS blob_chunks (
FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);
-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);
-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
chunk_hash TEXT NOT NULL,
@ -74,9 +67,6 @@ CREATE TABLE IF NOT EXISTS chunk_files (
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);
-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
id TEXT PRIMARY KEY,
@ -106,9 +96,6 @@ CREATE TABLE IF NOT EXISTS snapshot_files (
FOREIGN KEY (file_id) REFERENCES files(id)
);
-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);
-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
snapshot_id TEXT NOT NULL,
@ -119,9 +106,6 @@ CREATE TABLE IF NOT EXISTS snapshot_blobs (
FOREIGN KEY (blob_id) REFERENCES blobs(id)
);
-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);
-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
@ -131,7 +115,4 @@ CREATE TABLE IF NOT EXISTS uploads (
duration_ms INTEGER NOT NULL,
FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);
-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
);
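
Several of the index definitions removed above are annotated "used in orphan detection". For context, these are the kinds of anti-join queries such indexes accelerate; hypothetical examples written against the schema above, not queries taken from this diff:

```sql
-- Files no longer referenced by any snapshot (cleanup candidates).
SELECT f.id, f.path
FROM files f
LEFT JOIN snapshot_files sf ON sf.file_id = f.id
WHERE sf.file_id IS NULL;

-- Chunks no longer referenced by any file.
SELECT c.chunk_hash
FROM chunks c
LEFT JOIN file_chunks fc ON fc.chunk_hash = c.chunk_hash
WHERE fc.chunk_hash IS NULL;
```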

View File

@ -5,8 +5,6 @@ import (
"database/sql"
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
type SnapshotRepository struct {
@ -271,7 +269,7 @@ func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID
}
// AddFileByID adds a file to a snapshot by file ID
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID string) error {
query := `
INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
VALUES (?, ?)
@ -279,9 +277,9 @@ func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapsh
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
_, err = tx.ExecContext(ctx, query, snapshotID, fileID)
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID)
}
if err != nil {
@ -291,48 +289,8 @@ func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapsh
return nil
}
// AddFilesByIDBatch adds multiple files to a snapshot in batched inserts
func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
if len(fileIDs) == 0 {
return nil
}
// Each entry has 2 values, so batch at 400 to be safe
const batchSize = 400
for i := 0; i < len(fileIDs); i += batchSize {
end := i + batchSize
if end > len(fileIDs) {
end = len(fileIDs)
}
batch := fileIDs[i:end]
query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
args := make([]interface{}, 0, len(batch)*2)
for j, fileID := range batch {
if j > 0 {
query += ", "
}
query += "(?, ?)"
args = append(args, snapshotID, fileID.String())
}
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, args...)
} else {
_, err = r.db.ExecWithLog(ctx, query, args...)
}
if err != nil {
return fmt.Errorf("batch adding files to snapshot: %w", err)
}
}
return nil
}
// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID string, blobHash string) error {
query := `
INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
VALUES (?, ?, ?)
@ -340,9 +298,9 @@ func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
_, err = tx.ExecContext(ctx, query, snapshotID, blobID, blobHash)
} else {
_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID, blobHash)
}
if err != nil {
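
The AddFilesByIDBatch helper removed above caps each statement at 400 rows because every row binds two values; 800 placeholders stays under the 999 host-parameter limit of a default SQLite build (that exact limit is an assumption about the SQLite build in use, not stated in this diff). The statement it assembled has this shape:

```sql
INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
VALUES (?, ?), (?, ?), (?, ?) /* ... up to 400 (?, ?) groups per statement */;
```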

View File

@ -6,8 +6,6 @@ import (
"math"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
const (
@ -48,7 +46,7 @@ func TestSnapshotRepository(t *testing.T) {
}
// Test GetByID
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
retrieved, err := repo.GetByID(ctx, snapshot.ID)
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@ -66,12 +64,12 @@ func TestSnapshotRepository(t *testing.T) {
}
// Test UpdateCounts
err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
if err != nil {
t.Fatalf("failed to update counts: %v", err)
}
retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
retrieved, err = repo.GetByID(ctx, snapshot.ID)
if err != nil {
t.Fatalf("failed to get updated snapshot: %v", err)
}
@ -99,7 +97,7 @@ func TestSnapshotRepository(t *testing.T) {
// Add more snapshots
for i := 2; i <= 5; i++ {
s := &Snapshot{
ID: types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
ID: fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
Hostname: "test-host",
VaultikVersion: "1.0.0",
StartedAt: time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),

View File

@ -35,7 +35,6 @@ type Config struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}
var logger *slog.Logger
@ -45,8 +44,8 @@ func Initialize(cfg Config) {
// Determine log level based on configuration
var level slog.Level
if cfg.Cron || cfg.Quiet {
// In quiet/cron mode, only show errors
if cfg.Cron {
// In cron mode, only show fatal errors (which we'll handle specially)
level = slog.LevelError
} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
level = slog.LevelDebug

View File

@ -21,5 +21,4 @@ type LogOptions struct {
Verbose bool
Debug bool
Cron bool
Quiet bool
}

View File

@ -14,7 +14,6 @@ import (
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// MockS3Client is a mock implementation of S3 operations for testing
@ -139,13 +138,13 @@ func TestBackupWithInMemoryFS(t *testing.T) {
}
for _, file := range files {
if !expectedFiles[file.Path.String()] {
if !expectedFiles[file.Path] {
t.Errorf("Unexpected file in database: %s", file.Path)
}
delete(expectedFiles, file.Path.String())
delete(expectedFiles, file.Path)
// Verify file metadata
fsFile := testFS[file.Path.String()]
fsFile := testFS[file.Path]
if fsFile == nil {
t.Errorf("File %s not found in test filesystem", file.Path)
continue
@ -295,8 +294,8 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
hostname, _ := os.Hostname()
snapshotID := time.Now().Format(time.RFC3339)
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
ID: snapshotID,
Hostname: hostname,
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
@ -341,7 +340,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create file record in a short transaction
file := &database.File{
Path: types.FilePath(path),
Path: path,
Size: info.Size(),
Mode: uint32(info.Mode()),
MTime: info.ModTime(),
@ -393,7 +392,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create new chunk in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunk := &database.Chunk{
ChunkHash: types.ChunkHash(chunkHash),
ChunkHash: chunkHash,
Size: int64(n),
}
return b.repos.Chunks.Create(ctx, tx, chunk)
@ -409,7 +408,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
fileChunk := &database.FileChunk{
FileID: file.ID,
Idx: chunkIndex,
ChunkHash: types.ChunkHash(chunkHash),
ChunkHash: chunkHash,
}
return b.repos.FileChunks.Create(ctx, tx, fileChunk)
})
@ -420,7 +419,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create chunk-file mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunkFile := &database.ChunkFile{
ChunkHash: types.ChunkHash(chunkHash),
ChunkHash: chunkHash,
FileID: file.ID,
FileOffset: int64(chunkIndex * defaultChunkSize),
Length: int64(n),
@ -464,11 +463,10 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
}
// Create blob entry in a short transaction
blobID := types.NewBlobID()
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blob := &database.Blob{
ID: blobID,
Hash: types.BlobHash(blobHash),
ID: "test-blob-" + blobHash[:8],
Hash: blobHash,
CreatedTS: time.Now(),
}
return b.repos.Blobs.Create(ctx, tx, blob)
@ -483,8 +481,8 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Create blob-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blobChunk := &database.BlobChunk{
BlobID: blobID,
ChunkHash: types.ChunkHash(chunkHash),
BlobID: "test-blob-" + blobHash[:8],
ChunkHash: chunkHash,
Offset: 0,
Length: chunk.Size,
}
@ -496,7 +494,7 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
// Add blob to snapshot in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, "test-blob-"+blobHash[:8], blobHash)
})
if err != nil {
return "", err

View File

@ -1,454 +0,0 @@
package snapshot_test
import (
"context"
"database/sql"
"path/filepath"
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/require"
)
func setupExcludeTestFS(t *testing.T) afero.Fs {
t.Helper()
// Create in-memory filesystem
fs := afero.NewMemMapFs()
// Create test directory structure:
// /backup/
// file1.txt (should be backed up)
// file2.log (should be excluded if *.log is in patterns)
// .git/
// config (should be excluded if .git is in patterns)
// objects/
// pack/
// data.pack (should be excluded if .git is in patterns)
// src/
// main.go (should be backed up)
// test.go (should be backed up)
// node_modules/
// package/
// index.js (should be excluded if node_modules is in patterns)
// cache/
// temp.dat (should be excluded if cache/ is in patterns)
// build/
// output.bin (should be excluded if build is in patterns)
// docs/
// readme.md (should be backed up)
// .DS_Store (should be excluded if .DS_Store is in patterns)
// thumbs.db (should be excluded if thumbs.db is in patterns)
files := map[string]string{
"/backup/file1.txt": "content1",
"/backup/file2.log": "log content",
"/backup/.git/config": "git config",
"/backup/.git/objects/pack/data.pack": "pack data",
"/backup/src/main.go": "package main",
"/backup/src/test.go": "package main_test",
"/backup/node_modules/package/index.js": "module.exports = {}",
"/backup/cache/temp.dat": "cached data",
"/backup/build/output.bin": "binary data",
"/backup/docs/readme.md": "# Documentation",
"/backup/.DS_Store": "ds store data",
"/backup/thumbs.db": "thumbs data",
"/backup/src/.hidden": "hidden file",
"/backup/important.log.bak": "backup of log",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
t.Helper()
// Initialize logger
log.Initialize(log.Config{})
// Create test database
db, err := database.NewTestDB()
require.NoError(t, err)
repos := database.NewRepositories(db)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
ChunkSize: 64 * 1024,
Repositories: repos,
MaxBlobSize: 1024 * 1024,
CompressionLevel: 3,
AgeRecipients: []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
Exclude: excludePatterns,
})
cleanup := func() {
_ = db.Close()
}
return scanner, repos, cleanup
}
func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
t.Helper()
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snap := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
CompletedAt: nil,
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
return repos.Snapshots.Create(ctx, tx, snap)
})
require.NoError(t, err)
}
func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should have scanned files but NOT .git directory contents
// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
// src/.hidden, important.log.bak
// Excluded: .git/config, .git/objects/pack/data.pack
require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}
func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude file2.log but NOT important.log.bak (different extension)
// Total files: 14, excluded: 1 (file2.log)
require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}
func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude node_modules/package/index.js
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}
func TestExcludePatterns_MultiplePatterns(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}
func TestExcludePatterns_NoExclusions(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should scan all 14 files
require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}
func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude: .git/*, .DS_Store, src/.hidden
// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}
func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude .git/objects/pack/data.pack
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}
func TestExcludePatterns_ExactFileName(t *testing.T) {
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude thumbs.db and .DS_Store
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}
func TestExcludePatterns_CaseSensitive(t *testing.T) {
// Pattern matching should be case-sensitive
fs := setupExcludeTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db
// All 14 files should be scanned
require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}
func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
fs := setupExcludeTestFS(t)
// Some users might add trailing slashes to directory patterns
scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude cache/temp.dat and build/output.bin
// Total files: 14, excluded: 2
require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}
func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
fs := setupExcludeTestFS(t)
// Exclude .hidden file specifically in src directory
scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// Should exclude only src/.hidden
// Total files: 14, excluded: 1
require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}
// setupAnchoredTestFS creates a filesystem for testing anchored patterns
// Source dir: /backup
// Structure:
//
// /backup/
// projectname/
// file.txt (should be excluded with /projectname)
// otherproject/
// projectname/
// file.txt (should NOT be excluded with /projectname, only with projectname)
// src/
// file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
t.Helper()
fs := afero.NewMemMapFs()
files := map[string]string{
"/backup/projectname/file.txt": "root project file",
"/backup/otherproject/projectname/file.txt": "nested project file",
"/backup/src/file.go": "source file",
"/backup/file.txt": "root file",
}
testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
for path, content := range files {
dir := filepath.Dir(path)
err := fs.MkdirAll(dir, 0755)
require.NoError(t, err)
err = afero.WriteFile(fs, path, []byte(content), 0644)
require.NoError(t, err)
err = fs.Chtimes(path, testTime, testTime)
require.NoError(t, err)
}
return fs
}
func TestExcludePatterns_AnchoredPattern(t *testing.T) {
// Pattern starting with / should only match from root of source dir
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /projectname should ONLY exclude /backup/projectname/file.txt (1 file)
// /backup/otherproject/projectname/file.txt should NOT be excluded
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}
func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
// Pattern without leading / should match anywhere in path
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// projectname (without /) should exclude BOTH:
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 2
require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}
func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
// Anchored pattern with glob
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /src/*.go should exclude /backup/src/file.go
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}
func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
// Anchored pattern for exact file at root
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// /file.txt should ONLY exclude /backup/file.txt
// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 1
require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}
func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
// Unanchored pattern for file should match anywhere
fs := setupAnchoredTestFS(t)
scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
defer cleanup()
require.NotNil(t, scanner)
ctx := context.Background()
createSnapshotRecord(t, ctx, repos, "test-snapshot")
result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
require.NoError(t, err)
// file.txt should exclude ALL file.txt files:
// - /backup/file.txt
// - /backup/projectname/file.txt
// - /backup/otherproject/projectname/file.txt
// Total files: 4, excluded: 3
require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}
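
The deleted tests above pin down the exclude-pattern semantics: a leading "/" anchors a pattern to the root of the source directory, a bare pattern matches at any depth, and directory patterns may carry a trailing slash. A rough sketch of that matching rule using github.com/gobwas/glob (the glob library the scanner's exclude support was built on); this illustrates the asserted semantics under those assumptions and is not the deleted implementation:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/gobwas/glob"
)

// matchExclude reports whether relPath (relative to the source dir,
// '/'-separated) is excluded by pattern. Hypothetical helper, not the
// removed scanner code.
func matchExclude(pattern, relPath string) bool {
	anchored := strings.HasPrefix(pattern, "/")
	pattern = strings.Trim(pattern, "/") // drop anchor and any trailing slash
	g := glob.MustCompile(pattern, '/')

	if anchored {
		// Anchored: match the whole relative path or its top-level entry only.
		top := strings.SplitN(relPath, "/", 2)[0]
		return g.Match(relPath) || g.Match(top)
	}
	// Unanchored: try every path suffix and every single component.
	parts := strings.Split(relPath, "/")
	for i := range parts {
		if g.Match(strings.Join(parts[i:], "/")) || g.Match(parts[i]) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchExclude("/projectname", "projectname/file.txt"))              // true
	fmt.Println(matchExclude("/projectname", "otherproject/projectname/file.txt")) // false
	fmt.Println(matchExclude("projectname", "otherproject/projectname/file.txt"))  // true
	fmt.Println(matchExclude("*.log", "file2.log"))                                // true
}
```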

View File

@ -9,7 +9,6 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
@ -54,7 +53,7 @@ func TestFileContentChange(t *testing.T) {
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID1),
ID: snapshotID1,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@ -88,7 +87,7 @@ func TestFileContentChange(t *testing.T) {
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID2),
ID: snapshotID2,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@ -118,12 +117,12 @@ func TestFileContentChange(t *testing.T) {
assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)
// Verify old chunk still exists (it's still valid data)
oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash)
require.NoError(t, err)
assert.NotNil(t, oldChunk)
// Verify new chunk exists
newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash)
require.NoError(t, err)
assert.NotNil(t, newChunk)
@ -183,7 +182,7 @@ func TestMultipleFileChanges(t *testing.T) {
snapshotID1 := "snapshot1"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID1),
ID: snapshotID1,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@ -209,7 +208,7 @@ func TestMultipleFileChanges(t *testing.T) {
snapshotID2 := "snapshot2"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID2),
ID: snapshotID2,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),

View File

@ -12,8 +12,6 @@ import (
type ScannerParams struct {
EnableProgress bool
Fs afero.Fs
Exclude []string // Exclude patterns (combined global + snapshot-specific)
SkipErrors bool // Skip file read errors (log loudly but continue)
}
// Module exports backup functionality as an fx module.
@ -31,12 +29,6 @@ type ScannerFactory func(params ScannerParams) *Scanner
func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
return func(params ScannerParams) *Scanner {
// Use provided excludes, or fall back to global config excludes
excludes := params.Exclude
if len(excludes) == 0 {
excludes = cfg.Exclude
}
return NewScanner(ScannerConfig{
FS: params.Fs,
ChunkSize: cfg.ChunkSize.Int64(),
@ -46,8 +38,6 @@ func provideScannerFactory(cfg *config.Config, repos *database.Repositories, sto
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
EnableProgress: params.EnableProgress,
Exclude: excludes,
SkipErrors: params.SkipErrors,
})
}
}
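
For reference, the simplified factory above would be consumed roughly like this. A hedged usage sketch: the caller, package name, and path are placeholders, and it assumes the code lives inside the vaultik module so the internal packages are importable:

```go
package backup

import (
	"context"

	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"github.com/spf13/afero"
)

// runBackupScan shows how a ScannerFactory provided via fx might be used.
func runBackupScan(ctx context.Context, newScanner snapshot.ScannerFactory, snapshotID string) error {
	scanner := newScanner(snapshot.ScannerParams{
		Fs:             afero.NewOsFs(), // back the scan with the real filesystem
		EnableProgress: true,            // periodic progress reporting
	})
	_, err := scanner.Scan(ctx, "/path/to/source", snapshotID)
	return err
}
```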

View File

@ -3,10 +3,8 @@ package snapshot
import (
"context"
"database/sql"
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"sync"
"time"
@ -16,9 +14,7 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/gobwas/glob"
"github.com/spf13/afero"
)
@ -29,51 +25,23 @@ type FileToProcess struct {
File *database.File
}
// pendingFileData holds all data needed to commit a file to the database
type pendingFileData struct {
file *database.File
fileChunks []database.FileChunk
chunkFiles []database.ChunkFile
}
// compiledPattern holds a compiled glob pattern and whether it's anchored
type compiledPattern struct {
pattern glob.Glob
anchored bool // If true, only matches from root of source dir
original string
}
// Scanner scans directories and populates the database with file and chunk information
type Scanner struct {
fs afero.Fs
chunker *chunker.Chunker
packer *blob.Packer
repos *database.Repositories
storage storage.Storer
maxBlobSize int64
compressionLevel int
ageRecipient string
snapshotID string // Current snapshot being processed
currentSourcePath string // Current source directory being scanned (for restore path stripping)
exclude []string // Glob patterns for files/directories to exclude
compiledExclude []compiledPattern // Compiled glob patterns
progress *ProgressReporter
skipErrors bool // Skip file read errors (log loudly but continue)
fs afero.Fs
chunker *chunker.Chunker
packer *blob.Packer
repos *database.Repositories
storage storage.Storer
maxBlobSize int64
compressionLevel int
ageRecipient string
snapshotID string // Current snapshot being processed
progress *ProgressReporter
// In-memory cache of known chunk hashes for fast existence checks
knownChunks map[string]struct{}
knownChunksMu sync.RWMutex
// Pending chunk hashes - chunks that have been added to packer but not yet committed to DB
// When a blob finalizes, the committed chunks are removed from this set
pendingChunkHashes map[string]struct{}
pendingChunkHashesMu sync.Mutex
// Pending file data buffer for batch insertion
// Files are flushed when all their chunks have been committed to DB
pendingFiles []pendingFileData
pendingFilesMu sync.Mutex
// Mutex for coordinating blob creation
packerMu sync.Mutex // Blocks chunk production during blob creation
@ -91,8 +59,6 @@ type ScannerConfig struct {
CompressionLevel int
AgeRecipients []string // Optional, empty means no encryption
EnableProgress bool // Enable progress reporting
Exclude []string // Glob patterns for files/directories to exclude
SkipErrors bool // Skip file read errors (log loudly but continue)
}
// ScanResult contains the results of a scan operation
@ -136,30 +102,22 @@ func NewScanner(cfg ScannerConfig) *Scanner {
progress = NewProgressReporter()
}
// Compile exclude patterns
compiledExclude := compileExcludePatterns(cfg.Exclude)
return &Scanner{
fs: cfg.FS,
chunker: chunker.NewChunker(cfg.ChunkSize),
packer: packer,
repos: cfg.Repositories,
storage: cfg.Storage,
maxBlobSize: cfg.MaxBlobSize,
compressionLevel: cfg.CompressionLevel,
ageRecipient: strings.Join(cfg.AgeRecipients, ","),
exclude: cfg.Exclude,
compiledExclude: compiledExclude,
progress: progress,
skipErrors: cfg.SkipErrors,
pendingChunkHashes: make(map[string]struct{}),
fs: cfg.FS,
chunker: chunker.NewChunker(cfg.ChunkSize),
packer: packer,
repos: cfg.Repositories,
storage: cfg.Storage,
maxBlobSize: cfg.MaxBlobSize,
compressionLevel: cfg.CompressionLevel,
ageRecipient: strings.Join(cfg.AgeRecipients, ","),
progress: progress,
}
}
// Scan scans a directory and populates the database
func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*ScanResult, error) {
s.snapshotID = snapshotID
s.currentSourcePath = path // Store source path for file records (used during restore)
s.scanCtx = ctx
result := &ScanResult{
StartTime: time.Now().UTC(),
@ -289,7 +247,7 @@ func (s *Scanner) loadKnownFiles(ctx context.Context, path string) (map[string]*
result := make(map[string]*database.File, len(files))
for _, f := range files {
result[f.Path.String()] = f
result[f.Path] = f
}
return result, nil
@ -306,7 +264,7 @@ func (s *Scanner) loadKnownChunks(ctx context.Context) error {
s.knownChunksMu.Lock()
s.knownChunks = make(map[string]struct{}, len(chunks))
for _, c := range chunks {
s.knownChunks[c.ChunkHash.String()] = struct{}{}
s.knownChunks[c.ChunkHash] = struct{}{}
}
s.knownChunksMu.Unlock()
@ -328,226 +286,10 @@ func (s *Scanner) addKnownChunk(hash string) {
s.knownChunksMu.Unlock()
}
// addPendingChunkHash marks a chunk as pending (not yet committed to DB)
func (s *Scanner) addPendingChunkHash(hash string) {
s.pendingChunkHashesMu.Lock()
s.pendingChunkHashes[hash] = struct{}{}
s.pendingChunkHashesMu.Unlock()
}
// removePendingChunkHashes removes committed chunk hashes from the pending set
func (s *Scanner) removePendingChunkHashes(hashes []string) {
log.Debug("removePendingChunkHashes: starting", "count", len(hashes))
start := time.Now()
s.pendingChunkHashesMu.Lock()
for _, hash := range hashes {
delete(s.pendingChunkHashes, hash)
}
s.pendingChunkHashesMu.Unlock()
log.Debug("removePendingChunkHashes: done", "count", len(hashes), "duration", time.Since(start))
}
// isChunkPending returns true if the chunk is still pending (not yet committed to DB)
func (s *Scanner) isChunkPending(hash string) bool {
s.pendingChunkHashesMu.Lock()
_, pending := s.pendingChunkHashes[hash]
s.pendingChunkHashesMu.Unlock()
return pending
}
// addPendingFile adds a file to the pending buffer
// Files are NOT auto-flushed here - they are flushed when their chunks are committed
// (in handleBlobReady after blob finalize)
func (s *Scanner) addPendingFile(_ context.Context, data pendingFileData) {
s.pendingFilesMu.Lock()
s.pendingFiles = append(s.pendingFiles, data)
s.pendingFilesMu.Unlock()
}
// flushPendingFiles writes all pending files to the database in a single transaction
func (s *Scanner) flushPendingFiles(ctx context.Context) error {
s.pendingFilesMu.Lock()
files := s.pendingFiles
s.pendingFiles = nil
s.pendingFilesMu.Unlock()
if len(files) == 0 {
return nil
}
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
for _, data := range files {
// Create or update the file record
if err := s.repos.Files.Create(txCtx, tx, data.file); err != nil {
return fmt.Errorf("creating file record: %w", err)
}
// Delete any existing file_chunks and chunk_files for this file
if err := s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID); err != nil {
return fmt.Errorf("deleting old file chunks: %w", err)
}
if err := s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID); err != nil {
return fmt.Errorf("deleting old chunk files: %w", err)
}
// Create file-chunk mappings
for i := range data.fileChunks {
if err := s.repos.FileChunks.Create(txCtx, tx, &data.fileChunks[i]); err != nil {
return fmt.Errorf("creating file chunk: %w", err)
}
}
// Create chunk-file mappings
for i := range data.chunkFiles {
if err := s.repos.ChunkFiles.Create(txCtx, tx, &data.chunkFiles[i]); err != nil {
return fmt.Errorf("creating chunk file: %w", err)
}
}
// Add file to snapshot
if err := s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID); err != nil {
return fmt.Errorf("adding file to snapshot: %w", err)
}
}
return nil
})
}
// flushAllPending flushes all pending files to the database
func (s *Scanner) flushAllPending(ctx context.Context) error {
return s.flushPendingFiles(ctx)
}
// flushCompletedPendingFiles flushes only files whose chunks are all committed to DB
// Files with pending chunks are kept in the queue for later flushing
func (s *Scanner) flushCompletedPendingFiles(ctx context.Context) error {
flushStart := time.Now()
log.Debug("flushCompletedPendingFiles: starting")
log.Debug("flushCompletedPendingFiles: acquiring pendingFilesMu lock")
s.pendingFilesMu.Lock()
log.Debug("flushCompletedPendingFiles: acquired lock", "pending_files", len(s.pendingFiles))
// Separate files into complete (can flush) and incomplete (keep pending)
var canFlush []pendingFileData
var stillPending []pendingFileData
log.Debug("flushCompletedPendingFiles: checking which files can flush")
checkStart := time.Now()
for _, data := range s.pendingFiles {
allChunksCommitted := true
for _, fc := range data.fileChunks {
if s.isChunkPending(fc.ChunkHash.String()) {
allChunksCommitted = false
break
}
}
if allChunksCommitted {
canFlush = append(canFlush, data)
} else {
stillPending = append(stillPending, data)
}
}
log.Debug("flushCompletedPendingFiles: check done", "duration", time.Since(checkStart), "can_flush", len(canFlush), "still_pending", len(stillPending))
s.pendingFiles = stillPending
s.pendingFilesMu.Unlock()
log.Debug("flushCompletedPendingFiles: released lock")
if len(canFlush) == 0 {
log.Debug("flushCompletedPendingFiles: nothing to flush")
return nil
}
log.Debug("Flushing completed files after blob finalize",
"files_to_flush", len(canFlush),
"files_still_pending", len(stillPending))
// Collect all data for batch operations
log.Debug("flushCompletedPendingFiles: collecting data for batch ops")
collectStart := time.Now()
var allFileChunks []database.FileChunk
var allChunkFiles []database.ChunkFile
var allFileIDs []types.FileID
var allFiles []*database.File
for _, data := range canFlush {
allFileChunks = append(allFileChunks, data.fileChunks...)
allChunkFiles = append(allChunkFiles, data.chunkFiles...)
allFileIDs = append(allFileIDs, data.file.ID)
allFiles = append(allFiles, data.file)
}
log.Debug("flushCompletedPendingFiles: collected data",
"duration", time.Since(collectStart),
"file_chunks", len(allFileChunks),
"chunk_files", len(allChunkFiles),
"files", len(allFiles))
// Flush the complete files using batch operations
log.Debug("flushCompletedPendingFiles: starting transaction")
txStart := time.Now()
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
log.Debug("flushCompletedPendingFiles: inside transaction")
// Batch delete old file_chunks and chunk_files
log.Debug("flushCompletedPendingFiles: deleting old file_chunks")
opStart := time.Now()
if err := s.repos.FileChunks.DeleteByFileIDs(txCtx, tx, allFileIDs); err != nil {
return fmt.Errorf("batch deleting old file chunks: %w", err)
}
log.Debug("flushCompletedPendingFiles: deleted file_chunks", "duration", time.Since(opStart))
log.Debug("flushCompletedPendingFiles: deleting old chunk_files")
opStart = time.Now()
if err := s.repos.ChunkFiles.DeleteByFileIDs(txCtx, tx, allFileIDs); err != nil {
return fmt.Errorf("batch deleting old chunk files: %w", err)
}
log.Debug("flushCompletedPendingFiles: deleted chunk_files", "duration", time.Since(opStart))
// Batch create/update file records
log.Debug("flushCompletedPendingFiles: creating files")
opStart = time.Now()
if err := s.repos.Files.CreateBatch(txCtx, tx, allFiles); err != nil {
return fmt.Errorf("batch creating file records: %w", err)
}
log.Debug("flushCompletedPendingFiles: created files", "duration", time.Since(opStart))
// Batch insert file_chunks
log.Debug("flushCompletedPendingFiles: inserting file_chunks")
opStart = time.Now()
if err := s.repos.FileChunks.CreateBatch(txCtx, tx, allFileChunks); err != nil {
return fmt.Errorf("batch creating file chunks: %w", err)
}
log.Debug("flushCompletedPendingFiles: inserted file_chunks", "duration", time.Since(opStart))
// Batch insert chunk_files
log.Debug("flushCompletedPendingFiles: inserting chunk_files")
opStart = time.Now()
if err := s.repos.ChunkFiles.CreateBatch(txCtx, tx, allChunkFiles); err != nil {
return fmt.Errorf("batch creating chunk files: %w", err)
}
log.Debug("flushCompletedPendingFiles: inserted chunk_files", "duration", time.Since(opStart))
// Batch add files to snapshot
log.Debug("flushCompletedPendingFiles: adding files to snapshot")
opStart = time.Now()
if err := s.repos.Snapshots.AddFilesByIDBatch(txCtx, tx, s.snapshotID, allFileIDs); err != nil {
return fmt.Errorf("batch adding files to snapshot: %w", err)
}
log.Debug("flushCompletedPendingFiles: added files to snapshot", "duration", time.Since(opStart))
log.Debug("flushCompletedPendingFiles: transaction complete")
return nil
})
log.Debug("flushCompletedPendingFiles: transaction done", "duration", time.Since(txStart))
log.Debug("flushCompletedPendingFiles: total duration", "duration", time.Since(flushStart))
return err
}
// ScanPhaseResult contains the results of the scan phase
type ScanPhaseResult struct {
FilesToProcess []*FileToProcess
UnchangedFileIDs []types.FileID // IDs of unchanged files to associate with snapshot
UnchangedFileIDs []string // IDs of unchanged files to associate with snapshot
}
// scanPhase performs the initial directory scan to identify files to process
@ -559,7 +301,7 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
estimatedTotal := int64(len(knownFiles))
var filesToProcess []*FileToProcess
var unchangedFileIDs []types.FileID // Just IDs - no new records needed
var unchangedFileIDs []string // Just IDs - no new records needed
var mu sync.Mutex
// Set up periodic status output
@ -571,11 +313,6 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
log.Debug("Starting directory walk", "path", path)
err := afero.Walk(s.fs, path, func(filePath string, info os.FileInfo, err error) error {
if err != nil {
if s.skipErrors {
log.Error("ERROR: Failed to access file (skipping due to --skip-errors)", "path", filePath, "error", err)
fmt.Printf("ERROR: Failed to access %s: %v (skipping)\n", filePath, err)
return nil // Continue scanning
}
log.Debug("Error accessing filesystem entry", "path", filePath, "error", err)
return err
}
@ -587,14 +324,6 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
default:
}
// Check exclude patterns - for directories, skip the entire subtree
if s.shouldExclude(filePath, path) {
if info.IsDir() {
return filepath.SkipDir
}
return nil
}
// Skip non-regular files for processing (but still count them)
if !info.Mode().IsRegular() {
return nil
@ -614,7 +343,7 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
FileInfo: info,
File: file,
})
} else if !file.ID.IsZero() {
} else if file.ID != "" {
// Unchanged file with existing ID - just need snapshot association
unchangedFileIDs = append(unchangedFileIDs, file.ID)
}
@ -700,36 +429,27 @@ func (s *Scanner) checkFileInMemory(path string, info os.FileInfo, knownFiles ma
gid = stat.Gid()
}
// Check against in-memory map first to get existing ID if available
existingFile, exists := knownFiles[path]
// Create file record with ID set upfront
// For new files, generate UUID immediately so it's available for chunk associations
// For existing files, reuse the existing ID
var fileID types.FileID
if exists {
fileID = existingFile.ID
} else {
fileID = types.NewFileID()
}
// Create file record
file := &database.File{
ID: fileID,
Path: types.FilePath(path),
SourcePath: types.SourcePath(s.currentSourcePath), // Store source directory for restore path stripping
MTime: info.ModTime(),
CTime: info.ModTime(), // afero doesn't provide ctime
Size: info.Size(),
Mode: uint32(info.Mode()),
UID: uid,
GID: gid,
Path: path,
MTime: info.ModTime(),
CTime: info.ModTime(), // afero doesn't provide ctime
Size: info.Size(),
Mode: uint32(info.Mode()),
UID: uid,
GID: gid,
}
// New file - needs processing
// Check against in-memory map
existingFile, exists := knownFiles[path]
if !exists {
// New file
return file, true
}
// Reuse existing ID
file.ID = existingFile.ID
// Check if file has changed
if existingFile.Size != file.Size ||
existingFile.MTime.Unix() != file.MTime.Unix() ||
@ -745,7 +465,7 @@ func (s *Scanner) checkFileInMemory(path string, info os.FileInfo, knownFiles ma
// batchAddFilesToSnapshot adds existing file IDs to the snapshot association table
// This is used for unchanged files that already have records in the database
func (s *Scanner) batchAddFilesToSnapshot(ctx context.Context, fileIDs []types.FileID) error {
func (s *Scanner) batchAddFilesToSnapshot(ctx context.Context, fileIDs []string) error {
const batchSize = 1000
startTime := time.Now()
@ -822,19 +542,6 @@ func (s *Scanner) processPhase(ctx context.Context, filesToProcess []*FileToProc
// Process file in streaming fashion
if err := s.processFileStreaming(ctx, fileToProcess, result); err != nil {
// Handle files that were deleted between scan and process phases
if errors.Is(err, os.ErrNotExist) {
log.Warn("File was deleted during backup, skipping", "path", fileToProcess.Path)
result.FilesSkipped++
continue
}
// Skip file read errors if --skip-errors is enabled
if s.skipErrors {
log.Error("ERROR: Failed to process file (skipping due to --skip-errors)", "path", fileToProcess.Path, "error", err)
fmt.Printf("ERROR: Failed to process %s: %v (skipping)\n", fileToProcess.Path, err)
result.FilesSkipped++
continue
}
return fmt.Errorf("processing file %s: %w", fileToProcess.Path, err)
}
@ -878,8 +585,7 @@ func (s *Scanner) processPhase(ctx context.Context, filesToProcess []*FileToProc
}
}
// Final packer flush first - this commits remaining chunks to DB
// and handleBlobReady will flush files whose chunks are now committed
// Final flush (outside any transaction)
s.packerMu.Lock()
if err := s.packer.Flush(); err != nil {
s.packerMu.Unlock()
@ -887,24 +593,14 @@ func (s *Scanner) processPhase(ctx context.Context, filesToProcess []*FileToProc
}
s.packerMu.Unlock()
// Flush any remaining pending files (e.g., files with only pre-existing chunks
// that didn't trigger a blob finalize)
if err := s.flushAllPending(ctx); err != nil {
return fmt.Errorf("flushing remaining pending files: %w", err)
}
// If no storage configured, store any remaining blobs locally
if s.storage == nil {
blobs := s.packer.GetFinishedBlobs()
for _, b := range blobs {
// Blob metadata is already stored incrementally during packing
// Just add the blob to the snapshot
blobID, err := types.ParseBlobID(b.ID)
if err != nil {
return fmt.Errorf("parsing blob ID: %w", err)
}
err = s.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, blobID, types.BlobHash(b.Hash))
err := s.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, b.ID, b.Hash)
})
if err != nil {
return fmt.Errorf("storing blob metadata: %w", err)
@ -1006,21 +702,14 @@ func (s *Scanner) handleBlobReady(blobWithReader *blob.BlobWithReader) error {
if dbCtx == nil {
dbCtx = context.Background()
}
// Parse blob ID for typed operations
finishedBlobID, err := types.ParseBlobID(finishedBlob.ID)
if err != nil {
return fmt.Errorf("parsing finished blob ID: %w", err)
}
err = s.repos.WithTx(dbCtx, func(ctx context.Context, tx *sql.Tx) error {
err := s.repos.WithTx(dbCtx, func(ctx context.Context, tx *sql.Tx) error {
// Update blob upload timestamp
if err := s.repos.Blobs.UpdateUploaded(ctx, tx, finishedBlob.ID); err != nil {
return fmt.Errorf("updating blob upload timestamp: %w", err)
}
// Add the blob to the snapshot
if err := s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, finishedBlobID, types.BlobHash(finishedBlob.Hash)); err != nil {
if err := s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, finishedBlob.ID, finishedBlob.Hash); err != nil {
return fmt.Errorf("adding blob to snapshot: %w", err)
}
@ -1050,25 +739,7 @@ func (s *Scanner) handleBlobReady(blobWithReader *blob.BlobWithReader) error {
}
}
if err != nil {
return err
}
// Chunks from this blob are now committed to DB - remove from pending set
log.Debug("handleBlobReady: removing pending chunk hashes")
s.removePendingChunkHashes(blobWithReader.InsertedChunkHashes)
log.Debug("handleBlobReady: removed pending chunk hashes")
// Flush files whose chunks are now all committed
// This maintains database consistency after each blob
log.Debug("handleBlobReady: calling flushCompletedPendingFiles")
if err := s.flushCompletedPendingFiles(dbCtx); err != nil {
return fmt.Errorf("flushing completed files: %w", err)
}
log.Debug("handleBlobReady: flushCompletedPendingFiles returned")
log.Debug("handleBlobReady: complete")
return nil
return err
}
// processFileStreaming processes a file by streaming chunks directly to the packer
@ -1108,14 +779,23 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
// Check if chunk already exists (fast in-memory lookup)
chunkExists := s.chunkExists(chunk.Hash)
// Queue new chunks for batch insert when blob finalizes
// This dramatically reduces transaction overhead
// Store chunk if new
if !chunkExists {
s.packer.AddPendingChunk(chunk.Hash, chunk.Size)
// Add to in-memory cache immediately for fast duplicate detection
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
dbChunk := &database.Chunk{
ChunkHash: chunk.Hash,
Size: chunk.Size,
}
if err := s.repos.Chunks.Create(txCtx, tx, dbChunk); err != nil {
return fmt.Errorf("creating chunk: %w", err)
}
return nil
})
if err != nil {
return fmt.Errorf("storing chunk: %w", err)
}
// Add to in-memory cache for fast duplicate detection
s.addKnownChunk(chunk.Hash)
// Track as pending until blob finalizes and commits to DB
s.addPendingChunkHash(chunk.Hash)
}
// Track file chunk association for later storage
@ -1123,7 +803,7 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
fileChunk: database.FileChunk{
FileID: fileToProcess.File.ID,
Idx: chunkIndex,
ChunkHash: types.ChunkHash(chunk.Hash),
ChunkHash: chunk.Hash,
},
offset: chunk.Offset,
size: chunk.Size,
@ -1191,32 +871,56 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
"file_hash", fileHash,
"chunks", len(chunks))
// Build file data for batch insertion
// Update chunk associations with the file ID
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))
for i, ci := range chunks {
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID,
Idx: ci.fileChunk.Idx,
ChunkHash: ci.fileChunk.ChunkHash,
// Store file record, chunk associations, and snapshot association in database
// This happens AFTER successful chunking to avoid orphaned records on interruption
err = s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
// Create or update the file record
// Files.Create uses INSERT OR REPLACE, so it handles both new and changed files
if err := s.repos.Files.Create(txCtx, tx, fileToProcess.File); err != nil {
return fmt.Errorf("creating file record: %w", err)
}
chunkFiles[i] = database.ChunkFile{
ChunkHash: ci.fileChunk.ChunkHash,
FileID: fileToProcess.File.ID,
FileOffset: ci.offset,
Length: ci.size,
}
}
// Queue file for batch insertion
// Files will be flushed when their chunks are committed (after blob finalize)
s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
// Delete any existing file_chunks and chunk_files for this file
// This ensures old chunks are no longer associated when file content changes
if err := s.repos.FileChunks.DeleteByFileID(txCtx, tx, fileToProcess.File.ID); err != nil {
return fmt.Errorf("deleting old file chunks: %w", err)
}
if err := s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, fileToProcess.File.ID); err != nil {
return fmt.Errorf("deleting old chunk files: %w", err)
}
// Update chunk associations with the file ID (now that we have it)
for i := range chunks {
chunks[i].fileChunk.FileID = fileToProcess.File.ID
}
for _, ci := range chunks {
// Create file-chunk mapping
if err := s.repos.FileChunks.Create(txCtx, tx, &ci.fileChunk); err != nil {
return fmt.Errorf("creating file chunk: %w", err)
}
// Create chunk-file mapping
chunkFile := &database.ChunkFile{
ChunkHash: ci.fileChunk.ChunkHash,
FileID: fileToProcess.File.ID,
FileOffset: ci.offset,
Length: ci.size,
}
if err := s.repos.ChunkFiles.Create(txCtx, tx, chunkFile); err != nil {
return fmt.Errorf("creating chunk file: %w", err)
}
}
// Add file to snapshot
if err := s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, fileToProcess.File.ID); err != nil {
return fmt.Errorf("adding file to snapshot: %w", err)
}
return nil
})
return nil
return err
}
// GetProgress returns the progress reporter for this scanner
@ -1256,105 +960,6 @@ func (s *Scanner) detectDeletedFilesFromMap(ctx context.Context, knownFiles map[
return nil
}
// compileExcludePatterns compiles the exclude patterns into glob matchers
func compileExcludePatterns(patterns []string) []compiledPattern {
var compiled []compiledPattern
for _, p := range patterns {
if p == "" {
continue
}
// Check if pattern is anchored (starts with /)
anchored := strings.HasPrefix(p, "/")
pattern := p
if anchored {
pattern = p[1:] // Remove leading /
}
// Remove trailing slash if present (directory indicator)
pattern = strings.TrimSuffix(pattern, "/")
// Compile the glob pattern
// For patterns without path separators, we need to match them as components
// e.g., ".git" should match ".git" anywhere in the path
g, err := glob.Compile(pattern, '/')
if err != nil {
log.Warn("Invalid exclude pattern, skipping", "pattern", p, "error", err)
continue
}
compiled = append(compiled, compiledPattern{
pattern: g,
anchored: anchored,
original: p,
})
}
return compiled
}
// shouldExclude checks if a path should be excluded based on exclude patterns
// filePath is the full path to the file
// rootPath is the root of the backup source directory
func (s *Scanner) shouldExclude(filePath, rootPath string) bool {
if len(s.compiledExclude) == 0 {
return false
}
// Get the relative path from root
relPath, err := filepath.Rel(rootPath, filePath)
if err != nil {
return false
}
// Never exclude the root directory itself
if relPath == "." {
return false
}
// Normalize path separators
relPath = filepath.ToSlash(relPath)
// Check each pattern
for _, cp := range s.compiledExclude {
if cp.anchored {
// Anchored pattern: must match from the root
// Match the relative path directly
if cp.pattern.Match(relPath) {
return true
}
// Also check if any prefix of the path matches (for directory patterns)
parts := strings.Split(relPath, "/")
for i := 1; i <= len(parts); i++ {
prefix := strings.Join(parts[:i], "/")
if cp.pattern.Match(prefix) {
return true
}
}
} else {
// Unanchored pattern: can match anywhere in path
// Check the full relative path
if cp.pattern.Match(relPath) {
return true
}
// Check each path component and subpath
parts := strings.Split(relPath, "/")
for i := range parts {
// Match individual component (e.g., ".git" matches ".git" directory)
if cp.pattern.Match(parts[i]) {
return true
}
// Match subpath from this component onwards
subpath := strings.Join(parts[i:], "/")
if cp.pattern.Match(subpath) {
return true
}
}
}
}
return false
}
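
The exclude matching removed here distinguishes anchored patterns (leading `/`, matched from the source root) from unanchored ones (matched against any path component or subpath). A small sketch of both cases, assuming the glob package is `github.com/gobwas/glob`, which is consistent with the `glob.Compile(pattern, '/')` call above:

```go
// Sketch only: anchored vs unanchored exclude patterns, assuming the glob
// package above is github.com/gobwas/glob.
package main

import (
	"fmt"
	"strings"

	"github.com/gobwas/glob"
)

func main() {
	// Unanchored ".git": matched against every path component, so it
	// excludes a .git directory anywhere under the source root.
	g := glob.MustCompile(".git", '/')
	rel := "src/app/.git/config"
	excluded := false
	for _, part := range strings.Split(rel, "/") {
		if g.Match(part) {
			excluded = true
		}
	}
	fmt.Println(excluded) // true

	// Anchored "/build" (leading slash stripped before compiling): matched
	// against the path relative to the source root, so only a top-level
	// build directory is excluded.
	anchored := glob.MustCompile("build", '/')
	fmt.Println(anchored.Match("build"))     // true
	fmt.Println(anchored.Match("src/build")) // false
}
```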
// formatNumber formats a number with comma separators
func formatNumber(n int) string {
if n < 1000 {

View File

@ -10,7 +10,6 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
)
@ -75,7 +74,7 @@ func TestScannerSimpleDirectory(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
ID: snapshotID,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),
@ -210,7 +209,7 @@ func TestScannerLargeFile(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
ID: snapshotID,
Hostname: "test-host",
VaultikVersion: "test",
StartedAt: time.Now(),

View File

@ -46,7 +46,6 @@ import (
"io"
"os/exec"
"path/filepath"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
@ -54,7 +53,6 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"go.uber.org/fx"
@ -91,35 +89,15 @@ func (sm *SnapshotManager) SetFilesystem(fs afero.Fs) {
sm.fs = fs
}
// CreateSnapshot creates a new snapshot record in the database at the start of a backup.
// Deprecated: Use CreateSnapshotWithName instead for multi-snapshot support.
// CreateSnapshot creates a new snapshot record in the database at the start of a backup
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version, gitRevision string) (string, error) {
return sm.CreateSnapshotWithName(ctx, hostname, "", version, gitRevision)
}
// CreateSnapshotWithName creates a new snapshot record with an optional snapshot name.
// The snapshot ID format is: hostname_name_timestamp or hostname_timestamp if name is empty.
func (sm *SnapshotManager) CreateSnapshotWithName(ctx context.Context, hostname, name, version, gitRevision string) (string, error) {
// Use short hostname (strip domain if present)
shortHostname := hostname
if idx := strings.Index(hostname, "."); idx != -1 {
shortHostname = hostname[:idx]
}
// Build snapshot ID with optional name
timestamp := time.Now().UTC().Format("2006-01-02T15:04:05Z")
var snapshotID string
if name != "" {
snapshotID = fmt.Sprintf("%s_%s_%s", shortHostname, name, timestamp)
} else {
snapshotID = fmt.Sprintf("%s_%s", shortHostname, timestamp)
}
snapshotID := fmt.Sprintf("%s-%s", hostname, time.Now().UTC().Format("20060102-150405Z"))
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
Hostname: types.Hostname(hostname),
VaultikVersion: types.Version(version),
VaultikGitRevision: types.GitRevision(gitRevision),
ID: snapshotID,
Hostname: hostname,
VaultikVersion: version,
VaultikGitRevision: gitRevision,
StartedAt: time.Now().UTC(),
CompletedAt: nil, // Not completed yet
FileCount: 0,
@ -668,7 +646,7 @@ func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostn
log.Info("Cleaning up incomplete snapshot record", "snapshot_id", snapshot.ID, "started_at", snapshot.StartedAt)
// Delete the snapshot and all its associations
if err := sm.deleteSnapshot(ctx, snapshot.ID.String()); err != nil {
if err := sm.deleteSnapshot(ctx, snapshot.ID); err != nil {
return fmt.Errorf("deleting incomplete snapshot %s: %w", snapshot.ID, err)
}
@ -677,7 +655,7 @@ func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostn
// Metadata exists - this snapshot was completed but database wasn't updated
// This shouldn't happen in normal operation, but mark it complete
log.Warn("Found snapshot with S3 metadata but incomplete in database", "snapshot_id", snapshot.ID)
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID.String()); err != nil {
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID); err != nil {
log.Error("Failed to mark snapshot as complete in database", "snapshot_id", snapshot.ID, "error", err)
}
}
@ -710,16 +688,15 @@ func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string
// Clean up orphaned data
log.Debug("Cleaning up orphaned records in main database")
if err := sm.CleanupOrphanedData(ctx); err != nil {
if err := sm.cleanupOrphanedData(ctx); err != nil {
return fmt.Errorf("cleaning up orphaned data: %w", err)
}
return nil
}
// CleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot.
// This should be called periodically to clean up data from deleted or incomplete snapshots.
func (sm *SnapshotManager) CleanupOrphanedData(ctx context.Context) error {
// cleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot
func (sm *SnapshotManager) cleanupOrphanedData(ctx context.Context) error {
// Order is important to respect foreign key constraints:
// 1. Delete orphaned files (will cascade delete file_chunks)
// 2. Delete orphaned blobs (will cascade delete blob_chunks for deleted blobs)

View File

@ -101,7 +101,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
config: cfg,
fs: fs,
}
if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID.String()); err != nil {
if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID); err != nil {
t.Fatalf("failed to clean snapshot database: %v", err)
}
@ -119,7 +119,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
cleanedRepos := database.NewRepositories(cleanedDB)
// Verify snapshot exists
verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID.String())
verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID)
if err != nil {
t.Fatalf("failed to get snapshot: %v", err)
}
@ -128,7 +128,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
}
// Verify orphan file is gone
f, err := cleanedRepos.Files.GetByPath(ctx, file.Path.String())
f, err := cleanedRepos.Files.GetByPath(ctx, file.Path)
if err != nil {
t.Fatalf("failed to check file: %v", err)
}
@ -137,7 +137,7 @@ func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
}
// Verify orphan chunk is gone
c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash.String())
c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash)
if err != nil {
t.Fatalf("failed to check chunk: %v", err)
}

View File

@ -1,203 +0,0 @@
// Package types provides custom types for better type safety across the vaultik codebase.
// Using distinct types for IDs, hashes, paths, and credentials prevents accidental
// mixing of semantically different values that happen to share the same underlying type.
package types
import (
"database/sql/driver"
"fmt"
"github.com/google/uuid"
)
// FileID is a UUID identifying a file record in the database.
type FileID uuid.UUID
// NewFileID generates a new random FileID.
func NewFileID() FileID {
return FileID(uuid.New())
}
// ParseFileID parses a string into a FileID.
func ParseFileID(s string) (FileID, error) {
id, err := uuid.Parse(s)
if err != nil {
return FileID{}, err
}
return FileID(id), nil
}
// IsZero returns true if the FileID is the zero value.
func (id FileID) IsZero() bool {
return uuid.UUID(id) == uuid.Nil
}
// Value implements driver.Valuer for database serialization.
func (id FileID) Value() (driver.Value, error) {
return uuid.UUID(id).String(), nil
}
// Scan implements sql.Scanner for database deserialization.
func (id *FileID) Scan(src interface{}) error {
if src == nil {
*id = FileID{}
return nil
}
var s string
switch v := src.(type) {
case string:
s = v
case []byte:
s = string(v)
default:
return fmt.Errorf("cannot scan %T into FileID", src)
}
parsed, err := uuid.Parse(s)
if err != nil {
return fmt.Errorf("invalid FileID: %w", err)
}
*id = FileID(parsed)
return nil
}
// BlobID is a UUID identifying a blob record in the database.
// This is distinct from BlobHash which is the content-addressed hash of the blob.
type BlobID uuid.UUID
// NewBlobID generates a new random BlobID.
func NewBlobID() BlobID {
return BlobID(uuid.New())
}
// ParseBlobID parses a string into a BlobID.
func ParseBlobID(s string) (BlobID, error) {
id, err := uuid.Parse(s)
if err != nil {
return BlobID{}, err
}
return BlobID(id), nil
}
// IsZero returns true if the BlobID is the zero value.
func (id BlobID) IsZero() bool {
return uuid.UUID(id) == uuid.Nil
}
// Value implements driver.Valuer for database serialization.
func (id BlobID) Value() (driver.Value, error) {
return uuid.UUID(id).String(), nil
}
// Scan implements sql.Scanner for database deserialization.
func (id *BlobID) Scan(src interface{}) error {
if src == nil {
*id = BlobID{}
return nil
}
var s string
switch v := src.(type) {
case string:
s = v
case []byte:
s = string(v)
default:
return fmt.Errorf("cannot scan %T into BlobID", src)
}
parsed, err := uuid.Parse(s)
if err != nil {
return fmt.Errorf("invalid BlobID: %w", err)
}
*id = BlobID(parsed)
return nil
}
// SnapshotID identifies a snapshot, typically in format "hostname_name_timestamp".
type SnapshotID string
// ChunkHash is the SHA256 hash of a chunk's content.
// Used for content-addressing and deduplication of file chunks.
type ChunkHash string
// BlobHash is the SHA256 hash of a blob's compressed and encrypted content.
// This is used as the filename in S3 storage for content-addressed retrieval.
type BlobHash string
// FilePath represents an absolute path to a file or directory.
type FilePath string
// SourcePath represents the root directory from which files are backed up.
// Used during restore to strip the source prefix from paths.
type SourcePath string
// AgeRecipient is an age public key used for encryption.
// Format: age1... (Bech32-encoded X25519 public key)
type AgeRecipient string
// AgeSecretKey is an age private key used for decryption.
// Format: AGE-SECRET-KEY-... (Bech32-encoded X25519 private key)
// This type should never be logged or serialized in plaintext.
type AgeSecretKey string
// S3Endpoint is the URL of an S3-compatible storage endpoint.
type S3Endpoint string
// BucketName is the name of an S3 bucket.
type BucketName string
// S3Prefix is the path prefix within an S3 bucket.
type S3Prefix string
// AWSRegion is an AWS region identifier (e.g., "us-east-1").
type AWSRegion string
// AWSAccessKeyID is an AWS access key ID for authentication.
type AWSAccessKeyID string
// AWSSecretAccessKey is an AWS secret access key for authentication.
// This type should never be logged or serialized in plaintext.
type AWSSecretAccessKey string
// Hostname identifies a host machine.
type Hostname string
// Version is a semantic version string.
type Version string
// GitRevision is a git commit SHA.
type GitRevision string
// GlobPattern is a glob pattern for file matching (e.g., "*.log", "node_modules").
type GlobPattern string
// String methods for Stringer interface
func (id FileID) String() string { return uuid.UUID(id).String() }
func (id BlobID) String() string { return uuid.UUID(id).String() }
func (id SnapshotID) String() string { return string(id) }
func (h ChunkHash) String() string { return string(h) }
func (h BlobHash) String() string { return string(h) }
func (p FilePath) String() string { return string(p) }
func (p SourcePath) String() string { return string(p) }
func (r AgeRecipient) String() string { return string(r) }
func (e S3Endpoint) String() string { return string(e) }
func (b BucketName) String() string { return string(b) }
func (p S3Prefix) String() string { return string(p) }
func (r AWSRegion) String() string { return string(r) }
func (k AWSAccessKeyID) String() string { return string(k) }
func (h Hostname) String() string { return string(h) }
func (v Version) String() string { return string(v) }
func (r GitRevision) String() string { return string(r) }
func (p GlobPattern) String() string { return string(p) }
// Redacted String methods for sensitive types - prevents accidental logging
func (k AgeSecretKey) String() string { return "[REDACTED]" }
func (k AWSSecretAccessKey) String() string { return "[REDACTED]" }
// Raw returns the actual value for sensitive types when explicitly needed
func (k AgeSecretKey) Raw() string { return string(k) }
func (k AWSSecretAccessKey) Raw() string { return string(k) }
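
A short hedged example of how the redacted Stringer on the sensitive types above behaves: `fmt` uses `String()`, so secrets stay out of logs unless `Raw()` is called explicitly. The key value shown is fake.

```go
// Sketch only: redacted Stringer keeps secrets out of logs; Raw() must be
// called explicitly. The key value is fake.
package main

import "fmt"

type AgeSecretKey string

func (k AgeSecretKey) String() string { return "[REDACTED]" }
func (k AgeSecretKey) Raw() string    { return string(k) }

func main() {
	k := AgeSecretKey("AGE-SECRET-KEY-1EXAMPLEEXAMPLE")
	fmt.Println(k)       // [REDACTED]
	fmt.Println(k.Raw()) // AGE-SECRET-KEY-1EXAMPLEEXAMPLE
}
```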

View File

@ -5,15 +5,13 @@ import (
"strconv"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
)
// SnapshotInfo contains information about a snapshot
type SnapshotInfo struct {
ID types.SnapshotID `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
ID string `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}
// formatNumber formats a number with commas
@ -62,18 +60,27 @@ func formatBytes(bytes int64) string {
}
// parseSnapshotTimestamp extracts the timestamp from a snapshot ID
// Format: hostname_snapshotname_2026-01-12T14:41:15Z
func parseSnapshotTimestamp(snapshotID string) (time.Time, error) {
parts := strings.Split(snapshotID, "_")
if len(parts) < 2 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format: expected hostname_snapshotname_timestamp")
// Format: hostname-YYYYMMDD-HHMMSSZ
parts := strings.Split(snapshotID, "-")
if len(parts) < 3 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format")
}
// Last part is the RFC3339 timestamp
timestampStr := parts[len(parts)-1]
timestamp, err := time.Parse(time.RFC3339, timestampStr)
dateStr := parts[len(parts)-2]
timeStr := parts[len(parts)-1]
if len(dateStr) != 8 || len(timeStr) != 7 || !strings.HasSuffix(timeStr, "Z") {
return time.Time{}, fmt.Errorf("invalid timestamp format")
}
// Remove Z suffix
timeStr = timeStr[:6]
// Parse the timestamp
timestamp, err := time.Parse("20060102150405", dateStr+timeStr)
if err != nil {
return time.Time{}, fmt.Errorf("invalid timestamp: %w", err)
return time.Time{}, fmt.Errorf("failed to parse timestamp: %w", err)
}
return timestamp.UTC(), nil
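
For reference, a worked example of the new `hostname-YYYYMMDD-HHMMSSZ` parsing above, using a hypothetical snapshot ID:

```go
// Sketch only: parsing the hostname-YYYYMMDD-HHMMSSZ snapshot ID format.
// The snapshot ID value is hypothetical.
package main

import (
	"fmt"
	"strings"
	"time"
)

func main() {
	id := "examplehost-20260112-144115Z"
	parts := strings.Split(id, "-")
	dateStr := parts[len(parts)-2]                          // "20260112"
	timeStr := strings.TrimSuffix(parts[len(parts)-1], "Z") // "144115"
	ts, err := time.Parse("20060102150405", dateStr+timeStr)
	if err != nil {
		panic(err)
	}
	fmt.Println(ts.UTC()) // 2026-01-12 14:41:15 +0000 UTC
}
```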

View File

@ -30,23 +30,14 @@ func (v *Vaultik) ShowInfo() error {
// Backup Settings
fmt.Printf("=== Backup Settings ===\n")
// Show configured snapshots
fmt.Printf("Snapshots:\n")
for _, name := range v.Config.SnapshotNames() {
snap := v.Config.Snapshots[name]
fmt.Printf(" %s:\n", name)
for _, path := range snap.Paths {
fmt.Printf(" - %s\n", path)
}
if len(snap.Exclude) > 0 {
fmt.Printf(" exclude: %s\n", strings.Join(snap.Exclude, ", "))
}
fmt.Printf("Source Directories:\n")
for _, dir := range v.Config.SourceDirs {
fmt.Printf(" - %s\n", dir)
}
// Global exclude patterns
if len(v.Config.Exclude) > 0 {
fmt.Printf("Global Exclude: %s\n", strings.Join(v.Config.Exclude, ", "))
fmt.Printf("Exclude Patterns: %s\n", strings.Join(v.Config.Exclude, ", "))
}
fmt.Printf("Compression: zstd level %d\n", v.Config.CompressionLevel)

View File

@ -14,7 +14,6 @@ import (
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
@ -185,11 +184,7 @@ func TestEndToEndBackup(t *testing.T) {
// Create test configuration
cfg := &config.Config{
Snapshots: map[string]config.SnapshotConfig{
"test": {
Paths: []string{"/home/user"},
},
},
SourceDirs: []string{"/home/user"},
Exclude: []string{"*.tmp", "*.log"},
ChunkSize: config.Size(16 * 1024), // 16KB chunks
BlobSizeLimit: config.Size(100 * 1024), // 100KB blobs
@ -237,7 +232,7 @@ func TestEndToEndBackup(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
ID: snapshotID,
Hostname: "test-host",
VaultikVersion: "test-version",
StartedAt: time.Now(),
@ -357,7 +352,7 @@ func TestBackupAndVerify(t *testing.T) {
snapshotID := "test-snapshot-001"
err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
snapshot := &database.Snapshot{
ID: types.SnapshotID(snapshotID),
ID: snapshotID,
Hostname: "test-host",
VaultikVersion: "test-version",
StartedAt: time.Now(),

View File

@ -1,9 +1,7 @@
package vaultik
import (
"encoding/json"
"fmt"
"os"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
@ -13,15 +11,6 @@ import (
// PruneOptions contains options for the prune command
type PruneOptions struct {
Force bool
JSON bool
}
// PruneBlobsResult contains the result of a blob prune operation
type PruneBlobsResult struct {
BlobsFound int `json:"blobs_found"`
BlobsDeleted int `json:"blobs_deleted"`
BlobsFailed int `json:"blobs_failed,omitempty"`
BytesFreed int64 `json:"bytes_freed"`
}
// PruneBlobs removes unreferenced blobs from storage
@ -114,27 +103,18 @@ func (v *Vaultik) PruneBlobs(opts *PruneOptions) error {
}
}
result := &PruneBlobsResult{
BlobsFound: len(unreferencedBlobs),
}
if len(unreferencedBlobs) == 0 {
log.Info("No unreferenced blobs found")
if opts.JSON {
return outputPruneBlobsJSON(result)
}
fmt.Println("No unreferenced blobs to remove.")
return nil
}
// Show what will be deleted
log.Info("Found unreferenced blobs", "count", len(unreferencedBlobs), "total_size", humanize.Bytes(uint64(totalSize)))
if !opts.JSON {
fmt.Printf("Found %d unreferenced blob(s) totaling %s\n", len(unreferencedBlobs), humanize.Bytes(uint64(totalSize)))
}
fmt.Printf("Found %d unreferenced blob(s) totaling %s\n", len(unreferencedBlobs), humanize.Bytes(uint64(totalSize)))
// Confirm unless --force is used (skip in JSON mode - require --force)
if !opts.Force && !opts.JSON {
// Confirm unless --force is used
if !opts.Force {
fmt.Printf("\nDelete %d unreferenced blob(s)? [y/N] ", len(unreferencedBlobs))
var confirm string
if _, err := fmt.Scanln(&confirm); err != nil {
@ -174,20 +154,12 @@ func (v *Vaultik) PruneBlobs(opts *PruneOptions) error {
}
}
result.BlobsDeleted = deletedCount
result.BlobsFailed = len(unreferencedBlobs) - deletedCount
result.BytesFreed = deletedSize
log.Info("Prune complete",
"deleted_count", deletedCount,
"deleted_size", humanize.Bytes(uint64(deletedSize)),
"failed", len(unreferencedBlobs)-deletedCount,
)
if opts.JSON {
return outputPruneBlobsJSON(result)
}
fmt.Printf("\nDeleted %d blob(s) totaling %s\n", deletedCount, humanize.Bytes(uint64(deletedSize)))
if deletedCount < len(unreferencedBlobs) {
fmt.Printf("Failed to delete %d blob(s)\n", len(unreferencedBlobs)-deletedCount)
@ -195,10 +167,3 @@ func (v *Vaultik) PruneBlobs(opts *PruneOptions) error {
return nil
}
// outputPruneBlobsJSON outputs the prune result as JSON
func outputPruneBlobsJSON(result *PruneBlobsResult) error {
encoder := json.NewEncoder(os.Stdout)
encoder.SetIndent("", " ")
return encoder.Encode(result)
}

View File

@ -1,675 +0,0 @@
package vaultik
import (
"bytes"
"context"
"crypto/sha256"
"encoding/hex"
"fmt"
"io"
"os"
"os/exec"
"path/filepath"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/schollz/progressbar/v3"
"github.com/spf13/afero"
"golang.org/x/term"
)
// RestoreOptions contains options for the restore operation
type RestoreOptions struct {
SnapshotID string
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files by checking chunk hashes
}
// RestoreResult contains statistics from a restore operation
type RestoreResult struct {
FilesRestored int
BytesRestored int64
BlobsDownloaded int
BytesDownloaded int64
Duration time.Duration
// Verification results (only populated if Verify option is set)
FilesVerified int
BytesVerified int64
FilesFailed int
FailedFiles []string // Paths of files that failed verification
}
// Restore restores files from a snapshot to the target directory
func (v *Vaultik) Restore(opts *RestoreOptions) error {
startTime := time.Now()
// Check for age_secret_key
if v.Config.AgeSecretKey == "" {
return fmt.Errorf("decryption key required for restore\n\nSet the VAULTIK_AGE_SECRET_KEY environment variable to your age private key:\n export VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...'")
}
// Parse the age identity
identity, err := age.ParseX25519Identity(v.Config.AgeSecretKey)
if err != nil {
return fmt.Errorf("parsing age secret key: %w", err)
}
log.Info("Starting restore operation",
"snapshot_id", opts.SnapshotID,
"target_dir", opts.TargetDir,
"paths", opts.Paths,
)
// Step 1: Download and decrypt the snapshot metadata database
log.Info("Downloading snapshot metadata...")
tempDB, err := v.downloadSnapshotDB(opts.SnapshotID, identity)
if err != nil {
return fmt.Errorf("downloading snapshot database: %w", err)
}
defer func() {
if err := tempDB.Close(); err != nil {
log.Debug("Failed to close temp database", "error", err)
}
// Clean up temp file
if err := v.Fs.Remove(tempDB.Path()); err != nil {
log.Debug("Failed to remove temp database", "error", err)
}
}()
repos := database.NewRepositories(tempDB)
// Step 2: Get list of files to restore
files, err := v.getFilesToRestore(v.ctx, repos, opts.Paths)
if err != nil {
return fmt.Errorf("getting files to restore: %w", err)
}
if len(files) == 0 {
log.Warn("No files found to restore")
return nil
}
log.Info("Found files to restore", "count", len(files))
// Step 3: Create target directory
if err := v.Fs.MkdirAll(opts.TargetDir, 0755); err != nil {
return fmt.Errorf("creating target directory: %w", err)
}
// Step 4: Build a map of chunks to blobs for efficient restoration
chunkToBlobMap, err := v.buildChunkToBlobMap(v.ctx, repos)
if err != nil {
return fmt.Errorf("building chunk-to-blob map: %w", err)
}
// Step 5: Restore files
result := &RestoreResult{}
blobCache := make(map[string][]byte) // Cache downloaded and decrypted blobs
for i, file := range files {
if v.ctx.Err() != nil {
return v.ctx.Err()
}
if err := v.restoreFile(v.ctx, repos, file, opts.TargetDir, identity, chunkToBlobMap, blobCache, result); err != nil {
log.Error("Failed to restore file", "path", file.Path, "error", err)
// Continue with other files
continue
}
// Progress logging
if (i+1)%100 == 0 || i+1 == len(files) {
log.Info("Restore progress",
"files", fmt.Sprintf("%d/%d", i+1, len(files)),
"bytes", humanize.Bytes(uint64(result.BytesRestored)),
)
}
}
result.Duration = time.Since(startTime)
log.Info("Restore complete",
"files_restored", result.FilesRestored,
"bytes_restored", humanize.Bytes(uint64(result.BytesRestored)),
"blobs_downloaded", result.BlobsDownloaded,
"bytes_downloaded", humanize.Bytes(uint64(result.BytesDownloaded)),
"duration", result.Duration,
)
_, _ = fmt.Fprintf(v.Stdout, "Restored %d files (%s) in %s\n",
result.FilesRestored,
humanize.Bytes(uint64(result.BytesRestored)),
result.Duration.Round(time.Second),
)
// Run verification if requested
if opts.Verify {
if err := v.verifyRestoredFiles(v.ctx, repos, files, opts.TargetDir, result); err != nil {
return fmt.Errorf("verification failed: %w", err)
}
if result.FilesFailed > 0 {
_, _ = fmt.Fprintf(v.Stdout, "\nVerification FAILED: %d files did not match expected checksums\n", result.FilesFailed)
for _, path := range result.FailedFiles {
_, _ = fmt.Fprintf(v.Stdout, " - %s\n", path)
}
return fmt.Errorf("%d files failed verification", result.FilesFailed)
}
_, _ = fmt.Fprintf(v.Stdout, "Verified %d files (%s)\n",
result.FilesVerified,
humanize.Bytes(uint64(result.BytesVerified)),
)
}
return nil
}
// downloadSnapshotDB downloads and decrypts the snapshot metadata database
func (v *Vaultik) downloadSnapshotDB(snapshotID string, identity age.Identity) (*database.DB, error) {
// Download encrypted database from S3
dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
reader, err := v.Storage.Get(v.ctx, dbKey)
if err != nil {
return nil, fmt.Errorf("downloading %s: %w", dbKey, err)
}
defer func() { _ = reader.Close() }()
// Read all data
encryptedData, err := io.ReadAll(reader)
if err != nil {
return nil, fmt.Errorf("reading encrypted data: %w", err)
}
log.Debug("Downloaded encrypted database", "size", humanize.Bytes(uint64(len(encryptedData))))
// Decrypt and decompress using blobgen.Reader
blobReader, err := blobgen.NewReader(bytes.NewReader(encryptedData), identity)
if err != nil {
return nil, fmt.Errorf("creating decryption reader: %w", err)
}
defer func() { _ = blobReader.Close() }()
// Read the SQL dump
sqlDump, err := io.ReadAll(blobReader)
if err != nil {
return nil, fmt.Errorf("decrypting and decompressing: %w", err)
}
log.Debug("Decrypted database SQL dump", "size", humanize.Bytes(uint64(len(sqlDump))))
// Create a temporary database file
tempFile, err := afero.TempFile(v.Fs, "", "vaultik-restore-*.db")
if err != nil {
return nil, fmt.Errorf("creating temp file: %w", err)
}
tempPath := tempFile.Name()
if err := tempFile.Close(); err != nil {
return nil, fmt.Errorf("closing temp file: %w", err)
}
// Write SQL to a temp file for sqlite3 to read
sqlTempFile, err := afero.TempFile(v.Fs, "", "vaultik-restore-*.sql")
if err != nil {
return nil, fmt.Errorf("creating SQL temp file: %w", err)
}
sqlTempPath := sqlTempFile.Name()
if _, err := sqlTempFile.Write(sqlDump); err != nil {
_ = sqlTempFile.Close()
return nil, fmt.Errorf("writing SQL dump: %w", err)
}
if err := sqlTempFile.Close(); err != nil {
return nil, fmt.Errorf("closing SQL temp file: %w", err)
}
defer func() { _ = v.Fs.Remove(sqlTempPath) }()
// Execute the SQL dump to create the database
cmd := exec.Command("sqlite3", tempPath, ".read "+sqlTempPath)
if output, err := cmd.CombinedOutput(); err != nil {
return nil, fmt.Errorf("executing SQL dump: %w\nOutput: %s", err, output)
}
log.Debug("Created restore database", "path", tempPath)
// Open the database
db, err := database.New(v.ctx, tempPath)
if err != nil {
return nil, fmt.Errorf("opening restore database: %w", err)
}
return db, nil
}
// getFilesToRestore returns the list of files to restore based on path filters
func (v *Vaultik) getFilesToRestore(ctx context.Context, repos *database.Repositories, pathFilters []string) ([]*database.File, error) {
// If no filters, get all files
if len(pathFilters) == 0 {
return repos.Files.ListAll(ctx)
}
// Get files matching the path filters
var result []*database.File
seen := make(map[string]bool)
for _, filter := range pathFilters {
// Normalize the filter path
filter = filepath.Clean(filter)
// Get files with this prefix
files, err := repos.Files.ListByPrefix(ctx, filter)
if err != nil {
return nil, fmt.Errorf("listing files with prefix %s: %w", filter, err)
}
for _, file := range files {
if !seen[file.ID.String()] {
seen[file.ID.String()] = true
result = append(result, file)
}
}
}
return result, nil
}
// buildChunkToBlobMap creates a mapping from chunk hash to blob information
func (v *Vaultik) buildChunkToBlobMap(ctx context.Context, repos *database.Repositories) (map[string]*database.BlobChunk, error) {
// Query all blob_chunks
query := `SELECT blob_id, chunk_hash, offset, length FROM blob_chunks`
rows, err := repos.DB().Conn().QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying blob_chunks: %w", err)
}
defer func() { _ = rows.Close() }()
result := make(map[string]*database.BlobChunk)
for rows.Next() {
var bc database.BlobChunk
var blobIDStr, chunkHashStr string
if err := rows.Scan(&blobIDStr, &chunkHashStr, &bc.Offset, &bc.Length); err != nil {
return nil, fmt.Errorf("scanning blob_chunk: %w", err)
}
blobID, err := types.ParseBlobID(blobIDStr)
if err != nil {
return nil, fmt.Errorf("parsing blob ID: %w", err)
}
bc.BlobID = blobID
bc.ChunkHash = types.ChunkHash(chunkHashStr)
result[chunkHashStr] = &bc
}
return result, rows.Err()
}
// restoreFile restores a single file
func (v *Vaultik) restoreFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetDir string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache map[string][]byte,
result *RestoreResult,
) error {
// Calculate target path - use full original path under target directory
targetPath := filepath.Join(targetDir, file.Path.String())
// Create parent directories
parentDir := filepath.Dir(targetPath)
if err := v.Fs.MkdirAll(parentDir, 0755); err != nil {
return fmt.Errorf("creating parent directory: %w", err)
}
// Handle symlinks
if file.IsSymlink() {
return v.restoreSymlink(file, targetPath, result)
}
// Handle directories
if file.Mode&uint32(os.ModeDir) != 0 {
return v.restoreDirectory(file, targetPath, result)
}
// Handle regular files
return v.restoreRegularFile(ctx, repos, file, targetPath, identity, chunkToBlobMap, blobCache, result)
}
// restoreSymlink restores a symbolic link
func (v *Vaultik) restoreSymlink(file *database.File, targetPath string, result *RestoreResult) error {
// Remove existing file if it exists
_ = v.Fs.Remove(targetPath)
// Create symlink
// Note: afero.MemMapFs doesn't support symlinks, so we use os for real filesystems
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs // silence unused variable warning
if err := os.Symlink(file.LinkTarget.String(), targetPath); err != nil {
return fmt.Errorf("creating symlink: %w", err)
}
} else {
log.Debug("Symlink creation not supported on this filesystem", "path", file.Path, "target", file.LinkTarget)
}
result.FilesRestored++
log.Debug("Restored symlink", "path", file.Path, "target", file.LinkTarget)
return nil
}
// restoreDirectory restores a directory with proper permissions
func (v *Vaultik) restoreDirectory(file *database.File, targetPath string, result *RestoreResult) error {
// Create directory
if err := v.Fs.MkdirAll(targetPath, os.FileMode(file.Mode)); err != nil {
return fmt.Errorf("creating directory: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set directory permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set directory ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set directory mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
return nil
}
// restoreRegularFile restores a regular file by reconstructing it from chunks
func (v *Vaultik) restoreRegularFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetPath string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache map[string][]byte,
result *RestoreResult,
) error {
// Get file chunks in order
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
return fmt.Errorf("getting file chunks: %w", err)
}
// Create output file
outFile, err := v.Fs.Create(targetPath)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() { _ = outFile.Close() }()
// Write chunks in order
var bytesWritten int64
for _, fc := range fileChunks {
// Find which blob contains this chunk
chunkHashStr := fc.ChunkHash.String()
blobChunk, ok := chunkToBlobMap[chunkHashStr]
if !ok {
return fmt.Errorf("chunk %s not found in any blob", chunkHashStr[:16])
}
// Get the blob's hash from the database
blob, err := repos.Blobs.GetByID(ctx, blobChunk.BlobID.String())
if err != nil {
return fmt.Errorf("getting blob %s: %w", blobChunk.BlobID, err)
}
// Download and decrypt blob if not cached
blobHashStr := blob.Hash.String()
blobData, ok := blobCache[blobHashStr]
if !ok {
blobData, err = v.downloadBlob(ctx, blobHashStr, identity)
if err != nil {
return fmt.Errorf("downloading blob %s: %w", blobHashStr[:16], err)
}
blobCache[blobHashStr] = blobData
result.BlobsDownloaded++
result.BytesDownloaded += int64(len(blobData))
}
// Extract chunk from blob
if blobChunk.Offset+blobChunk.Length > int64(len(blobData)) {
return fmt.Errorf("chunk %s extends beyond blob data (offset=%d, length=%d, blob_size=%d)",
fc.ChunkHash[:16], blobChunk.Offset, blobChunk.Length, len(blobData))
}
chunkData := blobData[blobChunk.Offset : blobChunk.Offset+blobChunk.Length]
// Write chunk to output file
n, err := outFile.Write(chunkData)
if err != nil {
return fmt.Errorf("writing chunk: %w", err)
}
bytesWritten += int64(n)
}
// Close file before setting metadata
if err := outFile.Close(); err != nil {
return fmt.Errorf("closing output file: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set file permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set file ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set file mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
result.BytesRestored += bytesWritten
log.Debug("Restored file", "path", file.Path, "size", humanize.Bytes(uint64(bytesWritten)))
return nil
}
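
restoreRegularFile reassembles each file by slicing chunks out of decrypted blobs at the (offset, length) recorded in blob_chunks. A minimal sketch of that extraction step with made-up values:

```go
// Sketch only: extracting one chunk from a decrypted blob by the (offset,
// length) recorded in blob_chunks. The blob bytes and offsets are made up.
package main

import "fmt"

func main() {
	blobData := []byte("....chunk-A-bytes....chunk-B-bytes....")
	offset, length := int64(4), int64(13) // from blob_chunks metadata
	if offset+length > int64(len(blobData)) {
		panic("chunk extends beyond blob data")
	}
	chunk := blobData[offset : offset+length]
	fmt.Printf("%s\n", chunk) // chunk-A-bytes
}
```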
// downloadBlob downloads and decrypts a blob
func (v *Vaultik) downloadBlob(ctx context.Context, blobHash string, identity age.Identity) ([]byte, error) {
// Construct blob path with sharding
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blobHash[:2], blobHash[2:4], blobHash)
reader, err := v.Storage.Get(ctx, blobPath)
if err != nil {
return nil, fmt.Errorf("downloading blob: %w", err)
}
defer func() { _ = reader.Close() }()
// Read encrypted data
encryptedData, err := io.ReadAll(reader)
if err != nil {
return nil, fmt.Errorf("reading blob data: %w", err)
}
// Decrypt and decompress
blobReader, err := blobgen.NewReader(bytes.NewReader(encryptedData), identity)
if err != nil {
return nil, fmt.Errorf("creating decryption reader: %w", err)
}
defer func() { _ = blobReader.Close() }()
data, err := io.ReadAll(blobReader)
if err != nil {
return nil, fmt.Errorf("decrypting blob: %w", err)
}
log.Debug("Downloaded and decrypted blob",
"hash", blobHash[:16],
"encrypted_size", humanize.Bytes(uint64(len(encryptedData))),
"decrypted_size", humanize.Bytes(uint64(len(data))),
)
return data, nil
}
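
Blobs are fetched by a content-addressed key sharded on the first two hex byte pairs of the hash, exactly as the Sprintf above constructs it. A tiny sketch with a made-up hash:

```go
// Sketch only: the sharded, content-addressed blob key layout used above.
// The hash value is made up.
package main

import "fmt"

func blobKey(blobHash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", blobHash[:2], blobHash[2:4], blobHash)
}

func main() {
	h := "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b"
	fmt.Println(blobKey(h))
	// blobs/3a/7b/3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b
}
```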
// verifyRestoredFiles verifies that all restored files match their expected chunk hashes
func (v *Vaultik) verifyRestoredFiles(
ctx context.Context,
repos *database.Repositories,
files []*database.File,
targetDir string,
result *RestoreResult,
) error {
// Calculate total bytes to verify for progress bar
var totalBytes int64
regularFiles := make([]*database.File, 0, len(files))
for _, file := range files {
// Skip symlinks and directories - only verify regular files
if file.IsSymlink() || file.Mode&uint32(os.ModeDir) != 0 {
continue
}
regularFiles = append(regularFiles, file)
totalBytes += file.Size
}
if len(regularFiles) == 0 {
log.Info("No regular files to verify")
return nil
}
log.Info("Verifying restored files",
"files", len(regularFiles),
"bytes", humanize.Bytes(uint64(totalBytes)),
)
_, _ = fmt.Fprintf(v.Stdout, "\nVerifying %d files (%s)...\n",
len(regularFiles),
humanize.Bytes(uint64(totalBytes)),
)
// Create progress bar if output is a terminal
var bar *progressbar.ProgressBar
if isTerminal() {
bar = progressbar.NewOptions64(
totalBytes,
progressbar.OptionSetDescription("Verifying"),
progressbar.OptionSetWriter(os.Stderr),
progressbar.OptionShowBytes(true),
progressbar.OptionShowCount(),
progressbar.OptionSetWidth(40),
progressbar.OptionThrottle(100*time.Millisecond),
progressbar.OptionOnCompletion(func() {
fmt.Fprint(os.Stderr, "\n")
}),
progressbar.OptionSetRenderBlankState(true),
)
}
// Verify each file
for _, file := range regularFiles {
if ctx.Err() != nil {
return ctx.Err()
}
targetPath := filepath.Join(targetDir, file.Path.String())
bytesVerified, err := v.verifyFile(ctx, repos, file, targetPath)
if err != nil {
log.Error("File verification failed", "path", file.Path, "error", err)
result.FilesFailed++
result.FailedFiles = append(result.FailedFiles, file.Path.String())
} else {
result.FilesVerified++
result.BytesVerified += bytesVerified
}
// Update progress bar
if bar != nil {
_ = bar.Add64(file.Size)
}
}
if bar != nil {
_ = bar.Finish()
}
log.Info("Verification complete",
"files_verified", result.FilesVerified,
"bytes_verified", humanize.Bytes(uint64(result.BytesVerified)),
"files_failed", result.FilesFailed,
)
return nil
}
// verifyFile verifies a single restored file by checking its chunk hashes
func (v *Vaultik) verifyFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetPath string,
) (int64, error) {
// Get file chunks in order
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
return 0, fmt.Errorf("getting file chunks: %w", err)
}
// Open the restored file
f, err := v.Fs.Open(targetPath)
if err != nil {
return 0, fmt.Errorf("opening file: %w", err)
}
defer func() { _ = f.Close() }()
// Verify each chunk
var bytesVerified int64
for _, fc := range fileChunks {
// Get chunk size from database
chunk, err := repos.Chunks.GetByHash(ctx, fc.ChunkHash.String())
if err != nil {
return bytesVerified, fmt.Errorf("getting chunk %s: %w", fc.ChunkHash.String()[:16], err)
}
// Read chunk data from file
chunkData := make([]byte, chunk.Size)
n, err := io.ReadFull(f, chunkData)
if err != nil {
return bytesVerified, fmt.Errorf("reading chunk data: %w", err)
}
if int64(n) != chunk.Size {
return bytesVerified, fmt.Errorf("short read: expected %d bytes, got %d", chunk.Size, n)
}
// Calculate hash and compare
hash := sha256.Sum256(chunkData)
actualHash := hex.EncodeToString(hash[:])
expectedHash := fc.ChunkHash.String()
if actualHash != expectedHash {
return bytesVerified, fmt.Errorf("chunk %d hash mismatch: expected %s, got %s",
fc.Idx, expectedHash[:16], actualHash[:16])
}
bytesVerified += int64(n)
}
log.Debug("File verified", "path", file.Path, "bytes", bytesVerified, "chunks", len(fileChunks))
return bytesVerified, nil
}
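
Verification recomputes the SHA-256 of each chunk read back from disk and compares the hex digest against the value stored in the snapshot metadata. A minimal sketch of that comparison, with a hypothetical chunk:

```go
// Sketch only: the per-chunk integrity check, with a hypothetical chunk. In
// vaultik the expected hex digest comes from the file_chunks metadata.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	chunk := []byte("example chunk contents")
	sum := sha256.Sum256(chunk)
	actual := hex.EncodeToString(sum[:])
	expected := actual // stand-in for the stored chunk hash
	fmt.Println(actual == expected) // true when the restored bytes are intact
}
```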
// isTerminal returns true if stdout is a terminal
func isTerminal() bool {
return term.IsTerminal(int(os.Stdout.Fd()))
}

View File

@ -13,22 +13,19 @@ import (
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
)
// SnapshotCreateOptions contains options for the snapshot create command
type SnapshotCreateOptions struct {
Daemon bool
Cron bool
Prune bool
SkipErrors bool // Skip file read errors (log them loudly but continue)
Snapshots []string // Optional list of snapshot names to process (empty = all)
Daemon bool
Cron bool
Prune bool
}
// CreateSnapshot executes the snapshot creation operation
func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
overallStartTime := time.Now()
snapshotStartTime := time.Now()
log.Info("Starting snapshot creation",
"version", v.Globals.Version,
@ -46,12 +43,8 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
// CRITICAL: This MUST succeed. If we fail to clean up incomplete snapshots,
// the deduplication logic will think files from the incomplete snapshot were
// already backed up and skip them, resulting in data loss.
//
// Prune the database before starting: delete incomplete snapshots and orphaned data.
// This ensures the database is consistent before we start a new snapshot.
// Since we use locking, only one vaultik instance accesses the DB at a time.
if _, err := v.PruneDatabase(); err != nil {
return fmt.Errorf("prune database: %w", err)
if err := v.SnapshotManager.CleanupIncompleteSnapshots(v.ctx, hostname); err != nil {
return fmt.Errorf("cleanup incomplete snapshots: %w", err)
}
if opts.Daemon {
@ -60,51 +53,9 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
return fmt.Errorf("daemon mode not yet implemented")
}
// Determine which snapshots to process
snapshotNames := opts.Snapshots
if len(snapshotNames) == 0 {
snapshotNames = v.Config.SnapshotNames()
} else {
// Validate requested snapshot names exist
for _, name := range snapshotNames {
if _, ok := v.Config.Snapshots[name]; !ok {
return fmt.Errorf("snapshot %q not found in config", name)
}
}
}
if len(snapshotNames) == 0 {
return fmt.Errorf("no snapshots configured")
}
// Process each named snapshot
for snapIdx, snapName := range snapshotNames {
if err := v.createNamedSnapshot(opts, hostname, snapName, snapIdx+1, len(snapshotNames)); err != nil {
return err
}
}
// Print overall summary if multiple snapshots
if len(snapshotNames) > 1 {
_, _ = fmt.Fprintf(v.Stdout, "\nAll %d snapshots completed in %s\n", len(snapshotNames), time.Since(overallStartTime).Round(time.Second))
}
return nil
}
// createNamedSnapshot creates a single named snapshot
func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, snapName string, idx, total int) error {
snapshotStartTime := time.Now()
snapConfig := v.Config.Snapshots[snapName]
if total > 1 {
_, _ = fmt.Fprintf(v.Stdout, "\n=== Snapshot %d/%d: %s ===\n", idx, total, snapName)
}
// Resolve source directories to absolute paths
resolvedDirs := make([]string, 0, len(snapConfig.Paths))
for _, dir := range snapConfig.Paths {
resolvedDirs := make([]string, 0, len(v.Config.SourceDirs))
for _, dir := range v.Config.SourceDirs {
absPath, err := filepath.Abs(dir)
if err != nil {
return fmt.Errorf("failed to resolve absolute path for %s: %w", dir, err)
@ -125,12 +76,9 @@ func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, sna
}
// Create scanner with progress enabled (unless in cron mode)
// Pass the combined excludes for this snapshot
scanner := v.ScannerFactory(snapshot.ScannerParams{
EnableProgress: !opts.Cron,
Fs: v.Fs,
Exclude: v.Config.GetExcludes(snapName),
SkipErrors: opts.SkipErrors,
})
// Statistics tracking
@ -146,12 +94,12 @@ func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, sna
totalBlobsUploaded := 0
uploadDuration := time.Duration(0)
// Create a new snapshot at the beginning (with snapshot name in ID)
snapshotID, err := v.SnapshotManager.CreateSnapshotWithName(v.ctx, hostname, snapName, v.Globals.Version, v.Globals.Commit)
// Create a new snapshot at the beginning
snapshotID, err := v.SnapshotManager.CreateSnapshot(v.ctx, hostname, v.Globals.Version, v.Globals.Commit)
if err != nil {
return fmt.Errorf("creating snapshot: %w", err)
}
log.Info("Beginning snapshot", "snapshot_id", snapshotID, "name", snapName)
log.Info("Beginning snapshot", "snapshot_id", snapshotID)
_, _ = fmt.Fprintf(v.Stdout, "Beginning snapshot: %s\n", snapshotID)
for i, dir := range resolvedDirs {
@ -340,32 +288,31 @@ func (v *Vaultik) ListSnapshots(jsonOutput bool) error {
// Build a map of local snapshots for quick lookup
localSnapshotMap := make(map[string]*database.Snapshot)
for _, s := range localSnapshots {
localSnapshotMap[s.ID.String()] = s
localSnapshotMap[s.ID] = s
}
// Remove local snapshots that don't exist remotely
for _, snapshot := range localSnapshots {
snapshotIDStr := snapshot.ID.String()
if !remoteSnapshots[snapshotIDStr] {
if !remoteSnapshots[snapshot.ID] {
log.Info("Removing local snapshot not found in remote", "snapshot_id", snapshot.ID)
// Delete related records first to avoid foreign key constraints
if err := v.Repositories.Snapshots.DeleteSnapshotFiles(v.ctx, snapshotIDStr); err != nil {
if err := v.Repositories.Snapshots.DeleteSnapshotFiles(v.ctx, snapshot.ID); err != nil {
log.Error("Failed to delete snapshot files", "snapshot_id", snapshot.ID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotBlobs(v.ctx, snapshotIDStr); err != nil {
if err := v.Repositories.Snapshots.DeleteSnapshotBlobs(v.ctx, snapshot.ID); err != nil {
log.Error("Failed to delete snapshot blobs", "snapshot_id", snapshot.ID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotUploads(v.ctx, snapshotIDStr); err != nil {
if err := v.Repositories.Snapshots.DeleteSnapshotUploads(v.ctx, snapshot.ID); err != nil {
log.Error("Failed to delete snapshot uploads", "snapshot_id", snapshot.ID, "error", err)
}
// Now delete the snapshot itself
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshotIDStr); err != nil {
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshot.ID); err != nil {
log.Error("Failed to delete local snapshot", "snapshot_id", snapshot.ID, "error", err)
} else {
log.Info("Deleted local snapshot not found in remote", "snapshot_id", snapshot.ID)
delete(localSnapshotMap, snapshotIDStr)
delete(localSnapshotMap, snapshot.ID)
}
}
}
@ -404,7 +351,7 @@ func (v *Vaultik) ListSnapshots(jsonOutput bool) error {
}
snapshots = append(snapshots, SnapshotInfo{
ID: types.SnapshotID(snapshotID),
ID: snapshotID,
Timestamp: timestamp,
CompressedSize: totalSize,
})
@ -530,7 +477,7 @@ func (v *Vaultik) PurgeSnapshots(keepLatest bool, olderThan string, force bool)
// Delete snapshots
for _, snap := range toDelete {
log.Info("Deleting snapshot", "id", snap.ID)
if err := v.deleteSnapshot(snap.ID.String()); err != nil {
if err := v.deleteSnapshot(snap.ID); err != nil {
return fmt.Errorf("deleting snapshot %s: %w", snap.ID, err)
}
}
@ -545,19 +492,6 @@ func (v *Vaultik) PurgeSnapshots(keepLatest bool, olderThan string, force bool)
// VerifySnapshot checks snapshot integrity
func (v *Vaultik) VerifySnapshot(snapshotID string, deep bool) error {
return v.VerifySnapshotWithOptions(snapshotID, &VerifyOptions{Deep: deep})
}
// VerifySnapshotWithOptions checks snapshot integrity with full options
func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptions) error {
result := &VerifyResult{
SnapshotID: snapshotID,
Mode: "shallow",
}
if opts.Deep {
result.Mode = "deep"
}
// Parse snapshot ID to extract timestamp
parts := strings.Split(snapshotID, "-")
var snapshotTime time.Time
@ -574,43 +508,30 @@ func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptio
}
}
if !opts.JSON {
fmt.Printf("Verifying snapshot %s\n", snapshotID)
if !snapshotTime.IsZero() {
fmt.Printf("Snapshot time: %s\n", snapshotTime.Format("2006-01-02 15:04:05 MST"))
}
fmt.Println()
fmt.Printf("Verifying snapshot %s\n", snapshotID)
if !snapshotTime.IsZero() {
fmt.Printf("Snapshot time: %s\n", snapshotTime.Format("2006-01-02 15:04:05 MST"))
}
fmt.Println()
// Download and parse manifest
manifest, err := v.downloadManifest(snapshotID)
if err != nil {
if opts.JSON {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("downloading manifest: %v", err)
return v.outputVerifyJSON(result)
}
return fmt.Errorf("downloading manifest: %w", err)
}
result.BlobCount = manifest.BlobCount
result.TotalSize = manifest.TotalCompressedSize
if !opts.JSON {
fmt.Printf("Snapshot information:\n")
fmt.Printf(" Blob count: %d\n", manifest.BlobCount)
fmt.Printf(" Total size: %s\n", humanize.Bytes(uint64(manifest.TotalCompressedSize)))
if manifest.Timestamp != "" {
if t, err := time.Parse(time.RFC3339, manifest.Timestamp); err == nil {
fmt.Printf(" Created: %s\n", t.Format("2006-01-02 15:04:05 MST"))
}
fmt.Printf("Snapshot information:\n")
fmt.Printf(" Blob count: %d\n", manifest.BlobCount)
fmt.Printf(" Total size: %s\n", humanize.Bytes(uint64(manifest.TotalCompressedSize)))
if manifest.Timestamp != "" {
if t, err := time.Parse(time.RFC3339, manifest.Timestamp); err == nil {
fmt.Printf(" Created: %s\n", t.Format("2006-01-02 15:04:05 MST"))
}
fmt.Println()
// Check each blob exists
fmt.Printf("Checking blob existence...\n")
}
fmt.Println()
// Check each blob exists
fmt.Printf("Checking blob existence...\n")
missing := 0
verified := 0
missingSize := int64(0)
@ -618,20 +539,16 @@ func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptio
for _, blob := range manifest.Blobs {
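// Blob objects are content-addressed and sharded by hash prefix into a
// two-level layout (blobs/<hash[0:2]>/<hash[2:4]>/<hash>), matching the
// path built below.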
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blob.Hash[:2], blob.Hash[2:4], blob.Hash)
if opts.Deep {
if deep {
// Download and verify hash
// TODO: Implement deep verification
if !opts.JSON {
fmt.Printf("Deep verification not yet implemented\n")
}
fmt.Printf("Deep verification not yet implemented\n")
return nil
} else {
// Just check existence
_, err := v.Storage.Stat(v.ctx, blobPath)
if err != nil {
if !opts.JSON {
fmt.Printf(" Missing: %s (%s)\n", blob.Hash, humanize.Bytes(uint64(blob.CompressedSize)))
}
fmt.Printf(" Missing: %s (%s)\n", blob.Hash, humanize.Bytes(uint64(blob.CompressedSize)))
missing++
missingSize += blob.CompressedSize
} else {
@ -640,20 +557,6 @@ func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptio
}
}
result.Verified = verified
result.Missing = missing
result.MissingSize = missingSize
if opts.JSON {
if missing > 0 {
result.Status = "failed"
result.ErrorMessage = fmt.Sprintf("%d blobs are missing", missing)
} else {
result.Status = "ok"
}
return v.outputVerifyJSON(result)
}
fmt.Printf("\nVerification complete:\n")
fmt.Printf(" Verified: %d blobs (%s)\n", verified,
humanize.Bytes(uint64(manifest.TotalCompressedSize-missingSize)))
@ -673,19 +576,6 @@ func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptio
return nil
}
// outputVerifyJSON outputs the verification result as JSON
func (v *Vaultik) outputVerifyJSON(result *VerifyResult) error {
encoder := json.NewEncoder(os.Stdout)
encoder.SetIndent("", " ")
if err := encoder.Encode(result); err != nil {
return fmt.Errorf("encoding JSON: %w", err)
}
if result.Status == "failed" {
return fmt.Errorf("verification failed: %s", result.ErrorMessage)
}
return nil
}
// Helper methods that were previously on SnapshotApp
func (v *Vaultik) getManifestSize(snapshotID string) (int64, error) {
@ -743,23 +633,21 @@ func (v *Vaultik) deleteSnapshot(snapshotID string) error {
}
}
// Then, delete from local database (if we have a local database)
if v.Repositories != nil {
// Delete related records first to avoid foreign key constraints
if err := v.Repositories.Snapshots.DeleteSnapshotFiles(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot files", "snapshot_id", snapshotID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotBlobs(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot blobs", "snapshot_id", snapshotID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotUploads(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot uploads", "snapshot_id", snapshotID, "error", err)
}
// Then, delete from local database
// Delete related records first to avoid foreign key constraints
if err := v.Repositories.Snapshots.DeleteSnapshotFiles(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot files", "snapshot_id", snapshotID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotBlobs(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot blobs", "snapshot_id", snapshotID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotUploads(v.ctx, snapshotID); err != nil {
log.Error("Failed to delete snapshot uploads", "snapshot_id", snapshotID, "error", err)
}
// Now delete the snapshot itself
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot from database: %w", err)
}
// Now delete the snapshot itself
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot from database: %w", err)
}
return nil
@ -795,10 +683,9 @@ func (v *Vaultik) syncWithRemote() error {
// Remove local snapshots that don't exist remotely
removedCount := 0
for _, snapshot := range localSnapshots {
snapshotIDStr := snapshot.ID.String()
if !remoteSnapshots[snapshotIDStr] {
if !remoteSnapshots[snapshot.ID] {
log.Info("Removing local snapshot not found in remote", "snapshot_id", snapshot.ID)
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshotIDStr); err != nil {
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshot.ID); err != nil {
log.Error("Failed to delete local snapshot", "snapshot_id", snapshot.ID, "error", err)
} else {
removedCount++
@ -812,298 +699,3 @@ func (v *Vaultik) syncWithRemote() error {
return nil
}
// RemoveOptions contains options for the snapshot remove command
type RemoveOptions struct {
Force bool
DryRun bool
JSON bool
}
// RemoveResult contains the result of a snapshot removal
type RemoveResult struct {
SnapshotID string `json:"snapshot_id"`
BlobsDeleted int `json:"blobs_deleted"`
BytesFreed int64 `json:"bytes_freed"`
BlobsFailed int `json:"blobs_failed,omitempty"`
DryRun bool `json:"dry_run,omitempty"`
}
// RemoveSnapshot removes a snapshot and any blobs that become orphaned
func (v *Vaultik) RemoveSnapshot(snapshotID string, opts *RemoveOptions) (*RemoveResult, error) {
log.Info("Starting snapshot removal", "snapshot_id", snapshotID)
result := &RemoveResult{
SnapshotID: snapshotID,
}
// Step 1: List all snapshots in storage
log.Info("Listing remote snapshots")
objectCh := v.Storage.ListStream(v.ctx, "metadata/")
var allSnapshotIDs []string
targetExists := false
for object := range objectCh {
if object.Err != nil {
return nil, fmt.Errorf("listing remote snapshots: %w", object.Err)
}
// Extract snapshot ID from paths like metadata/hostname-20240115-143052Z/
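// Both the snapshot directory marker key and its manifest.json.zst key map
// back to the same snapshot ID, so matches are de-duplicated below.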
parts := strings.Split(object.Key, "/")
if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
if strings.HasSuffix(object.Key, "/") || strings.Contains(object.Key, "/manifest.json.zst") {
sid := parts[1]
// Only add unique snapshot IDs
found := false
for _, id := range allSnapshotIDs {
if id == sid {
found = true
break
}
}
if !found {
allSnapshotIDs = append(allSnapshotIDs, sid)
if sid == snapshotID {
targetExists = true
}
}
}
}
}
if !targetExists {
return nil, fmt.Errorf("snapshot not found: %s", snapshotID)
}
log.Info("Found snapshots", "total", len(allSnapshotIDs))
// Step 2: Download target snapshot's manifest
log.Info("Downloading target manifest", "snapshot_id", snapshotID)
targetManifest, err := v.downloadManifest(snapshotID)
if err != nil {
return nil, fmt.Errorf("downloading target manifest: %w", err)
}
// Build set of target blob hashes with sizes
targetBlobs := make(map[string]int64) // hash -> size
for _, blob := range targetManifest.Blobs {
targetBlobs[blob.Hash] = blob.CompressedSize
}
log.Info("Target snapshot has blobs", "count", len(targetBlobs))
// Step 3: Download manifests from all OTHER snapshots to build "in-use" set
inUseBlobs := make(map[string]bool)
otherCount := 0
for _, sid := range allSnapshotIDs {
if sid == snapshotID {
continue // Skip target snapshot
}
log.Debug("Processing manifest", "snapshot_id", sid)
manifest, err := v.downloadManifest(sid)
if err != nil {
log.Error("Failed to download manifest", "snapshot_id", sid, "error", err)
continue
}
for _, blob := range manifest.Blobs {
inUseBlobs[blob.Hash] = true
}
otherCount++
}
log.Info("Processed other manifests", "count", otherCount, "in_use_blobs", len(inUseBlobs))
// Step 4: Find orphaned blobs (in target but not in use by others)
var orphanedBlobs []string
var totalSize int64
for hash, size := range targetBlobs {
if !inUseBlobs[hash] {
orphanedBlobs = append(orphanedBlobs, hash)
totalSize += size
}
}
log.Info("Found orphaned blobs",
"count", len(orphanedBlobs),
"total_size", humanize.Bytes(uint64(totalSize)),
)
// Show summary (unless JSON mode)
if !opts.JSON {
_, _ = fmt.Fprintf(v.Stdout, "\nSnapshot: %s\n", snapshotID)
_, _ = fmt.Fprintf(v.Stdout, "Blobs in snapshot: %d\n", len(targetBlobs))
_, _ = fmt.Fprintf(v.Stdout, "Orphaned blobs to delete: %d (%s)\n", len(orphanedBlobs), humanize.Bytes(uint64(totalSize)))
}
if opts.DryRun {
result.DryRun = true
if opts.JSON {
return result, v.outputRemoveJSON(result)
}
_, _ = fmt.Fprintln(v.Stdout, "\n[Dry run - no changes made]")
return result, nil
}
// Confirm interactively unless --force is used; the prompt is skipped in
// JSON mode, which is expected to be run with --force.
if !opts.Force && !opts.JSON {
_, _ = fmt.Fprintf(v.Stdout, "\nDelete snapshot and %d orphaned blob(s)? [y/N] ", len(orphanedBlobs))
var confirm string
if _, err := fmt.Fscanln(v.Stdin, &confirm); err != nil {
_, _ = fmt.Fprintln(v.Stdout, "Cancelled")
return result, nil
}
if strings.ToLower(confirm) != "y" {
_, _ = fmt.Fprintln(v.Stdout, "Cancelled")
return result, nil
}
}
// Step 5: Delete orphaned blobs
if len(orphanedBlobs) > 0 {
log.Info("Deleting orphaned blobs")
for i, hash := range orphanedBlobs {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
if err := v.Storage.Delete(v.ctx, blobPath); err != nil {
log.Error("Failed to delete blob", "hash", hash, "error", err)
result.BlobsFailed++
continue
}
result.BlobsDeleted++
result.BytesFreed += targetBlobs[hash]
// Progress update every 100 blobs
if (i+1)%100 == 0 || i == len(orphanedBlobs)-1 {
log.Info("Deletion progress",
"deleted", i+1,
"total", len(orphanedBlobs),
"percent", fmt.Sprintf("%.1f%%", float64(i+1)/float64(len(orphanedBlobs))*100),
)
}
}
}
// Step 6: Delete snapshot metadata
log.Info("Deleting snapshot metadata")
if err := v.deleteSnapshot(snapshotID); err != nil {
return result, fmt.Errorf("deleting snapshot metadata: %w", err)
}
// Output result
if opts.JSON {
return result, v.outputRemoveJSON(result)
}
// Print summary
_, _ = fmt.Fprintf(v.Stdout, "\nRemoved snapshot %s\n", snapshotID)
_, _ = fmt.Fprintf(v.Stdout, " Blobs deleted: %d\n", result.BlobsDeleted)
_, _ = fmt.Fprintf(v.Stdout, " Storage freed: %s\n", humanize.Bytes(uint64(result.BytesFreed)))
if result.BlobsFailed > 0 {
_, _ = fmt.Fprintf(v.Stdout, " Blobs failed: %d\n", result.BlobsFailed)
}
return result, nil
}
// outputRemoveJSON outputs the removal result as JSON
func (v *Vaultik) outputRemoveJSON(result *RemoveResult) error {
encoder := json.NewEncoder(os.Stdout)
encoder.SetIndent("", " ")
return encoder.Encode(result)
}
// PruneResult contains statistics about the prune operation
type PruneResult struct {
SnapshotsDeleted int64
FilesDeleted int64
ChunksDeleted int64
BlobsDeleted int64
}
// PruneDatabase removes incomplete snapshots and orphaned files, chunks,
// and blobs from the local database. It restores database consistency
// before a new backup starts, and can also be run on demand via the
// prune command.
func (v *Vaultik) PruneDatabase() (*PruneResult, error) {
log.Info("Pruning database: removing incomplete snapshots and orphaned data")
result := &PruneResult{}
// First, delete any incomplete snapshots
incompleteSnapshots, err := v.Repositories.Snapshots.GetIncompleteSnapshots(v.ctx)
if err != nil {
return nil, fmt.Errorf("getting incomplete snapshots: %w", err)
}
for _, snapshot := range incompleteSnapshots {
snapshotIDStr := snapshot.ID.String()
log.Info("Deleting incomplete snapshot", "snapshot_id", snapshot.ID)
// Delete related records first
if err := v.Repositories.Snapshots.DeleteSnapshotFiles(v.ctx, snapshotIDStr); err != nil {
log.Error("Failed to delete snapshot files", "snapshot_id", snapshot.ID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotBlobs(v.ctx, snapshotIDStr); err != nil {
log.Error("Failed to delete snapshot blobs", "snapshot_id", snapshot.ID, "error", err)
}
if err := v.Repositories.Snapshots.DeleteSnapshotUploads(v.ctx, snapshotIDStr); err != nil {
log.Error("Failed to delete snapshot uploads", "snapshot_id", snapshot.ID, "error", err)
}
if err := v.Repositories.Snapshots.Delete(v.ctx, snapshotIDStr); err != nil {
log.Error("Failed to delete snapshot", "snapshot_id", snapshot.ID, "error", err)
} else {
result.SnapshotsDeleted++
}
}
// Get counts before cleanup for reporting
fileCountBefore, _ := v.getTableCount("files")
chunkCountBefore, _ := v.getTableCount("chunks")
blobCountBefore, _ := v.getTableCount("blobs")
// Run the cleanup
if err := v.SnapshotManager.CleanupOrphanedData(v.ctx); err != nil {
return nil, fmt.Errorf("cleanup orphaned data: %w", err)
}
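// The per-table deltas reported below are simple before/after row counts,
// so they assume nothing else writes to the database during the prune.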
// Get counts after cleanup
fileCountAfter, _ := v.getTableCount("files")
chunkCountAfter, _ := v.getTableCount("chunks")
blobCountAfter, _ := v.getTableCount("blobs")
result.FilesDeleted = fileCountBefore - fileCountAfter
result.ChunksDeleted = chunkCountBefore - chunkCountAfter
result.BlobsDeleted = blobCountBefore - blobCountAfter
log.Info("Prune complete",
"incomplete_snapshots", result.SnapshotsDeleted,
"orphaned_files", result.FilesDeleted,
"orphaned_chunks", result.ChunksDeleted,
"orphaned_blobs", result.BlobsDeleted,
)
// Print summary
_, _ = fmt.Fprintf(v.Stdout, "Prune complete:\n")
_, _ = fmt.Fprintf(v.Stdout, " Incomplete snapshots removed: %d\n", result.SnapshotsDeleted)
_, _ = fmt.Fprintf(v.Stdout, " Orphaned files removed: %d\n", result.FilesDeleted)
_, _ = fmt.Fprintf(v.Stdout, " Orphaned chunks removed: %d\n", result.ChunksDeleted)
_, _ = fmt.Fprintf(v.Stdout, " Orphaned blobs removed: %d\n", result.BlobsDeleted)
return result, nil
}
// getTableCount returns the count of rows in a table
func (v *Vaultik) getTableCount(tableName string) (int64, error) {
if v.DB == nil {
return 0, nil
}
var count int64
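// tableName is only ever passed internal constants here ("files", "chunks",
// "blobs"), so interpolating it with Sprintf does not create an injection risk.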
query := fmt.Sprintf("SELECT COUNT(*) FROM %s", tableName)
err := v.DB.Conn().QueryRowContext(v.ctx, query).Scan(&count)
if err != nil {
return 0, err
}
return count, nil
}

View File

@ -1,7 +1,6 @@
package vaultik
import (
"bytes"
"context"
"fmt"
"io"
@ -123,34 +122,3 @@ func (v *Vaultik) GetDecryptor() (*crypto.Decryptor, error) {
func (v *Vaultik) GetFilesystem() afero.Fs {
return v.Fs
}
// TestVaultik wraps a Vaultik with captured stdout/stderr for testing
type TestVaultik struct {
*Vaultik
Stdout *bytes.Buffer
Stderr *bytes.Buffer
Stdin *bytes.Buffer
}
// NewForTesting creates a minimal Vaultik instance for testing purposes.
// Only the Storage field, the context, and the stdio buffers are populated;
// all other fields are left nil.
// Returns a TestVaultik that captures stdout/stderr in buffers.
func NewForTesting(storage storage.Storer) *TestVaultik {
ctx, cancel := context.WithCancel(context.Background())
stdout := &bytes.Buffer{}
stderr := &bytes.Buffer{}
stdin := &bytes.Buffer{}
return &TestVaultik{
Vaultik: &Vaultik{
Storage: storage,
ctx: ctx,
cancel: cancel,
Stdout: stdout,
Stderr: stderr,
Stdin: stdin,
},
Stdout: stdout,
Stderr: stderr,
Stdin: stdin,
}
}

View File

@ -18,20 +18,6 @@ import (
// VerifyOptions contains options for the verify command
type VerifyOptions struct {
Deep bool
JSON bool
}
// VerifyResult contains the result of a snapshot verification
type VerifyResult struct {
SnapshotID string `json:"snapshot_id"`
Status string `json:"status"` // "ok" or "failed"
Mode string `json:"mode"` // "shallow" or "deep"
BlobCount int `json:"blob_count"`
TotalSize int64 `json:"total_size"`
Verified int `json:"verified"`
Missing int `json:"missing"`
MissingSize int64 `json:"missing_size,omitempty"`
ErrorMessage string `json:"error,omitempty"`
}
// RunDeepVerify executes deep verification operation

View File

@ -1,11 +1,9 @@
age_recipients:
- age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj # sneak's long term age key
- age1otherpubkey... # add additional recipients as needed
snapshots:
test:
paths:
- /tmp/vaultik-test-source
- /var/test/data
source_dirs:
- /tmp/vaultik-test-source
- /var/test/data
exclude:
- '*.log'
- '*.tmp'
@ -27,4 +25,4 @@ index_path: /tmp/vaultik-test.sqlite
chunk_size: 10MB
blob_size_limit: 10GB
compression_level: 3
hostname: test-host
hostname: test-host