Compare commits
55 Commits
9c072166fa
...
fix/sql-in
380
ARCHITECTURE.md
Normal file
@@ -0,0 +1,380 @@

# Vaultik Architecture

This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.

## Overview

Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using [uber-go/fx](https://github.com/uber-go/fx).

## Data Flow

```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │  Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │  Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │  Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │  Uploads blobs to remote storage
└─────────────────┘
```

## Data Model

### Core Entities

The database tracks five primary entities and their relationships:

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Snapshot   │────▶│     File     │────▶│    Chunk     │
└──────────────┘     └──────────────┘     └──────────────┘
       │                                         │
       ▼                                         ▼
┌──────────────┐                          ┌──────────────┐
│     Blob     │◀─────────────────────────│  BlobChunk   │
└──────────────┘                          └──────────────┘
```

### Entity Descriptions

#### File (`database.File`)
Represents a file or directory in the backup system. Stores the metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)

#### Chunk (`database.Chunk`)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- `ChunkHash`: SHA256 hash of the chunk content (primary key)
- `Size`: Chunk size in bytes

Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for a 64KB average).
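
The min/max bounds follow directly from the configured average; a minimal sketch (`chunkBounds` is an illustrative helper, not the actual chunker API):

```go
// Minimal sketch of the chunk-size bounds described above; chunkBounds is an
// illustrative helper, not the actual vaultik chunker API.
package main

import "fmt"

// chunkBounds derives the FastCDC min/max sizes from the configured average.
func chunkBounds(avg int) (min, max int) {
	return avg / 4, avg * 4
}

func main() {
	min, max := chunkBounds(64 * 1024) // 64KB average from config
	fmt.Println(min, max)              // 16384 262144
}
```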

#### FileChunk (`database.FileChunk`)
Maps files to their constituent chunks:
- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk

#### Blob (`database.Blob`)
The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:
- `ID`: UUID assigned at creation
- `Hash`: SHA256 of the final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps

Blob creation process:
1. Chunks are accumulated (up to MaxBlobSize, typically 10GB)
2. Compressed with zstd
3. Encrypted with age (recipients configured in config)
4. SHA256 hash computed → becomes the filename in S3
5. Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
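
The key layout in step 5 shards blobs by the first two byte pairs of the hash, keeping any single S3 prefix small. A sketch (`blobKey` is a hypothetical helper, not the actual vaultik function):

```go
// Sketch of the blob key layout from step 5; blobKey is a hypothetical
// helper, not the actual vaultik function.
package main

import "fmt"

// blobKey shards blobs into prefixes using the first two hash byte pairs.
func blobKey(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	fmt.Println(blobKey("ab12cdef")) // blobs/ab/12/ab12cdef
}
```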
#### BlobChunk (`database.BlobChunk`)
Maps chunks to their position within blobs:
- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size

#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until the snapshot finishes successfully
#### SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.

### Relationship Summary

```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File     1──────────▶ N FileChunk    N ◀────────── 1 Chunk
Blob     1──────────▶ N BlobChunk    N ◀────────── 1 Chunk
```

## Type Instantiation

### Application Startup

The CLI uses fx for dependency injection. Here's the instantiation order:

```go
// cli/app.go: NewApp()
fx.New(
    fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
    fx.Supply(opts.LogOptions),                    // 2. Log options
    fx.Provide(globals.New),                       // 3. Globals
    fx.Provide(log.New),                           // 4. Logger config
    config.Module,                                 // 5. Config
    database.Module,                               // 6. Database + Repositories
    log.Module,                                    // 7. Logger initialization
    s3.Module,                                     // 8. S3 client
    snapshot.Module,                               // 9. SnapshotManager + ScannerFactory
    fx.Provide(vaultik.New),                       // 10. Vaultik orchestrator
)
```

### Key Type Instantiation Points

#### 1. Config (`config.Config`)
- **Created by**: `config.Module` via `config.LoadConfig()`
- **When**: Application startup (fx DI)
- **Contains**: All configuration from the YAML file (S3 credentials, encryption keys, paths, etc.)

#### 2. Database (`database.DB`)
- **Created by**: `database.Module` via `database.New()`
- **When**: Application startup (fx DI)
- **Contains**: SQLite connection, path reference

#### 3. Repositories (`database.Repositories`)
- **Created by**: `database.Module` via `database.NewRepositories()`
- **When**: Application startup (fx DI)
- **Contains**: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)

#### 4. Vaultik (`vaultik.Vaultik`)
- **Created by**: `vaultik.New(VaultikParams)`
- **When**: Application startup (fx DI)
- **Contains**: All dependencies for backup operations

```go
type Vaultik struct {
    Globals         *globals.Globals
    Config          *config.Config
    DB              *database.DB
    Repositories    *database.Repositories
    S3Client        *s3.Client
    ScannerFactory  snapshot.ScannerFactory
    SnapshotManager *snapshot.SnapshotManager
    Shutdowner      fx.Shutdowner
    Fs              afero.Fs

    ctx    context.Context
    cancel context.CancelFunc
}
```

#### 5. SnapshotManager (`snapshot.SnapshotManager`)
- **Created by**: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- **When**: Application startup (fx DI)
- **Responsibility**: Creates/completes snapshots, exports metadata to S3

#### 6. Scanner (`snapshot.Scanner`)
- **Created by**: `ScannerFactory(ScannerParams)`
- **When**: Each `CreateSnapshot()` call
- **Contains**: Chunker, Packer, progress reporter

```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
    EnableProgress: !opts.Cron,
    Fs:             v.Fs,
})
```

#### 7. Chunker (`chunker.Chunker`)
- **Created by**: `chunker.NewChunker(avgChunkSize)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `avgChunkSize`: From config (typically 64KB)
  - `minChunkSize`: avgChunkSize / 4
  - `maxChunkSize`: avgChunkSize * 4

#### 8. Packer (`blob.Packer`)
- **Created by**: `blob.NewPacker(PackerConfig)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
  - `CompressionLevel`: zstd level (1-19)
  - `Recipients`: age public keys for encryption

```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
    MaxBlobSize:      cfg.MaxBlobSize,
    CompressionLevel: cfg.CompressionLevel,
    Recipients:       cfg.AgeRecipients,
    Repositories:     cfg.Repositories,
    Fs:               cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```

## Module Responsibilities

### `internal/cli`
Entry point for the fx application. Combines all modules and handles signal interrupts.

Key functions:
- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper

### `internal/vaultik`
Main orchestrator containing all dependencies and command implementations.

Key methods:
- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots

### `internal/chunker`
Content-defined chunking using the FastCDC algorithm.

Key types:
- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize

Key methods:
- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)

### `internal/blob`
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.

Key types:
- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload

Key methods:
- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload

### `internal/snapshot`

#### Scanner
Orchestrates the backup process for a directory.

Key methods:
- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation

Scan phases:
1. **Phase 0**: Detect deleted files from previous snapshots
2. **Phase 1**: Walk directory, identify files needing processing
3. **Phase 2**: Process files (chunk → pack → upload)

#### SnapshotManager
Manages snapshot lifecycle and metadata export.

Key methods:
- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots

### `internal/database`
SQLite database for the local index. Single-writer mode for thread safety.

Key types:
- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces

Repository interfaces:
- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)

## Snapshot Creation Flow

```
CreateSnapshot(opts)
  │
  ├─► CleanupIncompleteSnapshots()        // Critical: avoid dedup errors
  │
  ├─► SnapshotManager.CreateSnapshot()    // Create DB record
  │
  ├─► For each source directory:
  │     │
  │     ├─► scanner.Scan(ctx, path, snapshotID)
  │     │     │
  │     │     ├─► Phase 0: detectDeletedFiles()
  │     │     │
  │     │     ├─► Phase 1: scanPhase()
  │     │     │     Walk directory
  │     │     │     Check file metadata changes
  │     │     │     Build list of files to process
  │     │     │
  │     │     └─► Phase 2: processPhase()
  │     │           For each file:
  │     │             chunker.ChunkReaderStreaming()
  │     │             For each chunk:
  │     │               packer.AddChunk()
  │     │               If blob full → FinalizeBlob()
  │     │                 → handleBlobReady()
  │     │                 → s3Client.PutObjectWithProgress()
  │     │           packer.Flush()        // Final blob
  │     │
  │     └─► Accumulate statistics
  │
  ├─► SnapshotManager.UpdateSnapshotStatsExtended()
  │
  ├─► SnapshotManager.CompleteSnapshot()
  │
  └─► SnapshotManager.ExportSnapshotMetadata()
        │
        ├─► Copy database to temp file
        ├─► Clean to only current snapshot data
        ├─► Dump to SQL
        ├─► Compress with zstd
        ├─► Encrypt with age
        ├─► Upload db.zst.age to S3
        └─► Upload manifest.json.zst to S3
```

## Deduplication Strategy

1. **File-level**: Files unchanged since the last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)
2. **Chunk-level**: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.
3. **Blob-level**: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
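
The chunk-level check can be sketched with a map standing in for the database lookup (names are illustrative, not the actual repository API):

```go
// Minimal sketch of the chunk-level dedup check described above. The map
// stands in for the chunk-hash lookup in the local database.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupStore records chunk hashes already uploaded.
type dedupStore map[string]bool

// needsUpload returns true only the first time a given chunk content is seen.
func (s dedupStore) needsUpload(data []byte) bool {
	h := sha256.Sum256(data)
	key := hex.EncodeToString(h[:])
	if s[key] {
		return false // chunk already stored: skip re-upload
	}
	s[key] = true
	return true
}

func main() {
	s := dedupStore{}
	fmt.Println(s.needsUpload([]byte("chunk A"))) // true  (new chunk)
	fmt.Println(s.needsUpload([]byte("chunk A"))) // false (deduplicated)
}
```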

## Storage Layout in S3

```
bucket/
├── blobs/
│   └── {hash[0:2]}/
│       └── {hash[2:4]}/
│           └── {full-hash}          # Compressed+encrypted blob
│
└── metadata/
    └── {snapshot-id}/
        ├── db.zst.age               # Encrypted database dump
        └── manifest.json.zst        # Blob list (for verification)
```

## Thread Safety

- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses the `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.
16
CLAUDE.md
@@ -10,6 +10,9 @@ Read the rules in AGENTS.md and follow them.
corporate advertising for Anthropic and is therefore completely
unacceptable in commit messages.

* NEVER use `git add -A`. Always add only the files you intentionally
  changed.

* Tests should always be run before committing code. No commits should be
  made that do not pass tests.
@@ -26,3 +29,16 @@ Read the rules in AGENTS.md and follow them.
* Do not stop working on a task until you have reached the definition of
  done provided to you in the initial instruction. Don't do part or most of
  the work, do all of the work until the criteria for done are met.

* We do not need to support migrations; schema upgrades can be handled by
  deleting the local state file and doing a full backup to re-create it.

* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
  estimate about 4 seconds per gigabyte of backup time.

* When running tests, don't run individual tests or grep the output. Never
  run "go test"; only ever run "make test" to run the full test suite, and
  examine the full output.
385
DESIGN.md
@@ -1,385 +0,0 @@

# vaultik: Design Document

`vaultik` is a secure backup tool written in Go. It performs streaming
backups using content-defined chunking, blob grouping, asymmetric
encryption, and object storage. The system is designed for environments
where the backup source host cannot store secrets and cannot retrieve or
decrypt any data from the destination.

The source host is **stateful**: it maintains a local SQLite index to detect
changes, deduplicate content, and track uploads across backup runs. All
remote storage is encrypted and append-only. Pruning of unreferenced data is
done from a trusted host with access to decryption keys, as even the
metadata indices are encrypted in the blob store.

---

## Why

ANOTHER backup tool??

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.

My requirements are:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.

## Design Goals

1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and
   a private key.
4. Prune must be possible, although this requires a private key so must be
   done on different hosts.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age)
   (X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs.
   This reduces the number of objects in the blob store for filesystems with
   many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is
   chunked into multiple encrypted `.age` parts, to support large
   filesystems.
10. CLI interface is structured using `cobra`.

---

## S3 Bucket Layout

S3 stores only four things:

1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the
   filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes
   referenced in the snapshot, enabling pruning without decryption.

```
s3://<bucket>/<prefix>/
├── blobs/
│   ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│   ├── <snapshot_id>.sqlite.age
│   ├── <snapshot_id>.sqlite.00.age
│   ├── <snapshot_id>.sqlite.01.age
│   ├── <snapshot_id>.manifest.json.zst
```

To retrieve a given file, you would:

* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct
  the full database file
* verify the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs

If clever, it may be possible to do this chunk by chunk without touching
disk (except for the output file), as each uncompressed blob should fit in
memory (<10GB).
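
The final reconstruction step above can be sketched as concatenating decrypted chunk data in `file_chunks` index order (sample data and the `reassemble` helper are illustrative):

```go
// Sketch of the last restore step: concatenate a file's chunks in
// file_chunks idx order. Sample data is illustrative.
package main

import (
	"bytes"
	"fmt"
)

// reassemble concatenates decrypted chunk data in index order.
func reassemble(order []string, chunks map[string][]byte) []byte {
	var out bytes.Buffer
	for _, h := range order {
		out.Write(chunks[h])
	}
	return out.Bytes()
}

func main() {
	chunks := map[string][]byte{
		"h1": []byte("hello "),
		"h2": []byte("world"),
	}
	// Order comes from file_chunks rows sorted by idx.
	fmt.Println(string(reassemble([]string{"h1", "h2"}, chunks))) // hello world
}
```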

### Path Rules

* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first 2 hex bytes of the blob hash.

### Blob Manifest Format

The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:

```json
{
  "snapshot_id": "2023-10-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0...",
    ...
  ]
}
```

This allows pruning operations to determine which blobs are referenced without requiring decryption keys.

---

## 3. Local SQLite Index Schema (source host)

```sql
CREATE TABLE files (
    path TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL
);

-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
    path TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (path, idx)
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    blob_hash TEXT PRIMARY KEY,
    final_hash TEXT NOT NULL,
    created_ts INTEGER NOT NULL
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_path)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    created_ts INTEGER NOT NULL,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL
);
```

---

## 4. Snapshot Metadata Schema (stored in S3)

Identical schema to the local index, filtered to live snapshot state. Stored
as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than
a configured `chunk_size`, it is split and uploaded as:

```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```

---

## 5. Data Flow

### 5.1 Backup

1. Load config
2. Open local SQLite index
3. Walk source directories:
   * For each file:
     * Check mtime and size in index
     * If changed or new:
       * Chunk file
       * For each chunk:
         * Hash with SHA256
         * Check if already uploaded
         * If not:
           * Add chunk to blob packer
       * Record file-chunk mapping in index
4. When blob reaches threshold size (e.g. 1GB):
   * Compress with `zstd`
   * Encrypt with `age`
   * Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
   * Record blob-chunk layout in local index
5. Once all files are processed:
   * Build snapshot SQLite DB from index delta
   * Compress + encrypt
   * If larger than `chunk_size`, split into parts
   * Upload to:
     `s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create snapshot record in local index that lists:
   * snapshot ID
   * hostname
   * vaultik version
   * timestamp
   * counts of files, chunks, and blobs
   * list of all blobs referenced in the snapshot (some new, some old) for
     efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in snapshot
13. Compress manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the
    snapshot, based on local state db

### 5.2 Manual Prune

1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract set of referenced blob hashes from manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
   * If the hash is not in the manifest:
     * Issue `DeleteObject` to remove it
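
The prune decision in steps 4-6 is a set difference: delete any stored blob hash absent from the manifest. A sketch with illustrative data (the `toDelete` helper is hypothetical):

```go
// Sketch of the prune decision: any stored blob whose hash is missing from
// the manifest's referenced set would get a DeleteObject call.
package main

import "fmt"

// toDelete returns the stored blob hashes not referenced by the manifest.
func toDelete(stored []string, referenced map[string]bool) []string {
	var out []string
	for _, h := range stored {
		if !referenced[h] {
			out = append(out, h) // would issue DeleteObject for this key
		}
	}
	return out
}

func main() {
	referenced := map[string]bool{"aa11": true, "bb22": true} // from manifest
	stored := []string{"aa11", "bb22", "cc33"}                // listed under blobs/
	fmt.Println(toDelete(stored, referenced)) // [cc33]
}
```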

### 5.3 Verify

Verify runs on a host that has no state, but access to the bucket.

1. Fetch latest metadata snapshot files from S3
2. Fetch latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and
   reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
   * Fetch the blob metadata from the snapshot db
   * Ensure the blob exists in S3
   * Check the S3 content hash matches the expected blob hash
   * If not using --quick mode:
     * Download and decrypt the blob
     * Decompress and verify chunk hashes match metadata

---

## 6. CLI Commands

```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```

* `VAULTIK_PRIVATE_KEY` is required for the `restore`, `prune`, `verify`, and
  `fetch` commands.
* It is passed via environment variable containing the age private key.

---

## 7. Function and Method Signatures

### 7.1 CLI

```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```
|
|
||||||
|
|
||||||
### 7.2 Configuration
```go
type Config struct {
	BackupPubKey      string        // age recipient
	BackupInterval    time.Duration // used in daemon mode; irrelevant in cron mode
	BlobSizeLimit     int64         // default 10GB
	ChunkSize         int64         // default 10MB
	Exclude           []string      // regexes of absolute paths to exclude from backup
	Hostname          string
	IndexPath         string        // path to local SQLite index db, default /var/lib/vaultik/index.db
	MetadataPrefix    string        // S3 prefix for metadata, default "metadata/"
	MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour (daemon mode)
	S3                S3Config      // S3 configuration
	ScanInterval      time.Duration // interval between full stat() scans of source dirs, default 24h
	SourceDirs        []string      // absolute paths of source directories to back up
}

type S3Config struct {
	Endpoint        string
	Bucket          string
	Prefix          string
	AccessKeyID     string
	SecretAccessKey string
	Region          string
}

func Load(path string) (*Config, error)
```
### 7.3 Index
```go
type Index struct {
	db *sql.DB
}

func OpenIndex(path string) (*Index, error)

func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```
### 7.4 Blob Packing
```go
type BlobWriter struct {
	// internal buffer, current size, encrypted writer, etc.
}

func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```
### 7.5 Metadata
```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```
### 7.6 Prune
```go
func RunPrune(bucket, prefix, privateKey string) error
```
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Jeffrey Paul sneak@sneak.berlin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
29
Makefile
```diff
@@ -1,19 +1,27 @@
 .PHONY: test fmt lint build clean all
 
+# Version number
+VERSION := 0.0.1
+
 # Build variables
-VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
-COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
+GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
 
 # Linker flags
 LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(COMMIT)'
+           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
 
 # Default target
-all: test
+all: vaultik
 
 # Run tests
 test: lint fmt-check
-	go test -v ./...
+	@echo "Running tests..."
+	@if ! go test -v -timeout 10s ./... 2>&1; then \
+		echo ""; \
+		echo "TEST FAILURES DETECTED"; \
+		echo "Run 'go test -v ./internal/database' to see database test details"; \
+		exit 1; \
+	fi
 
 # Check if code is formatted
 fmt-check:
@@ -31,8 +39,8 @@ lint:
 	golangci-lint run
 
 # Build binary
-build:
-	go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
+vaultik: internal/*/*.go cmd/vaultik/*.go
+	go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
 
 # Clean build artifacts
 clean:
@@ -52,3 +60,10 @@ test-coverage:
 # Run integration tests
 test-integration:
 	go test -v -tags=integration ./...
+
+local:
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
+
+install: vaultik
+	cp ./vaultik $(HOME)/bin/
```
556
PROCESS.md
Normal file
@@ -0,0 +1,556 @@
# Vaultik Snapshot Creation Process

This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.

## Database Schema Overview

### Tables and Foreign Key Dependencies
```
FOREIGN KEY GRAPH (referencing table ──► referenced tables)

snapshot_files ──► snapshots, files
snapshot_blobs ──► snapshots, blobs
file_chunks    ──► files, chunks
chunk_files    ──► files, chunks
blob_chunks    ──► blobs, chunks
uploads        ──► blobs.blob_hash, snapshots.id
```
### Critical Constraint: `chunks` Must Exist First

These tables reference `chunks.chunk_hash` **without CASCADE**:

- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`

**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
### Order of Operations Required by Schema

```
1.  snapshots       (created first, before scan)
2.  blobs           (created when packer starts new blob)
3.  chunks          (created during file processing)
4.  blob_chunks     (created immediately after chunk added to packer)
5.  files           (created after file fully chunked)
6.  file_chunks     (created with file record)
7.  chunk_files     (created with file record)
8.  snapshot_files  (created with file record)
9.  snapshot_blobs  (created after blob uploaded)
10. uploads         (created after blob uploaded)
```
---

## Snapshot Creation Phases

### Phase 0: Initialization

**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table

**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
    COMMIT
```
---

### Phase 1: Scan Directory

**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing

**Transactions:**
```
(None during scan - all in-memory)
```
---

### Phase 1b: Associate Unchanged Files

**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000

**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
   INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
   ... (up to 1000 inserts)
   COMMIT
```
---

### Phase 2: Process Files

For each file that needs processing:

#### Step 2a: Open and Chunk File

**Location:** `processFileStreaming()`

For each chunk produced by content-defined chunking:

##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
	return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns

// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```

**Transaction:**
```
T_chunk: BEGIN
         INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
         COMMIT
```
##### Step 2a-3: Add Chunk to Packer

```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```

**Inside packer.AddChunk → addChunkToCurrentBlob():**

```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
	blobChunk := &database.BlobChunk{
		BlobID:    p.currentBlob.id,
		ChunkHash: chunk.Hash,
		Offset:    offset,
		Length:    chunkSize,
	}
	err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
		return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
	})
	// COMMIT immediately
}
```

**Transaction:**
```
T_blob_chunk: BEGIN
              INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
              COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.
---

#### Step 2b: Blob Size Limit Handling

If adding a chunk would exceed the blob size limit:

```go
if err == blob.ErrBlobSizeLimitExceeded {
	if err := s.packer.FinalizeBlob(); err != nil { ... }
	// Retry adding the chunk
	if err := s.packer.AddChunk(...); err != nil { ... }
}
```

**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
               UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
               COMMIT
```

Then the blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)

T_blob_uploaded: BEGIN
                 UPDATE blobs SET uploaded_ts=? WHERE id=?
                 INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
                 INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
                 COMMIT
```
---

#### Step 2c: Queue File for Batch Insertion

After all chunks for a file are processed:

```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))

// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
	file:       fileToProcess.File,
	fileChunks: fileChunks,
	chunkFiles: chunkFiles,
})
```

**No transaction yet** - this just appends to the `pendingFiles` slice.

If `len(pendingFiles) >= fileBatchSize` (100), this triggers `flushPendingFiles()`.
---

### Step 2d: Flush Pending Files

**Location:** `flushPendingFiles()` - called when the batch is full or at the end of processing

```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	for _, data := range files {
		// 1. Create file record
		s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE

		// 2. Delete old associations
		s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
		s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)

		// 3. Create file_chunks records
		for _, fc := range data.fileChunks {
			s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
		}

		// 4. Create chunk_files records
		for _, cf := range data.chunkFiles {
			s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
		}

		// 5. Add file to snapshot
		s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
	}
	return nil
})
// COMMIT (all or nothing for the batch)
```

**Transaction:**
```
T_files_batch: BEGIN
               -- For each file in batch:
               INSERT OR REPLACE INTO files (...) VALUES (...)
               DELETE FROM file_chunks WHERE file_id = ?
               DELETE FROM chunk_files WHERE file_id = ?
               INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?)   -- FK: chunks
               INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
               INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
               -- Repeat for each file
               COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.
---

### Phase 2 End: Final Flush

```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }

// Final packer flush
s.packer.Flush()
```
---

## The Current Bug

### Problem

The current code batches file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records; if the referenced chunks have not been committed yet, the FK constraint fails.

### Why It's Happening

Looking at the sequence:

1. Process file A, chunk X
2. Create chunk X in DB (transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)

The chunks ARE being created individually, so something else is going wrong.
### Actual Issue

Re-reading the code, the issue is:

In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
	FileID:    fileToProcess.File.ID,
	Idx:       ci.fileChunk.Idx,
	ChunkHash: ci.fileChunk.ChunkHash,
}
```

The `FileID` is set, but `fileToProcess.File.ID` may be empty at this point because the file record has not been created yet.

Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
	return file, true // file.ID is empty string!
}

// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```

**For NEW files, `file.ID` is empty!**

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```

But `data.fileChunks` was built with the EMPTY ID!
### The Real Problem

For new files:
1. `checkFileInMemory` creates a file record with an empty ID
2. `processFileStreaming` queues file_chunks with an empty `FileID`
3. `flushPendingFiles` creates the file (generating an ID), but the queued file_chunks still have the empty `FileID`

Wait, but `Files.Create` should be INSERT OR REPLACE by path, and the file struct should get updated... Let me check.

Actually, looking more carefully at the code path: the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID. The ID isn't updated after the file is created.

Hmm, but looking at the current code:
```go
fileChunks[i] = database.FileChunk{
	FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```

And in `checkFileInMemory` for new files, we create a file struct but don't set the ID. However, looking at the database repository, `Files.Create` should be doing `INSERT OR REPLACE` and the ID should be pre-generated...

Let me check whether IDs are being generated. Looking at the File struct usage, it seems like UUIDs should be generated somewhere...

Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```

Error 787 is SQLite's extended foreign key constraint error code. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.

So the chunks are NOT in the database when we try to insert file_chunks. Let me trace through more carefully...
---

## Transaction Timing Issue

The problem could be transaction visibility in SQLite.

Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:

1. Chunk transactions commit one at a time
2. The file batch transaction runs later

If chunks are being inserted but something goes wrong with transaction isolation, the file batch might not see them.

However, SQLite in WAL mode has SERIALIZABLE isolation by default, so committed transactions should be visible.

Let me check whether the in-memory cache is masking a database problem...

Actually, wait. Let me re-check the current broken code more carefully. The issue might be simpler.

---
## Current Code Flow Analysis

Looking at `processFileStreaming` in the current broken state:

```go
// For each chunk:
if !chunkExists {
	err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
		dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
		return s.repos.Chunks.Create(txCtx, tx, dbChunk)
	})
	// ... check error ...
	s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```

Then at the end of the function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
	file:       fileToProcess.File,
	fileChunks: fileChunks,
	chunkFiles: chunkFiles,
})
```

At the end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```

The chunks are created one-by-one in individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.

Unless... there's a bug in how the chunks are being referenced. Let me check whether the chunk_hash values are correct.

Or maybe the test database is being recreated between operations somehow?

Actually, let me check the test setup. Maybe the issue is specific to the test environment.

---
## Summary of Object Lifecycle

| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |

---
## Root Cause Analysis

After detailed analysis, the issue is likely one of the following:

### Hypothesis 1: File ID Not Set

Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
	return file, true // file.ID is empty string!
}
```

For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
	FileID: fileToProcess.File.ID, // Empty for new files!
	...
}
```

The `FileID` in the built `fileChunks` slice is empty.

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
	s.repos.FileChunks.Create(...) // Uses empty FileID
}
```

**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
	ID:   uuid.New().String(), // Generate ID immediately
	Path: path,
	...
}
```

### Hypothesis 2: Transaction Isolation

SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions, so committed data should be visible to subsequent transactions.

However, there might be a subtle issue with how `context.Background()` is used in the packer vs. the scanner's context.
## Recommended Fix

**Step 1: Generate file IDs upfront**

In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
	ID:   uuid.New().String(), // Always generate ID
	Path: path,
	...
}
```

This ensures `file.ID` is set when the `fileChunks` and `chunkFiles` slices are built.

**Step 2: Verify by reverting to per-file transactions**

If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:

```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})

// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
	// Create file
	s.repos.Files.Create(txCtx, tx, fileToProcess.File)
	// Delete old associations
	s.repos.FileChunks.DeleteByFileID(...)
	s.repos.ChunkFiles.DeleteByFileID(...)
	// Create new associations
	for _, fc := range fileChunks {
		s.repos.FileChunks.Create(...)
	}
	for _, cf := range chunkFiles {
		s.repos.ChunkFiles.Create(...)
	}
	// Add to snapshot
	s.repos.Snapshots.AddFileByID(...)
	return nil
})
```

**Step 3: If batching is still desired**

After confirming that per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.
462
README.md
@@ -1,11 +1,64 @@
|
|||||||
# vaultik
|
# vaultik (ваултик)
|
||||||
|
|
||||||
`vaultik` is a incremental backup daemon written in Go. It
|
WIP: pre-1.0, some functions may not be fully implemented yet
|
||||||
encrypts data using an `age` public key and uploads each encrypted blob
|
|
||||||
directly to a remote S3-compatible object store. It requires no private
|
|
||||||
keys, secrets, or credentials stored on the backed-up system.
|
|
||||||
|
|
||||||
---
|
`vaultik` is an incremental backup daemon written in Go. It encrypts data
|
||||||
|
using an `age` public key and uploads each encrypted blob directly to a
|
||||||
|
remote S3-compatible object store. It requires no private keys, secrets, or
|
||||||
|
credentials (other than those required to PUT to encrypted object storage,
|
||||||
|
such as S3 API keys) stored on the backed-up system.
|
||||||
|
|
||||||
|
It includes table-stakes features such as:
|
||||||
|
|
||||||
|
* modern encryption (the excellent `age`)
|
||||||
|
* deduplication
|
||||||
|
* incremental backups
|
||||||
|
* modern multithreaded zstd compression with configurable levels
|
||||||
|
* content-addressed immutable storage
|
||||||
|
* local state tracking in standard SQLite database, enables write-only
|
||||||
|
incremental backups to destination
|
||||||
|
* no mutable remote metadata
|
||||||
|
* no plaintext file paths or metadata stored in remote
|
||||||
|
* does not create huge numbers of small files (to keep S3 operation counts
|
||||||
|
down) even if the source system has many small files
|
||||||
|
|
||||||
|
## why
|
||||||
|
|
||||||
|
Existing backup software fails under one or more of these conditions:
|
||||||
|
|
||||||
|
* Requires secrets (passwords, private keys) on the source system, which
|
||||||
|
compromises encrypted backups in the case of host system compromise
|
||||||
|
* Depends on symmetric encryption unsuitable for zero-trust environments
|
||||||
|
* Creates one-blob-per-file, which results in excessive S3 operation counts
|
||||||
|
* is slow
|
||||||
|
|
||||||
|
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
|
||||||
|
environments where the source host can store secrets and has access to
|
||||||
|
decryption keys. I don't want to store backup decryption keys on my hosts,
|
||||||
|
only public keys for encryption.
|
||||||
|
|
||||||
|
My requirements are:
|
||||||
|
|
||||||
|
* open source
|
||||||
|
* no passphrases or private keys on the source host
|
||||||
|
* incremental
|
||||||
|
* compressed
|
||||||
|
* encrypted
|
||||||
|
* s3 compatible without an intermediate step or tool
|
||||||
|
|
||||||
|
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
|
||||||
|
|
||||||
|
## design goals
|
||||||
|
|
||||||
|
1. Backups must require only a public key on the source host.
|
||||||
|
1. No secrets or private keys may exist on the source system.
|
||||||
|
1. Restore must be possible using **only** the backup bucket and a private key.
|
||||||
|
1. Prune must be possible (requires private key, done on different hosts).
|
||||||
|
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
|
||||||
|
1. Compression uses `zstd` at a configurable level.
|
||||||
|
1. Files are chunked, and multiple chunks are packed into encrypted blobs
|
||||||
|
to reduce object count for filesystems with many small files.
|
||||||
|
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
|
||||||
|
|
||||||
 ## what

@@ -13,29 +66,12 @@ keys, secrets, or credentials stored on the backed-up system.
 content-addressable chunk map of changed files using deterministic chunking.
 Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
 encrypted with `age`, and uploaded directly to remote storage under a
-content-addressed S3 path.
+content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
+database of metadata is created, encrypted, and uploaded alongside the
+blobs.

-No plaintext file contents ever hit disk. No private key is needed or stored
-locally. All encrypted data is streaming-processed and immediately discarded
-once uploaded. Metadata is encrypted and pushed with the same mechanism.
+No plaintext file contents ever hit disk. No private key or secret
+passphrase is needed or stored locally.
-## why
-
-Existing backup software fails under one or more of these conditions:
-
-* Requires secrets (passwords, private keys) on the source system
-* Depends on symmetric encryption unsuitable for zero-trust environments
-* Stages temporary archives or repositories
-* Writes plaintext metadata or plaintext file paths
-
-`vaultik` addresses all of these by using:
-
-* Public-key-only encryption (via `age`) requires no secrets (other than
-  bucket access key) on the source system
-* Blob-level deduplication and batching
-* Local state cache for incremental detection
-* S3-native chunked upload interface
-* Self-contained encrypted snapshot metadata

 ## how
@@ -45,23 +81,38 @@ Existing backup software fails under one or more of these conditions:
 go install git.eeqj.de/sneak/vaultik@latest
 ```

-2. **generate keypair**
+1. **generate keypair**

 ```sh
 age-keygen -o agekey.txt
 grep 'public key:' agekey.txt
 ```

-3. **write config**
+1. **write config**

 ```yaml
-source_dirs:
-  - /etc
-  - /home/user/data
+# Named snapshots - each snapshot can contain multiple paths
+snapshots:
+  system:
+    paths:
+      - /etc
+      - /var/lib
+    exclude:
+      - '*.cache' # Snapshot-specific exclusions
+  home:
+    paths:
+      - /home/user/documents
+      - /home/user/photos
+
+# Global exclusions (apply to all snapshots)
 exclude:
   - '*.log'
   - '*.tmp'
-age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+  - '.git'
+  - 'node_modules'
+
+age_recipients:
+  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
 s3:
   endpoint: https://s3.example.com
   bucket: vaultik-data
@@ -69,28 +120,24 @@ Existing backup software fails under one or more of these conditions:
   access_key_id: ...
   secret_access_key: ...
   region: us-east-1
-backup_interval: 1h # only used in daemon mode, not for --cron mode
-full_scan_interval: 24h # normally we use inotify to mark dirty, but
-                        # every 24h we do a full stat() scan
-min_time_between_run: 15m # again, only for daemon mode
-index_path: /var/lib/vaultik/index.sqlite
+backup_interval: 1h
+full_scan_interval: 24h
+min_time_between_run: 15m
 chunk_size: 10MB
-blob_size_limit: 10GB
-index_prefix: index/
+blob_size_limit: 1GB
 ```

-4. **run**
+1. **run**

 ```sh
-vaultik backup /etc/vaultik.yaml
-```
-
-```sh
-vaultik backup /etc/vaultik.yaml --cron # silent unless error
-```
-
-```sh
-vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
+# Create all configured snapshots
+vaultik --config /etc/vaultik.yaml snapshot create
+
+# Create specific snapshots by name
+vaultik --config /etc/vaultik.yaml snapshot create home system
+
+# Silent mode for cron
+vaultik --config /etc/vaultik.yaml snapshot create --cron
 ```

 ---
@@ -100,54 +147,211 @@ Existing backup software fails under one or more of these conditions:
 ### commands

 ```sh
-vaultik backup [--config <path>] [--cron] [--daemon]
-vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
-vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
-vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
-vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
+vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
+vaultik [--config <path>] snapshot list [--json]
+vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
+vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
+vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
+vaultik [--config <path>] snapshot prune
+vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
+vaultik [--config <path>] prune [--dry-run] [--force]
+vaultik [--config <path>] info
+vaultik [--config <path>] store info
 ```

 ### environment

-* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
-* `VAULTIK_CONFIG`: Optional path to config file. If set, `vaultik backup` can be run without specifying the config file path.
+* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
+* `VAULTIK_CONFIG`: Optional path to config file.
 ### command details

-**backup**: Perform incremental backup of configured directories
+**snapshot create**: Perform incremental backup of configured snapshots
 * Config is located at `/etc/vaultik/config.yml` by default
-* `--config`: Override config file path
+* Optional snapshot names argument to create specific snapshots (default: all)
 * `--cron`: Silent unless error (for crontab)
 * `--daemon`: Run continuously with inotify monitoring and periodic scans
+* `--prune`: Delete old snapshots and orphaned blobs after backup

-**restore**: Restore entire snapshot to target directory
-* Downloads and decrypts metadata
-* Fetches only required blobs
-* Reconstructs directory structure
+**snapshot list**: List all snapshots with their timestamps and sizes
+* `--json`: Output in JSON format

-**prune**: Remove unreferenced blobs from storage
-* Requires private key
-* Downloads latest snapshot metadata
+**snapshot verify**: Verify snapshot integrity
+* `--deep`: Download and verify blob contents (not just existence)

+**snapshot purge**: Remove old snapshots based on criteria
+* `--keep-latest`: Keep only the most recent snapshot
+* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
+* `--force`: Skip confirmation prompt
+
+**snapshot remove**: Remove a specific snapshot
+* `--dry-run`: Show what would be deleted without deleting
+* `--force`: Skip confirmation prompt
+
+**snapshot prune**: Clean orphaned data from local database
+
+**restore**: Restore snapshot to target directory
+* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
+* Optional path arguments to restore specific files/directories (default: all)
+* Downloads and decrypts metadata, fetches required blobs, reconstructs files
+* Preserves file permissions, timestamps, and ownership (ownership requires root)
+* Handles symlinks and directories
+
+**prune**: Remove unreferenced blobs from remote storage
+* Scans all snapshots for referenced blobs
 * Deletes orphaned blobs

-**fetch**: Extract single file from backup
-* Retrieves specific file without full restore
-* Supports extracting to different filename
+**info**: Display system and configuration information

-**verify**: Validate backup integrity
-* Checks metadata hash
-* Verifies all referenced blobs exist
-* Default: Downloads blobs and validates chunk integrity
-* `--quick`: Only checks blob existence and S3 content hashes
+**store info**: Display S3 bucket configuration and storage statistics

 ---

 ## architecture

+### s3 bucket layout
+
+```
+s3://<bucket>/<prefix>/
+├── blobs/
+│   └── <aa>/<bb>/<full_blob_hash>
+└── metadata/
+    ├── <snapshot_id>/
+    │   ├── db.zst.age
+    │   └── manifest.json.zst
+```
+
+* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
+* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
+* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning

+### blob manifest format
+
+The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
+
+```json
+{
+  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
+  "blob_hashes": [
+    "aa1234567890abcdef...",
+    "bb2345678901bcdef0..."
+  ]
+}
+```
+
+Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).

+### local sqlite schema
+
+```sql
+CREATE TABLE files (
+    id TEXT PRIMARY KEY,
+    path TEXT NOT NULL UNIQUE,
+    mtime INTEGER NOT NULL,
+    size INTEGER NOT NULL,
+    mode INTEGER NOT NULL,
+    uid INTEGER NOT NULL,
+    gid INTEGER NOT NULL
+);
+
+CREATE TABLE file_chunks (
+    file_id TEXT NOT NULL,
+    idx INTEGER NOT NULL,
+    chunk_hash TEXT NOT NULL,
+    PRIMARY KEY (file_id, idx),
+    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
+);
+
+CREATE TABLE chunks (
+    chunk_hash TEXT PRIMARY KEY,
+    size INTEGER NOT NULL
+);
+
+CREATE TABLE blobs (
+    id TEXT PRIMARY KEY,
+    blob_hash TEXT NOT NULL UNIQUE,
+    uncompressed INTEGER NOT NULL,
+    compressed INTEGER NOT NULL,
+    uploaded_at INTEGER
+);
+
+CREATE TABLE blob_chunks (
+    blob_hash TEXT NOT NULL,
+    chunk_hash TEXT NOT NULL,
+    offset INTEGER NOT NULL,
+    length INTEGER NOT NULL,
+    PRIMARY KEY (blob_hash, chunk_hash)
+);
+
+CREATE TABLE chunk_files (
+    chunk_hash TEXT NOT NULL,
+    file_id TEXT NOT NULL,
+    file_offset INTEGER NOT NULL,
+    length INTEGER NOT NULL,
+    PRIMARY KEY (chunk_hash, file_id)
+);
+
+CREATE TABLE snapshots (
+    id TEXT PRIMARY KEY,
+    hostname TEXT NOT NULL,
+    vaultik_version TEXT NOT NULL,
+    started_at INTEGER NOT NULL,
+    completed_at INTEGER,
+    file_count INTEGER NOT NULL,
+    chunk_count INTEGER NOT NULL,
+    blob_count INTEGER NOT NULL,
+    total_size INTEGER NOT NULL,
+    blob_size INTEGER NOT NULL,
+    compression_ratio REAL NOT NULL
+);
+
+CREATE TABLE snapshot_files (
+    snapshot_id TEXT NOT NULL,
+    file_id TEXT NOT NULL,
+    PRIMARY KEY (snapshot_id, file_id)
+);
+
+CREATE TABLE snapshot_blobs (
+    snapshot_id TEXT NOT NULL,
+    blob_id TEXT NOT NULL,
+    blob_hash TEXT NOT NULL,
+    PRIMARY KEY (snapshot_id, blob_id)
+);
+```

+### data flow
+
+#### backup
+
+1. Load config, open local SQLite index
+1. Walk source directories, check mtime/size against index
+1. For changed/new files: chunk using content-defined chunking
+1. For each chunk: hash, check if already uploaded, add to blob packer
+1. When blob reaches threshold: compress, encrypt, upload to S3
+1. Build snapshot metadata, compress, encrypt, upload
+1. Create blob manifest (unencrypted) for pruning support
+
+#### restore
+
+1. Download `metadata/<snapshot_id>/db.zst.age`
+1. Decrypt and decompress SQLite database
+1. Query files table (optionally filtered by paths)
+1. For each file, get ordered chunk list from file_chunks
+1. Download required blobs, decrypt, decompress
+1. Extract chunks and reconstruct files
+1. Restore permissions, mtime, uid/gid
+
+#### prune
+
+1. List all snapshot manifests
+1. Build set of all referenced blob hashes
+1. List all blobs in storage
+1. Delete any blob not in referenced set

 ### chunking

-* Content-defined chunking using rolling hash (Rabin fingerprint)
-* Average chunk size: 10MB (configurable)
+* Content-defined chunking using FastCDC algorithm
+* Average chunk size: configurable (default 10MB)
 * Deduplication at chunk level
 * Multiple chunks packed into blobs for efficiency

@@ -158,19 +362,13 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
 * Each blob encrypted independently
 * Metadata databases also encrypted

-### storage
-
-* Content-addressed blob storage
-* Immutable append-only design
-* Two-level directory sharding for blobs (aa/bb/hash)
-* Compressed with zstd before encryption
-
-### state tracking
-
-* Local SQLite database for incremental state
-* Tracks file mtimes and chunk mappings
-* Enables efficient change detection
-* Supports inotify monitoring in daemon mode
+### compression
+
+* zstd compression at configurable level
+* Applied before encryption
+* Blob-level compression for efficiency
+
+---
 ## does not

@@ -180,8 +378,6 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
 * Require a symmetric passphrase or password
 * Trust the source system with anything

----
-
 ## does

 * Incremental deduplicated backup
@@ -193,90 +389,22 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]

 ---
-## restore
-
-`vaultik restore` downloads only the snapshot metadata and required blobs. It
-never contacts the source system. All restore operations depend only on:
-
-* `VAULTIK_PRIVATE_KEY`
-* The bucket
-
-The entire system is restore-only from object storage.
-
----
-
-## features
-
-### daemon mode
-
-* Continuous background operation
-* inotify-based change detection
-* Respects `backup_interval` and `min_time_between_run`
-* Full scan every `full_scan_interval` (default 24h)
-
-### cron mode
-
-* Single backup run
-* Silent output unless errors
-* Ideal for scheduled backups
-
-### metadata integrity
-
-* SHA256 hash of metadata stored separately
-* Encrypted hash file for verification
-* Chunked metadata support for large filesystems
-
-### exclusion patterns
-
-* Glob-based file exclusion
-* Configured in YAML
-* Applied during directory walk
-
-## prune
-
-Run `vaultik prune` on a machine with the private key. It:
-
-* Downloads the most recent snapshot
-* Decrypts metadata
-* Lists referenced blobs
-* Deletes any blob in the bucket not referenced
-
-This enables garbage collection from immutable storage.
-
----
+## requirements
+
+* Go 1.24 or later
+* S3-compatible object storage
+* Sufficient disk space for local index (typically <1GB)

 ## license

-WTFPL — see LICENSE.
-
----
-
-## security considerations
-
-* Source host compromise cannot decrypt backups
-* No replay attacks possible (append-only)
-* Each blob independently encrypted
-* Metadata tampering detectable via hash verification
-* S3 credentials only allow write access to backup prefix
-
-## performance
-
-* Streaming processing (no temp files)
-* Parallel blob uploads
-* Deduplication reduces storage and bandwidth
-* Local index enables fast incremental detection
-* Configurable compression levels
-
-## requirements
-
-* Go 1.24.4 or later
-* S3-compatible object storage
-* age command-line tool (for key generation)
-* SQLite3
-* Sufficient disk space for local index
+[MIT](https://opensource.org/license/mit/)

 ## author

-sneak
-[sneak@sneak.berlin](mailto:sneak@sneak.berlin)
-[https://sneak.berlin](https://sneak.berlin)
+Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.
+
+Released as a free software gift to the world, no strings attached.
+
+Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)
+
+[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)

212 TODO.md
@@ -1,112 +1,128 @@
-# Implementation TODO
+# Vaultik 1.0 TODO

-## Local Index Database
-
-1. Implement SQLite schema creation
-1. Create Index type with all database operations
-1. Add transaction support and proper locking
-1. Implement file tracking (save, lookup, delete)
-1. Implement chunk tracking and deduplication
-1. Implement blob tracking and chunk-to-blob mapping
-1. Write tests for all index operations
-
-## Chunking and Hashing
-
-1. Implement Rabin fingerprint chunker
-1. Create streaming chunk processor
-1. Implement SHA256 hashing for chunks
-1. Add configurable chunk size parameters
-1. Write tests for chunking consistency
-
-## Compression and Encryption
-
-1. Implement zstd compression wrapper
-1. Integrate age encryption library
-1. Create Encryptor type for public key encryption
-1. Create Decryptor type for private key decryption
-1. Implement streaming encrypt/decrypt pipelines
-1. Write tests for compression and encryption
-
-## Blob Packing
-
-1. Implement BlobWriter with size limits
-1. Add chunk accumulation and flushing
-1. Create blob hash calculation
-1. Implement proper error handling and rollback
-1. Write tests for blob packing scenarios
-
-## S3 Operations
-
-1. Integrate MinIO client library
-1. Implement S3Client wrapper type
-1. Add multipart upload support for large blobs
-1. Implement retry logic with exponential backoff
-1. Add connection pooling and timeout handling
-1. Write tests using MinIO container
-
-## Backup Command - Basic
-
-1. Implement directory walking with exclusion patterns
-1. Add file change detection using index
-1. Integrate chunking pipeline for changed files
-1. Implement blob upload coordination
-1. Add progress reporting to stderr
-1. Write integration tests for backup
-
-## Snapshot Metadata
-
-1. Implement snapshot metadata extraction from index
-1. Create SQLite snapshot database builder
-1. Add metadata compression and encryption
-1. Implement metadata chunking for large snapshots
-1. Add hash calculation and verification
-1. Implement metadata upload to S3
-1. Write tests for metadata operations
-
-## Restore Command
-
-1. Implement snapshot listing and selection
-1. Add metadata download and reconstruction
-1. Implement hash verification for metadata
-1. Create file restoration logic with chunk retrieval
-1. Add blob caching for efficiency
-1. Implement proper file permissions and mtime restoration
-1. Write integration tests for restore
-
-## Prune Command
-
-1. Implement latest snapshot detection
-1. Add referenced blob extraction from metadata
-1. Create S3 blob listing and comparison
-1. Implement safe deletion of unreferenced blobs
-1. Add dry-run mode for safety
-1. Write tests for prune scenarios
-
-## Verify Command
-
-1. Implement metadata integrity checking
-1. Add blob existence verification
-1. Implement quick mode (S3 hash checking)
-1. Implement deep mode (download and verify chunks)
-1. Add detailed error reporting
-1. Write tests for verification
-
-## Fetch Command
-
-1. Implement single-file metadata query
-1. Add minimal blob downloading for file
-1. Create streaming file reconstruction
-1. Add support for output redirection
-1. Write tests for fetch command
-
-## Daemon Mode
-
-1. Implement inotify watcher for Linux
-1. Add dirty path tracking in index
-1. Create periodic full scan scheduler
-1. Implement backup interval enforcement
-1. Add proper signal handling and shutdown
-1. Write tests for daemon behavior
-
-## Cron Mode
-
-1. Implement silent operation mode
-1. Add proper exit codes for cron
-1. Implement lock file to prevent concurrent runs
-1. Add error summary reporting
-1. Write tests for cron mode
-
-## Finalization
-
-1. Add comprehensive logging throughout
-1. Implement proper error wrapping and context
-1. Add performance metrics collection
-1. Create end-to-end integration tests
-1. Write documentation and examples
-1. Set up CI/CD pipeline
+Linear list of tasks to complete before 1.0 release.
+
+## Rclone Storage Backend (Complete)
+
+Add rclone as a storage backend via Go library import, allowing vaultik to use any of rclone's 70+ supported cloud storage providers.
+
+**Configuration:**
+```yaml
+storage_url: "rclone://myremote/path/to/backups"
+```
+User must have rclone configured separately (via `rclone config`).
+
+**Implementation Steps:**
+1. [x] Add rclone dependency to go.mod
+2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
+   - `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
+   - `Put` / `PutWithProgress` - use `operations.Rcat()`
+   - `Get` - use `fs.NewObject()` then `obj.Open()`
+   - `Stat` - use `fs.NewObject()` for size/metadata
+   - `Delete` - use `obj.Remove()`
+   - `List` / `ListStream` - use `operations.ListFn()`
+   - `Info` - return remote name
+3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
+4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
+5. [x] Test with real rclone remote
+
+**Error Mapping:**
+- `fs.ErrorObjectNotFound` → `ErrNotFound`
+- `fs.ErrorDirNotFound` → `ErrNotFound`
+- `fs.ErrorNotFoundInConfigFile` → `ErrRemoteNotFound` (new)
+
+---
+
+## CLI Polish (Priority)
+
+1. Improve error messages throughout
+   - Ensure all errors include actionable context
+   - Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")
+
+## Security (Priority)
+
+1. Audit encryption implementation
+   - Verify age encryption is used correctly
+   - Ensure no plaintext leaks in logs or errors
+   - Verify blob hashes are computed correctly
+
+1. Secure memory handling for secrets
+   - Clear S3 credentials from memory after client init
+   - Document that age_secret_key is env-var only (already implemented)
+
+## Testing
+
+1. Write integration tests for restore command
+
+1. Write end-to-end integration test
+   - Create backup
+   - Verify backup
+   - Restore backup
+   - Compare restored files to originals
+
+1. Add tests for edge cases
+   - Empty directories
+   - Symlinks
+   - Special characters in filenames
+   - Very large files (multi-GB)
+   - Many small files (100k+)
+
+1. Add tests for error conditions
+   - Network failures during upload
+   - Disk full during restore
+   - Corrupted blobs
+   - Missing blobs
+
+## Performance
+
+1. Profile and optimize restore performance
+   - Parallel blob downloads
+   - Streaming decompression/decryption
+   - Efficient chunk reassembly
+
+1. Add bandwidth limiting option
+   - `--bwlimit` flag for upload/download speed limiting
+
+## Documentation
+
+1. Add man page or --help improvements
+   - Detailed help for each command
+   - Examples in help output
+
+## Final Polish
+
+1. Ensure version is set correctly in releases
+
+1. Create release process
+   - Binary releases for supported platforms
+   - Checksums for binaries
+   - Release notes template
+
+1. Final code review
+   - Remove debug statements
+   - Ensure consistent code style
+
+1. Tag and release v1.0.0
+
+---
+
+## Post-1.0 (Daemon Mode)
+
+1. Implement inotify file watcher for Linux
+   - Watch source directories for changes
+   - Track dirty paths in memory
+
+1. Implement FSEvents watcher for macOS
+   - Watch source directories for changes
+   - Track dirty paths in memory
+
+1. Implement backup scheduler in daemon mode
+   - Respect backup_interval config
+   - Trigger backup when dirty paths exist and interval elapsed
+   - Implement full_scan_interval for periodic full scans
+
+1. Add proper signal handling for daemon
+   - Graceful shutdown on SIGTERM/SIGINT
+   - Complete in-progress backup before exit
+
+1. Write tests for daemon mode

@@ -1,9 +1,41 @@
 package main

 import (
+	"os"
+	"runtime"
+	"runtime/pprof"
+
 	"git.eeqj.de/sneak/vaultik/internal/cli"
 )

 func main() {
+	// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
+	if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
+		f, err := os.Create(cpuProfile)
+		if err != nil {
+			panic("could not create CPU profile: " + err.Error())
+		}
+		defer func() { _ = f.Close() }()
+		if err := pprof.StartCPUProfile(f); err != nil {
+			panic("could not start CPU profile: " + err.Error())
+		}
+		defer pprof.StopCPUProfile()
+	}
+
+	// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
+	if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
+		defer func() {
+			f, err := os.Create(memProfile)
+			if err != nil {
+				panic("could not create memory profile: " + err.Error())
+			}
+			defer func() { _ = f.Close() }()
+			runtime.GC() // get up-to-date statistics
+			if err := pprof.WriteHeapProfile(f); err != nil {
+				panic("could not write memory profile: " + err.Error())
+			}
+		}()
+	}
+
 	cli.CLIEntry()
 }
|||||||
332
config.example.yml
Normal file
332
config.example.yml
Normal file
@@ -0,0 +1,332 @@
|
|||||||
|
# vaultik configuration file example
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need

# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Generate with: age-keygen | grep "public key"
age_recipients:
  - age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz

# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
  testing:
    paths:
      - ~/dev/vaultik
  apps:
    paths:
      - /Applications
    exclude:
      - "/App Store.app"
      - "/Apps.app"
      - "/Automator.app"
      - "/Books.app"
      - "/Calculator.app"
      - "/Calendar.app"
      - "/Chess.app"
      - "/Clock.app"
      - "/Contacts.app"
      - "/Dictionary.app"
      - "/FaceTime.app"
      - "/FindMy.app"
      - "/Font Book.app"
      - "/Freeform.app"
      - "/Games.app"
      - "/GarageBand.app"
      - "/Home.app"
      - "/Image Capture.app"
      - "/Image Playground.app"
      - "/Journal.app"
      - "/Keynote.app"
      - "/Mail.app"
      - "/Maps.app"
      - "/Messages.app"
      - "/Mission Control.app"
      - "/Music.app"
      - "/News.app"
      - "/Notes.app"
      - "/Numbers.app"
      - "/Pages.app"
      - "/Passwords.app"
      - "/Phone.app"
      - "/Photo Booth.app"
      - "/Photos.app"
      - "/Podcasts.app"
      - "/Preview.app"
      - "/QuickTime Player.app"
      - "/Reminders.app"
      - "/Safari.app"
      - "/Shortcuts.app"
      - "/Siri.app"
      - "/Stickies.app"
      - "/Stocks.app"
      - "/System Settings.app"
      - "/TV.app"
      - "/TextEdit.app"
      - "/Time Machine.app"
      - "/Tips.app"
      - "/Utilities/Activity Monitor.app"
      - "/Utilities/AirPort Utility.app"
      - "/Utilities/Audio MIDI Setup.app"
      - "/Utilities/Bluetooth File Exchange.app"
      - "/Utilities/Boot Camp Assistant.app"
      - "/Utilities/ColorSync Utility.app"
      - "/Utilities/Console.app"
      - "/Utilities/Digital Color Meter.app"
      - "/Utilities/Disk Utility.app"
      - "/Utilities/Grapher.app"
      - "/Utilities/Magnifier.app"
      - "/Utilities/Migration Assistant.app"
      - "/Utilities/Print Center.app"
      - "/Utilities/Screen Sharing.app"
      - "/Utilities/Screenshot.app"
      - "/Utilities/Script Editor.app"
      - "/Utilities/System Information.app"
      - "/Utilities/Terminal.app"
      - "/Utilities/VoiceOver Utility.app"
      - "/VoiceMemos.app"
      - "/Weather.app"
      - "/iMovie.app"
      - "/iPhone Mirroring.app"
  home:
    paths:
      - "~"
    exclude:
      - "/.Trash"
      - "/tmp"
      - "/Library/Caches"
      - "/Library/Accounts"
      - "/Library/AppleMediaServices"
      - "/Library/Application Support/AddressBook"
      - "/Library/Application Support/CallHistoryDB"
      - "/Library/Application Support/CallHistoryTransactions"
      - "/Library/Application Support/DifferentialPrivacy"
      - "/Library/Application Support/FaceTime"
      - "/Library/Application Support/FileProvider"
      - "/Library/Application Support/Knowledge"
      - "/Library/Application Support/com.apple.TCC"
      - "/Library/Application Support/com.apple.avfoundation/Frecents"
      - "/Library/Application Support/com.apple.sharedfilelist"
      - "/Library/Assistant/SiriVocabulary"
      - "/Library/Autosave Information"
      - "/Library/Biome"
      - "/Library/ContainerManager"
      - "/Library/Containers/com.apple.Home"
      - "/Library/Containers/com.apple.Maps/Data/Maps"
      - "/Library/Containers/com.apple.MobileSMS"
      - "/Library/Containers/com.apple.Notes"
      - "/Library/Containers/com.apple.Safari"
      - "/Library/Containers/com.apple.Safari.WebApp"
      - "/Library/Containers/com.apple.VoiceMemos"
      - "/Library/Containers/com.apple.archiveutility"
      - "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
      - "/Library/Containers/com.apple.mail"
      - "/Library/Containers/com.apple.news"
      - "/Library/Containers/com.apple.stocks"
      - "/Library/Cookies"
      - "/Library/CoreFollowUp"
      - "/Library/Daemon Containers"
      - "/Library/DoNotDisturb"
      - "/Library/DuetExpertCenter"
      - "/Library/Group Containers/com.apple.Home.group"
      - "/Library/Group Containers/com.apple.MailPersonaStorage"
      - "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
      - "/Library/Group Containers/com.apple.bird"
      - "/Library/Group Containers/com.apple.stickersd.group"
      - "/Library/Group Containers/com.apple.systempreferences.cache"
      - "/Library/Group Containers/group.com.apple.AppleSpell"
      - "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
      - "/Library/Group Containers/group.com.apple.DeviceActivity"
      - "/Library/Group Containers/group.com.apple.Journal"
      - "/Library/Group Containers/group.com.apple.ManagedSettings"
      - "/Library/Group Containers/group.com.apple.PegasusConfiguration"
      - "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
      - "/Library/Group Containers/group.com.apple.SiriTTS"
      - "/Library/Group Containers/group.com.apple.UserNotifications"
      - "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
      - "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
      - "/Library/Group Containers/group.com.apple.amsondevicestoraged"
      - "/Library/Group Containers/group.com.apple.appstoreagent"
      - "/Library/Group Containers/group.com.apple.calendar"
      - "/Library/Group Containers/group.com.apple.chronod"
      - "/Library/Group Containers/group.com.apple.contacts"
      - "/Library/Group Containers/group.com.apple.controlcenter"
      - "/Library/Group Containers/group.com.apple.corerepair"
      - "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
      - "/Library/Group Containers/group.com.apple.energykit"
      - "/Library/Group Containers/group.com.apple.feedback"
      - "/Library/Group Containers/group.com.apple.feedbacklogger"
      - "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
      - "/Library/Group Containers/group.com.apple.iCloudDrive"
      - "/Library/Group Containers/group.com.apple.icloud.fmfcore"
      - "/Library/Group Containers/group.com.apple.icloud.fmipcore"
      - "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
      - "/Library/Group Containers/group.com.apple.liveactivitiesd"
      - "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
      - "/Library/Group Containers/group.com.apple.mail"
      - "/Library/Group Containers/group.com.apple.mlhost"
      - "/Library/Group Containers/group.com.apple.moments"
      - "/Library/Group Containers/group.com.apple.news"
      - "/Library/Group Containers/group.com.apple.newsd"
      - "/Library/Group Containers/group.com.apple.notes"
      - "/Library/Group Containers/group.com.apple.notes.import"
      - "/Library/Group Containers/group.com.apple.photolibraryd.private"
      - "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
      - "/Library/Group Containers/group.com.apple.printtool"
      - "/Library/Group Containers/group.com.apple.private.translation"
      - "/Library/Group Containers/group.com.apple.reminders"
      - "/Library/Group Containers/group.com.apple.replicatord"
      - "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
      - "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
      - "/Library/Group Containers/group.com.apple.sharingd"
      - "/Library/Group Containers/group.com.apple.shortcuts"
      - "/Library/Group Containers/group.com.apple.siri.inference"
      - "/Library/Group Containers/group.com.apple.siri.referenceResolution"
      - "/Library/Group Containers/group.com.apple.siri.remembers"
      - "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
      - "/Library/Group Containers/group.com.apple.spotlight"
      - "/Library/Group Containers/group.com.apple.stocks"
      - "/Library/Group Containers/group.com.apple.stocks-news"
      - "/Library/Group Containers/group.com.apple.studentd"
      - "/Library/Group Containers/group.com.apple.swtransparency"
      - "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
      - "/Library/Group Containers/group.com.apple.tips"
      - "/Library/Group Containers/group.com.apple.tipsnext"
      - "/Library/Group Containers/group.com.apple.transparency"
      - "/Library/Group Containers/group.com.apple.usernoted"
      - "/Library/Group Containers/group.com.apple.weather"
      - "/Library/HomeKit"
      - "/Library/IdentityServices"
      - "/Library/IntelligencePlatform"
      - "/Library/Mail"
      - "/Library/Messages"
      - "/Library/Metadata/CoreSpotlight"
      - "/Library/Metadata/com.apple.IntelligentSuggestions"
      - "/Library/PersonalizationPortrait"
      - "/Library/Safari"
      - "/Library/Sharing"
      - "/Library/Shortcuts"
      - "/Library/StatusKit"
      - "/Library/Suggestions"
      - "/Library/Trial"
      - "/Library/Weather"
      - "/Library/com.apple.aiml.instrumentation"
      - "/Movies/TV"
  system:
    paths:
      - /
    exclude:
      # Virtual/transient filesystems
      - /proc
      - /sys
      - /dev
      - /run
      - /tmp
      - /var/tmp
      - /var/run
      - /var/lock
      - /var/cache
      - /media
      - /mnt
      # Swap
      - /swapfile
      - /swap.img
      # Package manager caches
      - /var/cache/apt
      - /var/cache/yum
      - /var/cache/dnf
      - /var/cache/pacman
      # Trash
      - "*/.local/share/Trash"
  dev:
    paths:
      - /Users/user/dev
    exclude:
      - "**/node_modules"
      - "**/target"
      - "**/build"
      - "**/__pycache__"
      - "**/*.pyc"
      - "**/.venv"
      - "**/vendor"

# Global patterns to exclude from all backups
exclude:
  - "*.tmp"
# Storage URL - use either this OR the s3 section below
# Supports: s3://bucket/prefix, file:///path, rclone://remote/path
storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"

# S3-compatible storage configuration
#s3:
#  # S3-compatible endpoint URL
#  # Examples: https://s3.amazonaws.com, https://storage.googleapis.com
#  endpoint: http://10.100.205.122:8333
#
#  # Bucket name where backups will be stored
#  bucket: testbucket
#
#  # Prefix (folder) within the bucket for this host's backups
#  # Useful for organizing backups from multiple hosts
#  # Default: empty (root of bucket)
#  #prefix: "hosts/myserver/"
#
#  # S3 access credentials
#  access_key_id: Z9GT22M9YFU08WRMC5D4
#  secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
#
#  # S3 region
#  # Default: us-east-1
#  #region: us-east-1
#
#  # Use SSL/TLS for S3 connections
#  # Default: true
#  #use_ssl: true
#
#  # Part size for multipart uploads
#  # Minimum 5MB, affects memory usage during upload
#  # Supports: 5MB, 10M, 100MiB, etc.
#  # Default: 5MB
#  #part_size: 5MB
# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc.
# Default: 1h
#backup_interval: 1h

# How often to do a full filesystem scan in daemon mode
# Between full scans, inotify is used to detect changes
# Default: 24h
#full_scan_interval: 24h

# Minimum time between backup runs in daemon mode
# Prevents backups from running too frequently
# Default: 15m
#min_time_between_run: 15m

# Path to local SQLite index database
# This database tracks file state for incremental backups
# Default: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite

# Average chunk size for content-defined chunking
# Smaller chunks = better deduplication but more metadata
# Supports: 10MB, 5M, 1GB, 500KB, 64MiB, etc.
# Default: 10MB
#chunk_size: 10MB

# Maximum blob size
# Multiple chunks are packed into blobs up to this size
# Supports: 1GB, 10G, 500MB, 1GiB, etc.
# Default: 10GB
#blob_size_limit: 10GB

# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5

# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver
268
docs/DATAMODEL.md
Normal file
@@ -0,0 +1,268 @@
# Vaultik Data Model

## Overview

Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.

**Important Notes:**

- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.

## Database Tables

### 1. `files`

Stores metadata about files in the filesystem being backed up.

**Columns:**

- `id` (TEXT PRIMARY KEY) - UUID for the file record
- `path` (TEXT NOT NULL UNIQUE) - Absolute file path
- `mtime` (INTEGER NOT NULL) - Modification time as Unix timestamp
- `ctime` (INTEGER NOT NULL) - Change time as Unix timestamp
- `size` (INTEGER NOT NULL) - File size in bytes
- `mode` (INTEGER NOT NULL) - Unix file permissions and type
- `uid` (INTEGER NOT NULL) - User ID of file owner
- `gid` (INTEGER NOT NULL) - Group ID of file owner
- `link_target` (TEXT) - Symlink target path (NULL for regular files)

**Indexes:**

- `idx_files_path` on `path` for efficient lookups

**Purpose:** Tracks file metadata to detect changes between backup runs. Used for incremental backup decisions. The UUID primary key provides stable references that don't change if files are moved.

### 2. `chunks`

Stores information about content-defined chunks created from files.

**Columns:**

- `chunk_hash` (TEXT PRIMARY KEY) - SHA256 hash of chunk content
- `size` (INTEGER NOT NULL) - Chunk size in bytes

**Purpose:** Enables deduplication by tracking unique chunks across all files.

### 3. `file_chunks`

Maps files to their constituent chunks in order.

**Columns:**

- `file_id` (TEXT) - File ID (FK to files.id)
- `idx` (INTEGER) - Chunk index within file (0-based)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- PRIMARY KEY (`file_id`, `idx`)

**Purpose:** Allows reconstruction of files from chunks during restore.
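As a rough illustration of how `file_chunks` ordering drives restore, here is a minimal in-memory Go sketch. The names `chunkStore` and `reassemble` are hypothetical; in Vaultik the chunk bytes would come from decrypted, decompressed blobs rather than a map.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkStore stands in for chunk retrieval; a real restore would fetch
// these bytes out of decrypted blobs via blob_chunks offsets.
type chunkStore map[string][]byte

// reassemble concatenates a file's chunks in file_chunks order (idx 0..n-1).
func reassemble(store chunkStore, orderedHashes []string) ([]byte, error) {
	var buf bytes.Buffer
	for i, h := range orderedHashes {
		data, ok := store[h]
		if !ok {
			return nil, fmt.Errorf("chunk %d (%s) missing", i, h)
		}
		buf.Write(data)
	}
	return buf.Bytes(), nil
}

func main() {
	store := chunkStore{"h1": []byte("hello "), "h2": []byte("world")}
	out, err := reassemble(store, []string{"h1", "h2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // hello world
}
```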
### 4. `chunk_files`

Reverse mapping showing which files contain each chunk.

**Columns:**

- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `file_id` (TEXT) - File ID (FK to files.id)
- `file_offset` (INTEGER) - Byte offset of chunk within file
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`chunk_hash`, `file_id`)

**Purpose:** Supports efficient queries for chunk usage and deduplication statistics.

### 5. `blobs`

Stores information about packed, compressed, and encrypted blob files.

**Columns:**

- `id` (TEXT PRIMARY KEY) - UUID assigned when blob creation starts
- `blob_hash` (TEXT UNIQUE) - SHA256 hash of final blob (NULL until finalized)
- `created_ts` (INTEGER NOT NULL) - Creation timestamp
- `finished_ts` (INTEGER) - Finalization timestamp (NULL if in progress)
- `uncompressed_size` (INTEGER NOT NULL DEFAULT 0) - Total size of chunks before compression
- `compressed_size` (INTEGER NOT NULL DEFAULT 0) - Size after compression and encryption
- `uploaded_ts` (INTEGER) - Upload completion timestamp (NULL if not uploaded)

**Purpose:** Tracks blob lifecycle from creation through upload. The UUID primary key allows immediate association of chunks with blobs.

### 6. `blob_chunks`

Maps chunks to the blobs that contain them.

**Columns:**

- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `offset` (INTEGER) - Byte offset of chunk within blob (before compression)
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`blob_id`, `chunk_hash`)

**Purpose:** Enables chunk retrieval from blobs during restore operations.

### 7. `snapshots`

Tracks backup snapshots.

**Columns:**

- `id` (TEXT PRIMARY KEY) - Snapshot ID (format: hostname-YYYYMMDD-HHMMSSZ)
- `hostname` (TEXT) - Hostname where backup was created
- `vaultik_version` (TEXT) - Version of Vaultik used
- `vaultik_git_revision` (TEXT) - Git revision of Vaultik used
- `started_at` (INTEGER) - Start timestamp
- `completed_at` (INTEGER) - Completion timestamp (NULL if in progress)
- `file_count` (INTEGER) - Number of files in snapshot
- `chunk_count` (INTEGER) - Number of unique chunks
- `blob_count` (INTEGER) - Number of blobs referenced
- `total_size` (INTEGER) - Total size of all files
- `blob_size` (INTEGER) - Total size of all blobs (compressed)
- `blob_uncompressed_size` (INTEGER) - Total uncompressed size of all referenced blobs
- `compression_ratio` (REAL) - Compression ratio achieved
- `compression_level` (INTEGER) - Compression level used for this snapshot
- `upload_bytes` (INTEGER) - Total bytes uploaded during this snapshot
- `upload_duration_ms` (INTEGER) - Total milliseconds spent uploading to S3

**Purpose:** Provides snapshot metadata and statistics including version tracking for compatibility.

### 8. `snapshot_files`

Maps snapshots to the files they contain.

**Columns:**

- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `file_id` (TEXT) - File ID (FK to files.id)
- PRIMARY KEY (`snapshot_id`, `file_id`)

**Purpose:** Records which files are included in each snapshot.

### 9. `snapshot_blobs`

Maps snapshots to the blobs they reference.

**Columns:**

- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `blob_hash` (TEXT) - Denormalized blob hash for manifest generation
- PRIMARY KEY (`snapshot_id`, `blob_id`)

**Purpose:** Tracks blob dependencies for snapshots and enables manifest generation.

### 10. `uploads`

Tracks blob upload metrics.

**Columns:**

- `blob_hash` (TEXT PRIMARY KEY) - Hash of uploaded blob
- `snapshot_id` (TEXT NOT NULL) - The snapshot that triggered this upload (FK to snapshots.id)
- `uploaded_at` (INTEGER) - Upload timestamp
- `size` (INTEGER) - Size of uploaded blob
- `duration_ms` (INTEGER) - Upload duration in milliseconds

**Purpose:** Performance monitoring and tracking which blobs were newly created (uploaded) during each snapshot.

## Data Flow and Operations

### 1. Backup Process

1. **File Scanning**
   - `INSERT OR REPLACE INTO files` - Update file metadata
   - `SELECT * FROM files WHERE path = ?` - Check if file has changed
   - `INSERT INTO snapshot_files` - Add file to current snapshot

2. **Chunking** (for changed files)
   - `INSERT OR IGNORE INTO chunks` - Store new chunks
   - `INSERT INTO file_chunks` - Map chunks to file
   - `INSERT INTO chunk_files` - Create reverse mapping

3. **Blob Packing**
   - `INSERT INTO blobs` - Create blob record with UUID (blob_hash NULL)
   - `INSERT INTO blob_chunks` - Associate chunks with blob immediately
   - `UPDATE blobs SET blob_hash = ?, finished_ts = ?` - Finalize blob after packing

4. **Upload**
   - `UPDATE blobs SET uploaded_ts = ?` - Mark blob as uploaded
   - `INSERT INTO uploads` - Record upload metrics with snapshot_id
   - `INSERT INTO snapshot_blobs` - Associate blob with snapshot

5. **Snapshot Completion**
   - `UPDATE snapshots SET completed_at = ?, stats...` - Finalize snapshot
   - Generate and upload blob manifest from `snapshot_blobs`

### 2. Incremental Backup

1. **Change Detection**
   - `SELECT * FROM files WHERE path = ?` - Get previous file metadata
   - Compare mtime, size, and mode to detect changes
   - Skip unchanged files but still add to `snapshot_files`
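The comparison in the change-detection step can be sketched as follows. `FileMeta` and `changed` are hypothetical names for illustration only; Vaultik's actual repository types may differ, but the decision uses the same fields the document describes (mtime, size, mode).

```go
package main

import "fmt"

// FileMeta is a hypothetical stand-in for a row from the files table.
type FileMeta struct {
	Path  string
	Mtime int64 // Unix seconds
	Size  int64
	Mode  uint32
}

// changed reports whether a scanned file differs from its last-seen metadata.
// A file with no previous record is always treated as changed.
func changed(prev *FileMeta, cur FileMeta) bool {
	if prev == nil {
		return true // never seen before, must be chunked
	}
	return prev.Mtime != cur.Mtime || prev.Size != cur.Size || prev.Mode != cur.Mode
}

func main() {
	prev := &FileMeta{Path: "/etc/hosts", Mtime: 100, Size: 42, Mode: 0o644}
	cur := FileMeta{Path: "/etc/hosts", Mtime: 100, Size: 42, Mode: 0o644}
	fmt.Println(changed(prev, cur)) // false: chunking is skipped
	cur.Mtime = 200
	fmt.Println(changed(prev, cur)) // true: file is re-chunked
}
```

Unchanged files skip chunking entirely but are still recorded in `snapshot_files`, so every snapshot lists its full file set.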
2. **Chunk Reuse**
   - `SELECT * FROM blob_chunks WHERE chunk_hash = ?` - Find existing chunks
   - `INSERT INTO snapshot_blobs` - Reference existing blobs for unchanged files

### 3. Snapshot Metadata Export

After a snapshot is completed:

1. Copy database to temporary file
2. Clean temporary database to contain only current snapshot data
3. Export to SQL dump using sqlite3
4. Compress with zstd and encrypt with age
5. Upload to S3 as `metadata/{snapshot-id}/db.zst.age`
6. Generate blob manifest and upload as `metadata/{snapshot-id}/manifest.json.zst`

### 4. Restore Process

The restore process doesn't use the local database. Instead:

1. Downloads snapshot metadata from S3
2. Downloads required blobs based on manifest
3. Reconstructs files from decrypted and decompressed chunks

### 5. Pruning

1. **Identify Unreferenced Blobs**
   - Query blobs not referenced by any remaining snapshot
   - Delete from S3 and local database

### 6. Incomplete Snapshot Cleanup

Before each backup:

1. Query incomplete snapshots (where `completed_at IS NULL`)
2. Check if metadata exists in S3
3. If no metadata, delete snapshot and all associations
4. Clean up orphaned files, chunks, and blobs

## Repository Pattern

Vaultik uses a repository pattern for database access:

- `FileRepository` - CRUD operations for files and file metadata
- `ChunkRepository` - CRUD operations for content chunks
- `FileChunkRepository` - Manage file-to-chunk mappings
- `ChunkFileRepository` - Manage chunk-to-file reverse mappings
- `BlobRepository` - Manage blob lifecycle (creation, finalization, upload)
- `BlobChunkRepository` - Manage blob-to-chunk associations
- `SnapshotRepository` - Manage snapshots and their relationships
- `UploadRepository` - Track blob upload metrics

Each repository provides methods like:

- `Create()` - Insert new record
- `GetByID()` / `GetByPath()` / `GetByHash()` - Retrieve records
- `Update()` - Update existing records
- `Delete()` - Remove records
- Specialized queries for each entity type (e.g., `DeleteOrphaned()`, `GetIncompleteByHostname()`)
## Transaction Management

All database operations that modify multiple tables are wrapped in transactions:

```go
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
	// Multiple repository operations using tx
	return nil
})
```

This ensures consistency, which is especially important for operations like:

- Creating file-chunk mappings
- Associating chunks with blobs
- Updating snapshot statistics

## Performance Considerations

1. **Indexes**:
   - Primary keys are automatically indexed
   - `idx_files_path` on `files(path)` for efficient file lookups

2. **Prepared Statements**: All queries use prepared statements for performance and security

3. **Batch Operations**: Where possible, operations are batched within transactions

4. **Write-Ahead Logging**: SQLite WAL mode is enabled for better concurrency

## Data Integrity

1. **Foreign Keys**: Enforced through CASCADE DELETE and application-level repository methods
2. **Unique Constraints**: Chunk hashes, file paths, and blob hashes are unique
3. **Null Handling**: Nullable fields clearly indicate in-progress operations
4. **Timestamp Tracking**: All major operations record timestamps for auditing
143
docs/REPOSTRUCTURE.md
Normal file
@@ -0,0 +1,143 @@
# Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

## Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:

1. **Blobs** - The actual backup data (content-addressed, encrypted)
2. **Metadata** - Snapshot information and manifests (partially encrypted)

## Directory Structure

```
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
```

## Blobs Directory (`blobs/`)

### Structure

- **Path format**: `blobs/<first-2-chars>/<next-2-chars>/<full-hash>`
- **Example**: `blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678`
- **Sharding**: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
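The sharded path scheme above is simple to express in Go; `blobKey` here is a hypothetical helper written for illustration, not Vaultik's own function:

```go
package main

import "fmt"

// blobKey builds the sharded object key for a blob hash, following the
// blobs/<hash[0:2]>/<hash[2:4]>/<full-hash> layout described above.
func blobKey(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	h := "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678"
	fmt.Println(blobKey(h))
	// blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
}
```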
### Content

- **What it contains**: Packed collections of content-defined chunks from files
- **Format**: Zstandard compressed, then Age encrypted
- **Encryption**: Always encrypted with Age using the configured recipients
- **Naming**: Content-addressed using SHA256 hash of the encrypted blob

### Why Encrypted

Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

## Metadata Directory (`metadata/`)

Each snapshot has its own subdirectory named with the snapshot ID.

### Snapshot ID Format

- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
- **Example**: `laptop-20240115-143052Z`
- **Components**:
  - Hostname (may contain hyphens)
  - Date in YYYYMMDD format
  - Time in HHMMSSZ format (Z indicates UTC)
### Files in Each Snapshot Directory

#### `db.zst.age` - Encrypted Database Dump

- **What it contains**: Complete SQLite database dump for this snapshot
- **Format**: SQL dump → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership

#### `manifest.json.zst` - Unencrypted Blob Manifest

- **What it contains**: JSON list of all blob hashes referenced by this snapshot
- **Format**: JSON → Zstandard compressed (NOT encrypted)
- **Encryption**: NOT encrypted
- **Purpose**: Enables pruning operations without requiring decryption keys
- **Structure**:

```json
{
  "snapshot_id": "laptop-20240115-143052Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
```
|
||||||
|
|
||||||
|
### Why Manifest is Unencrypted

The manifest must be readable without the private key to enable:

1. **Pruning operations** - Identifying unreferenced blobs for deletion
2. **Storage analysis** - Understanding space usage without decryption
3. **Verification** - Checking blob existence without decryption
4. **Cross-snapshot deduplication analysis** - Finding shared blobs between snapshots

The manifest contains only blob hashes, not file names or any other sensitive information.

## Security Considerations

### What's Encrypted

- **All file content** (in blobs)
- **All file metadata** (paths, permissions, timestamps, ownership in db.zst.age)
- **File-to-chunk mappings** (in db.zst.age)

### What's Not Encrypted

- **Blob hashes** (in manifest.json.zst)
- **Snapshot IDs** (directory names)
- **Blob count per snapshot** (in manifest.json.zst)

### Privacy Implications

From the unencrypted data, an observer can determine:

- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob

An observer cannot determine:

- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files

## Consistency Guarantees

1. **Blobs are immutable** - Once written, a blob is never modified
2. **Blobs are written before metadata** - A snapshot's metadata is only written after all of its blobs are successfully uploaded
3. **Metadata is written atomically** - Both db.zst.age and manifest.json.zst are written as complete files
4. **Snapshots are marked complete in the local DB only after metadata upload** - Ensures consistency between local and remote state

## Pruning Safety

The prune operation is safe because:

1. It only deletes blobs not referenced in any manifest
2. Manifests are unencrypted and can be read without keys
3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
4. Pruning fails if these don't match, preventing accidental deletion of needed blobs
## Restoration Requirements

To restore from a backup, you need:

1. **The Age private key** - To decrypt blobs and the database
2. **The snapshot metadata** - Both files from the snapshot's metadata directory
3. **All referenced blobs** - As listed in the manifest

The restoration process:

1. Download and decrypt the database dump to understand the file structure
2. Download and decrypt the required blobs
3. Reconstruct files from their chunks
4. Restore file metadata (permissions, timestamps, etc.)
go.mod (293 lines changed)

```diff
@@ -3,26 +3,303 @@ module git.eeqj.de/sneak/vaultik
 go 1.24.4
 
 require (
-	github.com/spf13/cobra v1.9.1
+	filippo.io/age v1.2.1
+	git.eeqj.de/sneak/smartconfig v1.0.0
+	github.com/adrg/xdg v0.5.3
+	github.com/aws/aws-sdk-go-v2 v1.39.6
+	github.com/aws/aws-sdk-go-v2/config v1.31.17
+	github.com/aws/aws-sdk-go-v2/credentials v1.18.21
+	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.20.4
+	github.com/aws/aws-sdk-go-v2/service/s3 v1.90.0
+	github.com/aws/smithy-go v1.23.2
+	github.com/dustin/go-humanize v1.0.1
+	github.com/gobwas/glob v0.2.3
+	github.com/google/uuid v1.6.0
+	github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
+	github.com/klauspost/compress v1.18.1
+	github.com/mattn/go-sqlite3 v1.14.29
+	github.com/rclone/rclone v1.72.1
+	github.com/schollz/progressbar/v3 v3.19.0
+	github.com/spf13/afero v1.15.0
+	github.com/spf13/cobra v1.10.1
+	github.com/stretchr/testify v1.11.1
 	go.uber.org/fx v1.24.0
+	golang.org/x/term v0.37.0
 	gopkg.in/yaml.v3 v3.0.1
 	modernc.org/sqlite v1.38.0
 )
 
 require (
-	github.com/dustin/go-humanize v1.0.1 // indirect
-	github.com/google/uuid v1.6.0 // indirect
+	cloud.google.com/go/auth v0.17.0 // indirect
+	cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
+	cloud.google.com/go/compute/metadata v0.9.0 // indirect
+	cloud.google.com/go/iam v1.5.2 // indirect
+	cloud.google.com/go/secretmanager v1.15.0 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.5.3 // indirect
+	github.com/Azure/go-ntlmssp v0.0.2-0.20251110135918-10b7b7e7cd26 // indirect
+	github.com/AzureAD/microsoft-authentication-library-for-go v1.6.0 // indirect
+	github.com/Files-com/files-sdk-go/v3 v3.2.264 // indirect
+	github.com/IBM/go-sdk-core/v5 v5.21.0 // indirect
+	github.com/Max-Sum/base32768 v0.0.0-20230304063302-18e6ce5945fd // indirect
+	github.com/Microsoft/go-winio v0.6.2 // indirect
+	github.com/ProtonMail/bcrypt v0.0.0-20211005172633-e235017c1baf // indirect
+	github.com/ProtonMail/gluon v0.17.1-0.20230724134000-308be39be96e // indirect
+	github.com/ProtonMail/go-crypto v1.3.0 // indirect
+	github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f // indirect
+	github.com/ProtonMail/go-srp v0.0.7 // indirect
+	github.com/ProtonMail/gopenpgp/v2 v2.9.0 // indirect
+	github.com/PuerkitoBio/goquery v1.10.3 // indirect
+	github.com/a1ex3/zstd-seekable-format-go/pkg v0.10.0 // indirect
+	github.com/abbot/go-http-auth v0.4.0 // indirect
+	github.com/anchore/go-lzo v0.1.0 // indirect
+	github.com/andybalholm/cascadia v1.3.3 // indirect
+	github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc // indirect
+	github.com/armon/go-metrics v0.4.1 // indirect
+	github.com/aws/aws-sdk-go v1.44.256 // indirect
+	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.3 // indirect
+	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.13 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.13 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.13 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.13 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.4 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.13 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.13 // indirect
+	github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.35.8 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sso v1.30.1 // indirect
+	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.5 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sts v1.39.1 // indirect
+	github.com/bahlo/generic-list-go v0.2.0 // indirect
+	github.com/beorn7/perks v1.0.1 // indirect
+	github.com/boombuler/barcode v1.1.0 // indirect
+	github.com/bradenaw/juniper v0.15.3 // indirect
+	github.com/bradfitz/iter v0.0.0-20191230175014-e8f45d346db8 // indirect
+	github.com/buengese/sgzip v0.1.1 // indirect
+	github.com/buger/jsonparser v1.1.1 // indirect
+	github.com/calebcase/tmpfile v1.0.3 // indirect
+	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
+	github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 // indirect
+	github.com/clipperhouse/stringish v0.1.1 // indirect
+	github.com/clipperhouse/uax29/v2 v2.3.0 // indirect
+	github.com/cloudflare/circl v1.6.1 // indirect
+	github.com/cloudinary/cloudinary-go/v2 v2.13.0 // indirect
+	github.com/cloudsoda/go-smb2 v0.0.0-20250228001242-d4c70e6251cc // indirect
+	github.com/cloudsoda/sddl v0.0.0-20250224235906-926454e91efc // indirect
+	github.com/colinmarc/hdfs/v2 v2.4.0 // indirect
+	github.com/coreos/go-semver v0.3.1 // indirect
+	github.com/coreos/go-systemd/v22 v22.6.0 // indirect
+	github.com/creasty/defaults v1.8.0 // indirect
+	github.com/cronokirby/saferith v0.33.0 // indirect
+	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
+	github.com/diskfs/go-diskfs v1.7.0 // indirect
+	github.com/dropbox/dropbox-sdk-go-unofficial/v6 v6.0.5 // indirect
+	github.com/ebitengine/purego v0.9.1 // indirect
+	github.com/emersion/go-message v0.18.2 // indirect
+	github.com/emersion/go-vcard v0.0.0-20241024213814-c9703dde27ff // indirect
+	github.com/emicklei/go-restful/v3 v3.11.0 // indirect
+	github.com/fatih/color v1.16.0 // indirect
+	github.com/felixge/httpsnoop v1.0.4 // indirect
+	github.com/flynn/noise v1.1.0 // indirect
+	github.com/fxamacker/cbor/v2 v2.7.0 // indirect
+	github.com/gabriel-vasile/mimetype v1.4.11 // indirect
+	github.com/geoffgarside/ber v1.2.0 // indirect
+	github.com/go-chi/chi/v5 v5.2.3 // indirect
+	github.com/go-darwin/apfs v0.0.0-20211011131704-f84b94dbf348 // indirect
+	github.com/go-git/go-billy/v5 v5.6.2 // indirect
+	github.com/go-jose/go-jose/v4 v4.1.2 // indirect
+	github.com/go-logr/logr v1.4.3 // indirect
+	github.com/go-logr/stdr v1.2.2 // indirect
+	github.com/go-ole/go-ole v1.3.0 // indirect
+	github.com/go-openapi/errors v0.22.4 // indirect
+	github.com/go-openapi/jsonpointer v0.21.0 // indirect
+	github.com/go-openapi/jsonreference v0.20.2 // indirect
+	github.com/go-openapi/strfmt v0.25.0 // indirect
+	github.com/go-openapi/swag v0.23.0 // indirect
+	github.com/go-playground/locales v0.14.1 // indirect
+	github.com/go-playground/universal-translator v0.18.1 // indirect
+	github.com/go-playground/validator/v10 v10.28.0 // indirect
+	github.com/go-resty/resty/v2 v2.16.5 // indirect
+	github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
+	github.com/gofrs/flock v0.13.0 // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
+	github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
+	github.com/golang/protobuf v1.5.4 // indirect
+	github.com/google/btree v1.1.3 // indirect
+	github.com/google/gnostic-models v0.6.9 // indirect
+	github.com/google/go-cmp v0.7.0 // indirect
+	github.com/google/s2a-go v0.1.9 // indirect
+	github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
+	github.com/googleapis/gax-go/v2 v2.15.0 // indirect
+	github.com/gopherjs/gopherjs v1.17.2 // indirect
+	github.com/gorilla/schema v1.4.1 // indirect
+	github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
+	github.com/hashicorp/consul/api v1.32.1 // indirect
+	github.com/hashicorp/errwrap v1.1.0 // indirect
+	github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
+	github.com/hashicorp/go-hclog v1.6.3 // indirect
+	github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
+	github.com/hashicorp/go-multierror v1.1.1 // indirect
+	github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
+	github.com/hashicorp/go-rootcerts v1.0.2 // indirect
+	github.com/hashicorp/go-secure-stdlib/parseutil v0.1.6 // indirect
+	github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect
+	github.com/hashicorp/go-sockaddr v1.0.2 // indirect
+	github.com/hashicorp/go-uuid v1.0.3 // indirect
+	github.com/hashicorp/golang-lru v0.5.4 // indirect
+	github.com/hashicorp/hcl v1.0.1-vault-7 // indirect
+	github.com/hashicorp/serf v0.10.1 // indirect
+	github.com/hashicorp/vault/api v1.20.0 // indirect
+	github.com/henrybear327/Proton-API-Bridge v1.0.0 // indirect
+	github.com/henrybear327/go-proton-api v1.0.0 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
+	github.com/jcmturner/aescts/v2 v2.0.0 // indirect
+	github.com/jcmturner/dnsutils/v2 v2.0.0 // indirect
+	github.com/jcmturner/gofork v1.7.6 // indirect
+	github.com/jcmturner/goidentity/v6 v6.0.1 // indirect
+	github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
+	github.com/jcmturner/rpc/v2 v2.0.3 // indirect
+	github.com/jlaffaye/ftp v0.2.1-0.20240918233326-1b970516f5d3 // indirect
+	github.com/josharian/intern v1.0.0 // indirect
+	github.com/json-iterator/go v1.1.12 // indirect
+	github.com/jtolds/gls v4.20.0+incompatible // indirect
+	github.com/jtolio/noiseconn v0.0.0-20231127013910-f6d9ecbf1de7 // indirect
+	github.com/jzelinskie/whirlpool v0.0.0-20201016144138-0675e54bb004 // indirect
+	github.com/klauspost/cpuid/v2 v2.3.0 // indirect
+	github.com/koofr/go-httpclient v0.0.0-20240520111329-e20f8f203988 // indirect
+	github.com/koofr/go-koofrclient v0.0.0-20221207135200-cbd7fc9ad6a6 // indirect
+	github.com/kr/fs v0.1.0 // indirect
+	github.com/kylelemons/godebug v1.1.0 // indirect
+	github.com/lanrat/extsort v1.4.2 // indirect
+	github.com/leodido/go-urn v1.4.0 // indirect
+	github.com/lpar/date v1.0.0 // indirect
+	github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
+	github.com/mailru/easyjson v0.9.1 // indirect
+	github.com/mattn/go-colorable v0.1.14 // indirect
 	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/mattn/go-runewidth v0.0.19 // indirect
+	github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
+	github.com/mitchellh/go-homedir v1.1.0 // indirect
+	github.com/mitchellh/mapstructure v1.5.0 // indirect
+	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/ncruces/go-strftime v0.1.9 // indirect
+	github.com/ncw/swift/v2 v2.0.5 // indirect
+	github.com/oklog/ulid v1.3.1 // indirect
+	github.com/onsi/ginkgo/v2 v2.23.3 // indirect
+	github.com/oracle/oci-go-sdk/v65 v65.104.0 // indirect
+	github.com/panjf2000/ants/v2 v2.11.3 // indirect
+	github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
+	github.com/pengsrc/go-shared v0.2.1-0.20190131101655-1999055a4a14 // indirect
+	github.com/peterh/liner v1.2.2 // indirect
+	github.com/pierrec/lz4/v4 v4.1.22 // indirect
+	github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
+	github.com/pkg/errors v0.9.1 // indirect
+	github.com/pkg/sftp v1.13.10 // indirect
+	github.com/pkg/xattr v0.4.12 // indirect
+	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
+	github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
+	github.com/pquerna/otp v1.5.0 // indirect
+	github.com/prometheus/client_golang v1.23.2 // indirect
+	github.com/prometheus/client_model v0.6.2 // indirect
+	github.com/prometheus/common v0.67.2 // indirect
+	github.com/prometheus/procfs v0.19.2 // indirect
+	github.com/putdotio/go-putio/putio v0.0.0-20200123120452-16d982cac2b8 // indirect
+	github.com/relvacode/iso8601 v1.7.0 // indirect
 	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
-	github.com/spf13/pflag v1.0.6 // indirect
+	github.com/rfjakob/eme v1.1.2 // indirect
+	github.com/rivo/uniseg v0.4.7 // indirect
+	github.com/ryanuber/go-glob v1.0.0 // indirect
+	github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
+	github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect
+	github.com/samber/lo v1.52.0 // indirect
+	github.com/shirou/gopsutil/v4 v4.25.10 // indirect
+	github.com/sirupsen/logrus v1.9.4-0.20230606125235-dd1b4c2e81af // indirect
+	github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 // indirect
+	github.com/smarty/assertions v1.16.0 // indirect
+	github.com/sony/gobreaker v1.0.0 // indirect
+	github.com/spacemonkeygo/monkit/v3 v3.0.25-0.20251022131615-eb24eb109368 // indirect
+	github.com/spf13/pflag v1.0.10 // indirect
+	github.com/t3rm1n4l/go-mega v0.0.0-20251031123324-a804aaa87491 // indirect
+	github.com/tidwall/gjson v1.18.0 // indirect
+	github.com/tidwall/match v1.1.1 // indirect
+	github.com/tidwall/pretty v1.2.0 // indirect
+	github.com/tklauser/go-sysconf v0.3.15 // indirect
+	github.com/tklauser/numcpus v0.10.0 // indirect
+	github.com/ulikunitz/xz v0.5.15 // indirect
+	github.com/unknwon/goconfig v1.0.0 // indirect
+	github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
+	github.com/x448/float16 v0.8.4 // indirect
+	github.com/xanzy/ssh-agent v0.3.3 // indirect
+	github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
+	github.com/yunify/qingstor-sdk-go/v3 v3.2.0 // indirect
+	github.com/yusufpapurcu/wmi v1.2.4 // indirect
+	github.com/zeebo/blake3 v0.2.4 // indirect
+	github.com/zeebo/errs v1.4.0 // indirect
+	github.com/zeebo/xxh3 v1.0.2 // indirect
+	go.etcd.io/bbolt v1.4.3 // indirect
+	go.etcd.io/etcd/api/v3 v3.6.2 // indirect
+	go.etcd.io/etcd/client/pkg/v3 v3.6.2 // indirect
+	go.etcd.io/etcd/client/v3 v3.6.2 // indirect
+	go.mongodb.org/mongo-driver v1.17.6 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
+	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
+	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
+	go.opentelemetry.io/otel v1.38.0 // indirect
+	go.opentelemetry.io/otel/metric v1.38.0 // indirect
+	go.opentelemetry.io/otel/trace v1.38.0 // indirect
+	go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
 	go.uber.org/dig v1.19.0 // indirect
-	go.uber.org/multierr v1.10.0 // indirect
-	go.uber.org/zap v1.26.0 // indirect
-	golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
-	golang.org/x/sys v0.33.0 // indirect
+	go.uber.org/multierr v1.11.0 // indirect
+	go.uber.org/zap v1.27.0 // indirect
+	go.yaml.in/yaml/v2 v2.4.3 // indirect
+	golang.org/x/crypto v0.45.0 // indirect
+	golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
+	golang.org/x/net v0.47.0 // indirect
+	golang.org/x/oauth2 v0.33.0 // indirect
+	golang.org/x/sync v0.18.0 // indirect
+	golang.org/x/sys v0.38.0 // indirect
+	golang.org/x/text v0.31.0 // indirect
+	golang.org/x/time v0.14.0 // indirect
+	golang.org/x/tools v0.38.0 // indirect
+	google.golang.org/api v0.255.0 // indirect
+	google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
+	google.golang.org/genproto/googleapis/api v0.0.0-20250804133106-a7a43d27e69b // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20251103181224-f26f9409b101 // indirect
+	google.golang.org/grpc v1.76.0 // indirect
+	google.golang.org/protobuf v1.36.10 // indirect
+	gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
+	gopkg.in/inf.v0 v0.9.1 // indirect
+	gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
+	gopkg.in/validator.v2 v2.0.1 // indirect
+	gopkg.in/yaml.v2 v2.4.0 // indirect
+	k8s.io/api v0.33.3 // indirect
+	k8s.io/apimachinery v0.33.3 // indirect
+	k8s.io/client-go v0.33.3 // indirect
+	k8s.io/klog/v2 v2.130.1 // indirect
+	k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
+	k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
 	modernc.org/libc v1.65.10 // indirect
 	modernc.org/mathutil v1.7.1 // indirect
 	modernc.org/memory v1.11.0 // indirect
+	moul.io/http2curl/v2 v2.3.0 // indirect
+	sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
+	sigs.k8s.io/randfill v1.0.0 // indirect
+	sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
+	sigs.k8s.io/yaml v1.6.0 // indirect
+	storj.io/common v0.0.0-20251107171817-6221ae45072c // indirect
+	storj.io/drpc v0.0.35-0.20250513201419-f7819ea69b55 // indirect
+	storj.io/eventkit v0.0.0-20250410172343-61f26d3de156 // indirect
+	storj.io/infectious v0.0.2 // indirect
+	storj.io/picobuf v0.0.4 // indirect
+	storj.io/uplink v1.13.1 // indirect
 )
```
internal/blob/errors.go (new file, 6 lines)

```go
package blob

import "errors"

// ErrBlobSizeLimitExceeded is returned when adding a chunk would exceed the blob size limit
var ErrBlobSizeLimitExceeded = errors.New("adding chunk would exceed blob size limit")
```
internal/blob/packer.go (new file, 555 lines; excerpt)

```go
// Package blob handles the creation of blobs - the final storage units for Vaultik.
// A blob is a large file (up to 10GB) containing many compressed and encrypted chunks
// from multiple source files. Blobs are content-addressed, meaning their filename
// is derived from the SHA256 hash of their compressed and encrypted content.
//
// The blob creation process:
//  1. Chunks are accumulated from multiple files
//  2. The collection is compressed using zstd
//  3. The compressed data is encrypted using age
//  4. The encrypted blob is hashed to create its content-addressed name
//  5. The blob is uploaded to S3 using the hash as the filename
//
// This design optimizes storage efficiency by batching many small chunks into
// larger blobs, reducing the number of S3 operations and associated costs.
package blob

import (
	"context"
	"database/sql"
	"encoding/hex"
	"fmt"
	"io"
	"sync"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/blobgen"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/google/uuid"
	"github.com/spf13/afero"
)

// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
// The handler receives a BlobWithReader containing the blob metadata and a reader for
// the compressed and encrypted blob content. The handler is responsible for uploading
// the blob to storage and cleaning up any temporary files.
type BlobHandler func(blob *BlobWithReader) error

// PackerConfig holds configuration for creating a Packer.
// All fields except BlobHandler are required.
type PackerConfig struct {
	MaxBlobSize      int64                  // Maximum size of a blob before forcing finalization
	CompressionLevel int                    // Zstd compression level (1-19, higher = better compression)
	Recipients       []string               // Age recipients for encryption
	Repositories     *database.Repositories // Database repositories for tracking blob metadata
	BlobHandler      BlobHandler            // Optional callback when blob is ready for upload
	Fs               afero.Fs               // Filesystem for temporary files
}

// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
	Hash string
	Size int64
}

// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
type Packer struct {
	maxBlobSize      int64
	compressionLevel int
	recipients       []string               // Age recipients for encryption
	blobHandler      BlobHandler            // Called when blob is ready
	repos            *database.Repositories // For creating blob records
	fs               afero.Fs               // Filesystem for temporary files

	// Mutex for thread-safe blob creation
	mu sync.Mutex

	// Current blob being packed
	currentBlob   *blobInProgress
	finishedBlobs []*FinishedBlob // Only used if no handler provided

	// Pending chunks to be inserted when blob finalizes
	pendingChunks []PendingChunk
}

// blobInProgress represents a blob being assembled
type blobInProgress struct {
	id        string          // UUID of the blob
	chunks    []*chunkInfo    // Track chunk metadata
	chunkSet  map[string]bool // Track unique chunks in this blob
	tempFile  afero.File      // Temporary file for encrypted compressed data
	writer    *blobgen.Writer // Unified compression/encryption/hashing writer
	startTime time.Time
	size      int64 // Current uncompressed size
}

// ChunkRef represents a chunk to be added to a blob.
// The Hash is the content-addressed identifier (SHA256) of the chunk,
// and Data contains the raw chunk bytes. After adding to a blob,
// the Data can be safely discarded as it's written to the blob immediately.
type ChunkRef struct {
	Hash string // SHA256 hash of the chunk data
	Data []byte // Raw chunk content
}

// chunkInfo tracks chunk metadata in a blob
type chunkInfo struct {
	Hash   string
	Offset int64
	Size   int64
}

// FinishedBlob represents a completed blob ready for storage
type FinishedBlob struct {
	ID           string
	Hash         string
	Data         []byte // Compressed data
	Chunks       []*BlobChunkRef
	CreatedTS    time.Time
	Uncompressed int64
	Compressed   int64
}

// BlobChunkRef represents a chunk's position within a blob
type BlobChunkRef struct {
	ChunkHash string
	Offset    int64
	Length    int64
}

// BlobWithReader wraps a FinishedBlob with its data reader
type BlobWithReader struct {
	*FinishedBlob
	Reader              io.ReadSeeker
	TempFile            afero.File // Optional, only set for disk-based blobs
	InsertedChunkHashes []string   // Chunk hashes that were inserted to DB with this blob
}

// NewPacker creates a new blob packer that accumulates chunks into blobs.
// The packer will automatically finalize blobs when they reach MaxBlobSize.
// Returns an error if required configuration fields are missing or invalid.
func NewPacker(cfg PackerConfig) (*Packer, error) {
	if len(cfg.Recipients) == 0 {
		return nil, fmt.Errorf("recipients are required - blobs must be encrypted")
	}
	if cfg.MaxBlobSize <= 0 {
		return nil, fmt.Errorf("max blob size must be positive")
	}
	if cfg.Fs == nil {
		return nil, fmt.Errorf("filesystem is required")
	}
	return &Packer{
		maxBlobSize:      cfg.MaxBlobSize,
		compressionLevel: cfg.CompressionLevel,
		recipients:       cfg.Recipients,
		blobHandler:      cfg.BlobHandler,
		repos:            cfg.Repositories,
		fs:               cfg.Fs,
		finishedBlobs:    make([]*FinishedBlob, 0),
	}, nil
}

// SetBlobHandler sets the handler to be called when a blob is finalized.
// The handler is responsible for uploading the blob to storage.
// If no handler is set, finalized blobs are stored in memory and can be
// retrieved with GetFinishedBlobs().
func (p *Packer) SetBlobHandler(handler BlobHandler) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.blobHandler = handler
}

// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}

// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
// The chunk data is written immediately and can be garbage collected after this call.
// Thread-safe.
func (p *Packer) AddChunk(chunk *ChunkRef) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	// Initialize new blob if needed
	if p.currentBlob == nil {
		if err := p.startNewBlob(); err != nil {
			return fmt.Errorf("starting new blob: %w", err)
		}
	}

	// Check if adding this chunk would exceed the blob size limit.
	// Use a conservative estimate: assume no compression.
	// Skip the size check if the chunk already exists in this blob.
	if !p.currentBlob.chunkSet[chunk.Hash] {
		currentSize := p.currentBlob.size
		newSize := currentSize + int64(len(chunk.Data))

		if newSize > p.maxBlobSize && len(p.currentBlob.chunks) > 0 {
			// Return error indicating size limit would be exceeded
			return ErrBlobSizeLimitExceeded
		}
	}

	// Add chunk to current blob
	if err := p.addChunkToCurrentBlob(chunk); err != nil {
		return err
	}

	return nil
}

// Flush finalizes any in-progress blob, compressing, encrypting, and hashing it.
// This should be called after all chunks have been added to ensure no data is lost.
// If a BlobHandler is set, it will be called with the finalized blob.
// Thread-safe.
func (p *Packer) Flush() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob != nil && len(p.currentBlob.chunks) > 0 {
		if err := p.finalizeCurrentBlob(); err != nil {
			return fmt.Errorf("finalizing blob: %w", err)
		}
	}

	return nil
}

// FinalizeBlob finalizes the current blob being assembled.
// This compresses the accumulated chunks, encrypts the result, and computes
// the content-addressed hash. The finalized blob is either passed to the
// BlobHandler (if set) or stored internally.
// The caller must handle retrying any chunk that triggered ErrBlobSizeLimitExceeded.
// Thread-safe.
func (p *Packer) FinalizeBlob() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob == nil {
		return nil
	}

	return p.finalizeCurrentBlob()
}

// GetFinishedBlobs returns all completed blobs and clears the internal list.
// This is only used when no BlobHandler is set. After calling this method,
// the caller is responsible for uploading the blobs to storage.
```
|
||||||
|
// Thread-safe.
|
||||||
|
func (p *Packer) GetFinishedBlobs() []*FinishedBlob {
|
||||||
|
p.mu.Lock()
|
||||||
|
defer p.mu.Unlock()
|
||||||
|
|
||||||
|
blobs := p.finishedBlobs
|
||||||
|
p.finishedBlobs = make([]*FinishedBlob, 0)
|
||||||
|
return blobs
|
||||||
|
}
|
||||||
|
|
||||||
|
// startNewBlob initializes a new blob (must be called with lock held)
|
||||||
|
func (p *Packer) startNewBlob() error {
|
||||||
|
// Generate UUID for the blob
|
||||||
|
blobID := uuid.New().String()
|
||||||
|
|
||||||
|
// Create blob record in database
|
||||||
|
if p.repos != nil {
|
||||||
|
blobIDTyped, err := types.ParseBlobID(blobID)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("parsing blob ID: %w", err)
|
||||||
|
}
|
||||||
|
blob := &database.Blob{
|
||||||
|
ID: blobIDTyped,
|
||||||
|
Hash: types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
|
||||||
|
CreatedTS: time.Now().UTC(),
|
||||||
|
FinishedTS: nil,
|
||||||
|
UncompressedSize: 0,
|
||||||
|
CompressedSize: 0,
|
||||||
|
UploadedTS: nil,
|
||||||
|
}
|
||||||
|
if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
return p.repos.Blobs.Create(ctx, tx, blob)
|
||||||
|
}); err != nil {
|
||||||
|
return fmt.Errorf("creating blob record: %w", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create temporary file
|
||||||
|
tempFile, err := afero.TempFile(p.fs, "", "vaultik-blob-*.tmp")
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("creating temp file: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create blobgen writer for unified compression/encryption/hashing
|
||||||
|
writer, err := blobgen.NewWriter(tempFile, p.compressionLevel, p.recipients)
|
||||||
|
if err != nil {
|
||||||
|
_ = tempFile.Close()
|
||||||
|
_ = p.fs.Remove(tempFile.Name())
|
||||||
|
return fmt.Errorf("creating blobgen writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
p.currentBlob = &blobInProgress{
|
||||||
|
id: blobID,
|
||||||
|
chunks: make([]*chunkInfo, 0),
|
||||||
|
chunkSet: make(map[string]bool),
|
||||||
|
startTime: time.Now().UTC(),
|
||||||
|
tempFile: tempFile,
|
||||||
|
writer: writer,
|
||||||
|
size: 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
log.Debug("Created new blob container", "blob_id", blobID, "temp_file", tempFile.Name())
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// addChunkToCurrentBlob adds a chunk to the current blob (must be called with lock held)
|
||||||
|
func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
|
||||||
|
// Skip if chunk already in current blob
|
||||||
|
if p.currentBlob.chunkSet[chunk.Hash] {
|
||||||
|
log.Debug("Skipping duplicate chunk already in current blob", "chunk_hash", chunk.Hash)
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Track offset before writing
|
||||||
|
offset := p.currentBlob.size
|
||||||
|
|
||||||
|
// Write to the blobgen writer (compression -> encryption -> disk)
|
||||||
|
if _, err := p.currentBlob.writer.Write(chunk.Data); err != nil {
|
||||||
|
return fmt.Errorf("writing to blob stream: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Track chunk info
|
||||||
|
chunkSize := int64(len(chunk.Data))
|
||||||
|
chunkInfo := &chunkInfo{
|
||||||
|
Hash: chunk.Hash,
|
||||||
|
Offset: offset,
|
||||||
|
Size: chunkSize,
|
||||||
|
}
|
||||||
|
p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
|
||||||
|
p.currentBlob.chunkSet[chunk.Hash] = true
|
||||||
|
|
||||||
|
// Note: blob_chunk records are inserted in batch when blob is finalized
|
||||||
|
// to reduce transaction overhead. The chunk info is already stored in
|
||||||
|
// p.currentBlob.chunks for later insertion.
|
||||||
|
|
||||||
|
// Update total size
|
||||||
|
p.currentBlob.size += chunkSize
|
||||||
|
|
||||||
|
log.Debug("Added chunk to blob container",
|
||||||
|
"blob_id", p.currentBlob.id,
|
||||||
|
"chunk_hash", chunk.Hash,
|
||||||
|
"chunk_size", len(chunk.Data),
|
||||||
|
"offset", offset,
|
||||||
|
"blob_chunks", len(p.currentBlob.chunks),
|
||||||
|
"uncompressed_size", p.currentBlob.size)
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// finalizeCurrentBlob completes the current blob (must be called with lock held)
|
||||||
|
func (p *Packer) finalizeCurrentBlob() error {
|
||||||
|
if p.currentBlob == nil {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Close blobgen writer to flush all data
|
||||||
|
if err := p.currentBlob.writer.Close(); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("closing blobgen writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sync file to ensure all data is written
|
||||||
|
if err := p.currentBlob.tempFile.Sync(); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("syncing temp file: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get the final size (encrypted if applicable)
|
||||||
|
finalSize, err := p.currentBlob.tempFile.Seek(0, io.SeekCurrent)
|
||||||
|
if err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("getting file size: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Reset to beginning for reading
|
||||||
|
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("seeking to start: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get hash from blobgen writer (of final encrypted data)
|
||||||
|
finalHash := p.currentBlob.writer.Sum256()
|
||||||
|
blobHash := hex.EncodeToString(finalHash)
|
||||||
|
|
||||||
|
// Create chunk references with offsets
|
||||||
|
chunkRefs := make([]*BlobChunkRef, 0, len(p.currentBlob.chunks))
|
||||||
|
|
||||||
|
for _, chunk := range p.currentBlob.chunks {
|
||||||
|
chunkRefs = append(chunkRefs, &BlobChunkRef{
|
||||||
|
ChunkHash: chunk.Hash,
|
||||||
|
Offset: chunk.Offset,
|
||||||
|
Length: chunk.Size,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get pending chunks (will be inserted to DB and reported to handler)
|
||||||
|
chunksToInsert := p.pendingChunks
|
||||||
|
p.pendingChunks = nil // Clear pending list
|
||||||
|
|
||||||
|
// Insert pending chunks, blob_chunks, and update blob in a single transaction
|
||||||
|
if p.repos != nil {
|
||||||
|
blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
|
||||||
|
if parseErr != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("parsing blob ID: %w", parseErr)
|
||||||
|
}
|
||||||
|
err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
// First insert all pending chunks (required for blob_chunks FK)
|
||||||
|
for _, chunk := range chunksToInsert {
|
||||||
|
dbChunk := &database.Chunk{
|
||||||
|
ChunkHash: types.ChunkHash(chunk.Hash),
|
||||||
|
Size: chunk.Size,
|
||||||
|
}
|
||||||
|
if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
|
||||||
|
return fmt.Errorf("creating chunk: %w", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Insert all blob_chunk records in batch
|
||||||
|
for _, chunk := range p.currentBlob.chunks {
|
||||||
|
blobChunk := &database.BlobChunk{
|
||||||
|
BlobID: blobIDTyped,
|
||||||
|
ChunkHash: types.ChunkHash(chunk.Hash),
|
||||||
|
Offset: chunk.Offset,
|
||||||
|
Length: chunk.Size,
|
||||||
|
}
|
||||||
|
if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
|
||||||
|
return fmt.Errorf("creating blob_chunk: %w", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Update blob record with final hash and sizes
|
||||||
|
return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
|
||||||
|
p.currentBlob.size, finalSize)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("finalizing blob transaction: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
log.Debug("Committed blob transaction",
|
||||||
|
"chunks_inserted", len(chunksToInsert),
|
||||||
|
"blob_chunks_inserted", len(p.currentBlob.chunks))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create finished blob
|
||||||
|
finished := &FinishedBlob{
|
||||||
|
ID: p.currentBlob.id,
|
||||||
|
Hash: blobHash,
|
||||||
|
Data: nil, // We don't load data into memory anymore
|
||||||
|
Chunks: chunkRefs,
|
||||||
|
CreatedTS: p.currentBlob.startTime,
|
||||||
|
Uncompressed: p.currentBlob.size,
|
||||||
|
Compressed: finalSize,
|
||||||
|
}
|
||||||
|
|
||||||
|
compressionRatio := float64(finished.Compressed) / float64(finished.Uncompressed)
|
||||||
|
log.Info("Finalized blob (compressed and encrypted)",
|
||||||
|
"hash", blobHash,
|
||||||
|
"chunks", len(chunkRefs),
|
||||||
|
"uncompressed", finished.Uncompressed,
|
||||||
|
"compressed", finished.Compressed,
|
||||||
|
"ratio", fmt.Sprintf("%.2f", compressionRatio),
|
||||||
|
"duration", time.Since(p.currentBlob.startTime))
|
||||||
|
|
||||||
|
// Collect inserted chunk hashes for the scanner to track
|
||||||
|
var insertedChunkHashes []string
|
||||||
|
for _, chunk := range chunksToInsert {
|
||||||
|
insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Call blob handler if set
|
||||||
|
if p.blobHandler != nil {
|
||||||
|
// Reset file position for handler
|
||||||
|
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("seeking for handler: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create a blob reader that includes the data stream
|
||||||
|
blobWithReader := &BlobWithReader{
|
||||||
|
FinishedBlob: finished,
|
||||||
|
Reader: p.currentBlob.tempFile,
|
||||||
|
TempFile: p.currentBlob.tempFile,
|
||||||
|
InsertedChunkHashes: insertedChunkHashes,
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := p.blobHandler(blobWithReader); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("blob handler failed: %w", err)
|
||||||
|
}
|
||||||
|
// Note: blob handler is responsible for closing/cleaning up temp file
|
||||||
|
p.currentBlob = nil
|
||||||
|
} else {
|
||||||
|
log.Debug("No blob handler callback configured", "blob_hash", blobHash[:8]+"...")
|
||||||
|
// No handler, need to read data for legacy behavior
|
||||||
|
if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("seeking to read data: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
data, err := io.ReadAll(p.currentBlob.tempFile)
|
||||||
|
if err != nil {
|
||||||
|
p.cleanupTempFile()
|
||||||
|
return fmt.Errorf("reading blob data: %w", err)
|
||||||
|
}
|
||||||
|
finished.Data = data
|
||||||
|
|
||||||
|
p.finishedBlobs = append(p.finishedBlobs, finished)
|
||||||
|
|
||||||
|
// Cleanup
|
||||||
|
p.cleanupTempFile()
|
||||||
|
p.currentBlob = nil
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// cleanupTempFile removes the temporary file
|
||||||
|
func (p *Packer) cleanupTempFile() {
|
||||||
|
if p.currentBlob != nil && p.currentBlob.tempFile != nil {
|
||||||
|
name := p.currentBlob.tempFile.Name()
|
||||||
|
_ = p.currentBlob.tempFile.Close()
|
||||||
|
_ = p.fs.Remove(name)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// PackChunks is a convenience method to pack multiple chunks at once
|
||||||
|
func (p *Packer) PackChunks(chunks []*ChunkRef) error {
|
||||||
|
for _, chunk := range chunks {
|
||||||
|
err := p.AddChunk(chunk)
|
||||||
|
if err == ErrBlobSizeLimitExceeded {
|
||||||
|
// Finalize current blob and retry
|
||||||
|
if err := p.FinalizeBlob(); err != nil {
|
||||||
|
return fmt.Errorf("finalizing blob before retry: %w", err)
|
||||||
|
}
|
||||||
|
// Retry the chunk
|
||||||
|
if err := p.AddChunk(chunk); err != nil {
|
||||||
|
return fmt.Errorf("adding chunk %s after finalize: %w", chunk.Hash, err)
|
||||||
|
}
|
||||||
|
} else if err != nil {
|
||||||
|
return fmt.Errorf("adding chunk %s: %w", chunk.Hash, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return p.Flush()
|
||||||
|
}
|
||||||
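The size-limit protocol above (AddChunk returns ErrBlobSizeLimitExceeded, the caller finalizes and retries, as PackChunks does) can be sketched in isolation. This is a toy model, not vaultik's implementation: `toyPacker`, `packAll`, and `errBlobFull` are hypothetical stand-ins, and real blobs are compressed and encrypted, so actual sealed sizes differ.

```go
package main

import "fmt"

// errBlobFull stands in for ErrBlobSizeLimitExceeded.
var errBlobFull = fmt.Errorf("blob size limit exceeded")

// toyPacker models only the size-limit protocol: AddChunk refuses a chunk
// that would overflow a non-empty blob, and Finalize seals the current blob.
type toyPacker struct {
	max     int
	current int
	sealed  []int // sizes of finalized blobs
}

func (p *toyPacker) AddChunk(size int) error {
	if p.current+size > p.max && p.current > 0 {
		return errBlobFull
	}
	p.current += size
	return nil
}

func (p *toyPacker) Finalize() {
	if p.current > 0 {
		p.sealed = append(p.sealed, p.current)
		p.current = 0
	}
}

// packAll mirrors PackChunks: on errBlobFull, finalize and retry once.
func packAll(p *toyPacker, sizes []int) error {
	for _, s := range sizes {
		if err := p.AddChunk(s); err == errBlobFull {
			p.Finalize()
			if err := p.AddChunk(s); err != nil {
				return err
			}
		} else if err != nil {
			return err
		}
	}
	p.Finalize()
	return nil
}

func main() {
	p := &toyPacker{max: 5000}
	if err := packAll(p, []int{1000, 1000, 1000, 1000, 1000, 1000, 1000}); err != nil {
		panic(err)
	}
	fmt.Println(p.sealed) // [5000 2000]
}
```

Note that the first blob is sealed exactly at the limit only because the toy sizes divide evenly; the real packer's conservative uncompressed estimate means actual blobs close somewhere at or below MaxBlobSize.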
385 internal/blob/packer_test.go Normal file
@@ -0,0 +1,385 @@
```go
package blob

import (
	"bytes"
	"context"
	"crypto/sha256"
	"database/sql"
	"encoding/hex"
	"io"
	"testing"

	"filippo.io/age"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/klauspost/compress/zstd"
	"github.com/spf13/afero"
)

const (
	// Test key from test/insecure-integration-test.key
	testPrivateKey = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
	testPublicKey  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)

func TestPacker(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Parse test identity
	identity, err := age.ParseX25519Identity(testPrivateKey)
	if err != nil {
		t.Fatalf("failed to parse test identity: %v", err)
	}

	t.Run("single chunk creates single blob", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create a chunk
		data := []byte("Hello, World!")
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get finished blobs
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]
		if len(blob.Chunks) != 1 {
			t.Errorf("expected 1 chunk in blob, got %d", len(blob.Chunks))
		}

		// Note: Very small data may not compress well
		t.Logf("Compression: %d -> %d bytes", blob.Uncompressed, blob.Compressed)

		// Decrypt the blob data
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		// Decompress the decrypted data
		reader, err := zstd.NewReader(decrypted)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := io.Copy(&decompressed, reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decompressed data doesn't match original")
		}
	})

	t.Run("multiple chunks packed together", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create multiple small chunks
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000)
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		// Add all chunks
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Should have one blob with all chunks
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		if len(blobs[0].Chunks) != 10 {
			t.Errorf("expected 10 chunks in blob, got %d", len(blobs[0].Chunks))
		}

		// Verify offsets are correct
		expectedOffset := int64(0)
		for i, chunkRef := range blobs[0].Chunks {
			if chunkRef.Offset != expectedOffset {
				t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunkRef.Offset)
			}
			if chunkRef.Length != 1000 {
				t.Errorf("chunk %d: expected length 1000, got %d", i, chunkRef.Length)
			}
			expectedOffset += chunkRef.Length
		}
	})

	t.Run("blob size limit enforced", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Small blob size limit to force multiple blobs
		cfg := PackerConfig{
			MaxBlobSize:      5000, // 5KB max
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create chunks that will exceed the limit
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000) // 1KB each
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		blobCount := 0

		// Add chunks and handle size limit errors
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err == ErrBlobSizeLimitExceeded {
				// Finalize current blob
				if err := packer.FinalizeBlob(); err != nil {
					t.Fatalf("failed to finalize blob: %v", err)
				}
				blobCount++
				// Retry adding the chunk
				if err := packer.AddChunk(chunk); err != nil {
					t.Fatalf("failed to add chunk after finalize: %v", err)
				}
			} else if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush remaining
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get all blobs
		blobs := packer.GetFinishedBlobs()
		totalBlobs := blobCount + len(blobs)

		// Should have multiple blobs due to size limit
		if totalBlobs < 2 {
			t.Errorf("expected multiple blobs due to size limit, got %d", totalBlobs)
		}

		// Verify each blob respects the size limit (approximately)
		for _, blob := range blobs {
			if blob.Compressed > 6000 { // Allow some overhead
				t.Errorf("blob size %d exceeds limit", blob.Compressed)
			}
		}
	})

	t.Run("with encryption", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Use the test identity from the parent test
		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create test data
		data := bytes.Repeat([]byte("Test data for encryption!"), 100)
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk and flush
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get blob
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]

		// Decrypt the blob
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		var decryptedData bytes.Buffer
		if _, err := decryptedData.ReadFrom(decrypted); err != nil {
			t.Fatalf("failed to read decrypted data: %v", err)
		}

		// Decompress
		reader, err := zstd.NewReader(&decryptedData)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := decompressed.ReadFrom(reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		// Verify data
		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decrypted and decompressed data doesn't match original")
		}
	})
}
```
74 internal/blobgen/compress.go Normal file
@@ -0,0 +1,74 @@
```go
package blobgen

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"io"
)

// CompressResult contains the results of compression
type CompressResult struct {
	Data             []byte
	UncompressedSize int64
	CompressedSize   int64
	SHA256           string
}

// CompressData compresses and encrypts data, returning the result with its hash
func CompressData(data []byte, compressionLevel int, recipients []string) (*CompressResult, error) {
	var buf bytes.Buffer

	// Create writer
	w, err := NewWriter(&buf, compressionLevel, recipients)
	if err != nil {
		return nil, fmt.Errorf("creating writer: %w", err)
	}

	// Write data
	if _, err := w.Write(data); err != nil {
		_ = w.Close()
		return nil, fmt.Errorf("writing data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("closing writer: %w", err)
	}

	return &CompressResult{
		Data:             buf.Bytes(),
		UncompressedSize: int64(len(data)),
		CompressedSize:   int64(buf.Len()),
		SHA256:           hex.EncodeToString(w.Sum256()),
	}, nil
}

// CompressStream compresses and encrypts from reader to writer, returning the
// number of bytes written and the hash
func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipients []string) (written int64, hash string, err error) {
	// Create writer
	w, err := NewWriter(dst, compressionLevel, recipients)
	if err != nil {
		return 0, "", fmt.Errorf("creating writer: %w", err)
	}

	closed := false
	defer func() {
		if !closed {
			_ = w.Close()
		}
	}()

	// Copy data
	if _, err := io.Copy(w, src); err != nil {
		return 0, "", fmt.Errorf("copying data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return 0, "", fmt.Errorf("closing writer: %w", err)
	}
	closed = true

	return w.BytesWritten(), hex.EncodeToString(w.Sum256()), nil
}
```
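CompressStream's closed-flag pattern (an explicit Close on the happy path plus a deferred Close that only fires if the explicit one never ran) is what the regression tests below exercise. It can be demonstrated with a minimal stand-in; `countingCloser` and `run` are hypothetical, while the real resource is a blobgen.Writer whose Close finalizes the compression and encryption streams and must therefore run exactly once.

```go
package main

import "fmt"

// countingCloser counts Close calls, standing in for a resource
// that is unsafe to close twice.
type countingCloser struct{ closes int }

func (c *countingCloser) Close() error {
	c.closes++
	return nil
}

// run mirrors CompressStream's shape: a deferred Close guarded by a flag,
// plus an explicit Close on the happy path that sets the flag.
func run(c *countingCloser, fail bool) error {
	closed := false
	defer func() {
		if !closed {
			_ = c.Close() // fires only on early error returns
		}
	}()

	if fail {
		return fmt.Errorf("simulated mid-stream error") // deferred Close cleans up
	}

	if err := c.Close(); err != nil {
		return err
	}
	closed = true // the deferred Close is now a no-op
	return nil
}

func main() {
	happy := &countingCloser{}
	_ = run(happy, false)
	sad := &countingCloser{}
	_ = run(sad, true)
	fmt.Println(happy.closes, sad.closes) // 1 1
}
```

Either path yields exactly one Close call, which is the invariant the double-close fix restores.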
64 internal/blobgen/compress_test.go Normal file
@@ -0,0 +1,64 @@
```go
package blobgen

import (
	"bytes"
	"crypto/rand"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// testRecipient is a static age recipient for tests.
const testRecipient = "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

// TestCompressStreamNoDoubleClose is a regression test for issue #28.
// It verifies that CompressStream does not panic or return an error due to
// double-closing the underlying blobgen.Writer. Before the fix in PR #33,
// the explicit Close() on the happy path combined with defer Close() would
// cause a double close.
func TestCompressStreamNoDoubleClose(t *testing.T) {
	input := []byte("regression test data for issue #28 double-close fix")
	var buf bytes.Buffer

	written, hash, err := CompressStream(&buf, bytes.NewReader(input), 3, []string{testRecipient})
	require.NoError(t, err, "CompressStream should not return an error")
	assert.True(t, written > 0, "expected bytes written > 0")
	assert.NotEmpty(t, hash, "expected non-empty hash")
	assert.True(t, buf.Len() > 0, "expected non-empty output")
}

// TestCompressStreamLargeInput exercises CompressStream with a larger payload
// to ensure no double-close issues surface under heavier I/O.
func TestCompressStreamLargeInput(t *testing.T) {
	data := make([]byte, 512*1024) // 512 KB
	_, err := rand.Read(data)
	require.NoError(t, err)

	var buf bytes.Buffer
	written, hash, err := CompressStream(&buf, bytes.NewReader(data), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, written > 0)
	assert.NotEmpty(t, hash)
}

// TestCompressStreamEmptyInput verifies CompressStream handles empty input
// without double-close issues.
func TestCompressStreamEmptyInput(t *testing.T) {
	var buf bytes.Buffer
	_, hash, err := CompressStream(&buf, strings.NewReader(""), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.NotEmpty(t, hash)
}

// TestCompressDataNoDoubleClose mirrors the stream test for CompressData,
// ensuring the explicit Close + error-path Close pattern is also safe.
func TestCompressDataNoDoubleClose(t *testing.T) {
	input := []byte("CompressData regression test for double-close")
	result, err := CompressData(input, 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, result.CompressedSize > 0)
	assert.True(t, result.UncompressedSize == int64(len(input)))
	assert.NotEmpty(t, result.SHA256)
}
```
internal/blobgen/reader.go (new file, 73 lines)

```go
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Reader wraps decompression and decryption with SHA256 verification.
type Reader struct {
	reader       io.Reader
	decompressor *zstd.Decoder
	decryptor    io.Reader
	hasher       hash.Hash
	teeReader    io.Reader
	bytesRead    int64
}

// NewReader creates a new Reader that decrypts, decompresses, and verifies data.
func NewReader(r io.Reader, identity age.Identity) (*Reader, error) {
	// Create decryption reader
	decReader, err := age.Decrypt(r, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decryption reader: %w", err)
	}

	// Create decompression reader
	decompressor, err := zstd.NewReader(decReader)
	if err != nil {
		return nil, fmt.Errorf("creating decompression reader: %w", err)
	}

	// Create SHA256 hasher
	hasher := sha256.New()

	// Create tee reader that reads from decompressor and writes to hasher
	teeReader := io.TeeReader(decompressor, hasher)

	return &Reader{
		reader:       r,
		decompressor: decompressor,
		decryptor:    decReader,
		hasher:       hasher,
		teeReader:    teeReader,
	}, nil
}

// Read implements io.Reader.
func (r *Reader) Read(p []byte) (n int, err error) {
	n, err = r.teeReader.Read(p)
	r.bytesRead += int64(n)
	return n, err
}

// Close closes the decompressor.
func (r *Reader) Close() error {
	r.decompressor.Close()
	return nil
}

// Sum256 returns the SHA256 hash of all data read.
func (r *Reader) Sum256() []byte {
	return r.hasher.Sum(nil)
}

// BytesRead returns the number of uncompressed bytes read.
func (r *Reader) BytesRead() int64 {
	return r.bytesRead
}
```
internal/blobgen/writer.go (new file, 127 lines)

```go
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"runtime"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Writer wraps compression and encryption with SHA256 hashing.
// Data flows: input -> tee(hasher, compressor -> encryptor -> destination).
// The hash is computed on the uncompressed input for deterministic content-addressing.
type Writer struct {
	teeWriter        io.Writer      // Tee to hasher and compressor
	compressor       *zstd.Encoder  // Compression layer
	encryptor        io.WriteCloser // Encryption layer
	hasher           hash.Hash      // SHA256 hasher (on uncompressed input)
	compressionLevel int
	bytesWritten     int64
}

// NewWriter creates a new Writer that compresses, encrypts, and hashes data.
// The hash is computed on the uncompressed input for deterministic content-addressing.
func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer, error) {
	// Validate compression level
	if err := validateCompressionLevel(compressionLevel); err != nil {
		return nil, err
	}

	// Create SHA256 hasher for the uncompressed input
	hasher := sha256.New()

	// Parse recipients
	var ageRecipients []age.Recipient
	for _, recipient := range recipients {
		r, err := age.ParseX25519Recipient(recipient)
		if err != nil {
			return nil, fmt.Errorf("parsing recipient %s: %w", recipient, err)
		}
		ageRecipients = append(ageRecipients, r)
	}

	// Create encryption writer that outputs to destination
	encWriter, err := age.Encrypt(w, ageRecipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encryption writer: %w", err)
	}

	// Calculate compression concurrency: CPUs - 2, minimum 1
	concurrency := runtime.NumCPU() - 2
	if concurrency < 1 {
		concurrency = 1
	}

	// Create compression writer with encryption as destination
	compressor, err := zstd.NewWriter(encWriter,
		zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)),
		zstd.WithEncoderConcurrency(concurrency),
	)
	if err != nil {
		_ = encWriter.Close()
		return nil, fmt.Errorf("creating compression writer: %w", err)
	}

	// Create tee writer: input goes to both hasher and compressor
	teeWriter := io.MultiWriter(hasher, compressor)

	return &Writer{
		teeWriter:        teeWriter,
		compressor:       compressor,
		encryptor:        encWriter,
		hasher:           hasher,
		compressionLevel: compressionLevel,
	}, nil
}

// Write implements io.Writer.
func (w *Writer) Write(p []byte) (n int, err error) {
	n, err = w.teeWriter.Write(p)
	w.bytesWritten += int64(n)
	return n, err
}

// Close closes all layers and returns any errors.
func (w *Writer) Close() error {
	// Close compressor first
	if err := w.compressor.Close(); err != nil {
		return fmt.Errorf("closing compressor: %w", err)
	}

	// Then close encryptor
	if err := w.encryptor.Close(); err != nil {
		return fmt.Errorf("closing encryptor: %w", err)
	}

	return nil
}

// Sum256 returns the double SHA256 hash of the uncompressed input data.
// Double hashing (SHA256(SHA256(data))) prevents information leakage about
// the plaintext - an attacker cannot confirm existence of known content
// by computing its hash and checking for a matching blob filename.
func (w *Writer) Sum256() []byte {
	// First hash: SHA256(plaintext)
	firstHash := w.hasher.Sum(nil)
	// Second hash: SHA256(firstHash) - this is the blob ID
	secondHash := sha256.Sum256(firstHash)
	return secondHash[:]
}

// BytesWritten returns the number of uncompressed bytes written.
func (w *Writer) BytesWritten() int64 {
	return w.bytesWritten
}

func validateCompressionLevel(level int) error {
	// Zstd compression levels: 1-19 (default is 3).
	// SpeedFastest = 1, SpeedDefault = 3, SpeedBetterCompression = 7, SpeedBestCompression = 11
	if level < 1 || level > 19 {
		return fmt.Errorf("invalid compression level %d: must be between 1 and 19", level)
	}
	return nil
}
```
internal/blobgen/writer_test.go (new file, 105 lines)

```go
package blobgen

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestWriterHashIsDoubleHash verifies that Writer.Sum256() returns
// the double hash SHA256(SHA256(plaintext)) for security.
// Double hashing prevents attackers from confirming existence of known content.
func TestWriterHashIsDoubleHash(t *testing.T) {
	// Test data - random data that doesn't compress well
	testData := make([]byte, 1024*1024) // 1MB
	_, err := rand.Read(testData)
	require.NoError(t, err)

	// Test recipient (generated with age-keygen)
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create a buffer to capture the encrypted output
	var encryptedBuf bytes.Buffer

	// Create blobgen writer
	writer, err := NewWriter(&encryptedBuf, 3, []string{testRecipient})
	require.NoError(t, err)

	// Write test data
	n, err := writer.Write(testData)
	require.NoError(t, err)
	assert.Equal(t, len(testData), n)

	// Close to flush all data
	err = writer.Close()
	require.NoError(t, err)

	// Get the hash from the writer
	writerHash := hex.EncodeToString(writer.Sum256())

	// Calculate the expected double hash: SHA256(SHA256(plaintext))
	firstHash := sha256.Sum256(testData)
	secondHash := sha256.Sum256(firstHash[:])
	expectedDoubleHash := hex.EncodeToString(secondHash[:])

	// Also compute single hash to verify it's different
	singleHashStr := hex.EncodeToString(firstHash[:])

	t.Logf("Input size: %d bytes", len(testData))
	t.Logf("Single hash (SHA256(data)): %s", singleHashStr)
	t.Logf("Double hash (SHA256(SHA256(data))): %s", expectedDoubleHash)
	t.Logf("Writer hash: %s", writerHash)

	// The writer hash should match the double hash
	assert.Equal(t, expectedDoubleHash, writerHash,
		"Writer.Sum256() should return SHA256(SHA256(plaintext)) for security")

	// Verify it's NOT the single hash (would leak information)
	assert.NotEqual(t, singleHashStr, writerHash,
		"Writer hash should not be single hash (would allow content confirmation attacks)")
}

// TestWriterDeterministicHash verifies that the same input always produces
// the same hash, even with non-deterministic encryption.
func TestWriterDeterministicHash(t *testing.T) {
	// Test data
	testData := []byte("Hello, World! This is test data for deterministic hashing.")

	// Test recipient
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create two writers and verify they produce the same hash
	var buf1, buf2 bytes.Buffer

	writer1, err := NewWriter(&buf1, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer1.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer1.Close())

	writer2, err := NewWriter(&buf2, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer2.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer2.Close())

	hash1 := hex.EncodeToString(writer1.Sum256())
	hash2 := hex.EncodeToString(writer2.Sum256())

	// Hashes should be identical (deterministic)
	assert.Equal(t, hash1, hash2, "Same input should produce same hash")

	// Encrypted outputs should be different (non-deterministic encryption)
	assert.NotEqual(t, buf1.Bytes(), buf2.Bytes(),
		"Encrypted outputs should differ due to non-deterministic encryption")

	t.Logf("Hash 1: %s", hash1)
	t.Logf("Hash 2: %s", hash2)
	t.Logf("Encrypted size 1: %d bytes", buf1.Len())
	t.Logf("Encrypted size 2: %d bytes", buf2.Len())
}
```
internal/chunker/chunker.go (new file, 153 lines)

```go
package chunker

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
// Each chunk is identified by its SHA256 hash and contains the raw data along with
// its position and size information from the original file.
type Chunk struct {
	Hash   string // Content hash of the chunk
	Data   []byte // Chunk data
	Offset int64  // Offset in the original file
	Size   int64  // Size of the chunk
}

// Chunker provides content-defined chunking using the FastCDC algorithm.
// It splits data into variable-sized chunks based on content patterns, ensuring
// that identical data sequences produce identical chunks regardless of their
// position in the file. This enables efficient deduplication.
type Chunker struct {
	avgChunkSize int
	minChunkSize int
	maxChunkSize int
}

// NewChunker creates a new chunker with the specified average chunk size.
// The actual chunk sizes will vary between avgChunkSize/4 and avgChunkSize*4
// as recommended by the FastCDC algorithm. Typical values for avgChunkSize
// are 64KB (65536), 256KB (262144), or 1MB (1048576).
func NewChunker(avgChunkSize int64) *Chunker {
	// FastCDC recommends min = avg/4 and max = avg*4
	return &Chunker{
		avgChunkSize: int(avgChunkSize),
		minChunkSize: int(avgChunkSize / 4),
		maxChunkSize: int(avgChunkSize * 4),
	}
}

// ChunkReader splits the reader into content-defined chunks and returns all chunks at once.
// This method loads all chunk data into memory, so it should only be used for
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
	chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
	defer chunker.Release()

	var chunks []Chunk
	offset := int64(0)

	for {
		chunk, err := chunker.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, fmt.Errorf("reading chunk: %w", err)
		}

		// Calculate hash
		hash := sha256.Sum256(chunk.Data)

		// Make a copy of the data since the chunker reuses the buffer
		chunkData := make([]byte, len(chunk.Data))
		copy(chunkData, chunk.Data)

		chunks = append(chunks, Chunk{
			Hash:   hex.EncodeToString(hash[:]),
			Data:   chunkData,
			Offset: offset,
			Size:   int64(len(chunk.Data)),
		})

		offset += int64(len(chunk.Data))
	}

	return chunks, nil
}

// ChunkCallback is a function called for each chunk as it's processed.
// The callback receives a Chunk containing the hash, data, offset, and size.
// If the callback returns an error, chunk processing stops and the error is propagated.
type ChunkCallback func(chunk Chunk) error

// ChunkReaderStreaming splits the reader into chunks and calls the callback for each chunk.
// This is the preferred method for processing large files or streams as it doesn't
// accumulate all chunks in memory. The callback is invoked for each chunk as it's
// produced, allowing for streaming processing and immediate storage or transmission.
// Returns the SHA256 hash of the entire file content and an error if chunking fails,
// reading fails, or if the callback returns an error.
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (string, error) {
	// Create a tee reader to calculate full file hash while chunking
	fileHasher := sha256.New()
	teeReader := io.TeeReader(r, fileHasher)

	chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
	defer chunker.Release()

	offset := int64(0)

	for {
		chunk, err := chunker.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", fmt.Errorf("reading chunk: %w", err)
		}

		// Calculate chunk hash
		hash := sha256.Sum256(chunk.Data)

		// Pass the data directly - caller must process it before we call Next() again
		// (chunker reuses its internal buffer, but since we process synchronously
		// and completely before continuing, no copy is needed)
		if err := callback(Chunk{
			Hash:   hex.EncodeToString(hash[:]),
			Data:   chunk.Data,
			Offset: offset,
			Size:   int64(len(chunk.Data)),
		}); err != nil {
			return "", fmt.Errorf("callback error: %w", err)
		}

		offset += int64(len(chunk.Data))
	}

	// Return the full file hash
	return hex.EncodeToString(fileHasher.Sum(nil)), nil
}

// ChunkFile splits a file into content-defined chunks by reading the entire file.
// This is a convenience method that opens the file and passes it to ChunkReader.
// For large files, consider using ChunkReaderStreaming with a file handle instead.
// Returns an error if the file cannot be opened or if chunking fails.
func (c *Chunker) ChunkFile(path string) ([]Chunk, error) {
	file, err := os.Open(path)
	if err != nil {
		return nil, fmt.Errorf("opening file: %w", err)
	}
	defer func() {
		if err := file.Close(); err != nil && err.Error() != "invalid argument" {
			// Log error or handle as needed
			_ = err
		}
	}()

	return c.ChunkReader(file)
}
```
internal/chunker/chunker_isolated_test.go (new file, 77 lines)

```go
package chunker

import (
	"bytes"
	"testing"
)

func TestChunkerExpectedChunkCount(t *testing.T) {
	tests := []struct {
		name         string
		fileSize     int
		avgChunkSize int64
		minExpected  int
		maxExpected  int
	}{
		{
			name:         "1MB file with 64KB average",
			fileSize:     1024 * 1024,
			avgChunkSize: 64 * 1024,
			minExpected:  8,  // At least half the expected count
			maxExpected:  32, // At most double the expected count
		},
		{
			name:         "10MB file with 256KB average",
			fileSize:     10 * 1024 * 1024,
			avgChunkSize: 256 * 1024,
			minExpected:  10, // FastCDC may produce larger chunks
			maxExpected:  80,
		},
		{
			name:         "512KB file with 64KB average",
			fileSize:     512 * 1024,
			avgChunkSize: 64 * 1024,
			minExpected:  4, // ~8 expected
			maxExpected:  16,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			chunker := NewChunker(tt.avgChunkSize)

			// Create data with some variation to trigger chunk boundaries
			data := make([]byte, tt.fileSize)
			for i := 0; i < len(data); i++ {
				// Use a pattern that should create boundaries
				data[i] = byte((i * 17) ^ (i >> 5))
			}

			chunks, err := chunker.ChunkReader(bytes.NewReader(data))
			if err != nil {
				t.Fatalf("chunking failed: %v", err)
			}

			t.Logf("Created %d chunks for %d bytes with %d average chunk size",
				len(chunks), tt.fileSize, tt.avgChunkSize)

			if len(chunks) < tt.minExpected {
				t.Errorf("too few chunks: got %d, expected at least %d",
					len(chunks), tt.minExpected)
			}
			if len(chunks) > tt.maxExpected {
				t.Errorf("too many chunks: got %d, expected at most %d",
					len(chunks), tt.maxExpected)
			}

			// Verify chunks reconstruct to original
			var reconstructed []byte
			for _, chunk := range chunks {
				reconstructed = append(reconstructed, chunk.Data...)
			}
			if !bytes.Equal(data, reconstructed) {
				t.Error("reconstructed data doesn't match original")
			}
		})
	}
}
```
internal/chunker/chunker_test.go (new file, 128 lines)

```go
package chunker

import (
	"bytes"
	"crypto/rand"
	"testing"
)

func TestChunker(t *testing.T) {
	t.Run("small file produces single chunk", func(t *testing.T) {
		chunker := NewChunker(1024 * 1024)         // 1MB average
		data := bytes.Repeat([]byte("hello"), 100) // 500 bytes

		chunks, err := chunker.ChunkReader(bytes.NewReader(data))
		if err != nil {
			t.Fatalf("chunking failed: %v", err)
		}

		if len(chunks) != 1 {
			t.Errorf("expected 1 chunk, got %d", len(chunks))
		}

		if chunks[0].Size != int64(len(data)) {
			t.Errorf("expected chunk size %d, got %d", len(data), chunks[0].Size)
		}
	})

	t.Run("large file produces multiple chunks", func(t *testing.T) {
		chunker := NewChunker(256 * 1024) // 256KB average chunk size

		// Generate 2MB of random data
		data := make([]byte, 2*1024*1024)
		if _, err := rand.Read(data); err != nil {
			t.Fatalf("failed to generate random data: %v", err)
		}

		chunks, err := chunker.ChunkReader(bytes.NewReader(data))
		if err != nil {
			t.Fatalf("chunking failed: %v", err)
		}

		// Should produce multiple chunks - with FastCDC we expect around 8 chunks for 2MB with 256KB average
		if len(chunks) < 4 || len(chunks) > 16 {
			t.Errorf("expected 4-16 chunks, got %d", len(chunks))
		}

		// Verify chunks reconstruct original data
		var reconstructed []byte
		for _, chunk := range chunks {
			reconstructed = append(reconstructed, chunk.Data...)
		}

		if !bytes.Equal(data, reconstructed) {
			t.Error("reconstructed data doesn't match original")
		}

		// Verify offsets
		var expectedOffset int64
		for i, chunk := range chunks {
			if chunk.Offset != expectedOffset {
				t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunk.Offset)
			}
			expectedOffset += chunk.Size
		}
	})

	t.Run("deterministic chunking", func(t *testing.T) {
		chunker1 := NewChunker(256 * 1024)
		chunker2 := NewChunker(256 * 1024)

		// Use deterministic data
		data := bytes.Repeat([]byte("abcdefghijklmnopqrstuvwxyz"), 20000) // ~520KB

		chunks1, err := chunker1.ChunkReader(bytes.NewReader(data))
		if err != nil {
			t.Fatalf("chunking failed: %v", err)
		}

		chunks2, err := chunker2.ChunkReader(bytes.NewReader(data))
		if err != nil {
			t.Fatalf("chunking failed: %v", err)
		}

		// Should produce same chunks
		if len(chunks1) != len(chunks2) {
			t.Fatalf("different number of chunks: %d vs %d", len(chunks1), len(chunks2))
		}

		for i := range chunks1 {
			if chunks1[i].Hash != chunks2[i].Hash {
				t.Errorf("chunk %d: different hashes", i)
			}
			if chunks1[i].Size != chunks2[i].Size {
				t.Errorf("chunk %d: different sizes", i)
			}
		}
	})
}

func TestChunkBoundaries(t *testing.T) {
	chunker := NewChunker(256 * 1024) // 256KB average

	// FastCDC uses avg/4 for min and avg*4 for max
	avgSize := int64(256 * 1024)
	minSize := avgSize / 4
	maxSize := avgSize * 4

	// Test that minimum chunk size is respected
	data := make([]byte, minSize+1024)
	if _, err := rand.Read(data); err != nil {
		t.Fatalf("failed to generate random data: %v", err)
	}

	chunks, err := chunker.ChunkReader(bytes.NewReader(data))
	if err != nil {
		t.Fatalf("chunking failed: %v", err)
	}

	for i, chunk := range chunks {
		// Last chunk can be smaller than minimum
		if i < len(chunks)-1 && chunk.Size < minSize {
			t.Errorf("chunk %d size %d is below minimum %d", i, chunk.Size, minSize)
		}
		if chunk.Size > maxSize {
			t.Errorf("chunk %d size %d exceeds maximum %d", i, chunk.Size, maxSize)
		}
	}
}
```
265
internal/chunker/fastcdc.go
Normal file
265
internal/chunker/fastcdc.go
Normal file
@@ -0,0 +1,265 @@
|
|||||||
|
package chunker
|
||||||
|
|
||||||
|
import (
|
||||||
|
"io"
|
||||||
|
"math"
|
||||||
|
"sync"
|
||||||
|
)
|
||||||
|
|
||||||
|
// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
|
||||||
|
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
|
||||||
|
// this implementation uses sync.Pool to reuse buffers across files.
|
||||||
|
type ReusableChunker struct {
|
||||||
|
minSize int
|
||||||
|
maxSize int
|
||||||
|
normSize int
|
||||||
|
bufSize int
|
||||||
|
|
||||||
|
maskS uint64
|
||||||
|
maskL uint64
|
||||||
|
|
||||||
|
rd io.Reader
|
||||||
|
|
||||||
|
buf []byte
|
||||||
|
cursor int
|
||||||
|
offset int
|
||||||
|
eof bool
|
||||||
|
}
|
||||||
|
|
||||||
|
// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
|
||||||
|
var reusableChunkerPool = sync.Pool{
|
||||||
|
New: func() interface{} {
|
||||||
|
return &ReusableChunker{}
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
// bufferPools contains pools for different buffer sizes.
|
||||||
|
// Key is the buffer size.
|
||||||
|
var bufferPools = sync.Map{}
|
||||||
|
|
||||||
|
func getBuffer(size int) []byte {
|
||||||
|
poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
|
||||||
|
New: func() interface{} {
|
||||||
|
buf := make([]byte, size)
|
||||||
|
return &buf
|
||||||
|
},
|
||||||
|
})
|
||||||
|
pool := poolI.(*sync.Pool)
|
||||||
|
return *pool.Get().(*[]byte)
|
||||||
|
}
|
||||||
|
|
||||||
|
func putBuffer(buf []byte) {
|
||||||
|
size := cap(buf)
|
||||||
|
poolI, ok := bufferPools.Load(size)
|
||||||
|
if ok {
|
||||||
|
pool := poolI.(*sync.Pool)
|
||||||
|
b := buf[:size]
|
||||||
|
pool.Put(&b)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// FastCDCChunk represents a chunk from the FastCDC algorithm.
|
||||||
|
type FastCDCChunk struct {
|
||||||
|
Offset int
|
||||||
|
Length int
|
||||||
|
Data []byte
|
||||||
|
Fingerprint uint64
|
||||||
|
}
|
||||||
|
|
||||||
|
// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
|
||||||
|
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
|
||||||
|
c := reusableChunkerPool.Get().(*ReusableChunker)
|
||||||
|
|
||||||
|
bufSize := maxSize * 2
|
||||||
|
|
||||||
|
// Reuse buffer if it's the right size, otherwise get a new one
|
||||||
|
if c.buf == nil || cap(c.buf) != bufSize {
|
||||||
|
if c.buf != nil {
|
||||||
|
putBuffer(c.buf)
|
||||||
|
}
|
||||||
|
c.buf = getBuffer(bufSize)
|
||||||
|
} else {
|
||||||
|
// Restore buffer to full capacity (may have been truncated by previous EOF)
|
||||||
|
c.buf = c.buf[:cap(c.buf)]
|
||||||
|
}
|
||||||
|
|
||||||
|
bits := int(math.Round(math.Log2(float64(avgSize))))
|
||||||
|
normalization := 2
|
||||||
|
smallBits := bits + normalization
|
||||||
|
largeBits := bits - normalization
|
||||||
|
|
||||||
|
c.minSize = minSize
|
||||||
|
c.maxSize = maxSize
|
||||||
|
c.normSize = avgSize
|
||||||
|
c.bufSize = bufSize
|
||||||
|
c.maskS = (1 << smallBits) - 1
|
||||||
|
c.maskL = (1 << largeBits) - 1
|
||||||
|
c.rd = rd
|
||||||
|
c.cursor = bufSize
|
||||||
|
c.offset = 0
|
||||||
|
c.eof = false
|
||||||
|
|
||||||
|
return c
|
||||||
|
}

// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
	c.rd = nil
	reusableChunkerPool.Put(c)
}

func (c *ReusableChunker) fillBuffer() error {
	n := len(c.buf) - c.cursor
	if n >= c.maxSize {
		return nil
	}

	// Move all data after the cursor to the start of the buffer
	copy(c.buf[:n], c.buf[c.cursor:])
	c.cursor = 0

	if c.eof {
		c.buf = c.buf[:n]
		return nil
	}

	// Restore buffer to full capacity for reading
	c.buf = c.buf[:c.bufSize]

	// Fill the rest of the buffer
	m, err := io.ReadFull(c.rd, c.buf[n:])
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		c.buf = c.buf[:n+m]
		c.eof = true
	} else if err != nil {
		return err
	}
	return nil
}
|
||||||
|
|
||||||
|
// Next returns the next chunk or io.EOF when done.
|
||||||
|
// The returned Data slice is only valid until the next call to Next.
|
||||||
|
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
|
||||||
|
if err := c.fillBuffer(); err != nil {
|
||||||
|
return FastCDCChunk{}, err
|
||||||
|
}
|
||||||
|
if len(c.buf) == 0 {
|
||||||
|
return FastCDCChunk{}, io.EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
length, fp := c.nextChunk(c.buf[c.cursor:])
|
||||||
|
|
||||||
|
chunk := FastCDCChunk{
|
||||||
|
Offset: c.offset,
|
||||||
|
Length: length,
|
||||||
|
Data: c.buf[c.cursor : c.cursor+length],
|
||||||
|
Fingerprint: fp,
|
||||||
|
}
|
||||||
|
|
||||||
|
c.cursor += length
|
||||||
|
c.offset += chunk.Length
|
||||||
|
|
||||||
|
return chunk, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
|
||||||
|
fp := uint64(0)
|
||||||
|
i := c.minSize
|
||||||
|
|
||||||
|
if len(data) <= c.minSize {
|
||||||
|
return len(data), fp
|
||||||
|
}
|
||||||
|
|
||||||
|
n := min(len(data), c.maxSize)
|
||||||
|
|
||||||
|
for ; i < min(n, c.normSize); i++ {
|
||||||
|
fp = (fp << 1) + table[data[i]]
|
||||||
|
if (fp & c.maskS) == 0 {
|
||||||
|
return i + 1, fp
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for ; i < n; i++ {
|
||||||
|
fp = (fp << 1) + table[data[i]]
|
||||||
|
if (fp & c.maskL) == 0 {
|
||||||
|
return i + 1, fp
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return i, fp
|
||||||
|
}
|
||||||
|
|
||||||
|
func min(a, b int) int {
|
||||||
|
if a < b {
|
||||||
|
return a
|
||||||
|
}
|
||||||
|
return b
|
||||||
|
}

// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
	0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
	0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
	0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
	0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
	0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
	0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
	0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
	0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
	0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
	0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
	0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
	0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
	0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
	0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
	0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
	0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
	0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
	0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
	0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
	0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
	0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
	0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
	0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
	0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
	0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
	0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
	0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
	0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
	0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
	0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
	0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
	0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
	0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
	0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
	0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
	0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
	0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
	0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
	0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
	0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
	0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
	0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
	0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
	0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
	0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
	0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
	0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
	0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
	0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
	0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
	0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
	0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
	0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
	0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
	0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
	0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
	0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
	0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
	0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
	0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
	0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
	0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
	0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
	0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}
@@ -2,28 +2,63 @@ package cli

 import (
 	"context"
+	"errors"
 	"fmt"
+	"os"
+	"os/signal"
+	"path/filepath"
+	"syscall"
+	"time"

 	"git.eeqj.de/sneak/vaultik/internal/config"
 	"git.eeqj.de/sneak/vaultik/internal/database"
 	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/pidlock"
+	"git.eeqj.de/sneak/vaultik/internal/snapshot"
+	"git.eeqj.de/sneak/vaultik/internal/storage"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
+	"github.com/adrg/xdg"
 	"go.uber.org/fx"
 )

-// AppOptions contains common options for creating the fx application
+// AppOptions contains common options for creating the fx application.
+// It includes the configuration file path, logging options, and additional
+// fx modules and invocations that should be included in the application.
 type AppOptions struct {
 	ConfigPath string
+	LogOptions log.LogOptions
 	Modules    []fx.Option
 	Invokes    []fx.Option
 }

-// NewApp creates a new fx application with common modules
+// setupGlobals sets up the globals with application startup time
+func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
+	lc.Append(fx.Hook{
+		OnStart: func(ctx context.Context) error {
+			g.StartTime = time.Now().UTC()
+			return nil
+		},
+	})
+}
+
+// NewApp creates a new fx application with common modules.
+// It sets up the base modules (config, database, logging, globals) and
+// combines them with any additional modules specified in the options.
+// The returned fx.App is ready to be started with RunApp.
 func NewApp(opts AppOptions) *fx.App {
 	baseModules := []fx.Option{
 		fx.Supply(config.ConfigPath(opts.ConfigPath)),
+		fx.Supply(opts.LogOptions),
 		fx.Provide(globals.New),
+		fx.Provide(log.New),
 		config.Module,
 		database.Module,
+		log.Module,
+		storage.Module,
+		snapshot.Module,
+		fx.Provide(vaultik.New),
+		fx.Invoke(setupGlobals),
 		fx.NopLogger,
 	}

@@ -33,24 +68,77 @@ func NewApp(opts AppOptions) *fx.App {
 	return fx.New(allOptions...)
 }

-// RunApp starts and stops the fx application within the given context
+// RunApp starts and stops the fx application within the given context.
+// It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
+// ensures the application stops cleanly. The function blocks until the
+// application completes or is interrupted. Returns an error if startup fails.
 func RunApp(ctx context.Context, app *fx.App) error {
+	// Set up signal handling for graceful shutdown
+	sigChan := make(chan os.Signal, 1)
+	signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
+
+	// Create a context that will be cancelled on signal
+	ctx, cancel := context.WithCancel(ctx)
+	defer cancel()
+
+	// Start the app
 	if err := app.Start(ctx); err != nil {
 		return fmt.Errorf("failed to start app: %w", err)
 	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
+
+	// Handle shutdown
+	shutdownComplete := make(chan struct{})
+	go func() {
+		defer close(shutdownComplete)
+		<-sigChan
+		log.Notice("Received interrupt signal, shutting down gracefully...")
+
+		// Create a timeout context for shutdown
+		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
+		defer shutdownCancel()
+
+		if err := app.Stop(shutdownCtx); err != nil {
+			log.Error("Error during shutdown", "error", err)
 		}
 	}()

-	// Wait for context cancellation
-	<-ctx.Done()
-	return nil
+	// Wait for either the signal handler to complete shutdown or the app to request shutdown
+	select {
+	case <-shutdownComplete:
+		// Shutdown completed via signal
+		return nil
+	case <-ctx.Done():
+		// Context cancelled (shouldn't happen in normal operation)
+		if err := app.Stop(context.Background()); err != nil {
+			log.Error("Error stopping app", "error", err)
+		}
+		return ctx.Err()
+	case <-app.Done():
+		// App finished running (e.g., backup completed)
+		return nil
+	}
 }

-// RunWithApp is a helper that creates and runs an fx app with the given options
+// RunWithApp is a helper that creates and runs an fx app with the given options.
+// It combines NewApp and RunApp into a single convenient function. This is the
+// preferred way to run CLI commands that need the full application context.
+// It acquires a PID lock before starting to prevent concurrent instances.
 func RunWithApp(ctx context.Context, opts AppOptions) error {
+	// Acquire PID lock to prevent concurrent instances
+	lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
+	lock, err := pidlock.Acquire(lockDir)
+	if err != nil {
+		if errors.Is(err, pidlock.ErrAlreadyRunning) {
+			return fmt.Errorf("cannot start: %w", err)
+		}
+		return fmt.Errorf("failed to acquire lock: %w", err)
+	}
+	defer func() {
+		if err := lock.Release(); err != nil {
+			log.Warn("Failed to release PID lock", "error", err)
+		}
+	}()
+
 	app := NewApp(opts)
 	return RunApp(ctx, app)
 }
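The new RunApp waits on three channels at once: the signal handler finishing a graceful stop, the caller's context being cancelled, and the app completing its own work. A minimal, stdlib-only sketch of that three-way select (no fx; channel names mirror the diff, and here we simulate the app finishing on its own):

```go
package main

import "fmt"

func main() {
	shutdownComplete := make(chan struct{}) // closed by the signal-handler goroutine
	appDone := make(chan struct{})          // closed when the app's work is done
	close(appDone)                          // pretend the run (e.g. a backup) completed

	// Whichever channel is ready first decides the return path.
	select {
	case <-shutdownComplete:
		fmt.Println("shut down via signal")
	case <-appDone:
		fmt.Println("app finished")
	}
}
```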
@@ -1,83 +0,0 @@ (file deleted)
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// BackupOptions contains options for the backup command
type BackupOptions struct {
	ConfigPath string
	Daemon     bool
	Cron       bool
	Prune      bool
}

// NewBackupCommand creates the backup command
func NewBackupCommand() *cobra.Command {
	opts := &BackupOptions{}

	cmd := &cobra.Command{
		Use:   "backup",
		Short: "Perform incremental backup",
		Long: `Backup configured directories using incremental deduplication and encryption.

Config is located at /etc/vaultik/config.yml, but can be overridden by specifying
a path using --config or by setting VAULTIK_CONFIG to a path.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// If --config not specified, check environment variable
			if opts.ConfigPath == "" {
				opts.ConfigPath = os.Getenv("VAULTIK_CONFIG")
			}
			// If still not specified, use default
			if opts.ConfigPath == "" {
				defaultConfig := "/etc/vaultik/config.yml"
				if _, err := os.Stat(defaultConfig); err == nil {
					opts.ConfigPath = defaultConfig
				} else {
					return fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultConfig)
				}
			}
			return runBackup(cmd.Context(), opts)
		},
	}

	cmd.Flags().StringVar(&opts.ConfigPath, "config", "", "Path to config file")
	cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
	cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
	cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")

	return cmd
}

func runBackup(ctx context.Context, opts *BackupOptions) error {
	return RunWithApp(ctx, AppOptions{
		ConfigPath: opts.ConfigPath,
		Invokes: []fx.Option{
			fx.Invoke(func(g *globals.Globals, cfg *config.Config, repos *database.Repositories) error {
				// TODO: Implement backup logic
				fmt.Printf("Running backup with config: %s\n", opts.ConfigPath)
				fmt.Printf("Version: %s, Commit: %s\n", g.Version, g.Commit)
				fmt.Printf("Index path: %s\n", cfg.IndexPath)
				if opts.Daemon {
					fmt.Println("Running in daemon mode")
				}
				if opts.Cron {
					fmt.Println("Running in cron mode")
				}
				if opts.Prune {
					fmt.Println("Pruning enabled - will delete old snapshots after backup")
				}
				return nil
			}),
		},
	})
}
internal/cli/database.go (new file, 102 lines)
@@ -0,0 +1,102 @@
package cli

import (
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/spf13/cobra"
)

// NewDatabaseCommand creates the database command group
func NewDatabaseCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "database",
		Short: "Manage the local state database",
		Long:  `Commands for managing the local SQLite state database.`,
	}

	cmd.AddCommand(
		newDatabasePurgeCommand(),
	)

	return cmd
}

// newDatabasePurgeCommand creates the database purge command
func newDatabasePurgeCommand() *cobra.Command {
	var force bool

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Delete the local state database",
		Long: `Completely removes the local SQLite state database.

This will erase all local tracking of:
  - File metadata and change detection state
  - Chunk and blob mappings
  - Local snapshot records

The remote storage is NOT affected. After purging, the next backup will
perform a full scan and re-deduplicate against existing remote blobs.

Use --force to skip the confirmation prompt.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Resolve config path
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Load config to get database path
			cfg, err := config.Load(configPath)
			if err != nil {
				return fmt.Errorf("failed to load config: %w", err)
			}

			dbPath := cfg.IndexPath

			// Check if database exists
			if _, err := os.Stat(dbPath); os.IsNotExist(err) {
				fmt.Printf("Database does not exist: %s\n", dbPath)
				return nil
			}

			// Confirm unless --force
			if !force {
				fmt.Printf("This will delete the local state database at:\n  %s\n\n", dbPath)
				fmt.Print("Are you sure? Type 'yes' to confirm: ")
				var confirm string
				if _, err := fmt.Scanln(&confirm); err != nil || confirm != "yes" {
					fmt.Println("Aborted.")
					return nil
				}
			}

			// Delete the database file
			if err := os.Remove(dbPath); err != nil {
				return fmt.Errorf("failed to delete database: %w", err)
			}

			// Also delete WAL and SHM files if they exist
			walPath := dbPath + "-wal"
			shmPath := dbPath + "-shm"
			_ = os.Remove(walPath) // Ignore errors - files may not exist
			_ = os.Remove(shmPath)

			rootFlags := GetRootFlags()
			if !rootFlags.Quiet {
				fmt.Printf("Database purged: %s\n", dbPath)
			}

			log.Info("Local state database purged", "path", dbPath)
			return nil
		},
	}

	cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

	return cmd
}
internal/cli/duration.go (new file, 94 lines)
@@ -0,0 +1,94 @@
package cli

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// parseDuration parses duration strings. Supports standard Go duration format
// (e.g., "3h30m", "1h45m30s") as well as extended units:
//   - d: days (e.g., "30d", "7d")
//   - w: weeks (e.g., "2w", "4w")
//   - mo: months (30 days) (e.g., "6mo", "1mo")
//   - y: years (365 days) (e.g., "1y", "2y")
//
// Units can be combined: "1y6mo", "2w3d", "1d12h30m"
func parseDuration(s string) (time.Duration, error) {
	// First try standard Go duration parsing
	if d, err := time.ParseDuration(s); err == nil {
		return d, nil
	}

	// Extended duration parsing
	// Check for negative values
	if strings.HasPrefix(strings.TrimSpace(s), "-") {
		return 0, fmt.Errorf("negative durations are not supported")
	}

	// Pattern matches: number + unit, repeated
	re := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-zA-Z]+)`)
	matches := re.FindAllStringSubmatch(s, -1)

	if len(matches) == 0 {
		return 0, fmt.Errorf("invalid duration format: %q", s)
	}

	var total time.Duration

	for _, match := range matches {
		valueStr := match[1]
		unit := strings.ToLower(match[2])

		value, err := strconv.ParseFloat(valueStr, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid number %q: %w", valueStr, err)
		}

		var d time.Duration
		switch unit {
		// Standard time units
		case "ns", "nanosecond", "nanoseconds":
			d = time.Duration(value)
		case "us", "µs", "microsecond", "microseconds":
			d = time.Duration(value * float64(time.Microsecond))
		case "ms", "millisecond", "milliseconds":
			d = time.Duration(value * float64(time.Millisecond))
		case "s", "sec", "second", "seconds":
			d = time.Duration(value * float64(time.Second))
		case "m", "min", "minute", "minutes":
			d = time.Duration(value * float64(time.Minute))
		case "h", "hr", "hour", "hours":
			d = time.Duration(value * float64(time.Hour))
		// Extended units
		case "d", "day", "days":
			d = time.Duration(value * float64(24*time.Hour))
		case "w", "week", "weeks":
			d = time.Duration(value * float64(7*24*time.Hour))
		case "mo", "month", "months":
			// Using 30 days as approximation
			d = time.Duration(value * float64(30*24*time.Hour))
		case "y", "year", "years":
			// Using 365 days as approximation
			d = time.Duration(value * float64(365*24*time.Hour))
		default:
			// Try parsing as standard Go duration unit
			testStr := fmt.Sprintf("1%s", unit)
			if _, err := time.ParseDuration(testStr); err == nil {
				// It's a valid Go duration unit, parse the full value
				fullStr := fmt.Sprintf("%g%s", value, unit)
				if d, err = time.ParseDuration(fullStr); err != nil {
					return 0, fmt.Errorf("invalid duration %q: %w", fullStr, err)
				}
			} else {
				return 0, fmt.Errorf("unknown time unit %q", unit)
			}
		}

		total += d
	}

	return total, nil
}
internal/cli/duration_test.go (new file, 263 lines)
@@ -0,0 +1,263 @@
package cli

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestParseDuration(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected time.Duration
		wantErr  bool
	}{
		// Standard Go durations
		{name: "standard seconds", input: "30s", expected: 30 * time.Second},
		{name: "standard minutes", input: "45m", expected: 45 * time.Minute},
		{name: "standard hours", input: "2h", expected: 2 * time.Hour},
		{name: "standard combined", input: "3h30m", expected: 3*time.Hour + 30*time.Minute},
		{name: "standard complex", input: "1h45m30s", expected: 1*time.Hour + 45*time.Minute + 30*time.Second},
		{name: "standard with milliseconds", input: "1s500ms", expected: 1*time.Second + 500*time.Millisecond},
		// Extended units - days
		{name: "single day", input: "1d", expected: 24 * time.Hour},
		{name: "multiple days", input: "7d", expected: 7 * 24 * time.Hour},
		{name: "fractional days", input: "1.5d", expected: 36 * time.Hour},
		{name: "days spelled out", input: "3days", expected: 3 * 24 * time.Hour},
		// Extended units - weeks
		{name: "single week", input: "1w", expected: 7 * 24 * time.Hour},
		{name: "multiple weeks", input: "4w", expected: 4 * 7 * 24 * time.Hour},
		{name: "weeks spelled out", input: "2weeks", expected: 2 * 7 * 24 * time.Hour},
		// Extended units - months
		{name: "single month", input: "1mo", expected: 30 * 24 * time.Hour},
		{name: "multiple months", input: "6mo", expected: 6 * 30 * 24 * time.Hour},
		{name: "months spelled out", input: "3months", expected: 3 * 30 * 24 * time.Hour},
		// Extended units - years
		{name: "single year", input: "1y", expected: 365 * 24 * time.Hour},
		{name: "multiple years", input: "2y", expected: 2 * 365 * 24 * time.Hour},
		{name: "years spelled out", input: "1year", expected: 365 * 24 * time.Hour},
		// Combined extended units
		{name: "weeks and days", input: "2w3d", expected: 2*7*24*time.Hour + 3*24*time.Hour},
		{name: "years and months", input: "1y6mo", expected: 365*24*time.Hour + 6*30*24*time.Hour},
		{name: "days and hours", input: "1d12h", expected: 24*time.Hour + 12*time.Hour},
		{name: "complex combination", input: "1y2mo3w4d5h6m7s", expected: 365*24*time.Hour + 2*30*24*time.Hour + 3*7*24*time.Hour + 4*24*time.Hour + 5*time.Hour + 6*time.Minute + 7*time.Second},
		{name: "with spaces", input: "1d 12h 30m", expected: 24*time.Hour + 12*time.Hour + 30*time.Minute},
		// Edge cases
		{name: "zero duration", input: "0s", expected: 0},
		{name: "large duration", input: "10y", expected: 10 * 365 * 24 * time.Hour},
		// Error cases
		{name: "empty string", input: "", wantErr: true},
		{name: "invalid format", input: "abc", wantErr: true},
		{name: "unknown unit", input: "5x", wantErr: true},
		{name: "invalid number", input: "xyzd", wantErr: true},
		{name: "negative not supported", input: "-5d", wantErr: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := parseDuration(tt.input)

			if tt.wantErr {
				assert.Error(t, err, "expected error for input %q", tt.input)
				return
			}

			assert.NoError(t, err, "unexpected error for input %q", tt.input)
			assert.Equal(t, tt.expected, got, "duration mismatch for input %q", tt.input)
		})
	}
}

func TestParseDurationSpecialCases(t *testing.T) {
	// Test that standard Go durations work exactly as expected
	standardDurations := []string{
		"300ms",
		"1.5h",
		"2h45m",
		"72h",
		"1us",
		"1µs",
		"1ns",
	}

	for _, d := range standardDurations {
		expected, err := time.ParseDuration(d)
		assert.NoError(t, err)

		got, err := parseDuration(d)
		assert.NoError(t, err)
		assert.Equal(t, expected, got, "standard duration %q should parse identically", d)
	}
}

func TestParseDurationRealWorldExamples(t *testing.T) {
	// Test real-world snapshot purge scenarios
	tests := []struct {
		description string
		input       string
		olderThan   time.Duration
	}{
		{description: "keep snapshots from last 30 days", input: "30d", olderThan: 30 * 24 * time.Hour},
		{description: "keep snapshots from last 6 months", input: "6mo", olderThan: 6 * 30 * 24 * time.Hour},
		{description: "keep snapshots from last year", input: "1y", olderThan: 365 * 24 * time.Hour},
		{description: "keep snapshots from last week and a half", input: "1w3d", olderThan: 10 * 24 * time.Hour},
		{description: "keep snapshots from last 90 days", input: "90d", olderThan: 90 * 24 * time.Hour},
	}

	for _, tt := range tests {
		t.Run(tt.description, func(t *testing.T) {
			got, err := parseDuration(tt.input)
			assert.NoError(t, err)
			assert.Equal(t, tt.olderThan, got)

			// Verify the duration makes sense for snapshot purging
			assert.Greater(t, got, time.Hour, "snapshot purge duration should be at least an hour")
		})
	}
}
@@ -4,7 +4,9 @@ import (
 	"os"
 )
 
-// CLIEntry is the main entry point for the CLI application
+// CLIEntry is the main entry point for the CLI application.
+// It creates the root command, executes it, and exits with status 1
+// if an error occurs. This function should be called from main().
 func CLIEntry() {
 	rootCmd := NewRootCommand()
 	if err := rootCmd.Execute(); err != nil {

@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
 	}
 
 	// Verify all subcommands are registered
-	expectedCommands := []string{"backup", "restore", "prune", "verify", "fetch"}
+	expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
 	for _, expected := range expectedCommands {
 		found := false
 		for _, cmd := range cmd.Commands() {
@@ -32,19 +32,24 @@ func TestCLIEntry(t *testing.T) {
 		}
 	}
 
-	// Verify backup command has proper flags
-	backupCmd, _, err := cmd.Find([]string{"backup"})
+	// Verify snapshot command has subcommands
+	snapshotCmd, _, err := cmd.Find([]string{"snapshot"})
 	if err != nil {
-		t.Errorf("Failed to find backup command: %v", err)
+		t.Errorf("Failed to find snapshot command: %v", err)
 	} else {
-		if backupCmd.Flag("config") == nil {
-			t.Error("Backup command missing --config flag")
-		}
-		if backupCmd.Flag("daemon") == nil {
-			t.Error("Backup command missing --daemon flag")
-		}
-		if backupCmd.Flag("cron") == nil {
-			t.Error("Backup command missing --cron flag")
+		// Check snapshot subcommands
+		expectedSubCommands := []string{"create", "list", "purge", "verify"}
+		for _, expected := range expectedSubCommands {
+			found := false
+			for _, subcmd := range snapshotCmd.Commands() {
+				if subcmd.Use == expected || subcmd.Name() == expected {
+					found = true
+					break
+				}
+			}
+			if !found {
+				t.Errorf("Expected snapshot subcommand '%s' not found", expected)
+			}
 		}
 	}
 }
|||||||
@@ -1,88 +0,0 @@
|
|||||||
package cli
|
|
||||||
|
|
||||||
import (
|
|
||||||
"context"
|
|
||||||
"fmt"
|
|
||||||
"os"
|
|
||||||
|
|
||||||
"git.eeqj.de/sneak/vaultik/internal/globals"
|
|
||||||
"github.com/spf13/cobra"
|
|
||||||
"go.uber.org/fx"
|
|
||||||
)
|
|
||||||
|
|
||||||
// FetchOptions contains options for the fetch command
|
|
||||||
type FetchOptions struct {
|
|
||||||
Bucket string
|
|
||||||
Prefix string
|
|
||||||
SnapshotID string
|
|
||||||
FilePath string
|
|
||||||
Target string
|
|
||||||
}
|
|
||||||
|
|
||||||
// NewFetchCommand creates the fetch command
|
|
||||||
func NewFetchCommand() *cobra.Command {
|
|
||||||
opts := &FetchOptions{}
|
|
||||||
|
|
||||||
cmd := &cobra.Command{
|
|
||||||
Use: "fetch",
|
|
||||||
Short: "Extract single file from backup",
|
|
||||||
Long: `Download and decrypt a single file from a backup snapshot`,
|
|
||||||
Args: cobra.NoArgs,
|
|
||||||
RunE: func(cmd *cobra.Command, args []string) error {
|
|
||||||
// Validate required flags
|
|
||||||
if opts.Bucket == "" {
|
|
||||||
return fmt.Errorf("--bucket is required")
|
|
||||||
}
|
|
||||||
if opts.Prefix == "" {
|
|
||||||
return fmt.Errorf("--prefix is required")
|
|
||||||
}
|
|
||||||
if opts.SnapshotID == "" {
|
|
||||||
return fmt.Errorf("--snapshot is required")
|
|
||||||
}
|
|
||||||
if opts.FilePath == "" {
|
|
||||||
return fmt.Errorf("--file is required")
|
|
||||||
}
|
|
||||||
if opts.Target == "" {
|
|
||||||
return fmt.Errorf("--target is required")
|
|
||||||
}
|
|
||||||
return runFetch(cmd.Context(), opts)
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
|
|
||||||
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
|
|
||||||
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID")
|
|
||||||
cmd.Flags().StringVar(&opts.FilePath, "file", "", "Path of file to extract from backup")
|
|
||||||
cmd.Flags().StringVar(&opts.Target, "target", "", "Target path for extracted file")
|
|
||||||
|
|
||||||
return cmd
|
|
||||||
}
|
|
||||||
|
|
||||||
func runFetch(ctx context.Context, opts *FetchOptions) error {
|
|
||||||
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
|
|
||||||
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
|
|
||||||
}
|
|
||||||
|
|
||||||
app := fx.New(
|
|
||||||
fx.Supply(opts),
|
|
||||||
fx.Provide(globals.New),
|
|
||||||
// Additional modules will be added here
|
|
||||||
fx.Invoke(func(g *globals.Globals) error {
|
|
||||||
// TODO: Implement fetch logic
|
|
||||||
fmt.Printf("Fetching %s from snapshot %s to %s\n", opts.FilePath, opts.SnapshotID, opts.Target)
|
|
||||||
return nil
|
|
||||||
}),
|
|
||||||
fx.NopLogger,
|
|
||||||
)
|
|
||||||
|
|
||||||
if err := app.Start(ctx); err != nil {
|
|
||||||
return fmt.Errorf("failed to start fetch: %w", err)
|
|
||||||
}
|
|
||||||
defer func() {
|
|
||||||
if err := app.Stop(ctx); err != nil {
|
|
||||||
fmt.Printf("error stopping app: %v\n", err)
|
|
||||||
}
|
|
||||||
}()
|
|
||||||
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
71 internal/cli/info.go Normal file
@@ -0,0 +1,71 @@
+package cli
+
+import (
+	"context"
+	"os"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
+	"github.com/spf13/cobra"
+	"go.uber.org/fx"
+)
+
+// NewInfoCommand creates the info command
+func NewInfoCommand() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "info",
+		Short: "Display system and configuration information",
+		Long: `Shows information about the current vaultik configuration, including:
+- System details (OS, architecture, version)
+- Storage configuration (S3 bucket, endpoint)
+- Backup settings (source directories, compression)
+- Encryption configuration (recipients)
+- Local database statistics`,
+		Args: cobra.NoArgs,
+		RunE: func(cmd *cobra.Command, args []string) error {
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								go func() {
+									if err := v.ShowInfo(); err != nil {
+										if err != context.Canceled {
+											log.Error("Failed to show info", "error", err)
+											os.Exit(1)
+										}
+									}
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
+		},
+	}
+
+	return cmd
+}
@@ -2,77 +2,83 @@ package cli
 
 import (
 	"context"
-	"fmt"
 	"os"
 
-	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )
 
-// PruneOptions contains options for the prune command
-type PruneOptions struct {
-	Bucket string
-	Prefix string
-	DryRun bool
-}
-
 // NewPruneCommand creates the prune command
 func NewPruneCommand() *cobra.Command {
-	opts := &PruneOptions{}
+	opts := &vaultik.PruneOptions{}
 
 	cmd := &cobra.Command{
 		Use:   "prune",
 		Short: "Remove unreferenced blobs",
-		Long:  `Delete blobs that are no longer referenced by any snapshot`,
+		Long: `Removes blobs that are not referenced by any snapshot.
+
+This command scans all snapshots and their manifests to build a list of
+referenced blobs, then removes any blobs in storage that are not in this list.
+
+Use this command after deleting snapshots with 'vaultik purge' to reclaim
+storage space.`,
 		Args: cobra.NoArgs,
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			return runPrune(cmd.Context(), opts)
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet || opts.JSON,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the prune operation in a goroutine
+								go func() {
+									// Run the prune operation
+									if err := v.PruneBlobs(opts); err != nil {
+										if err != context.Canceled {
+											if !opts.JSON {
+												log.Error("Prune operation failed", "error", err)
+											}
+											os.Exit(1)
+										}
+									}
+
+									// Shutdown the app when prune completes
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping prune operation")
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
 		},
 	}
 
-	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
-	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
-	cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without actually deleting")
+	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
+	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")
 
 	return cmd
 }
-
-func runPrune(ctx context.Context, opts *PruneOptions) error {
-	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
-		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
-	}
-
-	app := fx.New(
-		fx.Supply(opts),
-		fx.Provide(globals.New),
-		// Additional modules will be added here
-		fx.Invoke(func(g *globals.Globals) error {
-			// TODO: Implement prune logic
-			fmt.Printf("Pruning bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
-			if opts.DryRun {
-				fmt.Println("Running in dry-run mode")
-			}
-			return nil
-		}),
-		fx.NopLogger,
-	)
-
-	if err := app.Start(ctx); err != nil {
-		return fmt.Errorf("failed to start prune: %w", err)
-	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
-
-	return nil
-}
100 internal/cli/purge.go Normal file
@@ -0,0 +1,100 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+	"os"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
+	"github.com/spf13/cobra"
+	"go.uber.org/fx"
+)
+
+// PurgeOptions contains options for the purge command
+type PurgeOptions struct {
+	KeepLatest bool
+	OlderThan  string
+	Force      bool
+}
+
+// NewPurgeCommand creates the purge command
+func NewPurgeCommand() *cobra.Command {
+	opts := &PurgeOptions{}
+
+	cmd := &cobra.Command{
+		Use:   "purge",
+		Short: "Purge old snapshots",
+		Long: `Removes snapshots based on age or count criteria.
+
+This command allows you to:
+- Keep only the latest snapshot (--keep-latest)
+- Remove snapshots older than a specific duration (--older-than)
+
+Config is located at /etc/vaultik/config.yml by default, but can be overridden by
+specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
+		Args: cobra.NoArgs,
+		RunE: func(cmd *cobra.Command, args []string) error {
+			// Validate flags
+			if !opts.KeepLatest && opts.OlderThan == "" {
+				return fmt.Errorf("must specify either --keep-latest or --older-than")
+			}
+			if opts.KeepLatest && opts.OlderThan != "" {
+				return fmt.Errorf("cannot specify both --keep-latest and --older-than")
+			}
+
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the purge operation in a goroutine
+								go func() {
+									// Run the purge operation
+									if err := v.PurgeSnapshots(opts.KeepLatest, opts.OlderThan, opts.Force); err != nil {
+										if err != context.Canceled {
+											log.Error("Purge operation failed", "error", err)
+											os.Exit(1)
+										}
+									}
+
+									// Shutdown the app when purge completes
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping purge operation")
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
+		},
+	}
+
+	cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot")
+	cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6m, 1y)")
+	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")
+
+	return cmd
+}
89 internal/cli/remote.go Normal file
@@ -0,0 +1,89 @@
+package cli
+
+import (
+	"context"
+	"os"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
+	"github.com/spf13/cobra"
+	"go.uber.org/fx"
+)
+
+// NewRemoteCommand creates the remote command and subcommands
+func NewRemoteCommand() *cobra.Command {
+	cmd := &cobra.Command{
+		Use:   "remote",
+		Short: "Remote storage management commands",
+		Long:  "Commands for inspecting and managing remote storage",
+	}
+
+	// Add subcommands
+	cmd.AddCommand(newRemoteInfoCommand())
+
+	return cmd
+}
+
+// newRemoteInfoCommand creates the 'remote info' subcommand
+func newRemoteInfoCommand() *cobra.Command {
+	var jsonOutput bool
+
+	cmd := &cobra.Command{
+		Use:   "info",
+		Short: "Display remote storage information",
+		Long: `Shows detailed information about remote storage, including:
+- Size of all snapshot metadata (per snapshot and total)
+- Count and total size of all blobs
+- Count and size of referenced blobs (from all manifests)
+- Count and size of orphaned blobs (not referenced by any manifest)`,
+		Args: cobra.NoArgs,
+		RunE: func(cmd *cobra.Command, args []string) error {
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet || jsonOutput,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								go func() {
+									if err := v.RemoteInfo(jsonOutput); err != nil {
+										if err != context.Canceled {
+											if !jsonOutput {
+												log.Error("Failed to get remote info", "error", err)
+											}
+											os.Exit(1)
+										}
+									}
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
+		},
+	}
+
+	cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")
+
+	return cmd
+}
@@ -2,20 +2,30 @@ package cli
 
 import (
 	"context"
-	"fmt"
-	"os"
 
+	"git.eeqj.de/sneak/vaultik/internal/config"
 	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/storage"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )
 
 // RestoreOptions contains options for the restore command
 type RestoreOptions struct {
-	Bucket     string
-	Prefix     string
-	SnapshotID string
 	TargetDir  string
+	Paths      []string // Optional paths to restore (empty = all)
+	Verify     bool     // Verify restored files after restore
+}
+
+// RestoreApp contains all dependencies needed for restore
+type RestoreApp struct {
+	Globals    *globals.Globals
+	Config     *config.Config
+	Storage    storage.Storer
+	Vaultik    *vaultik.Vaultik
+	Shutdowner fx.Shutdowner
 }
 
 // NewRestoreCommand creates the restore command
@@ -23,61 +33,104 @@ func NewRestoreCommand() *cobra.Command {
 	opts := &RestoreOptions{}
 
 	cmd := &cobra.Command{
-		Use:   "restore",
+		Use:   "restore <snapshot-id> <target-dir> [paths...]",
 		Short: "Restore files from backup",
-		Long:  `Download and decrypt files from a backup snapshot`,
-		Args:  cobra.NoArgs,
+		Long: `Download and decrypt files from a backup snapshot.
+
+This command will restore files from the specified snapshot to the target directory.
+If no paths are specified, all files are restored.
+If paths are specified, only matching files/directories are restored.
+
+Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
+
+Examples:
+  # Restore entire snapshot
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
+
+  # Restore specific file
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
+
+  # Restore specific directory
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
+
+  # Restore and verify all files
+  vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
+		Args: cobra.MinimumNArgs(2),
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			if opts.SnapshotID == "" {
-				return fmt.Errorf("--snapshot is required")
-			}
-			if opts.TargetDir == "" {
-				return fmt.Errorf("--target is required")
-			}
-			return runRestore(cmd.Context(), opts)
+			snapshotID := args[0]
+			opts.TargetDir = args[1]
+			if len(args) > 2 {
+				opts.Paths = args[2:]
+			}
+
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet,
+				},
+				Modules: []fx.Option{
+					fx.Provide(fx.Annotate(
+						func(g *globals.Globals, cfg *config.Config,
+							storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
+							return &RestoreApp{
+								Globals:    g,
+								Config:     cfg,
+								Storage:    storer,
+								Vaultik:    v,
+								Shutdowner: shutdowner,
+							}
+						},
+					)),
+				},
+				Invokes: []fx.Option{
+					fx.Invoke(func(app *RestoreApp, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the restore operation in a goroutine
+								go func() {
+									// Run the restore operation
+									restoreOpts := &vaultik.RestoreOptions{
+										SnapshotID: snapshotID,
+										TargetDir:  opts.TargetDir,
+										Paths:      opts.Paths,
+										Verify:     opts.Verify,
+									}
+									if err := app.Vaultik.Restore(restoreOpts); err != nil {
+										if err != context.Canceled {
+											log.Error("Restore operation failed", "error", err)
+										}
+									}
+
+									// Shutdown the app when restore completes
+									if err := app.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping restore operation")
+								app.Vaultik.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
 		},
 	}
 
-	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
-	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
-	cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to restore")
-	cmd.Flags().StringVar(&opts.TargetDir, "target", "", "Target directory for restore")
+	cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
 
 	return cmd
 }
-
-func runRestore(ctx context.Context, opts *RestoreOptions) error {
-	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
-		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
-	}
-
-	app := fx.New(
-		fx.Supply(opts),
-		fx.Provide(globals.New),
-		// Additional modules will be added here
-		fx.Invoke(func(g *globals.Globals) error {
-			// TODO: Implement restore logic
-			fmt.Printf("Restoring snapshot %s to %s\n", opts.SnapshotID, opts.TargetDir)
-			return nil
-		}),
-		fx.NopLogger,
-	)
-
-	if err := app.Start(ctx); err != nil {
-		return fmt.Errorf("failed to start restore: %w", err)
-	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
-
-	return nil
-}
@@ -1,10 +1,26 @@
|
|||||||
package cli
|
package cli
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"fmt"
|
||||||
|
"os"
|
||||||
|
|
||||||
"github.com/spf13/cobra"
|
"github.com/spf13/cobra"
|
||||||
)
|
)
|
||||||
|
|
||||||
// NewRootCommand creates the root cobra command
|
// RootFlags holds global flags that apply to all commands.
|
||||||
|
// These flags are defined on the root command and inherited by all subcommands.
|
||||||
|
type RootFlags struct {
|
||||||
|
ConfigPath string
|
||||||
|
Verbose bool
|
||||||
|
Debug bool
|
||||||
|
Quiet bool
|
||||||
|
}
|
||||||
|
|
||||||
|
var rootFlags RootFlags
|
||||||
|
|
||||||
|
// NewRootCommand creates the root cobra command for the vaultik CLI.
|
||||||
|
// It sets up the command structure, global flags, and adds all subcommands.
|
||||||
|
// This is the main entry point for the CLI command hierarchy.
|
||||||
func NewRootCommand() *cobra.Command {
|
func NewRootCommand() *cobra.Command {
|
||||||
cmd := &cobra.Command{
|
cmd := &cobra.Command{
|
||||||
Use: "vaultik",
|
Use: "vaultik",
|
||||||
@@ -15,15 +31,54 @@ on the source system.`,
|
|||||||
SilenceUsage: true,
|
SilenceUsage: true,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Add global flags
|
||||||
|
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
|
||||||
|
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
|
||||||
|
	cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
	cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")

	// Add subcommands
	cmd.AddCommand(
		NewRestoreCommand(),
		NewPruneCommand(),
		NewVerifyCommand(),
		NewStoreCommand(),
		NewSnapshotCommand(),
		NewInfoCommand(),
		NewVersionCommand(),
		NewRemoteCommand(),
		NewDatabaseCommand(),
	)

	return cmd
}

// GetRootFlags returns the global flags that were parsed from the command line.
// This allows subcommands to access global flag values like verbosity and config path.
func GetRootFlags() RootFlags {
	return rootFlags
}

// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Returns an error if no valid
// config file can be found through any of these methods.
func ResolveConfigPath() (string, error) {
	// First check global flag
	if rootFlags.ConfigPath != "" {
		return rootFlags.ConfigPath, nil
	}

	// Then check environment variable
	if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
		return envPath, nil
	}

	// Finally check default location
	defaultPath := "/etc/vaultik/config.yml"
	if _, err := os.Stat(defaultPath); err == nil {
		return defaultPath, nil
	}

	return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}
@@ -1,90 +1,467 @@
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewSnapshotCommand creates the snapshot command and subcommands
func NewSnapshotCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "snapshot",
		Short: "Snapshot management commands",
		Long:  "Commands for creating, listing, and managing snapshots",
	}

	// Add subcommands
	cmd.AddCommand(newSnapshotCreateCommand())
	cmd.AddCommand(newSnapshotListCommand())
	cmd.AddCommand(newSnapshotPurgeCommand())
	cmd.AddCommand(newSnapshotVerifyCommand())
	cmd.AddCommand(newSnapshotRemoveCommand())
	cmd.AddCommand(newSnapshotPruneCommand())

	return cmd
}

// newSnapshotCreateCommand creates the 'snapshot create' subcommand
func newSnapshotCreateCommand() *cobra.Command {
	opts := &vaultik.SnapshotCreateOptions{}

	cmd := &cobra.Command{
		Use:   "create [snapshot-names...]",
		Short: "Create new snapshots",
		Long: `Creates new snapshots of the configured directories.

If snapshot names are provided, only those snapshots are created.
If no names are provided, all configured snapshots are created.

Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
		Args: cobra.ArbitraryArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Pass snapshot names from args
			opts.Snapshots = args
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the backup functionality from cli package
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Cron:    opts.Cron,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Start the snapshot creation in a goroutine
								go func() {
									// Run the snapshot creation
									if err := v.CreateSnapshot(opts); err != nil {
										if err != context.Canceled {
											log.Error("Snapshot creation failed", "error", err)
										}
									}

									// Shutdown the app when snapshot completes
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping snapshot creation")
								// Cancel the Vaultik context
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
	cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
	cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
	cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")

	return cmd
}

// newSnapshotListCommand creates the 'snapshot list' subcommand
func newSnapshotListCommand() *cobra.Command {
	var jsonOutput bool

	cmd := &cobra.Command{
		Use:     "list",
		Aliases: []string{"ls"},
		Short:   "List all snapshots",
		Long:    "Lists all snapshots with their ID, timestamp, and compressed size",
		Args:    cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.ListSnapshots(jsonOutput); err != nil {
										if err != context.Canceled {
											log.Error("Failed to list snapshots", "error", err)
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")

	return cmd
}

// newSnapshotPurgeCommand creates the 'snapshot purge' subcommand
func newSnapshotPurgeCommand() *cobra.Command {
	var keepLatest bool
	var olderThan string
	var force bool

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Purge old snapshots",
		Long:  "Removes snapshots based on age or count criteria",
		Args:  cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate flags
			if !keepLatest && olderThan == "" {
				return fmt.Errorf("must specify either --keep-latest or --older-than")
			}
			if keepLatest && olderThan != "" {
				return fmt.Errorf("cannot specify both --keep-latest and --older-than")
			}

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.PurgeSnapshots(keepLatest, olderThan, force); err != nil {
										if err != context.Canceled {
											log.Error("Failed to purge snapshots", "error", err)
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&keepLatest, "keep-latest", false, "Keep only the latest snapshot")
	cmd.Flags().StringVar(&olderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
	cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

	return cmd
}

// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
	opts := &vaultik.VerifyOptions{}

	cmd := &cobra.Command{
		Use:   "verify <snapshot-id>",
		Short: "Verify snapshot integrity",
		Long:  "Verifies that all blobs referenced in a snapshot exist",
		Args: func(cmd *cobra.Command, args []string) error {
			if len(args) != 1 {
				_ = cmd.Help()
				if len(args) == 0 {
					return fmt.Errorf("snapshot ID required")
				}
				return fmt.Errorf("expected 1 argument, got %d", len(args))
			}
			return nil
		},
		RunE: func(cmd *cobra.Command, args []string) error {
			snapshotID := args[0]

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || opts.JSON,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									var err error
									if opts.Deep {
										err = v.RunDeepVerify(snapshotID, opts)
									} else {
										err = v.VerifySnapshotWithOptions(snapshotID, opts)
									}
									if err != nil {
										if err != context.Canceled {
											if !opts.JSON {
												log.Error("Verification failed", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Download and verify blob hashes")
	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

	return cmd
}

// newSnapshotRemoveCommand creates the 'snapshot remove' subcommand
func newSnapshotRemoveCommand() *cobra.Command {
	opts := &vaultik.RemoveOptions{}

	cmd := &cobra.Command{
		Use:     "remove [snapshot-id]",
		Aliases: []string{"rm"},
		Short:   "Remove a snapshot from the local database",
		Long: `Removes a snapshot from the local database.

By default, only removes from the local database. Use --remote to also remove
the snapshot metadata from remote storage.

Note: This does NOT remove blobs. Use 'vaultik prune' to remove orphaned blobs
after removing snapshots.

Use --all --force to remove all snapshots.`,
		Args: func(cmd *cobra.Command, args []string) error {
			all, _ := cmd.Flags().GetBool("all")
			if all {
				if len(args) > 0 {
					_ = cmd.Help()
					return fmt.Errorf("--all cannot be used with a snapshot ID")
				}
				return nil
			}
			if len(args) != 1 {
				_ = cmd.Help()
				if len(args) == 0 {
					return fmt.Errorf("snapshot ID required (or use --all --force)")
				}
				return fmt.Errorf("expected 1 argument, got %d", len(args))
			}
			return nil
		},
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || opts.JSON,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									var err error
									if opts.All {
										_, err = v.RemoveAllSnapshots(opts)
									} else {
										_, err = v.RemoveSnapshot(args[0], opts)
									}
									if err != nil {
										if err != context.Canceled {
											if !opts.JSON {
												log.Error("Failed to remove snapshot", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVarP(&opts.Force, "force", "f", false, "Skip confirmation prompt")
	cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be removed without removing")
	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output result as JSON")
	cmd.Flags().BoolVar(&opts.Remote, "remote", false, "Also remove snapshot metadata from remote storage")
	cmd.Flags().BoolVar(&opts.All, "all", false, "Remove all snapshots (requires --force)")

	return cmd
}

// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "prune",
		Short: "Remove orphaned data from local database",
		Long: `Removes orphaned files, chunks, and blobs from the local database.

This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if _, err := v.PruneDatabase(); err != nil {
										if err != context.Canceled {
											log.Error("Failed to prune database", "error", err)
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	return cmd
}
internal/cli/store.go (new file, 158 lines)
@@ -0,0 +1,158 @@
package cli

import (
	"context"
	"fmt"
	"strings"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/storage"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// StoreApp contains dependencies for store commands
type StoreApp struct {
	Storage    storage.Storer
	Shutdowner fx.Shutdowner
}

// NewStoreCommand creates the store command and subcommands
func NewStoreCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "store",
		Short: "Storage information commands",
		Long:  "Commands for viewing information about the storage backend",
	}

	// Add subcommands
	cmd.AddCommand(newStoreInfoCommand())

	return cmd
}

// newStoreInfoCommand creates the 'store info' subcommand
func newStoreInfoCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "info",
		Short: "Display storage information",
		Long:  "Shows storage configuration and statistics including snapshots and blobs",
		RunE: func(cmd *cobra.Command, args []string) error {
			return runWithApp(cmd.Context(), func(app *StoreApp) error {
				return app.Info(cmd.Context())
			})
		},
	}
}

// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
	// Get storage info
	storageInfo := app.Storage.Info()

	fmt.Printf("Storage Information\n")
	fmt.Printf("==================\n\n")
	fmt.Printf("Storage Configuration:\n")
	fmt.Printf("  Type: %s\n", storageInfo.Type)
	fmt.Printf("  Location: %s\n\n", storageInfo.Location)

	// Count snapshots by listing metadata/ prefix
	snapshotCount := 0
	snapshotCh := app.Storage.ListStream(ctx, "metadata/")
	snapshotDirs := make(map[string]bool)

	for object := range snapshotCh {
		if object.Err != nil {
			return fmt.Errorf("listing snapshots: %w", object.Err)
		}
		// Extract snapshot ID from path like metadata/2024-01-15-143052-hostname/
		parts := strings.Split(object.Key, "/")
		if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
			snapshotDirs[parts[1]] = true
		}
	}
	snapshotCount = len(snapshotDirs)

	// Count blobs and calculate total size by listing blobs/ prefix
	blobCount := 0
	var totalSize int64

	blobCh := app.Storage.ListStream(ctx, "blobs/")
	for object := range blobCh {
		if object.Err != nil {
			return fmt.Errorf("listing blobs: %w", object.Err)
		}
		if !strings.HasSuffix(object.Key, "/") { // Skip directories
			blobCount++
			totalSize += object.Size
		}
	}

	fmt.Printf("Storage Statistics:\n")
	fmt.Printf("  Snapshots: %d\n", snapshotCount)
	fmt.Printf("  Blobs: %d\n", blobCount)
	fmt.Printf("  Total Size: %s\n", formatBytes(totalSize))

	return nil
}

// formatBytes formats bytes into human-readable format
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

// runWithApp creates the FX app and runs the given function
func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
	var result error
	rootFlags := GetRootFlags()

	// Use unified config resolution
	configPath, err := ResolveConfigPath()
	if err != nil {
		return err
	}

	err = RunWithApp(ctx, AppOptions{
		ConfigPath: configPath,
		LogOptions: log.LogOptions{
			Verbose: rootFlags.Verbose,
			Debug:   rootFlags.Debug,
			Quiet:   rootFlags.Quiet,
		},
		Modules: []fx.Option{
			fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {
				return &StoreApp{
					Storage:    storer,
					Shutdowner: shutdowner,
				}
			}),
		},
		Invokes: []fx.Option{
			fx.Invoke(func(app *StoreApp, shutdowner fx.Shutdowner) {
				result = fn(app)
				// Shutdown after command completes
				go func() {
					time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
					if err := shutdowner.Shutdown(); err != nil {
						log.Error("Failed to shutdown", "error", err)
					}
				}()
			}),
		},
	})

	if err != nil {
		return err
	}
	return result
}
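The `formatBytes` helper above divides by powers of 1024 and picks a unit letter from `"KMGTPE"`. It is self-contained, so its behavior is easy to check in isolation; this sketch copies the function verbatim and exercises a few values.

```go
package main

import "fmt"

// formatBytes is copied from internal/cli/store.go: it formats a byte count
// using binary (1024-based) units with one decimal place.
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(formatBytes(512))        // 512 B (below one KiB, printed raw)
	fmt.Println(formatBytes(1536))       // 1.5 KB
	fmt.Println(formatBytes(1073741824)) // 1.0 GB
}
```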
internal/cli/vaultik_snapshot_types.go (new file, 10 lines)
@@ -0,0 +1,10 @@
package cli

import "time"

// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
	ID             string    `json:"id"`
	Timestamp      time.Time `json:"timestamp"`
	CompressedSize int64     `json:"compressed_size"`
}
@@ -2,85 +2,97 @@ package cli

import (
	"context"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewVerifyCommand creates the verify command
func NewVerifyCommand() *cobra.Command {
	opts := &vaultik.VerifyOptions{}

	cmd := &cobra.Command{
		Use:   "verify <snapshot-id>",
		Short: "Verify snapshot integrity",
		Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.

Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs

Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly

The command will fail immediately on any verification error and exit with non-zero status.`,
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			snapshotID := args[0]

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the app framework for all verification
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Run the verify operation directly
								go func() {
									var err error
									if opts.Deep {
										err = v.RunDeepVerify(snapshotID, opts)
									} else {
										err = v.VerifySnapshotWithOptions(snapshotID, opts)
									}
									if err != nil {
										if err != context.Canceled {
											if !opts.JSON {
												log.Error("Verification failed", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping verify operation")
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

	return cmd
}
internal/cli/version.go (new file, 27 lines)
@@ -0,0 +1,27 @@
package cli

import (
	"fmt"
	"runtime"

	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
)

// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "version",
		Short: "Print version information",
		Long:  `Print version, git commit, and build information for vaultik.`,
		Args:  cobra.NoArgs,
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Printf("vaultik %s\n", globals.Version)
			fmt.Printf("  commit: %s\n", globals.Commit)
			fmt.Printf("  go: %s\n", runtime.Version())
			fmt.Printf("  os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
		},
	}

	return cmd
}
@@ -3,30 +3,112 @@ package config
 import (
 	"fmt"
 	"os"
+	"path/filepath"
+	"sort"
+	"strings"
 	"time"
+
+	"filippo.io/age"
+	"git.eeqj.de/sneak/smartconfig"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"github.com/adrg/xdg"
 	"go.uber.org/fx"
 	"gopkg.in/yaml.v3"
 )
 
-// Config represents the application configuration
+const appName = "berlin.sneak.app.vaultik"
+
+// expandTilde expands ~ at the start of a path to the user's home directory.
+func expandTilde(path string) string {
+	if path == "~" {
+		home, _ := os.UserHomeDir()
+		return home
+	}
+	if strings.HasPrefix(path, "~/") {
+		home, _ := os.UserHomeDir()
+		return filepath.Join(home, path[2:])
+	}
+	return path
+}
+
+// expandTildeInURL expands ~ in file:// URLs.
+func expandTildeInURL(url string) string {
+	if strings.HasPrefix(url, "file://~/") {
+		home, _ := os.UserHomeDir()
+		return "file://" + filepath.Join(home, url[9:])
+	}
+	return url
+}
+
+// SnapshotConfig represents configuration for a named snapshot.
+// Each snapshot backs up one or more paths and can have its own exclude patterns
+// in addition to the global excludes.
+type SnapshotConfig struct {
+	Paths   []string `yaml:"paths"`
+	Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
+}
+
+// GetExcludes returns the combined exclude patterns for a named snapshot.
+// It merges global excludes with the snapshot-specific excludes.
+func (c *Config) GetExcludes(snapshotName string) []string {
+	snap, ok := c.Snapshots[snapshotName]
+	if !ok {
+		return c.Exclude
+	}
+
+	if len(snap.Exclude) == 0 {
+		return c.Exclude
+	}
+
+	// Combine global and snapshot-specific excludes
+	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
+	combined = append(combined, c.Exclude...)
+	combined = append(combined, snap.Exclude...)
+	return combined
+}
+
+// SnapshotNames returns the names of all configured snapshots in sorted order.
+func (c *Config) SnapshotNames() []string {
+	names := make([]string, 0, len(c.Snapshots))
+	for name := range c.Snapshots {
+		names = append(names, name)
+	}
+	// Sort for deterministic order
+	sort.Strings(names)
+	return names
+}
+
+// Config represents the application configuration for Vaultik.
+// It defines all settings for backup operations, including source directories,
+// encryption recipients, storage configuration, and performance tuning parameters.
+// Configuration is typically loaded from a YAML file.
 type Config struct {
-	AgeRecipient      string        `yaml:"age_recipient"`
+	AgeRecipients     []string      `yaml:"age_recipients"`
+	AgeSecretKey      string        `yaml:"age_secret_key"`
 	BackupInterval    time.Duration `yaml:"backup_interval"`
-	BlobSizeLimit     int64         `yaml:"blob_size_limit"`
+	BlobSizeLimit     Size          `yaml:"blob_size_limit"`
-	ChunkSize         int64         `yaml:"chunk_size"`
+	ChunkSize         Size          `yaml:"chunk_size"`
-	Exclude           []string      `yaml:"exclude"`
+	Exclude           []string      `yaml:"exclude"` // Global excludes applied to all snapshots
 	FullScanInterval  time.Duration `yaml:"full_scan_interval"`
 	Hostname          string        `yaml:"hostname"`
 	IndexPath         string        `yaml:"index_path"`
-	IndexPrefix       string        `yaml:"index_prefix"`
 	MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
 	S3                S3Config      `yaml:"s3"`
-	SourceDirs        []string      `yaml:"source_dirs"`
+	Snapshots         map[string]SnapshotConfig `yaml:"snapshots"`
 	CompressionLevel  int           `yaml:"compression_level"`
+
+	// StorageURL specifies the storage backend using a URL format.
+	// Takes precedence over S3Config if set.
+	// Supported formats:
+	//   - s3://bucket/prefix?endpoint=host&region=us-east-1
+	//   - file:///path/to/backup
+	// For S3 URLs, credentials are still read from s3.access_key_id and s3.secret_access_key.
+	StorageURL string `yaml:"storage_url"`
 }
 
-// S3Config represents S3 storage configuration
+// S3Config represents S3 storage configuration for backup storage.
+// It supports both AWS S3 and S3-compatible storage services.
+// All fields except UseSSL and PartSize are required.
 type S3Config struct {
 	Endpoint        string `yaml:"endpoint"`
 	Bucket          string `yaml:"bucket"`
@@ -35,13 +117,17 @@ type S3Config struct {
 	SecretAccessKey string `yaml:"secret_access_key"`
 	Region          string `yaml:"region"`
 	UseSSL          bool   `yaml:"use_ssl"`
-	PartSize        int64  `yaml:"part_size"`
+	PartSize        Size   `yaml:"part_size"`
 }
 
-// ConfigPath wraps the config file path for fx injection
+// ConfigPath wraps the config file path for fx dependency injection.
+// This type allows the config file path to be injected as a distinct type
+// rather than a plain string, avoiding conflicts with other string dependencies.
 type ConfigPath string
 
-// New creates a new Config instance
+// New creates a new Config instance by loading from the specified path.
+// This function is used by the fx dependency injection framework.
+// Returns an error if the path is empty or if loading fails.
 func New(path ConfigPath) (*Config, error) {
 	if path == "" {
 		return nil, fmt.Errorf("config path not provided")
@@ -55,32 +141,60 @@ func New(path ConfigPath) (*Config, error) {
 	return cfg, nil
 }
 
-// Load reads and parses the configuration file
+// Load reads and parses the configuration file from the specified path.
+// It applies default values for optional fields, performs environment variable
+// substitution using smartconfig, and validates the configuration.
+// The configuration file should be in YAML format. Returns an error if the file
+// cannot be read, parsed, or if validation fails.
 func Load(path string) (*Config, error) {
-	data, err := os.ReadFile(path)
+	// Load config using smartconfig for interpolation
+	sc, err := smartconfig.NewFromConfigPath(path)
 	if err != nil {
-		return nil, fmt.Errorf("failed to read config file: %w", err)
+		return nil, fmt.Errorf("failed to load config file: %w", err)
 	}
 
 	cfg := &Config{
 		// Set defaults
-		BlobSizeLimit:     10 * 1024 * 1024 * 1024, // 10GB
+		BlobSizeLimit:     Size(10 * 1024 * 1024 * 1024), // 10GB
-		ChunkSize:         10 * 1024 * 1024, // 10MB
+		ChunkSize:         Size(10 * 1024 * 1024), // 10MB
 		BackupInterval:    1 * time.Hour,
 		FullScanInterval:  24 * time.Hour,
 		MinTimeBetweenRun: 15 * time.Minute,
-		IndexPath:         "/var/lib/vaultik/index.sqlite",
+		IndexPath:         filepath.Join(xdg.DataHome, appName, "index.sqlite"),
-		IndexPrefix:       "index/",
 		CompressionLevel:  3,
 	}
 
-	if err := yaml.Unmarshal(data, cfg); err != nil {
+	// Convert smartconfig data to YAML then unmarshal
+	configData := sc.Data()
+	yamlBytes, err := yaml.Marshal(configData)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal config data: %w", err)
+	}
+
+	if err := yaml.Unmarshal(yamlBytes, cfg); err != nil {
 		return nil, fmt.Errorf("failed to parse config: %w", err)
 	}
 
+	// Expand tilde in all path fields
+	cfg.IndexPath = expandTilde(cfg.IndexPath)
+	cfg.StorageURL = expandTildeInURL(cfg.StorageURL)
+
+	// Expand tildes in snapshot paths
+	for name, snap := range cfg.Snapshots {
+		for i, path := range snap.Paths {
+			snap.Paths[i] = expandTilde(path)
+		}
+		cfg.Snapshots[name] = snap
+	}
+
 	// Check for environment variable override for IndexPath
 	if envIndexPath := os.Getenv("VAULTIK_INDEX_PATH"); envIndexPath != "" {
-		cfg.IndexPath = envIndexPath
+		cfg.IndexPath = expandTilde(envIndexPath)
 	}
 
+	// Check for environment variable override for AgeSecretKey
+	if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
+		cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
+	}
 
 	// Get hostname if not set
@@ -97,7 +211,18 @@ func Load(path string) (*Config, error) {
 		cfg.S3.Region = "us-east-1"
 	}
 	if cfg.S3.PartSize == 0 {
-		cfg.S3.PartSize = 5 * 1024 * 1024 // 5MB
+		cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
 	}
+
+	// Check config file permissions (warn if world or group readable)
+	if info, err := os.Stat(path); err == nil {
+		mode := info.Mode().Perm()
+		if mode&0044 != 0 { // group or world readable
+			log.Warn("Config file has insecure permissions (contains S3 credentials)",
+				"path", path,
+				"mode", fmt.Sprintf("%04o", mode),
+				"recommendation", "chmod 600 "+path)
+		}
+	}
 
 	if err := cfg.Validate(); err != nil {
@@ -107,37 +232,40 @@ func Load(path string) (*Config, error) {
 	return cfg, nil
 }
 
-// Validate checks if the configuration is valid
+// Validate checks if the configuration is valid and complete.
+// It ensures all required fields are present and have valid values:
+//   - At least one age recipient must be specified
+//   - At least one snapshot must be configured with at least one path
+//   - Storage must be configured (either storage_url or s3.* fields)
+//   - Chunk size must be at least 1MB
+//   - Blob size limit must be at least the chunk size
+//   - Compression level must be between 1 and 19
+// Returns an error describing the first validation failure encountered.
 func (c *Config) Validate() error {
-	if c.AgeRecipient == "" {
-		return fmt.Errorf("age_recipient is required")
+	if len(c.AgeRecipients) == 0 {
+		return fmt.Errorf("at least one age_recipient is required")
 	}
 
-	if len(c.SourceDirs) == 0 {
-		return fmt.Errorf("at least one source directory is required")
+	if len(c.Snapshots) == 0 {
+		return fmt.Errorf("at least one snapshot must be configured")
 	}
 
-	if c.S3.Endpoint == "" {
-		return fmt.Errorf("s3.endpoint is required")
+	for name, snap := range c.Snapshots {
+		if len(snap.Paths) == 0 {
+			return fmt.Errorf("snapshot %q must have at least one path", name)
+		}
 	}
 
-	if c.S3.Bucket == "" {
-		return fmt.Errorf("s3.bucket is required")
+	// Validate storage configuration
+	if err := c.validateStorage(); err != nil {
+		return err
 	}
 
-	if c.S3.AccessKeyID == "" {
-		return fmt.Errorf("s3.access_key_id is required")
-	}
-
-	if c.S3.SecretAccessKey == "" {
-		return fmt.Errorf("s3.secret_access_key is required")
-	}
-
-	if c.ChunkSize < 1024*1024 { // 1MB minimum
+	if c.ChunkSize.Int64() < 1024*1024 { // 1MB minimum
 		return fmt.Errorf("chunk_size must be at least 1MB")
 	}
 
-	if c.BlobSizeLimit < c.ChunkSize {
+	if c.BlobSizeLimit.Int64() < c.ChunkSize.Int64() {
 		return fmt.Errorf("blob_size_limit must be at least chunk_size")
 	}
 
@@ -148,7 +276,71 @@ func (c *Config) Validate() error {
 	return nil
 }
 
-// Module exports the config module for fx
+// validateStorage validates storage configuration.
+// If StorageURL is set, it takes precedence. S3 URLs require credentials.
+// File URLs don't require any S3 configuration.
+// If StorageURL is not set, legacy S3 configuration is required.
+func (c *Config) validateStorage() error {
+	if c.StorageURL != "" {
+		// URL-based configuration
+		if strings.HasPrefix(c.StorageURL, "file://") {
+			// File storage doesn't need S3 credentials
+			return nil
+		}
+		if strings.HasPrefix(c.StorageURL, "s3://") {
+			// S3 storage needs credentials
+			if c.S3.AccessKeyID == "" {
+				return fmt.Errorf("s3.access_key_id is required for s3:// URLs")
+			}
+			if c.S3.SecretAccessKey == "" {
+				return fmt.Errorf("s3.secret_access_key is required for s3:// URLs")
+			}
+			return nil
+		}
+		if strings.HasPrefix(c.StorageURL, "rclone://") {
+			// Rclone storage uses rclone's own config
+			return nil
+		}
+		return fmt.Errorf("storage_url must start with s3://, file://, or rclone://")
+	}
+
+	// Legacy S3 configuration
+	if c.S3.Endpoint == "" {
+		return fmt.Errorf("s3.endpoint is required (or set storage_url)")
+	}
+
+	if c.S3.Bucket == "" {
+		return fmt.Errorf("s3.bucket is required (or set storage_url)")
+	}
+
+	if c.S3.AccessKeyID == "" {
+		return fmt.Errorf("s3.access_key_id is required")
+	}
+
+	if c.S3.SecretAccessKey == "" {
+		return fmt.Errorf("s3.secret_access_key is required")
+	}
+
+	return nil
+}
+
+// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
+// the age library's parser, which handles comments and whitespace.
+func extractAgeSecretKey(input string) string {
+	identities, err := age.ParseIdentities(strings.NewReader(input))
+	if err != nil || len(identities) == 0 {
+		// Fall back to trimmed input if parsing fails
+		return strings.TrimSpace(input)
+	}
+	// Return the string representation of the first identity
+	if id, ok := identities[0].(*age.X25519Identity); ok {
+		return id.String()
+	}
+	return strings.TrimSpace(input)
+}
+
+// Module exports the config module for fx dependency injection.
+// It provides the Config type to other modules in the application.
 var Module = fx.Module("config",
 	fx.Provide(New),
 )
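Pulled together from the fields added to Config in the diff above, a configuration file exercising the new snapshot- and URL-based settings might look like the following sketch. Only the keys come from the struct tags; every concrete value here is hypothetical, and `chunk_size`/`blob_size_limit` use the human-readable string form handled by the new Size type.

```
age_recipients:
  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
exclude:
  - "*.tmp"
snapshots:
  home:
    paths:
      - ~/Documents
    exclude:
      - "*.cache"
storage_url: "s3://my-backups/vaultik?endpoint=s3.example.com&region=us-east-1"
s3:
  access_key_id: EXAMPLEKEYID
  secret_access_key: EXAMPLESECRET
chunk_size: 10MB
blob_size_limit: 10GB
compression_level: 3
```

Note that with an `s3://` storage_url, `validateStorage` still requires `s3.access_key_id` and `s3.secret_access_key`, while a `file://` or `rclone://` URL needs no S3 block at all.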

@@ -6,6 +6,12 @@ import (
 	"testing"
 )
 
+const (
+	TEST_SNEAK_AGE_PUBLIC_KEY        = "age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj"
+	TEST_INTEGRATION_AGE_PUBLIC_KEY  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
+	TEST_INTEGRATION_AGE_PRIVATE_KEY = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
+)
+
 func TestMain(m *testing.M) {
 	// Set up test environment
 	testConfigPath := filepath.Join("..", "..", "test", "config.yaml")
@@ -32,16 +38,28 @@ func TestConfigLoad(t *testing.T) {
 	}
 
 	// Basic validation
-	if cfg.AgeRecipient != "age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" {
-		t.Errorf("Expected age recipient to be set, got '%s'", cfg.AgeRecipient)
+	if len(cfg.AgeRecipients) != 2 {
+		t.Errorf("Expected 2 age recipients, got %d", len(cfg.AgeRecipients))
+	}
+	if cfg.AgeRecipients[0] != TEST_SNEAK_AGE_PUBLIC_KEY {
+		t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
 	}
 
-	if len(cfg.SourceDirs) != 2 {
-		t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
+	if len(cfg.Snapshots) != 1 {
+		t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
 	}
 
-	if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
-		t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
+	testSnap, ok := cfg.Snapshots["test"]
+	if !ok {
+		t.Fatal("Expected 'test' snapshot to exist")
+	}
+
+	if len(testSnap.Paths) != 2 {
+		t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
+	}
+
+	if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
+		t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
 	}
 
 	if cfg.S3.Bucket != "vaultik-test-bucket" {
@@ -65,3 +83,65 @@ func TestConfigFromEnv(t *testing.T) {
 		t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
 	}
 }
+
+// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
+func TestExtractAgeSecretKey(t *testing.T) {
+	tests := []struct {
+		name     string
+		input    string
+		expected string
+	}{
+		{
+			name:     "plain key",
+			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "key with trailing newline",
+			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name: "full age-keygen output",
+			input: `# created: 2025-01-14T12:00:00Z
+# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
+AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
+`,
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name: "age-keygen output with extra blank lines",
+			input: `# created: 2025-01-14T12:00:00Z
+# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
+
+AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
+
+`,
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "key with leading whitespace",
+			input:    "  AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5  ",
+			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
+		},
+		{
+			name:     "empty input",
+			input:    "",
+			expected: "",
+		},
+		{
+			name:     "only comments",
+			input:    "# this is a comment\n# another comment",
+			expected: "# this is a comment\n# another comment",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			result := extractAgeSecretKey(tt.input)
+			if result != tt.expected {
+				t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
+			}
+		})
+	}
+}

62	internal/config/size.go	Normal file
@@ -0,0 +1,62 @@
package config

import (
	"fmt"

	"github.com/dustin/go-humanize"
)

// Size represents a byte size that can be specified in configuration files.
// It can unmarshal from both numeric values (interpreted as bytes) and
// human-readable strings like "10MB", "2.5GB", or "1TB".
type Size int64

// UnmarshalYAML implements yaml.Unmarshaler for Size, allowing it to be
// parsed from YAML configuration files. It accepts both numeric values
// (interpreted as bytes) and string values with units (e.g., "10MB").
func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Try to unmarshal as int64 first
	var intVal int64
	if err := unmarshal(&intVal); err == nil {
		*s = Size(intVal)
		return nil
	}

	// Try to unmarshal as string
	var strVal string
	if err := unmarshal(&strVal); err != nil {
		return fmt.Errorf("size must be a number or string")
	}

	// Parse the string using go-humanize
	bytes, err := humanize.ParseBytes(strVal)
	if err != nil {
		return fmt.Errorf("invalid size format: %w", err)
	}

	*s = Size(bytes)
	return nil
}

// Int64 returns the size as int64 bytes.
// This is useful when the size needs to be passed to APIs that expect
// a numeric byte count.
func (s Size) Int64() int64 {
	return int64(s)
}

// String returns the size as a human-readable string.
// For example, 1048576 bytes would be formatted as "1.0 MB".
// This implements the fmt.Stringer interface.
func (s Size) String() string {
	return humanize.Bytes(uint64(s))
}

// ParseSize parses a size string into a Size value.
func ParseSize(s string) (Size, error) {
	bytes, err := humanize.ParseBytes(s)
	if err != nil {
		return 0, fmt.Errorf("invalid size format: %w", err)
	}
	return Size(bytes), nil
}
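The Size type above defers string parsing to go-humanize. As a dependency-free illustration of the same accept-int-or-string idea, the sketch below swaps in a tiny hand-rolled parser for `humanize.ParseBytes`; `parseBytes` and its unit table are inventions of this example, handling only a few decimal SI units (like go-humanize, "MB" here means 10^6 bytes, not 2^20).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes is a simplified stand-in for humanize.ParseBytes: it accepts a
// bare integer (bytes) or a number with a decimal SI suffix. Suffixes are
// checked longest-first so "MB" is not mistaken for a trailing "B".
func parseBytes(s string) (int64, error) {
	suffixes := []struct {
		unit string
		mult float64
	}{
		{"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1},
	}
	s = strings.TrimSpace(s)
	for _, u := range suffixes {
		if strings.HasSuffix(s, u.unit) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.unit), 64)
			if err != nil {
				return 0, err
			}
			return int64(n * u.mult), nil
		}
	}
	// No unit suffix: a bare number is interpreted as bytes.
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, in := range []string{"10MB", "2.5GB", "1048576"} {
		n, err := parseBytes(in)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes\n", in, n)
	}
}
```

This mirrors why `UnmarshalYAML` tries the int64 form first: a plain YAML number needs no string parsing at all, and only the string form goes through unit handling.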
209	internal/crypto/encryption.go	Normal file
@@ -0,0 +1,209 @@
|
|||||||
|
package crypto
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"sync"
|
||||||
|
|
||||||
|
"filippo.io/age"
|
||||||
|
"go.uber.org/fx"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Encryptor provides thread-safe encryption using the age encryption library.
|
||||||
|
// It supports encrypting data for multiple recipients simultaneously, allowing
|
||||||
|
// any of the corresponding private keys to decrypt the data. This is useful
|
||||||
|
// for backup scenarios where multiple parties should be able to decrypt the data.
|
||||||
|
type Encryptor struct {
|
||||||
|
recipients []age.Recipient
|
||||||
|
mu sync.RWMutex
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewEncryptor creates a new encryptor with the given age public keys.
|
||||||
|
// Each public key should be a valid age X25519 recipient string (e.g., "age1...")
|
||||||
|
// At least one recipient must be provided. Returns an error if any of the
|
||||||
|
// public keys are invalid or if no recipients are specified.
|
||||||
|
func NewEncryptor(publicKeys []string) (*Encryptor, error) {
|
||||||
|
if len(publicKeys) == 0 {
|
||||||
|
return nil, fmt.Errorf("at least one recipient is required")
|
||||||
|
}
|
||||||
|
|
||||||
|
recipients := make([]age.Recipient, 0, len(publicKeys))
|
||||||
|
for _, key := range publicKeys {
|
||||||
|
recipient, err := age.ParseX25519Recipient(key)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("parsing age recipient %s: %w", key, err)
|
||||||
|
}
|
||||||
|
recipients = append(recipients, recipient)
|
||||||
|
}
|
||||||
|
|
||||||
|
return &Encryptor{
|
||||||
|
recipients: recipients,
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Encrypt encrypts data using age encryption for all configured recipients.
|
||||||
|
// The encrypted data can be decrypted by any of the corresponding private keys.
|
||||||
|
// This method is suitable for small to medium amounts of data that fit in memory.
|
||||||
|
// For large data streams, use EncryptStream or EncryptWriter instead.
|
||||||
|
func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
|
||||||
|
e.mu.RLock()
|
||||||
|
recipients := e.recipients
|
||||||
|
e.mu.RUnlock()
|
||||||
|
|
||||||
|
var buf bytes.Buffer
|
||||||
|
|
||||||
|
// Create encrypted writer for all recipients
|
||||||
|
w, err := age.Encrypt(&buf, recipients...)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("creating encrypted writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Write data
|
||||||
|
if _, err := w.Write(data); err != nil {
|
||||||
|
return nil, fmt.Errorf("writing encrypted data: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Close to flush
|
||||||
|
if err := w.Close(); err != nil {
|
||||||
|
return nil, fmt.Errorf("closing encrypted writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return buf.Bytes(), nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// EncryptStream encrypts data from reader to writer using age encryption.
|
||||||
|
// This method is suitable for encrypting large files or streams as it processes
|
||||||
|
// data in a streaming fashion without loading everything into memory.
|
||||||
|
// The encrypted data is written directly to the destination writer.
|
||||||
|
func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
|
||||||
|
e.mu.RLock()
|
||||||
|
recipients := e.recipients
|
||||||
|
e.mu.RUnlock()
|
||||||
|
|
||||||
|
// Create encrypted writer for all recipients
|
||||||
|
w, err := age.Encrypt(dst, recipients...)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("creating encrypted writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Copy data
|
||||||
|
if _, err := io.Copy(w, src); err != nil {
|
||||||
|
return fmt.Errorf("copying encrypted data: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Close to flush
|
||||||
|
if err := w.Close(); err != nil {
|
||||||
|
return fmt.Errorf("closing encrypted writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// EncryptWriter creates a writer that encrypts data written to it.
|
||||||
|
// All data written to the returned WriteCloser will be encrypted and written
|
||||||
|
// to the destination writer. The caller must call Close() on the returned
|
||||||
|
// writer to ensure all encrypted data is properly flushed and finalized.
|
||||||
|
// This is useful for integrating encryption into existing writer-based pipelines.
|
||||||
|
func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
|
||||||
|
e.mu.RLock()
|
||||||
|
recipients := e.recipients
|
||||||
|
e.mu.RUnlock()
|
||||||
|
|
||||||
|
// Create encrypted writer for all recipients
|
||||||
|
w, err := age.Encrypt(dst, recipients...)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("creating encrypted writer: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return w, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// UpdateRecipients updates the recipients for future encryption operations.
|
||||||
|
// This method is thread-safe and can be called while other encryption operations
|
||||||
|
// are in progress. Existing encryption operations will continue with the old
// recipients. At least one recipient must be provided. Returns an error if any
// of the public keys are invalid or if no recipients are specified.
func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
	if len(publicKeys) == 0 {
		return fmt.Errorf("at least one recipient is required")
	}

	recipients := make([]age.Recipient, 0, len(publicKeys))
	for _, key := range publicKeys {
		recipient, err := age.ParseX25519Recipient(key)
		if err != nil {
			return fmt.Errorf("parsing age recipient %s: %w", key, err)
		}
		recipients = append(recipients, recipient)
	}

	e.mu.Lock()
	e.recipients = recipients
	e.mu.Unlock()

	return nil
}

// Decryptor provides thread-safe decryption using the age encryption library.
// It uses a private key to decrypt data that was encrypted for the corresponding
// public key.
type Decryptor struct {
	identity age.Identity
	mu       sync.RWMutex
}

// NewDecryptor creates a new decryptor with the given age private key.
// The private key should be a valid age X25519 identity string.
// Returns an error if the private key is invalid.
func NewDecryptor(privateKey string) (*Decryptor, error) {
	identity, err := age.ParseX25519Identity(privateKey)
	if err != nil {
		return nil, fmt.Errorf("parsing age identity: %w", err)
	}

	return &Decryptor{
		identity: identity,
	}, nil
}

// Decrypt decrypts data using age decryption.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use DecryptStream instead.
func (d *Decryptor) Decrypt(data []byte) ([]byte, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(bytes.NewReader(data), identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	decrypted, err := io.ReadAll(r)
	if err != nil {
		return nil, fmt.Errorf("reading decrypted data: %w", err)
	}

	return decrypted, nil
}

// DecryptStream returns a reader that decrypts data from the provided reader.
// This method is suitable for decrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The caller should close the input reader when done.
func (d *Decryptor) DecryptStream(src io.Reader) (io.Reader, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(src, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	return r, nil
}

// Module exports the crypto module for fx dependency injection.
var Module = fx.Module("crypto")
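The copy-on-swap pattern behind `UpdateRecipients` and the read-locked snapshot in `Decrypt`/`DecryptStream` can be sketched with the standard library alone. This is an illustrative stand-in, not the real `Encryptor`: the `keyring` type and the `age1aaa`/`age1bbb` strings are placeholders for parsed age recipients.

```go
package main

import (
	"fmt"
	"sync"
)

// keyring mimics the Encryptor's concurrency pattern: a writer replaces
// the whole slice under a write lock, while readers grab the current
// slice reference under a read lock. An in-flight operation that already
// took its snapshot keeps using the old recipients, which is exactly the
// behavior the UpdateRecipients doc comment promises.
type keyring struct {
	mu         sync.RWMutex
	recipients []string
}

func (k *keyring) update(recipients []string) {
	k.mu.Lock()
	k.recipients = recipients
	k.mu.Unlock()
}

func (k *keyring) snapshot() []string {
	k.mu.RLock()
	defer k.mu.RUnlock()
	return k.recipients
}

func main() {
	k := &keyring{}
	k.update([]string{"age1aaa"})
	before := k.snapshot() // an "in-flight" encryption holds this
	k.update([]string{"age1bbb"})
	fmt.Println(before[0])       // age1aaa
	fmt.Println(k.snapshot()[0]) // age1bbb
}
```

The key design point is that the slice is never mutated in place; it is replaced wholesale, so a reader's snapshot stays internally consistent without holding the lock for the duration of the encryption.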
157
internal/crypto/encryption_test.go
Normal file
@@ -0,0 +1,157 @@
package crypto

import (
	"bytes"
	"testing"

	"filippo.io/age"
)

func TestEncryptor(t *testing.T) {
	// Generate a test key pair
	identity, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity: %v", err)
	}

	publicKey := identity.Recipient().String()

	// Create encryptor
	enc, err := NewEncryptor([]string{publicKey})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Hello, World! This is a test message.")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify it's actually encrypted (should be larger and different)
	if bytes.Equal(plaintext, ciphertext) {
		t.Error("ciphertext equals plaintext")
	}

	// Decrypt to verify
	r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
	if err != nil {
		t.Fatalf("failed to decrypt: %v", err)
	}

	var decrypted bytes.Buffer
	if _, err := decrypted.ReadFrom(r); err != nil {
		t.Fatalf("failed to read decrypted data: %v", err)
	}

	if !bytes.Equal(plaintext, decrypted.Bytes()) {
		t.Error("decrypted data doesn't match original")
	}
}

func TestEncryptorMultipleRecipients(t *testing.T) {
	// Generate three test key pairs
	identity1, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity1: %v", err)
	}
	identity2, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity2: %v", err)
	}
	identity3, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity3: %v", err)
	}

	publicKeys := []string{
		identity1.Recipient().String(),
		identity2.Recipient().String(),
		identity3.Recipient().String(),
	}

	// Create encryptor with multiple recipients
	enc, err := NewEncryptor(publicKeys)
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Secret message for multiple recipients")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify each recipient can decrypt
	identities := []age.Identity{identity1, identity2, identity3}
	for i, identity := range identities {
		r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
		if err != nil {
			t.Fatalf("recipient %d failed to decrypt: %v", i+1, err)
		}

		var decrypted bytes.Buffer
		if _, err := decrypted.ReadFrom(r); err != nil {
			t.Fatalf("recipient %d failed to read decrypted data: %v", i+1, err)
		}

		if !bytes.Equal(plaintext, decrypted.Bytes()) {
			t.Errorf("recipient %d: decrypted data doesn't match original", i+1)
		}
	}
}

func TestEncryptorUpdateRecipients(t *testing.T) {
	// Generate two identities
	identity1, _ := age.GenerateX25519Identity()
	identity2, _ := age.GenerateX25519Identity()

	publicKey1 := identity1.Recipient().String()
	publicKey2 := identity2.Recipient().String()

	// Create encryptor with first key
	enc, err := NewEncryptor([]string{publicKey1})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Encrypt with first key
	plaintext := []byte("test data")
	ciphertext1, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Update to second key
	if err := enc.UpdateRecipients([]string{publicKey2}); err != nil {
		t.Fatalf("failed to update recipients: %v", err)
	}

	// Encrypt with second key
	ciphertext2, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// First ciphertext should only decrypt with first identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity1); err != nil {
		t.Error("failed to decrypt with identity1")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity2); err == nil {
		t.Error("should not decrypt with identity2")
	}

	// Second ciphertext should only decrypt with second identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity2); err != nil {
		t.Error("failed to decrypt with identity2")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity1); err == nil {
		t.Error("should not decrypt with identity1")
	}
}
@@ -16,15 +16,15 @@ func NewBlobChunkRepository(db *DB) *BlobChunkRepository {
 
 func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobChunk) error {
 	query := `
-		INSERT INTO blob_chunks (blob_hash, chunk_hash, offset, length)
+		INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length)
 		VALUES (?, ?, ?, ?)
 	`
 
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = tx.ExecContext(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = r.db.ExecWithLog(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	}
 
 	if err != nil {
@@ -34,15 +34,15 @@ func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobCh
 	return nil
 }
 
-func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string) ([]*BlobChunk, error) {
+func (r *BlobChunkRepository) GetByBlobID(ctx context.Context, blobID string) ([]*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
-		WHERE blob_hash = ?
+		WHERE blob_id = ?
 		ORDER BY offset
 	`
 
-	rows, err := r.db.conn.QueryContext(ctx, query, blobHash)
+	rows, err := r.db.conn.QueryContext(ctx, query, blobID)
 	if err != nil {
 		return nil, fmt.Errorf("querying blob chunks: %w", err)
 	}
@@ -51,7 +51,7 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
 	var blobChunks []*BlobChunk
 	for rows.Next() {
 		var bc BlobChunk
-		err := rows.Scan(&bc.BlobHash, &bc.ChunkHash, &bc.Offset, &bc.Length)
+		err := rows.Scan(&bc.BlobID, &bc.ChunkHash, &bc.Offset, &bc.Length)
 		if err != nil {
 			return nil, fmt.Errorf("scanning blob chunk: %w", err)
 		}
@@ -63,26 +63,90 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
 
 func (r *BlobChunkRepository) GetByChunkHash(ctx context.Context, chunkHash string) (*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
 		WHERE chunk_hash = ?
 		LIMIT 1
 	`
 
+	LogSQL("GetByChunkHash", query, chunkHash)
 	var bc BlobChunk
 	err := r.db.conn.QueryRowContext(ctx, query, chunkHash).Scan(
-		&bc.BlobHash,
+		&bc.BlobID,
 		&bc.ChunkHash,
 		&bc.Offset,
 		&bc.Length,
 	)
 
 	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHash", "No rows found", chunkHash)
 		return nil, nil
 	}
 	if err != nil {
+		LogSQL("GetByChunkHash", "Error", chunkHash, err)
 		return nil, fmt.Errorf("querying blob chunk: %w", err)
 	}
 
+	LogSQL("GetByChunkHash", "Found blob", chunkHash, "blob", bc.BlobID)
 	return &bc, nil
 }
+
+// GetByChunkHashTx retrieves a blob chunk within a transaction
+func (r *BlobChunkRepository) GetByChunkHashTx(ctx context.Context, tx *sql.Tx, chunkHash string) (*BlobChunk, error) {
+	query := `
+		SELECT blob_id, chunk_hash, offset, length
+		FROM blob_chunks
+		WHERE chunk_hash = ?
+		LIMIT 1
+	`
+
+	LogSQL("GetByChunkHashTx", query, chunkHash)
+	var bc BlobChunk
+	err := tx.QueryRowContext(ctx, query, chunkHash).Scan(
+		&bc.BlobID,
+		&bc.ChunkHash,
+		&bc.Offset,
+		&bc.Length,
+	)
+
+	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHashTx", "No rows found", chunkHash)
+		return nil, nil
+	}
+	if err != nil {
+		LogSQL("GetByChunkHashTx", "Error", chunkHash, err)
+		return nil, fmt.Errorf("querying blob chunk: %w", err)
+	}
+
+	LogSQL("GetByChunkHashTx", "Found blob", chunkHash, "blob", bc.BlobID)
+	return &bc, nil
+}
+
+// DeleteOrphaned deletes blob_chunks entries where either the blob or chunk no longer exists
+func (r *BlobChunkRepository) DeleteOrphaned(ctx context.Context) error {
+	// Delete blob_chunks where the blob doesn't exist
+	query1 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM blobs
+			WHERE blobs.id = blob_chunks.blob_id
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query1); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing blobs: %w", err)
+	}
+
+	// Delete blob_chunks where the chunk doesn't exist
+	query2 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM chunks
+			WHERE chunks.chunk_hash = blob_chunks.chunk_hash
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query2); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing chunks: %w", err)
+	}
+
+	return nil
+}
@@ -2,7 +2,11 @@ package database
 
 import (
 	"context"
+	"strings"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 func TestBlobChunkRepository(t *testing.T) {
@@ -10,78 +14,111 @@ func TestBlobChunkRepository(t *testing.T) {
 	defer cleanup()
 
 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)
+
+	// Create blob first
+	blob := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	err := repos.Blobs.Create(ctx, nil, blob)
+	if err != nil {
+		t.Fatalf("failed to create blob: %v", err)
+	}
+
+	// Create chunks
+	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunks {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
 
 	// Test Create
 	bc1 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk1",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk1"),
 		Offset:    0,
 		Length:    1024,
 	}
 
-	err := repo.Create(ctx, nil, bc1)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
 	if err != nil {
 		t.Fatalf("failed to create blob chunk: %v", err)
 	}
 
 	// Add more chunks to the same blob
 	bc2 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk2",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk2"),
 		Offset:    1024,
 		Length:    2048,
 	}
-	err = repo.Create(ctx, nil, bc2)
+	err = repos.BlobChunks.Create(ctx, nil, bc2)
 	if err != nil {
 		t.Fatalf("failed to create second blob chunk: %v", err)
 	}
 
 	bc3 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk3",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk3"),
 		Offset:    3072,
 		Length:    512,
 	}
-	err = repo.Create(ctx, nil, bc3)
+	err = repos.BlobChunks.Create(ctx, nil, bc3)
 	if err != nil {
 		t.Fatalf("failed to create third blob chunk: %v", err)
 	}
 
-	// Test GetByBlobHash
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	// Test GetByBlobID
+	blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob chunks: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks, got %d", len(chunks))
+	if len(blobChunks) != 3 {
+		t.Errorf("expected 3 chunks, got %d", len(blobChunks))
 	}
 
 	// Verify order by offset
 	expectedOffsets := []int64{0, 1024, 3072}
-	for i, chunk := range chunks {
-		if chunk.Offset != expectedOffsets[i] {
-			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], chunk.Offset)
+	for i, bc := range blobChunks {
+		if bc.Offset != expectedOffsets[i] {
+			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], bc.Offset)
 		}
 	}
 
 	// Test GetByChunkHash
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get blob chunk by chunk hash: %v", err)
 	}
 	if bc == nil {
 		t.Fatal("expected blob chunk, got nil")
 	}
-	if bc.BlobHash != "blob1" {
-		t.Errorf("wrong blob hash: expected blob1, got %s", bc.BlobHash)
+	if bc.BlobID != blob.ID {
+		t.Errorf("wrong blob ID: expected %s, got %s", blob.ID, bc.BlobID)
 	}
 	if bc.Offset != 1024 {
 		t.Errorf("wrong offset: expected 1024, got %d", bc.Offset)
 	}
 
+	// Test duplicate insert (should fail due to primary key constraint)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
+	if err == nil {
+		t.Fatal("duplicate blob_chunk insert should fail due to primary key constraint")
+	}
+	if !strings.Contains(err.Error(), "UNIQUE") && !strings.Contains(err.Error(), "constraint") {
+		t.Fatalf("expected constraint error, got: %v", err)
+	}
+
 	// Test non-existent chunk
-	bc, err = repo.GetByChunkHash(ctx, "nonexistent")
+	bc, err = repos.BlobChunks.GetByChunkHash(ctx, "nonexistent")
 	if err != nil {
 		t.Fatalf("unexpected error: %v", err)
 	}
@@ -95,26 +132,60 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	defer cleanup()
 
 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)
+
+	// Create blobs
+	blob1 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	blob2 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob2-hash"),
+		CreatedTS: time.Now(),
+	}
+
+	err := repos.Blobs.Create(ctx, nil, blob1)
+	if err != nil {
+		t.Fatalf("failed to create blob1: %v", err)
+	}
+	err = repos.Blobs.Create(ctx, nil, blob2)
+	if err != nil {
+		t.Fatalf("failed to create blob2: %v", err)
+	}
+
+	// Create chunks
+	chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunkHashes {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
 
 	// Create chunks across multiple blobs
 	// Some chunks are shared between blobs (deduplication scenario)
 	blobChunks := []BlobChunk{
-		{BlobHash: "blob1", ChunkHash: "chunk1", Offset: 0, Length: 1024},
-		{BlobHash: "blob1", ChunkHash: "chunk2", Offset: 1024, Length: 1024},
-		{BlobHash: "blob2", ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
-		{BlobHash: "blob2", ChunkHash: "chunk3", Offset: 1024, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
 	}
 
 	for _, bc := range blobChunks {
-		err := repo.Create(ctx, nil, &bc)
+		err := repos.BlobChunks.Create(ctx, nil, &bc)
 		if err != nil {
 			t.Fatalf("failed to create blob chunk: %v", err)
 		}
 	}
 
 	// Verify blob1 chunks
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob1 chunks: %v", err)
 	}
@@ -123,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}
 
 	// Verify blob2 chunks
-	chunks, err = repo.GetByBlobHash(ctx, "blob2")
+	chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob2 chunks: %v", err)
 	}
@@ -132,7 +203,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}
 
 	// Verify shared chunk
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get shared chunk: %v", err)
 	}
@@ -140,7 +211,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 		t.Fatal("expected shared chunk, got nil")
 	}
 	// GetByChunkHash returns first match, should be blob1
-	if bc.BlobHash != "blob1" {
-		t.Errorf("expected blob1 for shared chunk, got %s", bc.BlobHash)
+	if bc.BlobID != blob1.ID {
+		t.Errorf("expected %s for shared chunk, got %s", blob1.ID, bc.BlobID)
 	}
 }
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
 )
 
 type BlobRepository struct {
@@ -17,15 +19,27 @@ func NewBlobRepository(db *DB) *BlobRepository {
 
 func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) error {
 	query := `
-		INSERT INTO blobs (blob_hash, created_ts)
-		VALUES (?, ?)
+		INSERT INTO blobs (id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts)
+		VALUES (?, ?, ?, ?, ?, ?, ?)
 	`
 
+	var finishedTS, uploadedTS *int64
+	if blob.FinishedTS != nil {
+		ts := blob.FinishedTS.Unix()
+		finishedTS = &ts
+	}
+	if blob.UploadedTS != nil {
+		ts := blob.UploadedTS.Unix()
+		uploadedTS = &ts
+	}
+
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = tx.ExecContext(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = r.db.ExecWithLog(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	}
 
 	if err != nil {
@@ -37,17 +51,23 @@ func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) err
 
 func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
 		WHERE blob_hash = ?
 	`
 
 	var blob Blob
 	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64
+
 	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
-		&blob.BlobHash,
+		&blob.ID,
+		&blob.Hash,
 		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
 	)
 
 	if err == sql.ErrNoRows {
@@ -57,40 +77,124 @@ func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, err
 		return nil, fmt.Errorf("querying blob: %w", err)
 	}
 
-	blob.CreatedTS = time.Unix(createdTSUnix, 0)
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
 	return &blob, nil
 }
 
-func (r *BlobRepository) List(ctx context.Context, limit, offset int) ([]*Blob, error) {
+// GetByID retrieves a blob by its ID
+func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
-		ORDER BY blob_hash
-		LIMIT ? OFFSET ?
+		WHERE id = ?
 	`
 
-	rows, err := r.db.conn.QueryContext(ctx, query, limit, offset)
-	if err != nil {
-		return nil, fmt.Errorf("querying blobs: %w", err)
-	}
-	defer CloseRows(rows)
-
-	var blobs []*Blob
-	for rows.Next() {
-		var blob Blob
-		var createdTSUnix int64
-		err := rows.Scan(
-			&blob.BlobHash,
-			&createdTSUnix,
-		)
-		if err != nil {
-			return nil, fmt.Errorf("scanning blob: %w", err)
-		}
-
-		blob.CreatedTS = time.Unix(createdTSUnix, 0)
-		blobs = append(blobs, &blob)
-	}
-
-	return blobs, rows.Err()
+	var blob Blob
+	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64
+
+	err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
+		&blob.ID,
+		&blob.Hash,
+		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
+	)
+
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, fmt.Errorf("querying blob: %w", err)
+	}
+
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
+	return &blob, nil
+}
+
+// UpdateFinished updates a blob when it's finalized
+func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id string, hash string, uncompressedSize, compressedSize int64) error {
+	query := `
+		UPDATE blobs
+		SET blob_hash = ?, finished_ts = ?, uncompressed_size = ?, compressed_size = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating blob: %w", err)
+	}
+
+	return nil
+}
+
+// UpdateUploaded marks a blob as uploaded
+func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id string) error {
+	query := `
+		UPDATE blobs
+		SET uploaded_ts = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, now, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, now, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("marking blob as uploaded: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteOrphaned deletes blobs that are not referenced by any snapshot
+func (r *BlobRepository) DeleteOrphaned(ctx context.Context) error {
+	query := `
+		DELETE FROM blobs
+		WHERE NOT EXISTS (
+			SELECT 1 FROM snapshot_blobs
+			WHERE snapshot_blobs.blob_id = blobs.id
+		)
+	`
+
+	result, err := r.db.ExecWithLog(ctx, query)
+	if err != nil {
+		return fmt.Errorf("deleting orphaned blobs: %w", err)
+	}
+
+	rowsAffected, _ := result.RowsAffected()
+	if rowsAffected > 0 {
+		log.Debug("Deleted orphaned blobs", "count", rowsAffected)
+	}
+
+	return nil
}
```diff
@@ -4,6 +4,8 @@ import (
 	"context"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 func TestBlobRepository(t *testing.T) {
@@ -15,7 +17,8 @@ func TestBlobRepository(t *testing.T) {
 
 	// Test Create
 	blob := &Blob{
-		BlobHash:  "blobhash123",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash123"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 
@@ -25,23 +28,36 @@ func TestBlobRepository(t *testing.T) {
 	}
 
 	// Test GetByHash
-	retrieved, err := repo.GetByHash(ctx, blob.BlobHash)
+	retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
 	if err != nil {
 		t.Fatalf("failed to get blob: %v", err)
 	}
 	if retrieved == nil {
 		t.Fatal("expected blob, got nil")
 	}
-	if retrieved.BlobHash != blob.BlobHash {
-		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.BlobHash, blob.BlobHash)
+	if retrieved.Hash != blob.Hash {
+		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.Hash, blob.Hash)
 	}
 	if !retrieved.CreatedTS.Equal(blob.CreatedTS) {
 		t.Errorf("created timestamp mismatch: got %v, want %v", retrieved.CreatedTS, blob.CreatedTS)
 	}
 
-	// Test List
+	// Test GetByID
+	retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get blob by ID: %v", err)
+	}
+	if retrievedByID == nil {
+		t.Fatal("expected blob, got nil")
+	}
+	if retrievedByID.ID != blob.ID {
+		t.Errorf("blob ID mismatch: got %s, want %s", retrievedByID.ID, blob.ID)
+	}
+
+	// Test with second blob
 	blob2 := &Blob{
-		BlobHash:  "blobhash456",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash456"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	err = repo.Create(ctx, nil, blob2)
@@ -49,29 +65,45 @@ func TestBlobRepository(t *testing.T) {
 		t.Fatalf("failed to create second blob: %v", err)
 	}
 
-	blobs, err := repo.List(ctx, 10, 0)
+	// Test UpdateFinished
+	now := time.Now()
+	err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
 	if err != nil {
-		t.Fatalf("failed to list blobs: %v", err)
-	}
-	if len(blobs) != 2 {
-		t.Errorf("expected 2 blobs, got %d", len(blobs))
+		t.Fatalf("failed to update blob as finished: %v", err)
 	}
 
-	// Test pagination
-	blobs, err = repo.List(ctx, 1, 0)
+	// Verify update
+	updated, err := repo.GetByID(ctx, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with limit: %v", err)
+		t.Fatalf("failed to get updated blob: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with limit, got %d", len(blobs))
+	if updated.FinishedTS == nil {
+		t.Fatal("expected finished timestamp to be set")
+	}
+	if updated.UncompressedSize != 1000 {
+		t.Errorf("expected uncompressed size 1000, got %d", updated.UncompressedSize)
+	}
+	if updated.CompressedSize != 500 {
+		t.Errorf("expected compressed size 500, got %d", updated.CompressedSize)
 	}
 
-	blobs, err = repo.List(ctx, 1, 1)
+	// Test UpdateUploaded
+	err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with offset: %v", err)
+		t.Fatalf("failed to update blob as uploaded: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with offset, got %d", len(blobs))
+
+	// Verify upload update
+	uploaded, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get uploaded blob: %v", err)
+	}
+	if uploaded.UploadedTS == nil {
+		t.Fatal("expected uploaded timestamp to be set")
+	}
+	// Allow 1 second tolerance for timestamp comparison
+	if uploaded.UploadedTS.Before(now.Add(-1 * time.Second)) {
+		t.Error("uploaded timestamp should be around test time")
 	}
 }
@@ -83,7 +115,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
 	repo := NewBlobRepository(db)
 
 	blob := &Blob{
-		BlobHash:  "duplicate_blob",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("duplicate_blob"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 
```
internal/database/cascade_debug_test.go (new file, 125 lines)

```go
package database

import (
	"context"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestCascadeDeleteDebug tests cascade delete with debug output
func TestCascadeDeleteDebug(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Check if foreign keys are enabled
	var fkEnabled int
	err := db.conn.QueryRow("PRAGMA foreign_keys").Scan(&fkEnabled)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Foreign keys enabled: %d", fkEnabled)

	// Create a file
	file := &File{
		Path:  "/cascade-test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err = repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}
	t.Logf("Created file with ID: %s", file.ID)

	// Create chunks and file-chunk mappings
	for i := 0; i < 3; i++ {
		chunk := &Chunk{
			ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunk.ChunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
		t.Logf("Created file chunk mapping: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
	}

	// Verify file chunks exist
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks before delete: %d", len(fileChunks))

	// Check the foreign key constraint
	var fkInfo string
	err = db.conn.QueryRow(`
		SELECT sql FROM sqlite_master
		WHERE type='table' AND name='file_chunks'
	`).Scan(&fkInfo)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("file_chunks table definition:\n%s", fkInfo)

	// Delete the file
	t.Log("Deleting file...")
	err = repos.Files.DeleteByID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file: %v", err)
	}

	// Verify file is gone
	deletedFile, err := repos.Files.GetByID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	if deletedFile != nil {
		t.Error("file should have been deleted")
	} else {
		t.Log("File was successfully deleted")
	}

	// Check file chunks after delete
	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks after delete: %d", len(fileChunks))

	// Manually check the database
	var count int
	err = db.conn.QueryRow("SELECT COUNT(*) FROM file_chunks WHERE file_id = ?", file.ID).Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Manual count of file_chunks for deleted file: %d", count)

	if len(fileChunks) != 0 {
		t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
		// List the remaining chunks
		for _, fc := range fileChunks {
			t.Logf("Remaining chunk: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
		}
	}
}
```
```diff
@@ -4,6 +4,8 @@ import (
 	"context"
 	"database/sql"
 	"fmt"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 type ChunkFileRepository struct {
@@ -16,16 +18,16 @@ func NewChunkFileRepository(db *DB) *ChunkFileRepository {
 
 func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkFile) error {
 	query := `
-		INSERT INTO chunk_files (chunk_hash, file_path, file_offset, length)
+		INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length)
 		VALUES (?, ?, ?, ?)
-		ON CONFLICT(chunk_hash, file_path) DO NOTHING
+		ON CONFLICT(chunk_hash, file_id) DO NOTHING
 	`
 
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
+		_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
+		_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
 	}
 
 	if err != nil {
@@ -35,37 +37,28 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
 	return nil
 }
 
-func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
+func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
 	query := `
-		SELECT chunk_hash, file_path, file_offset, length
+		SELECT chunk_hash, file_id, file_offset, length
 		FROM chunk_files
 		WHERE chunk_hash = ?
 	`
 
-	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
+	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
 	if err != nil {
 		return nil, fmt.Errorf("querying chunk files: %w", err)
 	}
 	defer CloseRows(rows)
 
-	var chunkFiles []*ChunkFile
-	for rows.Next() {
-		var cf ChunkFile
-		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
-		if err != nil {
-			return nil, fmt.Errorf("scanning chunk file: %w", err)
-		}
-		chunkFiles = append(chunkFiles, &cf)
-	}
-
-	return chunkFiles, rows.Err()
+	return r.scanChunkFiles(rows)
 }
 
 func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
 	query := `
-		SELECT chunk_hash, file_path, file_offset, length
-		FROM chunk_files
-		WHERE file_path = ?
+		SELECT cf.chunk_hash, cf.file_id, cf.file_offset, cf.length
+		FROM chunk_files cf
+		JOIN files f ON cf.file_id = f.id
+		WHERE f.path = ?
 	`
 
 	rows, err := r.db.conn.QueryContext(ctx, query, filePath)
@@ -74,15 +67,138 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
 	}
 	defer CloseRows(rows)
 
+	return r.scanChunkFiles(rows)
+}
+
+// GetByFileID retrieves chunk files by file ID
+func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
+	query := `
+		SELECT chunk_hash, file_id, file_offset, length
+		FROM chunk_files
+		WHERE file_id = ?
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
+	if err != nil {
+		return nil, fmt.Errorf("querying chunk files: %w", err)
+	}
+	defer CloseRows(rows)
+
+	return r.scanChunkFiles(rows)
+}
+
+// scanChunkFiles is a helper that scans chunk file rows
+func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
 	var chunkFiles []*ChunkFile
 	for rows.Next() {
 		var cf ChunkFile
-		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
+		var chunkHashStr, fileIDStr string
+		err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
 		if err != nil {
 			return nil, fmt.Errorf("scanning chunk file: %w", err)
 		}
+		cf.ChunkHash = types.ChunkHash(chunkHashStr)
+		cf.FileID, err = types.ParseFileID(fileIDStr)
+		if err != nil {
+			return nil, fmt.Errorf("parsing file ID: %w", err)
+		}
 		chunkFiles = append(chunkFiles, &cf)
 	}
 
 	return chunkFiles, rows.Err()
 }
+
+// DeleteByFileID deletes all chunk_files entries for a given file ID
+func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
+	query := `DELETE FROM chunk_files WHERE file_id = ?`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, fileID.String())
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, fileID.String())
+	}
+
+	if err != nil {
+		return fmt.Errorf("deleting chunk files: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteByFileIDs deletes all chunk_files for multiple files in a single statement.
+func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
+	if len(fileIDs) == 0 {
+		return nil
+	}
+
+	// Batch at 500 to stay within SQLite's variable limit
+	const batchSize = 500
+
+	for i := 0; i < len(fileIDs); i += batchSize {
+		end := i + batchSize
+		if end > len(fileIDs) {
+			end = len(fileIDs)
+		}
+		batch := fileIDs[i:end]
+
+		query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
+		args := make([]interface{}, len(batch))
+		for j, id := range batch {
+			args[j] = id.String()
+		}
+
+		var err error
+		if tx != nil {
+			_, err = tx.ExecContext(ctx, query, args...)
+		} else {
+			_, err = r.db.ExecWithLog(ctx, query, args...)
+		}
+		if err != nil {
+			return fmt.Errorf("batch deleting chunk_files: %w", err)
+		}
+	}
+
+	return nil
+}
+
+// CreateBatch inserts multiple chunk_files in a single statement for efficiency.
+func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
+	if len(cfs) == 0 {
+		return nil
+	}
+
+	// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
+	const batchSize = 200
+
+	for i := 0; i < len(cfs); i += batchSize {
+		end := i + batchSize
+		if end > len(cfs) {
+			end = len(cfs)
+		}
+		batch := cfs[i:end]
+
+		query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
+		args := make([]interface{}, 0, len(batch)*4)
+		for j, cf := range batch {
+			if j > 0 {
+				query += ", "
+			}
+			query += "(?, ?, ?, ?)"
+			args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
+		}
+		query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"
+
+		var err error
+		if tx != nil {
+			_, err = tx.ExecContext(ctx, query, args...)
+		} else {
+			_, err = r.db.ExecWithLog(ctx, query, args...)
+		}
+		if err != nil {
+			return fmt.Errorf("batch inserting chunk_files: %w", err)
+		}
+	}
+
+	return nil
+}
```
```diff
@@ -3,6 +3,9 @@ package database
 import (
 	"context"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 func TestChunkFileRepository(t *testing.T) {
@@ -11,24 +14,68 @@ func TestChunkFileRepository(t *testing.T) {
 
 	ctx := context.Background()
 	repo := NewChunkFileRepository(db)
+	fileRepo := NewFileRepository(db)
+	chunksRepo := NewChunkRepository(db)
+
+	// Create test files first
+	testTime := time.Now().Truncate(time.Second)
+	file1 := &File{
+		Path:       "/file1.txt",
+		MTime:      testTime,
+		CTime:      testTime,
+		Size:       1024,
+		Mode:       0644,
+		UID:        1000,
+		GID:        1000,
+		LinkTarget: "",
+	}
+	err := fileRepo.Create(ctx, nil, file1)
+	if err != nil {
+		t.Fatalf("failed to create file1: %v", err)
+	}
+
+	file2 := &File{
+		Path:       "/file2.txt",
+		MTime:      testTime,
+		CTime:      testTime,
+		Size:       1024,
+		Mode:       0644,
+		UID:        1000,
+		GID:        1000,
+		LinkTarget: "",
+	}
+	err = fileRepo.Create(ctx, nil, file2)
+	if err != nil {
+		t.Fatalf("failed to create file2: %v", err)
+	}
+
+	// Create chunk first
+	chunk := &Chunk{
+		ChunkHash: types.ChunkHash("chunk1"),
+		Size:      1024,
+	}
+	err = chunksRepo.Create(ctx, nil, chunk)
+	if err != nil {
+		t.Fatalf("failed to create chunk: %v", err)
+	}
 
 	// Test Create
 	cf1 := &ChunkFile{
-		ChunkHash:  "chunk1",
-		FilePath:   "/file1.txt",
+		ChunkHash:  types.ChunkHash("chunk1"),
+		FileID:     file1.ID,
 		FileOffset: 0,
 		Length:     1024,
 	}
 
-	err := repo.Create(ctx, nil, cf1)
+	err = repo.Create(ctx, nil, cf1)
 	if err != nil {
 		t.Fatalf("failed to create chunk file: %v", err)
 	}
 
 	// Add same chunk in different file (deduplication scenario)
 	cf2 := &ChunkFile{
-		ChunkHash:  "chunk1",
-		FilePath:   "/file2.txt",
+		ChunkHash:  types.ChunkHash("chunk1"),
+		FileID:     file2.ID,
 		FileOffset: 2048,
 		Length:     1024,
 	}
@@ -50,10 +97,10 @@ func TestChunkFileRepository(t *testing.T) {
 	foundFile1 := false
 	foundFile2 := false
 	for _, cf := range chunkFiles {
-		if cf.FilePath == "/file1.txt" && cf.FileOffset == 0 {
+		if cf.FileID == file1.ID && cf.FileOffset == 0 {
 			foundFile1 = true
 		}
-		if cf.FilePath == "/file2.txt" && cf.FileOffset == 2048 {
+		if cf.FileID == file2.ID && cf.FileOffset == 2048 {
 			foundFile2 = true
 		}
 	}
@@ -61,15 +108,15 @@ func TestChunkFileRepository(t *testing.T) {
 		t.Error("not all expected files found")
 	}
 
-	// Test GetByFilePath
-	chunkFiles, err = repo.GetByFilePath(ctx, "/file1.txt")
+	// Test GetByFileID
+	chunkFiles, err = repo.GetByFileID(ctx, file1.ID)
 	if err != nil {
-		t.Fatalf("failed to get chunks by file path: %v", err)
+		t.Fatalf("failed to get chunks by file ID: %v", err)
 	}
 	if len(chunkFiles) != 1 {
 		t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
 	}
-	if chunkFiles[0].ChunkHash != "chunk1" {
+	if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
 		t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
 	}
 
@@ -86,6 +133,37 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
 
 	ctx := context.Background()
 	repo := NewChunkFileRepository(db)
+	fileRepo := NewFileRepository(db)
+	chunksRepo := NewChunkRepository(db)
+
+	// Create test files
+	testTime := time.Now().Truncate(time.Second)
+	file1 := &File{Path: "/file1.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
+	file2 := &File{Path: "/file2.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
+	file3 := &File{Path: "/file3.txt", MTime: testTime, CTime: testTime, Size: 2048, Mode: 0644, UID: 1000, GID: 1000}
+
+	if err := fileRepo.Create(ctx, nil, file1); err != nil {
+		t.Fatalf("failed to create file1: %v", err)
+	}
+	if err := fileRepo.Create(ctx, nil, file2); err != nil {
+		t.Fatalf("failed to create file2: %v", err)
+	}
+	if err := fileRepo.Create(ctx, nil, file3); err != nil {
+		t.Fatalf("failed to create file3: %v", err)
+	}
+
+	// Create chunks first
+	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
+	for _, chunkHash := range chunks {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err := chunksRepo.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
 
 	// Simulate a scenario where multiple files share chunks
 	// File1: chunk1, chunk2, chunk3
@@ -94,16 +172,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
 
 	chunkFiles := []ChunkFile{
 		// File1
-		{ChunkHash: "chunk1", FilePath: "/file1.txt", FileOffset: 0, Length: 1024},
-		{ChunkHash: "chunk2", FilePath: "/file1.txt", FileOffset: 1024, Length: 1024},
-		{ChunkHash: "chunk3", FilePath: "/file1.txt", FileOffset: 2048, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
 		// File2
-		{ChunkHash: "chunk2", FilePath: "/file2.txt", FileOffset: 0, Length: 1024},
-		{ChunkHash: "chunk3", FilePath: "/file2.txt", FileOffset: 1024, Length: 1024},
-		{ChunkHash: "chunk4", FilePath: "/file2.txt", FileOffset: 2048, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
 		// File3
-		{ChunkHash: "chunk1", FilePath: "/file3.txt", FileOffset: 0, Length: 1024},
-		{ChunkHash: "chunk4", FilePath: "/file3.txt", FileOffset: 1024, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
+		{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
 	}
 
 	for _, cf := range chunkFiles {
@@ -132,11 +210,11 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
 	}
 
 	// Test file2 chunks
-	chunks, err := repo.GetByFilePath(ctx, "/file2.txt")
+	file2Chunks, err := repo.GetByFileID(ctx, file2.ID)
 	if err != nil {
 		t.Fatalf("failed to get chunks for file2: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks for file2, got %d", len(chunks))
+	if len(file2Chunks) != 3 {
+		t.Errorf("expected 3 chunks for file2, got %d", len(file2Chunks))
 	}
 }
```
```diff
@@ -4,6 +4,8 @@ import (
 	"context"
 	"database/sql"
 	"fmt"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
 )
 
 type ChunkRepository struct {
@@ -16,16 +18,16 @@ func NewChunkRepository(db *DB) *ChunkRepository {
 
 func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk) error {
 	query := `
-		INSERT INTO chunks (chunk_hash, sha256, size)
-		VALUES (?, ?, ?)
+		INSERT INTO chunks (chunk_hash, size)
+		VALUES (?, ?)
 		ON CONFLICT(chunk_hash) DO NOTHING
 	`
 
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
+		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.Size)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
+		_, err = r.db.ExecWithLog(ctx, query, chunk.ChunkHash, chunk.Size)
 	}
 
 	if err != nil {
@@ -37,7 +39,7 @@ func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk)
 
 func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, error) {
 	query := `
-		SELECT chunk_hash, sha256, size
+		SELECT chunk_hash, size
 		FROM chunks
 		WHERE chunk_hash = ?
 	`
@@ -46,7 +48,6 @@ func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, e
 
 	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
 		&chunk.ChunkHash,
-		&chunk.SHA256,
 		&chunk.Size,
 	)
 
@@ -66,7 +67,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
 	}
 
 	query := `
-		SELECT chunk_hash, sha256, size
+		SELECT chunk_hash, size
 		FROM chunks
 		WHERE chunk_hash IN (`
 
@@ -92,7 +93,6 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
 
 		err := rows.Scan(
 			&chunk.ChunkHash,
-			&chunk.SHA256,
 			&chunk.Size,
 		)
 		if err != nil {
@@ -107,7 +107,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
 
 func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk, error) {
 	query := `
-		SELECT c.chunk_hash, c.sha256, c.size
+		SELECT c.chunk_hash, c.size
 		FROM chunks c
 		LEFT JOIN blob_chunks bc ON c.chunk_hash = bc.chunk_hash
 		WHERE bc.chunk_hash IS NULL
@@ -127,7 +127,6 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
 
 		err := rows.Scan(
 			&chunk.ChunkHash,
-			&chunk.SHA256,
 			&chunk.Size,
 		)
 		if err != nil {
@@ -139,3 +138,30 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
 
 	return chunks, rows.Err()
 }
+
+// DeleteOrphaned deletes chunks that are not referenced by any file or blob
+func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
+	query := `
+		DELETE FROM chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM file_chunks
+			WHERE file_chunks.chunk_hash = chunks.chunk_hash
+		)
+		AND NOT EXISTS (
+			SELECT 1 FROM blob_chunks
+			WHERE blob_chunks.chunk_hash = chunks.chunk_hash
+		)
+	`
+
+	result, err := r.db.ExecWithLog(ctx, query)
+	if err != nil {
+		return fmt.Errorf("deleting orphaned chunks: %w", err)
+	}
+
+	rowsAffected, _ := result.RowsAffected()
+	if rowsAffected > 0 {
+		log.Debug("Deleted orphaned chunks", "count", rowsAffected)
+	}
+
+	return nil
+}
```
37
internal/database/chunks_ext.go
Normal file
```diff
@@ -0,0 +1,37 @@
+package database
+
+import (
+	"context"
+	"fmt"
+)
+
+func (r *ChunkRepository) List(ctx context.Context) ([]*Chunk, error) {
+	query := `
+		SELECT chunk_hash, size
+		FROM chunks
+		ORDER BY chunk_hash
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query)
+	if err != nil {
+		return nil, fmt.Errorf("querying chunks: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var chunks []*Chunk
+	for rows.Next() {
+		var chunk Chunk
+
+		err := rows.Scan(
+			&chunk.ChunkHash,
+			&chunk.Size,
+		)
+		if err != nil {
+			return nil, fmt.Errorf("scanning chunk: %w", err)
+		}
+
+		chunks = append(chunks, &chunk)
+	}
+
+	return chunks, rows.Err()
+}
```
```diff
@@ -3,6 +3,8 @@ package database
 import (
 	"context"
 	"testing"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 func TestChunkRepository(t *testing.T) {
@@ -14,8 +16,7 @@ func TestChunkRepository(t *testing.T) {
 
 	// Test Create
 	chunk := &Chunk{
-		ChunkHash: "chunkhash123",
-		SHA256:    "sha256hash123",
+		ChunkHash: types.ChunkHash("chunkhash123"),
 		Size:      4096,
 	}
 
@@ -25,7 +26,7 @@ func TestChunkRepository(t *testing.T) {
 	}
 
 	// Test GetByHash
-	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
+	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
 	if err != nil {
 		t.Fatalf("failed to get chunk: %v", err)
 	}
@@ -35,9 +36,6 @@ func TestChunkRepository(t *testing.T) {
 	if retrieved.ChunkHash != chunk.ChunkHash {
 		t.Errorf("chunk hash mismatch: got %s, want %s", retrieved.ChunkHash, chunk.ChunkHash)
 	}
-	if retrieved.SHA256 != chunk.SHA256 {
-		t.Errorf("sha256 mismatch: got %s, want %s", retrieved.SHA256, chunk.SHA256)
-	}
 	if retrieved.Size != chunk.Size {
 		t.Errorf("size mismatch: got %d, want %d", retrieved.Size, chunk.Size)
 	}
@@ -50,8 +48,7 @@ func TestChunkRepository(t *testing.T) {
 
 	// Test GetByHashes
 	chunk2 := &Chunk{
-		ChunkHash: "chunkhash456",
-		SHA256:    "sha256hash456",
+		ChunkHash: types.ChunkHash("chunkhash456"),
 		Size:      8192,
 	}
 	err = repo.Create(ctx, nil, chunk2)
@@ -59,7 +56,7 @@ func TestChunkRepository(t *testing.T) {
 		t.Fatalf("failed to create second chunk: %v", err)
 	}
 
-	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
+	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
 	if err != nil {
 		t.Fatalf("failed to get chunks by hashes: %v", err)
 	}
```
```diff
@@ -1,143 +1,239 @@
+// Package database provides the local SQLite index for Vaultik backup operations.
+// The database tracks files, chunks, and their associations with blobs.
+//
+// Blobs in Vaultik are the final storage units uploaded to S3. Each blob is a
+// large (up to 10GB) file containing many compressed and encrypted chunks from
+// multiple source files. Blobs are content-addressed, meaning their filename
+// is derived from their SHA256 hash after compression and encryption.
+//
+// The database does not support migrations. If the schema changes, delete
+// the local database and perform a full backup to recreate it.
 package database
 
 import (
 	"context"
 	"database/sql"
+	_ "embed"
 	"fmt"
-	"sync"
+	"os"
+	"strings"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	_ "modernc.org/sqlite"
 )
 
+//go:embed schema.sql
+var schemaSQL string
+
+// DB represents the Vaultik local index database connection.
+// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
+// The database enables incremental backups by detecting changed files and
+// supports deduplication by tracking which chunks are already stored in blobs.
+// Write operations are synchronized through a mutex to ensure thread safety.
 type DB struct {
 	conn *sql.DB
-	writeLock sync.Mutex
+	path string
 }
 
+// New creates a new database connection at the specified path.
+// It creates the schema if needed and configures SQLite with WAL mode for
+// better concurrency. SQLite handles crash recovery automatically when
+// opening a database with journal/WAL files present.
+// The path parameter can be a file path for persistent storage or ":memory:"
+// for an in-memory database (useful for testing).
 func New(ctx context.Context, path string) (*DB, error) {
-	conn, err := sql.Open("sqlite", path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=5000")
-	if err != nil {
-		return nil, fmt.Errorf("opening database: %w", err)
-	}
+	log.Debug("Opening database connection", "path", path)
+
+	// Note: We do NOT delete journal/WAL files before opening.
+	// SQLite handles crash recovery automatically when the database is opened.
+	// Deleting these files would corrupt the database after an unclean shutdown.
+
+	// First attempt with standard WAL mode
+	log.Debug("Attempting to open database with WAL mode", "path", path)
+	conn, err := sql.Open(
+		"sqlite",
+		path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL&_foreign_keys=ON",
+	)
+	if err == nil {
+		// Set connection pool settings
+		// SQLite can handle multiple readers but only one writer at a time.
+		// Setting MaxOpenConns to 1 ensures all writes are serialized through
+		// a single connection, preventing SQLITE_BUSY errors.
+		conn.SetMaxOpenConns(1)
+		conn.SetMaxIdleConns(1)
+
+		if err := conn.PingContext(ctx); err == nil {
+			// Success on first try
+			log.Debug("Database opened successfully with WAL mode", "path", path)
+
+			// Enable foreign keys explicitly
+			if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
+				log.Warn("Failed to enable foreign keys", "error", err)
+			}
+
+			db := &DB{conn: conn, path: path}
+			if err := db.createSchema(ctx); err != nil {
+				_ = conn.Close()
+				return nil, fmt.Errorf("creating schema: %w", err)
+			}
+			return db, nil
+		}
+		log.Debug("Failed to ping database, closing connection", "path", path, "error", err)
+		_ = conn.Close()
+	}
+
+	// If first attempt failed, try with TRUNCATE mode to clear any locks
+	log.Info(
+		"Database appears locked, attempting recovery with TRUNCATE mode",
+		"path", path,
+	)
+	conn, err = sql.Open(
+		"sqlite",
+		path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000&_foreign_keys=ON",
+	)
+	if err != nil {
+		return nil, fmt.Errorf("opening database in recovery mode: %w", err)
+	}
+
+	// Set connection pool settings
+	// SQLite can handle multiple readers but only one writer at a time.
+	// Setting MaxOpenConns to 1 ensures all writes are serialized through
+	// a single connection, preventing SQLITE_BUSY errors.
+	conn.SetMaxOpenConns(1)
+	conn.SetMaxIdleConns(1)
 
 	if err := conn.PingContext(ctx); err != nil {
-		if closeErr := conn.Close(); closeErr != nil {
-			Fatal("failed to close database connection: %v", closeErr)
-		}
-		return nil, fmt.Errorf("pinging database: %w", err)
+		log.Debug("Failed to ping database in recovery mode, closing", "path", path, "error", err)
+		_ = conn.Close()
+		return nil, fmt.Errorf(
+			"database still locked after recovery attempt: %w",
+			err,
+		)
 	}
 
-	db := &DB{conn: conn}
-	if err := db.createSchema(ctx); err != nil {
-		if closeErr := conn.Close(); closeErr != nil {
-			Fatal("failed to close database connection: %v", closeErr)
-		}
+	log.Debug("Database opened in TRUNCATE mode", "path", path)
+
+	// Switch back to WAL mode
+	log.Debug("Switching database back to WAL mode", "path", path)
+	if _, err := conn.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
+		log.Warn("Failed to switch back to WAL mode", "path", path, "error", err)
+	}
+
+	// Ensure foreign keys are enabled
+	if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys=ON"); err != nil {
+		log.Warn("Failed to enable foreign keys", "path", path, "error", err)
+	}
+
+	db := &DB{conn: conn, path: path}
+	if err := db.createSchema(ctx); err != nil {
+		_ = conn.Close()
 		return nil, fmt.Errorf("creating schema: %w", err)
 	}
 
+	log.Debug("Database connection established successfully", "path", path)
 	return db, nil
 }
 
+// Close closes the database connection.
+// It ensures all pending operations are completed before closing.
+// Returns an error if the database connection cannot be closed properly.
 func (db *DB) Close() error {
+	log.Debug("Closing database connection", "path", db.path)
 	if err := db.conn.Close(); err != nil {
-		Fatal("failed to close database: %v", err)
+		log.Error("Failed to close database", "path", db.path, "error", err)
+		return fmt.Errorf("failed to close database: %w", err)
 	}
+	log.Debug("Database connection closed successfully", "path", db.path)
 	return nil
 }
 
+// Conn returns the underlying *sql.DB connection.
+// This should be used sparingly and primarily for read operations.
+// For write operations, prefer using the ExecWithLog method.
 func (db *DB) Conn() *sql.DB {
 	return db.conn
 }
 
-func (db *DB) BeginTx(ctx context.Context, opts *sql.TxOptions) (*sql.Tx, error) {
+// Path returns the path to the database file.
+func (db *DB) Path() string {
+	return db.path
+}
+
+// BeginTx starts a new database transaction with the given options.
+// The caller is responsible for committing or rolling back the transaction.
+// For write transactions, consider using the Repositories.WithTx method instead,
+// which handles locking and rollback automatically.
+func (db *DB) BeginTx(
+	ctx context.Context,
+	opts *sql.TxOptions,
+) (*sql.Tx, error) {
 	return db.conn.BeginTx(ctx, opts)
 }
 
-// LockForWrite acquires the write lock
-func (db *DB) LockForWrite() {
-	db.writeLock.Lock()
-}
-
-// UnlockWrite releases the write lock
-func (db *DB) UnlockWrite() {
-	db.writeLock.Unlock()
-}
-
-// ExecWithLock executes a write query with the write lock held
-func (db *DB) ExecWithLock(ctx context.Context, query string, args ...interface{}) (sql.Result, error) {
-	db.writeLock.Lock()
-	defer db.writeLock.Unlock()
+// Note: LockForWrite and UnlockWrite methods have been removed.
+// SQLite handles its own locking internally, so explicit locking is not needed.
+
+// ExecWithLog executes a write query with SQL logging.
+// SQLite handles its own locking internally, so we just pass through to ExecContext.
+// The query and args parameters follow the same format as sql.DB.ExecContext.
+func (db *DB) ExecWithLog(
+	ctx context.Context,
+	query string,
+	args ...interface{},
+) (sql.Result, error) {
+	LogSQL("Execute", query, args...)
 	return db.conn.ExecContext(ctx, query, args...)
 }
 
-// QueryRowWithLock executes a write query that returns a row with the write lock held
-func (db *DB) QueryRowWithLock(ctx context.Context, query string, args ...interface{}) *sql.Row {
-	db.writeLock.Lock()
-	defer db.writeLock.Unlock()
+// QueryRowWithLog executes a query that returns at most one row with SQL logging.
+// This is useful for queries that modify data and return values (e.g., INSERT ... RETURNING).
+// SQLite handles its own locking internally.
+// The query and args parameters follow the same format as sql.DB.QueryRowContext.
+func (db *DB) QueryRowWithLog(
+	ctx context.Context,
+	query string,
+	args ...interface{},
+) *sql.Row {
+	LogSQL("QueryRow", query, args...)
 	return db.conn.QueryRowContext(ctx, query, args...)
 }
 
 func (db *DB) createSchema(ctx context.Context) error {
-	schema := `
-	CREATE TABLE IF NOT EXISTS files (
-		path TEXT PRIMARY KEY,
-		mtime INTEGER NOT NULL,
-		ctime INTEGER NOT NULL,
-		size INTEGER NOT NULL,
-		mode INTEGER NOT NULL,
-		uid INTEGER NOT NULL,
-		gid INTEGER NOT NULL,
-		link_target TEXT
-	);
-
-	CREATE TABLE IF NOT EXISTS file_chunks (
-		path TEXT NOT NULL,
-		idx INTEGER NOT NULL,
-		chunk_hash TEXT NOT NULL,
-		PRIMARY KEY (path, idx)
-	);
-
-	CREATE TABLE IF NOT EXISTS chunks (
-		chunk_hash TEXT PRIMARY KEY,
-		sha256 TEXT NOT NULL,
-		size INTEGER NOT NULL
-	);
-
-	CREATE TABLE IF NOT EXISTS blobs (
-		blob_hash TEXT PRIMARY KEY,
-		created_ts INTEGER NOT NULL
-	);
-
-	CREATE TABLE IF NOT EXISTS blob_chunks (
-		blob_hash TEXT NOT NULL,
-		chunk_hash TEXT NOT NULL,
-		offset INTEGER NOT NULL,
-		length INTEGER NOT NULL,
-		PRIMARY KEY (blob_hash, chunk_hash)
-	);
-
-	CREATE TABLE IF NOT EXISTS chunk_files (
-		chunk_hash TEXT NOT NULL,
-		file_path TEXT NOT NULL,
-		file_offset INTEGER NOT NULL,
-		length INTEGER NOT NULL,
-		PRIMARY KEY (chunk_hash, file_path)
-	);
-
-	CREATE TABLE IF NOT EXISTS snapshots (
-		id TEXT PRIMARY KEY,
-		hostname TEXT NOT NULL,
-		vaultik_version TEXT NOT NULL,
-		created_ts INTEGER NOT NULL,
-		file_count INTEGER NOT NULL,
-		chunk_count INTEGER NOT NULL,
-		blob_count INTEGER NOT NULL,
-		total_size INTEGER NOT NULL,
-		blob_size INTEGER NOT NULL,
-		compression_ratio REAL NOT NULL
-	);
-	`
-
-	_, err := db.conn.ExecContext(ctx, schema)
+	_, err := db.conn.ExecContext(ctx, schemaSQL)
 	return err
 }
+
+// NewTestDB creates an in-memory SQLite database for testing purposes.
+// The database is automatically initialized with the schema and is ready for use.
+// Each call creates a new independent database instance.
+func NewTestDB() (*DB, error) {
+	return New(context.Background(), ":memory:")
+}
+
+// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
+// For example, repeatPlaceholder(2) returns ", ?, ?".
+func repeatPlaceholder(n int) string {
+	if n <= 0 {
+		return ""
+	}
+	return strings.Repeat(", ?", n)
+}
+
+// LogSQL logs SQL queries and their arguments when debug mode is enabled.
+// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
+// This is useful for troubleshooting database operations and understanding query patterns.
+//
+// The operation parameter describes the type of SQL operation (e.g., "Execute", "Query").
+// The query parameter is the SQL statement being executed.
+// The args parameter contains the query arguments that will be interpolated.
+func LogSQL(operation, query string, args ...interface{}) {
+	if strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
+		log.Debug(
+			"SQL "+operation,
+			"query",
+			strings.TrimSpace(query),
+			"args",
+			fmt.Sprintf("%v", args),
+		)
+	}
+}
```
```diff
@@ -67,21 +67,26 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
 	}()
 
 	// Test concurrent writes
-	done := make(chan bool, 10)
+	type result struct {
+		index int
+		err   error
+	}
+	results := make(chan result, 10)
 
 	for i := 0; i < 10; i++ {
 		go func(i int) {
-			_, err := db.ExecWithLock(ctx, "INSERT INTO chunks (chunk_hash, sha256, size) VALUES (?, ?, ?)",
-				fmt.Sprintf("hash%d", i), fmt.Sprintf("sha%d", i), i*1024)
-			if err != nil {
-				t.Errorf("concurrent insert failed: %v", err)
-			}
-			done <- true
+			_, err := db.ExecWithLog(ctx, "INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)",
+				fmt.Sprintf("hash%d", i), i*1024)
+			results <- result{index: i, err: err}
 		}(i)
 	}
 
-	// Wait for all goroutines
+	// Wait for all goroutines and check results
 	for i := 0; i < 10; i++ {
-		<-done
+		r := <-results
+		if r.err != nil {
+			t.Fatalf("concurrent insert %d failed: %v", r.index, r.err)
+		}
 	}
 
 	// Verify all inserts succeeded
```
@@ -4,6 +4,8 @@ import (
|
|||||||
"context"
|
"context"
|
||||||
"database/sql"
|
"database/sql"
|
||||||
"fmt"
|
"fmt"
|
||||||
|
|
||||||
|
"git.eeqj.de/sneak/vaultik/internal/types"
|
||||||
)
|
)
|
||||||
|
|
||||||
type FileChunkRepository struct {
|
type FileChunkRepository struct {
|
||||||
@@ -16,16 +18,16 @@ func NewFileChunkRepository(db *DB) *FileChunkRepository {
|
|||||||
|
|
||||||
func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
|
func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
|
||||||
query := `
|
query := `
|
||||||
INSERT INTO file_chunks (path, idx, chunk_hash)
|
INSERT INTO file_chunks (file_id, idx, chunk_hash)
|
||||||
VALUES (?, ?, ?)
|
VALUES (?, ?, ?)
|
||||||
ON CONFLICT(path, idx) DO NOTHING
|
ON CONFLICT(file_id, idx) DO NOTHING
|
||||||
`
|
`
|
||||||
|
|
||||||
var err error
|
var err error
|
||||||
if tx != nil {
|
if tx != nil {
|
||||||
_, err = tx.ExecContext(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
|
_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
|
||||||
} else {
|
} else {
|
||||||
_, err = r.db.ExecWithLock(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
|
_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
|
||||||
}
|
}
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@@ -37,10 +39,11 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh
|
|||||||
|
|
||||||
func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
|
func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
|
||||||
query := `
|
query := `
|
||||||
SELECT path, idx, chunk_hash
|
SELECT fc.file_id, fc.idx, fc.chunk_hash
|
||||||
FROM file_chunks
|
FROM file_chunks fc
|
||||||
WHERE path = ?
|
JOIN files f ON fc.file_id = f.id
|
||||||
ORDER BY idx
|
WHERE f.path = ?
|
||||||
|
ORDER BY fc.idx
|
||||||
`
|
`
|
||||||
|
|
||||||
rows, err := r.db.conn.QueryContext(ctx, query, path)
|
rows, err := r.db.conn.QueryContext(ctx, query, path)
|
||||||
@@ -49,13 +52,64 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
|
|||||||
}
|
}
|
||||||
defer CloseRows(rows)
|
defer CloseRows(rows)
|
||||||
|
|
||||||
|
return r.scanFileChunks(rows)
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetByFileID retrieves file chunks by file ID
|
||||||
|
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
|
||||||
|
query := `
|
||||||
|
SELECT file_id, idx, chunk_hash
|
||||||
|
FROM file_chunks
|
||||||
|
WHERE file_id = ?
|
||||||
|
ORDER BY idx
|
||||||
|
`
|
||||||
|
|
||||||
|
rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("querying file chunks: %w", err)
|
||||||
|
}
|
||||||
|
defer CloseRows(rows)
|
||||||
|
|
||||||
|
return r.scanFileChunks(rows)
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetByPathTx retrieves file chunks within a transaction
|
||||||
|
func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
|
||||||
|
query := `
|
||||||
|
SELECT fc.file_id, fc.idx, fc.chunk_hash
|
||||||
|
FROM file_chunks fc
|
||||||
|
JOIN files f ON fc.file_id = f.id
|
||||||
|
WHERE f.path = ?
|
||||||
|
ORDER BY fc.idx
|
||||||
|
`
|
||||||
|
|
||||||
|
LogSQL("GetByPathTx", query, path)
|
||||||
|
rows, err := tx.QueryContext(ctx, query, path)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("querying file chunks: %w", err)
|
||||||
|
}
|
||||||
|
defer CloseRows(rows)
|
||||||
|
|
||||||
|
fileChunks, err := r.scanFileChunks(rows)
|
||||||
|
LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
|
||||||
|
return fileChunks, err
|
||||||
|
}
|
||||||
|
|
||||||
|
// scanFileChunks is a helper that scans file chunk rows
|
||||||
|
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
|
||||||
var fileChunks []*FileChunk
|
var fileChunks []*FileChunk
|
||||||
for rows.Next() {
|
for rows.Next() {
|
||||||
var fc FileChunk
|
var fc FileChunk
|
||||||
err := rows.Scan(&fc.Path, &fc.Idx, &fc.ChunkHash)
|
var fileIDStr, chunkHashStr string
|
||||||
|
err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, fmt.Errorf("scanning file chunk: %w", err)
|
return nil, fmt.Errorf("scanning file chunk: %w", err)
|
||||||
}
|
}
|
||||||
|
fc.FileID, err = types.ParseFileID(fileIDStr)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("parsing file ID: %w", err)
|
||||||
|
}
|
||||||
|
fc.ChunkHash = types.ChunkHash(chunkHashStr)
|
||||||
fileChunks = append(fileChunks, &fc)
|
fileChunks = append(fileChunks, &fc)
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -63,13 +117,13 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
|
func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
|
||||||
query := `DELETE FROM file_chunks WHERE path = ?`
|
query := `DELETE FROM file_chunks WHERE file_id = (SELECT id FROM files WHERE path = ?)`
|
||||||
|
|
||||||
var err error
|
var err error
|
||||||
if tx != nil {
|
if tx != nil {
|
||||||
_, err = tx.ExecContext(ctx, query, path)
|
_, err = tx.ExecContext(ctx, query, path)
|
||||||
} else {
|
} else {
|
||||||
_, err = r.db.ExecWithLock(ctx, query, path)
|
_, err = r.db.ExecWithLog(ctx, query, path)
|
||||||
}
|
}
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@@ -78,3 +132,117 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path
|
|||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// DeleteByFileID deletes all chunks for a file by its UUID
|
||||||
|
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
|
||||||
|
query := `DELETE FROM file_chunks WHERE file_id = ?`
|
||||||
|
|
||||||
|
var err error
|
||||||
|
if tx != nil {
|
||||||
|
_, err = tx.ExecContext(ctx, query, fileID.String())
|
||||||
|
} else {
|
||||||
|
_, err = r.db.ExecWithLog(ctx, query, fileID.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("deleting file chunks: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// DeleteByFileIDs deletes all chunks for multiple files in a single statement.
|
||||||
|
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
|
||||||
|
if len(fileIDs) == 0 {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Batch at 500 to stay within SQLite's variable limit
|
||||||
|
const batchSize = 500
|
||||||
|
|
||||||
|
for i := 0; i < len(fileIDs); i += batchSize {
|
||||||
|
end := i + batchSize
|
||||||
|
if end > len(fileIDs) {
|
||||||
|
end = len(fileIDs)
|
||||||
|
}
|
||||||
|
batch := fileIDs[i:end]
|
||||||
|
|
||||||
|
query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
|
||||||
|
args := make([]interface{}, len(batch))
|
||||||
|
for j, id := range batch {
|
||||||
|
args[j] = id.String()
|
||||||
|
}
|
||||||
|
|
||||||
|
var err error
|
||||||
|
if tx != nil {
|
||||||
|
_, err = tx.ExecContext(ctx, query, args...)
|
||||||
|
} else {
|
||||||
|
_, err = r.db.ExecWithLog(ctx, query, args...)
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("batch deleting file_chunks: %w", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
|
||||||
|
// Batches are automatically split to stay within SQLite's variable limit.
|
||||||
|
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
|
||||||
|
if len(fcs) == 0 {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
+	// SQLite has a limit on variables (typically 999 or 32766).
+	// Each FileChunk has 3 values, so batch at 300 to be safe.
+	const batchSize = 300
+
+	for i := 0; i < len(fcs); i += batchSize {
+		end := i + batchSize
+		if end > len(fcs) {
+			end = len(fcs)
+		}
+		batch := fcs[i:end]
+
+		// Build the query with multiple value sets
+		query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
+		args := make([]interface{}, 0, len(batch)*3)
+		for j, fc := range batch {
+			if j > 0 {
+				query += ", "
+			}
+			query += "(?, ?, ?)"
+			args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
+		}
+		query += " ON CONFLICT(file_id, idx) DO NOTHING"
+
+		var err error
+		if tx != nil {
+			_, err = tx.ExecContext(ctx, query, args...)
+		} else {
+			_, err = r.db.ExecWithLog(ctx, query, args...)
+		}
+		if err != nil {
+			return fmt.Errorf("batch inserting file_chunks: %w", err)
+		}
+	}
+
+	return nil
+}
+
+// GetByFile is an alias for GetByPath for compatibility
+func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
+	LogSQL("GetByFile", "Starting", path)
+	result, err := r.GetByPath(ctx, path)
+	LogSQL("GetByFile", "Complete", path, "count", len(result))
+	return result, err
+}
+
+// GetByFileTx retrieves file chunks within a transaction
+func (r *FileChunkRepository) GetByFileTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
+	LogSQL("GetByFileTx", "Starting", path)
+	result, err := r.GetByPathTx(ctx, tx, path)
+	LogSQL("GetByFileTx", "Complete", path, "count", len(result))
+	return result, err
+}
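The batching above exists because SQLite caps the number of `?` host parameters per statement. A minimal sketch of the same arithmetic, with hypothetical helper names (`batchSizeFor`, `valuesClause` are illustrative, not part of Vaultik):

```go
package main

import (
	"fmt"
	"strings"
)

// maxVars is SQLite's conservative default host-parameter limit
// (999 in older builds; newer builds allow 32766).
const maxVars = 999

// batchSizeFor returns how many rows fit in one statement when each
// row binds varsPerRow parameters.
func batchSizeFor(varsPerRow int) int {
	return maxVars / varsPerRow
}

// valuesClause builds the "(?, ?, ?), (?, ?, ?)" tail for a
// multi-row INSERT with the given row count.
func valuesClause(rows, varsPerRow int) string {
	row := "(" + strings.TrimSuffix(strings.Repeat("?, ", varsPerRow), ", ") + ")"
	parts := make([]string, rows)
	for i := range parts {
		parts[i] = row
	}
	return strings.Join(parts, ", ")
}

func main() {
	// 3 parameters per file_chunks row fits 333 rows under the 999
	// limit; the diff batches at 300 to leave headroom.
	fmt.Println(batchSizeFor(3)) // 333
	fmt.Println(valuesClause(2, 3))
}
```

This is why the file batch later in the diff uses 100 rows: those inserts bind 10 parameters per row.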
@@ -4,6 +4,9 @@ import (
 	"context"
 	"fmt"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 func TestFileChunkRepository(t *testing.T) {
@@ -12,24 +15,56 @@ func TestFileChunkRepository(t *testing.T) {
 
 	ctx := context.Background()
 	repo := NewFileChunkRepository(db)
+	fileRepo := NewFileRepository(db)
+
+	// Create test file first
+	testTime := time.Now().Truncate(time.Second)
+	file := &File{
+		Path:       "/test/file.txt",
+		MTime:      testTime,
+		CTime:      testTime,
+		Size:       3072,
+		Mode:       0644,
+		UID:        1000,
+		GID:        1000,
+		LinkTarget: "",
+	}
+	err := fileRepo.Create(ctx, nil, file)
+	if err != nil {
+		t.Fatalf("failed to create file: %v", err)
+	}
+
+	// Create chunks first
+	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	chunkRepo := NewChunkRepository(db)
+	for _, chunkHash := range chunks {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = chunkRepo.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
+
 	// Test Create
 	fc1 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       0,
-		ChunkHash: "chunk1",
+		ChunkHash: types.ChunkHash("chunk1"),
 	}
 
-	err := repo.Create(ctx, nil, fc1)
+	err = repo.Create(ctx, nil, fc1)
 	if err != nil {
 		t.Fatalf("failed to create file chunk: %v", err)
 	}
 
 	// Add more chunks for the same file
 	fc2 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       1,
-		ChunkHash: "chunk2",
+		ChunkHash: types.ChunkHash("chunk2"),
 	}
 	err = repo.Create(ctx, nil, fc2)
 	if err != nil {
@@ -37,26 +72,26 @@ func TestFileChunkRepository(t *testing.T) {
 	}
 
 	fc3 := &FileChunk{
-		Path:      "/test/file.txt",
+		FileID:    file.ID,
 		Idx:       2,
-		ChunkHash: "chunk3",
+		ChunkHash: types.ChunkHash("chunk3"),
 	}
 	err = repo.Create(ctx, nil, fc3)
 	if err != nil {
 		t.Fatalf("failed to create third file chunk: %v", err)
 	}
 
-	// Test GetByPath
-	chunks, err := repo.GetByPath(ctx, "/test/file.txt")
+	// Test GetByFile
+	fileChunks, err := repo.GetByFile(ctx, "/test/file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks, got %d", len(chunks))
+	if len(fileChunks) != 3 {
		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
 	}
 
 	// Verify order
-	for i, chunk := range chunks {
+	for i, chunk := range fileChunks {
 		if chunk.Idx != i {
 			t.Errorf("wrong chunk order: expected idx %d, got %d", i, chunk.Idx)
 		}
@@ -68,18 +103,18 @@ func TestFileChunkRepository(t *testing.T) {
 		t.Fatalf("failed to create duplicate file chunk: %v", err)
 	}
 
-	// Test DeleteByPath
-	err = repo.DeleteByPath(ctx, nil, "/test/file.txt")
+	// Test DeleteByFileID
+	err = repo.DeleteByFileID(ctx, nil, file.ID)
 	if err != nil {
 		t.Fatalf("failed to delete file chunks: %v", err)
 	}
 
-	chunks, err = repo.GetByPath(ctx, "/test/file.txt")
+	fileChunks, err = repo.GetByFileID(ctx, file.ID)
 	if err != nil {
 		t.Fatalf("failed to get deleted file chunks: %v", err)
 	}
-	if len(chunks) != 0 {
-		t.Errorf("expected 0 chunks after delete, got %d", len(chunks))
+	if len(fileChunks) != 0 {
+		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
 	}
 }
 
@@ -89,15 +124,54 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
 
 	ctx := context.Background()
 	repo := NewFileChunkRepository(db)
+	fileRepo := NewFileRepository(db)
+
+	// Create test files
+	testTime := time.Now().Truncate(time.Second)
+	filePaths := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
+	files := make([]*File, len(filePaths))
+
+	for i, path := range filePaths {
+		file := &File{
+			Path:       types.FilePath(path),
+			MTime:      testTime,
+			CTime:      testTime,
+			Size:       2048,
+			Mode:       0644,
+			UID:        1000,
+			GID:        1000,
+			LinkTarget: "",
+		}
+		err := fileRepo.Create(ctx, nil, file)
+		if err != nil {
+			t.Fatalf("failed to create file %s: %v", path, err)
+		}
+		files[i] = file
+	}
+
+	// Create all chunks first
+	chunkRepo := NewChunkRepository(db)
+	for i := range files {
+		for j := 0; j < 2; j++ {
+			chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
+			chunk := &Chunk{
+				ChunkHash: chunkHash,
+				Size:      1024,
+			}
+			err := chunkRepo.Create(ctx, nil, chunk)
+			if err != nil {
+				t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+			}
+		}
+	}
+
 	// Create chunks for multiple files
-	files := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
-	for _, path := range files {
-		for i := 0; i < 2; i++ {
+	for i, file := range files {
+		for j := 0; j < 2; j++ {
 			fc := &FileChunk{
-				Path:      path,
+				FileID:    file.ID,
-				Idx:       i,
+				Idx:       j,
-				ChunkHash: fmt.Sprintf("%s_chunk%d", path, i),
+				ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
 			}
 			err := repo.Create(ctx, nil, fc)
 			if err != nil {
@@ -107,13 +181,13 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
 	}
 
 	// Verify each file has correct chunks
-	for _, path := range files {
-		chunks, err := repo.GetByPath(ctx, path)
+	for i, file := range files {
+		chunks, err := repo.GetByFileID(ctx, file.ID)
 		if err != nil {
-			t.Fatalf("failed to get chunks for %s: %v", path, err)
+			t.Fatalf("failed to get chunks for file %d: %v", i, err)
 		}
 		if len(chunks) != 2 {
-			t.Errorf("expected 2 chunks for %s, got %d", path, len(chunks))
+			t.Errorf("expected 2 chunks for file %d, got %d", i, len(chunks))
 		}
 	}
 }
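The test's "verify order" loop matters because restoring a file means concatenating its chunks in `Idx` order. A minimal sketch of that invariant, with stand-in types (`fileChunk` here is illustrative, not Vaultik's `FileChunk`):

```go
package main

import (
	"fmt"
	"sort"
)

// fileChunk pairs a chunk's position in the file with its payload.
// Real chunks would be fetched by ChunkHash; strings stand in here.
type fileChunk struct {
	Idx  int
	Data string
}

// reassemble sorts chunks by Idx and concatenates their payloads,
// which is why queries ordering by idx are essential for restore.
func reassemble(chunks []fileChunk) string {
	sort.Slice(chunks, func(i, j int) bool { return chunks[i].Idx < chunks[j].Idx })
	var out string
	for _, c := range chunks {
		out += c.Data
	}
	return out
}

func main() {
	chunks := []fileChunk{{2, "baz"}, {0, "foo"}, {1, "bar"}}
	fmt.Println(reassemble(chunks)) // foobarbaz
}
```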
@@ -5,6 +5,9 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 type FileRepository struct {
@@ -16,10 +19,16 @@ func NewFileRepository(db *DB) *FileRepository {
 }
 
 func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
+	// Generate UUID if not provided
+	if file.ID.IsZero() {
+		file.ID = types.NewFileID()
+	}
+
 	query := `
-		INSERT INTO files (path, mtime, ctime, size, mode, uid, gid, link_target)
-		VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+		INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
+		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
 		ON CONFLICT(path) DO UPDATE SET
+			source_path = excluded.source_path,
 			mtime = excluded.mtime,
 			ctime = excluded.ctime,
 			size = excluded.size,
@@ -27,43 +36,78 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
 			uid = excluded.uid,
 			gid = excluded.gid,
 			link_target = excluded.link_target
+		RETURNING id
 	`
 
+	var idStr string
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
+		LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
+		err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
+		err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
 	}
 
 	if err != nil {
 		return fmt.Errorf("inserting file: %w", err)
 	}
 
+	// Parse the returned ID
+	file.ID, err = types.ParseFileID(idStr)
+	if err != nil {
+		return fmt.Errorf("parsing file ID: %w", err)
+	}
+
 	return nil
 }
 
 func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
 	query := `
-		SELECT path, mtime, ctime, size, mode, uid, gid, link_target
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
 		FROM files
 		WHERE path = ?
 	`
 
-	var file File
-	var mtimeUnix, ctimeUnix int64
-	var linkTarget sql.NullString
-
-	err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
-		&file.Path,
-		&mtimeUnix,
-		&ctimeUnix,
-		&file.Size,
-		&file.Mode,
-		&file.UID,
-		&file.GID,
-		&linkTarget,
-	)
+	file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, fmt.Errorf("querying file: %w", err)
+	}
+
+	return file, nil
+}
+
+// GetByID retrieves a file by its UUID
+func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		WHERE id = ?
+	`
+
+	file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
+	if err != nil {
+		return nil, fmt.Errorf("querying file: %w", err)
+	}
+
+	return file, nil
+}
+
+func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		WHERE path = ?
+	`
+
+	LogSQL("GetByPathTx QueryRowContext", query, path)
+	file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
+	LogSQL("GetByPathTx Scan complete", query, path)
+
 	if err == sql.ErrNoRows {
 		return nil, nil
@@ -72,10 +116,80 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
 		return nil, fmt.Errorf("querying file: %w", err)
 	}
 
-	file.MTime = time.Unix(mtimeUnix, 0)
-	file.CTime = time.Unix(ctimeUnix, 0)
+	return file, nil
+}
+
+// scanFile is a helper that scans a single file row
+func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
+	var file File
+	var idStr, pathStr, sourcePathStr string
+	var mtimeUnix, ctimeUnix int64
+	var linkTarget sql.NullString
+
+	err := row.Scan(
+		&idStr,
+		&pathStr,
+		&sourcePathStr,
+		&mtimeUnix,
+		&ctimeUnix,
+		&file.Size,
+		&file.Mode,
+		&file.UID,
+		&file.GID,
+		&linkTarget,
+	)
+	if err != nil {
+		return nil, err
+	}
+
+	file.ID, err = types.ParseFileID(idStr)
+	if err != nil {
+		return nil, fmt.Errorf("parsing file ID: %w", err)
+	}
+	file.Path = types.FilePath(pathStr)
+	file.SourcePath = types.SourcePath(sourcePathStr)
+	file.MTime = time.Unix(mtimeUnix, 0).UTC()
+	file.CTime = time.Unix(ctimeUnix, 0).UTC()
 	if linkTarget.Valid {
-		file.LinkTarget = linkTarget.String
+		file.LinkTarget = types.FilePath(linkTarget.String)
+	}
+
+	return &file, nil
+}
+
+// scanFileRows is a helper that scans a file row from rows iterator
+func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
+	var file File
+	var idStr, pathStr, sourcePathStr string
+	var mtimeUnix, ctimeUnix int64
+	var linkTarget sql.NullString
+
+	err := rows.Scan(
+		&idStr,
+		&pathStr,
+		&sourcePathStr,
+		&mtimeUnix,
+		&ctimeUnix,
+		&file.Size,
+		&file.Mode,
+		&file.UID,
+		&file.GID,
+		&linkTarget,
+	)
+	if err != nil {
+		return nil, err
+	}
+
+	file.ID, err = types.ParseFileID(idStr)
+	if err != nil {
+		return nil, fmt.Errorf("parsing file ID: %w", err)
+	}
+	file.Path = types.FilePath(pathStr)
+	file.SourcePath = types.SourcePath(sourcePathStr)
+	file.MTime = time.Unix(mtimeUnix, 0).UTC()
+	file.CTime = time.Unix(ctimeUnix, 0).UTC()
+	if linkTarget.Valid {
+		file.LinkTarget = types.FilePath(linkTarget.String)
 	}
 
 	return &file, nil
@@ -83,7 +197,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
 
 func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
 	query := `
-		SELECT path, mtime, ctime, size, mode, uid, gid, link_target
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
 		FROM files
 		WHERE mtime >= ?
 		ORDER BY path
@@ -97,31 +211,11 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
 
 	var files []*File
 	for rows.Next() {
-		var file File
-		var mtimeUnix, ctimeUnix int64
-		var linkTarget sql.NullString
-
-		err := rows.Scan(
-			&file.Path,
-			&mtimeUnix,
-			&ctimeUnix,
-			&file.Size,
-			&file.Mode,
-			&file.UID,
-			&file.GID,
-			&linkTarget,
-		)
+		file, err := r.scanFileRows(rows)
 		if err != nil {
 			return nil, fmt.Errorf("scanning file: %w", err)
 		}
+		files = append(files, file)
-		file.MTime = time.Unix(mtimeUnix, 0)
-		file.CTime = time.Unix(ctimeUnix, 0)
-		if linkTarget.Valid {
-			file.LinkTarget = linkTarget.String
-		}
-
-		files = append(files, &file)
 	}
 
 	return files, rows.Err()
@@ -134,7 +228,7 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, path)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, path)
+		_, err = r.db.ExecWithLog(ctx, query, path)
 	}
 
 	if err != nil {
@@ -143,3 +237,146 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
 
 	return nil
 }
+
+// DeleteByID deletes a file by its UUID
+func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
+	query := `DELETE FROM files WHERE id = ?`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, id.String())
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, id.String())
+	}
+
+	if err != nil {
+		return fmt.Errorf("deleting file: %w", err)
+	}
+
+	return nil
+}
+
+func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		WHERE path LIKE ? || '%'
+		ORDER BY path
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query, prefix)
+	if err != nil {
+		return nil, fmt.Errorf("querying files: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var files []*File
+	for rows.Next() {
+		file, err := r.scanFileRows(rows)
+		if err != nil {
+			return nil, fmt.Errorf("scanning file: %w", err)
+		}
+		files = append(files, file)
+	}
+
+	return files, rows.Err()
+}
+
+// ListAll returns all files in the database
+func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
+	query := `
+		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
+		FROM files
+		ORDER BY path
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query)
+	if err != nil {
+		return nil, fmt.Errorf("querying files: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var files []*File
+	for rows.Next() {
+		file, err := r.scanFileRows(rows)
+		if err != nil {
+			return nil, fmt.Errorf("scanning file: %w", err)
+		}
+		files = append(files, file)
+	}
+
+	return files, rows.Err()
+}
+
+// CreateBatch inserts or updates multiple files in a single statement for efficiency.
+// File IDs must be pre-generated before calling this method.
+func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
+	if len(files) == 0 {
+		return nil
+	}
+
+	// Each File has 10 values, so batch at 100 to be safe with SQLite's variable limit
+	const batchSize = 100
+
+	for i := 0; i < len(files); i += batchSize {
+		end := i + batchSize
+		if end > len(files) {
+			end = len(files)
+		}
+		batch := files[i:end]
+
+		query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
+		args := make([]interface{}, 0, len(batch)*10)
+		for j, f := range batch {
+			if j > 0 {
+				query += ", "
+			}
+			query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
+			args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
+		}
+		query += ` ON CONFLICT(path) DO UPDATE SET
+			source_path = excluded.source_path,
+			mtime = excluded.mtime,
+			ctime = excluded.ctime,
+			size = excluded.size,
+			mode = excluded.mode,
+			uid = excluded.uid,
+			gid = excluded.gid,
+			link_target = excluded.link_target`
+
+		var err error
+		if tx != nil {
+			_, err = tx.ExecContext(ctx, query, args...)
+		} else {
+			_, err = r.db.ExecWithLog(ctx, query, args...)
+		}
+		if err != nil {
+			return fmt.Errorf("batch inserting files: %w", err)
+		}
+	}
+
+	return nil
+}
+
+// DeleteOrphaned deletes files that are not referenced by any snapshot
+func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
+	query := `
+		DELETE FROM files
+		WHERE NOT EXISTS (
+			SELECT 1 FROM snapshot_files
+			WHERE snapshot_files.file_id = files.id
+		)
+	`
+
+	result, err := r.db.ExecWithLog(ctx, query)
+	if err != nil {
+		return fmt.Errorf("deleting orphaned files: %w", err)
+	}
+
+	rowsAffected, _ := result.RowsAffected()
+	if rowsAffected > 0 {
+		log.Debug("Deleted orphaned files", "count", rowsAffected)
+	}
+
+	return nil
+}
@@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
 	}
 
 	// Test GetByPath
-	retrieved, err := repo.GetByPath(ctx, file.Path)
+	retrieved, err := repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get file: %v", err)
 	}
@@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
 		t.Fatalf("failed to update file: %v", err)
 	}
 
-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get updated file: %v", err)
 	}
@@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
 	}
 
 	// Test Delete
-	err = repo.Delete(ctx, nil, file.Path)
+	err = repo.Delete(ctx, nil, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to delete file: %v", err)
 	}
 
-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("error getting deleted file: %v", err)
 	}
@@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
 		t.Fatalf("failed to create symlink: %v", err)
 	}
 
-	retrieved, err := repo.GetByPath(ctx, symlink.Path)
+	retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get symlink: %v", err)
 	}
@@ -1,70 +1,125 @@
+// Package database provides data models and repository interfaces for the Vaultik backup system.
+// It includes types for files, chunks, blobs, snapshots, and their relationships.
 package database
 
-import "time"
+import (
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
+)
 
-// File represents a file record in the database
+// File represents a file or directory in the backup system.
+// It stores metadata about files including timestamps, permissions, ownership,
+// and symlink targets. This information is used to restore files with their
+// original attributes.
 type File struct {
-	Path       string
+	ID         types.FileID     // UUID primary key
+	Path       types.FilePath   // Absolute path of the file
+	SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
 	MTime      time.Time
 	CTime      time.Time
 	Size       int64
 	Mode       uint32
 	UID        uint32
 	GID        uint32
-	LinkTarget string // empty for regular files, target path for symlinks
+	LinkTarget types.FilePath // empty for regular files, target path for symlinks
 }
 
-// IsSymlink returns true if this file is a symbolic link
+// IsSymlink returns true if this file is a symbolic link.
+// A file is considered a symlink if it has a non-empty LinkTarget.
 func (f *File) IsSymlink() bool {
 	return f.LinkTarget != ""
 }
 
-// FileChunk represents the mapping between files and chunks
+// FileChunk represents the mapping between files and their constituent chunks.
+// Large files are split into multiple chunks for efficient deduplication and storage.
+// The Idx field maintains the order of chunks within a file.
 type FileChunk struct {
-	Path      string
+	FileID    types.FileID
 	Idx       int
-	ChunkHash string
+	ChunkHash types.ChunkHash
 }
 
-// Chunk represents a chunk record in the database
+// Chunk represents a data chunk in the deduplication system.
+// Files are split into chunks which are content-addressed by their hash.
+// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
 type Chunk struct {
-	ChunkHash string
-	SHA256    string
+	ChunkHash types.ChunkHash
 	Size      int64
 }
 
-// Blob represents a blob record in the database
+// Blob represents a blob record in the database.
+// A blob is Vaultik's final storage unit - a large file (up to 10GB) containing
+// many compressed and encrypted chunks from multiple source files.
+// Blobs are content-addressed, meaning their filename in S3 is derived from
+// the SHA256 hash of their compressed and encrypted content.
+// The blob creation process is: chunks are accumulated -> compressed with zstd
+// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
 type Blob struct {
-	BlobHash  string
-	CreatedTS time.Time
+	ID               types.BlobID   // UUID assigned when blob creation starts
+	Hash             types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
+	CreatedTS        time.Time      // When blob creation started
+	FinishedTS       *time.Time     // When blob was finalized (nil if still packing)
+	UncompressedSize int64          // Total size of raw chunks before compression
+	CompressedSize   int64          // Size after compression and encryption
+	UploadedTS       *time.Time     // When blob was uploaded to S3 (nil if not uploaded)
 }
 
-// BlobChunk represents the mapping between blobs and chunks
+// BlobChunk represents the mapping between blobs and the chunks they contain.
+// This allows tracking which chunks are stored in which blobs, along with
+// their position and size within the blob. The offset and length fields
+// enable extracting specific chunks from a blob without processing the entire blob.
 type BlobChunk struct {
-	BlobHash  string
-	ChunkHash string
+	BlobID    types.BlobID
+	ChunkHash types.ChunkHash
 	Offset    int64
 	Length    int64
 }
 
-// ChunkFile represents the reverse mapping of chunks to files
+// ChunkFile represents the reverse mapping showing which files contain a specific chunk.
+// This is used during deduplication to identify all files that share a chunk,
+// which is important for garbage collection and integrity verification.
 type ChunkFile struct {
-	ChunkHash  string
-	FilePath   string
+	ChunkHash  types.ChunkHash
+	FileID     types.FileID
 	FileOffset int64
 	Length     int64
 }
 
 // Snapshot represents a snapshot record in the database
 type Snapshot struct {
-	ID               string
-	Hostname         string
-	VaultikVersion   string
-	CreatedTS        time.Time
+	ID                 types.SnapshotID
+	Hostname           types.Hostname
+	VaultikVersion     types.Version
+	VaultikGitRevision types.GitRevision
+	StartedAt          time.Time
+	CompletedAt        *time.Time // nil if still in progress
 	FileCount        int64
 	ChunkCount       int64
 	BlobCount        int64
 	TotalSize        int64 // Total size of all referenced files
 	BlobSize         int64 // Total size of all referenced blobs (compressed and encrypted)
-	CompressionRatio float64 // Compression ratio (BlobSize / TotalSize)
+	BlobUncompressedSize int64   // Total uncompressed size of all referenced blobs
+	CompressionRatio     float64 // Compression ratio (BlobSize / BlobUncompressedSize)
+	CompressionLevel     int     // Compression level used for this snapshot
+	UploadBytes          int64   // Total bytes uploaded during this snapshot
+	UploadDurationMs     int64   // Total milliseconds spent uploading to S3
+}
+
+// IsComplete returns true if the snapshot has completed
+func (s *Snapshot) IsComplete() bool {
+	return s.CompletedAt != nil
+}
+
+// SnapshotFile represents the mapping between snapshots and files
+type SnapshotFile struct {
+	SnapshotID types.SnapshotID
+	FileID     types.FileID
+}
+
+// SnapshotBlob represents the mapping between snapshots and blobs
+type SnapshotBlob struct {
+	SnapshotID types.SnapshotID
+	BlobID     types.BlobID
+	BlobHash   types.BlobHash // Denormalized for easier manifest generation
 }
|
@@ -7,6 +7,7 @@ import (
 	"path/filepath"

 	"git.eeqj.de/sneak/vaultik/internal/config"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"go.uber.org/fx"
 )

@@ -32,7 +33,13 @@ func provideDatabase(lc fx.Lifecycle, cfg *config.Config) (*DB, error) {
 	lc.Append(fx.Hook{
 		OnStop: func(ctx context.Context) error {
-			return db.Close()
+			log.Debug("Database module OnStop hook called")
+			if err := db.Close(); err != nil {
+				log.Error("Failed to close database in OnStop hook", "error", err)
+				return err
+			}
+			log.Debug("Database closed successfully in OnStop hook")
+			return nil
 		},
 	})
@@ -6,6 +6,9 @@ import (
 	"fmt"
 )

+// Repositories provides access to all database repositories.
+// It serves as a centralized access point for all database operations
+// and manages transaction coordination across repositories.
 type Repositories struct {
 	db    *DB
 	Files *FileRepository
@@ -15,8 +18,11 @@ type Repositories struct {
 	BlobChunks *BlobChunkRepository
 	ChunkFiles *ChunkFileRepository
 	Snapshots  *SnapshotRepository
+	Uploads    *UploadRepository
 }

+// NewRepositories creates a new Repositories instance with all repository types.
+// Each repository shares the same database connection for coordinated transactions.
 func NewRepositories(db *DB) *Repositories {
 	return &Repositories{
 		db: db,
@@ -27,20 +33,26 @@ func NewRepositories(db *DB) *Repositories {
 		BlobChunks: NewBlobChunkRepository(db),
 		ChunkFiles: NewChunkFileRepository(db),
 		Snapshots:  NewSnapshotRepository(db),
+		Uploads:    NewUploadRepository(db.conn),
 	}
 }

+// TxFunc is a function that executes within a database transaction.
+// The transaction is automatically committed if the function returns nil,
+// or rolled back if it returns an error.
 type TxFunc func(ctx context.Context, tx *sql.Tx) error

+// WithTx executes a function within a write transaction.
+// SQLite handles its own locking internally, so no explicit locking is needed.
+// The transaction is automatically committed on success or rolled back on error.
+// This method should be used for all write operations to ensure atomicity.
 func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
-	// Acquire write lock for the entire transaction
-	r.db.LockForWrite()
-	defer r.db.UnlockWrite()
-
+	LogSQL("WithTx", "Beginning transaction", "")
 	tx, err := r.db.BeginTx(ctx, nil)
 	if err != nil {
 		return fmt.Errorf("beginning transaction: %w", err)
 	}
+	LogSQL("WithTx", "Transaction started", "")

 	defer func() {
 		if p := recover(); p != nil {
@@ -63,6 +75,15 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
 	return tx.Commit()
 }

+// DB returns the underlying database for direct queries
+func (r *Repositories) DB() *DB {
+	return r.db
+}
+
+// WithReadTx executes a function within a read-only transaction.
+// Read transactions can run concurrently with other read transactions
+// but will be blocked by write transactions. The transaction is
+// automatically committed on success or rolled back on error.
 func (r *Repositories) WithReadTx(ctx context.Context, fn TxFunc) error {
 	opts := &sql.TxOptions{
 		ReadOnly: true,
@@ -6,6 +6,8 @@ import (
 	"fmt"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestRepositoriesTransaction(t *testing.T) {
@@ -33,8 +35,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Create chunks
 	chunk1 := &Chunk{
-		ChunkHash: "tx_chunk1",
-		SHA256:    "tx_sha1",
+		ChunkHash: types.ChunkHash("tx_chunk1"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@@ -42,8 +43,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	chunk2 := &Chunk{
-		ChunkHash: "tx_chunk2",
-		SHA256:    "tx_sha2",
+		ChunkHash: types.ChunkHash("tx_chunk2"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@@ -52,7 +52,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Map chunks to file
 	fc1 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       0,
 		ChunkHash: chunk1.ChunkHash,
 	}
@@ -61,7 +61,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	fc2 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       1,
 		ChunkHash: chunk2.ChunkHash,
 	}
@@ -71,7 +71,8 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Create blob
 	blob := &Blob{
-		BlobHash:  "tx_blob1",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("tx_blob1"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@@ -80,7 +81,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	// Map chunks to blob
 	bc1 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk1.ChunkHash,
 		Offset:    0,
 		Length:    512,
@@ -90,7 +91,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	bc2 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk2.ChunkHash,
 		Offset:    512,
 		Length:    512,
@@ -115,7 +116,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 		t.Error("expected file after transaction")
 	}

-	chunks, err := repos.FileChunks.GetByPath(ctx, "/test/tx_file.txt")
+	chunks, err := repos.FileChunks.GetByFile(ctx, "/test/tx_file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
@@ -157,8 +158,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {
 	// Create a chunk
 	chunk := &Chunk{
-		ChunkHash: "rollback_chunk",
-		SHA256:    "rollback_sha",
+		ChunkHash: types.ChunkHash("rollback_chunk"),
 		Size:      1024,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {
@@ -217,7 +217,7 @@ func TestRepositoriesReadTransaction(t *testing.T) {
 	var retrievedFile *File
 	err = repos.WithReadTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
 		var err error
-		retrievedFile, err = repos.Files.GetByPath(ctx, "/test/read_file.txt")
+		retrievedFile, err = repos.Files.GetByPathTx(ctx, tx, "/test/read_file.txt")
 		if err != nil {
 			return err
 		}
|||||||
874
internal/database/repository_comprehensive_test.go
Normal file
874
internal/database/repository_comprehensive_test.go
Normal file
@@ -0,0 +1,874 @@
|
|||||||
|
package database
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"database/sql"
|
||||||
|
"fmt"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"git.eeqj.de/sneak/vaultik/internal/types"
|
||||||
|
)
|
||||||
|
|
||||||
|
// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
|
||||||
|
func TestFileRepositoryUUIDGeneration(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repo := NewFileRepository(db)
|
||||||
|
|
||||||
|
// Create multiple files
|
||||||
|
files := []*File{
|
||||||
|
{
|
||||||
|
Path: "/file1.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
Path: "/file2.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 2048,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
uuids := make(map[string]bool)
|
||||||
|
for _, file := range files {
|
||||||
|
err := repo.Create(ctx, nil, file)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check UUID was generated
|
||||||
|
if file.ID.IsZero() {
|
||||||
|
t.Error("file ID was not generated")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check UUID is unique
|
||||||
|
if uuids[file.ID.String()] {
|
||||||
|
t.Errorf("duplicate UUID generated: %s", file.ID)
|
||||||
|
}
|
||||||
|
uuids[file.ID.String()] = true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestFileRepositoryGetByID tests retrieving files by UUID
|
||||||
|
func TestFileRepositoryGetByID(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repo := NewFileRepository(db)
|
||||||
|
|
||||||
|
// Create a file
|
||||||
|
file := &File{
|
||||||
|
Path: "/test.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repo.Create(ctx, nil, file)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Retrieve by ID
|
||||||
|
retrieved, err := repo.GetByID(ctx, file.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get file by ID: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if retrieved.ID != file.ID {
|
||||||
|
t.Errorf("ID mismatch: expected %s, got %s", file.ID, retrieved.ID)
|
||||||
|
}
|
||||||
|
if retrieved.Path != file.Path {
|
||||||
|
t.Errorf("Path mismatch: expected %s, got %s", file.Path, retrieved.Path)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test non-existent ID
|
||||||
|
nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
|
||||||
|
nonExistent, err := repo.GetByID(ctx, nonExistentID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
|
||||||
|
}
|
||||||
|
if nonExistent != nil {
|
||||||
|
t.Error("expected nil for non-existent ID")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestOrphanedFileCleanup tests the cleanup of orphaned files
|
||||||
|
func TestOrphanedFileCleanup(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create files
|
||||||
|
file1 := &File{
|
||||||
|
Path: "/orphaned.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
file2 := &File{
|
||||||
|
Path: "/referenced.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 2048,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repos.Files.Create(ctx, nil, file1)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file1: %v", err)
|
||||||
|
}
|
||||||
|
err = repos.Files.Create(ctx, nil, file2)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file2: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create a snapshot and reference only file2
|
||||||
|
snapshot := &Snapshot{
|
||||||
|
ID: "test-snapshot",
|
||||||
|
Hostname: "test-host",
|
||||||
|
StartedAt: time.Now(),
|
||||||
|
}
|
||||||
|
err = repos.Snapshots.Create(ctx, nil, snapshot)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add file2 to snapshot
|
||||||
|
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to add file to snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run orphaned cleanup
|
||||||
|
err = repos.Files.DeleteOrphaned(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to delete orphaned files: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that orphaned file is gone
|
||||||
|
orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting file: %v", err)
|
||||||
|
}
|
||||||
|
if orphanedFile != nil {
|
||||||
|
t.Error("orphaned file should have been deleted")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that referenced file still exists
|
||||||
|
referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting file: %v", err)
|
||||||
|
}
|
||||||
|
if referencedFile == nil {
|
||||||
|
t.Error("referenced file should not have been deleted")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestOrphanedChunkCleanup tests the cleanup of orphaned chunks
|
||||||
|
func TestOrphanedChunkCleanup(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create chunks
|
||||||
|
chunk1 := &Chunk{
|
||||||
|
ChunkHash: types.ChunkHash("orphaned-chunk"),
|
||||||
|
Size: 1024,
|
||||||
|
}
|
||||||
|
chunk2 := &Chunk{
|
||||||
|
ChunkHash: types.ChunkHash("referenced-chunk"),
|
||||||
|
Size: 1024,
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repos.Chunks.Create(ctx, nil, chunk1)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk1: %v", err)
|
||||||
|
}
|
||||||
|
err = repos.Chunks.Create(ctx, nil, chunk2)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk2: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create a file and reference only chunk2
|
||||||
|
file := &File{
|
||||||
|
Path: "/test.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
err = repos.Files.Create(ctx, nil, file)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create file-chunk mapping only for chunk2
|
||||||
|
fc := &FileChunk{
|
||||||
|
FileID: file.ID,
|
||||||
|
Idx: 0,
|
||||||
|
ChunkHash: chunk2.ChunkHash,
|
||||||
|
}
|
||||||
|
err = repos.FileChunks.Create(ctx, nil, fc)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file chunk: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run orphaned cleanup
|
||||||
|
err = repos.Chunks.DeleteOrphaned(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to delete orphaned chunks: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that orphaned chunk is gone
|
||||||
|
orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting chunk: %v", err)
|
||||||
|
}
|
||||||
|
if orphanedChunk != nil {
|
||||||
|
t.Error("orphaned chunk should have been deleted")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that referenced chunk still exists
|
||||||
|
referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting chunk: %v", err)
|
||||||
|
}
|
||||||
|
if referencedChunk == nil {
|
||||||
|
t.Error("referenced chunk should not have been deleted")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestOrphanedBlobCleanup tests the cleanup of orphaned blobs
|
||||||
|
func TestOrphanedBlobCleanup(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create blobs
|
||||||
|
blob1 := &Blob{
|
||||||
|
ID: types.NewBlobID(),
|
||||||
|
Hash: types.BlobHash("orphaned-blob"),
|
||||||
|
CreatedTS: time.Now().Truncate(time.Second),
|
||||||
|
}
|
||||||
|
blob2 := &Blob{
|
||||||
|
ID: types.NewBlobID(),
|
||||||
|
Hash: types.BlobHash("referenced-blob"),
|
||||||
|
CreatedTS: time.Now().Truncate(time.Second),
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repos.Blobs.Create(ctx, nil, blob1)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create blob1: %v", err)
|
||||||
|
}
|
||||||
|
err = repos.Blobs.Create(ctx, nil, blob2)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create blob2: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create a snapshot and reference only blob2
|
||||||
|
snapshot := &Snapshot{
|
||||||
|
ID: "test-snapshot",
|
||||||
|
Hostname: "test-host",
|
||||||
|
StartedAt: time.Now(),
|
||||||
|
}
|
||||||
|
err = repos.Snapshots.Create(ctx, nil, snapshot)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add blob2 to snapshot
|
||||||
|
err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to add blob to snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Run orphaned cleanup
|
||||||
|
err = repos.Blobs.DeleteOrphaned(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to delete orphaned blobs: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that orphaned blob is gone
|
||||||
|
orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting blob: %v", err)
|
||||||
|
}
|
||||||
|
if orphanedBlob != nil {
|
||||||
|
t.Error("orphaned blob should have been deleted")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that referenced blob still exists
|
||||||
|
referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("error getting blob: %v", err)
|
||||||
|
}
|
||||||
|
if referencedBlob == nil {
|
||||||
|
t.Error("referenced blob should not have been deleted")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestFileChunkRepositoryWithUUIDs tests file-chunk relationships with UUIDs
|
||||||
|
func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create a file
|
||||||
|
file := &File{
|
||||||
|
Path: "/test.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 3072,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
err := repos.Files.Create(ctx, nil, file)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create chunks
|
||||||
|
chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
|
||||||
|
for i, chunkHash := range chunks {
|
||||||
|
chunk := &Chunk{
|
||||||
|
ChunkHash: chunkHash,
|
||||||
|
Size: 1024,
|
||||||
|
}
|
||||||
|
err = repos.Chunks.Create(ctx, nil, chunk)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create file-chunk mapping
|
||||||
|
fc := &FileChunk{
|
||||||
|
FileID: file.ID,
|
||||||
|
Idx: i,
|
||||||
|
ChunkHash: chunkHash,
|
||||||
|
}
|
||||||
|
err = repos.FileChunks.Create(ctx, nil, fc)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file chunk: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test GetByFileID
|
||||||
|
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get file chunks: %v", err)
|
||||||
|
}
|
||||||
|
if len(fileChunks) != 3 {
|
||||||
|
t.Errorf("expected 3 chunks, got %d", len(fileChunks))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test DeleteByFileID
|
||||||
|
err = repos.FileChunks.DeleteByFileID(ctx, nil, file.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to delete file chunks: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get file chunks after delete: %v", err)
|
||||||
|
}
|
||||||
|
if len(fileChunks) != 0 {
|
||||||
|
t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestChunkFileRepositoryWithUUIDs tests chunk-file relationships with UUIDs
|
||||||
|
func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create files
|
||||||
|
file1 := &File{
|
||||||
|
Path: "/file1.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
file2 := &File{
|
||||||
|
Path: "/file2.txt",
|
||||||
|
MTime: time.Now().Truncate(time.Second),
|
||||||
|
CTime: time.Now().Truncate(time.Second),
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repos.Files.Create(ctx, nil, file1)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file1: %v", err)
|
||||||
|
}
|
||||||
|
err = repos.Files.Create(ctx, nil, file2)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create file2: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create a chunk that appears in both files (deduplication)
|
||||||
|
chunk := &Chunk{
|
||||||
|
ChunkHash: types.ChunkHash("shared-chunk"),
|
||||||
|
Size: 1024,
|
||||||
|
}
|
||||||
|
err = repos.Chunks.Create(ctx, nil, chunk)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create chunk-file mappings
|
||||||
|
cf1 := &ChunkFile{
|
||||||
|
ChunkHash: chunk.ChunkHash,
|
||||||
|
FileID: file1.ID,
|
||||||
|
FileOffset: 0,
|
||||||
|
Length: 1024,
|
||||||
|
}
|
||||||
|
cf2 := &ChunkFile{
|
||||||
|
ChunkHash: chunk.ChunkHash,
|
||||||
|
FileID: file2.ID,
|
||||||
|
FileOffset: 512,
|
||||||
|
Length: 1024,
|
||||||
|
}
|
||||||
|
|
||||||
|
err = repos.ChunkFiles.Create(ctx, nil, cf1)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk file 1: %v", err)
|
||||||
|
}
|
||||||
|
err = repos.ChunkFiles.Create(ctx, nil, cf2)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create chunk file 2: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test GetByChunkHash
|
||||||
|
chunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get chunk files: %v", err)
|
||||||
|
}
|
||||||
|
if len(chunkFiles) != 2 {
|
||||||
|
t.Errorf("expected 2 files for chunk, got %d", len(chunkFiles))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test GetByFileID
|
||||||
|
chunkFiles, err = repos.ChunkFiles.GetByFileID(ctx, file1.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get chunks by file ID: %v", err)
|
||||||
|
}
|
||||||
|
if len(chunkFiles) != 1 {
|
||||||
|
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestSnapshotRepositoryExtendedFields tests snapshot with version and git revision
|
||||||
|
func TestSnapshotRepositoryExtendedFields(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repo := NewSnapshotRepository(db)
|
||||||
|
|
||||||
|
// Create snapshot with extended fields
|
||||||
|
snapshot := &Snapshot{
|
||||||
|
ID: "test-20250722-120000Z",
|
||||||
|
Hostname: "test-host",
|
||||||
|
VaultikVersion: "0.0.1",
|
||||||
|
VaultikGitRevision: "abc123def456",
|
||||||
|
StartedAt: time.Now(),
|
||||||
|
CompletedAt: nil,
|
||||||
|
FileCount: 100,
|
||||||
|
ChunkCount: 200,
|
||||||
|
BlobCount: 50,
|
||||||
|
TotalSize: 1024 * 1024,
|
||||||
|
BlobSize: 512 * 1024,
|
||||||
|
BlobUncompressedSize: 1024 * 1024,
|
||||||
|
CompressionLevel: 6,
|
||||||
|
CompressionRatio: 2.0,
|
||||||
|
UploadDurationMs: 5000,
|
||||||
|
}
|
||||||
|
|
||||||
|
err := repo.Create(ctx, nil, snapshot)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to create snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Retrieve and verify
|
||||||
|
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("failed to get snapshot: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if retrieved.VaultikVersion != snapshot.VaultikVersion {
|
||||||
|
t.Errorf("version mismatch: expected %s, got %s", snapshot.VaultikVersion, retrieved.VaultikVersion)
|
||||||
|
}
|
||||||
|
if retrieved.VaultikGitRevision != snapshot.VaultikGitRevision {
|
||||||
|
t.Errorf("git revision mismatch: expected %s, got %s", snapshot.VaultikGitRevision, retrieved.VaultikGitRevision)
|
||||||
|
}
|
||||||
|
if retrieved.CompressionLevel != snapshot.CompressionLevel {
|
||||||
|
t.Errorf("compression level mismatch: expected %d, got %d", snapshot.CompressionLevel, retrieved.CompressionLevel)
|
||||||
|
}
|
||||||
|
if retrieved.BlobUncompressedSize != snapshot.BlobUncompressedSize {
|
||||||
|
t.Errorf("uncompressed size mismatch: expected %d, got %d", snapshot.BlobUncompressedSize, retrieved.BlobUncompressedSize)
|
||||||
|
}
|
||||||
|
if retrieved.UploadDurationMs != snapshot.UploadDurationMs {
|
||||||
|
t.Errorf("upload duration mismatch: expected %d, got %d", snapshot.UploadDurationMs, retrieved.UploadDurationMs)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestComplexOrphanedDataScenario tests a complex scenario with multiple relationships
func TestComplexOrphanedDataScenario(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create snapshots
	snapshot1 := &Snapshot{
		ID:        "snapshot1",
		Hostname:  "host1",
		StartedAt: time.Now(),
	}
	snapshot2 := &Snapshot{
		ID:        "snapshot2",
		Hostname:  "host1",
		StartedAt: time.Now(),
	}

	err := repos.Snapshots.Create(ctx, nil, snapshot1)
	if err != nil {
		t.Fatalf("failed to create snapshot1: %v", err)
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot2)
	if err != nil {
		t.Fatalf("failed to create snapshot2: %v", err)
	}

	// Create files
	files := make([]*File, 3)
	for i := range files {
		files[i] = &File{
			Path:  types.FilePath(fmt.Sprintf("/file%d.txt", i)),
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}
		err = repos.Files.Create(ctx, nil, files[i])
		if err != nil {
			t.Fatalf("failed to create file%d: %v", i, err)
		}
	}

	// Add files to snapshots:
	// Snapshot1: file0, file1
	// Snapshot2: file1, file2
	// file0: only in snapshot1
	// file1: in both snapshots
	// file2: only in snapshot2
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
	if err != nil {
		t.Fatal(err)
	}
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
	if err != nil {
		t.Fatal(err)
	}
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
	if err != nil {
		t.Fatal(err)
	}
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
	if err != nil {
		t.Fatal(err)
	}

	// Delete snapshot1
	err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
	if err != nil {
		t.Fatal(err)
	}
	err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
	if err != nil {
		t.Fatal(err)
	}

	// Run orphaned cleanup
	err = repos.Files.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatal(err)
	}

	// Check results.
	// file0 should be deleted (only in deleted snapshot)
	file0, err := repos.Files.GetByID(ctx, files[0].ID)
	if err != nil {
		t.Fatalf("error getting file0: %v", err)
	}
	if file0 != nil {
		t.Error("file0 should have been deleted")
	}

	// file1 should exist (still in snapshot2)
	file1, err := repos.Files.GetByID(ctx, files[1].ID)
	if err != nil {
		t.Fatalf("error getting file1: %v", err)
	}
	if file1 == nil {
		t.Error("file1 should still exist")
	}

	// file2 should exist (still in snapshot2)
	file2, err := repos.Files.GetByID(ctx, files[2].ID)
	if err != nil {
		t.Fatalf("error getting file2: %v", err)
	}
	if file2 == nil {
		t.Error("file2 should still exist")
	}
}

// TestCascadeDelete tests that cascade deletes work properly
func TestCascadeDelete(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create a file
	file := &File{
		Path:  "/cascade-test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err := repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create chunks and file-chunk mappings
	for i := 0; i < 3; i++ {
		chunk := &Chunk{
			ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunk.ChunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
	}

	// Verify file chunks exist
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	if len(fileChunks) != 3 {
		t.Errorf("expected 3 file chunks, got %d", len(fileChunks))
	}

	// Delete the file
	err = repos.Files.DeleteByID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file: %v", err)
	}

	// Verify file chunks were cascade deleted
	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	if len(fileChunks) != 0 {
		t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
	}
}

// TestTransactionIsolation tests that transactions properly isolate changes
func TestTransactionIsolation(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Start a transaction
	err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		// Create a file within the transaction
		file := &File{
			Path:  "/tx-test.txt",
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}
		err := repos.Files.Create(ctx, tx, file)
		if err != nil {
			return err
		}

		// Within the same transaction, we should be able to query it.
		// Note: this would require modifying GetByPath to accept a tx parameter.
		// For now, we'll just test that rollback works.

		// Return an error to trigger rollback
		return fmt.Errorf("intentional rollback")
	})

	if err == nil {
		t.Fatal("expected error from transaction")
	}

	// Verify the file was not created (transaction rolled back)
	files, err := repos.Files.ListByPrefix(ctx, "/tx-test")
	if err != nil {
		t.Fatal(err)
	}
	if len(files) != 0 {
		t.Error("file should not exist after rollback")
	}
}

// TestConcurrentOrphanedCleanup tests that concurrent cleanup operations don't interfere
func TestConcurrentOrphanedCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Set a 5-second busy timeout to handle concurrent operations
	if _, err := db.conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
		t.Fatalf("failed to set busy timeout: %v", err)
	}

	// Create a snapshot
	snapshot := &Snapshot{
		ID:        "concurrent-test",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err := repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatal(err)
	}

	// Create many files, some orphaned
	for i := 0; i < 20; i++ {
		file := &File{
			Path:  types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}
		err = repos.Files.Create(ctx, nil, file)
		if err != nil {
			t.Fatal(err)
		}

		// Add even-numbered files to snapshot
		if i%2 == 0 {
			err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
			if err != nil {
				t.Fatal(err)
			}
		}
	}

	// Run multiple cleanup operations concurrently.
	// Note: SQLite has limited support for concurrent writes; the busy
	// timeout set above is what allows all of these to succeed rather
	// than failing immediately with SQLITE_BUSY.
	done := make(chan error, 3)
	for i := 0; i < 3; i++ {
		go func() {
			done <- repos.Files.DeleteOrphaned(ctx)
		}()
	}

	// Wait for all to complete
	for i := 0; i < 3; i++ {
		err := <-done
		if err != nil {
			t.Errorf("cleanup %d failed: %v", i, err)
		}
	}

	// Verify correct files were deleted
	files, err := repos.Files.ListByPrefix(ctx, "/concurrent-")
	if err != nil {
		t.Fatal(err)
	}

	// Should have 10 files remaining (even numbered)
	if len(files) != 10 {
		t.Errorf("expected 10 files remaining, got %d", len(files))
	}

	// Verify all remaining files are even-numbered
	for _, file := range files {
		var num int
		_, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
		if err != nil {
			t.Logf("failed to parse file number from %s: %v", file.Path, err)
		}
		if num%2 != 0 {
			t.Errorf("odd-numbered file %s should have been deleted", file.Path)
		}
	}
}
165	internal/database/repository_debug_test.go	Normal file
@@ -0,0 +1,165 @@
package database

import (
	"context"
	"testing"
	"time"
)

// TestOrphanedFileCleanupDebug tests orphaned file cleanup with debug output
func TestOrphanedFileCleanupDebug(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create files
	file1 := &File{
		Path:  "/orphaned.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	file2 := &File{
		Path:  "/referenced.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  2048,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repos.Files.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	t.Logf("Created file1 with ID: %s", file1.ID)

	err = repos.Files.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}
	t.Logf("Created file2 with ID: %s", file2.ID)

	// Create a snapshot and reference only file2
	snapshot := &Snapshot{
		ID:        "test-snapshot",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}
	t.Logf("Created snapshot: %s", snapshot.ID)

	// Check snapshot_files before adding
	var count int
	err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("snapshot_files count before add: %d", count)

	// Add file2 to snapshot
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
	if err != nil {
		t.Fatalf("failed to add file to snapshot: %v", err)
	}
	t.Logf("Added file2 to snapshot")

	// Check snapshot_files after adding
	err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("snapshot_files count after add: %d", count)

	// Check which files are referenced
	rows, err := db.conn.Query("SELECT file_id FROM snapshot_files")
	if err != nil {
		t.Fatal(err)
	}
	defer func() {
		if err := rows.Close(); err != nil {
			t.Logf("failed to close rows: %v", err)
		}
	}()
	t.Log("Files in snapshot_files:")
	for rows.Next() {
		var fileID string
		if err := rows.Scan(&fileID); err != nil {
			t.Fatal(err)
		}
		t.Logf("  - %s", fileID)
	}

	// Check files before cleanup
	err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Files count before cleanup: %d", count)

	// Run orphaned cleanup
	err = repos.Files.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned files: %v", err)
	}
	t.Log("Ran orphaned cleanup")

	// Check files after cleanup
	err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Files count after cleanup: %d", count)

	// List remaining files
	files, err := repos.Files.ListByPrefix(ctx, "/")
	if err != nil {
		t.Fatal(err)
	}
	t.Log("Remaining files:")
	for _, f := range files {
		t.Logf("  - ID: %s, Path: %s", f.ID, f.Path)
	}

	// Check that orphaned file is gone
	orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if orphanedFile != nil {
		t.Error("orphaned file should have been deleted")
		// Check why it wasn't deleted
		var exists bool
		err = db.conn.QueryRow(`
			SELECT EXISTS(
				SELECT 1 FROM snapshot_files
				WHERE file_id = ?
			)`, file1.ID).Scan(&exists)
		if err != nil {
			t.Fatal(err)
		}
		t.Logf("File1 exists in snapshot_files: %v", exists)
	} else {
		t.Log("Orphaned file was correctly deleted")
	}

	// Check that referenced file still exists
	referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if referencedFile == nil {
		t.Error("referenced file should not have been deleted")
	} else {
		t.Log("Referenced file correctly remains")
	}
}
543	internal/database/repository_edge_cases_test.go	Normal file
@@ -0,0 +1,543 @@
package database

import (
	"context"
	"fmt"
	"strings"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestFileRepositoryEdgeCases tests edge cases for file repository
func TestFileRepositoryEdgeCases(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewFileRepository(db)

	tests := []struct {
		name    string
		file    *File
		wantErr bool
		errMsg  string
	}{
		{
			name: "empty path",
			file: &File{
				Path:  "",
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  1024,
				Mode:  0644,
				UID:   1000,
				GID:   1000,
			},
			wantErr: false, // Empty strings are allowed; only NULL is not allowed
		},
		{
			name: "very long path",
			file: &File{
				Path:  types.FilePath("/" + strings.Repeat("a", 4096)),
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  1024,
				Mode:  0644,
				UID:   1000,
				GID:   1000,
			},
			wantErr: false,
		},
		{
			name: "path with special characters",
			file: &File{
				Path:  "/test/file with spaces and 特殊文字.txt",
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  1024,
				Mode:  0644,
				UID:   1000,
				GID:   1000,
			},
			wantErr: false,
		},
		{
			name: "zero size file",
			file: &File{
				Path:  "/empty.txt",
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  0,
				Mode:  0644,
				UID:   1000,
				GID:   1000,
			},
			wantErr: false,
		},
		{
			name: "symlink with target",
			file: &File{
				Path:       "/link",
				MTime:      time.Now(),
				CTime:      time.Now(),
				Size:       0,
				Mode:       0777 | 0120000, // symlink mode
				UID:        1000,
				GID:        1000,
				LinkTarget: "/target",
			},
			wantErr: false,
		},
	}

	for i, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Add a unique suffix to paths to avoid UNIQUE constraint violations
			if tt.file.Path != "" {
				tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
			}

			err := repo.Create(ctx, nil, tt.file)
			if (err != nil) != tt.wantErr {
				t.Errorf("Create() error = %v, wantErr %v", err, tt.wantErr)
			}
			if err != nil && tt.errMsg != "" && !strings.Contains(err.Error(), tt.errMsg) {
				t.Errorf("Create() error = %v, want error containing %q", err, tt.errMsg)
			}
		})
	}
}

// TestDuplicateHandling tests handling of duplicate entries
func TestDuplicateHandling(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Test duplicate file paths - Create uses UPSERT logic
	t.Run("duplicate file paths", func(t *testing.T) {
		file1 := &File{
			Path:  "/duplicate.txt",
			MTime: time.Now(),
			CTime: time.Now(),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}
		file2 := &File{
			Path:  "/duplicate.txt", // Same path
			MTime: time.Now().Add(time.Hour),
			CTime: time.Now().Add(time.Hour),
			Size:  2048,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}

		err := repos.Files.Create(ctx, nil, file1)
		if err != nil {
			t.Fatalf("failed to create file1: %v", err)
		}
		originalID := file1.ID

		// Create with same path should update the existing record (UPSERT behavior)
		err = repos.Files.Create(ctx, nil, file2)
		if err != nil {
			t.Fatalf("failed to create file2: %v", err)
		}

		// Verify the file was updated, not duplicated
		retrievedFile, err := repos.Files.GetByPath(ctx, "/duplicate.txt")
		if err != nil {
			t.Fatalf("failed to retrieve file: %v", err)
		}

		// The file should have been updated with file2's data
		if retrievedFile.Size != 2048 {
			t.Errorf("expected size 2048, got %d", retrievedFile.Size)
		}

		// ID might be different due to the UPSERT
		if retrievedFile.ID != file2.ID {
			t.Logf("File ID changed from %s to %s during upsert", originalID, retrievedFile.ID)
		}
	})

	// Test duplicate chunk hashes
	t.Run("duplicate chunk hashes", func(t *testing.T) {
		chunk := &Chunk{
			ChunkHash: types.ChunkHash("duplicate-chunk"),
			Size:      1024,
		}

		err := repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		// Creating the same chunk again should be idempotent (ON CONFLICT DO NOTHING)
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Errorf("duplicate chunk creation should be idempotent, got error: %v", err)
		}
	})

	// Test duplicate file-chunk mappings
	t.Run("duplicate file-chunk mappings", func(t *testing.T) {
		file := &File{
			Path:  "/test-dup-fc.txt",
			MTime: time.Now(),
			CTime: time.Now(),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		}
		err := repos.Files.Create(ctx, nil, file)
		if err != nil {
			t.Fatal(err)
		}

		chunk := &Chunk{
			ChunkHash: types.ChunkHash("test-chunk-dup"),
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatal(err)
		}

		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       0,
			ChunkHash: chunk.ChunkHash,
		}

		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatal(err)
		}

		// Creating the same mapping again should be idempotent
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Errorf("file-chunk creation should be idempotent, got error: %v", err)
		}
	})
}

// TestNullHandling tests handling of NULL values
func TestNullHandling(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Test file with no link target
	t.Run("file without link target", func(t *testing.T) {
		file := &File{
			Path:       "/regular.txt",
			MTime:      time.Now(),
			CTime:      time.Now(),
			Size:       1024,
			Mode:       0644,
			UID:        1000,
			GID:        1000,
			LinkTarget: "", // Should be stored as NULL
		}

		err := repos.Files.Create(ctx, nil, file)
		if err != nil {
			t.Fatal(err)
		}

		retrieved, err := repos.Files.GetByID(ctx, file.ID)
		if err != nil {
			t.Fatal(err)
		}

		if retrieved.LinkTarget != "" {
			t.Errorf("expected empty link target, got %q", retrieved.LinkTarget)
		}
	})

	// Test snapshot with NULL completed_at
	t.Run("incomplete snapshot", func(t *testing.T) {
		snapshot := &Snapshot{
			ID:          "incomplete-test",
			Hostname:    "test-host",
			StartedAt:   time.Now(),
			CompletedAt: nil, // Should remain NULL until completed
		}

		err := repos.Snapshots.Create(ctx, nil, snapshot)
		if err != nil {
			t.Fatal(err)
		}

		retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
		if err != nil {
			t.Fatal(err)
		}

		if retrieved.CompletedAt != nil {
			t.Error("expected nil CompletedAt for incomplete snapshot")
		}
	})

	// Test blob with NULL uploaded_ts
	t.Run("blob not uploaded", func(t *testing.T) {
		blob := &Blob{
			ID:         types.NewBlobID(),
			Hash:       types.BlobHash("test-hash"),
			CreatedTS:  time.Now(),
			UploadedTS: nil, // Not uploaded yet
		}

		err := repos.Blobs.Create(ctx, nil, blob)
		if err != nil {
			t.Fatal(err)
		}

		retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
		if err != nil {
			t.Fatal(err)
		}

		if retrieved.UploadedTS != nil {
			t.Error("expected nil UploadedTS for non-uploaded blob")
		}
	})
}

// TestLargeDatasets tests operations with large amounts of data
func TestLargeDatasets(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping large dataset test in short mode")
	}

	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create a snapshot
	snapshot := &Snapshot{
		ID:        "large-dataset-test",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err := repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatal(err)
	}

	// Create many files
	const fileCount = 1000
	fileIDs := make([]types.FileID, fileCount)

	t.Run("create many files", func(t *testing.T) {
		start := time.Now()
		for i := 0; i < fileCount; i++ {
			file := &File{
				Path:  types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  int64(i * 1024),
				Mode:  0644,
				UID:   uint32(1000 + (i % 10)),
				GID:   uint32(1000 + (i % 10)),
			}
			err := repos.Files.Create(ctx, nil, file)
			if err != nil {
				t.Fatalf("failed to create file %d: %v", i, err)
			}
			fileIDs[i] = file.ID

			// Add half to snapshot
			if i%2 == 0 {
				err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
				if err != nil {
					t.Fatal(err)
				}
			}
		}
		t.Logf("Created %d files in %v", fileCount, time.Since(start))
	})

	// Test ListByPrefix performance
	t.Run("list by prefix performance", func(t *testing.T) {
		start := time.Now()
		files, err := repos.Files.ListByPrefix(ctx, "/large/")
		if err != nil {
			t.Fatal(err)
		}
		if len(files) != fileCount {
			t.Errorf("expected %d files, got %d", fileCount, len(files))
		}
		t.Logf("Listed %d files in %v", len(files), time.Since(start))
	})

	// Test orphaned cleanup performance
	t.Run("orphaned cleanup performance", func(t *testing.T) {
		start := time.Now()
		err := repos.Files.DeleteOrphaned(ctx)
		if err != nil {
			t.Fatal(err)
		}
		t.Logf("Cleaned up orphaned files in %v", time.Since(start))

		// Verify correct number remain
		files, err := repos.Files.ListByPrefix(ctx, "/large/")
		if err != nil {
			t.Fatal(err)
		}
		if len(files) != fileCount/2 {
			t.Errorf("expected %d files after cleanup, got %d", fileCount/2, len(files))
		}
	})
}

// TestErrorPropagation tests that errors are properly propagated
func TestErrorPropagation(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Test GetByID with non-existent ID
	t.Run("GetByID non-existent", func(t *testing.T) {
		file, err := repos.Files.GetByID(ctx, types.NewFileID())
		if err != nil {
			t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
		}
		if file != nil {
			t.Error("expected nil file for non-existent ID")
		}
	})

	// Test GetByPath with non-existent path
	t.Run("GetByPath non-existent", func(t *testing.T) {
		file, err := repos.Files.GetByPath(ctx, "/non/existent/path.txt")
		if err != nil {
			t.Errorf("GetByPath should not return error for non-existent path, got: %v", err)
		}
		if file != nil {
			t.Error("expected nil file for non-existent path")
		}
	})

	// Test invalid foreign key reference
	t.Run("invalid foreign key", func(t *testing.T) {
		fc := &FileChunk{
			FileID:    types.NewFileID(),
			Idx:       0,
			ChunkHash: types.ChunkHash("some-chunk"),
		}
		err := repos.FileChunks.Create(ctx, nil, fc)
		if err == nil {
			// Fatal, not Error: the err.Error() call below would panic on nil
			t.Fatal("expected error for invalid foreign key")
		}
		if !strings.Contains(err.Error(), "FOREIGN KEY") {
			t.Errorf("expected foreign key error, got: %v", err)
		}
	})
}

// TestQueryInjection tests that the system is safe from SQL injection
func TestQueryInjection(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Test various injection attempts
	injectionTests := []string{
		"'; DROP TABLE files; --",
		"' OR '1'='1",
		"'; DELETE FROM files WHERE '1'='1'; --",
		`test'); DROP TABLE files; --`,
	}

	for _, injection := range injectionTests {
		t.Run("injection attempt", func(t *testing.T) {
			// Try injection in file path
			file := &File{
				Path:  types.FilePath(injection),
				MTime: time.Now(),
				CTime: time.Now(),
				Size:  1024,
				Mode:  0644,
				UID:   1000,
				GID:   1000,
			}
			_ = repos.Files.Create(ctx, nil, file)
			// Should either succeed (treating it as a normal string) or fail
			// with a constraint error, but should NOT execute the injected SQL.

			// Verify tables still exist
			var count int
			err := db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
			if err != nil {
				t.Fatal("files table was damaged by injection")
			}
		})
	}
}

// TestTimezoneHandling tests that times are properly handled in UTC
|
||||||
|
func TestTimezoneHandling(t *testing.T) {
|
||||||
|
db, cleanup := setupTestDB(t)
|
||||||
|
defer cleanup()
|
||||||
|
|
||||||
|
ctx := context.Background()
|
||||||
|
repos := NewRepositories(db)
|
||||||
|
|
||||||
|
// Create file with specific timezone
|
||||||
|
loc, err := time.LoadLocation("America/New_York")
|
||||||
|
if err != nil {
|
||||||
|
t.Skip("timezone not available")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Use Truncate to remove sub-second precision since we store as Unix timestamps
|
||||||
|
nyTime := time.Now().In(loc).Truncate(time.Second)
|
||||||
|
file := &File{
|
||||||
|
Path: "/timezone-test.txt",
|
||||||
|
MTime: nyTime,
|
||||||
|
CTime: nyTime,
|
||||||
|
Size: 1024,
|
||||||
|
Mode: 0644,
|
||||||
|
UID: 1000,
|
||||||
|
GID: 1000,
|
||||||
|
}
|
||||||
|
|
||||||
|
err = repos.Files.Create(ctx, nil, file)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Retrieve and verify times are in UTC
|
||||||
|
retrieved, err := repos.Files.GetByID(ctx, file.ID)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that times are equivalent (same instant)
|
||||||
|
if !retrieved.MTime.Equal(nyTime) {
|
||||||
|
t.Error("time was not preserved correctly")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check that retrieved time is in UTC
|
||||||
|
if retrieved.MTime.Location() != time.UTC {
|
||||||
|
t.Error("retrieved time is not in UTC")
|
||||||
|
}
|
||||||
|
}
internal/database/schema.sql (new file, 137 lines)
@@ -0,0 +1,137 @@
-- Vaultik Database Schema
-- Note: This database does not support migrations. If the schema changes,
-- delete the local database and perform a full backup to recreate it.

-- Files table: stores metadata about files in the filesystem
CREATE TABLE IF NOT EXISTS files (
    id TEXT PRIMARY KEY,                  -- UUID
    path TEXT NOT NULL UNIQUE,
    source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    link_target TEXT
);

-- Create index on path for efficient lookups
CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);

-- File chunks table: maps files to their constituent chunks
CREATE TABLE IF NOT EXISTS file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);

-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

-- Blobs table: stores packed, compressed, and encrypted blob information
CREATE TABLE IF NOT EXISTS blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT UNIQUE,
    created_ts INTEGER NOT NULL,
    finished_ts INTEGER,
    uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compressed_size INTEGER NOT NULL DEFAULT 0,
    uploaded_ts INTEGER
);

-- Blob chunks table: maps chunks to the blobs that contain them
CREATE TABLE IF NOT EXISTS blob_chunks (
    blob_id TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_id, chunk_hash),
    FOREIGN KEY (blob_id) REFERENCES blobs(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);

-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id),
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);

-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    vaultik_git_revision TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL DEFAULT 0,
    chunk_count INTEGER NOT NULL DEFAULT 0,
    blob_count INTEGER NOT NULL DEFAULT 0,
    total_size INTEGER NOT NULL DEFAULT 0,
    blob_size INTEGER NOT NULL DEFAULT 0,
    blob_uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compression_ratio REAL NOT NULL DEFAULT 1.0,
    compression_level INTEGER NOT NULL DEFAULT 3,
    upload_bytes INTEGER NOT NULL DEFAULT 0,
    upload_duration_ms INTEGER NOT NULL DEFAULT 0
);

-- Snapshot files table: maps snapshots to files
CREATE TABLE IF NOT EXISTS snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (file_id) REFERENCES files(id)
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);

-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (blob_id) REFERENCES blobs(id)
);

-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);

-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    snapshot_id TEXT NOT NULL,
    uploaded_at INTEGER NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);

-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
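The `files`/`file_chunks`/`chunks` tables above implement deduplication by content hash: a chunk row exists once no matter how many files reference it, and `file_chunks` records the ordered mapping. A toy in-memory sketch of that relationship (illustrative only; the real system uses content-defined chunking and SQLite, and `store` is a hypothetical type):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a toy analogue of the chunks and file_chunks tables: chunks
// holds each unique chunk once (keyed by hash), fileChunks holds the
// ordered list of chunk hashes per file.
type store struct {
	chunks     map[string]int      // chunk_hash -> size
	fileChunks map[string][]string // file path -> ordered chunk hashes
}

func (s *store) addFile(path string, data [][]byte) {
	for _, chunk := range data {
		sum := sha256.Sum256(chunk)
		h := hex.EncodeToString(sum[:])
		s.chunks[h] = len(chunk) // INSERT OR IGNORE semantics: duplicates collapse
		s.fileChunks[path] = append(s.fileChunks[path], h)
	}
}

func main() {
	s := &store{chunks: map[string]int{}, fileChunks: map[string][]string{}}
	s.addFile("/a", [][]byte{[]byte("hello"), []byte("world")})
	s.addFile("/b", [][]byte{[]byte("hello")}) // shares a chunk with /a
	fmt.Println(len(s.chunks))           // 2 unique chunks stored, not 3
	fmt.Println(len(s.fileChunks["/a"])) // 2 chunk references for /a
}
```

The `idx_file_chunks_chunk_hash` and `idx_blob_chunks_chunk_hash` indexes exist so the reverse question, "which files or blobs still reference this chunk?", is cheap during orphan detection.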
internal/database/schema/008_uploads.sql (new file, 11 lines)
@@ -0,0 +1,11 @@
-- Track blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    uploaded_at TIMESTAMP NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash)
);

CREATE INDEX idx_uploads_uploaded_at ON uploads(uploaded_at);
CREATE INDEX idx_uploads_duration ON uploads(duration_ms);
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 type SnapshotRepository struct {
@@ -17,17 +19,27 @@ func NewSnapshotRepository(db *DB) *SnapshotRepository {
 
 func (r *SnapshotRepository) Create(ctx context.Context, tx *sql.Tx, snapshot *Snapshot) error {
 	query := `
-		INSERT INTO snapshots (id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio)
-		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+		INSERT INTO snapshots (id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms)
+		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
 	`
 
+	var completedAt *int64
+	if snapshot.CompletedAt != nil {
+		ts := snapshot.CompletedAt.Unix()
+		completedAt = &ts
+	}
+
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = r.db.ExecWithLog(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	}
 
 	if err != nil {
@@ -58,7 +70,7 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
+		_, err = r.db.ExecWithLog(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	}
 
 	if err != nil {
@@ -68,27 +80,83 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	return nil
 }
 
+// UpdateExtendedStats updates extended statistics for a snapshot
+func (r *SnapshotRepository) UpdateExtendedStats(ctx context.Context, tx *sql.Tx, snapshotID string, blobUncompressedSize int64, compressionLevel int, uploadDurationMs int64) error {
+	// Calculate compression ratio based on uncompressed vs compressed sizes
+	var compressionRatio float64
+	if blobUncompressedSize > 0 {
+		// Get current blob_size from DB to calculate ratio
+		var blobSize int64
+		queryGet := `SELECT blob_size FROM snapshots WHERE id = ?`
+		if tx != nil {
+			err := tx.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		} else {
+			err := r.db.conn.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		}
+		compressionRatio = float64(blobSize) / float64(blobUncompressedSize)
+	} else {
+		compressionRatio = 1.0
+	}
+
+	query := `
+		UPDATE snapshots
+		SET blob_uncompressed_size = ?,
+			compression_ratio = ?,
+			compression_level = ?,
+			upload_bytes = blob_size,
+			upload_duration_ms = ?
+		WHERE id = ?
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating extended stats: %w", err)
+	}
+	return nil
+}
+
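The ratio computed in UpdateExtendedStats is compressed size divided by uncompressed size, falling back to 1.0 when the uncompressed size is zero or unknown. Extracted as a pure function for illustration (a sketch only; the repository computes this inline):

```go
package main

import "fmt"

// compressionRatio mirrors the calculation in UpdateExtendedStats:
// compressed size over uncompressed size, defaulting to 1.0 when the
// uncompressed size is not positive (avoiding division by zero).
func compressionRatio(compressed, uncompressed int64) float64 {
	if uncompressed > 0 {
		return float64(compressed) / float64(uncompressed)
	}
	return 1.0
}

func main() {
	// 60 MiB of blobs from 200 MiB of input compresses 0.30x.
	fmt.Printf("%.2f\n", compressionRatio(60, 200)) // 0.30
	fmt.Printf("%.2f\n", compressionRatio(10, 0))   // 1.00
}
```

A smaller ratio therefore means better compression, consistent with the `compression_ratio REAL NOT NULL DEFAULT 1.0` column default meaning "no compression measured".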
 func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms
 		FROM snapshots
 		WHERE id = ?
 	`
 
 	var snapshot Snapshot
-	var createdTSUnix int64
+	var startedAtUnix int64
+	var completedAtUnix *int64
 
 	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(
 		&snapshot.ID,
 		&snapshot.Hostname,
 		&snapshot.VaultikVersion,
-		&createdTSUnix,
+		&snapshot.VaultikGitRevision,
+		&startedAtUnix,
+		&completedAtUnix,
 		&snapshot.FileCount,
 		&snapshot.ChunkCount,
 		&snapshot.BlobCount,
 		&snapshot.TotalSize,
 		&snapshot.BlobSize,
+		&snapshot.BlobUncompressedSize,
 		&snapshot.CompressionRatio,
+		&snapshot.CompressionLevel,
+		&snapshot.UploadBytes,
+		&snapshot.UploadDurationMs,
 	)
 
 	if err == sql.ErrNoRows {
@@ -98,16 +166,20 @@ func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*S
 		return nil, fmt.Errorf("querying snapshot: %w", err)
 	}
 
-	snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+	snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
+	if completedAtUnix != nil {
+		t := time.Unix(*completedAtUnix, 0).UTC()
+		snapshot.CompletedAt = &t
+	}
 
 	return &snapshot, nil
 }
 
 func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
 		FROM snapshots
-		ORDER BY created_ts DESC
+		ORDER BY started_at DESC
 		LIMIT ?
 	`
 
@@ -120,13 +192,16 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 	var snapshots []*Snapshot
 	for rows.Next() {
 		var snapshot Snapshot
-		var createdTSUnix int64
+		var startedAtUnix int64
+		var completedAtUnix *int64
 
 		err := rows.Scan(
 			&snapshot.ID,
 			&snapshot.Hostname,
 			&snapshot.VaultikVersion,
-			&createdTSUnix,
+			&snapshot.VaultikGitRevision,
+			&startedAtUnix,
+			&completedAtUnix,
 			&snapshot.FileCount,
 			&snapshot.ChunkCount,
 			&snapshot.BlobCount,
@@ -138,10 +213,336 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 			return nil, fmt.Errorf("scanning snapshot: %w", err)
 		}
 
-		snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+		snapshot.StartedAt = time.Unix(startedAtUnix, 0)
+		if completedAtUnix != nil {
+			t := time.Unix(*completedAtUnix, 0)
+			snapshot.CompletedAt = &t
+		}
 
 		snapshots = append(snapshots, &snapshot)
 	}
 
 	return snapshots, rows.Err()
 }
+
+// MarkComplete marks a snapshot as completed with the current timestamp
+func (r *SnapshotRepository) MarkComplete(ctx context.Context, tx *sql.Tx, snapshotID string) error {
+	query := `
+		UPDATE snapshots
+		SET completed_at = ?
+		WHERE id = ?
+	`
+
+	completedAt := time.Now().UTC().Unix()
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, completedAt, snapshotID)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, completedAt, snapshotID)
+	}
+
+	if err != nil {
+		return fmt.Errorf("marking snapshot complete: %w", err)
+	}
+
+	return nil
+}
+
+// AddFile adds a file to a snapshot
+func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID string, filePath string) error {
+	query := `
+		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
+		SELECT ?, id FROM files WHERE path = ?
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, snapshotID, filePath)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, snapshotID, filePath)
+	}
+
+	if err != nil {
+		return fmt.Errorf("adding file to snapshot: %w", err)
+	}
+
+	return nil
+}
+
+// AddFileByID adds a file to a snapshot by file ID
+func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
+	query := `
+		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
+		VALUES (?, ?)
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
+	}
+
+	if err != nil {
+		return fmt.Errorf("adding file to snapshot: %w", err)
+	}
+
+	return nil
+}
+
+// AddFilesByIDBatch adds multiple files to a snapshot in batched inserts
+func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
+	if len(fileIDs) == 0 {
+		return nil
+	}
+
+	// Each entry has 2 values, so batch at 400 to be safe
+	const batchSize = 400
+
+	for i := 0; i < len(fileIDs); i += batchSize {
+		end := i + batchSize
+		if end > len(fileIDs) {
+			end = len(fileIDs)
+		}
+		batch := fileIDs[i:end]
+
+		query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
+		args := make([]interface{}, 0, len(batch)*2)
+		for j, fileID := range batch {
+			if j > 0 {
+				query += ", "
+			}
+			query += "(?, ?)"
+			args = append(args, snapshotID, fileID.String())
+		}
+
+		var err error
+		if tx != nil {
+			_, err = tx.ExecContext(ctx, query, args...)
+		} else {
+			_, err = r.db.ExecWithLog(ctx, query, args...)
+		}
+		if err != nil {
+			return fmt.Errorf("batch adding files to snapshot: %w", err)
+		}
+	}
+
+	return nil
+}
+
+// AddBlob adds a blob to a snapshot
+func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
+	query := `
+		INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
+		VALUES (?, ?, ?)
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
+	}
+
+	if err != nil {
+		return fmt.Errorf("adding blob to snapshot: %w", err)
+	}
+
+	return nil
+}
+
+// GetBlobHashes returns all blob hashes for a snapshot
+func (r *SnapshotRepository) GetBlobHashes(ctx context.Context, snapshotID string) ([]string, error) {
+	query := `
+		SELECT sb.blob_hash
+		FROM snapshot_blobs sb
+		WHERE sb.snapshot_id = ?
+		ORDER BY sb.blob_hash
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query, snapshotID)
+	if err != nil {
+		return nil, fmt.Errorf("querying blob hashes: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var blobs []string
+	for rows.Next() {
+		var blobHash string
+		if err := rows.Scan(&blobHash); err != nil {
+			return nil, fmt.Errorf("scanning blob hash: %w", err)
+		}
+		blobs = append(blobs, blobHash)
+	}
+
+	return blobs, rows.Err()
+}
+
+// GetSnapshotTotalCompressedSize returns the total compressed size of all blobs referenced by a snapshot
+func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context, snapshotID string) (int64, error) {
+	query := `
+		SELECT COALESCE(SUM(b.compressed_size), 0)
+		FROM snapshot_blobs sb
+		JOIN blobs b ON sb.blob_hash = b.blob_hash
+		WHERE sb.snapshot_id = ?
+	`
+
+	var totalSize int64
+	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
+	if err != nil {
+		return 0, fmt.Errorf("querying total compressed size: %w", err)
+	}
+
+	return totalSize, nil
+}
+
+// GetIncompleteSnapshots returns all snapshots that haven't been completed
+func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
+	query := `
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		FROM snapshots
+		WHERE completed_at IS NULL
+		ORDER BY started_at DESC
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query)
+	if err != nil {
+		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var snapshots []*Snapshot
+	for rows.Next() {
+		var snapshot Snapshot
+		var startedAtUnix int64
+		var completedAtUnix *int64
+
+		err := rows.Scan(
+			&snapshot.ID,
+			&snapshot.Hostname,
+			&snapshot.VaultikVersion,
+			&snapshot.VaultikGitRevision,
+			&startedAtUnix,
+			&completedAtUnix,
+			&snapshot.FileCount,
+			&snapshot.ChunkCount,
+			&snapshot.BlobCount,
+			&snapshot.TotalSize,
+			&snapshot.BlobSize,
+			&snapshot.CompressionRatio,
+		)
+		if err != nil {
+			return nil, fmt.Errorf("scanning snapshot: %w", err)
+		}
+
+		snapshot.StartedAt = time.Unix(startedAtUnix, 0)
+		if completedAtUnix != nil {
+			t := time.Unix(*completedAtUnix, 0)
+			snapshot.CompletedAt = &t
+		}
+
+		snapshots = append(snapshots, &snapshot)
+	}
+
+	return snapshots, rows.Err()
+}
+
+// GetIncompleteByHostname returns all incomplete snapshots for a specific hostname
+func (r *SnapshotRepository) GetIncompleteByHostname(ctx context.Context, hostname string) ([]*Snapshot, error) {
+	query := `
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		FROM snapshots
+		WHERE completed_at IS NULL AND hostname = ?
+		ORDER BY started_at DESC
+	`
+
+	rows, err := r.db.conn.QueryContext(ctx, query, hostname)
+	if err != nil {
+		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
+	}
+	defer CloseRows(rows)
+
+	var snapshots []*Snapshot
+	for rows.Next() {
+		var snapshot Snapshot
+		var startedAtUnix int64
+		var completedAtUnix *int64
+
+		err := rows.Scan(
+			&snapshot.ID,
+			&snapshot.Hostname,
+			&snapshot.VaultikVersion,
+			&snapshot.VaultikGitRevision,
+			&startedAtUnix,
+			&completedAtUnix,
+			&snapshot.FileCount,
+			&snapshot.ChunkCount,
+			&snapshot.BlobCount,
+			&snapshot.TotalSize,
+			&snapshot.BlobSize,
+			&snapshot.CompressionRatio,
+		)
+		if err != nil {
+			return nil, fmt.Errorf("scanning snapshot: %w", err)
+		}
+
+		snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
+		if completedAtUnix != nil {
+			t := time.Unix(*completedAtUnix, 0).UTC()
+			snapshot.CompletedAt = &t
+		}
+
+		snapshots = append(snapshots, &snapshot)
+	}
+
+	return snapshots, rows.Err()
+}
+
+// Delete removes a snapshot record
+func (r *SnapshotRepository) Delete(ctx context.Context, snapshotID string) error {
+	query := `DELETE FROM snapshots WHERE id = ?`
+
+	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
+	if err != nil {
+		return fmt.Errorf("deleting snapshot: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteSnapshotFiles removes all snapshot_files entries for a snapshot
+func (r *SnapshotRepository) DeleteSnapshotFiles(ctx context.Context, snapshotID string) error {
+	query := `DELETE FROM snapshot_files WHERE snapshot_id = ?`
+
+	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
+	if err != nil {
+		return fmt.Errorf("deleting snapshot files: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteSnapshotBlobs removes all snapshot_blobs entries for a snapshot
+func (r *SnapshotRepository) DeleteSnapshotBlobs(ctx context.Context, snapshotID string) error {
+	query := `DELETE FROM snapshot_blobs WHERE snapshot_id = ?`
+
+	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
+	if err != nil {
+		return fmt.Errorf("deleting snapshot blobs: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteSnapshotUploads removes all uploads entries for a snapshot
+func (r *SnapshotRepository) DeleteSnapshotUploads(ctx context.Context, snapshotID string) error {
+	query := `DELETE FROM uploads WHERE snapshot_id = ?`
+
+	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
+	if err != nil {
+		return fmt.Errorf("deleting snapshot uploads: %w", err)
+	}
+
+	return nil
+}
@@ -6,6 +6,8 @@ import (
 	"math"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )
 
 const (
@@ -30,7 +32,8 @@ func TestSnapshotRepository(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
@@ -45,7 +48,7 @@ func TestSnapshotRepository(t *testing.T) {
 	}
 
 	// Test GetByID
-	retrieved, err := repo.GetByID(ctx, snapshot.ID)
+	retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get snapshot: %v", err)
 	}
@@ -63,12 +66,12 @@ func TestSnapshotRepository(t *testing.T) {
 	}
 
 	// Test UpdateCounts
-	err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
+	err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
 	if err != nil {
 		t.Fatalf("failed to update counts: %v", err)
 	}
 
-	retrieved, err = repo.GetByID(ctx, snapshot.ID)
+	retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get updated snapshot: %v", err)
 	}
@@ -96,10 +99,11 @@ func TestSnapshotRepository(t *testing.T) {
 	// Add more snapshots
 	for i := 2; i <= 5; i++ {
 		s := &Snapshot{
-			ID:             fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
+			ID:             types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
 			Hostname:       "test-host",
 			VaultikVersion: "1.0.0",
-			CreatedTS:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			StartedAt:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			CompletedAt:    nil,
 			FileCount:      int64(100 * i),
 			ChunkCount:     int64(500 * i),
 			BlobCount:      int64(10 * i),
@@ -121,7 +125,7 @@ func TestSnapshotRepository(t *testing.T) {
 
 	// Verify order (most recent first)
 	for i := 0; i < len(recent)-1; i++ {
-		if recent[i].CreatedTS.Before(recent[i+1].CreatedTS) {
+		if recent[i].StartedAt.Before(recent[i+1].StartedAt) {
 			t.Error("snapshots not in descending order")
 		}
 	}
@@ -162,7 +166,8 @@ func TestSnapshotRepositoryDuplicate(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
147	internal/database/uploads.go	Normal file
@@ -0,0 +1,147 @@
package database

import (
	"context"
	"database/sql"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/log"
)

// Upload represents a blob upload record
type Upload struct {
	BlobHash   string
	SnapshotID string
	UploadedAt time.Time
	Size       int64
	DurationMs int64
}

// UploadRepository handles upload records
type UploadRepository struct {
	conn *sql.DB
}

// NewUploadRepository creates a new upload repository
func NewUploadRepository(conn *sql.DB) *UploadRepository {
	return &UploadRepository{conn: conn}
}

// Create inserts a new upload record
func (r *UploadRepository) Create(ctx context.Context, tx *sql.Tx, upload *Upload) error {
	query := `
		INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms)
		VALUES (?, ?, ?, ?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	} else {
		_, err = r.conn.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	}

	return err
}

// GetByBlobHash retrieves an upload record by blob hash
func (r *UploadRepository) GetByBlobHash(ctx context.Context, blobHash string) (*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		WHERE blob_hash = ?
	`

	var upload Upload
	err := r.conn.QueryRowContext(ctx, query, blobHash).Scan(
		&upload.BlobHash,
		&upload.UploadedAt,
		&upload.Size,
		&upload.DurationMs,
	)

	if err == sql.ErrNoRows {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}

	return &upload, nil
}

// GetRecentUploads retrieves recent uploads ordered by upload time
func (r *UploadRepository) GetRecentUploads(ctx context.Context, limit int) ([]*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		ORDER BY uploaded_at DESC
		LIMIT ?
	`

	rows, err := r.conn.QueryContext(ctx, query, limit)
	if err != nil {
		return nil, err
	}
	defer func() {
		if err := rows.Close(); err != nil {
			log.Error("failed to close rows", "error", err)
		}
	}()

	var uploads []*Upload
	for rows.Next() {
		var upload Upload
		if err := rows.Scan(&upload.BlobHash, &upload.UploadedAt, &upload.Size, &upload.DurationMs); err != nil {
			return nil, err
		}
		uploads = append(uploads, &upload)
	}

	return uploads, rows.Err()
}

// GetUploadStats returns aggregate statistics for uploads
func (r *UploadRepository) GetUploadStats(ctx context.Context, since time.Time) (*UploadStats, error) {
	query := `
		SELECT
			COUNT(*) as count,
			COALESCE(SUM(size), 0) as total_size,
			COALESCE(AVG(duration_ms), 0) as avg_duration_ms,
			COALESCE(MIN(duration_ms), 0) as min_duration_ms,
			COALESCE(MAX(duration_ms), 0) as max_duration_ms
		FROM uploads
		WHERE uploaded_at >= ?
	`

	var stats UploadStats
	err := r.conn.QueryRowContext(ctx, query, since).Scan(
		&stats.Count,
		&stats.TotalSize,
		&stats.AvgDurationMs,
		&stats.MinDurationMs,
		&stats.MaxDurationMs,
	)

	return &stats, err
}

// UploadStats contains aggregate upload statistics
type UploadStats struct {
	Count         int64
	TotalSize     int64
	AvgDurationMs float64
	MinDurationMs int64
	MaxDurationMs int64
}

// GetCountBySnapshot returns the count of uploads for a specific snapshot
func (r *UploadRepository) GetCountBySnapshot(ctx context.Context, snapshotID string) (int64, error) {
	query := `SELECT COUNT(*) FROM uploads WHERE snapshot_id = ?`
	var count int64
	err := r.conn.QueryRowContext(ctx, query, snapshotID).Scan(&count)
	if err != nil {
		return 0, err
	}
	return count, nil
}
@@ -4,13 +4,16 @@ import (
 	"time"
 )
 
-// these get populated from main() and copied into the Globals object.
-var (
-	Appname string = "vaultik"
-	Version string = "dev"
-	Commit  string = "unknown"
-)
+// Appname is the application name, populated from main().
+var Appname string = "vaultik"
+
+// Version is the application version, populated from main().
+var Version string = "dev"
+
+// Commit is the git commit hash, populated from main().
+var Commit string = "unknown"
+
+// Globals contains application-wide configuration and metadata.
 type Globals struct {
 	Appname   string
 	Version   string
@@ -18,13 +21,11 @@ type Globals struct {
 	StartTime time.Time
 }
 
+// New creates and returns a new Globals instance initialized with the package-level variables.
 func New() (*Globals, error) {
-	n := &Globals{
+	return &Globals{
 		Appname:   Appname,
 		Version:   Version,
 		Commit:    Commit,
 		StartTime: time.Now(),
-	}
-
-	return n, nil
+	}, nil
 }
@@ -2,16 +2,15 @@ package globals
 
 import (
 	"testing"
-
-	"go.uber.org/fx"
-	"go.uber.org/fx/fxtest"
 )
 
 // TestGlobalsNew ensures the globals package initializes correctly
 func TestGlobalsNew(t *testing.T) {
-	app := fxtest.New(t,
-		fx.Provide(New),
-		fx.Invoke(func(g *Globals) {
+	g, err := New()
+	if err != nil {
+		t.Fatalf("Failed to create Globals: %v", err)
+	}
+
 	if g == nil {
 		t.Fatal("Globals instance is nil")
 	}
@@ -28,9 +27,4 @@ func TestGlobalsNew(t *testing.T) {
 	if g.Commit == "" {
 		t.Error("Commit should not be empty")
 	}
-		}),
-	)
-
-	app.RequireStart()
-	app.RequireStop()
 }
182	internal/log/log.go	Normal file
@@ -0,0 +1,182 @@
package log

import (
	"context"
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
	"runtime"
	"strings"

	"golang.org/x/term"
)

// LogLevel represents the logging level.
type LogLevel int

const (
	// LevelFatal represents a fatal error level that will exit the program.
	LevelFatal LogLevel = iota
	// LevelError represents an error level.
	LevelError
	// LevelWarn represents a warning level.
	LevelWarn
	// LevelNotice represents a notice level (mapped to Info in slog).
	LevelNotice
	// LevelInfo represents an informational level.
	LevelInfo
	// LevelDebug represents a debug level.
	LevelDebug
)

// Config holds logger configuration.
type Config struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}

var logger *slog.Logger

// Initialize sets up the global logger based on the provided configuration.
func Initialize(cfg Config) {
	// Determine log level based on configuration
	var level slog.Level

	if cfg.Cron || cfg.Quiet {
		// In quiet/cron mode, only show errors
		level = slog.LevelError
	} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
		level = slog.LevelDebug
	} else if cfg.Verbose {
		level = slog.LevelInfo
	} else {
		level = slog.LevelWarn
	}

	// Create handler with appropriate level
	opts := &slog.HandlerOptions{
		Level: level,
	}

	// Check if stdout is a TTY
	if term.IsTerminal(int(os.Stdout.Fd())) {
		// Use colorized TTY handler
		logger = slog.New(NewTTYHandler(os.Stdout, opts))
	} else {
		// Use JSON format for non-TTY output
		logger = slog.New(slog.NewJSONHandler(os.Stdout, opts))
	}

	// Set as default logger
	slog.SetDefault(logger)
}

// getCaller returns the caller information as a string
func getCaller(skip int) string {
	_, file, line, ok := runtime.Caller(skip)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// Fatal logs a fatal error message and exits the program with code 1.
func Fatal(msg string, args ...any) {
	if logger != nil {
		// Add caller info to args
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
	os.Exit(1)
}

// Fatalf logs a formatted fatal error message and exits the program with code 1.
func Fatalf(format string, args ...any) {
	Fatal(fmt.Sprintf(format, args...))
}

// Error logs an error message.
func Error(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
}

// Errorf logs a formatted error message.
func Errorf(format string, args ...any) {
	Error(fmt.Sprintf(format, args...))
}

// Warn logs a warning message.
func Warn(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Warn(msg, args...)
	}
}

// Warnf logs a formatted warning message.
func Warnf(format string, args ...any) {
	Warn(fmt.Sprintf(format, args...))
}

// Notice logs a notice message (mapped to Info level).
func Notice(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Noticef logs a formatted notice message.
func Noticef(format string, args ...any) {
	Notice(fmt.Sprintf(format, args...))
}

// Info logs an informational message.
func Info(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Infof logs a formatted informational message.
func Infof(format string, args ...any) {
	Info(fmt.Sprintf(format, args...))
}

// Debug logs a debug message.
func Debug(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Debug(msg, args...)
	}
}

// Debugf logs a formatted debug message.
func Debugf(format string, args ...any) {
	Debug(fmt.Sprintf(format, args...))
}

// With returns a logger with additional context attributes.
func With(args ...any) *slog.Logger {
	if logger != nil {
		return logger.With(args...)
	}
	return slog.Default()
}

// WithContext returns a logger with the provided context.
func WithContext(ctx context.Context) *slog.Logger {
	return logger
}

// Logger returns the underlying slog.Logger instance.
func Logger() *slog.Logger {
	return logger
}
25	internal/log/module.go	Normal file
@@ -0,0 +1,25 @@
package log

import (
	"go.uber.org/fx"
)

// Module exports logging functionality for dependency injection.
var Module = fx.Module("log",
	fx.Invoke(func(cfg Config) {
		Initialize(cfg)
	}),
)

// New creates a new logger configuration from provided options.
func New(opts LogOptions) Config {
	return Config(opts)
}

// LogOptions are provided by the CLI.
type LogOptions struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}
140	internal/log/tty_handler.go	Normal file
@@ -0,0 +1,140 @@
package log

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"sync"
	"time"
)

// ANSI color codes
const (
	colorReset  = "\033[0m"
	colorRed    = "\033[31m"
	colorYellow = "\033[33m"
	colorBlue   = "\033[34m"
	colorGray   = "\033[90m"
	colorGreen  = "\033[32m"
	colorCyan   = "\033[36m"
	colorBold   = "\033[1m"
)

// TTYHandler is a custom slog handler for TTY output with colors.
type TTYHandler struct {
	opts slog.HandlerOptions
	mu   sync.Mutex
	out  io.Writer
}

// NewTTYHandler creates a new TTY handler with colored output.
func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
	if opts == nil {
		opts = &slog.HandlerOptions{}
	}
	return &TTYHandler{
		out:  out,
		opts: *opts,
	}
}

// Enabled reports whether the handler handles records at the given level.
func (h *TTYHandler) Enabled(_ context.Context, level slog.Level) bool {
	return level >= h.opts.Level.Level()
}

// Handle writes the log record to the output with color formatting.
func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
	h.mu.Lock()
	defer h.mu.Unlock()

	// Format timestamp
	timestamp := r.Time.Format("15:04:05")

	// Level and color
	level := r.Level.String()
	var levelColor string
	switch r.Level {
	case slog.LevelDebug:
		levelColor = colorGray
		level = "DEBUG"
	case slog.LevelInfo:
		levelColor = colorGreen
		level = "INFO "
	case slog.LevelWarn:
		levelColor = colorYellow
		level = "WARN "
	case slog.LevelError:
		levelColor = colorRed
		level = "ERROR"
	default:
		levelColor = colorReset
	}

	// Print main message
	_, _ = fmt.Fprintf(h.out, "%s%s%s %s%s%s %s%s%s",
		colorGray, timestamp, colorReset,
		levelColor, level, colorReset,
		colorBold, r.Message, colorReset)

	// Print attributes
	r.Attrs(func(a slog.Attr) bool {
		value := a.Value.String()
		// Special handling for certain attribute types
		switch a.Value.Kind() {
		case slog.KindDuration:
			if d, ok := a.Value.Any().(time.Duration); ok {
				value = formatDuration(d)
			}
		case slog.KindInt64:
			if a.Key == "bytes" {
				value = formatBytes(a.Value.Int64())
			}
		}

		_, _ = fmt.Fprintf(h.out, " %s%s%s=%s%s%s",
			colorCyan, a.Key, colorReset,
			colorBlue, value, colorReset)
		return true
	})

	_, _ = fmt.Fprintln(h.out)
	return nil
}

// WithAttrs returns a new handler with the given attributes.
func (h *TTYHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return h // Simplified for now
}

// WithGroup returns a new handler with the given group name.
func (h *TTYHandler) WithGroup(name string) slog.Handler {
	return h // Simplified for now
}

// formatDuration formats a duration in a human-readable way
func formatDuration(d time.Duration) string {
	if d < time.Millisecond {
		return fmt.Sprintf("%dµs", d.Microseconds())
	} else if d < time.Second {
		return fmt.Sprintf("%dms", d.Milliseconds())
	} else if d < time.Minute {
		return fmt.Sprintf("%.1fs", d.Seconds())
	}
	return d.String()
}

// formatBytes formats bytes in a human-readable way
func formatBytes(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}
108	internal/pidlock/pidlock.go	Normal file
@@ -0,0 +1,108 @@
// Package pidlock provides process-level locking using PID files.
// It prevents multiple instances of vaultik from running simultaneously,
// which would cause database locking conflicts.
package pidlock

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// ErrAlreadyRunning indicates another vaultik instance is running.
var ErrAlreadyRunning = errors.New("another vaultik instance is already running")

// Lock represents an acquired PID lock.
type Lock struct {
	path string
}

// Acquire attempts to acquire a PID lock in the specified directory.
// If the lock file exists and the process is still running, it returns
// ErrAlreadyRunning with details about the existing process.
// On success, it writes the current PID to the lock file and returns
// a Lock that must be released with Release().
func Acquire(lockDir string) (*Lock, error) {
	// Ensure lock directory exists
	if err := os.MkdirAll(lockDir, 0700); err != nil {
		return nil, fmt.Errorf("creating lock directory: %w", err)
	}

	lockPath := filepath.Join(lockDir, "vaultik.pid")

	// Check for existing lock
	existingPID, err := readPIDFile(lockPath)
	if err == nil {
		// Lock file exists, check if process is running
		if isProcessRunning(existingPID) {
			return nil, fmt.Errorf("%w (PID %d)", ErrAlreadyRunning, existingPID)
		}
		// Process is not running, stale lock file - we can take over
	}

	// Write our PID
	pid := os.Getpid()
	if err := os.WriteFile(lockPath, []byte(strconv.Itoa(pid)), 0600); err != nil {
		return nil, fmt.Errorf("writing PID file: %w", err)
	}

	return &Lock{path: lockPath}, nil
}

// Release removes the PID lock file.
// It is safe to call Release multiple times.
func (l *Lock) Release() error {
	if l == nil || l.path == "" {
		return nil
	}

	// Verify we still own the lock (our PID is in the file)
	existingPID, err := readPIDFile(l.path)
	if err != nil {
		// File already gone or unreadable - that's fine
		return nil
	}

	if existingPID != os.Getpid() {
		// Someone else wrote to our lock file - don't remove it
		return nil
	}

	if err := os.Remove(l.path); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("removing PID file: %w", err)
	}

	l.path = "" // Prevent double-release
	return nil
}

// readPIDFile reads and parses the PID from a lock file.
func readPIDFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}

	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("parsing PID: %w", err)
	}

	return pid, nil
}

// isProcessRunning checks if a process with the given PID is running.
func isProcessRunning(pid int) bool {
	process, err := os.FindProcess(pid)
	if err != nil {
		return false
	}

	// On Unix, FindProcess always succeeds. We need to send signal 0 to check.
	err = process.Signal(syscall.Signal(0))
	return err == nil
}
108	internal/pidlock/pidlock_test.go	Normal file
@@ -0,0 +1,108 @@
package pidlock

import (
	"os"
	"path/filepath"
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestAcquireAndRelease(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire lock
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)

	// Verify PID file exists with our PID
	data, err := os.ReadFile(filepath.Join(tmpDir, "vaultik.pid"))
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)

	// Release lock
	err = lock.Release()
	require.NoError(t, err)

	// Verify PID file is gone
	_, err = os.Stat(filepath.Join(tmpDir, "vaultik.pid"))
	assert.True(t, os.IsNotExist(err))
}

func TestAcquireBlocksSecondInstance(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire first lock
	lock1, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock1)
	defer func() { _ = lock1.Release() }()

	// Try to acquire second lock - should fail
	lock2, err := Acquire(tmpDir)
	assert.ErrorIs(t, err, ErrAlreadyRunning)
	assert.Nil(t, lock2)
}

func TestAcquireWithStaleLock(t *testing.T) {
	tmpDir := t.TempDir()

	// Write a stale PID file (PID that doesn't exist)
	stalePID := 999999999 // Unlikely to be a real process
	pidPath := filepath.Join(tmpDir, "vaultik.pid")
	err := os.WriteFile(pidPath, []byte(strconv.Itoa(stalePID)), 0600)
	require.NoError(t, err)

	// Should be able to acquire lock (stale lock is cleaned up)
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify our PID is now in the file
	data, err := os.ReadFile(pidPath)
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)
}

func TestReleaseIsIdempotent(t *testing.T) {
	tmpDir := t.TempDir()

	lock, err := Acquire(tmpDir)
	require.NoError(t, err)

	// Release multiple times - should not error
	err = lock.Release()
	require.NoError(t, err)

	err = lock.Release()
	require.NoError(t, err)
}

func TestReleaseNilLock(t *testing.T) {
	var lock *Lock
	err := lock.Release()
	assert.NoError(t, err)
}

func TestAcquireCreatesDirectory(t *testing.T) {
	tmpDir := t.TempDir()
	nestedDir := filepath.Join(tmpDir, "nested", "dir")

	lock, err := Acquire(nestedDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify directory was created
	info, err := os.Stat(nestedDir)
	require.NoError(t, err)
	assert.True(t, info.IsDir())
}
334
internal/s3/client.go
Normal file
334
internal/s3/client.go
Normal file
@@ -0,0 +1,334 @@
|
|||||||
|
package s3
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"io"
|
||||||
|
"sync/atomic"
|
||||||
|
|
||||||
|
"github.com/aws/aws-sdk-go-v2/aws"
|
||||||
|
"github.com/aws/aws-sdk-go-v2/config"
|
||||||
|
"github.com/aws/aws-sdk-go-v2/credentials"
|
||||||
|
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
|
||||||
|
"github.com/aws/aws-sdk-go-v2/service/s3"
|
||||||
|
"github.com/aws/smithy-go/logging"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Client wraps the AWS S3 client for vaultik operations.
|
||||||
|
// It provides a simplified interface for S3 operations with automatic
|
||||||
|
// prefix handling and connection management. All operations are performed
// within the configured bucket and prefix.
type Client struct {
	s3Client *s3.Client
	bucket   string
	prefix   string
	endpoint string
}

// Config contains S3 client configuration.
// All fields are required except Prefix, which defaults to an empty string.
// The Endpoint field should include the protocol (http:// or https://).
type Config struct {
	Endpoint        string
	Bucket          string
	Prefix          string
	AccessKeyID     string
	SecretAccessKey string
	Region          string
}

// nopLogger is a logger that discards all output.
// Used to suppress SDK warnings about checksums.
type nopLogger struct{}

func (nopLogger) Logf(classification logging.Classification, format string, v ...interface{}) {}

// NewClient creates a new S3 client with the provided configuration.
// It configures static credentials and path-style URLs for compatibility
// with various S3-compatible services. Note that no network call is made
// here; credentials are not validated until the first request.
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
	// Create AWS config with a nop logger to suppress SDK warnings
	awsCfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion(cfg.Region),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
			cfg.AccessKeyID,
			cfg.SecretAccessKey,
			"",
		)),
		config.WithLogger(nopLogger{}),
	)
	if err != nil {
		return nil, err
	}

	// Configure custom endpoint if provided
	s3Opts := func(o *s3.Options) {
		if cfg.Endpoint != "" {
			o.BaseEndpoint = aws.String(cfg.Endpoint)
			o.UsePathStyle = true
		}
	}

	s3Client := s3.NewFromConfig(awsCfg, s3Opts)

	return &Client{
		s3Client: s3Client,
		bucket:   cfg.Bucket,
		prefix:   cfg.Prefix,
		endpoint: cfg.Endpoint,
	}, nil
}

// PutObject uploads an object to S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// The data parameter should be a reader containing the object data.
// Returns an error if the upload fails.
func (c *Client) PutObject(ctx context.Context, key string, data io.Reader) error {
	fullKey := c.prefix + key
	_, err := c.s3Client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
		Body:   data,
	})
	return err
}
// ProgressCallback is called during upload progress with the number of
// bytes uploaded so far. The callback may return an error to cancel the upload.
type ProgressCallback func(bytesUploaded int64) error

// PutObjectWithProgress uploads an object to S3 with progress tracking.
// The key is automatically prefixed with the configured prefix.
// The size parameter must be the exact size of the data to upload.
// The progress callback is called periodically with the number of bytes uploaded.
// Returns an error if the upload fails.
func (c *Client) PutObjectWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
	fullKey := c.prefix + key

	// Create an uploader with the S3 client
	uploader := manager.NewUploader(c.s3Client, func(u *manager.Uploader) {
		// Set part size to 10MB for better progress granularity
		u.PartSize = 10 * 1024 * 1024
	})

	// Create a progress reader that tracks upload progress
	pr := &progressReader{
		reader:   data,
		size:     size,
		callback: progress,
		read:     0,
	}

	// Upload the file
	_, err := uploader.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
		Body:   pr,
	})

	return err
}
// GetObject downloads an object from S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// Returns a ReadCloser containing the object data. The caller must
// close the returned reader when done to avoid resource leaks.
func (c *Client) GetObject(ctx context.Context, key string) (io.ReadCloser, error) {
	fullKey := c.prefix + key
	result, err := c.s3Client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
	})
	if err != nil {
		return nil, err
	}
	return result.Body, nil
}

// DeleteObject removes an object from S3 with the specified key.
// The key is automatically prefixed with the configured prefix.
// No error is returned if the object doesn't exist.
func (c *Client) DeleteObject(ctx context.Context, key string) error {
	fullKey := c.prefix + key
	_, err := c.s3Client.DeleteObject(ctx, &s3.DeleteObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
	})
	return err
}
// ListObjects lists all objects with the given prefix.
// The prefix is combined with the client's configured prefix.
// Returns a slice of object keys with the base prefix removed.
// This method loads all matching keys into memory, so use
// ListObjectsStream for large result sets.
func (c *Client) ListObjects(ctx context.Context, prefix string) ([]string, error) {
	fullPrefix := c.prefix + prefix

	var keys []string
	paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
		Bucket: aws.String(c.bucket),
		Prefix: aws.String(fullPrefix),
	})

	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			return nil, err
		}

		for _, obj := range page.Contents {
			if obj.Key != nil {
				// Remove the base prefix from the key
				key := *obj.Key
				if len(key) > len(c.prefix) {
					key = key[len(c.prefix):]
				}
				keys = append(keys, key)
			}
		}
	}

	return keys, nil
}
// HeadObject checks if an object exists in S3.
// Returns true if the object exists, false otherwise.
// The key is automatically prefixed with the configured prefix.
// Note: This method returns false for any error, not just "not found".
func (c *Client) HeadObject(ctx context.Context, key string) (bool, error) {
	fullKey := c.prefix + key
	_, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
	})
	if err != nil {
		// Check if it's a not found error
		// TODO: Add proper error type checking
		return false, nil
	}
	return true, nil
}
// ObjectInfo contains information about an S3 object.
// It is used by ListObjectsStream to return object metadata
// along with any errors encountered during listing.
type ObjectInfo struct {
	Key  string
	Size int64
	Err  error
}

// ListObjectsStream lists objects with the given prefix and returns a channel.
// This method is preferred for large result sets as it streams results
// instead of loading everything into memory. The channel is closed when
// listing is complete or an error occurs. If an error occurs, it will be
// sent as the last item with the Err field set. The recursive parameter
// is currently unused but reserved for future use.
func (c *Client) ListObjectsStream(ctx context.Context, prefix string, recursive bool) <-chan ObjectInfo {
	ch := make(chan ObjectInfo)

	go func() {
		defer close(ch)

		fullPrefix := c.prefix + prefix

		paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
			Bucket: aws.String(c.bucket),
			Prefix: aws.String(fullPrefix),
		})

		for paginator.HasMorePages() {
			page, err := paginator.NextPage(ctx)
			if err != nil {
				ch <- ObjectInfo{Err: err}
				return
			}

			for _, obj := range page.Contents {
				if obj.Key != nil && obj.Size != nil {
					// Remove the base prefix from the key
					key := *obj.Key
					if len(key) > len(c.prefix) {
						key = key[len(c.prefix):]
					}
					ch <- ObjectInfo{
						Key:  key,
						Size: *obj.Size,
					}
				}
			}
		}
	}()

	return ch
}
// StatObject returns information about an object without downloading it.
// The key is automatically prefixed with the configured prefix.
// Returns an ObjectInfo struct with the object's metadata.
// Returns an error if the object doesn't exist or if the operation fails.
func (c *Client) StatObject(ctx context.Context, key string) (*ObjectInfo, error) {
	fullKey := c.prefix + key
	result, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(fullKey),
	})
	if err != nil {
		return nil, err
	}

	size := int64(0)
	if result.ContentLength != nil {
		size = *result.ContentLength
	}

	return &ObjectInfo{
		Key:  key,
		Size: size,
	}, nil
}

// RemoveObject deletes an object from S3 (alias for DeleteObject).
// This method exists for API compatibility and simply calls DeleteObject.
func (c *Client) RemoveObject(ctx context.Context, key string) error {
	return c.DeleteObject(ctx, key)
}

// BucketName returns the configured S3 bucket name.
// This is useful for displaying configuration information.
func (c *Client) BucketName() string {
	return c.bucket
}

// Endpoint returns the S3 endpoint URL.
// If no custom endpoint was configured, returns the default AWS S3 endpoint.
// This is useful for displaying configuration information.
func (c *Client) Endpoint() string {
	if c.endpoint == "" {
		return "s3.amazonaws.com"
	}
	return c.endpoint
}
// progressReader wraps an io.Reader to track reading progress
type progressReader struct {
	reader   io.Reader
	size     int64
	read     int64
	callback ProgressCallback
}

// Read implements io.Reader
func (pr *progressReader) Read(p []byte) (int, error) {
	n, err := pr.reader.Read(p)
	if n > 0 {
		atomic.AddInt64(&pr.read, int64(n))
		if pr.callback != nil {
			if callbackErr := pr.callback(atomic.LoadInt64(&pr.read)); callbackErr != nil {
				return n, callbackErr
			}
		}
	}
	return n, err
}
98
internal/s3/client_test.go
Normal file
@@ -0,0 +1,98 @@
package s3_test

import (
	"bytes"
	"context"
	"io"
	"testing"

	"git.eeqj.de/sneak/vaultik/internal/s3"
)

func TestClient(t *testing.T) {
	ts := NewTestServer(t)
	defer func() {
		if err := ts.Cleanup(); err != nil {
			t.Errorf("cleanup failed: %v", err)
		}
	}()

	ctx := context.Background()

	// Create client
	client, err := s3.NewClient(ctx, s3.Config{
		Endpoint:        testEndpoint,
		Bucket:          testBucket,
		Prefix:          "test-prefix/",
		AccessKeyID:     testAccessKey,
		SecretAccessKey: testSecretKey,
		Region:          testRegion,
	})
	if err != nil {
		t.Fatalf("failed to create client: %v", err)
	}

	// Test PutObject
	testKey := "foo/bar.txt"
	testData := []byte("test data")
	err = client.PutObject(ctx, testKey, bytes.NewReader(testData))
	if err != nil {
		t.Fatalf("failed to put object: %v", err)
	}

	// Test GetObject
	reader, err := client.GetObject(ctx, testKey)
	if err != nil {
		t.Fatalf("failed to get object: %v", err)
	}
	defer func() {
		if err := reader.Close(); err != nil {
			t.Errorf("failed to close reader: %v", err)
		}
	}()

	data, err := io.ReadAll(reader)
	if err != nil {
		t.Fatalf("failed to read data: %v", err)
	}

	if !bytes.Equal(data, testData) {
		t.Errorf("data mismatch: got %q, want %q", data, testData)
	}

	// Test HeadObject
	exists, err := client.HeadObject(ctx, testKey)
	if err != nil {
		t.Fatalf("failed to head object: %v", err)
	}
	if !exists {
		t.Error("expected object to exist")
	}

	// Test ListObjects
	keys, err := client.ListObjects(ctx, "foo/")
	if err != nil {
		t.Fatalf("failed to list objects: %v", err)
	}
	if len(keys) != 1 {
		t.Errorf("expected 1 key, got %d", len(keys))
	}
	if keys[0] != testKey {
		t.Errorf("unexpected key: got %s, want %s", keys[0], testKey)
	}

	// Test DeleteObject
	err = client.DeleteObject(ctx, testKey)
	if err != nil {
		t.Fatalf("failed to delete object: %v", err)
	}

	// Verify deletion
	exists, err = client.HeadObject(ctx, testKey)
	if err != nil {
		t.Fatalf("failed to head object after deletion: %v", err)
	}
	if exists {
		t.Error("expected object to not exist after deletion")
	}
}
42
internal/s3/module.go
Normal file
@@ -0,0 +1,42 @@
package s3

import (
	"context"

	"git.eeqj.de/sneak/vaultik/internal/config"
	"go.uber.org/fx"
)

// Module exports S3 functionality as an fx module.
// It provides automatic dependency injection for the S3 client,
// configuring it based on the application's configuration settings.
var Module = fx.Module("s3",
	fx.Provide(
		provideClient,
	),
)

func provideClient(lc fx.Lifecycle, cfg *config.Config) (*Client, error) {
	ctx := context.Background()

	client, err := NewClient(ctx, Config{
		Endpoint:        cfg.S3.Endpoint,
		Bucket:          cfg.S3.Bucket,
		Prefix:          cfg.S3.Prefix,
		AccessKeyID:     cfg.S3.AccessKeyID,
		SecretAccessKey: cfg.S3.SecretAccessKey,
		Region:          cfg.S3.Region,
	})
	if err != nil {
		return nil, err
	}

	lc.Append(fx.Hook{
		OnStop: func(ctx context.Context) error {
			// S3 client doesn't need explicit cleanup
			return nil
		},
	})

	return client, nil
}
306
internal/s3/s3_test.go
Normal file
@@ -0,0 +1,306 @@
package s3_test

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"testing"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/smithy-go/logging"
	"github.com/johannesboyne/gofakes3"
	"github.com/johannesboyne/gofakes3/backend/s3mem"
)

const (
	testBucket    = "test-bucket"
	testRegion    = "us-east-1"
	testAccessKey = "test-access-key"
	testSecretKey = "test-secret-key"
	testEndpoint  = "http://localhost:9999"
)

// TestServer represents an in-process S3-compatible test server
type TestServer struct {
	server   *http.Server
	backend  gofakes3.Backend
	s3Client *s3.Client
	tempDir  string
	logBuf   *bytes.Buffer
}

// NewTestServer creates and starts a new test server
func NewTestServer(t *testing.T) *TestServer {
	// Create temp directory for any file operations
	tempDir, err := os.MkdirTemp("", "vaultik-s3-test-*")
	if err != nil {
		t.Fatalf("failed to create temp dir: %v", err)
	}

	// Create in-memory backend
	backend := s3mem.New()
	faker := gofakes3.New(backend)

	// Create HTTP server
	server := &http.Server{
		Addr:    "localhost:9999",
		Handler: faker.Server(),
	}

	// Start server in background
	go func() {
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			t.Logf("test server error: %v", err)
		}
	}()

	// Wait for server to be ready
	time.Sleep(100 * time.Millisecond)

	// Create a buffer to capture logs
	logBuf := &bytes.Buffer{}

	// Create S3 client with custom logger
	cfg, err := config.LoadDefaultConfig(context.Background(),
		config.WithRegion(testRegion),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
			testAccessKey,
			testSecretKey,
			"",
		)),
		config.WithClientLogMode(aws.LogRetries|aws.LogRequestWithBody|aws.LogResponseWithBody),
		config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
			// Capture logs to buffer instead of stdout
			fmt.Fprintf(logBuf, "SDK %s %s %s\n",
				time.Now().Format("2006/01/02 15:04:05"),
				string(classification),
				fmt.Sprintf(format, v...))
		})),
	)
	if err != nil {
		t.Fatalf("failed to create AWS config: %v", err)
	}

	s3Client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String(testEndpoint)
		o.UsePathStyle = true
	})

	ts := &TestServer{
		server:   server,
		backend:  backend,
		s3Client: s3Client,
		tempDir:  tempDir,
		logBuf:   logBuf,
	}

	// Register cleanup to show logs on test failure
	t.Cleanup(func() {
		if t.Failed() && logBuf.Len() > 0 {
			t.Logf("S3 SDK Debug Output:\n%s", logBuf.String())
		}
	})

	// Create test bucket
	_, err = s3Client.CreateBucket(context.Background(), &s3.CreateBucketInput{
		Bucket: aws.String(testBucket),
	})
	if err != nil {
		t.Fatalf("failed to create test bucket: %v", err)
	}

	return ts
}

// Cleanup shuts down the server and removes the temp directory
func (ts *TestServer) Cleanup() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := ts.server.Shutdown(ctx); err != nil {
		return err
	}

	return os.RemoveAll(ts.tempDir)
}

// Client returns the S3 client configured for the test server
func (ts *TestServer) Client() *s3.Client {
	return ts.s3Client
}
// TestBasicS3Operations tests basic store and retrieve operations
func TestBasicS3Operations(t *testing.T) {
	ts := NewTestServer(t)
	defer func() {
		if err := ts.Cleanup(); err != nil {
			t.Errorf("cleanup failed: %v", err)
		}
	}()

	ctx := context.Background()
	client := ts.Client()

	// Test data
	testKey := "test/file.txt"
	testData := []byte("Hello, S3 test!")

	// Put object
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(testKey),
		Body:   bytes.NewReader(testData),
	})
	if err != nil {
		t.Fatalf("failed to put object: %v", err)
	}

	// Get object
	result, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(testKey),
	})
	if err != nil {
		t.Fatalf("failed to get object: %v", err)
	}
	defer func() {
		if err := result.Body.Close(); err != nil {
			t.Errorf("failed to close body: %v", err)
		}
	}()

	// Read and verify data
	data, err := io.ReadAll(result.Body)
	if err != nil {
		t.Fatalf("failed to read object body: %v", err)
	}

	if !bytes.Equal(data, testData) {
		t.Errorf("retrieved data mismatch: got %q, want %q", data, testData)
	}
}
// TestBlobOperations tests blob storage patterns for vaultik
func TestBlobOperations(t *testing.T) {
	ts := NewTestServer(t)
	defer func() {
		if err := ts.Cleanup(); err != nil {
			t.Errorf("cleanup failed: %v", err)
		}
	}()

	ctx := context.Background()
	client := ts.Client()

	// Test blob storage with prefix structure
	blobHash := "aabbccddee112233445566778899aabbccddee11"
	blobKey := filepath.Join("blobs", blobHash[:2], blobHash[2:4], blobHash+".zst.age")
	blobData := []byte("compressed and encrypted blob data")

	// Store blob
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(blobKey),
		Body:   bytes.NewReader(blobData),
	})
	if err != nil {
		t.Fatalf("failed to store blob: %v", err)
	}

	// List objects with prefix
	listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
		Bucket: aws.String(testBucket),
		Prefix: aws.String("blobs/aa/"),
	})
	if err != nil {
		t.Fatalf("failed to list objects: %v", err)
	}

	if len(listResult.Contents) != 1 {
		t.Errorf("expected 1 object, got %d", len(listResult.Contents))
	}

	if listResult.Contents[0].Key != nil && *listResult.Contents[0].Key != blobKey {
		t.Errorf("unexpected key: got %s, want %s", *listResult.Contents[0].Key, blobKey)
	}

	// Delete blob
	_, err = client.DeleteObject(ctx, &s3.DeleteObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(blobKey),
	})
	if err != nil {
		t.Fatalf("failed to delete blob: %v", err)
	}

	// Verify deletion
	_, err = client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(blobKey),
	})
	if err == nil {
		t.Error("expected error getting deleted object, got nil")
	}
}
// TestMetadataOperations tests metadata storage patterns
func TestMetadataOperations(t *testing.T) {
	ts := NewTestServer(t)
	defer func() {
		if err := ts.Cleanup(); err != nil {
			t.Errorf("cleanup failed: %v", err)
		}
	}()

	ctx := context.Background()
	client := ts.Client()

	// Test metadata storage
	snapshotID := "2024-01-01T12:00:00Z"
	metadataKey := filepath.Join("metadata", snapshotID+".sqlite.age")
	metadataData := []byte("encrypted sqlite database")

	// Store metadata
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(metadataKey),
		Body:   bytes.NewReader(metadataData),
	})
	if err != nil {
		t.Fatalf("failed to store metadata: %v", err)
	}

	// Store manifest
	manifestKey := filepath.Join("metadata", snapshotID+".manifest.json.zst")
	manifestData := []byte(`{"snapshot_id":"2024-01-01T12:00:00Z","blob_hashes":["hash1","hash2"]}`)

	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(testBucket),
		Key:    aws.String(manifestKey),
		Body:   bytes.NewReader(manifestData),
	})
	if err != nil {
		t.Fatalf("failed to store manifest: %v", err)
	}

	// List metadata objects
	listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
		Bucket: aws.String(testBucket),
		Prefix: aws.String("metadata/"),
	})
	if err != nil {
		t.Fatalf("failed to list metadata: %v", err)
	}

	if len(listResult.Contents) != 2 {
		t.Errorf("expected 2 metadata objects, got %d", len(listResult.Contents))
	}
}
534
internal/snapshot/backup_test.go
Normal file
@@ -0,0 +1,534 @@
package snapshot

import (
	"context"
	"crypto/sha256"
	"database/sql"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"testing"
	"testing/fstest"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/types"
)

// MockS3Client is a mock implementation of S3 operations for testing
type MockS3Client struct {
	storage map[string][]byte
}

func NewMockS3Client() *MockS3Client {
	return &MockS3Client{
		storage: make(map[string][]byte),
	}
}

func (m *MockS3Client) PutBlob(ctx context.Context, hash string, data []byte) error {
	m.storage[hash] = data
	return nil
}

func (m *MockS3Client) GetBlob(ctx context.Context, hash string) ([]byte, error) {
	data, ok := m.storage[hash]
	if !ok {
		return nil, fmt.Errorf("blob not found: %s", hash)
	}
	return data, nil
}

func (m *MockS3Client) BlobExists(ctx context.Context, hash string) (bool, error) {
	_, ok := m.storage[hash]
	return ok, nil
}

func (m *MockS3Client) CreateBucket(ctx context.Context, bucket string) error {
	return nil
}

func TestBackupWithInMemoryFS(t *testing.T) {
	// Create a temporary directory for the database
	tempDir := t.TempDir()
	dbPath := filepath.Join(tempDir, "test.db")

	// Create test filesystem
	testFS := fstest.MapFS{
		"file1.txt": &fstest.MapFile{
			Data:    []byte("Hello, World!"),
			Mode:    0644,
			ModTime: time.Now(),
		},
		"dir1/file2.txt": &fstest.MapFile{
			Data:    []byte("This is a test file with some content."),
			Mode:    0755,
			ModTime: time.Now(),
		},
		"dir1/subdir/file3.txt": &fstest.MapFile{
			Data:    []byte("Another file in a subdirectory."),
			Mode:    0600,
			ModTime: time.Now(),
		},
		"largefile.bin": &fstest.MapFile{
			Data:    generateLargeFileContent(10 * 1024 * 1024), // 10MB file with varied content
			Mode:    0644,
			ModTime: time.Now(),
		},
	}

	// Initialize the database
	ctx := context.Background()
	db, err := database.New(ctx, dbPath)
	if err != nil {
		t.Fatalf("Failed to create database: %v", err)
	}
	defer func() {
		if err := db.Close(); err != nil {
			t.Logf("Failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create mock S3 client
	s3Client := NewMockS3Client()

	// Run backup
	backupEngine := &BackupEngine{
		repos:    repos,
		s3Client: s3Client,
	}

	snapshotID, err := backupEngine.Backup(ctx, testFS, ".")
	if err != nil {
		t.Fatalf("Backup failed: %v", err)
	}

	// Verify snapshot was created
	snapshot, err := repos.Snapshots.GetByID(ctx, snapshotID)
	if err != nil {
		t.Fatalf("Failed to get snapshot: %v", err)
	}

	if snapshot == nil {
		t.Fatal("Snapshot not found")
	}

	if snapshot.FileCount == 0 {
		t.Error("Expected snapshot to have files")
	}

	// Verify files in database
	files, err := repos.Files.ListByPrefix(ctx, "")
	if err != nil {
		t.Fatalf("Failed to list files: %v", err)
	}

	expectedFiles := map[string]bool{
		"file1.txt":             true,
		"dir1/file2.txt":        true,
		"dir1/subdir/file3.txt": true,
		"largefile.bin":         true,
	}

	if len(files) != len(expectedFiles) {
		t.Errorf("Expected %d files, got %d", len(expectedFiles), len(files))
	}

	for _, file := range files {
		if !expectedFiles[file.Path.String()] {
			t.Errorf("Unexpected file in database: %s", file.Path)
		}
		delete(expectedFiles, file.Path.String())

		// Verify file metadata
		fsFile := testFS[file.Path.String()]
		if fsFile == nil {
			t.Errorf("File %s not found in test filesystem", file.Path)
			continue
		}

		if file.Size != int64(len(fsFile.Data)) {
			t.Errorf("File %s: expected size %d, got %d", file.Path, len(fsFile.Data), file.Size)
		}

		if file.Mode != uint32(fsFile.Mode) {
			t.Errorf("File %s: expected mode %o, got %o", file.Path, fsFile.Mode, file.Mode)
		}
	}

	if len(expectedFiles) > 0 {
		t.Errorf("Files not found in database: %v", expectedFiles)
	}

	// Verify chunks
	chunks, err := repos.Chunks.List(ctx)
	if err != nil {
		t.Fatalf("Failed to list chunks: %v", err)
	}

	if len(chunks) == 0 {
		t.Error("No chunks found in database")
	}

	// The large file should create 10 chunks (10MB / 1MB chunk size)
	// plus the small files
	minExpectedChunks := 10 + 3
	if len(chunks) < minExpectedChunks {
		t.Errorf("Expected at least %d chunks, got %d", minExpectedChunks, len(chunks))
	}

	// Verify at least one blob was created and uploaded.
	// We can't list blobs directly, but we can check via snapshot blobs.
	blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
	if err != nil {
		t.Fatalf("Failed to get blob hashes: %v", err)
	}
	if len(blobHashes) == 0 {
		t.Error("Expected at least one blob to be created")
	}

	for _, blobHash := range blobHashes {
		// Check blob exists in mock S3
		exists, err := s3Client.BlobExists(ctx, blobHash)
		if err != nil {
			t.Errorf("Failed to check blob %s: %v", blobHash, err)
		}
		if !exists {
			t.Errorf("Blob %s not found in S3", blobHash)
		}
	}
}

func TestBackupDeduplication(t *testing.T) {
	// Create a temporary directory for the database
	tempDir := t.TempDir()
	dbPath := filepath.Join(tempDir, "test.db")

	// Create test filesystem with duplicate content
	testFS := fstest.MapFS{
		"file1.txt": &fstest.MapFile{
			Data:    []byte("Duplicate content"),
			Mode:    0644,
			ModTime: time.Now(),
		},
		"file2.txt": &fstest.MapFile{
			Data:    []byte("Duplicate content"),
			Mode:    0644,
			ModTime: time.Now(),
		},
		"file3.txt": &fstest.MapFile{
			Data:    []byte("Unique content"),
			Mode:    0644,
			ModTime: time.Now(),
		},
	}

	// Initialize the database
	ctx := context.Background()
	db, err := database.New(ctx, dbPath)
	if err != nil {
		t.Fatalf("Failed to create database: %v", err)
	}
	defer func() {
		if err := db.Close(); err != nil {
			t.Logf("Failed to close database: %v", err)
		}
||||||
|
}()
|
||||||
|
|
||||||
|
repos := database.NewRepositories(db)
|
||||||
|
|
||||||
|
// Create mock S3 client
|
||||||
|
s3Client := NewMockS3Client()
|
||||||
|
|
||||||
|
// Run backup
|
||||||
|
backupEngine := &BackupEngine{
|
||||||
|
repos: repos,
|
||||||
|
s3Client: s3Client,
|
||||||
|
}
|
||||||
|
|
||||||
|
_, err = backupEngine.Backup(ctx, testFS, ".")
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Backup failed: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify deduplication
|
||||||
|
chunks, err := repos.Chunks.List(ctx)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("Failed to list chunks: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Should have only 2 unique chunks (duplicate content + unique content)
|
||||||
|
if len(chunks) != 2 {
|
||||||
|
t.Errorf("Expected 2 unique chunks, got %d", len(chunks))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify chunk references
|
||||||
|
for _, chunk := range chunks {
|
||||||
|
files, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
|
||||||
|
if err != nil {
|
||||||
|
t.Errorf("Failed to get files for chunk %s: %v", chunk.ChunkHash, err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// The duplicate content chunk should be referenced by 2 files
|
||||||
|
if chunk.Size == int64(len("Duplicate content")) && len(files) != 2 {
|
||||||
|
t.Errorf("Expected duplicate chunk to be referenced by 2 files, got %d", len(files))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// BackupEngine performs backup operations
|
||||||
|
type BackupEngine struct {
|
||||||
|
repos *database.Repositories
|
||||||
|
s3Client interface {
|
||||||
|
PutBlob(ctx context.Context, hash string, data []byte) error
|
||||||
|
BlobExists(ctx context.Context, hash string) (bool, error)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Backup performs a backup of the given filesystem
|
||||||
|
func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (string, error) {
|
||||||
|
// Create a new snapshot
|
||||||
|
hostname, _ := os.Hostname()
|
||||||
|
snapshotID := time.Now().Format(time.RFC3339)
|
||||||
|
snapshot := &database.Snapshot{
|
||||||
|
ID: types.SnapshotID(snapshotID),
|
||||||
|
Hostname: types.Hostname(hostname),
|
||||||
|
VaultikVersion: "test",
|
||||||
|
StartedAt: time.Now(),
|
||||||
|
CompletedAt: nil,
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create initial snapshot record
|
||||||
|
err := b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
return b.repos.Snapshots.Create(ctx, tx, snapshot)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Track counters
|
||||||
|
var fileCount, chunkCount, blobCount, totalSize, blobSize int64
|
||||||
|
|
||||||
|
// Track which chunks we've seen to handle deduplication
|
||||||
|
processedChunks := make(map[string]bool)
|
||||||
|
|
||||||
|
// Scan the filesystem and process files
|
||||||
|
err = fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Skip directories
|
||||||
|
if d.IsDir() {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get file info
|
||||||
|
info, err := d.Info()
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Handle symlinks
|
||||||
|
if info.Mode()&fs.ModeSymlink != 0 {
|
||||||
|
// For testing, we'll skip symlinks since fstest doesn't support them well
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create file record in a short transaction
|
||||||
|
file := &database.File{
|
||||||
|
Path: types.FilePath(path),
|
||||||
|
Size: info.Size(),
|
||||||
|
Mode: uint32(info.Mode()),
|
||||||
|
MTime: info.ModTime(),
|
||||||
|
CTime: info.ModTime(), // Use mtime as ctime for test
|
||||||
|
UID: 1000, // Default UID for test
|
||||||
|
GID: 1000, // Default GID for test
|
||||||
|
}
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
return b.repos.Files.Create(ctx, tx, file)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
fileCount++
|
||||||
|
totalSize += info.Size()
|
||||||
|
|
||||||
|
// Read and process file in chunks
|
||||||
|
f, err := fsys.Open(path)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
defer func() {
|
||||||
|
if err := f.Close(); err != nil {
|
||||||
|
// Log but don't fail since we're already in an error path potentially
|
||||||
|
fmt.Fprintf(os.Stderr, "Failed to close file: %v\n", err)
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Process file in chunks
|
||||||
|
chunkIndex := 0
|
||||||
|
buffer := make([]byte, defaultChunkSize)
|
||||||
|
|
||||||
|
for {
|
||||||
|
n, err := f.Read(buffer)
|
||||||
|
if err != nil && err != io.EOF {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if n == 0 {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
|
||||||
|
chunkData := buffer[:n]
|
||||||
|
chunkHash := calculateHash(chunkData)
|
||||||
|
|
||||||
|
// Check if chunk already exists (outside of transaction)
|
||||||
|
existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
|
||||||
|
if existingChunk == nil {
|
||||||
|
// Create new chunk in a short transaction
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
chunk := &database.Chunk{
|
||||||
|
ChunkHash: types.ChunkHash(chunkHash),
|
||||||
|
Size: int64(n),
|
||||||
|
}
|
||||||
|
return b.repos.Chunks.Create(ctx, tx, chunk)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
processedChunks[chunkHash] = true
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create file-chunk mapping in a short transaction
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
fileChunk := &database.FileChunk{
|
||||||
|
FileID: file.ID,
|
||||||
|
Idx: chunkIndex,
|
||||||
|
ChunkHash: types.ChunkHash(chunkHash),
|
||||||
|
}
|
||||||
|
return b.repos.FileChunks.Create(ctx, tx, fileChunk)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create chunk-file mapping in a short transaction
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
chunkFile := &database.ChunkFile{
|
||||||
|
ChunkHash: types.ChunkHash(chunkHash),
|
||||||
|
FileID: file.ID,
|
||||||
|
FileOffset: int64(chunkIndex * defaultChunkSize),
|
||||||
|
Length: int64(n),
|
||||||
|
}
|
||||||
|
return b.repos.ChunkFiles.Create(ctx, tx, chunkFile)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
chunkIndex++
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
})
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
// After all files are processed, create blobs for new chunks
|
||||||
|
for chunkHash := range processedChunks {
|
||||||
|
// Get chunk data (outside of transaction)
|
||||||
|
chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
chunkCount++
|
||||||
|
|
||||||
|
// In a real system, blobs would contain multiple chunks and be encrypted
|
||||||
|
// For testing, we'll create a blob with a "blob-" prefix to differentiate
|
||||||
|
blobHash := "blob-" + chunkHash
|
||||||
|
|
||||||
|
// For the test, we'll create dummy data since we don't have the original
|
||||||
|
dummyData := []byte(chunkHash)
|
||||||
|
|
||||||
|
// Upload to S3 as a blob
|
||||||
|
if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create blob entry in a short transaction
|
||||||
|
blobID := types.NewBlobID()
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
blob := &database.Blob{
|
||||||
|
ID: blobID,
|
||||||
|
Hash: types.BlobHash(blobHash),
|
||||||
|
CreatedTS: time.Now(),
|
||||||
|
}
|
||||||
|
return b.repos.Blobs.Create(ctx, tx, blob)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
blobCount++
|
||||||
|
blobSize += chunk.Size
|
||||||
|
|
||||||
|
// Create blob-chunk mapping in a short transaction
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
blobChunk := &database.BlobChunk{
|
||||||
|
BlobID: blobID,
|
||||||
|
ChunkHash: types.ChunkHash(chunkHash),
|
||||||
|
Offset: 0,
|
||||||
|
Length: chunk.Size,
|
||||||
|
}
|
||||||
|
return b.repos.BlobChunks.Create(ctx, tx, blobChunk)
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add blob to snapshot in a short transaction
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Update snapshot with final counts
|
||||||
|
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||||
|
return b.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID, fileCount, chunkCount, blobCount, totalSize, blobSize)
|
||||||
|
})
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
|
||||||
|
return snapshotID, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func calculateHash(data []byte) string {
|
||||||
|
h := sha256.New()
|
||||||
|
h.Write(data)
|
||||||
|
return fmt.Sprintf("%x", h.Sum(nil))
|
||||||
|
}
|
||||||
|
|
||||||
|
func generateLargeFileContent(size int) []byte {
|
||||||
|
data := make([]byte, size)
|
||||||
|
// Fill with pattern that changes every chunk to avoid deduplication
|
||||||
|
for i := 0; i < size; i++ {
|
||||||
|
chunkNum := i / defaultChunkSize
|
||||||
|
data[i] = byte((i + chunkNum) % 256)
|
||||||
|
}
|
||||||
|
return data
|
||||||
|
}
|
||||||
|
|
||||||
|
const defaultChunkSize = 1024 * 1024 // 1MB chunks
|
||||||
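The deduplication above rests on content addressing: a chunk's identity is the SHA-256 of its bytes, so two files with identical content map to the same chunk row instead of creating a second one. A minimal standalone sketch (independent of the vaultik packages; `hashChunk` is a hypothetical stand-in for `calculateHash`):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashChunk returns the hex-encoded SHA-256 digest of a chunk,
// mirroring the content-addressing scheme used by calculateHash.
func hashChunk(data []byte) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%x", sum)
}

func main() {
	a := hashChunk([]byte("Duplicate content"))
	b := hashChunk([]byte("Duplicate content"))
	c := hashChunk([]byte("Unique content"))
	fmt.Println(a == b) // identical bytes yield an identical chunk hash
	fmt.Println(a == c) // different bytes yield a different chunk hash
}
```

This is why TestBackupDeduplication expects exactly 2 chunks for 3 files: the two "Duplicate content" files collapse to one chunk hash.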
454
internal/snapshot/exclude_test.go
Normal file
@@ -0,0 +1,454 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"path/filepath"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/require"
)

func setupExcludeTestFS(t *testing.T) afero.Fs {
	t.Helper()

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create test directory structure:
	// /backup/
	//   file1.txt            (should be backed up)
	//   file2.log            (should be excluded if *.log is in patterns)
	//   .git/
	//     config             (should be excluded if .git is in patterns)
	//     objects/
	//       pack/
	//         data.pack      (should be excluded if .git is in patterns)
	//   src/
	//     main.go            (should be backed up)
	//     test.go            (should be backed up)
	//   node_modules/
	//     package/
	//       index.js         (should be excluded if node_modules is in patterns)
	//   cache/
	//     temp.dat           (should be excluded if cache/ is in patterns)
	//   build/
	//     output.bin         (should be excluded if build is in patterns)
	//   docs/
	//     readme.md          (should be backed up)
	//   .DS_Store            (should be excluded if .DS_Store is in patterns)
	//   thumbs.db            (should be excluded if thumbs.db is in patterns)

	files := map[string]string{
		"/backup/file1.txt":                     "content1",
		"/backup/file2.log":                     "log content",
		"/backup/.git/config":                   "git config",
		"/backup/.git/objects/pack/data.pack":   "pack data",
		"/backup/src/main.go":                   "package main",
		"/backup/src/test.go":                   "package main_test",
		"/backup/node_modules/package/index.js": "module.exports = {}",
		"/backup/cache/temp.dat":                "cached data",
		"/backup/build/output.bin":              "binary data",
		"/backup/docs/readme.md":                "# Documentation",
		"/backup/.DS_Store":                     "ds store data",
		"/backup/thumbs.db":                     "thumbs data",
		"/backup/src/.hidden":                   "hidden file",
		"/backup/important.log.bak":             "backup of log",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}

func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
	t.Helper()

	// Initialize logger
	log.Initialize(log.Config{})

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)

	repos := database.NewRepositories(db)

	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        64 * 1024,
		Repositories:     repos,
		MaxBlobSize:      1024 * 1024,
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
		Exclude:          excludePatterns,
	})

	cleanup := func() {
		_ = db.Close()
	}

	return scanner, repos, cleanup
}

func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
	t.Helper()
	err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snap := &database.Snapshot{
			ID:               types.SnapshotID(snapshotID),
			Hostname:         "test-host",
			VaultikVersion:   "test",
			StartedAt:        time.Now(),
			CompletedAt:      nil,
			FileCount:        0,
			ChunkCount:       0,
			BlobCount:        0,
			TotalSize:        0,
			BlobSize:         0,
			CompressionRatio: 1.0,
		}
		return repos.Snapshots.Create(ctx, tx, snap)
	})
	require.NoError(t, err)
}

func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should have scanned files but NOT .git directory contents.
	// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
	// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
	// src/.hidden, important.log.bak
	// Excluded: .git/config, .git/objects/pack/data.pack
	require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}

func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude file2.log but NOT important.log.bak (different extension).
	// Total files: 14, excluded: 1 (file2.log)
	require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}

func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude node_modules/package/index.js.
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}

func TestExcludePatterns_MultiplePatterns(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
	// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
	require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}

func TestExcludePatterns_NoExclusions(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should scan all 14 files
	require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}

func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude: .git/*, .DS_Store, src/.hidden.
	// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
	require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}

func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude .git/objects/pack/data.pack.
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}

func TestExcludePatterns_ExactFileName(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude thumbs.db and .DS_Store.
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}

func TestExcludePatterns_CaseSensitive(t *testing.T) {
	// Pattern matching should be case-sensitive
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db.
	// All 14 files should be scanned.
	require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}

func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Some users might add trailing slashes to directory patterns
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude cache/temp.dat and build/output.bin.
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}

func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Exclude .hidden file specifically in src directory
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude only src/.hidden.
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}

// setupAnchoredTestFS creates a filesystem for testing anchored patterns.
// Source dir: /backup
// Structure:
//
//	/backup/
//	  projectname/
//	    file.txt (should be excluded with /projectname)
//	  otherproject/
//	    projectname/
//	      file.txt (should NOT be excluded with /projectname, only with projectname)
//	  src/
//	    file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
	t.Helper()

	fs := afero.NewMemMapFs()

	files := map[string]string{
		"/backup/projectname/file.txt":              "root project file",
		"/backup/otherproject/projectname/file.txt": "nested project file",
		"/backup/src/file.go":                       "source file",
		"/backup/file.txt":                          "root file",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}

func TestExcludePatterns_AnchoredPattern(t *testing.T) {
	// Pattern starting with / should only match from root of source dir
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /projectname should ONLY exclude /backup/projectname/file.txt (1 file);
	// /backup/otherproject/projectname/file.txt should NOT be excluded.
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}

func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
	// Pattern without leading / should match anywhere in path
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// projectname (without /) should exclude BOTH:
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 2
	require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}

func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
	// Anchored pattern with glob
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /src/*.go should exclude /backup/src/file.go.
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}

func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
	// Anchored pattern for exact file at root
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /file.txt should ONLY exclude /backup/file.txt,
	// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt.
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}

func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
	// Unanchored pattern for file should match anywhere
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// file.txt should exclude ALL file.txt files:
	// - /backup/file.txt
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 3
	require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}
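The anchored/unanchored behavior exercised above can be illustrated with a small matcher. This is an illustrative sketch of the semantics the tests assert, not the scanner's actual implementation: a leading `/` anchors the pattern at the source root (and excludes everything under an anchored directory), while an unanchored pattern may match any path component or suffix.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matches reports whether relPath (relative to the backup root, no
// leading slash) is excluded by pattern, per the anchored/unanchored
// semantics the tests above expect. Hypothetical helper for illustration.
func matches(pattern, relPath string) bool {
	if strings.HasPrefix(pattern, "/") {
		// Anchored: match only against the path from the root.
		p := strings.TrimPrefix(pattern, "/")
		if ok, _ := path.Match(p, relPath); ok {
			return true
		}
		// An anchored directory also excludes everything beneath it.
		return relPath == p || strings.HasPrefix(relPath, p+"/")
	}
	// Unanchored: try the pattern against every path suffix and component.
	segs := strings.Split(relPath, "/")
	for i := range segs {
		if ok, _ := path.Match(pattern, strings.Join(segs[i:], "/")); ok {
			return true
		}
		if ok, _ := path.Match(pattern, segs[i]); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matches("/projectname", "projectname/file.txt"))              // true: anchored, at root
	fmt.Println(matches("/projectname", "otherproject/projectname/file.txt")) // false: anchored, nested
	fmt.Println(matches("projectname", "otherproject/projectname/file.txt"))  // true: unanchored, nested
	fmt.Println(matches("file.txt", "projectname/file.txt"))                  // true: unanchored file name
}
```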
238
internal/snapshot/file_change_test.go
Normal file
@@ -0,0 +1,238 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestFileContentChange verifies that when a file's content changes,
// the old chunks are properly disassociated.
func TestFileContentChange(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial file
	err := afero.WriteFile(fs, "/test.txt", []byte("Initial content"), 0644)
	require.NoError(t, err)

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan - should create chunks for initial content
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	t.Logf("First scan: %d files scanned", result1.FilesScanned)

	// Get file chunks from first scan
	fileChunks1, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks1, 1) // Small file = 1 chunk
	oldChunkHash := fileChunks1[0].ChunkHash

	// Get chunk files from first scan
	chunkFiles1, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles1, 1)

	// Modify the file
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/test.txt", []byte("Modified content with different data"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan - should create new chunks and remove old associations
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)
	t.Logf("Second scan: %d files scanned", result2.FilesScanned)

	// Get file chunks from second scan
	fileChunks2, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks2, 1) // Still 1 chunk but different hash
	newChunkHash := fileChunks2[0].ChunkHash

	// Verify the chunk hashes are different
	assert.NotEqual(t, oldChunkHash, newChunkHash, "Chunk hash should change when content changes")

	// Get chunk files from second scan
	chunkFiles2, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles2, 1)
	assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)

	// Verify old chunk still exists (it's still valid data)
	oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, oldChunk)

	// Verify new chunk exists
	newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, newChunk)

	// Verify that chunk_files for old chunk no longer references this file
	oldChunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, oldChunkHash)
	require.NoError(t, err)
	for _, cf := range oldChunkFiles {
		file, err := repos.Files.GetByID(ctx, cf.FileID)
		require.NoError(t, err)
		assert.NotEqual(t, "/test.txt", file.Path, "Old chunk should not be associated with the modified file")
	}
}

// TestMultipleFileChanges verifies handling of multiple file changes in one scan.
func TestMultipleFileChanges(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial files
	files := map[string]string{
		"/file1.txt": "Content 1",
		"/file2.txt": "Content 2",
		"/file3.txt": "Content 3",
	}

	for path, content := range files {
		err := afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
	}

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	// Only regular files are counted, not directories
	assert.Equal(t, 3, result1.FilesScanned)

	// Modify two files
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/file1.txt", []byte("Modified content 1"), 0644)
	require.NoError(t, err)
	err = afero.WriteFile(fs, "/file3.txt", []byte("Modified content 3"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)
	// Only regular files are counted, not directories
	assert.Equal(t, 3, result2.FilesScanned)

	// Verify each file has exactly one set of chunks
	for path := range files {
		fileChunks, err := repos.FileChunks.GetByPath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, fileChunks, 1, "File %s should have exactly 1 chunk association", path)

		chunkFiles, err := repos.ChunkFiles.GetByFilePath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, chunkFiles, 1, "File %s should have exactly 1 chunk-file association", path)
	}
}
70	internal/snapshot/manifest.go	Normal file
@@ -0,0 +1,70 @@
package snapshot

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// Manifest represents the structure of a snapshot's blob manifest.
type Manifest struct {
	SnapshotID          string     `json:"snapshot_id"`
	Timestamp           string     `json:"timestamp"`
	BlobCount           int        `json:"blob_count"`
	TotalCompressedSize int64      `json:"total_compressed_size"`
	Blobs               []BlobInfo `json:"blobs"`
}

// BlobInfo represents information about a single blob in the manifest.
type BlobInfo struct {
	Hash           string `json:"hash"`
	CompressedSize int64  `json:"compressed_size"`
}

// DecodeManifest decodes a manifest from a reader containing compressed JSON.
func DecodeManifest(r io.Reader) (*Manifest, error) {
	// Decompress using zstd
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("creating zstd reader: %w", err)
	}
	defer zr.Close()

	// Decode JSON manifest
	var manifest Manifest
	if err := json.NewDecoder(zr).Decode(&manifest); err != nil {
		return nil, fmt.Errorf("decoding manifest: %w", err)
	}

	return &manifest, nil
}

// EncodeManifest encodes a manifest to compressed JSON.
func EncodeManifest(manifest *Manifest, compressionLevel int) ([]byte, error) {
	// Marshal to JSON
	jsonData, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		return nil, fmt.Errorf("marshaling manifest: %w", err)
	}

	// Compress using zstd
	var compressedBuf bytes.Buffer
	writer, err := zstd.NewWriter(&compressedBuf, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)))
	if err != nil {
		return nil, fmt.Errorf("creating zstd writer: %w", err)
	}

	if _, err := writer.Write(jsonData); err != nil {
		_ = writer.Close()
		return nil, fmt.Errorf("writing compressed data: %w", err)
	}

	if err := writer.Close(); err != nil {
		return nil, fmt.Errorf("closing zstd writer: %w", err)
	}

	return compressedBuf.Bytes(), nil
}
53	internal/snapshot/module.go	Normal file
@@ -0,0 +1,53 @@
package snapshot

import (
	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/storage"
	"github.com/spf13/afero"
	"go.uber.org/fx"
)

// ScannerParams holds parameters for scanner creation.
type ScannerParams struct {
	EnableProgress bool
	Fs             afero.Fs
	Exclude        []string // Exclude patterns (combined global + snapshot-specific)
	SkipErrors     bool     // Skip file read errors (log loudly but continue)
}

// Module exports backup functionality as an fx module.
// It provides a ScannerFactory that can create Scanner instances
// with custom parameters while sharing common dependencies.
var Module = fx.Module("backup",
	fx.Provide(
		provideScannerFactory,
		NewSnapshotManager,
	),
)

// ScannerFactory creates scanners with custom parameters.
type ScannerFactory func(params ScannerParams) *Scanner

func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
	return func(params ScannerParams) *Scanner {
		// Use provided excludes, or fall back to global config excludes
		excludes := params.Exclude
		if len(excludes) == 0 {
			excludes = cfg.Exclude
		}

		return NewScanner(ScannerConfig{
			FS:               params.Fs,
			ChunkSize:        cfg.ChunkSize.Int64(),
			Repositories:     repos,
			Storage:          storer,
			MaxBlobSize:      cfg.BlobSizeLimit.Int64(),
			CompressionLevel: cfg.CompressionLevel,
			AgeRecipients:    cfg.AgeRecipients,
			EnableProgress:   params.EnableProgress,
			Exclude:          excludes,
			SkipErrors:       params.SkipErrors,
		})
	}
}
419	internal/snapshot/progress.go	Normal file
@@ -0,0 +1,419 @@
package snapshot

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"sync/atomic"
	"syscall"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/dustin/go-humanize"
)

const (
	// SummaryInterval defines how often one-line status updates are printed.
	// These updates show current progress, ETA, and the file being processed.
	SummaryInterval = 10 * time.Second

	// DetailInterval defines how often multi-line detailed status reports are printed.
	// These reports include comprehensive statistics about files, chunks, blobs, and uploads.
	DetailInterval = 60 * time.Second

	// UploadProgressInterval defines how often upload progress messages are logged.
	UploadProgressInterval = 15 * time.Second
)

// ProgressStats holds atomic counters for progress tracking.
type ProgressStats struct {
	FilesScanned     atomic.Int64 // Total files seen during scan (includes skipped)
	FilesProcessed   atomic.Int64 // Files actually processed in phase 2
	FilesSkipped     atomic.Int64 // Files skipped due to no changes
	BytesScanned     atomic.Int64 // Bytes from new/changed files only
	BytesSkipped     atomic.Int64 // Bytes from unchanged files
	BytesProcessed   atomic.Int64 // Actual bytes processed (for ETA calculation)
	ChunksCreated    atomic.Int64
	BlobsCreated     atomic.Int64
	BlobsUploaded    atomic.Int64
	BytesUploaded    atomic.Int64
	UploadDurationMs atomic.Int64 // Total milliseconds spent uploading
	CurrentFile      atomic.Value // stores string
	TotalSize        atomic.Int64 // Total size to process (set after scan phase)
	TotalFiles       atomic.Int64 // Total files to process in phase 2
	ProcessStartTime atomic.Value // stores time.Time when processing starts
	StartTime        time.Time
	mu               sync.RWMutex
	lastDetailTime   time.Time

	// Upload tracking
	CurrentUpload    atomic.Value // stores *UploadInfo
	lastChunkingTime time.Time    // Track when we last showed chunking progress
}

// UploadInfo tracks current upload progress.
type UploadInfo struct {
	BlobHash    string
	Size        int64
	StartTime   time.Time
	LastLogTime time.Time
}

// ProgressReporter handles periodic progress reporting.
type ProgressReporter struct {
	stats         *ProgressStats
	ctx           context.Context
	cancel        context.CancelFunc
	wg            sync.WaitGroup
	detailTicker  *time.Ticker
	summaryTicker *time.Ticker
	sigChan       chan os.Signal
}

// NewProgressReporter creates a new progress reporter.
func NewProgressReporter() *ProgressReporter {
	stats := &ProgressStats{
		StartTime:      time.Now().UTC(),
		lastDetailTime: time.Now().UTC(),
	}
	stats.CurrentFile.Store("")

	ctx, cancel := context.WithCancel(context.Background())

	pr := &ProgressReporter{
		stats:         stats,
		ctx:           ctx,
		cancel:        cancel,
		summaryTicker: time.NewTicker(SummaryInterval),
		detailTicker:  time.NewTicker(DetailInterval),
		sigChan:       make(chan os.Signal, 1),
	}

	// Register for SIGUSR1
	signal.Notify(pr.sigChan, syscall.SIGUSR1)

	return pr
}

// Start begins the progress reporting.
func (pr *ProgressReporter) Start() {
	pr.wg.Add(1)
	go pr.run()

	// Print initial multi-line status
	pr.printDetailedStatus()
}

// Stop stops the progress reporting.
func (pr *ProgressReporter) Stop() {
	pr.cancel()
	pr.summaryTicker.Stop()
	pr.detailTicker.Stop()
	signal.Stop(pr.sigChan)
	close(pr.sigChan)
	pr.wg.Wait()
}

// GetStats returns the progress stats for updating.
func (pr *ProgressReporter) GetStats() *ProgressStats {
	return pr.stats
}

// SetTotalSize sets the total size to process (after scan phase).
func (pr *ProgressReporter) SetTotalSize(size int64) {
	pr.stats.TotalSize.Store(size)
	pr.stats.ProcessStartTime.Store(time.Now().UTC())
}

// run is the main progress reporting loop.
func (pr *ProgressReporter) run() {
	defer pr.wg.Done()

	for {
		select {
		case <-pr.ctx.Done():
			return
		case <-pr.summaryTicker.C:
			pr.printSummaryStatus()
		case <-pr.detailTicker.C:
			pr.printDetailedStatus()
		case <-pr.sigChan:
			// SIGUSR1 received, print detailed status
			log.Info("SIGUSR1 received, printing detailed status")
			pr.printDetailedStatus()
		}
	}
}

// printSummaryStatus prints a one-line status update.
func (pr *ProgressReporter) printSummaryStatus() {
	// Check if we're currently uploading
	if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
		// Show upload progress instead
		pr.printUploadProgress(uploadInfo)
		return
	}

	// Only show chunking progress if we've done chunking recently
	pr.stats.mu.RLock()
	timeSinceLastChunk := time.Since(pr.stats.lastChunkingTime)
	pr.stats.mu.RUnlock()

	if timeSinceLastChunk > SummaryInterval*2 {
		// No recent chunking activity, don't show progress
		return
	}

	elapsed := time.Since(pr.stats.StartTime)
	bytesScanned := pr.stats.BytesScanned.Load()
	bytesSkipped := pr.stats.BytesSkipped.Load()
	bytesProcessed := pr.stats.BytesProcessed.Load()
	totalSize := pr.stats.TotalSize.Load()
	currentFile := pr.stats.CurrentFile.Load().(string)

	// Calculate ETA if we have total size and are processing
	etaStr := ""
	if totalSize > 0 && bytesProcessed > 0 {
		processStart, ok := pr.stats.ProcessStartTime.Load().(time.Time)
		if ok && !processStart.IsZero() {
			processElapsed := time.Since(processStart)
			rate := float64(bytesProcessed) / processElapsed.Seconds()
			if rate > 0 {
				remainingBytes := totalSize - bytesProcessed
				remainingSeconds := float64(remainingBytes) / rate
				eta := time.Duration(remainingSeconds * float64(time.Second))
				etaStr = fmt.Sprintf(" | ETA: %s", formatDuration(eta))
			}
		}
	}

	rate := float64(bytesScanned+bytesSkipped) / elapsed.Seconds()

	// Show files processed / total files to process
	filesProcessed := pr.stats.FilesProcessed.Load()
	totalFiles := pr.stats.TotalFiles.Load()

	status := fmt.Sprintf("Snapshot progress: %d/%d files, %s/%s (%.1f%%), %s/s%s",
		filesProcessed,
		totalFiles,
		humanize.Bytes(uint64(bytesProcessed)),
		humanize.Bytes(uint64(totalSize)),
		float64(bytesProcessed)/float64(totalSize)*100,
		humanize.Bytes(uint64(rate)),
		etaStr,
	)

	if currentFile != "" {
		status += fmt.Sprintf(" | Current: %s", truncatePath(currentFile, 40))
	}

	log.Info(status)
}

// printDetailedStatus prints a multi-line detailed status.
func (pr *ProgressReporter) printDetailedStatus() {
	pr.stats.mu.Lock()
	pr.stats.lastDetailTime = time.Now().UTC()
	pr.stats.mu.Unlock()

	elapsed := time.Since(pr.stats.StartTime)
	filesScanned := pr.stats.FilesScanned.Load()
	filesSkipped := pr.stats.FilesSkipped.Load()
	bytesScanned := pr.stats.BytesScanned.Load()
	bytesSkipped := pr.stats.BytesSkipped.Load()
	bytesProcessed := pr.stats.BytesProcessed.Load()
	totalSize := pr.stats.TotalSize.Load()
	chunksCreated := pr.stats.ChunksCreated.Load()
	blobsCreated := pr.stats.BlobsCreated.Load()
	blobsUploaded := pr.stats.BlobsUploaded.Load()
	bytesUploaded := pr.stats.BytesUploaded.Load()
	currentFile := pr.stats.CurrentFile.Load().(string)

	totalBytes := bytesScanned + bytesSkipped
	rate := float64(totalBytes) / elapsed.Seconds()

	log.Notice("=== Snapshot Progress Report ===")
	log.Info("Elapsed time", "duration", formatDuration(elapsed))

	// Calculate and show ETA if we have data
	if totalSize > 0 && bytesProcessed > 0 {
		processStart, ok := pr.stats.ProcessStartTime.Load().(time.Time)
		if ok && !processStart.IsZero() {
			processElapsed := time.Since(processStart)
			processRate := float64(bytesProcessed) / processElapsed.Seconds()
			if processRate > 0 {
				remainingBytes := totalSize - bytesProcessed
				remainingSeconds := float64(remainingBytes) / processRate
				eta := time.Duration(remainingSeconds * float64(time.Second))
				percentComplete := float64(bytesProcessed) / float64(totalSize) * 100
				log.Info("Overall progress",
					"percent", fmt.Sprintf("%.1f%%", percentComplete),
					"processed", humanize.Bytes(uint64(bytesProcessed)),
					"total", humanize.Bytes(uint64(totalSize)),
					"rate", humanize.Bytes(uint64(processRate))+"/s",
					"eta", formatDuration(eta))
			}
		}
	}

	log.Info("Files processed",
		"scanned", filesScanned,
		"skipped", filesSkipped,
		"total", filesScanned,
		"skip_rate", formatPercent(filesSkipped, filesScanned))
	log.Info("Data scanned",
		"new", humanize.Bytes(uint64(bytesScanned)),
		"skipped", humanize.Bytes(uint64(bytesSkipped)),
		"total", humanize.Bytes(uint64(totalBytes)),
		"scan_rate", humanize.Bytes(uint64(rate))+"/s")
	log.Info("Chunks created", "count", chunksCreated)
	log.Info("Blobs status",
		"created", blobsCreated,
		"uploaded", blobsUploaded,
		"pending", blobsCreated-blobsUploaded)
	log.Info("Total uploaded to remote",
		"uploaded", humanize.Bytes(uint64(bytesUploaded)),
		"compression_ratio", formatRatio(bytesUploaded, bytesScanned))
	if currentFile != "" {
		log.Info("Current file", "path", currentFile)
	}
	log.Notice("=============================")
}

// Helper functions

func formatDuration(d time.Duration) string {
	if d < 0 {
		return "unknown"
	}
	if d < time.Minute {
		return fmt.Sprintf("%ds", int(d.Seconds()))
	}
	if d < time.Hour {
		return fmt.Sprintf("%dm%ds", int(d.Minutes()), int(d.Seconds())%60)
	}
	return fmt.Sprintf("%dh%dm", int(d.Hours()), int(d.Minutes())%60)
}

func formatPercent(numerator, denominator int64) string {
	if denominator == 0 {
		return "0.0%"
	}
	return fmt.Sprintf("%.1f%%", float64(numerator)/float64(denominator)*100)
}

func formatRatio(compressed, uncompressed int64) string {
	if uncompressed == 0 {
		return "1.00"
	}
	ratio := float64(compressed) / float64(uncompressed)
	return fmt.Sprintf("%.2f", ratio)
}

func truncatePath(path string, maxLen int) string {
	if len(path) <= maxLen {
		return path
	}
	// Keep the last maxLen-3 characters and prepend "..."
	return "..." + path[len(path)-(maxLen-3):]
}

// printUploadProgress prints upload progress.
func (pr *ProgressReporter) printUploadProgress(info *UploadInfo) {
	// This function is called repeatedly during upload, not just at start.
	// Don't print anything here - the actual progress is shown by ReportUploadProgress.
}

// ReportUploadStart marks the beginning of a blob upload.
func (pr *ProgressReporter) ReportUploadStart(blobHash string, size int64) {
	info := &UploadInfo{
		BlobHash:  blobHash,
		Size:      size,
		StartTime: time.Now().UTC(),
	}
	pr.stats.CurrentUpload.Store(info)

	// Log the start of upload
	log.Info("Starting blob upload",
		"hash", blobHash[:8]+"...",
		"size", humanize.Bytes(uint64(size)))
}

// ReportUploadComplete marks the completion of a blob upload.
func (pr *ProgressReporter) ReportUploadComplete(blobHash string, size int64, duration time.Duration) {
	// Clear current upload
	pr.stats.CurrentUpload.Store((*UploadInfo)(nil))

	// Add to total upload duration
	pr.stats.UploadDurationMs.Add(duration.Milliseconds())

	// Calculate speed
	if duration < time.Millisecond {
		duration = time.Millisecond
	}
	bytesPerSec := float64(size) / duration.Seconds()
	bitsPerSec := bytesPerSec * 8

	// Format speed
	var speedStr string
	if bitsPerSec >= 1e9 {
		speedStr = fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
	} else if bitsPerSec >= 1e6 {
		speedStr = fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
	} else if bitsPerSec >= 1e3 {
		speedStr = fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
	} else {
		speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
	}

	log.Info("Blob upload completed",
		"hash", blobHash[:8]+"...",
		"size", humanize.Bytes(uint64(size)),
		"duration", formatDuration(duration),
		"speed", speedStr)
}

// UpdateChunkingActivity updates the last chunking time.
func (pr *ProgressReporter) UpdateChunkingActivity() {
	pr.stats.mu.Lock()
	pr.stats.lastChunkingTime = time.Now().UTC()
	pr.stats.mu.Unlock()
}

// ReportUploadProgress reports current upload progress with instantaneous speed.
func (pr *ProgressReporter) ReportUploadProgress(blobHash string, bytesUploaded, totalSize int64, instantSpeed float64) {
	// Update the current upload info with progress
	if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
		now := time.Now()

		// Only log at the configured interval
		if now.Sub(uploadInfo.LastLogTime) >= UploadProgressInterval {
			// Format speed in bits/second using humanize
			bitsPerSec := instantSpeed * 8
			speedStr := humanize.SI(bitsPerSec, "bit/sec")

			percent := float64(bytesUploaded) / float64(totalSize) * 100

			// Calculate ETA based on current speed
			etaStr := "unknown"
			if instantSpeed > 0 && bytesUploaded < totalSize {
				remainingBytes := totalSize - bytesUploaded
				remainingSeconds := float64(remainingBytes) / instantSpeed
				eta := time.Duration(remainingSeconds * float64(time.Second))
				etaStr = formatDuration(eta)
			}

			log.Info("Blob upload progress",
				"hash", blobHash[:8]+"...",
				"progress", fmt.Sprintf("%.1f%%", percent),
				"uploaded", humanize.Bytes(uint64(bytesUploaded)),
				"total", humanize.Bytes(uint64(totalSize)),
				"speed", speedStr,
				"eta", etaStr)

			uploadInfo.LastLogTime = now
		}
	}
}
1408	internal/snapshot/scanner.go	Normal file
File diff suppressed because it is too large. Load Diff
269	internal/snapshot/scanner_test.go	Normal file
@@ -0,0 +1,269 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"path/filepath"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
)

func TestScannerSimpleDirectory(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create test directory structure
	testFiles := map[string]string{
		"/source/file1.txt":         "Hello, world!",                // 13 bytes
		"/source/file2.txt":         "This is another file",         // 20 bytes
		"/source/subdir/file3.txt":  "File in subdirectory",         // 20 bytes
		"/source/subdir/file4.txt":  "Another file in subdirectory", // 28 bytes
		"/source/empty.txt":         "",                             // 0 bytes
		"/source/subdir2/file5.txt": "Yet another file",             // 16 bytes
	}

	// Create files with specific times
	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range testFiles {
		dir := filepath.Dir(path)
		if err := fs.MkdirAll(dir, 0755); err != nil {
			t.Fatalf("failed to create directory %s: %v", dir, err)
		}
		if err := afero.WriteFile(fs, path, []byte(content), 0644); err != nil {
			t.Fatalf("failed to write file %s: %v", path, err)
		}
		// Set times
		if err := fs.Chtimes(path, testTime, testTime); err != nil {
			t.Fatalf("failed to set times for %s: %v", path, err)
		}
	}

	// Create test database
	db, err := database.NewTestDB()
	if err != nil {
		t.Fatalf("failed to create test database: %v", err)
	}
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create a snapshot record for testing
	ctx := context.Background()
	snapshotID := "test-snapshot-001"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:               types.SnapshotID(snapshotID),
			Hostname:         "test-host",
			VaultikVersion:   "test",
			StartedAt:        time.Now(),
			CompletedAt:      nil,
			FileCount:        0,
			ChunkCount:       0,
			BlobCount:        0,
			TotalSize:        0,
			BlobSize:         0,
			CompressionRatio: 1.0,
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Scan the directory
	var result *snapshot.ScanResult
	result, err = scanner.Scan(ctx, "/source", snapshotID)
	if err != nil {
		t.Fatalf("scan failed: %v", err)
	}

	// Verify results - we only scan regular files, not directories
	if result.FilesScanned != 6 {
		t.Errorf("expected 6 files scanned, got %d", result.FilesScanned)
	}

	// Total bytes should be the sum of all file contents
	if result.BytesScanned < 97 { // At minimum we have 97 bytes of file content
		t.Errorf("expected at least 97 bytes scanned, got %d", result.BytesScanned)
	}

	// Verify files in database - only regular files are stored
	files, err := repos.Files.ListByPrefix(ctx, "/source")
	if err != nil {
		t.Fatalf("failed to list files: %v", err)
	}

	// We should have 6 files (directories are not stored)
	if len(files) != 6 {
		t.Errorf("expected 6 files in database, got %d", len(files))
	}

	// Verify specific file
	file1, err := repos.Files.GetByPath(ctx, "/source/file1.txt")
	if err != nil {
		t.Fatalf("failed to get file1.txt: %v", err)
	}

	if file1.Size != 13 {
		t.Errorf("expected file1.txt size 13, got %d", file1.Size)
	}

	if file1.Mode != 0644 {
		t.Errorf("expected file1.txt mode 0644, got %o", file1.Mode)
	}

	// Verify chunks were created
	chunks, err := repos.FileChunks.GetByFile(ctx, "/source/file1.txt")
	if err != nil {
		t.Fatalf("failed to get chunks for file1.txt: %v", err)
	}

	if len(chunks) != 1 { // Small file should be one chunk
		t.Errorf("expected 1 chunk for file1.txt, got %d", len(chunks))
	}

	// Verify deduplication - file3.txt and file4.txt have different content
	// but we should still have the correct number of unique chunks
	allChunks, err := repos.Chunks.List(ctx)
	if err != nil {
		t.Fatalf("failed to list all chunks: %v", err)
	}

	// We should have at most 6 chunks (one per unique file content)
	// Empty file might not create a chunk
	if len(allChunks) > 6 {
		t.Errorf("expected at most 6 chunks, got %d", len(allChunks))
	}
}

func TestScannerLargeFile(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create a large file that will require multiple chunks
	// Use random content to ensure good chunk boundaries
	largeContent := make([]byte, 1024*1024) // 1MB
	// Fill with pseudo-random data to ensure chunk boundaries
	for i := 0; i < len(largeContent); i++ {
		// Simple pseudo-random generator for deterministic tests
		largeContent[i] = byte((i * 7919) ^ (i >> 3))
	}

	if err := fs.MkdirAll("/source", 0755); err != nil {
		t.Fatal(err)
	}
	if err := afero.WriteFile(fs, "/source/large.bin", largeContent, 0644); err != nil {
		t.Fatal(err)
	}

	// Create test database
	db, err := database.NewTestDB()
	if err != nil {
		t.Fatalf("failed to create test database: %v", err)
	}
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner with 64KB average chunk size
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 64), // 64KB average chunks
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024),
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create a snapshot record for testing
	ctx := context.Background()
	snapshotID := "test-snapshot-001"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:               types.SnapshotID(snapshotID),
			Hostname:         "test-host",
			VaultikVersion:   "test",
			StartedAt:        time.Now(),
			CompletedAt:      nil,
			FileCount:        0,
			ChunkCount:       0,
			BlobCount:        0,
			TotalSize:        0,
			BlobSize:         0,
			CompressionRatio: 1.0,
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Scan the directory
	var result *snapshot.ScanResult
	result, err = scanner.Scan(ctx, "/source", snapshotID)
	if err != nil {
		t.Fatalf("scan failed: %v", err)
	}

	// We scan only regular files, not directories
	if result.FilesScanned != 1 {
		t.Errorf("expected 1 file scanned, got %d", result.FilesScanned)
	}

	// The file size should be at least 1MB
	if result.BytesScanned < 1024*1024 {
		t.Errorf("expected at least %d bytes scanned, got %d", 1024*1024, result.BytesScanned)
	}

	// Verify chunks
	chunks, err := repos.FileChunks.GetByFile(ctx, "/source/large.bin")
	if err != nil {
		t.Fatalf("failed to get chunks: %v", err)
	}

	// With content-defined chunking, the number of chunks depends on content
	// For a 1MB file, we should get at least 1 chunk
	if len(chunks) < 1 {
		t.Errorf("expected at least 1 chunk, got %d", len(chunks))
	}

	// Log the actual number of chunks for debugging
	t.Logf("1MB file produced %d chunks with 64KB average chunk size", len(chunks))

	// Verify chunk sequence
	for i, fc := range chunks {
		if fc.Idx != i {
			t.Errorf("chunk %d has incorrect sequence %d", i, fc.Idx)
		}
	}
}
895 internal/snapshot/snapshot.go Normal file
@@ -0,0 +1,895 @@
package snapshot

// Snapshot Metadata Export Process
// ================================
//
// The snapshot metadata contains all information needed to restore a snapshot.
// Instead of creating a custom format, we use a trimmed copy of the SQLite
// database containing only data relevant to the current snapshot.
//
// Process Overview:
// 1. After all files/chunks/blobs are backed up, create a snapshot record
// 2. Close the main database to ensure consistency
// 3. Copy the entire database to a temporary file
// 4. Open the temporary database
// 5. Delete all snapshots except the current one
// 6. Delete all orphaned records:
//    - Files not referenced by any remaining snapshot
//    - Chunks not referenced by any remaining files
//    - Blobs not containing any remaining chunks
//    - All related mapping tables (file_chunks, chunk_files, blob_chunks)
// 7. Close the temporary database
// 8. VACUUM the database to remove deleted data and compact (security critical)
// 9. Compress the binary database with zstd
// 10. Encrypt the compressed database with age (if encryption is enabled)
// 11. Upload to S3 as: metadata/{snapshot-id}/db.zst.age
// 12. Reopen the main database
//
// Advantages of this approach:
// - No custom metadata format needed
// - Reuses existing database schema and relationships
// - Binary SQLite files are portable and compress well
// - Fast restore - just decompress and open (no SQL parsing)
// - VACUUM ensures no deleted data leaks
// - Atomic and consistent snapshot of all metadata

import (
	"bytes"
	"context"
	"database/sql"
	"fmt"
	"io"
	"os/exec"
	"path/filepath"
	"strings"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/blobgen"
	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/storage"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/dustin/go-humanize"
	"github.com/spf13/afero"
	"go.uber.org/fx"
)

// SnapshotManager handles snapshot creation and metadata export
type SnapshotManager struct {
	repos   *database.Repositories
	storage storage.Storer
	config  *config.Config
	fs      afero.Fs
}

// SnapshotManagerParams holds dependencies for NewSnapshotManager
type SnapshotManagerParams struct {
	fx.In

	Repos   *database.Repositories
	Storage storage.Storer
	Config  *config.Config
}

// NewSnapshotManager creates a new snapshot manager for dependency injection
func NewSnapshotManager(params SnapshotManagerParams) *SnapshotManager {
	return &SnapshotManager{
		repos:   params.Repos,
		storage: params.Storage,
		config:  params.Config,
	}
}

// SetFilesystem sets the filesystem to use for all file operations
func (sm *SnapshotManager) SetFilesystem(fs afero.Fs) {
	sm.fs = fs
}

// CreateSnapshot creates a new snapshot record in the database at the start of a backup.
// Deprecated: Use CreateSnapshotWithName instead for multi-snapshot support.
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version, gitRevision string) (string, error) {
	return sm.CreateSnapshotWithName(ctx, hostname, "", version, gitRevision)
}

// CreateSnapshotWithName creates a new snapshot record with an optional snapshot name.
// The snapshot ID format is: hostname_name_timestamp or hostname_timestamp if name is empty.
func (sm *SnapshotManager) CreateSnapshotWithName(ctx context.Context, hostname, name, version, gitRevision string) (string, error) {
	// Use short hostname (strip domain if present)
	shortHostname := hostname
	if idx := strings.Index(hostname, "."); idx != -1 {
		shortHostname = hostname[:idx]
	}

	// Build snapshot ID with optional name
	timestamp := time.Now().UTC().Format("2006-01-02T15:04:05Z")
	var snapshotID string
	if name != "" {
		snapshotID = fmt.Sprintf("%s_%s_%s", shortHostname, name, timestamp)
	} else {
		snapshotID = fmt.Sprintf("%s_%s", shortHostname, timestamp)
	}

	snapshot := &database.Snapshot{
		ID:                 types.SnapshotID(snapshotID),
		Hostname:           types.Hostname(hostname),
		VaultikVersion:     types.Version(version),
		VaultikGitRevision: types.GitRevision(gitRevision),
		StartedAt:          time.Now().UTC(),
		CompletedAt:        nil, // Not completed yet
		FileCount:          0,
		ChunkCount:         0,
		BlobCount:          0,
		TotalSize:          0,
		BlobSize:           0,
		CompressionRatio:   1.0,
	}

	err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		return sm.repos.Snapshots.Create(ctx, tx, snapshot)
	})
	if err != nil {
		return "", fmt.Errorf("creating snapshot: %w", err)
	}

	log.Info("Created snapshot", "snapshot_id", snapshotID)
	return snapshotID, nil
}
// UpdateSnapshotStats updates the statistics for a snapshot during backup
func (sm *SnapshotManager) UpdateSnapshotStats(ctx context.Context, snapshotID string, stats BackupStats) error {
	err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		return sm.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID,
			int64(stats.FilesScanned),
			int64(stats.ChunksCreated),
			int64(stats.BlobsCreated),
			stats.BytesScanned,
			stats.BytesUploaded,
		)
	})
	if err != nil {
		return fmt.Errorf("updating snapshot stats: %w", err)
	}

	return nil
}

// UpdateSnapshotStatsExtended updates snapshot statistics with extended metrics.
// This includes compression level, uncompressed blob size, and upload duration.
func (sm *SnapshotManager) UpdateSnapshotStatsExtended(ctx context.Context, snapshotID string, stats ExtendedBackupStats) error {
	return sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		// First update basic stats
		if err := sm.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID,
			int64(stats.FilesScanned),
			int64(stats.ChunksCreated),
			int64(stats.BlobsCreated),
			stats.BytesScanned,
			stats.BytesUploaded,
		); err != nil {
			return err
		}

		// Then update extended stats
		return sm.repos.Snapshots.UpdateExtendedStats(ctx, tx, snapshotID,
			stats.BlobUncompressedSize,
			stats.CompressionLevel,
			stats.UploadDurationMs,
		)
	})
}

// CompleteSnapshot marks a snapshot as completed in the database.
// Metadata export is performed separately by ExportSnapshotMetadata.
func (sm *SnapshotManager) CompleteSnapshot(ctx context.Context, snapshotID string) error {
	// Mark the snapshot as completed
	err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		return sm.repos.Snapshots.MarkComplete(ctx, tx, snapshotID)
	})
	if err != nil {
		return fmt.Errorf("marking snapshot complete: %w", err)
	}

	log.Info("Completed snapshot", "snapshot_id", snapshotID)
	return nil
}
|
||||||
|
|
||||||
|
// ExportSnapshotMetadata exports snapshot metadata to S3
|
||||||
|
//
|
||||||
|
// This method executes the complete snapshot metadata export process:
|
||||||
|
// 1. Creates a temporary directory for working files
|
||||||
|
// 2. Copies the main database to preserve its state
|
||||||
|
// 3. Cleans the copy to contain only current snapshot data
|
||||||
|
// 4. Dumps the cleaned database to SQL
|
||||||
|
// 5. Compresses the SQL dump with zstd
|
||||||
|
// 6. Encrypts the compressed data (if encryption is enabled)
|
||||||
|
// 7. Uploads to S3 at: snapshots/{snapshot-id}.sql.zst[.age]
|
||||||
|
//
|
||||||
|
// The caller is responsible for:
|
||||||
|
// - Ensuring the main database is closed before calling this method
|
||||||
|
// - Reopening the main database after this method returns
|
||||||
|
//
|
||||||
|
// This ensures database consistency during the copy operation.
|
||||||
|
func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath string, snapshotID string) error {
|
||||||
|
log.Info("Phase 3/3: Exporting snapshot metadata", "snapshot_id", snapshotID, "source_db", dbPath)
|
||||||
|
|
||||||
|
// Create temp directory for all temporary files
|
||||||
|
tempDir, err := afero.TempDir(sm.fs, "", "vaultik-snapshot-*")
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("creating temp dir: %w", err)
|
||||||
|
}
|
||||||
|
log.Debug("Created temporary directory", "path", tempDir)
|
||||||
|
defer func() {
|
||||||
|
log.Debug("Cleaning up temporary directory", "path", tempDir)
|
||||||
|
if err := sm.fs.RemoveAll(tempDir); err != nil {
|
||||||
|
log.Debug("Failed to remove temp dir", "path", tempDir, "error", err)
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Step 1: Copy database to temp file
|
||||||
|
// The main database should be closed at this point
|
||||||
|
tempDBPath := filepath.Join(tempDir, "snapshot.db")
|
||||||
|
log.Debug("Copying database to temporary location", "source", dbPath, "destination", tempDBPath)
|
||||||
|
if err := sm.copyFile(dbPath, tempDBPath); err != nil {
|
||||||
|
return fmt.Errorf("copying database: %w", err)
|
||||||
|
}
|
||||||
|
log.Debug("Database copy complete", "size", sm.getFileSize(tempDBPath))
|
||||||
|
|
||||||
|
// Step 2: Clean the temp database to only contain current snapshot data
|
||||||
|
log.Debug("Cleaning temporary database", "snapshot_id", snapshotID)
|
||||||
|
stats, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshotID)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("cleaning snapshot database: %w", err)
|
||||||
|
}
|
||||||
|
log.Info("Temporary database cleanup complete",
|
||||||
|
"db_path", tempDBPath,
|
||||||
|
"size_after_clean", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
|
||||||
|
"files", stats.FileCount,
|
||||||
|
"chunks", stats.ChunkCount,
|
||||||
|
"blobs", stats.BlobCount,
|
||||||
|
"total_compressed_size", humanize.Bytes(uint64(stats.CompressedSize)),
|
||||||
|
"total_uncompressed_size", humanize.Bytes(uint64(stats.UncompressedSize)),
|
||||||
|
"compression_ratio", fmt.Sprintf("%.2fx", float64(stats.UncompressedSize)/float64(stats.CompressedSize)))
|
||||||
|
|
||||||
|
// Step 3: VACUUM the database to remove deleted data and compact
|
||||||
|
// This is critical for security - ensures no stale/deleted data is uploaded
|
||||||
|
if err := sm.vacuumDatabase(tempDBPath); err != nil {
|
||||||
|
return fmt.Errorf("vacuuming database: %w", err)
|
||||||
|
}
|
||||||
|
log.Debug("Database vacuumed", "size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))))
|
||||||
|
|
||||||
|
// Step 4: Compress and encrypt the binary database file
|
||||||
|
compressedPath := filepath.Join(tempDir, "db.zst.age")
|
||||||
|
if err := sm.compressFile(tempDBPath, compressedPath); err != nil {
|
||||||
|
return fmt.Errorf("compressing database: %w", err)
|
||||||
|
}
|
||||||
|
log.Debug("Compression complete",
|
||||||
|
"original_size", humanize.Bytes(uint64(sm.getFileSize(tempDBPath))),
|
||||||
|
"compressed_size", humanize.Bytes(uint64(sm.getFileSize(compressedPath))))
|
||||||
|
|
||||||
|
// Step 5: Read compressed and encrypted data for upload
|
||||||
|
finalData, err := afero.ReadFile(sm.fs, compressedPath)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("reading compressed dump: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Step 6: Generate blob manifest (before closing temp DB)
|
||||||
|
blobManifest, err := sm.generateBlobManifest(ctx, tempDBPath, snapshotID)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("generating blob manifest: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Step 7: Upload to S3 in snapshot subdirectory
|
||||||
|
// Upload database backup (compressed and encrypted)
|
||||||
|
dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
|
||||||
|
|
||||||
|
dbUploadStart := time.Now()
|
||||||
|
if err := sm.storage.Put(ctx, dbKey, bytes.NewReader(finalData)); err != nil {
|
||||||
|
return fmt.Errorf("uploading snapshot database: %w", err)
|
||||||
|
}
|
||||||
|
dbUploadDuration := time.Since(dbUploadStart)
|
||||||
|
dbUploadSpeed := float64(len(finalData)) * 8 / dbUploadDuration.Seconds() // bits per second
|
||||||
|
log.Info("Uploaded snapshot database",
|
||||||
|
"path", dbKey,
|
||||||
|
"size", humanize.Bytes(uint64(len(finalData))),
|
||||||
|
"duration", dbUploadDuration,
|
||||||
|
"speed", humanize.SI(dbUploadSpeed, "bps"))
|
||||||
|
|
||||||
|
// Upload blob manifest (compressed only, not encrypted)
|
||||||
|
manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
|
||||||
|
manifestUploadStart := time.Now()
|
||||||
|
if err := sm.storage.Put(ctx, manifestKey, bytes.NewReader(blobManifest)); err != nil {
|
||||||
|
return fmt.Errorf("uploading blob manifest: %w", err)
|
||||||
|
}
|
||||||
|
manifestUploadDuration := time.Since(manifestUploadStart)
|
||||||
|
manifestUploadSpeed := float64(len(blobManifest)) * 8 / manifestUploadDuration.Seconds() // bits per second
|
||||||
|
log.Info("Uploaded blob manifest",
|
||||||
|
"path", manifestKey,
|
||||||
|
"size", humanize.Bytes(uint64(len(blobManifest))),
|
||||||
|
"duration", manifestUploadDuration,
|
||||||
|
"speed", humanize.SI(manifestUploadSpeed, "bps"))
|
||||||
|
|
||||||
|
log.Info("Uploaded snapshot metadata",
|
||||||
|
"snapshot_id", snapshotID,
|
||||||
|
"db_size", len(finalData),
|
||||||
|
"manifest_size", len(blobManifest))
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// CleanupStats contains statistics about cleaned snapshot database
|
||||||
|
type CleanupStats struct {
|
||||||
|
FileCount int
|
||||||
|
ChunkCount int
|
||||||
|
BlobCount int
|
||||||
|
CompressedSize int64
|
||||||
|
UncompressedSize int64
|
||||||
|
}
|
||||||
|
|
||||||
|
// cleanSnapshotDB removes all data except for the specified snapshot
|
||||||
|
//
|
||||||
|
// The cleanup is performed in a specific order to maintain referential integrity:
|
||||||
|
// 1. Delete other snapshots
|
||||||
|
// 2. Delete orphaned snapshot associations (snapshot_files, snapshot_blobs) for deleted snapshots
|
||||||
|
// 3. Delete orphaned files (not in the current snapshot)
|
||||||
|
// 4. Delete orphaned chunk-to-file mappings (references to deleted files)
|
||||||
|
// 5. Delete orphaned blobs (not in the current snapshot)
|
||||||
|
// 6. Delete orphaned blob-to-chunk mappings (references to deleted chunks)
|
||||||
|
// 7. Delete orphaned chunks (not referenced by any file)
|
||||||
|
//
|
||||||
|
// Each step is implemented as a separate method for clarity and maintainability.
|
||||||
|
func (sm *SnapshotManager) cleanSnapshotDB(ctx context.Context, dbPath string, snapshotID string) (*CleanupStats, error) {
|
||||||
|
// Open the temp database
|
||||||
|
db, err := database.New(ctx, dbPath)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("opening temp database: %w", err)
|
||||||
|
}
|
||||||
|
defer func() {
|
||||||
|
if err := db.Close(); err != nil {
|
||||||
|
log.Debug("Failed to close temp database", "error", err)
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Start a transaction
|
||||||
|
tx, err := db.BeginTx(ctx, nil)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("beginning transaction: %w", err)
|
||||||
|
}
|
||||||
|
defer func() {
|
||||||
|
if rbErr := tx.Rollback(); rbErr != nil && rbErr != sql.ErrTxDone {
|
||||||
|
log.Debug("Failed to rollback transaction", "error", rbErr)
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Execute cleanup steps in order
|
||||||
|
if err := sm.deleteOtherSnapshots(ctx, tx, snapshotID); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 1 - delete other snapshots: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedSnapshotAssociations(ctx, tx, snapshotID); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 2 - delete orphaned snapshot associations: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedFiles(ctx, tx, snapshotID); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 3 - delete orphaned files: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedChunkToFileMappings(ctx, tx); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 4 - delete orphaned chunk-to-file mappings: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedBlobs(ctx, tx, snapshotID); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 5 - delete orphaned blobs: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedBlobToChunkMappings(ctx, tx); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 6 - delete orphaned blob-to-chunk mappings: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := sm.deleteOrphanedChunks(ctx, tx); err != nil {
|
||||||
|
return nil, fmt.Errorf("step 7 - delete orphaned chunks: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Commit transaction
|
||||||
|
log.Debug("[Temp DB Cleanup] Committing cleanup transaction")
|
||||||
|
if err := tx.Commit(); err != nil {
|
||||||
|
return nil, fmt.Errorf("committing transaction: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Collect statistics about the cleaned database
|
||||||
|
stats := &CleanupStats{}
|
||||||
|
|
||||||
|
// Count files
|
||||||
|
var fileCount int
|
||||||
|
err = db.QueryRowWithLog(ctx, "SELECT COUNT(*) FROM files").Scan(&fileCount)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("counting files: %w", err)
|
||||||
|
}
|
||||||
|
stats.FileCount = fileCount
|
||||||
|
|
||||||
|
// Count chunks
|
||||||
|
var chunkCount int
|
||||||
|
err = db.QueryRowWithLog(ctx, "SELECT COUNT(*) FROM chunks").Scan(&chunkCount)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("counting chunks: %w", err)
|
||||||
|
}
|
||||||
|
stats.ChunkCount = chunkCount
|
||||||
|
|
||||||
|
// Count blobs and get sizes
|
||||||
|
var blobCount int
|
||||||
|
var compressedSize, uncompressedSize sql.NullInt64
|
||||||
|
err = db.QueryRowWithLog(ctx, `
|
||||||
|
SELECT COUNT(*), COALESCE(SUM(compressed_size), 0), COALESCE(SUM(uncompressed_size), 0)
|
||||||
|
FROM blobs
|
||||||
|
WHERE blob_hash IN (SELECT blob_hash FROM snapshot_blobs WHERE snapshot_id = ?)
|
||||||
|
`, snapshotID).Scan(&blobCount, &compressedSize, &uncompressedSize)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("counting blobs and sizes: %w", err)
|
||||||
|
}
|
||||||
|
stats.BlobCount = blobCount
|
||||||
|
stats.CompressedSize = compressedSize.Int64
|
||||||
|
stats.UncompressedSize = uncompressedSize.Int64
|
||||||
|
|
||||||
|
return stats, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// vacuumDatabase runs VACUUM on the database to remove deleted data and compact the file.
// This is critical for security - ensures no stale/deleted data pages are uploaded
func (sm *SnapshotManager) vacuumDatabase(dbPath string) error {
    log.Debug("Running VACUUM on database", "path", dbPath)
    cmd := exec.Command("sqlite3", dbPath, "VACUUM;")

    if output, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("running VACUUM: %w (output: %s)", err, string(output))
    }

    return nil
}

// compressFile compresses a file using zstd and encrypts with age
func (sm *SnapshotManager) compressFile(inputPath, outputPath string) error {
    input, err := sm.fs.Open(inputPath)
    if err != nil {
        return fmt.Errorf("opening input file: %w", err)
    }
    defer func() {
        if err := input.Close(); err != nil {
            log.Debug("Failed to close input file", "path", inputPath, "error", err)
        }
    }()

    output, err := sm.fs.Create(outputPath)
    if err != nil {
        return fmt.Errorf("creating output file: %w", err)
    }
    defer func() {
        if err := output.Close(); err != nil {
            log.Debug("Failed to close output file", "path", outputPath, "error", err)
        }
    }()

    // Use blobgen for compression and encryption
    log.Debug("Compressing and encrypting data")
    writer, err := blobgen.NewWriter(output, sm.config.CompressionLevel, sm.config.AgeRecipients)
    if err != nil {
        return fmt.Errorf("creating blobgen writer: %w", err)
    }

    // Track if writer has been closed to avoid double-close
    writerClosed := false
    defer func() {
        if !writerClosed {
            if err := writer.Close(); err != nil {
                log.Debug("Failed to close writer", "error", err)
            }
        }
    }()

    if _, err := io.Copy(writer, input); err != nil {
        return fmt.Errorf("compressing data: %w", err)
    }

    // Close writer to flush all data
    if err := writer.Close(); err != nil {
        return fmt.Errorf("closing writer: %w", err)
    }
    writerClosed = true

    log.Debug("Compression complete", "hash", fmt.Sprintf("%x", writer.Sum256()))

    return nil
}
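The `writerClosed` guard in `compressFile` lets the explicit `Close` (needed to flush data and surface errors) coexist with a deferred cleanup close without closing twice. A minimal, self-contained sketch of the pattern (the `onceCloser` type is illustrative, not part of vaultik):

```go
package main

import "fmt"

// onceCloser counts Close calls so the guard's effect is observable.
type onceCloser struct{ closed int }

func (c *onceCloser) Close() error {
	c.closed++
	return nil
}

// process closes c explicitly to check flush errors, while a deferred
// close covers early-return paths. The boolean guard keeps the deferred
// close from running again after the explicit close succeeded.
func process(c *onceCloser) error {
	closed := false
	defer func() {
		if !closed {
			_ = c.Close() // cleanup on error paths only
		}
	}()

	// ... write and validate data here; early returns are still covered ...

	if err := c.Close(); err != nil {
		return err
	}
	closed = true
	return nil
}

func main() {
	c := &onceCloser{}
	if err := process(c); err != nil {
		panic(err)
	}
	fmt.Println(c.closed) // closed exactly once
}
```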

// copyFile copies a file from src to dst
func (sm *SnapshotManager) copyFile(src, dst string) error {
    log.Debug("Opening source file for copy", "path", src)
    sourceFile, err := sm.fs.Open(src)
    if err != nil {
        return err
    }
    defer func() {
        log.Debug("Closing source file", "path", src)
        if err := sourceFile.Close(); err != nil {
            log.Debug("Failed to close source file", "path", src, "error", err)
        }
    }()

    log.Debug("Creating destination file", "path", dst)
    destFile, err := sm.fs.Create(dst)
    if err != nil {
        return err
    }
    defer func() {
        log.Debug("Closing destination file", "path", dst)
        if err := destFile.Close(); err != nil {
            log.Debug("Failed to close destination file", "path", dst, "error", err)
        }
    }()

    log.Debug("Copying file data")
    n, err := io.Copy(destFile, sourceFile)
    if err != nil {
        return err
    }
    log.Debug("File copy complete", "bytes_copied", n)

    return nil
}

// generateBlobManifest creates a compressed JSON list of all blobs in the snapshot
func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath string, snapshotID string) ([]byte, error) {
    // Open the cleaned database using the database package
    db, err := database.New(ctx, dbPath)
    if err != nil {
        return nil, fmt.Errorf("opening database: %w", err)
    }
    defer func() { _ = db.Close() }()

    // Create repositories to access the data
    repos := database.NewRepositories(db)

    // Get all blobs for this snapshot
    log.Debug("Querying blobs for snapshot", "snapshot_id", snapshotID)
    blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
    if err != nil {
        return nil, fmt.Errorf("getting snapshot blobs: %w", err)
    }
    log.Debug("Found blobs", "count", len(blobHashes))

    // Get blob details including sizes
    blobs := make([]BlobInfo, 0, len(blobHashes))
    totalCompressedSize := int64(0)

    for _, hash := range blobHashes {
        blob, err := repos.Blobs.GetByHash(ctx, hash)
        if err != nil {
            log.Warn("Failed to get blob details", "hash", hash, "error", err)
            continue
        }
        if blob != nil {
            blobs = append(blobs, BlobInfo{
                Hash:           hash,
                CompressedSize: blob.CompressedSize,
            })
            totalCompressedSize += blob.CompressedSize
        }
    }

    // Create manifest
    manifest := &Manifest{
        SnapshotID:          snapshotID,
        Timestamp:           time.Now().UTC().Format(time.RFC3339),
        BlobCount:           len(blobs),
        TotalCompressedSize: totalCompressedSize,
        Blobs:               blobs,
    }

    // Encode manifest
    compressedData, err := EncodeManifest(manifest, sm.config.CompressionLevel)
    if err != nil {
        return nil, fmt.Errorf("encoding manifest: %w", err)
    }

    log.Info("Generated blob manifest",
        "snapshot_id", snapshotID,
        "blob_count", len(blobs),
        "total_compressed_size", totalCompressedSize,
        "manifest_size", len(compressedData))

    return compressedData, nil
}

// getFileSize returns the size of a file in bytes, or -1 on error
func (sm *SnapshotManager) getFileSize(path string) int64 {
    info, err := sm.fs.Stat(path)
    if err != nil {
        return -1
    }
    return info.Size()
}

// BackupStats contains statistics from a backup operation
type BackupStats struct {
    FilesScanned  int
    BytesScanned  int64
    ChunksCreated int
    BlobsCreated  int
    BytesUploaded int64
}

// ExtendedBackupStats contains additional statistics for comprehensive tracking
type ExtendedBackupStats struct {
    BackupStats
    BlobUncompressedSize int64 // Total uncompressed size of all referenced blobs
    CompressionLevel     int   // Compression level used for this snapshot
    UploadDurationMs     int64 // Total milliseconds spent uploading to S3
}

// CleanupIncompleteSnapshots removes incomplete snapshots that don't have metadata in S3.
// This is critical for data safety: incomplete snapshots can cause deduplication to skip
// files that were never successfully backed up, resulting in data loss.
func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostname string) error {
    log.Info("Checking for incomplete snapshots", "hostname", hostname)

    // Get all incomplete snapshots for this hostname
    incompleteSnapshots, err := sm.repos.Snapshots.GetIncompleteByHostname(ctx, hostname)
    if err != nil {
        return fmt.Errorf("getting incomplete snapshots: %w", err)
    }

    if len(incompleteSnapshots) == 0 {
        log.Debug("No incomplete snapshots found")
        return nil
    }

    log.Info("Found incomplete snapshots", "count", len(incompleteSnapshots))

    // Check each incomplete snapshot for metadata in storage
    for _, snapshot := range incompleteSnapshots {
        // Check if metadata exists in storage
        metadataKey := fmt.Sprintf("metadata/%s/db.zst", snapshot.ID)
        _, err := sm.storage.Stat(ctx, metadataKey)

        if err != nil {
            // Metadata doesn't exist in S3 - this is an incomplete snapshot
            log.Info("Cleaning up incomplete snapshot record", "snapshot_id", snapshot.ID, "started_at", snapshot.StartedAt)

            // Delete the snapshot and all its associations
            if err := sm.deleteSnapshot(ctx, snapshot.ID.String()); err != nil {
                return fmt.Errorf("deleting incomplete snapshot %s: %w", snapshot.ID, err)
            }

            log.Info("Deleted incomplete snapshot record and associated data", "snapshot_id", snapshot.ID)
        } else {
            // Metadata exists - this snapshot was completed but database wasn't updated
            // This shouldn't happen in normal operation, but mark it complete
            log.Warn("Found snapshot with remote metadata but incomplete in database", "snapshot_id", snapshot.ID)
            if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID.String()); err != nil {
                log.Error("Failed to mark snapshot as complete in database", "snapshot_id", snapshot.ID, "error", err)
            }
        }
    }

    return nil
}

// deleteSnapshot removes a snapshot and all its associations from the database
func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string) error {
    // Delete snapshot_files entries
    if err := sm.repos.Snapshots.DeleteSnapshotFiles(ctx, snapshotID); err != nil {
        return fmt.Errorf("deleting snapshot files: %w", err)
    }

    // Delete snapshot_blobs entries
    if err := sm.repos.Snapshots.DeleteSnapshotBlobs(ctx, snapshotID); err != nil {
        return fmt.Errorf("deleting snapshot blobs: %w", err)
    }

    // Delete uploads entries (has foreign key to snapshots without CASCADE)
    if err := sm.repos.Snapshots.DeleteSnapshotUploads(ctx, snapshotID); err != nil {
        return fmt.Errorf("deleting snapshot uploads: %w", err)
    }

    // Delete the snapshot itself
    if err := sm.repos.Snapshots.Delete(ctx, snapshotID); err != nil {
        return fmt.Errorf("deleting snapshot: %w", err)
    }

    // Clean up orphaned data
    log.Debug("Cleaning up orphaned records in main database")
    if err := sm.CleanupOrphanedData(ctx); err != nil {
        return fmt.Errorf("cleaning up orphaned data: %w", err)
    }

    return nil
}

// CleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot.
// This should be called periodically to clean up data from deleted or incomplete snapshots.
func (sm *SnapshotManager) CleanupOrphanedData(ctx context.Context) error {
    // Order is important to respect foreign key constraints:
    // 1. Delete orphaned files (will cascade delete file_chunks)
    // 2. Delete orphaned blobs (will cascade delete blob_chunks for deleted blobs)
    // 3. Delete orphaned blob_chunks (where blob exists but chunk doesn't)
    // 4. Delete orphaned chunks (now safe after all blob_chunks are gone)

    // Delete orphaned files (files not in any snapshot)
    log.Debug("Deleting orphaned file records from database")
    if err := sm.repos.Files.DeleteOrphaned(ctx); err != nil {
        return fmt.Errorf("deleting orphaned files: %w", err)
    }

    // Delete orphaned blobs (blobs not in any snapshot)
    // This will cascade delete blob_chunks for deleted blobs
    log.Debug("Deleting orphaned blob records from database")
    if err := sm.repos.Blobs.DeleteOrphaned(ctx); err != nil {
        return fmt.Errorf("deleting orphaned blobs: %w", err)
    }

    // Delete orphaned blob_chunks entries
    // This handles cases where the blob still exists but chunks were deleted
    log.Debug("Deleting orphaned blob_chunks associations from database")
    if err := sm.repos.BlobChunks.DeleteOrphaned(ctx); err != nil {
        return fmt.Errorf("deleting orphaned blob_chunks: %w", err)
    }

    // Delete orphaned chunks (chunks not referenced by any file)
    // This must come after cleaning up blob_chunks to avoid foreign key violations
    log.Debug("Deleting orphaned chunk records from database")
    if err := sm.repos.Chunks.DeleteOrphaned(ctx); err != nil {
        return fmt.Errorf("deleting orphaned chunks: %w", err)
    }

    return nil
}

// deleteOtherSnapshots deletes all snapshots except the current one
func (sm *SnapshotManager) deleteOtherSnapshots(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
    log.Debug("[Temp DB Cleanup] Deleting all snapshot records except current", "keeping", currentSnapshotID)

    // First delete uploads that reference other snapshots (no CASCADE DELETE on this FK)
    database.LogSQL("Execute", "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
    uploadResult, err := tx.ExecContext(ctx, "DELETE FROM uploads WHERE snapshot_id != ?", currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting uploads for other snapshots: %w", err)
    }
    uploadsDeleted, _ := uploadResult.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted upload records", "count", uploadsDeleted)

    // Now we can safely delete the snapshots
    database.LogSQL("Execute", "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
    result, err := tx.ExecContext(ctx, "DELETE FROM snapshots WHERE id != ?", currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting other snapshots: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted snapshot records from database", "count", rowsAffected)
    return nil
}

// deleteOrphanedSnapshotAssociations deletes snapshot_files and snapshot_blobs for deleted snapshots
func (sm *SnapshotManager) deleteOrphanedSnapshotAssociations(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
    // Delete orphaned snapshot_files
    log.Debug("[Temp DB Cleanup] Deleting orphaned snapshot_files associations")
    database.LogSQL("Execute", "DELETE FROM snapshot_files WHERE snapshot_id != ?", currentSnapshotID)
    result, err := tx.ExecContext(ctx, "DELETE FROM snapshot_files WHERE snapshot_id != ?", currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting orphaned snapshot_files: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted snapshot_files associations", "count", rowsAffected)

    // Delete orphaned snapshot_blobs
    log.Debug("[Temp DB Cleanup] Deleting orphaned snapshot_blobs associations")
    database.LogSQL("Execute", "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", currentSnapshotID)
    result, err = tx.ExecContext(ctx, "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting orphaned snapshot_blobs: %w", err)
    }
    rowsAffected, _ = result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted snapshot_blobs associations", "count", rowsAffected)
    return nil
}

// deleteOrphanedFiles deletes files not in the current snapshot
func (sm *SnapshotManager) deleteOrphanedFiles(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
    log.Debug("[Temp DB Cleanup] Deleting file records not referenced by current snapshot")
    database.LogSQL("Execute", `DELETE FROM files WHERE NOT EXISTS (SELECT 1 FROM snapshot_files WHERE snapshot_files.file_id = files.id AND snapshot_files.snapshot_id = ?)`, currentSnapshotID)
    result, err := tx.ExecContext(ctx, `
        DELETE FROM files
        WHERE NOT EXISTS (
            SELECT 1 FROM snapshot_files
            WHERE snapshot_files.file_id = files.id
            AND snapshot_files.snapshot_id = ?
        )`, currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting orphaned files: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted file records from database", "count", rowsAffected)

    // Note: file_chunks will be deleted via CASCADE
    log.Debug("[Temp DB Cleanup] file_chunks associations deleted via CASCADE")
    return nil
}

// deleteOrphanedChunkToFileMappings deletes chunk_files entries for deleted files
func (sm *SnapshotManager) deleteOrphanedChunkToFileMappings(ctx context.Context, tx *sql.Tx) error {
    log.Debug("[Temp DB Cleanup] Deleting orphaned chunk_files associations")
    database.LogSQL("Execute", `DELETE FROM chunk_files WHERE NOT EXISTS (SELECT 1 FROM files WHERE files.id = chunk_files.file_id)`)
    result, err := tx.ExecContext(ctx, `
        DELETE FROM chunk_files
        WHERE NOT EXISTS (
            SELECT 1 FROM files
            WHERE files.id = chunk_files.file_id
        )`)
    if err != nil {
        return fmt.Errorf("deleting orphaned chunk_files: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted chunk_files associations", "count", rowsAffected)
    return nil
}

// deleteOrphanedBlobs deletes blobs not in the current snapshot
func (sm *SnapshotManager) deleteOrphanedBlobs(ctx context.Context, tx *sql.Tx, currentSnapshotID string) error {
    log.Debug("[Temp DB Cleanup] Deleting blob records not referenced by current snapshot")
    database.LogSQL("Execute", `DELETE FROM blobs WHERE NOT EXISTS (SELECT 1 FROM snapshot_blobs WHERE snapshot_blobs.blob_hash = blobs.blob_hash AND snapshot_blobs.snapshot_id = ?)`, currentSnapshotID)
    result, err := tx.ExecContext(ctx, `
        DELETE FROM blobs
        WHERE NOT EXISTS (
            SELECT 1 FROM snapshot_blobs
            WHERE snapshot_blobs.blob_hash = blobs.blob_hash
            AND snapshot_blobs.snapshot_id = ?
        )`, currentSnapshotID)
    if err != nil {
        return fmt.Errorf("deleting orphaned blobs: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted blob records from database", "count", rowsAffected)
    return nil
}

// deleteOrphanedBlobToChunkMappings deletes blob_chunks entries for deleted blobs
func (sm *SnapshotManager) deleteOrphanedBlobToChunkMappings(ctx context.Context, tx *sql.Tx) error {
    log.Debug("[Temp DB Cleanup] Deleting orphaned blob_chunks associations")
    database.LogSQL("Execute", `DELETE FROM blob_chunks WHERE NOT EXISTS (SELECT 1 FROM blobs WHERE blobs.id = blob_chunks.blob_id)`)
    result, err := tx.ExecContext(ctx, `
        DELETE FROM blob_chunks
        WHERE NOT EXISTS (
            SELECT 1 FROM blobs
            WHERE blobs.id = blob_chunks.blob_id
        )`)
    if err != nil {
        return fmt.Errorf("deleting orphaned blob_chunks: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted blob_chunks associations", "count", rowsAffected)
    return nil
}

// deleteOrphanedChunks deletes chunks not referenced by any file or blob
func (sm *SnapshotManager) deleteOrphanedChunks(ctx context.Context, tx *sql.Tx) error {
    log.Debug("[Temp DB Cleanup] Deleting orphaned chunk records")
    query := `
        DELETE FROM chunks
        WHERE NOT EXISTS (
            SELECT 1 FROM file_chunks
            WHERE file_chunks.chunk_hash = chunks.chunk_hash
        )
        AND NOT EXISTS (
            SELECT 1 FROM blob_chunks
            WHERE blob_chunks.chunk_hash = chunks.chunk_hash
        )`
    database.LogSQL("Execute", query)
    result, err := tx.ExecContext(ctx, query)
    if err != nil {
        return fmt.Errorf("deleting orphaned chunks: %w", err)
    }
    rowsAffected, _ := result.RowsAffected()
    log.Debug("[Temp DB Cleanup] Deleted chunk records from database", "count", rowsAffected)
    return nil
}
188	internal/snapshot/snapshot_test.go	Normal file
@@ -0,0 +1,188 @@
package snapshot

import (
    "context"
    "database/sql"
    "io"
    "path/filepath"
    "testing"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/log"
    "github.com/spf13/afero"
)

const (
    // Test age public key for encryption
    testAgeRecipient = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)

// copyFile is a test helper to copy files using afero
func copyFile(fs afero.Fs, src, dst string) error {
    sourceFile, err := fs.Open(src)
    if err != nil {
        return err
    }
    defer func() { _ = sourceFile.Close() }()

    destFile, err := fs.Create(dst)
    if err != nil {
        return err
    }
    defer func() { _ = destFile.Close() }()

    _, err = io.Copy(destFile, sourceFile)
    return err
}

func TestCleanSnapshotDBEmptySnapshot(t *testing.T) {
    // Initialize logger
    log.Initialize(log.Config{})

    ctx := context.Background()
    fs := afero.NewOsFs()

    // Create a test database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("failed to create database: %v", err)
    }

    repos := database.NewRepositories(db)

    // Create an empty snapshot
    snapshot := &database.Snapshot{
        ID:       "empty-snapshot",
        Hostname: "test-host",
    }

    err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        return repos.Snapshots.Create(ctx, tx, snapshot)
    })
    if err != nil {
        t.Fatalf("failed to create snapshot: %v", err)
    }

    // Create some files and chunks not associated with any snapshot
    file := &database.File{Path: "/orphan/file.txt", Size: 1000}
    chunk := &database.Chunk{ChunkHash: "orphan-chunk", Size: 500}

    err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        if err := repos.Files.Create(ctx, tx, file); err != nil {
            return err
        }
        return repos.Chunks.Create(ctx, tx, chunk)
    })
    if err != nil {
        t.Fatalf("failed to create orphan data: %v", err)
    }

    // Close the database
    if err := db.Close(); err != nil {
        t.Fatalf("failed to close database: %v", err)
    }

    // Copy database
    tempDBPath := filepath.Join(tempDir, "temp.db")
    if err := copyFile(fs, dbPath, tempDBPath); err != nil {
        t.Fatalf("failed to copy database: %v", err)
    }

    // Create a mock config for testing
    cfg := &config.Config{
        CompressionLevel: 3,
        AgeRecipients:    []string{testAgeRecipient},
    }
    // Create SnapshotManager with filesystem
    sm := &SnapshotManager{
        config: cfg,
        fs:     fs,
    }
    if _, err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshot.ID.String()); err != nil {
        t.Fatalf("failed to clean snapshot database: %v", err)
    }

    // Verify the cleaned database
    cleanedDB, err := database.New(ctx, tempDBPath)
    if err != nil {
        t.Fatalf("failed to open cleaned database: %v", err)
    }
    defer func() {
        if err := cleanedDB.Close(); err != nil {
            t.Errorf("failed to close database: %v", err)
        }
    }()

    cleanedRepos := database.NewRepositories(cleanedDB)

    // Verify snapshot exists
    verifySnapshot, err := cleanedRepos.Snapshots.GetByID(ctx, snapshot.ID.String())
    if err != nil {
        t.Fatalf("failed to get snapshot: %v", err)
    }
    if verifySnapshot == nil {
        t.Error("snapshot should exist")
    }

    // Verify orphan file is gone
    f, err := cleanedRepos.Files.GetByPath(ctx, file.Path.String())
    if err != nil {
        t.Fatalf("failed to check file: %v", err)
    }
    if f != nil {
        t.Error("orphan file should not exist")
    }

    // Verify orphan chunk is gone
    c, err := cleanedRepos.Chunks.GetByHash(ctx, chunk.ChunkHash.String())
    if err != nil {
        t.Fatalf("failed to check chunk: %v", err)
    }
    if c != nil {
        t.Error("orphan chunk should not exist")
    }
}

func TestCleanSnapshotDBNonExistentSnapshot(t *testing.T) {
    // Initialize logger
    log.Initialize(log.Config{})

    ctx := context.Background()
    fs := afero.NewOsFs()

    // Create a test database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("failed to create database: %v", err)
    }

    // Close immediately
    if err := db.Close(); err != nil {
        t.Fatalf("failed to close database: %v", err)
    }

    // Copy database
    tempDBPath := filepath.Join(tempDir, "temp.db")
    if err := copyFile(fs, dbPath, tempDBPath); err != nil {
        t.Fatalf("failed to copy database: %v", err)
    }

    // Create a mock config for testing
    cfg := &config.Config{
        CompressionLevel: 3,
        AgeRecipients:    []string{testAgeRecipient},
    }
    // Try to clean with non-existent snapshot
    sm := &SnapshotManager{config: cfg, fs: fs}
    _, err = sm.cleanSnapshotDB(ctx, tempDBPath, "non-existent-snapshot")

    // Should not error - it will just delete everything
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
}
262	internal/storage/file.go	Normal file
@@ -0,0 +1,262 @@
package storage

import (
    "context"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strings"

    "github.com/spf13/afero"
)

// FileStorer implements Storer using the local filesystem.
// It mirrors the S3 path structure for consistency.
type FileStorer struct {
    fs       afero.Fs
    basePath string
}

// NewFileStorer creates a new filesystem storage backend.
// The basePath directory will be created if it doesn't exist.
// Uses the real OS filesystem by default; call SetFilesystem to override for testing.
func NewFileStorer(basePath string) (*FileStorer, error) {
    fs := afero.NewOsFs()
    // Ensure base path exists
    if err := fs.MkdirAll(basePath, 0755); err != nil {
        return nil, fmt.Errorf("creating base path: %w", err)
    }
    return &FileStorer{
        fs:       fs,
        basePath: basePath,
    }, nil
}

// SetFilesystem overrides the filesystem for testing.
func (f *FileStorer) SetFilesystem(fs afero.Fs) {
    f.fs = fs
}

// fullPath returns the full filesystem path for a key.
func (f *FileStorer) fullPath(key string) string {
    return filepath.Join(f.basePath, key)
}

// Put stores data at the specified key.
func (f *FileStorer) Put(ctx context.Context, key string, data io.Reader) error {
    path := f.fullPath(key)

    // Create parent directories
    dir := filepath.Dir(path)
    if err := f.fs.MkdirAll(dir, 0755); err != nil {
        return fmt.Errorf("creating directories: %w", err)
    }

    file, err := f.fs.Create(path)
    if err != nil {
        return fmt.Errorf("creating file: %w", err)
    }
    defer func() { _ = file.Close() }()

    if _, err := io.Copy(file, data); err != nil {
        return fmt.Errorf("writing file: %w", err)
    }

    return nil
}

// PutWithProgress stores data with progress reporting.
func (f *FileStorer) PutWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
    path := f.fullPath(key)

    // Create parent directories
    dir := filepath.Dir(path)
    if err := f.fs.MkdirAll(dir, 0755); err != nil {
        return fmt.Errorf("creating directories: %w", err)
    }

    file, err := f.fs.Create(path)
    if err != nil {
        return fmt.Errorf("creating file: %w", err)
    }
    defer func() { _ = file.Close() }()

    // Wrap with progress tracking
    pw := &progressWriter{
        writer:   file,
        callback: progress,
    }

    if _, err := io.Copy(pw, data); err != nil {
        return fmt.Errorf("writing file: %w", err)
    }

    return nil
}

// Get retrieves data from the specified key.
func (f *FileStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
    path := f.fullPath(key)
    file, err := f.fs.Open(path)
    if err != nil {
        if os.IsNotExist(err) {
            return nil, ErrNotFound
        }
        return nil, fmt.Errorf("opening file: %w", err)
    }
    return file, nil
}

// Stat returns metadata about an object without retrieving its contents.
func (f *FileStorer) Stat(ctx context.Context, key string) (*ObjectInfo, error) {
    path := f.fullPath(key)
    info, err := f.fs.Stat(path)
    if err != nil {
        if os.IsNotExist(err) {
            return nil, ErrNotFound
        }
        return nil, fmt.Errorf("stat file: %w", err)
    }
    return &ObjectInfo{
        Key:  key,
        Size: info.Size(),
    }, nil
}

// Delete removes an object.
func (f *FileStorer) Delete(ctx context.Context, key string) error {
    path := f.fullPath(key)
    err := f.fs.Remove(path)
    if os.IsNotExist(err) {
        return nil // Match S3 behavior: no error if doesn't exist
    }
    if err != nil {
        return fmt.Errorf("removing file: %w", err)
    }
    return nil
}

// List returns all keys with the given prefix.
func (f *FileStorer) List(ctx context.Context, prefix string) ([]string, error) {
    var keys []string
    basePath := f.fullPath(prefix)

    // Check if base path exists
    exists, err := afero.Exists(f.fs, basePath)
    if err != nil {
        return nil, fmt.Errorf("checking path: %w", err)
    }
    if !exists {
        return keys, nil // Empty list for non-existent prefix
    }

    err = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }

        // Check context cancellation
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
        }

        if !info.IsDir() {
            // Convert back to key (relative path from basePath)
            relPath, err := filepath.Rel(f.basePath, path)
            if err != nil {
                return fmt.Errorf("computing relative path: %w", err)
            }
            // Normalize path separators to forward slashes for consistency
            relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
            keys = append(keys, relPath)
        }
        return nil
    })

    if err != nil {
        return nil, fmt.Errorf("walking directory: %w", err)
    }

    return keys, nil
}
|
||||||
|
|
||||||
|
// ListStream returns a channel of ObjectInfo for large result sets.
func (f *FileStorer) ListStream(ctx context.Context, prefix string) <-chan ObjectInfo {
	ch := make(chan ObjectInfo)
	go func() {
		defer close(ch)
		basePath := f.fullPath(prefix)

		// Check if base path exists
		exists, err := afero.Exists(f.fs, basePath)
		if err != nil {
			ch <- ObjectInfo{Err: fmt.Errorf("checking path: %w", err)}
			return
		}
		if !exists {
			return // Empty channel for non-existent prefix
		}

		_ = afero.Walk(f.fs, basePath, func(path string, info os.FileInfo, err error) error {
			// Check context cancellation
			select {
			case <-ctx.Done():
				ch <- ObjectInfo{Err: ctx.Err()}
				return ctx.Err()
			default:
			}

			if err != nil {
				ch <- ObjectInfo{Err: err}
				return nil // Continue walking despite errors
			}

			if !info.IsDir() {
				relPath, err := filepath.Rel(f.basePath, path)
				if err != nil {
					ch <- ObjectInfo{Err: fmt.Errorf("computing relative path: %w", err)}
					return nil
				}
				// Normalize path separators
				relPath = strings.ReplaceAll(relPath, string(filepath.Separator), "/")
				ch <- ObjectInfo{
					Key:  relPath,
					Size: info.Size(),
				}
			}
			return nil
		})
	}()
	return ch
}

// Info returns human-readable storage location information.
func (f *FileStorer) Info() StorageInfo {
	return StorageInfo{
		Type:     "file",
		Location: f.basePath,
	}
}

// progressWriter wraps an io.Writer to track write progress.
type progressWriter struct {
	writer   io.Writer
	written  int64
	callback ProgressCallback
}

func (pw *progressWriter) Write(p []byte) (int, error) {
	n, err := pw.writer.Write(p)
	if n > 0 {
		pw.written += int64(n)
		if pw.callback != nil {
			if callbackErr := pw.callback(pw.written); callbackErr != nil {
				return n, callbackErr
			}
		}
	}
	return n, err
}