# Compare commits

Comparing `9c072166fa...fix/ctime-` — 65 commits:

| SHA1 |
|---|
| 25860c03a9 |
| c24e7e6360 |
| 7a5943958d |
| d8a51804d2 |
| 76f4421eb3 |
| 53ac868c5d |
| 8c4ea2b870 |
| 597b560398 |
| 1e2eced092 |
| 815b35c7ae |
| 9c66674683 |
| 49de277648 |
| ed5d777d05 |
| 76e047bbb2 |
| 2e7356dd85 |
| 70d4fe2aa0 |
| 2f249e3ddd |
| 3f834f1c9c |
| 9879668c31 |
| 0a0d9f33b0 |
| df0e8c275b |
| ddc23f8057 |
| cafb3d45b8 |
| d77ac18aaa |
| 825f25da58 |
| 162d76bb38 |
| bfd7334221 |
| 9b32bf0846 |
| 8adc668fa6 |
| 441c441eca |
| 4d9f912a5f |
| 46c2ea3079 |
| 470bf648c4 |
| bdaaadf990 |
| 417b25a5f5 |
| 2afd54d693 |
| 05286bed01 |
| f2c120f026 |
| bbe09ec5b5 |
| 43a69c2cfb |
| 899448e1da |
| 24c5e8c5a6 |
| 40fff09594 |
| 8a8651c690 |
| a1d559c30d |
| 88e2508dc7 |
| c3725e745e |
| badc0c07e0 |
| cda0cf865a |
| 0736bd070b |
| d7cd9aac27 |
| bb38f8c5d6 |
| e29a995120 |
| 5c70405a85 |
| a544fa80f2 |
| c07d8eec0a |
| 0cbb5aa0a6 |
| fb220685a2 |
| 1d027bde57 |
| bb2292de7f |
| d3afa65420 |
| 78af626759 |
| 86b533d6ee |
| 26db096913 |
| 36c59cb7b3 |
---

**.dockerignore** (new file, +8 lines)

```
.git
.gitea
*.md
LICENSE
vaultik
coverage.out
coverage.html
.DS_Store
```
---

**.gitea/workflows/check.yml** (new file, +14 lines)

```yaml
name: check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      # actions/checkout v4, 2024-09-16
      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5
      - name: Build and check
        run: docker build .
```
---

**ARCHITECTURE.md** (new file, +380 lines)

# Vaultik Architecture

This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.

## Overview

Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using [uber-go/fx](https://github.com/uber-go/fx).

## Data Flow

```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │  Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │  Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │  Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │  Uploads blobs to remote storage
└─────────────────┘
```

## Data Model

### Core Entities

The database tracks five primary entities and their relationships:

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Snapshot   │────▶│     File     │────▶│    Chunk     │
└──────────────┘     └──────────────┘     └──────────────┘
        │                                        │
        ▼                                        ▼
┌──────────────┐                          ┌──────────────┐
│     Blob     │◀─────────────────────────│  BlobChunk   │
└──────────────┘                          └──────────────┘
```

### Entity Descriptions

#### File (`database.File`)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)

#### Chunk (`database.Chunk`)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- `ChunkHash`: SHA256 hash of chunk content (primary key)
- `Size`: Chunk size in bytes

Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for a 64KB average).

#### FileChunk (`database.FileChunk`)
Maps files to their constituent chunks:
- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk

#### Blob (`database.Blob`)
The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:
- `ID`: UUID assigned at creation
- `Hash`: SHA256 of final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps

Blob creation process:
1. Chunks are accumulated (up to MaxBlobSize, typically 10GB)
2. Compressed with zstd
3. Encrypted with age (recipients configured in config)
4. SHA256 hash computed → becomes filename in S3
5. Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
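The key layout in step 5 can be sketched in Go. This is an illustrative helper (the `blobKey` function name is hypothetical, not the actual Vaultik code); it shards blob keys by the first two hex byte pairs of the content hash so no single S3 prefix accumulates all objects:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey derives the S3 object key for a finished blob from the SHA256
// hash of its compressed+encrypted content, using the first two hex byte
// pairs as directory shards: blobs/{hash[0:2]}/{hash[2:4]}/{hash}.
func blobKey(content []byte) string {
	sum := sha256.Sum256(content)
	hash := hex.EncodeToString(sum[:])
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	fmt.Println(blobKey([]byte("example blob bytes")))
}
```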
#### BlobChunk (`database.BlobChunk`)
Maps chunks to their position within blobs:
- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size

#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until the snapshot finishes successfully
#### SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.

### Relationship Summary

```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File     1──────────▶ N FileChunk    N ◀────────── 1 Chunk
Blob     1──────────▶ N BlobChunk    N ◀────────── 1 Chunk
```
## Type Instantiation

### Application Startup

The CLI uses fx for dependency injection. Here's the instantiation order:

```go
// cli/app.go: NewApp()
fx.New(
	fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
	fx.Supply(opts.LogOptions),                    // 2. Log options
	fx.Provide(globals.New),                       // 3. Globals
	fx.Provide(log.New),                           // 4. Logger config
	config.Module,                                 // 5. Config
	database.Module,                               // 6. Database + Repositories
	log.Module,                                    // 7. Logger initialization
	s3.Module,                                     // 8. S3 client
	snapshot.Module,                               // 9. SnapshotManager + ScannerFactory
	fx.Provide(vaultik.New),                       // 10. Vaultik orchestrator
)
```
### Key Type Instantiation Points

#### 1. Config (`config.Config`)
- **Created by**: `config.Module` via `config.LoadConfig()`
- **When**: Application startup (fx DI)
- **Contains**: All configuration from the YAML file (S3 credentials, encryption keys, paths, etc.)

#### 2. Database (`database.DB`)
- **Created by**: `database.Module` via `database.New()`
- **When**: Application startup (fx DI)
- **Contains**: SQLite connection, path reference

#### 3. Repositories (`database.Repositories`)
- **Created by**: `database.Module` via `database.NewRepositories()`
- **When**: Application startup (fx DI)
- **Contains**: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)

#### 4. Vaultik (`vaultik.Vaultik`)
- **Created by**: `vaultik.New(VaultikParams)`
- **When**: Application startup (fx DI)
- **Contains**: All dependencies for backup operations

```go
type Vaultik struct {
	Globals         *globals.Globals
	Config          *config.Config
	DB              *database.DB
	Repositories    *database.Repositories
	S3Client        *s3.Client
	ScannerFactory  snapshot.ScannerFactory
	SnapshotManager *snapshot.SnapshotManager
	Shutdowner      fx.Shutdowner
	Fs              afero.Fs

	ctx    context.Context
	cancel context.CancelFunc
}
```
#### 5. SnapshotManager (`snapshot.SnapshotManager`)
- **Created by**: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- **When**: Application startup (fx DI)
- **Responsibility**: Creates/completes snapshots, exports metadata to S3

#### 6. Scanner (`snapshot.Scanner`)
- **Created by**: `ScannerFactory(ScannerParams)`
- **When**: Each `CreateSnapshot()` call
- **Contains**: Chunker, Packer, progress reporter

```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
	EnableProgress: !opts.Cron,
	Fs:             v.Fs,
})
```

#### 7. Chunker (`chunker.Chunker`)
- **Created by**: `chunker.NewChunker(avgChunkSize)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `avgChunkSize`: From config (typically 64KB)
  - `minChunkSize`: avgChunkSize / 4
  - `maxChunkSize`: avgChunkSize * 4
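The derived bounds can be sketched as follows (an illustrative helper, not the actual `chunker` package code); with the typical 64KB average this yields the 16KB-256KB range mentioned above:

```go
package main

import "fmt"

// chunkBounds derives the FastCDC min/max chunk sizes from the
// configured average, following the avg/4 and avg*4 rule.
func chunkBounds(avgChunkSize int) (min, max int) {
	return avgChunkSize / 4, avgChunkSize * 4
}

func main() {
	min, max := chunkBounds(64 * 1024)
	fmt.Printf("min=%dKB max=%dKB\n", min/1024, max/1024) // min=16KB max=256KB
}
```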
#### 8. Packer (`blob.Packer`)
- **Created by**: `blob.NewPacker(PackerConfig)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
  - `CompressionLevel`: zstd level (1-19)
  - `Recipients`: age public keys for encryption

```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
	MaxBlobSize:      cfg.MaxBlobSize,
	CompressionLevel: cfg.CompressionLevel,
	Recipients:       cfg.AgeRecipients,
	Repositories:     cfg.Repositories,
	Fs:               cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```
## Module Responsibilities

### `internal/cli`
Entry point for the fx application. Combines all modules and handles signal interrupts.

Key functions:
- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper

### `internal/vaultik`
Main orchestrator containing all dependencies and command implementations.

Key methods:
- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots

### `internal/chunker`
Content-defined chunking using the FastCDC algorithm.

Key types:
- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize

Key methods:
- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)

### `internal/blob`
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.

Key types:
- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload

Key methods:
- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload
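The accumulate-then-finalize pattern behind these methods can be sketched with a simplified stand-in. This toy `packer` type is hypothetical and omits what the real `blob.Packer` does per blob (compression, encryption, hashing, mutex protection); it only shows how chunks fill the current blob until `MaxBlobSize` would be exceeded, at which point the blob is handed to the registered handler:

```go
package main

import "fmt"

// packer is a toy stand-in for blob.Packer: it accumulates chunk data
// and hands off a "blob" to onBlob whenever adding the next chunk would
// exceed maxBlobSize.
type packer struct {
	maxBlobSize int
	current     [][]byte
	currentSize int
	onBlob      func(chunks [][]byte) // stand-in for SetBlobHandler callback
}

func (p *packer) AddChunk(data []byte) {
	if p.currentSize+len(data) > p.maxBlobSize && len(p.current) > 0 {
		p.finalize()
	}
	p.current = append(p.current, data)
	p.currentSize += len(data)
}

// Flush finalizes any in-progress blob, as the real Flush() does.
func (p *packer) Flush() {
	if len(p.current) > 0 {
		p.finalize()
	}
}

func (p *packer) finalize() {
	p.onBlob(p.current)
	p.current, p.currentSize = nil, 0
}

func main() {
	blobs := 0
	p := &packer{maxBlobSize: 100, onBlob: func(chunks [][]byte) { blobs++ }}
	for i := 0; i < 5; i++ {
		p.AddChunk(make([]byte, 40)) // two 40-byte chunks fit per 100-byte blob
	}
	p.Flush()
	fmt.Println("blobs:", blobs) // blobs: 3
}
```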
### `internal/snapshot`

#### Scanner
Orchestrates the backup process for a directory.

Key methods:
- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation

Scan phases:
1. **Phase 0**: Detect deleted files from previous snapshots
2. **Phase 1**: Walk directory, identify files needing processing
3. **Phase 2**: Process files (chunk → pack → upload)

#### SnapshotManager
Manages snapshot lifecycle and metadata export.

Key methods:
- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots

### `internal/database`
SQLite database for the local index. Single-writer mode for thread safety.

Key types:
- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces

Repository interfaces:
- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)

## Snapshot Creation Flow

```
CreateSnapshot(opts)
  │
  ├─► CleanupIncompleteSnapshots()        // Critical: avoid dedup errors
  │
  ├─► SnapshotManager.CreateSnapshot()    // Create DB record
  │
  ├─► For each source directory:
  │     │
  │     ├─► scanner.Scan(ctx, path, snapshotID)
  │     │     │
  │     │     ├─► Phase 0: detectDeletedFiles()
  │     │     │
  │     │     ├─► Phase 1: scanPhase()
  │     │     │     Walk directory
  │     │     │     Check file metadata changes
  │     │     │     Build list of files to process
  │     │     │
  │     │     └─► Phase 2: processPhase()
  │     │           For each file:
  │     │             chunker.ChunkReaderStreaming()
  │     │             For each chunk:
  │     │               packer.AddChunk()
  │     │               If blob full → FinalizeBlob()
  │     │                 → handleBlobReady()
  │     │                 → s3Client.PutObjectWithProgress()
  │     │           packer.Flush()        // Final blob
  │     │
  │     └─► Accumulate statistics
  │
  ├─► SnapshotManager.UpdateSnapshotStatsExtended()
  │
  ├─► SnapshotManager.CompleteSnapshot()
  │
  └─► SnapshotManager.ExportSnapshotMetadata()
        │
        ├─► Copy database to temp file
        ├─► Clean to only current snapshot data
        ├─► Dump to SQL
        ├─► Compress with zstd
        ├─► Encrypt with age
        ├─► Upload db.zst.age to S3
        └─► Upload manifest.json.zst to S3
```
## Deduplication Strategy

1. **File-level**: Files unchanged since the last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)

2. **Chunk-level**: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.

3. **Blob-level**: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
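The file-level check can be sketched as a straight comparison of the listed metadata fields (the `fileMeta` struct here is an illustrative stand-in, not Vaultik's actual types):

```go
package main

import "fmt"

// fileMeta holds the fields compared for file-level deduplication:
// if none of them changed since the last snapshot, the file's chunks
// are assumed unchanged and the file is skipped entirely.
type fileMeta struct {
	size      int64
	mtimeUnix int64
	mode      uint32
	uid, gid  int
}

// unchanged reports whether the file can be skipped.
func unchanged(prev, cur fileMeta) bool {
	return prev == cur
}

func main() {
	prev := fileMeta{size: 1024, mtimeUnix: 1700000000, mode: 0644, uid: 1000, gid: 1000}
	cur := prev
	fmt.Println(unchanged(prev, cur)) // true

	cur.mtimeUnix++ // a touched mtime forces re-chunking
	fmt.Println(unchanged(prev, cur)) // false
}
```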
## Storage Layout in S3

```
bucket/
├── blobs/
│   └── {hash[0:2]}/
│       └── {hash[2:4]}/
│           └── {full-hash}          # Compressed+encrypted blob
│
└── metadata/
    └── {snapshot-id}/
        ├── db.zst.age               # Encrypted database dump
        └── manifest.json.zst        # Blob list (for verification)
```

## Thread Safety

- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses the `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.
---

**CLAUDE.md** (+16 lines)

```
@@ -10,6 +10,9 @@ Read the rules in AGENTS.md and follow them.

  corporate advertising for Anthropic and is therefore completely
  unacceptable in commit messages.

* NEVER use `git add -A`. Always add only the files you intentionally
  changed.

* Tests should always be run before committing code. No commits should be
  made that do not pass tests.

@@ -26,3 +29,16 @@ Read the rules in AGENTS.md and follow them.

* Do not stop working on a task until you have reached the definition of
  done provided to you in the initial instruction. Don't do part or most of
  the work, do all of the work until the criteria for done are met.

* We do not need to support migrations; schema upgrades can be handled by
  deleting the local state file and doing a full backup to re-create it.

* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
  estimate about 4 seconds per gigabyte of backup time.

* When running tests, don't run individual tests, or grep the output. Run
  the entire test suite every time and read the full output.

* When running tests, don't run individual tests, or try to grep the output.
  Never run "go test". Only ever run "make test" to run the full test
  suite, and examine the full output.
```
---

**DESIGN.md** (deleted, -385 lines)

# vaultik: Design Document

`vaultik` is a secure backup tool written in Go. It performs streaming backups using content-defined chunking, blob grouping, asymmetric encryption, and object storage. The system is designed for environments where the backup source host cannot store secrets and cannot retrieve or decrypt any data from the destination.

The source host is **stateful**: it maintains a local SQLite index to detect changes, deduplicate content, and track uploads across backup runs. All remote storage is encrypted and append-only. Pruning of unreferenced data is done from a trusted host with access to decryption keys, as even the metadata indices are encrypted in the blob store.

---

## Why

ANOTHER backup tool??

Other backup tools like `restic`, `borg`, and `duplicity` are designed for environments where the source host can store secrets and has access to decryption keys. I don't want to store backup decryption keys on my hosts, only public keys for encryption.

My requirements are:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* S3-compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## Design Goals

1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and a private key.
4. Prune must be possible, although this requires a private key and so must be done on a different host.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age) (X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs. This reduces the number of objects in the blob store for filesystems with many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is chunked into multiple encrypted `.age` parts, to support large filesystems.
10. The CLI interface is structured using `cobra`.

---

## S3 Bucket Layout

S3 stores only four things:

1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes referenced in the snapshot, enabling pruning without decryption.

```
s3://<bucket>/<prefix>/
├── blobs/
│   ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│   ├── <snapshot_id>.sqlite.age
│   ├── <snapshot_id>.sqlite.00.age
│   ├── <snapshot_id>.sqlite.01.age
│   ├── <snapshot_id>.manifest.json.zst
```
To retrieve a given file, you would:

* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct the full database file
* verify that the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs

If clever, it may be possible to do this chunk by chunk without touching disk (except for the output file), as each uncompressed blob should fit in memory (<10GB).

### Path Rules

* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first 2 hex bytes of the blob hash.

### Blob Manifest Format

The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:

```json
{
  "snapshot_id": "2023-10-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0...",
    ...
  ]
}
```
This allows pruning operations to determine which blobs are referenced without requiring decryption keys.

---

## 3. Local SQLite Index Schema (source host)

```sql
CREATE TABLE files (
    path TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL
);

-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
    path TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (path, idx)
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    blob_hash TEXT PRIMARY KEY,
    final_hash TEXT NOT NULL,
    created_ts INTEGER NOT NULL
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_path)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    created_ts INTEGER NOT NULL,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL
);
```
---

## 4. Snapshot Metadata Schema (stored in S3)

Identical schema to the local index, filtered to live snapshot state. Stored as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than a configured `chunk_size`, it is split and uploaded as:

```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```

---

## 5. Data Flow

### 5.1 Backup

1. Load config
2. Open local SQLite index
3. Walk source directories:
   * For each file:
     * Check mtime and size in index
     * If changed or new:
       * Chunk file
       * For each chunk:
         * Hash with SHA256
         * Check if already uploaded
         * If not:
           * Add chunk to blob packer
           * Record file-chunk mapping in index
4. When a blob reaches the threshold size (e.g. 1GB):
   * Compress with `zstd`
   * Encrypt with `age`
   * Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
   * Record blob-chunk layout in local index
5. Once all files are processed:
   * Build snapshot SQLite DB from index delta
   * Compress + encrypt
   * If larger than `chunk_size`, split into parts
   * Upload to: `s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create a snapshot record in the local index that lists:
   * snapshot ID
   * hostname
   * vaultik version
   * timestamp
   * counts of files, chunks, and blobs
   * list of all blobs referenced in the snapshot (some new, some old) for efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in the snapshot
13. Compress the manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the snapshot, based on the local state db
### 5.2 Manual Prune

1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract the set of referenced blob hashes from the manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
   * If the hash is not in the manifest:
     * Issue `DeleteObject` to remove it
### 5.3 Verify

Verify runs on a host that has no state, but access to the bucket.

1. Fetch the latest metadata snapshot files from S3
2. Fetch the latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
   * Fetch the blob metadata from the snapshot db
   * Ensure the blob exists in S3
   * Check the S3 content hash matches the expected blob hash
   * If not using --quick mode:
     * Download and decrypt the blob
     * Decompress and verify chunk hashes match metadata

---

## 6. CLI Commands

```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```

* `VAULTIK_PRIVATE_KEY` is required for the `restore`, `prune`, `verify`, and `fetch` commands.
* It is passed via an environment variable containing the age private key.
---

## 7. Function and Method Signatures

### 7.1 CLI

```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```

### 7.2 Configuration

```go
type Config struct {
	BackupPubKey      string        // age recipient
	BackupInterval    time.Duration // used in daemon mode, irrelevant for cron mode
	BlobSizeLimit     int64         // default 10GB
	ChunkSize         int64         // default 10MB
	Exclude           []string      // list of regexes of files to exclude from backup, absolute paths
	Hostname          string
	IndexPath         string        // path to local SQLite index db, default /var/lib/vaultik/index.db
	MetadataPrefix    string        // S3 prefix for metadata, default "metadata/"
	MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour - for daemon mode
	S3                S3Config      // S3 configuration
	ScanInterval      time.Duration // interval for a full stat() scan of source dirs, default 24h
	SourceDirs        []string      // list of source directories to back up, absolute paths
}

type S3Config struct {
	Endpoint        string
	Bucket          string
	Prefix          string
	AccessKeyID     string
	SecretAccessKey string
	Region          string
}

func Load(path string) (*Config, error)
```

### 7.3 Index

```go
type Index struct {
	db *sql.DB
}

func OpenIndex(path string) (*Index, error)

func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```

### 7.4 Blob Packing

```go
type BlobWriter struct {
	// internal buffer, current size, encrypted writer, etc.
}

func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```

### 7.5 Metadata

```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```

### 7.6 Prune

```go
func RunPrune(bucket, prefix, privateKey string) error
```
61
Dockerfile
Normal file
@@ -0,0 +1,61 @@
# Lint stage
# golangci/golangci-lint:v2.11.3-alpine, 2026-03-17
FROM golangci/golangci-lint:v2.11.3-alpine@sha256:b1c3de5862ad0a95b4e45a993b0f00415835d687e4f12c845c7493b86c13414e AS lint

RUN apk add --no-cache make build-base

WORKDIR /src

# Copy go mod files first for better layer caching
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Run formatting check and linter
RUN make fmt-check
RUN make lint

# Build stage
# golang:1.26.1-alpine, 2026-03-17
FROM golang:1.26.1-alpine@sha256:2389ebfa5b7f43eeafbd6be0c3700cc46690ef842ad962f6c5bd6be49ed82039 AS builder

# Depend on lint stage passing
COPY --from=lint /src/go.sum /dev/null

ARG VERSION=dev

# Install build dependencies for CGO (mattn/go-sqlite3) and sqlite3 CLI (tests)
RUN apk add --no-cache make build-base sqlite

WORKDIR /src

# Copy go mod files first for better layer caching
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Run tests
RUN make test

# Build with CGO enabled (required for mattn/go-sqlite3)
RUN CGO_ENABLED=1 go build -ldflags "-X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=${VERSION}' -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(git rev-parse HEAD 2>/dev/null || echo unknown)'" -o /vaultik ./cmd/vaultik

# Runtime stage
# alpine:3.21, 2026-02-25
FROM alpine:3.21@sha256:c3f8e73fdb79deaebaa2037150150191b9dcbfba68b4a46d70103204c53f4709

RUN apk add --no-cache ca-certificates sqlite

# Copy binary from builder
COPY --from=builder /vaultik /usr/local/bin/vaultik

# Create non-root user
RUN adduser -D -H -s /sbin/nologin vaultik

USER vaultik

ENTRYPOINT ["/usr/local/bin/vaultik"]
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Jeffrey Paul <sneak@sneak.berlin>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
55
Makefile
@@ -1,26 +1,25 @@
-.PHONY: test fmt lint build clean all
+.PHONY: test fmt lint fmt-check check build clean all docker hooks
 
-# Version number
-VERSION := 0.0.1
-
+# Build variables
+VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
-COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
+GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
 
 # Linker flags
 LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(COMMIT)'
+           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
 
 # Default target
-all: test
+all: vaultik
 
 # Run tests
-test: lint fmt-check
-	go test -v ./...
+test:
+	go test -race -timeout 30s ./...
 
-# Check if code is formatted
+# Check if code is formatted (read-only)
 fmt-check:
-	@if [ -n "$$(go fmt ./...)" ]; then \
-		echo "Error: Code is not formatted. Run 'make fmt' to fix."; \
-		exit 1; \
-	fi
+	@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
 
 # Format code
 fmt:
@@ -28,22 +27,17 @@ fmt:
 
 # Run linter
 lint:
-	golangci-lint run
+	golangci-lint run ./...
 
 # Build binary
-build:
-	go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
+vaultik: internal/*/*.go cmd/vaultik/*.go
+	go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
 
 # Clean build artifacts
 clean:
 	rm -f vaultik
 	go clean
 
 # Install dependencies
 deps:
 	go mod download
 	go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
 
 # Run tests with coverage
 test-coverage:
 	go test -v -coverprofile=coverage.out ./...
@@ -52,3 +46,24 @@ test-coverage:
 
 # Run integration tests
 test-integration:
 	go test -v -tags=integration ./...
+
+local:
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
+
+install: vaultik
+	cp ./vaultik $(HOME)/bin/
+
+# Run all checks (formatting, linting, tests) without modifying files
+check: fmt-check lint test
+
+# Build Docker image
+docker:
+	docker build -t vaultik .
+
+# Install pre-commit hook
+hooks:
+	@printf '#!/bin/sh\nset -e\n' > .git/hooks/pre-commit
+	@printf 'go mod tidy\ngo fmt ./...\ngit diff --exit-code -- go.mod go.sum || { echo "go mod tidy changed files; please stage and retry"; exit 1; }\n' >> .git/hooks/pre-commit
+	@printf 'make check\n' >> .git/hooks/pre-commit
+	@chmod +x .git/hooks/pre-commit
556
PROCESS.md
Normal file
@@ -0,0 +1,556 @@
# Vaultik Snapshot Creation Process

This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.

## Database Schema Overview

### Tables and Foreign Key Dependencies

```
┌─────────────────────────────────────────────────────────────────────────┐
│ FOREIGN KEY GRAPH │
│ │
│ snapshots ◄────── snapshot_files ────────► files │
│ │ │ │
│ └───────── snapshot_blobs ────────► blobs │ │
│ │ │ │
│ │ ├──► file_chunks ◄── chunks│
│ │ │ ▲ │
│ │ └──► chunk_files ────┘ │
│ │ │
│ └──► blob_chunks ─────────────┘│
│ │
│ uploads ───────► blobs.blob_hash │
│ └──────────► snapshots.id │
└─────────────────────────────────────────────────────────────────────────┘
```

### Critical Constraint: `chunks` Must Exist First

These tables reference `chunks.chunk_hash` **without CASCADE**:

- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`

**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
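The constraint can be demonstrated directly with the `sqlite3` CLI. The two-table schema below is illustrative, not the real one; with `PRAGMA foreign_keys = ON`, inserting a `file_chunks` row before its `chunks` row fails with a FOREIGN KEY constraint error:

```shell
sqlite3 :memory: <<'SQL'
PRAGMA foreign_keys = ON;
CREATE TABLE chunks (chunk_hash TEXT PRIMARY KEY, size INTEGER NOT NULL);
CREATE TABLE file_chunks (
  file_id    TEXT NOT NULL,
  idx        INTEGER NOT NULL,
  chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash)
);
-- Fails: the referenced chunks row does not exist yet.
INSERT INTO file_chunks VALUES ('f1', 0, 'deadbeef');
SQL
```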
### Order of Operations Required by Schema

```
1. snapshots      (created first, before scan)
2. blobs          (created when packer starts new blob)
3. chunks         (created during file processing)
4. blob_chunks    (created immediately after chunk added to packer)
5. files          (created after file fully chunked)
6. file_chunks    (created with file record)
7. chunk_files    (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads       (created after blob uploaded)
```

---

## Snapshot Creation Phases

### Phase 0: Initialization

**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table

**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
    COMMIT
```

---

### Phase 1: Scan Directory

**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing

**Transactions:**
```
(None during scan - all in-memory)
```
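The classification step can be sketched as follows. The names `knownFile` and `classify` are illustrative, not the actual scanner identifiers; the real code compares against the in-memory `knownFiles` map loaded in Phase 0:

```go
package main

import "fmt"

// knownFile holds the fields Phase 1 compares: mtime and size as recorded
// in the local files table. Field names are illustrative.
type knownFile struct {
	mtime int64
	size  int64
}

// classify implements the Phase 1 decision: unchanged if metadata matches,
// modified if the path is known but metadata differs, new otherwise.
func classify(known map[string]knownFile, path string, mtime, size int64) string {
	kf, ok := known[path]
	switch {
	case !ok:
		return "new"
	case kf.mtime == mtime && kf.size == size:
		return "unchanged"
	default:
		return "modified"
	}
}

func main() {
	known := map[string]knownFile{
		"/etc/hosts":  {mtime: 100, size: 220},
		"/etc/passwd": {mtime: 100, size: 900},
	}
	fmt.Println(classify(known, "/etc/hosts", 100, 220))  // unchanged
	fmt.Println(classify(known, "/etc/passwd", 200, 905)) // modified
	fmt.Println(classify(known, "/etc/group", 100, 300))  // new
}
```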
---

### Phase 1b: Associate Unchanged Files

**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000

**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
   INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
   ... (up to 1000 inserts)
   COMMIT
```

---

### Phase 2: Process Files

For each file that needs processing:

#### Step 2a: Open and Chunk File

**Location:** `processFileStreaming()`

For each chunk produced by content-defined chunking:

##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
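The in-memory lookup can be sketched as a mutex-guarded set. Names here are illustrative, not the actual scanner fields; the important property is that `addKnownChunk` is only called after the chunk row has been committed, so a positive `chunkExists` answer implies the FK target is durable:

```go
package main

import (
	"fmt"
	"sync"
)

// chunkCache sketches the in-memory known-chunks set behind chunkExists /
// addKnownChunk. An RWMutex keeps lookups cheap under concurrent scanning.
type chunkCache struct {
	mu     sync.RWMutex
	hashes map[string]struct{}
}

func newChunkCache() *chunkCache {
	return &chunkCache{hashes: make(map[string]struct{})}
}

func (c *chunkCache) chunkExists(hash string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.hashes[hash]
	return ok
}

// addKnownChunk is called only after the chunk row has been committed.
func (c *chunkCache) addKnownChunk(hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hashes[hash] = struct{}{}
}

func main() {
	c := newChunkCache()
	fmt.Println(c.chunkExists("abc")) // false
	c.addKnownChunk("abc")
	fmt.Println(c.chunkExists("abc")) // true
}
```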
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
    return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns

// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```

**Transaction:**
```
T_chunk: BEGIN
    INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
    COMMIT
```

##### Step 2a-3: Add Chunk to Packer

```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```

**Inside packer.AddChunk → addChunkToCurrentBlob():**

```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
    blobChunk := &database.BlobChunk{
        BlobID:    p.currentBlob.id,
        ChunkHash: chunk.Hash,
        Offset:    offset,
        Length:    chunkSize,
    }
    err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
        return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
    })
    // COMMIT immediately
}
```

**Transaction:**
```
T_blob_chunk: BEGIN
    INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
    COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.

---

#### Step 2b: Blob Size Limit Handling

If adding a chunk would exceed the blob size limit:

```go
if err == blob.ErrBlobSizeLimitExceeded {
    if err := s.packer.FinalizeBlob(); err != nil { ... }
    // Retry adding the chunk
    if err := s.packer.AddChunk(...); err != nil { ... }
}
```

**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
    UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
    COMMIT
```

Then the blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)

T_blob_uploaded: BEGIN
    UPDATE blobs SET uploaded_ts=? WHERE id=?
    INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
    INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
    COMMIT
```

---
#### Step 2c: Queue File for Batch Insertion

After all chunks for a file are processed:

```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))

// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
    file:       fileToProcess.File,
    fileChunks: fileChunks,
    chunkFiles: chunkFiles,
})
```

**No transaction yet** - this just appends to the `pendingFiles` slice.

If `len(pendingFiles) >= fileBatchSize (100)`, this triggers `flushPendingFiles()`.

---

#### Step 2d: Flush Pending Files

**Location:** `flushPendingFiles()` - called when the batch is full or at the end of processing

```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    for _, data := range files {
        // 1. Create file record
        s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE

        // 2. Delete old associations
        s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
        s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)

        // 3. Create file_chunks records
        for _, fc := range data.fileChunks {
            s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
        }

        // 4. Create chunk_files records
        for _, cf := range data.chunkFiles {
            s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
        }

        // 5. Add file to snapshot
        s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
    }
    return nil
})
// COMMIT (all or nothing for the batch)
```

**Transaction:**
```
T_files_batch: BEGIN
    -- For each file in batch:
    INSERT OR REPLACE INTO files (...) VALUES (...)
    DELETE FROM file_chunks WHERE file_id = ?
    DELETE FROM chunk_files WHERE file_id = ?
    INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?) -- FK: chunks
    INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
    INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
    -- Repeat for each file
    COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.

---

### Phase 2 End: Final Flush

```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }

// Final packer flush
s.packer.Flush()
```

---
## The Current Bug

### Problem

The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, and if the chunks have not been committed yet, the FK constraint fails.

### Why It's Happening

Looking at the sequence:

1. Process file A, chunk X
2. Create chunk X in DB (transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)

The chunks ARE being created individually. But something is going wrong.

### Actual Issue

Wait - let me re-read the code. The issue is:

In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
    FileID:    fileToProcess.File.ID,
    Idx:       ci.fileChunk.Idx,
    ChunkHash: ci.fileChunk.ChunkHash,
}
```

The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!

Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
    return file, true // file.ID is empty string!
}

// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```

**For NEW files, `file.ID` is empty!**

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```

But `data.fileChunks` was built with the EMPTY ID!

### The Real Problem

For new files:

1. `checkFileInMemory` creates a file record with an empty ID
2. `processFileStreaming` queues file_chunks with an empty `FileID`
3. `flushPendingFiles` creates the file (generating an ID), but the queued file_chunks still carry the empty `FileID`

Wait, but `Files.Create` should be INSERT OR REPLACE by path, and the file struct should get updated... Let me check.

Actually, looking more carefully at the code path - the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID. The ID isn't updated after the file is created.

Hmm, but looking at the current code:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```

And in `checkFileInMemory` for new files, we create a file struct but don't set the ID. However, looking at the database repository, `Files.Create` should be doing `INSERT OR REPLACE` and the ID should be pre-generated...

Let me check if IDs are being generated. Looking at the File struct usage, it seems like UUIDs should be generated somewhere...

Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```

Error 787 is SQLite's extended result code `SQLITE_CONSTRAINT_FOREIGNKEY`. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.

So the chunks are NOT in the database when we try to insert file_chunks. Let me trace through more carefully...

---
## Transaction Timing Issue

The problem is transaction visibility in SQLite.

Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:

1. Chunk transactions commit one at a time
2. The file batch transaction runs later

If chunks are being inserted but something goes wrong with transaction isolation, the file batch might not see them.

But actually SQLite in WAL mode should have SERIALIZABLE isolation by default, so committed transactions should be visible.

Let me check if the in-memory cache is masking a database problem...

Actually, wait. Let me re-check the current broken code more carefully. The issue might be simpler.

---

## Current Code Flow Analysis

Looking at `processFileStreaming` in the current broken state:

```go
// For each chunk:
if !chunkExists {
    err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
        dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
        return s.repos.Chunks.Create(txCtx, tx, dbChunk)
    })
    // ... check error ...
    s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```

Then at the end of the function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
    file:       fileToProcess.File,
    fileChunks: fileChunks,
    chunkFiles: chunkFiles,
})
```

At the end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```

The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.

Unless... there's a bug in how the chunks are being referenced. Let me check if the chunk_hash values are correct.

Or... maybe the test database is being recreated between operations somehow?

Actually, let me check the test setup. Maybe the issue is specific to the test environment.

---
## Summary of Object Lifecycle

| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |

---

## Root Cause Analysis

After detailed analysis, I believe the issue is one of the following:

### Hypothesis 1: File ID Not Set

Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
    return file, true // file.ID is empty string!
}
```

For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // Empty for new files!
    ...
}
```

The `FileID` in the built `fileChunks` slice is empty.

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
    s.repos.FileChunks.Create(...) // Uses empty FileID
}
```

**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
    ID:   uuid.New().String(), // Generate ID immediately
    Path: path,
    ...
}
```

### Hypothesis 2: Transaction Isolation

SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions. Committed data should be visible to subsequent transactions.

However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.

## Recommended Fix

**Step 1: Generate file IDs upfront**

In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
    ID:   uuid.New().String(), // Always generate ID
    Path: path,
    ...
}
```

This ensures `file.ID` is set when building the `fileChunks` and `chunkFiles` slices.

**Step 2: Verify by reverting to per-file transactions**

If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:

```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})

// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    // Create file
    s.repos.Files.Create(txCtx, tx, fileToProcess.File)
    // Delete old associations
    s.repos.FileChunks.DeleteByFileID(...)
    s.repos.ChunkFiles.DeleteByFileID(...)
    // Create new associations
    for _, fc := range fileChunks {
        s.repos.FileChunks.Create(...)
    }
    for _, cf := range chunkFiles {
        s.repos.ChunkFiles.Create(...)
    }
    // Add to snapshot
    s.repos.Snapshots.AddFileByID(...)
    return nil
})
```

**Step 3: If batching is still desired**

After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.
484
README.md
@@ -1,11 +1,64 @@
-# vaultik
+# vaultik (ваултик)
 
-`vaultik` is a incremental backup daemon written in Go. It
-encrypts data using an `age` public key and uploads each encrypted blob
-directly to a remote S3-compatible object store. It requires no private
-keys, secrets, or credentials stored on the backed-up system.
+WIP: pre-1.0, some functions may not be fully implemented yet
 
----
+`vaultik` is an incremental backup daemon written in Go. It encrypts data
+using an `age` public key and uploads each encrypted blob directly to a
+remote S3-compatible object store. It requires no private keys, secrets, or
+credentials (other than those required to PUT to encrypted object storage,
+such as S3 API keys) stored on the backed-up system.
+
+It includes table-stakes features such as:
+
+* modern encryption (the excellent `age`)
+* deduplication
+* incremental backups
+* modern multithreaded zstd compression with configurable levels
+* content-addressed immutable storage
+* local state tracking in standard SQLite database, enables write-only
+  incremental backups to destination
+* no mutable remote metadata
+* no plaintext file paths or metadata stored in remote
+* does not create huge numbers of small files (to keep S3 operation counts
+  down) even if the source system has many small files
+
+## why
+
+Existing backup software fails under one or more of these conditions:
+
+* Requires secrets (passwords, private keys) on the source system, which
+  compromises encrypted backups in the case of host system compromise
+* Depends on symmetric encryption unsuitable for zero-trust environments
+* Creates one-blob-per-file, which results in excessive S3 operation counts
+* is slow
+
+Other backup tools like `restic`, `borg`, and `duplicity` are designed for
+environments where the source host can store secrets and has access to
+decryption keys. I don't want to store backup decryption keys on my hosts,
+only public keys for encryption.
+
+My requirements are:
+
+* open source
+* no passphrases or private keys on the source host
+* incremental
+* compressed
+* encrypted
+* s3 compatible without an intermediate step or tool
+
+Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
+
+## design goals
+
+1. Backups must require only a public key on the source host.
+1. No secrets or private keys may exist on the source system.
+1. Restore must be possible using **only** the backup bucket and a private key.
+1. Prune must be possible (requires private key, done on different hosts).
+1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
+1. Compression uses `zstd` at a configurable level.
+1. Files are chunked, and multiple chunks are packed into encrypted blobs
+   to reduce object count for filesystems with many small files.
+1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
 
 ## what
@@ -13,29 +66,12 @@ keys, secrets, or credentials stored on the backed-up system.
 
 content-addressable chunk map of changed files using deterministic chunking.
 Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
 encrypted with `age`, and uploaded directly to remote storage under a
-content-addressed S3 path.
+content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
+database of metadata is created, encrypted, and uploaded alongside the
+blobs.
 
-No plaintext file contents ever hit disk. No private key is needed or stored
-locally. All encrypted data is streaming-processed and immediately discarded
-once uploaded. Metadata is encrypted and pushed with the same mechanism.
-
-## why
-
-Existing backup software fails under one or more of these conditions:
-
-* Requires secrets (passwords, private keys) on the source system
-* Depends on symmetric encryption unsuitable for zero-trust environments
-* Stages temporary archives or repositories
-* Writes plaintext metadata or plaintext file paths
-
-`vaultik` addresses all of these by using:
-
-* Public-key-only encryption (via `age`) requires no secrets (other than
-  bucket access key) on the source system
-* Blob-level deduplication and batching
-* Local state cache for incremental detection
-* S3-native chunked upload interface
-* Self-contained encrypted snapshot metadata
+No plaintext file contents ever hit disk. No private key or secret
+passphrase is needed or stored locally.
 
 ## how
@@ -45,23 +81,38 @@ Existing backup software fails under one or more of these conditions:
 
 go install git.eeqj.de/sneak/vaultik@latest
 ```
 
-2. **generate keypair**
+1. **generate keypair**
 
 ```sh
 age-keygen -o agekey.txt
 grep 'public key:' agekey.txt
 ```
 
-3. **write config**
+1. **write config**
 
 ```yaml
-source_dirs:
+# Named snapshots - each snapshot can contain multiple paths
+snapshots:
+  system:
+    paths:
+      - /etc
+      - /home/user/data
+      - /var/lib
+    exclude:
+      - '*.cache' # Snapshot-specific exclusions
+  home:
+    paths:
+      - /home/user/documents
+      - /home/user/photos
+
+# Global exclusions (apply to all snapshots)
 exclude:
   - '*.log'
   - '*.tmp'
-age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+  - '.git'
+  - 'node_modules'
+
+age_recipients:
+  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
 s3:
   endpoint: https://s3.example.com
   bucket: vaultik-data
@@ -69,28 +120,24 @@ Existing backup software fails under one or more of these conditions:
|
||||
access_key_id: ...
|
||||
secret_access_key: ...
|
||||
region: us-east-1
|
||||
backup_interval: 1h # only used in daemon mode, not for --cron mode
|
||||
full_scan_interval: 24h # normally we use inotify to mark dirty, but
|
||||
# every 24h we do a full stat() scan
|
||||
min_time_between_run: 15m # again, only for daemon mode
|
||||
index_path: /var/lib/vaultik/index.sqlite
|
||||
backup_interval: 1h
|
||||
full_scan_interval: 24h
|
||||
min_time_between_run: 15m
|
||||
chunk_size: 10MB
|
||||
blob_size_limit: 10GB
|
||||
index_prefix: index/
|
||||
blob_size_limit: 1GB
|
||||
```
|
||||
|
||||
4. **run**
|
||||
1. **run**
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml
|
||||
```
|
||||
# Create all configured snapshots
|
||||
vaultik --config /etc/vaultik.yaml snapshot create
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml --cron # silent unless error
|
||||
```
|
||||
# Create specific snapshots by name
|
||||
vaultik --config /etc/vaultik.yaml snapshot create home system
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
|
||||
# Silent mode for cron
|
||||
vaultik --config /etc/vaultik.yaml snapshot create --cron
|
||||
```
|
||||
|
||||
---
|
||||
@@ -100,54 +147,233 @@ Existing backup software fails under one or more of these conditions:

### commands

```sh
vaultik backup [--config <path>] [--cron] [--daemon]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```

### environment

* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file. If set, `vaultik backup` can be run without specifying the config file path.
* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.

### command details

**backup**: Perform incremental backup of configured directories
**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* `--config`: Override config file path
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup

**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt

**snapshot prune**: Clean orphaned data from local database

**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, mtime, and ownership (ownership requires root)
* Handles symlinks and directories
* Note: ctime cannot be restored (see [platform notes](#platform-specific-ctime-semantics))

**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs

**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename

**info**: Display system and configuration information

**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes

**store info**: Display S3 bucket configuration and storage statistics

---

## architecture

### s3 bucket layout

```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    ├── <snapshot_id>/
    │   ├── db.zst.age
    │   └── manifest.json.zst
```

* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning

### blob manifest format

The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:

```json
{
  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0..."
  ]
}
```

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).

### local sqlite schema

```sql
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    source_path TEXT NOT NULL DEFAULT '',
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    link_target TEXT
);

CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT NOT NULL UNIQUE,
    uncompressed INTEGER NOT NULL,
    compressed INTEGER NOT NULL,
    uploaded_at INTEGER
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL,
    total_size INTEGER NOT NULL,
    blob_size INTEGER NOT NULL,
    compression_ratio REAL NOT NULL
);

CREATE TABLE snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id)
);

CREATE TABLE snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id)
);
```

### data flow

#### backup

1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support

#### restore

1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid (ctime cannot be restored; see platform notes below)

### platform-specific ctime semantics

The `ctime` field in the files table stores a platform-dependent timestamp:

* **macOS (Darwin)**: `ctime` is the file's **birth time** — when the file was
  first created on disk. This value never changes after file creation, even if
  the file's content or metadata is modified.

* **Linux**: `ctime` is the **inode change time** — the last time the file's
  metadata (permissions, ownership, link count, etc.) was modified. This is NOT
  the file creation time. Linux did not expose birth time (via `statx(2)`) until
  kernel 4.11, and Go's `syscall` package does not yet surface it.

**Restore limitation**: `ctime` cannot be restored on either platform. On Linux,
the kernel manages the inode change time and userspace cannot set it. On macOS,
there is no standard POSIX API to set birth time. The `ctime` value is preserved
in the snapshot database for informational/forensic purposes only.

#### prune

1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set

### chunking

* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency

@@ -158,19 +384,13 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Each blob encrypted independently
* Metadata databases also encrypted

### storage
### compression

* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency

### state tracking

* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode

---

## does not

@@ -180,8 +400,6 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Require a symmetric passphrase or password
* Trust the source system with anything

---

## does

* Incremental deduplicated backup
@@ -193,90 +411,22 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]

---

## restore
## requirements

`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:

* `VAULTIK_PRIVATE_KEY`
* The bucket

The entire system is restore-only from object storage.

---

## features

### daemon mode

* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)

### cron mode

* Single backup run
* Silent output unless errors
* Ideal for scheduled backups

### metadata integrity

* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems

### exclusion patterns

* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk

## prune

Run `vaultik prune` on a machine with the private key. It:

* Downloads the most recent snapshot
* Decrypts metadata
* Lists referenced blobs
* Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.

---
* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

## license

WTFPL — see LICENSE.

---

## security considerations

* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix

## performance

* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels

## requirements

* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index

[MIT](https://opensource.org/license/mit/)

## author

sneak
[sneak@sneak.berlin](mailto:sneak@sneak.berlin)
[https://sneak.berlin](https://sneak.berlin)

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)
212 TODO.md
@@ -1,112 +1,128 @@

# Implementation TODO
# Vaultik 1.0 TODO

## Local Index Database
1. Implement SQLite schema creation
1. Create Index type with all database operations
1. Add transaction support and proper locking
1. Implement file tracking (save, lookup, delete)
1. Implement chunk tracking and deduplication
1. Implement blob tracking and chunk-to-blob mapping
1. Write tests for all index operations

Linear list of tasks to complete before 1.0 release.

## Chunking and Hashing
1. Implement Rabin fingerprint chunker
1. Create streaming chunk processor
1. Implement SHA256 hashing for chunks
1. Add configurable chunk size parameters
1. Write tests for chunking consistency

## Rclone Storage Backend (Complete)

## Compression and Encryption
1. Implement zstd compression wrapper
1. Integrate age encryption library
1. Create Encryptor type for public key encryption
1. Create Decryptor type for private key decryption
1. Implement streaming encrypt/decrypt pipelines
1. Write tests for compression and encryption

Add rclone as a storage backend via Go library import, allowing vaultik to use any of rclone's 70+ supported cloud storage providers.

## Blob Packing
1. Implement BlobWriter with size limits
1. Add chunk accumulation and flushing
1. Create blob hash calculation
1. Implement proper error handling and rollback
1. Write tests for blob packing scenarios

**Configuration:**
```yaml
storage_url: "rclone://myremote/path/to/backups"
```
User must have rclone configured separately (via `rclone config`).

## S3 Operations
1. Integrate MinIO client library
1. Implement S3Client wrapper type
1. Add multipart upload support for large blobs
1. Implement retry logic with exponential backoff
1. Add connection pooling and timeout handling
1. Write tests using MinIO container

**Implementation Steps:**
1. [x] Add rclone dependency to go.mod
2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
   - `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
   - `Put` / `PutWithProgress` - use `operations.Rcat()`
   - `Get` - use `fs.NewObject()` then `obj.Open()`
   - `Stat` - use `fs.NewObject()` for size/metadata
   - `Delete` - use `obj.Remove()`
   - `List` / `ListStream` - use `operations.ListFn()`
   - `Info` - return remote name
3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
5. [x] Test with real rclone remote

## Backup Command - Basic
1. Implement directory walking with exclusion patterns
1. Add file change detection using index
1. Integrate chunking pipeline for changed files
1. Implement blob upload coordination
1. Add progress reporting to stderr
1. Write integration tests for backup

**Error Mapping:**
- `fs.ErrorObjectNotFound` → `ErrNotFound`
- `fs.ErrorDirNotFound` → `ErrNotFound`
- `fs.ErrorNotFoundInConfigFile` → `ErrRemoteNotFound` (new)

## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations

---

## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore

## CLI Polish (Priority)

## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios

1. Improve error messages throughout
   - Ensure all errors include actionable context
   - Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")

## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification

## Security (Priority)

## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command

1. Audit encryption implementation
   - Verify age encryption is used correctly
   - Ensure no plaintext leaks in logs or errors
   - Verify blob hashes are computed correctly

## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior

1. Secure memory handling for secrets
   - Clear S3 credentials from memory after client init
   - Document that age_secret_key is env-var only (already implemented)

## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode

## Testing

## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline

1. Write integration tests for restore command

1. Write end-to-end integration test
   - Create backup
   - Verify backup
   - Restore backup
   - Compare restored files to originals

1. Add tests for edge cases
   - Empty directories
   - Symlinks
   - Special characters in filenames
   - Very large files (multi-GB)
   - Many small files (100k+)

1. Add tests for error conditions
   - Network failures during upload
   - Disk full during restore
   - Corrupted blobs
   - Missing blobs

## Performance

1. Profile and optimize restore performance
   - Parallel blob downloads
   - Streaming decompression/decryption
   - Efficient chunk reassembly

1. Add bandwidth limiting option
   - `--bwlimit` flag for upload/download speed limiting

## Documentation

1. Add man page or --help improvements
   - Detailed help for each command
   - Examples in help output

## Final Polish

1. Ensure version is set correctly in releases

1. Create release process
   - Binary releases for supported platforms
   - Checksums for binaries
   - Release notes template

1. Final code review
   - Remove debug statements
   - Ensure consistent code style

1. Tag and release v1.0.0

---

## Post-1.0 (Daemon Mode)

1. Implement inotify file watcher for Linux
   - Watch source directories for changes
   - Track dirty paths in memory

1. Implement FSEvents watcher for macOS
   - Watch source directories for changes
   - Track dirty paths in memory

1. Implement backup scheduler in daemon mode
   - Respect backup_interval config
   - Trigger backup when dirty paths exist and interval elapsed
   - Implement full_scan_interval for periodic full scans

1. Add proper signal handling for daemon
   - Graceful shutdown on SIGTERM/SIGINT
   - Complete in-progress backup before exit

1. Write tests for daemon mode
@@ -1,9 +1,41 @@
package main

import (
	"os"
	"runtime"
	"runtime/pprof"

	"git.eeqj.de/sneak/vaultik/internal/cli"
)

func main() {
	// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
	if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
		f, err := os.Create(cpuProfile)
		if err != nil {
			panic("could not create CPU profile: " + err.Error())
		}
		defer func() { _ = f.Close() }()
		if err := pprof.StartCPUProfile(f); err != nil {
			panic("could not start CPU profile: " + err.Error())
		}
		defer pprof.StopCPUProfile()
	}

	// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
	if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
		defer func() {
			f, err := os.Create(memProfile)
			if err != nil {
				panic("could not create memory profile: " + err.Error())
			}
			defer func() { _ = f.Close() }()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				panic("could not write memory profile: " + err.Error())
			}
		}()
	}

	cli.CLIEntry()
}
332 config.example.yml Normal file
@@ -0,0 +1,332 @@

# vaultik configuration file example
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need

# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Generate with: age-keygen | grep "public key"
age_recipients:
  - age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz

# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
  testing:
    paths:
      - ~/dev/vaultik
  apps:
    paths:
      - /Applications
    exclude:
      - "/App Store.app"
      - "/Apps.app"
      - "/Automator.app"
      - "/Books.app"
      - "/Calculator.app"
      - "/Calendar.app"
      - "/Chess.app"
      - "/Clock.app"
      - "/Contacts.app"
      - "/Dictionary.app"
      - "/FaceTime.app"
      - "/FindMy.app"
      - "/Font Book.app"
      - "/Freeform.app"
      - "/Games.app"
      - "/GarageBand.app"
      - "/Home.app"
      - "/Image Capture.app"
      - "/Image Playground.app"
      - "/Journal.app"
      - "/Keynote.app"
      - "/Mail.app"
      - "/Maps.app"
      - "/Messages.app"
      - "/Mission Control.app"
      - "/Music.app"
      - "/News.app"
      - "/Notes.app"
      - "/Numbers.app"
      - "/Pages.app"
      - "/Passwords.app"
      - "/Phone.app"
      - "/Photo Booth.app"
      - "/Photos.app"
      - "/Podcasts.app"
      - "/Preview.app"
      - "/QuickTime Player.app"
      - "/Reminders.app"
      - "/Safari.app"
      - "/Shortcuts.app"
      - "/Siri.app"
      - "/Stickies.app"
      - "/Stocks.app"
      - "/System Settings.app"
      - "/TV.app"
      - "/TextEdit.app"
      - "/Time Machine.app"
      - "/Tips.app"
      - "/Utilities/Activity Monitor.app"
      - "/Utilities/AirPort Utility.app"
      - "/Utilities/Audio MIDI Setup.app"
      - "/Utilities/Bluetooth File Exchange.app"
      - "/Utilities/Boot Camp Assistant.app"
      - "/Utilities/ColorSync Utility.app"
      - "/Utilities/Console.app"
      - "/Utilities/Digital Color Meter.app"
      - "/Utilities/Disk Utility.app"
      - "/Utilities/Grapher.app"
      - "/Utilities/Magnifier.app"
      - "/Utilities/Migration Assistant.app"
      - "/Utilities/Print Center.app"
      - "/Utilities/Screen Sharing.app"
      - "/Utilities/Screenshot.app"
      - "/Utilities/Script Editor.app"
      - "/Utilities/System Information.app"
      - "/Utilities/Terminal.app"
      - "/Utilities/VoiceOver Utility.app"
      - "/VoiceMemos.app"
      - "/Weather.app"
      - "/iMovie.app"
      - "/iPhone Mirroring.app"
  home:
    paths:
      - "~"
    exclude:
      - "/.Trash"
      - "/tmp"
      - "/Library/Caches"
      - "/Library/Accounts"
      - "/Library/AppleMediaServices"
      - "/Library/Application Support/AddressBook"
      - "/Library/Application Support/CallHistoryDB"
      - "/Library/Application Support/CallHistoryTransactions"
      - "/Library/Application Support/DifferentialPrivacy"
      - "/Library/Application Support/FaceTime"
      - "/Library/Application Support/FileProvider"
      - "/Library/Application Support/Knowledge"
      - "/Library/Application Support/com.apple.TCC"
      - "/Library/Application Support/com.apple.avfoundation/Frecents"
      - "/Library/Application Support/com.apple.sharedfilelist"
      - "/Library/Assistant/SiriVocabulary"
      - "/Library/Autosave Information"
      - "/Library/Biome"
      - "/Library/ContainerManager"
      - "/Library/Containers/com.apple.Home"
      - "/Library/Containers/com.apple.Maps/Data/Maps"
      - "/Library/Containers/com.apple.MobileSMS"
      - "/Library/Containers/com.apple.Notes"
      - "/Library/Containers/com.apple.Safari"
      - "/Library/Containers/com.apple.Safari.WebApp"
      - "/Library/Containers/com.apple.VoiceMemos"
      - "/Library/Containers/com.apple.archiveutility"
      - "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
      - "/Library/Containers/com.apple.mail"
      - "/Library/Containers/com.apple.news"
      - "/Library/Containers/com.apple.stocks"
      - "/Library/Cookies"
      - "/Library/CoreFollowUp"
      - "/Library/Daemon Containers"
      - "/Library/DoNotDisturb"
      - "/Library/DuetExpertCenter"
      - "/Library/Group Containers/com.apple.Home.group"
      - "/Library/Group Containers/com.apple.MailPersonaStorage"
      - "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
      - "/Library/Group Containers/com.apple.bird"
      - "/Library/Group Containers/com.apple.stickersd.group"
      - "/Library/Group Containers/com.apple.systempreferences.cache"
      - "/Library/Group Containers/group.com.apple.AppleSpell"
      - "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
      - "/Library/Group Containers/group.com.apple.DeviceActivity"
      - "/Library/Group Containers/group.com.apple.Journal"
      - "/Library/Group Containers/group.com.apple.ManagedSettings"
      - "/Library/Group Containers/group.com.apple.PegasusConfiguration"
      - "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
      - "/Library/Group Containers/group.com.apple.SiriTTS"
      - "/Library/Group Containers/group.com.apple.UserNotifications"
      - "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
      - "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
      - "/Library/Group Containers/group.com.apple.amsondevicestoraged"
      - "/Library/Group Containers/group.com.apple.appstoreagent"
      - "/Library/Group Containers/group.com.apple.calendar"
      - "/Library/Group Containers/group.com.apple.chronod"
      - "/Library/Group Containers/group.com.apple.contacts"
      - "/Library/Group Containers/group.com.apple.controlcenter"
      - "/Library/Group Containers/group.com.apple.corerepair"
      - "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
      - "/Library/Group Containers/group.com.apple.energykit"
      - "/Library/Group Containers/group.com.apple.feedback"
      - "/Library/Group Containers/group.com.apple.feedbacklogger"
      - "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
      - "/Library/Group Containers/group.com.apple.iCloudDrive"
      - "/Library/Group Containers/group.com.apple.icloud.fmfcore"
      - "/Library/Group Containers/group.com.apple.icloud.fmipcore"
      - "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
      - "/Library/Group Containers/group.com.apple.liveactivitiesd"
      - "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
      - "/Library/Group Containers/group.com.apple.mail"
      - "/Library/Group Containers/group.com.apple.mlhost"
      - "/Library/Group Containers/group.com.apple.moments"
|
||||
- "/Library/Group Containers/group.com.apple.news"
|
||||
- "/Library/Group Containers/group.com.apple.newsd"
|
||||
- "/Library/Group Containers/group.com.apple.notes"
|
||||
- "/Library/Group Containers/group.com.apple.notes.import"
|
||||
- "/Library/Group Containers/group.com.apple.photolibraryd.private"
|
||||
- "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
|
||||
- "/Library/Group Containers/group.com.apple.printtool"
|
||||
- "/Library/Group Containers/group.com.apple.private.translation"
|
||||
- "/Library/Group Containers/group.com.apple.reminders"
|
||||
- "/Library/Group Containers/group.com.apple.replicatord"
|
||||
- "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
|
||||
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
|
||||
- "/Library/Group Containers/group.com.apple.sharingd"
|
||||
- "/Library/Group Containers/group.com.apple.shortcuts"
|
||||
- "/Library/Group Containers/group.com.apple.siri.inference"
|
||||
- "/Library/Group Containers/group.com.apple.siri.referenceResolution"
|
||||
- "/Library/Group Containers/group.com.apple.siri.remembers"
|
||||
- "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
|
||||
- "/Library/Group Containers/group.com.apple.spotlight"
|
||||
- "/Library/Group Containers/group.com.apple.stocks"
|
||||
- "/Library/Group Containers/group.com.apple.stocks-news"
|
||||
- "/Library/Group Containers/group.com.apple.studentd"
|
||||
- "/Library/Group Containers/group.com.apple.swtransparency"
|
||||
- "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
|
||||
- "/Library/Group Containers/group.com.apple.tips"
|
||||
- "/Library/Group Containers/group.com.apple.tipsnext"
|
||||
- "/Library/Group Containers/group.com.apple.transparency"
|
||||
- "/Library/Group Containers/group.com.apple.usernoted"
|
||||
- "/Library/Group Containers/group.com.apple.weather"
|
||||
- "/Library/HomeKit"
|
||||
- "/Library/IdentityServices"
|
||||
- "/Library/IntelligencePlatform"
|
||||
- "/Library/Mail"
|
||||
- "/Library/Messages"
|
||||
- "/Library/Metadata/CoreSpotlight"
|
||||
- "/Library/Metadata/com.apple.IntelligentSuggestions"
|
||||
- "/Library/PersonalizationPortrait"
|
||||
- "/Library/Safari"
|
||||
- "/Library/Sharing"
|
||||
- "/Library/Shortcuts"
|
||||
- "/Library/StatusKit"
|
||||
- "/Library/Suggestions"
|
||||
- "/Library/Trial"
|
||||
- "/Library/Weather"
|
||||
- "/Library/com.apple.aiml.instrumentation"
|
||||
- "/Movies/TV"
|
||||
system:
  paths:
    - /
  exclude:
    # Virtual/transient filesystems
    - /proc
    - /sys
    - /dev
    - /run
    - /tmp
    - /var/tmp
    - /var/run
    - /var/lock
    - /var/cache
    - /media
    - /mnt
    # Swap
    - /swapfile
    - /swap.img
    # Package manager caches
    - /var/cache/apt
    - /var/cache/yum
    - /var/cache/dnf
    - /var/cache/pacman
    # Trash
    - "*/.local/share/Trash"

dev:
  paths:
    - /Users/user/dev
  exclude:
    - "**/node_modules"
    - "**/target"
    - "**/build"
    - "**/__pycache__"
    - "**/*.pyc"
    - "**/.venv"
    - "**/vendor"

# Global patterns to exclude from all backups
exclude:
  - "*.tmp"

# Storage URL - use either this OR the s3 section below
# Supports: s3://bucket/prefix, file:///path, rclone://remote/path
storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"

# S3-compatible storage configuration
#s3:
# # S3-compatible endpoint URL
# # Examples: https://s3.amazonaws.com, https://storage.googleapis.com
# endpoint: http://10.100.205.122:8333
#
# # Bucket name where backups will be stored
# bucket: testbucket
#
# # Prefix (folder) within the bucket for this host's backups
# # Useful for organizing backups from multiple hosts
# # Default: empty (root of bucket)
# #prefix: "hosts/myserver/"
#
# # S3 access credentials
# access_key_id: Z9GT22M9YFU08WRMC5D4
# secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
#
# # S3 region
# # Default: us-east-1
# #region: us-east-1
#
# # Use SSL/TLS for S3 connections
# # Default: true
# #use_ssl: true
#
# # Part size for multipart uploads
# # Minimum 5MB, affects memory usage during upload
# # Supports: 5MB, 10M, 100MiB, etc.
# # Default: 5MB
# #part_size: 5MB

# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc.
# Default: 1h
#backup_interval: 1h

# How often to do a full filesystem scan in daemon mode
# Between full scans, inotify is used to detect changes
# Default: 24h
#full_scan_interval: 24h

# Minimum time between backup runs in daemon mode
# Prevents backups from running too frequently
# Default: 15m
#min_time_between_run: 15m

# Path to local SQLite index database
# This database tracks file state for incremental backups
# Default: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite

# Average chunk size for content-defined chunking
# Smaller chunks = better deduplication but more metadata
# Supports: 10MB, 5M, 1GB, 500KB, 64MiB, etc.
# Default: 10MB
#chunk_size: 10MB

# Maximum blob size
# Multiple chunks are packed into blobs up to this size
# Supports: 1GB, 10G, 500MB, 1GiB, etc.
# Default: 10GB
#blob_size_limit: 10GB

# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5

# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver
docs/DATAMODEL.md (new file, 268 lines)
@@ -0,0 +1,268 @@

# Vaultik Data Model

## Overview

Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.

**Important Notes:**
- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.

## Database Tables

### 1. `files`
Stores metadata about files in the filesystem being backed up.

**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID for the file record
- `path` (TEXT NOT NULL UNIQUE) - Absolute file path
- `mtime` (INTEGER NOT NULL) - Modification time as Unix timestamp
- `ctime` (INTEGER NOT NULL) - Change time as Unix timestamp
- `size` (INTEGER NOT NULL) - File size in bytes
- `mode` (INTEGER NOT NULL) - Unix file permissions and type
- `uid` (INTEGER NOT NULL) - User ID of file owner
- `gid` (INTEGER NOT NULL) - Group ID of file owner
- `link_target` (TEXT) - Symlink target path (NULL for regular files)

**Indexes:**
- `idx_files_path` on `path` for efficient lookups

**Purpose:** Tracks file metadata to detect changes between backup runs. Used for incremental backup decisions. The UUID primary key provides stable references that don't change if files are moved.

### 2. `chunks`
Stores information about content-defined chunks created from files.

**Columns:**
- `chunk_hash` (TEXT PRIMARY KEY) - SHA256 hash of chunk content
- `size` (INTEGER NOT NULL) - Chunk size in bytes

**Purpose:** Enables deduplication by tracking unique chunks across all files.

### 3. `file_chunks`
Maps files to their constituent chunks in order.

**Columns:**
- `file_id` (TEXT) - File ID (FK to files.id)
- `idx` (INTEGER) - Chunk index within file (0-based)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- PRIMARY KEY (`file_id`, `idx`)

**Purpose:** Allows reconstruction of files from chunks during restore.
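The `file_chunks` ordering is what makes restore possible: reading rows ordered by `idx` and concatenating the corresponding chunk contents yields the original file. A minimal sketch of that reassembly step follows; the `fileChunkRow` type and `reassemble` helper are illustrative names, not vaultik's actual implementation, and the chunk contents are assumed to have been fetched and decrypted already.

```go
package main

import (
	"bytes"
	"fmt"
)

// fileChunkRow mirrors one row of the file_chunks table: a chunk hash at a
// given 0-based index within the file. (Illustrative type, not vaultik's.)
type fileChunkRow struct {
	Idx       int
	ChunkHash string
}

// reassemble concatenates chunk contents in idx order, as a restore would
// after loading rows with
//   SELECT idx, chunk_hash FROM file_chunks WHERE file_id = ? ORDER BY idx
func reassemble(rows []fileChunkRow, chunkData map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	for i, r := range rows {
		if r.Idx != i {
			return nil, fmt.Errorf("missing chunk at index %d", i)
		}
		data, ok := chunkData[r.ChunkHash]
		if !ok {
			return nil, fmt.Errorf("chunk %s not available", r.ChunkHash)
		}
		buf.Write(data)
	}
	return buf.Bytes(), nil
}

func main() {
	rows := []fileChunkRow{{0, "aa"}, {1, "bb"}}
	chunks := map[string][]byte{"aa": []byte("hello "), "bb": []byte("world")}
	out, err := reassemble(rows, chunks)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // hello world
}
```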

### 4. `chunk_files`
Reverse mapping showing which files contain each chunk.

**Columns:**
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `file_id` (TEXT) - File ID (FK to files.id)
- `file_offset` (INTEGER) - Byte offset of chunk within file
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`chunk_hash`, `file_id`)

**Purpose:** Supports efficient queries for chunk usage and deduplication statistics.

### 5. `blobs`
Stores information about packed, compressed, and encrypted blob files.

**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID assigned when blob creation starts
- `blob_hash` (TEXT UNIQUE) - SHA256 hash of final blob (NULL until finalized)
- `created_ts` (INTEGER NOT NULL) - Creation timestamp
- `finished_ts` (INTEGER) - Finalization timestamp (NULL if in progress)
- `uncompressed_size` (INTEGER NOT NULL DEFAULT 0) - Total size of chunks before compression
- `compressed_size` (INTEGER NOT NULL DEFAULT 0) - Size after compression and encryption
- `uploaded_ts` (INTEGER) - Upload completion timestamp (NULL if not uploaded)

**Purpose:** Tracks blob lifecycle from creation through upload. The UUID primary key allows immediate association of chunks with blobs.

### 6. `blob_chunks`
Maps chunks to the blobs that contain them.

**Columns:**
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `offset` (INTEGER) - Byte offset of chunk within blob (before compression)
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`blob_id`, `chunk_hash`)

**Purpose:** Enables chunk retrieval from blobs during restore operations.

### 7. `snapshots`
Tracks backup snapshots.

**Columns:**
- `id` (TEXT PRIMARY KEY) - Snapshot ID (format: hostname-YYYYMMDD-HHMMSSZ)
- `hostname` (TEXT) - Hostname where backup was created
- `vaultik_version` (TEXT) - Version of Vaultik used
- `vaultik_git_revision` (TEXT) - Git revision of Vaultik used
- `started_at` (INTEGER) - Start timestamp
- `completed_at` (INTEGER) - Completion timestamp (NULL if in progress)
- `file_count` (INTEGER) - Number of files in snapshot
- `chunk_count` (INTEGER) - Number of unique chunks
- `blob_count` (INTEGER) - Number of blobs referenced
- `total_size` (INTEGER) - Total size of all files
- `blob_size` (INTEGER) - Total size of all blobs (compressed)
- `blob_uncompressed_size` (INTEGER) - Total uncompressed size of all referenced blobs
- `compression_ratio` (REAL) - Compression ratio achieved
- `compression_level` (INTEGER) - Compression level used for this snapshot
- `upload_bytes` (INTEGER) - Total bytes uploaded during this snapshot
- `upload_duration_ms` (INTEGER) - Total milliseconds spent uploading to S3

**Purpose:** Provides snapshot metadata and statistics including version tracking for compatibility.

### 8. `snapshot_files`
Maps snapshots to the files they contain.

**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `file_id` (TEXT) - File ID (FK to files.id)
- PRIMARY KEY (`snapshot_id`, `file_id`)

**Purpose:** Records which files are included in each snapshot.

### 9. `snapshot_blobs`
Maps snapshots to the blobs they reference.

**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `blob_hash` (TEXT) - Denormalized blob hash for manifest generation
- PRIMARY KEY (`snapshot_id`, `blob_id`)

**Purpose:** Tracks blob dependencies for snapshots and enables manifest generation.

### 10. `uploads`
Tracks blob upload metrics.

**Columns:**
- `blob_hash` (TEXT PRIMARY KEY) - Hash of uploaded blob
- `snapshot_id` (TEXT NOT NULL) - The snapshot that triggered this upload (FK to snapshots.id)
- `uploaded_at` (INTEGER) - Upload timestamp
- `size` (INTEGER) - Size of uploaded blob
- `duration_ms` (INTEGER) - Upload duration in milliseconds

**Purpose:** Performance monitoring and tracking which blobs were newly created (uploaded) during each snapshot.

## Data Flow and Operations

### 1. Backup Process

1. **File Scanning**
   - `INSERT OR REPLACE INTO files` - Update file metadata
   - `SELECT * FROM files WHERE path = ?` - Check if file has changed
   - `INSERT INTO snapshot_files` - Add file to current snapshot

2. **Chunking** (for changed files)
   - `INSERT OR IGNORE INTO chunks` - Store new chunks
   - `INSERT INTO file_chunks` - Map chunks to file
   - `INSERT INTO chunk_files` - Create reverse mapping

3. **Blob Packing**
   - `INSERT INTO blobs` - Create blob record with UUID (blob_hash NULL)
   - `INSERT INTO blob_chunks` - Associate chunks with blob immediately
   - `UPDATE blobs SET blob_hash = ?, finished_ts = ?` - Finalize blob after packing

4. **Upload**
   - `UPDATE blobs SET uploaded_ts = ?` - Mark blob as uploaded
   - `INSERT INTO uploads` - Record upload metrics with snapshot_id
   - `INSERT INTO snapshot_blobs` - Associate blob with snapshot

5. **Snapshot Completion**
   - `UPDATE snapshots SET completed_at = ?, stats...` - Finalize snapshot
   - Generate and upload blob manifest from `snapshot_blobs`

### 2. Incremental Backup

1. **Change Detection**
   - `SELECT * FROM files WHERE path = ?` - Get previous file metadata
   - Compare mtime, size, mode to detect changes
   - Skip unchanged files but still add to `snapshot_files`

2. **Chunk Reuse**
   - `SELECT * FROM blob_chunks WHERE chunk_hash = ?` - Find existing chunks
   - `INSERT INTO snapshot_blobs` - Reference existing blobs for unchanged files

### 3. Snapshot Metadata Export

After a snapshot is completed:
1. Copy database to temporary file
2. Clean temporary database to contain only current snapshot data
3. Export to SQL dump using sqlite3
4. Compress with zstd and encrypt with age
5. Upload to S3 as `metadata/{snapshot-id}/db.zst.age`
6. Generate blob manifest and upload as `metadata/{snapshot-id}/manifest.json.zst`

### 4. Restore Process

The restore process doesn't use the local database. Instead:
1. Downloads snapshot metadata from S3
2. Downloads required blobs based on manifest
3. Reconstructs files from decrypted and decompressed chunks

### 5. Pruning

1. **Identify Unreferenced Blobs**
   - Query blobs not referenced by any remaining snapshot
   - Delete from S3 and local database
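Logically, identifying unreferenced blobs is a set difference: everything in storage minus everything any manifest still names. A small sketch of that computation, with illustrative names (not vaultik's implementation):

```go
package main

import "fmt"

// unreferenced returns blob hashes present in storage but absent from
// every snapshot manifest — the candidates a prune may delete.
func unreferenced(stored []string, manifests [][]string) []string {
	referenced := make(map[string]bool)
	for _, m := range manifests {
		for _, h := range m {
			referenced[h] = true
		}
	}
	var orphans []string
	for _, h := range stored {
		if !referenced[h] {
			orphans = append(orphans, h)
		}
	}
	return orphans
}

func main() {
	stored := []string{"aaaa", "bbbb", "cccc"}
	manifests := [][]string{{"aaaa"}, {"aaaa", "cccc"}}
	fmt.Println(unreferenced(stored, manifests)) // [bbbb]
}
```

Because the decision depends only on blob hashes, the prune can run against the unencrypted manifests without any decryption keys.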

### 6. Incomplete Snapshot Cleanup

Before each backup:
1. Query incomplete snapshots (where `completed_at IS NULL`)
2. Check if metadata exists in S3
3. If no metadata, delete snapshot and all associations
4. Clean up orphaned files, chunks, and blobs

## Repository Pattern

Vaultik uses a repository pattern for database access:

- `FileRepository` - CRUD operations for files and file metadata
- `ChunkRepository` - CRUD operations for content chunks
- `FileChunkRepository` - Manage file-to-chunk mappings
- `ChunkFileRepository` - Manage chunk-to-file reverse mappings
- `BlobRepository` - Manage blob lifecycle (creation, finalization, upload)
- `BlobChunkRepository` - Manage blob-to-chunk associations
- `SnapshotRepository` - Manage snapshots and their relationships
- `UploadRepository` - Track blob upload metrics

Each repository provides methods like:
- `Create()` - Insert new record
- `GetByID()` / `GetByPath()` / `GetByHash()` - Retrieve records
- `Update()` - Update existing records
- `Delete()` - Remove records
- Specialized queries for each entity type (e.g., `DeleteOrphaned()`, `GetIncompleteByHostname()`)

## Transaction Management

All database operations that modify multiple tables are wrapped in transactions:

```go
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
	// Multiple repository operations using tx
	return nil
})
```

This ensures consistency, which is especially important for operations like:
- Creating file-chunk mappings
- Associating chunks with blobs
- Updating snapshot statistics

## Performance Considerations

1. **Indexes**:
   - Primary keys are automatically indexed
   - `idx_files_path` on `files(path)` for efficient file lookups

2. **Prepared Statements**: All queries use prepared statements for performance and security

3. **Batch Operations**: Where possible, operations are batched within transactions

4. **Write-Ahead Logging**: SQLite WAL mode is enabled for better concurrency

## Data Integrity

1. **Foreign Keys**: Enforced through CASCADE DELETE and application-level repository methods
2. **Unique Constraints**: Chunk hashes, file paths, and blob hashes are unique
3. **Null Handling**: Nullable fields clearly indicate in-progress operations
4. **Timestamp Tracking**: All major operations record timestamps for auditing
docs/REPOSTRUCTURE.md (new file, 143 lines)
@@ -0,0 +1,143 @@

# Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

## Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:
1. **Blobs** - The actual backup data (content-addressed, encrypted)
2. **Metadata** - Snapshot information and manifests (partially encrypted)

## Directory Structure

```
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
```

## Blobs Directory (`blobs/`)

### Structure
- **Path format**: `blobs/<first-2-chars>/<next-2-chars>/<full-hash>`
- **Example**: `blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678`
- **Sharding**: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects

### Content
- **What it contains**: Packed collections of content-defined chunks from files
- **Format**: Zstandard compressed, then Age encrypted
- **Encryption**: Always encrypted with Age using the configured recipients
- **Naming**: Content-addressed using SHA256 hash of the encrypted blob

### Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

## Metadata Directory (`metadata/`)

Each snapshot has its own subdirectory named with the snapshot ID.

### Snapshot ID Format
- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
- **Example**: `laptop-20240115-143052Z`
- **Components**:
  - Hostname (may contain hyphens)
  - Date in YYYYMMDD format
  - Time in HHMMSSZ format (Z indicates UTC)

### Files in Each Snapshot Directory

#### `db.zst.age` - Encrypted Database Dump
- **What it contains**: Complete SQLite database dump for this snapshot
- **Format**: SQL dump → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership

#### `manifest.json.zst` - Unencrypted Blob Manifest
- **What it contains**: JSON list of all blob hashes referenced by this snapshot
- **Format**: JSON → Zstandard compressed (NOT encrypted)
- **Encryption**: NOT encrypted
- **Purpose**: Enables pruning operations without requiring decryption keys
- **Structure**:
```json
{
  "snapshot_id": "laptop-20240115-143052Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
```

### Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
1. **Pruning operations** - Identifying unreferenced blobs for deletion
2. **Storage analysis** - Understanding space usage without decryption
3. **Verification** - Checking blob existence without decryption
4. **Cross-snapshot deduplication analysis** - Finding shared blobs between snapshots

The manifest only contains blob hashes, not file names or any other sensitive information.

## Security Considerations

### What's Encrypted
- **All file content** (in blobs)
- **All file metadata** (paths, permissions, timestamps, ownership in db.zst.age)
- **File-to-chunk mappings** (in db.zst.age)

### What's Not Encrypted
- **Blob hashes** (in manifest.json.zst)
- **Snapshot IDs** (directory names)
- **Blob count per snapshot** (in manifest.json.zst)

### Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob

An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files

## Consistency Guarantees

1. **Blobs are immutable** - Once written, a blob is never modified
2. **Blobs are written before metadata** - A snapshot's metadata is only written after all its blobs are successfully uploaded
3. **Metadata is written atomically** - Both db.zst.age and manifest.json.zst are written as complete files
4. **Snapshots are marked complete in local DB only after metadata upload** - Ensures consistency between local and remote state

## Pruning Safety

The prune operation is safe because:
1. It only deletes blobs not referenced in any manifest
2. Manifests are unencrypted and can be read without keys
3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs

## Restoration Requirements

To restore from a backup, you need:
1. **The Age private key** - To decrypt blobs and database
2. **The snapshot metadata** - Both files from the snapshot's metadata directory
3. **All referenced blobs** - As listed in the manifest

The restoration process:
1. Download and decrypt the database dump to understand file structure
2. Download and decrypt the required blobs
3. Reconstruct files from their chunks
4. Restore file metadata (permissions, timestamps, etc.)
go.mod (295 changed lines)
@@ -1,28 +1,305 @@
module git.eeqj.de/sneak/vaultik

go 1.24.4
go 1.26.1

require (
	github.com/spf13/cobra v1.9.1
	filippo.io/age v1.2.1
	git.eeqj.de/sneak/smartconfig v1.0.0
	github.com/adrg/xdg v0.5.3
	github.com/aws/aws-sdk-go-v2 v1.39.6
	github.com/aws/aws-sdk-go-v2/config v1.31.17
	github.com/aws/aws-sdk-go-v2/credentials v1.18.21
	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.20.4
	github.com/aws/aws-sdk-go-v2/service/s3 v1.90.0
	github.com/aws/smithy-go v1.23.2
	github.com/dustin/go-humanize v1.0.1
	github.com/gobwas/glob v0.2.3
	github.com/google/uuid v1.6.0
	github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
	github.com/klauspost/compress v1.18.1
	github.com/mattn/go-sqlite3 v1.14.29
	github.com/rclone/rclone v1.72.1
	github.com/schollz/progressbar/v3 v3.19.0
	github.com/spf13/afero v1.15.0
	github.com/spf13/cobra v1.10.1
	github.com/stretchr/testify v1.11.1
	go.uber.org/fx v1.24.0
	golang.org/x/term v0.37.0
	gopkg.in/yaml.v3 v3.0.1
	modernc.org/sqlite v1.38.0
)

require (
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/google/uuid v1.6.0 // indirect
	cloud.google.com/go/auth v0.17.0 // indirect
	cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
	cloud.google.com/go/compute/metadata v0.9.0 // indirect
	cloud.google.com/go/iam v1.5.2 // indirect
	cloud.google.com/go/secretmanager v1.15.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.5.3 // indirect
	github.com/Azure/go-ntlmssp v0.0.2-0.20251110135918-10b7b7e7cd26 // indirect
	github.com/AzureAD/microsoft-authentication-library-for-go v1.6.0 // indirect
	github.com/Files-com/files-sdk-go/v3 v3.2.264 // indirect
	github.com/IBM/go-sdk-core/v5 v5.21.0 // indirect
	github.com/Max-Sum/base32768 v0.0.0-20230304063302-18e6ce5945fd // indirect
	github.com/Microsoft/go-winio v0.6.2 // indirect
	github.com/ProtonMail/bcrypt v0.0.0-20211005172633-e235017c1baf // indirect
	github.com/ProtonMail/gluon v0.17.1-0.20230724134000-308be39be96e // indirect
	github.com/ProtonMail/go-crypto v1.3.0 // indirect
	github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f // indirect
	github.com/ProtonMail/go-srp v0.0.7 // indirect
	github.com/ProtonMail/gopenpgp/v2 v2.9.0 // indirect
	github.com/PuerkitoBio/goquery v1.10.3 // indirect
	github.com/a1ex3/zstd-seekable-format-go/pkg v0.10.0 // indirect
	github.com/abbot/go-http-auth v0.4.0 // indirect
	github.com/anchore/go-lzo v0.1.0 // indirect
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc // indirect
	github.com/armon/go-metrics v0.4.1 // indirect
	github.com/aws/aws-sdk-go v1.44.256 // indirect
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.3 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.4 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.35.8 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.30.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.5 // indirect
	github.com/aws/aws-sdk-go-v2/service/sts v1.39.1 // indirect
	github.com/bahlo/generic-list-go v0.2.0 // indirect
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/boombuler/barcode v1.1.0 // indirect
	github.com/bradenaw/juniper v0.15.3 // indirect
	github.com/bradfitz/iter v0.0.0-20191230175014-e8f45d346db8 // indirect
	github.com/buengese/sgzip v0.1.1 // indirect
	github.com/buger/jsonparser v1.1.1 // indirect
	github.com/calebcase/tmpfile v1.0.3 // indirect
	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 // indirect
	github.com/clipperhouse/stringish v0.1.1 // indirect
	github.com/clipperhouse/uax29/v2 v2.3.0 // indirect
	github.com/cloudflare/circl v1.6.1 // indirect
	github.com/cloudinary/cloudinary-go/v2 v2.13.0 // indirect
	github.com/cloudsoda/go-smb2 v0.0.0-20250228001242-d4c70e6251cc // indirect
	github.com/cloudsoda/sddl v0.0.0-20250224235906-926454e91efc // indirect
	github.com/colinmarc/hdfs/v2 v2.4.0 // indirect
	github.com/coreos/go-semver v0.3.1 // indirect
	github.com/coreos/go-systemd/v22 v22.6.0 // indirect
	github.com/creasty/defaults v1.8.0 // indirect
	github.com/cronokirby/saferith v0.33.0 // indirect
	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
	github.com/diskfs/go-diskfs v1.7.0 // indirect
	github.com/dropbox/dropbox-sdk-go-unofficial/v6 v6.0.5 // indirect
	github.com/ebitengine/purego v0.9.1 // indirect
	github.com/emersion/go-message v0.18.2 // indirect
	github.com/emersion/go-vcard v0.0.0-20241024213814-c9703dde27ff // indirect
	github.com/emicklei/go-restful/v3 v3.11.0 // indirect
	github.com/fatih/color v1.16.0 // indirect
	github.com/felixge/httpsnoop v1.0.4 // indirect
	github.com/flynn/noise v1.1.0 // indirect
	github.com/fxamacker/cbor/v2 v2.7.0 // indirect
	github.com/gabriel-vasile/mimetype v1.4.11 // indirect
	github.com/geoffgarside/ber v1.2.0 // indirect
	github.com/go-chi/chi/v5 v5.2.3 // indirect
	github.com/go-darwin/apfs v0.0.0-20211011131704-f84b94dbf348 // indirect
|
||||
github.com/go-git/go-billy/v5 v5.6.2 // indirect
|
||||
github.com/go-jose/go-jose/v4 v4.1.2 // indirect
|
||||
github.com/go-logr/logr v1.4.3 // indirect
|
||||
github.com/go-logr/stdr v1.2.2 // indirect
|
||||
github.com/go-ole/go-ole v1.3.0 // indirect
|
||||
github.com/go-openapi/errors v0.22.4 // indirect
|
||||
github.com/go-openapi/jsonpointer v0.21.0 // indirect
|
||||
github.com/go-openapi/jsonreference v0.20.2 // indirect
|
||||
github.com/go-openapi/strfmt v0.25.0 // indirect
|
||||
github.com/go-openapi/swag v0.23.0 // indirect
|
||||
github.com/go-playground/locales v0.14.1 // indirect
|
||||
github.com/go-playground/universal-translator v0.18.1 // indirect
|
||||
github.com/go-playground/validator/v10 v10.28.0 // indirect
|
||||
github.com/go-resty/resty/v2 v2.16.5 // indirect
|
||||
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
|
||||
github.com/gofrs/flock v0.13.0 // indirect
|
||||
github.com/gogo/protobuf v1.3.2 // indirect
|
||||
github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
|
||||
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
|
||||
github.com/golang/protobuf v1.5.4 // indirect
|
||||
github.com/google/btree v1.1.3 // indirect
|
||||
github.com/google/gnostic-models v0.6.9 // indirect
|
||||
github.com/google/go-cmp v0.7.0 // indirect
|
||||
github.com/google/s2a-go v0.1.9 // indirect
|
||||
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
|
||||
github.com/googleapis/gax-go/v2 v2.15.0 // indirect
|
||||
github.com/gopherjs/gopherjs v1.17.2 // indirect
|
||||
github.com/gorilla/schema v1.4.1 // indirect
|
||||
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
|
||||
github.com/hashicorp/consul/api v1.32.1 // indirect
|
||||
github.com/hashicorp/errwrap v1.1.0 // indirect
|
||||
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
|
||||
github.com/hashicorp/go-hclog v1.6.3 // indirect
|
||||
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
|
||||
github.com/hashicorp/go-multierror v1.1.1 // indirect
|
||||
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
|
||||
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
|
||||
github.com/hashicorp/go-secure-stdlib/parseutil v0.1.6 // indirect
|
||||
github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect
|
||||
github.com/hashicorp/go-sockaddr v1.0.2 // indirect
|
||||
github.com/hashicorp/go-uuid v1.0.3 // indirect
|
||||
github.com/hashicorp/golang-lru v0.5.4 // indirect
|
||||
github.com/hashicorp/hcl v1.0.1-vault-7 // indirect
|
||||
github.com/hashicorp/serf v0.10.1 // indirect
|
||||
github.com/hashicorp/vault/api v1.20.0 // indirect
|
||||
github.com/henrybear327/Proton-API-Bridge v1.0.0 // indirect
|
||||
github.com/henrybear327/go-proton-api v1.0.0 // indirect
|
||||
github.com/inconshreveable/mousetrap v1.1.0 // indirect
|
||||
github.com/jcmturner/aescts/v2 v2.0.0 // indirect
|
||||
github.com/jcmturner/dnsutils/v2 v2.0.0 // indirect
|
||||
github.com/jcmturner/gofork v1.7.6 // indirect
|
||||
github.com/jcmturner/goidentity/v6 v6.0.1 // indirect
|
||||
github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
|
||||
github.com/jcmturner/rpc/v2 v2.0.3 // indirect
|
||||
github.com/jlaffaye/ftp v0.2.1-0.20240918233326-1b970516f5d3 // indirect
|
||||
github.com/josharian/intern v1.0.0 // indirect
|
||||
github.com/json-iterator/go v1.1.12 // indirect
|
||||
github.com/jtolds/gls v4.20.0+incompatible // indirect
|
||||
github.com/jtolio/noiseconn v0.0.0-20231127013910-f6d9ecbf1de7 // indirect
|
||||
github.com/jzelinskie/whirlpool v0.0.0-20201016144138-0675e54bb004 // indirect
|
||||
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
|
||||
github.com/koofr/go-httpclient v0.0.0-20240520111329-e20f8f203988 // indirect
|
||||
github.com/koofr/go-koofrclient v0.0.0-20221207135200-cbd7fc9ad6a6 // indirect
|
||||
github.com/kr/fs v0.1.0 // indirect
|
||||
github.com/kylelemons/godebug v1.1.0 // indirect
|
||||
github.com/lanrat/extsort v1.4.2 // indirect
|
||||
github.com/leodido/go-urn v1.4.0 // indirect
|
||||
github.com/lpar/date v1.0.0 // indirect
|
||||
github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
|
||||
github.com/mailru/easyjson v0.9.1 // indirect
|
||||
github.com/mattn/go-colorable v0.1.14 // indirect
|
||||
github.com/mattn/go-isatty v0.0.20 // indirect
|
||||
github.com/mattn/go-runewidth v0.0.19 // indirect
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
|
||||
github.com/mitchellh/go-homedir v1.1.0 // indirect
|
||||
github.com/mitchellh/mapstructure v1.5.0 // indirect
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
|
||||
github.com/modern-go/reflect2 v1.0.2 // indirect
|
||||
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
|
||||
github.com/ncruces/go-strftime v0.1.9 // indirect
|
||||
github.com/ncw/swift/v2 v2.0.5 // indirect
|
||||
github.com/oklog/ulid v1.3.1 // indirect
|
||||
github.com/onsi/ginkgo/v2 v2.23.3 // indirect
|
||||
github.com/oracle/oci-go-sdk/v65 v65.104.0 // indirect
|
||||
github.com/panjf2000/ants/v2 v2.11.3 // indirect
|
||||
github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
|
||||
github.com/pengsrc/go-shared v0.2.1-0.20190131101655-1999055a4a14 // indirect
|
||||
github.com/peterh/liner v1.2.2 // indirect
|
||||
github.com/pierrec/lz4/v4 v4.1.22 // indirect
|
||||
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
|
||||
github.com/pkg/errors v0.9.1 // indirect
|
||||
github.com/pkg/sftp v1.13.10 // indirect
|
||||
github.com/pkg/xattr v0.4.12 // indirect
|
||||
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
|
||||
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
|
||||
github.com/pquerna/otp v1.5.0 // indirect
|
||||
github.com/prometheus/client_golang v1.23.2 // indirect
|
||||
github.com/prometheus/client_model v0.6.2 // indirect
|
||||
github.com/prometheus/common v0.67.2 // indirect
|
||||
github.com/prometheus/procfs v0.19.2 // indirect
|
||||
github.com/putdotio/go-putio/putio v0.0.0-20200123120452-16d982cac2b8 // indirect
|
||||
github.com/relvacode/iso8601 v1.7.0 // indirect
|
||||
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
|
||||
github.com/spf13/pflag v1.0.6 // indirect
|
||||
github.com/rfjakob/eme v1.1.2 // indirect
|
||||
github.com/rivo/uniseg v0.4.7 // indirect
|
||||
github.com/ryanuber/go-glob v1.0.0 // indirect
|
||||
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
|
||||
github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect
|
||||
github.com/samber/lo v1.52.0 // indirect
|
||||
github.com/shirou/gopsutil/v4 v4.25.10 // indirect
|
||||
github.com/sirupsen/logrus v1.9.4-0.20230606125235-dd1b4c2e81af // indirect
|
||||
github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 // indirect
|
||||
github.com/smarty/assertions v1.16.0 // indirect
|
||||
github.com/sony/gobreaker v1.0.0 // indirect
|
||||
github.com/spacemonkeygo/monkit/v3 v3.0.25-0.20251022131615-eb24eb109368 // indirect
|
||||
github.com/spf13/pflag v1.0.10 // indirect
|
||||
github.com/t3rm1n4l/go-mega v0.0.0-20251031123324-a804aaa87491 // indirect
|
||||
github.com/tidwall/gjson v1.18.0 // indirect
|
||||
github.com/tidwall/match v1.1.1 // indirect
|
||||
github.com/tidwall/pretty v1.2.0 // indirect
|
||||
github.com/tklauser/go-sysconf v0.3.15 // indirect
|
||||
github.com/tklauser/numcpus v0.10.0 // indirect
|
||||
github.com/ulikunitz/xz v0.5.15 // indirect
|
||||
github.com/unknwon/goconfig v1.0.0 // indirect
|
||||
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
|
||||
github.com/x448/float16 v0.8.4 // indirect
|
||||
github.com/xanzy/ssh-agent v0.3.3 // indirect
|
||||
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
|
||||
github.com/yunify/qingstor-sdk-go/v3 v3.2.0 // indirect
|
||||
github.com/yusufpapurcu/wmi v1.2.4 // indirect
|
||||
github.com/zeebo/blake3 v0.2.4 // indirect
|
||||
github.com/zeebo/errs v1.4.0 // indirect
|
||||
github.com/zeebo/xxh3 v1.0.2 // indirect
|
||||
go.etcd.io/bbolt v1.4.3 // indirect
|
||||
go.etcd.io/etcd/api/v3 v3.6.2 // indirect
|
||||
go.etcd.io/etcd/client/pkg/v3 v3.6.2 // indirect
|
||||
go.etcd.io/etcd/client/v3 v3.6.2 // indirect
|
||||
go.mongodb.org/mongo-driver v1.17.6 // indirect
|
||||
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
|
||||
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
|
||||
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
|
||||
go.opentelemetry.io/otel v1.38.0 // indirect
|
||||
go.opentelemetry.io/otel/metric v1.38.0 // indirect
|
||||
go.opentelemetry.io/otel/trace v1.38.0 // indirect
|
||||
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
|
||||
go.uber.org/dig v1.19.0 // indirect
|
||||
go.uber.org/multierr v1.10.0 // indirect
|
||||
go.uber.org/zap v1.26.0 // indirect
|
||||
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
|
||||
golang.org/x/sys v0.33.0 // indirect
|
||||
go.uber.org/multierr v1.11.0 // indirect
|
||||
go.uber.org/zap v1.27.0 // indirect
|
||||
go.yaml.in/yaml/v2 v2.4.3 // indirect
|
||||
golang.org/x/crypto v0.45.0 // indirect
|
||||
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
|
||||
golang.org/x/net v0.47.0 // indirect
|
||||
golang.org/x/oauth2 v0.33.0 // indirect
|
||||
golang.org/x/sync v0.18.0 // indirect
|
||||
golang.org/x/sys v0.38.0 // indirect
|
||||
golang.org/x/text v0.31.0 // indirect
|
||||
golang.org/x/time v0.14.0 // indirect
|
||||
golang.org/x/tools v0.38.0 // indirect
|
||||
google.golang.org/api v0.255.0 // indirect
|
||||
google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
|
||||
google.golang.org/genproto/googleapis/api v0.0.0-20250804133106-a7a43d27e69b // indirect
|
||||
google.golang.org/genproto/googleapis/rpc v0.0.0-20251103181224-f26f9409b101 // indirect
|
||||
google.golang.org/grpc v1.76.0 // indirect
|
||||
google.golang.org/protobuf v1.36.10 // indirect
|
||||
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
|
||||
gopkg.in/inf.v0 v0.9.1 // indirect
|
||||
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
|
||||
gopkg.in/validator.v2 v2.0.1 // indirect
|
||||
gopkg.in/yaml.v2 v2.4.0 // indirect
|
||||
k8s.io/api v0.33.3 // indirect
|
||||
k8s.io/apimachinery v0.33.3 // indirect
|
||||
k8s.io/client-go v0.33.3 // indirect
|
||||
k8s.io/klog/v2 v2.130.1 // indirect
|
||||
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
|
||||
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
|
||||
modernc.org/libc v1.65.10 // indirect
|
||||
modernc.org/mathutil v1.7.1 // indirect
|
||||
modernc.org/memory v1.11.0 // indirect
|
||||
moul.io/http2curl/v2 v2.3.0 // indirect
|
||||
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
|
||||
sigs.k8s.io/randfill v1.0.0 // indirect
|
||||
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
|
||||
sigs.k8s.io/yaml v1.6.0 // indirect
|
||||
storj.io/common v0.0.0-20251107171817-6221ae45072c // indirect
|
||||
storj.io/drpc v0.0.35-0.20250513201419-f7819ea69b55 // indirect
|
||||
storj.io/eventkit v0.0.0-20250410172343-61f26d3de156 // indirect
|
||||
storj.io/infectious v0.0.2 // indirect
|
||||
storj.io/picobuf v0.0.4 // indirect
|
||||
storj.io/uplink v1.13.1 // indirect
|
||||
)

6	internal/blob/errors.go	Normal file
@@ -0,0 +1,6 @@
package blob

import "errors"

// ErrBlobSizeLimitExceeded is returned when adding a chunk would exceed the blob size limit.
var ErrBlobSizeLimitExceeded = errors.New("adding chunk would exceed blob size limit")

555	internal/blob/packer.go	Normal file
@@ -0,0 +1,555 @@
// Package blob handles the creation of blobs - the final storage units for Vaultik.
// A blob is a large file (up to 10GB) containing many compressed and encrypted chunks
// from multiple source files. Blobs are content-addressed, meaning their filename
// is derived from the SHA256 hash of their compressed and encrypted content.
//
// The blob creation process:
//  1. Chunks are accumulated from multiple files
//  2. The collection is compressed using zstd
//  3. The compressed data is encrypted using age
//  4. The encrypted blob is hashed to create its content-addressed name
//  5. The blob is uploaded to S3 using the hash as the filename
//
// This design optimizes storage efficiency by batching many small chunks into
// larger blobs, reducing the number of S3 operations and associated costs.
package blob

import (
	"context"
	"database/sql"
	"encoding/hex"
	"fmt"
	"io"
	"sync"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/blobgen"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/google/uuid"
	"github.com/spf13/afero"
)
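
The content-addressed naming in step 4 can be sketched with the standard library alone; `blobName` below is a hypothetical helper for illustration, not part of this package:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobName derives the storage key for a finished blob from the SHA256
// of its final (compressed and encrypted) bytes, as in step 4 above.
func blobName(encrypted []byte) string {
	sum := sha256.Sum256(encrypted)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Identical content always yields the same name, so re-uploading
	// the same blob is a no-op in a content-addressed store.
	fmt.Println(blobName([]byte("hello")))
	// → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
}
```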

// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
// The handler receives a BlobWithReader containing the blob metadata and a reader for
// the compressed and encrypted blob content. The handler is responsible for uploading
// the blob to storage and cleaning up any temporary files.
type BlobHandler func(blob *BlobWithReader) error

// PackerConfig holds configuration for creating a Packer.
// All fields except BlobHandler are required.
type PackerConfig struct {
	MaxBlobSize      int64                  // Maximum size of a blob before forcing finalization
	CompressionLevel int                    // Zstd compression level (1-19, higher = better compression)
	Recipients       []string               // Age recipients for encryption
	Repositories     *database.Repositories // Database repositories for tracking blob metadata
	BlobHandler      BlobHandler            // Optional callback when blob is ready for upload
	Fs               afero.Fs               // Filesystem for temporary files
}

// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
	Hash string
	Size int64
}

// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
type Packer struct {
	maxBlobSize      int64
	compressionLevel int
	recipients       []string               // Age recipients for encryption
	blobHandler      BlobHandler            // Called when blob is ready
	repos            *database.Repositories // For creating blob records
	fs               afero.Fs               // Filesystem for temporary files

	// Mutex for thread-safe blob creation
	mu sync.Mutex

	// Current blob being packed
	currentBlob   *blobInProgress
	finishedBlobs []*FinishedBlob // Only used if no handler provided

	// Pending chunks to be inserted when blob finalizes
	pendingChunks []PendingChunk
}

// blobInProgress represents a blob being assembled
type blobInProgress struct {
	id        string          // UUID of the blob
	chunks    []*chunkInfo    // Track chunk metadata
	chunkSet  map[string]bool // Track unique chunks in this blob
	tempFile  afero.File      // Temporary file for encrypted compressed data
	writer    *blobgen.Writer // Unified compression/encryption/hashing writer
	startTime time.Time
	size      int64 // Current uncompressed size
}

// ChunkRef represents a chunk to be added to a blob.
// The Hash is the content-addressed identifier (SHA256) of the chunk,
// and Data contains the raw chunk bytes. After adding to a blob,
// the Data can be safely discarded as it's written to the blob immediately.
type ChunkRef struct {
	Hash string // SHA256 hash of the chunk data
	Data []byte // Raw chunk content
}

// chunkInfo tracks chunk metadata in a blob
type chunkInfo struct {
	Hash   string
	Offset int64
	Size   int64
}

// FinishedBlob represents a completed blob ready for storage
type FinishedBlob struct {
	ID           string
	Hash         string
	Data         []byte // Compressed data
	Chunks       []*BlobChunkRef
	CreatedTS    time.Time
	Uncompressed int64
	Compressed   int64
}

// BlobChunkRef represents a chunk's position within a blob
type BlobChunkRef struct {
	ChunkHash string
	Offset    int64
	Length    int64
}

// BlobWithReader wraps a FinishedBlob with its data reader
type BlobWithReader struct {
	*FinishedBlob
	Reader              io.ReadSeeker
	TempFile            afero.File // Optional, only set for disk-based blobs
	InsertedChunkHashes []string   // Chunk hashes that were inserted to DB with this blob
}

// NewPacker creates a new blob packer that accumulates chunks into blobs.
// The packer rejects chunks that would push a blob past MaxBlobSize (see
// AddChunk); the caller finalizes the blob and retries the chunk.
// Returns an error if required configuration fields are missing or invalid.
func NewPacker(cfg PackerConfig) (*Packer, error) {
	if len(cfg.Recipients) == 0 {
		return nil, fmt.Errorf("recipients are required - blobs must be encrypted")
	}
	if cfg.MaxBlobSize <= 0 {
		return nil, fmt.Errorf("max blob size must be positive")
	}
	if cfg.Fs == nil {
		return nil, fmt.Errorf("filesystem is required")
	}
	return &Packer{
		maxBlobSize:      cfg.MaxBlobSize,
		compressionLevel: cfg.CompressionLevel,
		recipients:       cfg.Recipients,
		blobHandler:      cfg.BlobHandler,
		repos:            cfg.Repositories,
		fs:               cfg.Fs,
		finishedBlobs:    make([]*FinishedBlob, 0),
	}, nil
}

// SetBlobHandler sets the handler to be called when a blob is finalized.
// The handler is responsible for uploading the blob to storage.
// If no handler is set, finalized blobs are stored in memory and can be
// retrieved with GetFinishedBlobs().
func (p *Packer) SetBlobHandler(handler BlobHandler) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.blobHandler = handler
}

// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}

// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
// The chunk data is written immediately and can be garbage collected after this call.
// Thread-safe.
func (p *Packer) AddChunk(chunk *ChunkRef) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	// Initialize new blob if needed
	if p.currentBlob == nil {
		if err := p.startNewBlob(); err != nil {
			return fmt.Errorf("starting new blob: %w", err)
		}
	}

	// Check if adding this chunk would exceed the blob size limit.
	// Use a conservative estimate (assume no compression), and skip
	// the check if the chunk already exists in this blob.
	if !p.currentBlob.chunkSet[chunk.Hash] {
		currentSize := p.currentBlob.size
		newSize := currentSize + int64(len(chunk.Data))

		if newSize > p.maxBlobSize && len(p.currentBlob.chunks) > 0 {
			// Signal the caller to finalize and retry this chunk
			return ErrBlobSizeLimitExceeded
		}
	}

	// Add chunk to current blob
	return p.addChunkToCurrentBlob(chunk)
}

// Flush finalizes any in-progress blob, compressing, encrypting, and hashing it.
// This should be called after all chunks have been added to ensure no data is lost.
// If a BlobHandler is set, it will be called with the finalized blob.
// Thread-safe.
func (p *Packer) Flush() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob != nil && len(p.currentBlob.chunks) > 0 {
		if err := p.finalizeCurrentBlob(); err != nil {
			return fmt.Errorf("finalizing blob: %w", err)
		}
	}

	return nil
}

// FinalizeBlob finalizes the current blob being assembled.
// This compresses the accumulated chunks, encrypts the result, and computes
// the content-addressed hash. The finalized blob is either passed to the
// BlobHandler (if set) or stored internally.
// The caller must retry any chunk whose addition triggered
// ErrBlobSizeLimitExceeded. Thread-safe.
func (p *Packer) FinalizeBlob() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob == nil {
		return nil
	}

	return p.finalizeCurrentBlob()
}

// GetFinishedBlobs returns all completed blobs and clears the internal list.
// This is only used when no BlobHandler is set. After calling this method,
// the caller is responsible for uploading the blobs to storage.
// Thread-safe.
func (p *Packer) GetFinishedBlobs() []*FinishedBlob {
	p.mu.Lock()
	defer p.mu.Unlock()

	blobs := p.finishedBlobs
	p.finishedBlobs = make([]*FinishedBlob, 0)
	return blobs
}

// startNewBlob initializes a new blob (must be called with lock held)
func (p *Packer) startNewBlob() error {
	// Generate UUID for the blob
	blobID := uuid.New().String()

	// Create blob record in database
	if p.repos != nil {
		blobIDTyped, err := types.ParseBlobID(blobID)
		if err != nil {
			return fmt.Errorf("parsing blob ID: %w", err)
		}
		blob := &database.Blob{
			ID:               blobIDTyped,
			Hash:             types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
			CreatedTS:        time.Now().UTC(),
			FinishedTS:       nil,
			UncompressedSize: 0,
			CompressedSize:   0,
			UploadedTS:       nil,
		}
		if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return p.repos.Blobs.Create(ctx, tx, blob)
		}); err != nil {
			return fmt.Errorf("creating blob record: %w", err)
		}
	}

	// Create temporary file
	tempFile, err := afero.TempFile(p.fs, "", "vaultik-blob-*.tmp")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}

	// Create blobgen writer for unified compression/encryption/hashing
	writer, err := blobgen.NewWriter(tempFile, p.compressionLevel, p.recipients)
	if err != nil {
		_ = tempFile.Close()
		_ = p.fs.Remove(tempFile.Name())
		return fmt.Errorf("creating blobgen writer: %w", err)
	}

	p.currentBlob = &blobInProgress{
		id:        blobID,
		chunks:    make([]*chunkInfo, 0),
		chunkSet:  make(map[string]bool),
		startTime: time.Now().UTC(),
		tempFile:  tempFile,
		writer:    writer,
		size:      0,
	}

	log.Debug("Created new blob container", "blob_id", blobID, "temp_file", tempFile.Name())
	return nil
}

// addChunkToCurrentBlob adds a chunk to the current blob (must be called with lock held)
func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
	// Skip if chunk already in current blob
	if p.currentBlob.chunkSet[chunk.Hash] {
		log.Debug("Skipping duplicate chunk already in current blob", "chunk_hash", chunk.Hash)
		return nil
	}

	// Track offset before writing
	offset := p.currentBlob.size

	// Write to the blobgen writer (compression -> encryption -> disk)
	if _, err := p.currentBlob.writer.Write(chunk.Data); err != nil {
		return fmt.Errorf("writing to blob stream: %w", err)
	}

	// Track chunk info
	chunkSize := int64(len(chunk.Data))
	chunkInfo := &chunkInfo{
		Hash:   chunk.Hash,
		Offset: offset,
		Size:   chunkSize,
	}
	p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
	p.currentBlob.chunkSet[chunk.Hash] = true

	// Note: blob_chunk records are inserted in batch when the blob is
	// finalized to reduce transaction overhead. The chunk info is already
	// stored in p.currentBlob.chunks for later insertion.

	// Update total size
	p.currentBlob.size += chunkSize

	log.Debug("Added chunk to blob container",
		"blob_id", p.currentBlob.id,
		"chunk_hash", chunk.Hash,
		"chunk_size", len(chunk.Data),
		"offset", offset,
		"blob_chunks", len(p.currentBlob.chunks),
		"uncompressed_size", p.currentBlob.size)

	return nil
}

// finalizeCurrentBlob completes the current blob (must be called with lock held)
func (p *Packer) finalizeCurrentBlob() error {
	if p.currentBlob == nil {
		return nil
	}

	// Close blobgen writer to flush all data
	if err := p.currentBlob.writer.Close(); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("closing blobgen writer: %w", err)
	}

	// Sync file to ensure all data is written
	if err := p.currentBlob.tempFile.Sync(); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("syncing temp file: %w", err)
	}

	// Get the final size (encrypted if applicable)
	finalSize, err := p.currentBlob.tempFile.Seek(0, io.SeekCurrent)
	if err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("getting file size: %w", err)
	}

	// Reset to beginning for reading
	if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("seeking to start: %w", err)
	}

	// Get hash from blobgen writer (of final encrypted data)
	finalHash := p.currentBlob.writer.Sum256()
	blobHash := hex.EncodeToString(finalHash)

	// Create chunk references with offsets
	chunkRefs := make([]*BlobChunkRef, 0, len(p.currentBlob.chunks))
	for _, chunk := range p.currentBlob.chunks {
		chunkRefs = append(chunkRefs, &BlobChunkRef{
			ChunkHash: chunk.Hash,
			Offset:    chunk.Offset,
			Length:    chunk.Size,
		})
	}

	// Get pending chunks (will be inserted to DB and reported to handler)
	chunksToInsert := p.pendingChunks
	p.pendingChunks = nil // Clear pending list

	// Insert pending chunks, blob_chunks, and update blob in a single transaction
	if p.repos != nil {
		blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
		if parseErr != nil {
			p.cleanupTempFile()
			return fmt.Errorf("parsing blob ID: %w", parseErr)
		}
		err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			// First insert all pending chunks (required for blob_chunks FK)
			for _, chunk := range chunksToInsert {
				dbChunk := &database.Chunk{
					ChunkHash: types.ChunkHash(chunk.Hash),
					Size:      chunk.Size,
				}
				if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
					return fmt.Errorf("creating chunk: %w", err)
				}
			}

			// Insert all blob_chunk records in batch
			for _, chunk := range p.currentBlob.chunks {
				blobChunk := &database.BlobChunk{
					BlobID:    blobIDTyped,
					ChunkHash: types.ChunkHash(chunk.Hash),
					Offset:    chunk.Offset,
					Length:    chunk.Size,
				}
				if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
					return fmt.Errorf("creating blob_chunk: %w", err)
				}
			}

			// Update blob record with final hash and sizes
			return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
				p.currentBlob.size, finalSize)
		})
		if err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("finalizing blob transaction: %w", err)
		}

		log.Debug("Committed blob transaction",
			"chunks_inserted", len(chunksToInsert),
			"blob_chunks_inserted", len(p.currentBlob.chunks))
	}

	// Create finished blob
	finished := &FinishedBlob{
		ID:           p.currentBlob.id,
		Hash:         blobHash,
		Data:         nil, // We don't load data into memory anymore
		Chunks:       chunkRefs,
		CreatedTS:    p.currentBlob.startTime,
		Uncompressed: p.currentBlob.size,
		Compressed:   finalSize,
	}

	compressionRatio := float64(finished.Compressed) / float64(finished.Uncompressed)
	log.Info("Finalized blob (compressed and encrypted)",
		"hash", blobHash,
		"chunks", len(chunkRefs),
		"uncompressed", finished.Uncompressed,
		"compressed", finished.Compressed,
		"ratio", fmt.Sprintf("%.2f", compressionRatio),
		"duration", time.Since(p.currentBlob.startTime))

	// Collect inserted chunk hashes for the scanner to track
	var insertedChunkHashes []string
	for _, chunk := range chunksToInsert {
		insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
	}

	// Call blob handler if set
	if p.blobHandler != nil {
		// Reset file position for handler
		if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("seeking for handler: %w", err)
		}

		// Create a blob reader that includes the data stream
		blobWithReader := &BlobWithReader{
			FinishedBlob:        finished,
			Reader:              p.currentBlob.tempFile,
			TempFile:            p.currentBlob.tempFile,
			InsertedChunkHashes: insertedChunkHashes,
		}

		if err := p.blobHandler(blobWithReader); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("blob handler failed: %w", err)
		}
		// Note: blob handler is responsible for closing/cleaning up the temp file
		p.currentBlob = nil
	} else {
		log.Debug("No blob handler callback configured", "blob_hash", blobHash[:8]+"...")
		// No handler, need to read data for legacy behavior
		if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("seeking to read data: %w", err)
		}

		data, err := io.ReadAll(p.currentBlob.tempFile)
		if err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("reading blob data: %w", err)
		}
		finished.Data = data

		p.finishedBlobs = append(p.finishedBlobs, finished)

		// Cleanup
		p.cleanupTempFile()
		p.currentBlob = nil
	}

	return nil
}

// cleanupTempFile removes the temporary file
|
||||
func (p *Packer) cleanupTempFile() {
|
||||
if p.currentBlob != nil && p.currentBlob.tempFile != nil {
|
||||
name := p.currentBlob.tempFile.Name()
|
||||
_ = p.currentBlob.tempFile.Close()
|
||||
_ = p.fs.Remove(name)
|
||||
}
|
||||
}
|
||||
|
||||
// PackChunks is a convenience method to pack multiple chunks at once
|
||||
func (p *Packer) PackChunks(chunks []*ChunkRef) error {
|
||||
for _, chunk := range chunks {
|
||||
err := p.AddChunk(chunk)
|
||||
if err == ErrBlobSizeLimitExceeded {
|
||||
// Finalize current blob and retry
|
||||
if err := p.FinalizeBlob(); err != nil {
|
||||
return fmt.Errorf("finalizing blob before retry: %w", err)
|
||||
}
|
||||
// Retry the chunk
|
||||
if err := p.AddChunk(chunk); err != nil {
|
||||
return fmt.Errorf("adding chunk %s after finalize: %w", chunk.Hash, err)
|
||||
}
|
||||
} else if err != nil {
|
||||
return fmt.Errorf("adding chunk %s: %w", chunk.Hash, err)
|
||||
}
|
||||
}
|
||||
|
||||
return p.Flush()
|
||||
}
|
||||
385
internal/blob/packer_test.go
Normal file
@@ -0,0 +1,385 @@
package blob

import (
	"bytes"
	"context"
	"crypto/sha256"
	"database/sql"
	"encoding/hex"
	"io"
	"testing"

	"filippo.io/age"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/klauspost/compress/zstd"
	"github.com/spf13/afero"
)

const (
	// Test key from test/insecure-integration-test.key
	testPrivateKey = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
	testPublicKey  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)

func TestPacker(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Parse test identity
	identity, err := age.ParseX25519Identity(testPrivateKey)
	if err != nil {
		t.Fatalf("failed to parse test identity: %v", err)
	}

	t.Run("single chunk creates single blob", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create a chunk
		data := []byte("Hello, World!")
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get finished blobs
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]
		if len(blob.Chunks) != 1 {
			t.Errorf("expected 1 chunk in blob, got %d", len(blob.Chunks))
		}

		// Note: Very small data may not compress well
		t.Logf("Compression: %d -> %d bytes", blob.Uncompressed, blob.Compressed)

		// Decrypt the blob data
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		// Decompress the decrypted data
		reader, err := zstd.NewReader(decrypted)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := io.Copy(&decompressed, reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decompressed data doesn't match original")
		}
	})

	t.Run("multiple chunks packed together", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create multiple small chunks
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000)
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		// Add all chunks
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Should have one blob with all chunks
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		if len(blobs[0].Chunks) != 10 {
			t.Errorf("expected 10 chunks in blob, got %d", len(blobs[0].Chunks))
		}

		// Verify offsets are correct
		expectedOffset := int64(0)
		for i, chunkRef := range blobs[0].Chunks {
			if chunkRef.Offset != expectedOffset {
				t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunkRef.Offset)
			}
			if chunkRef.Length != 1000 {
				t.Errorf("chunk %d: expected length 1000, got %d", i, chunkRef.Length)
			}
			expectedOffset += chunkRef.Length
		}
	})

	t.Run("blob size limit enforced", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Small blob size limit to force multiple blobs
		cfg := PackerConfig{
			MaxBlobSize:      5000, // 5KB max
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create chunks that will exceed the limit
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000) // 1KB each
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		blobCount := 0

		// Add chunks and handle size limit errors
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err == ErrBlobSizeLimitExceeded {
				// Finalize current blob
				if err := packer.FinalizeBlob(); err != nil {
					t.Fatalf("failed to finalize blob: %v", err)
				}
				blobCount++
				// Retry adding the chunk
				if err := packer.AddChunk(chunk); err != nil {
					t.Fatalf("failed to add chunk after finalize: %v", err)
				}
			} else if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush remaining
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get all blobs
		blobs := packer.GetFinishedBlobs()
		totalBlobs := blobCount + len(blobs)

		// Should have multiple blobs due to size limit
		if totalBlobs < 2 {
			t.Errorf("expected multiple blobs due to size limit, got %d", totalBlobs)
		}

		// Verify each blob respects size limit (approximately)
		for _, blob := range blobs {
			if blob.Compressed > 6000 { // Allow some overhead
				t.Errorf("blob size %d exceeds limit", blob.Compressed)
			}
		}
	})

	t.Run("with encryption", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Generate test identity (using the one from parent test)
		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create test data
		data := bytes.Repeat([]byte("Test data for encryption!"), 100)
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk and flush
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get blob
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]

		// Decrypt the blob
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		var decryptedData bytes.Buffer
		if _, err := decryptedData.ReadFrom(decrypted); err != nil {
			t.Fatalf("failed to read decrypted data: %v", err)
		}

		// Decompress
		reader, err := zstd.NewReader(&decryptedData)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := decompressed.ReadFrom(reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		// Verify data
		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decrypted and decompressed data doesn't match original")
		}
	})
}
74
internal/blobgen/compress.go
Normal file
@@ -0,0 +1,74 @@
package blobgen

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"io"
)

// CompressResult contains the results of compression
type CompressResult struct {
	Data             []byte
	UncompressedSize int64
	CompressedSize   int64
	SHA256           string
}

// CompressData compresses and encrypts data, returning the result with hash
func CompressData(data []byte, compressionLevel int, recipients []string) (*CompressResult, error) {
	var buf bytes.Buffer

	// Create writer
	w, err := NewWriter(&buf, compressionLevel, recipients)
	if err != nil {
		return nil, fmt.Errorf("creating writer: %w", err)
	}

	// Write data
	if _, err := w.Write(data); err != nil {
		_ = w.Close()
		return nil, fmt.Errorf("writing data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("closing writer: %w", err)
	}

	return &CompressResult{
		Data:             buf.Bytes(),
		UncompressedSize: int64(len(data)),
		CompressedSize:   int64(buf.Len()),
		SHA256:           hex.EncodeToString(w.Sum256()),
	}, nil
}

// CompressStream compresses and encrypts from reader to writer, returning hash
func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipients []string) (written int64, hash string, err error) {
	// Create writer
	w, err := NewWriter(dst, compressionLevel, recipients)
	if err != nil {
		return 0, "", fmt.Errorf("creating writer: %w", err)
	}

	closed := false
	defer func() {
		if !closed {
			_ = w.Close()
		}
	}()

	// Copy data
	if _, err := io.Copy(w, src); err != nil {
		return 0, "", fmt.Errorf("copying data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return 0, "", fmt.Errorf("closing writer: %w", err)
	}
	closed = true

	return w.BytesWritten(), hex.EncodeToString(w.Sum256()), nil
}
64
internal/blobgen/compress_test.go
Normal file
@@ -0,0 +1,64 @@
package blobgen

import (
	"bytes"
	"crypto/rand"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// testRecipient is a static age recipient for tests.
const testRecipient = "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

// TestCompressStreamNoDoubleClose is a regression test for issue #28.
// It verifies that CompressStream does not panic or return an error due to
// double-closing the underlying blobgen.Writer. Before the fix in PR #33,
// the explicit Close() on the happy path combined with defer Close() would
// cause a double close.
func TestCompressStreamNoDoubleClose(t *testing.T) {
	input := []byte("regression test data for issue #28 double-close fix")
	var buf bytes.Buffer

	written, hash, err := CompressStream(&buf, bytes.NewReader(input), 3, []string{testRecipient})
	require.NoError(t, err, "CompressStream should not return an error")
	assert.True(t, written > 0, "expected bytes written > 0")
	assert.NotEmpty(t, hash, "expected non-empty hash")
	assert.True(t, buf.Len() > 0, "expected non-empty output")
}

// TestCompressStreamLargeInput exercises CompressStream with a larger payload
// to ensure no double-close issues surface under heavier I/O.
func TestCompressStreamLargeInput(t *testing.T) {
	data := make([]byte, 512*1024) // 512 KB
	_, err := rand.Read(data)
	require.NoError(t, err)

	var buf bytes.Buffer
	written, hash, err := CompressStream(&buf, bytes.NewReader(data), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, written > 0)
	assert.NotEmpty(t, hash)
}

// TestCompressStreamEmptyInput verifies CompressStream handles empty input
// without double-close issues.
func TestCompressStreamEmptyInput(t *testing.T) {
	var buf bytes.Buffer
	_, hash, err := CompressStream(&buf, strings.NewReader(""), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.NotEmpty(t, hash)
}

// TestCompressDataNoDoubleClose mirrors the stream test for CompressData,
// ensuring the explicit Close + error-path Close pattern is also safe.
func TestCompressDataNoDoubleClose(t *testing.T) {
	input := []byte("CompressData regression test for double-close")
	result, err := CompressData(input, 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, result.CompressedSize > 0)
	assert.True(t, result.UncompressedSize == int64(len(input)))
	assert.NotEmpty(t, result.SHA256)
}
73
internal/blobgen/reader.go
Normal file
@@ -0,0 +1,73 @@
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Reader wraps decompression and decryption with SHA256 verification
type Reader struct {
	reader       io.Reader
	decompressor *zstd.Decoder
	decryptor    io.Reader
	hasher       hash.Hash
	teeReader    io.Reader
	bytesRead    int64
}

// NewReader creates a new Reader that decrypts, decompresses, and verifies data
func NewReader(r io.Reader, identity age.Identity) (*Reader, error) {
	// Create decryption reader
	decReader, err := age.Decrypt(r, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decryption reader: %w", err)
	}

	// Create decompression reader
	decompressor, err := zstd.NewReader(decReader)
	if err != nil {
		return nil, fmt.Errorf("creating decompression reader: %w", err)
	}

	// Create SHA256 hasher
	hasher := sha256.New()

	// Create tee reader that reads from decompressor and writes to hasher
	teeReader := io.TeeReader(decompressor, hasher)

	return &Reader{
		reader:       r,
		decompressor: decompressor,
		decryptor:    decReader,
		hasher:       hasher,
		teeReader:    teeReader,
	}, nil
}

// Read implements io.Reader
func (r *Reader) Read(p []byte) (n int, err error) {
	n, err = r.teeReader.Read(p)
	r.bytesRead += int64(n)
	return n, err
}

// Close closes the decompressor
func (r *Reader) Close() error {
	r.decompressor.Close()
	return nil
}

// Sum256 returns the SHA256 hash of all data read
func (r *Reader) Sum256() []byte {
	return r.hasher.Sum(nil)
}

// BytesRead returns the number of uncompressed bytes read
func (r *Reader) BytesRead() int64 {
	return r.bytesRead
}
127
internal/blobgen/writer.go
Normal file
@@ -0,0 +1,127 @@
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"runtime"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Writer wraps compression and encryption with SHA256 hashing.
// Data flows: input -> tee(hasher, compressor -> encryptor -> destination)
// The hash is computed on the uncompressed input for deterministic content-addressing.
type Writer struct {
	teeWriter        io.Writer      // Tee to hasher and compressor
	compressor       *zstd.Encoder  // Compression layer
	encryptor        io.WriteCloser // Encryption layer
	hasher           hash.Hash      // SHA256 hasher (on uncompressed input)
	compressionLevel int
	bytesWritten     int64
}

// NewWriter creates a new Writer that compresses, encrypts, and hashes data.
// The hash is computed on the uncompressed input for deterministic content-addressing.
func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer, error) {
	// Validate compression level
	if err := validateCompressionLevel(compressionLevel); err != nil {
		return nil, err
	}

	// Create SHA256 hasher for the uncompressed input
	hasher := sha256.New()

	// Parse recipients
	var ageRecipients []age.Recipient
	for _, recipient := range recipients {
		r, err := age.ParseX25519Recipient(recipient)
		if err != nil {
			return nil, fmt.Errorf("parsing recipient %s: %w", recipient, err)
		}
		ageRecipients = append(ageRecipients, r)
	}

	// Create encryption writer that outputs to destination
	encWriter, err := age.Encrypt(w, ageRecipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encryption writer: %w", err)
	}

	// Calculate compression concurrency: CPUs - 2, minimum 1
	concurrency := runtime.NumCPU() - 2
	if concurrency < 1 {
		concurrency = 1
	}

	// Create compression writer with encryption as destination
	compressor, err := zstd.NewWriter(encWriter,
		zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)),
		zstd.WithEncoderConcurrency(concurrency),
	)
	if err != nil {
		_ = encWriter.Close()
		return nil, fmt.Errorf("creating compression writer: %w", err)
	}

	// Create tee writer: input goes to both hasher and compressor
	teeWriter := io.MultiWriter(hasher, compressor)

	return &Writer{
		teeWriter:        teeWriter,
		compressor:       compressor,
		encryptor:        encWriter,
		hasher:           hasher,
		compressionLevel: compressionLevel,
	}, nil
}

// Write implements io.Writer
func (w *Writer) Write(p []byte) (n int, err error) {
	n, err = w.teeWriter.Write(p)
	w.bytesWritten += int64(n)
	return n, err
}

// Close closes all layers and returns any errors
func (w *Writer) Close() error {
	// Close compressor first
	if err := w.compressor.Close(); err != nil {
		return fmt.Errorf("closing compressor: %w", err)
	}

	// Then close encryptor
	if err := w.encryptor.Close(); err != nil {
		return fmt.Errorf("closing encryptor: %w", err)
	}

	return nil
}

// Sum256 returns the double SHA256 hash of the uncompressed input data.
// Double hashing (SHA256(SHA256(data))) prevents information leakage about
// the plaintext - an attacker cannot confirm existence of known content
// by computing its hash and checking for a matching blob filename.
func (w *Writer) Sum256() []byte {
	// First hash: SHA256(plaintext)
	firstHash := w.hasher.Sum(nil)
	// Second hash: SHA256(firstHash) - this is the blob ID
	secondHash := sha256.Sum256(firstHash)
	return secondHash[:]
}

// BytesWritten returns the number of uncompressed bytes written
func (w *Writer) BytesWritten() int64 {
	return w.bytesWritten
}

func validateCompressionLevel(level int) error {
	// Zstd compression levels: 1-19 (default is 3)
	// SpeedFastest = 1, SpeedDefault = 3, SpeedBetterCompression = 7, SpeedBestCompression = 11
	if level < 1 || level > 19 {
		return fmt.Errorf("invalid compression level %d: must be between 1 and 19", level)
	}
	return nil
}
105
internal/blobgen/writer_test.go
Normal file
@@ -0,0 +1,105 @@
package blobgen

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestWriterHashIsDoubleHash verifies that Writer.Sum256() returns
// the double hash SHA256(SHA256(plaintext)) for security.
// Double hashing prevents attackers from confirming existence of known content.
func TestWriterHashIsDoubleHash(t *testing.T) {
	// Test data - random data that doesn't compress well
	testData := make([]byte, 1024*1024) // 1MB
	_, err := rand.Read(testData)
	require.NoError(t, err)

	// Test recipient (generated with age-keygen)
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create a buffer to capture the encrypted output
	var encryptedBuf bytes.Buffer

	// Create blobgen writer
	writer, err := NewWriter(&encryptedBuf, 3, []string{testRecipient})
	require.NoError(t, err)

	// Write test data
	n, err := writer.Write(testData)
	require.NoError(t, err)
	assert.Equal(t, len(testData), n)

	// Close to flush all data
	err = writer.Close()
	require.NoError(t, err)

	// Get the hash from the writer
	writerHash := hex.EncodeToString(writer.Sum256())

	// Calculate the expected double hash: SHA256(SHA256(plaintext))
	firstHash := sha256.Sum256(testData)
	secondHash := sha256.Sum256(firstHash[:])
	expectedDoubleHash := hex.EncodeToString(secondHash[:])

	// Also compute single hash to verify it's different
	singleHashStr := hex.EncodeToString(firstHash[:])

	t.Logf("Input size: %d bytes", len(testData))
	t.Logf("Single hash (SHA256(data)): %s", singleHashStr)
	t.Logf("Double hash (SHA256(SHA256(data))): %s", expectedDoubleHash)
	t.Logf("Writer hash: %s", writerHash)

	// The writer hash should match the double hash
	assert.Equal(t, expectedDoubleHash, writerHash,
		"Writer.Sum256() should return SHA256(SHA256(plaintext)) for security")

	// Verify it's NOT the single hash (would leak information)
	assert.NotEqual(t, singleHashStr, writerHash,
		"Writer hash should not be single hash (would allow content confirmation attacks)")
}

// TestWriterDeterministicHash verifies that the same input always produces
// the same hash, even with non-deterministic encryption.
func TestWriterDeterministicHash(t *testing.T) {
	// Test data
	testData := []byte("Hello, World! This is test data for deterministic hashing.")

	// Test recipient
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create two writers and verify they produce the same hash
	var buf1, buf2 bytes.Buffer

	writer1, err := NewWriter(&buf1, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer1.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer1.Close())

	writer2, err := NewWriter(&buf2, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer2.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer2.Close())

	hash1 := hex.EncodeToString(writer1.Sum256())
	hash2 := hex.EncodeToString(writer2.Sum256())

	// Hashes should be identical (deterministic)
	assert.Equal(t, hash1, hash2, "Same input should produce same hash")

	// Encrypted outputs should be different (non-deterministic encryption)
	assert.NotEqual(t, buf1.Bytes(), buf2.Bytes(),
		"Encrypted outputs should differ due to non-deterministic encryption")

	t.Logf("Hash 1: %s", hash1)
	t.Logf("Hash 2: %s", hash2)
	t.Logf("Encrypted size 1: %d bytes", buf1.Len())
	t.Logf("Encrypted size 2: %d bytes", buf2.Len())
}
153
internal/chunker/chunker.go
Normal file
@@ -0,0 +1,153 @@
|
||||
package chunker

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
// Each chunk is identified by its SHA256 hash and contains the raw data along with
// its position and size information from the original file.
type Chunk struct {
    Hash   string // Content hash of the chunk
    Data   []byte // Chunk data
    Offset int64  // Offset in the original file
    Size   int64  // Size of the chunk
}

// Chunker provides content-defined chunking using the FastCDC algorithm.
// It splits data into variable-sized chunks based on content patterns, ensuring
// that identical data sequences produce identical chunks regardless of their
// position in the file. This enables efficient deduplication.
type Chunker struct {
    avgChunkSize int
    minChunkSize int
    maxChunkSize int
}

// NewChunker creates a new chunker with the specified average chunk size.
// The actual chunk sizes will vary between avgChunkSize/4 and avgChunkSize*4
// as recommended by the FastCDC algorithm. Typical values for avgChunkSize
// are 64KB (65536), 256KB (262144), or 1MB (1048576).
func NewChunker(avgChunkSize int64) *Chunker {
    // FastCDC recommends min = avg/4 and max = avg*4
    return &Chunker{
        avgChunkSize: int(avgChunkSize),
        minChunkSize: int(avgChunkSize / 4),
        maxChunkSize: int(avgChunkSize * 4),
    }
}

// ChunkReader splits the reader into content-defined chunks and returns all chunks at once.
// This method loads all chunk data into memory, so it should only be used for
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
    chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
    defer chunker.Release()

    var chunks []Chunk
    offset := int64(0)

    for {
        chunk, err := chunker.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, fmt.Errorf("reading chunk: %w", err)
        }

        // Calculate hash
        hash := sha256.Sum256(chunk.Data)

        // Make a copy of the data since the chunker reuses the buffer
        chunkData := make([]byte, len(chunk.Data))
        copy(chunkData, chunk.Data)

        chunks = append(chunks, Chunk{
            Hash:   hex.EncodeToString(hash[:]),
            Data:   chunkData,
            Offset: offset,
            Size:   int64(len(chunk.Data)),
        })

        offset += int64(len(chunk.Data))
    }

    return chunks, nil
}

// ChunkCallback is a function called for each chunk as it's processed.
// The callback receives a Chunk containing the hash, data, offset, and size.
// If the callback returns an error, chunk processing stops and the error is propagated.
type ChunkCallback func(chunk Chunk) error

// ChunkReaderStreaming splits the reader into chunks and calls the callback for each chunk.
// This is the preferred method for processing large files or streams as it doesn't
// accumulate all chunks in memory. The callback is invoked for each chunk as it's
// produced, allowing for streaming processing and immediate storage or transmission.
// Returns the SHA256 hash of the entire file content and an error if chunking fails,
// reading fails, or if the callback returns an error.
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (string, error) {
    // Create a tee reader to calculate full file hash while chunking
    fileHasher := sha256.New()
    teeReader := io.TeeReader(r, fileHasher)

    chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
    defer chunker.Release()

    offset := int64(0)

    for {
        chunk, err := chunker.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            return "", fmt.Errorf("reading chunk: %w", err)
        }

        // Calculate chunk hash
        hash := sha256.Sum256(chunk.Data)

        // Pass the data directly - caller must process it before we call Next() again
        // (chunker reuses its internal buffer, but since we process synchronously
        // and completely before continuing, no copy is needed)
        if err := callback(Chunk{
            Hash:   hex.EncodeToString(hash[:]),
            Data:   chunk.Data,
            Offset: offset,
            Size:   int64(len(chunk.Data)),
        }); err != nil {
            return "", fmt.Errorf("callback error: %w", err)
        }

        offset += int64(len(chunk.Data))
    }

    // Return the full file hash
    return hex.EncodeToString(fileHasher.Sum(nil)), nil
}

// ChunkFile splits a file into content-defined chunks by reading the entire file.
// This is a convenience method that opens the file and passes it to ChunkReader.
// For large files, consider using ChunkReaderStreaming with a file handle instead.
// Returns an error if the file cannot be opened or if chunking fails.
func (c *Chunker) ChunkFile(path string) ([]Chunk, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, fmt.Errorf("opening file: %w", err)
    }
    defer func() {
        if err := file.Close(); err != nil && err.Error() != "invalid argument" {
            // Log error or handle as needed
            _ = err
        }
    }()

    return c.ChunkReader(file)
}
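Note the defensive copy in `ChunkReader`: `ReusableChunker.Next` returns a slice into its internal, reused buffer, so any chunk retained past the next `Next` call must be copied. A stand-alone sketch of that aliasing pitfall (independent of the vaultik packages):

```go
package main

import "fmt"

func main() {
	// One shared buffer, reused for every "chunk", like the
	// ReusableChunker's internal buffer.
	buf := make([]byte, 4)

	emit := func(b byte) []byte {
		for i := range buf {
			buf[i] = b
		}
		return buf // aliases the shared buffer
	}

	aliased := emit('A')                      // still points into buf
	copied := append([]byte(nil), aliased...) // defensive copy

	_ = emit('B') // next "chunk" overwrites the shared buffer

	fmt.Println(string(aliased)) // BBBB - stale view of the buffer
	fmt.Println(string(copied))  // AAAA - survived the reuse
}
```

This is also why `ChunkReaderStreaming` can skip the copy: the callback runs synchronously and must finish with the data before `Next` is called again.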
77	internal/chunker/chunker_isolated_test.go	Normal file
@@ -0,0 +1,77 @@
package chunker

import (
    "bytes"
    "testing"
)

func TestChunkerExpectedChunkCount(t *testing.T) {
    tests := []struct {
        name         string
        fileSize     int
        avgChunkSize int64
        minExpected  int
        maxExpected  int
    }{
        {
            name:         "1MB file with 64KB average",
            fileSize:     1024 * 1024,
            avgChunkSize: 64 * 1024,
            minExpected:  8,  // At least half the expected count
            maxExpected:  32, // At most double the expected count
        },
        {
            name:         "10MB file with 256KB average",
            fileSize:     10 * 1024 * 1024,
            avgChunkSize: 256 * 1024,
            minExpected:  10, // FastCDC may produce larger chunks
            maxExpected:  80,
        },
        {
            name:         "512KB file with 64KB average",
            fileSize:     512 * 1024,
            avgChunkSize: 64 * 1024,
            minExpected:  4, // ~8 expected
            maxExpected:  16,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            chunker := NewChunker(tt.avgChunkSize)

            // Create data with some variation to trigger chunk boundaries
            data := make([]byte, tt.fileSize)
            for i := 0; i < len(data); i++ {
                // Use a pattern that should create boundaries
                data[i] = byte((i * 17) ^ (i >> 5))
            }

            chunks, err := chunker.ChunkReader(bytes.NewReader(data))
            if err != nil {
                t.Fatalf("chunking failed: %v", err)
            }

            t.Logf("Created %d chunks for %d bytes with %d average chunk size",
                len(chunks), tt.fileSize, tt.avgChunkSize)

            if len(chunks) < tt.minExpected {
                t.Errorf("too few chunks: got %d, expected at least %d",
                    len(chunks), tt.minExpected)
            }
            if len(chunks) > tt.maxExpected {
                t.Errorf("too many chunks: got %d, expected at most %d",
                    len(chunks), tt.maxExpected)
            }

            // Verify chunks reconstruct to original
            var reconstructed []byte
            for _, chunk := range chunks {
                reconstructed = append(reconstructed, chunk.Data...)
            }
            if !bytes.Equal(data, reconstructed) {
                t.Error("reconstructed data doesn't match original")
            }
        })
    }
}
128	internal/chunker/chunker_test.go	Normal file
@@ -0,0 +1,128 @@
package chunker

import (
    "bytes"
    "crypto/rand"
    "testing"
)

func TestChunker(t *testing.T) {
    t.Run("small file produces single chunk", func(t *testing.T) {
        chunker := NewChunker(1024 * 1024)         // 1MB average
        data := bytes.Repeat([]byte("hello"), 100) // 500 bytes

        chunks, err := chunker.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        if len(chunks) != 1 {
            t.Errorf("expected 1 chunk, got %d", len(chunks))
        }

        if chunks[0].Size != int64(len(data)) {
            t.Errorf("expected chunk size %d, got %d", len(data), chunks[0].Size)
        }
    })

    t.Run("large file produces multiple chunks", func(t *testing.T) {
        chunker := NewChunker(256 * 1024) // 256KB average chunk size

        // Generate 2MB of random data
        data := make([]byte, 2*1024*1024)
        if _, err := rand.Read(data); err != nil {
            t.Fatalf("failed to generate random data: %v", err)
        }

        chunks, err := chunker.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        // Should produce multiple chunks - with FastCDC we expect around 8 chunks for 2MB with 256KB average
        if len(chunks) < 4 || len(chunks) > 16 {
            t.Errorf("expected 4-16 chunks, got %d", len(chunks))
        }

        // Verify chunks reconstruct original data
        var reconstructed []byte
        for _, chunk := range chunks {
            reconstructed = append(reconstructed, chunk.Data...)
        }

        if !bytes.Equal(data, reconstructed) {
            t.Error("reconstructed data doesn't match original")
        }

        // Verify offsets
        var expectedOffset int64
        for i, chunk := range chunks {
            if chunk.Offset != expectedOffset {
                t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunk.Offset)
            }
            expectedOffset += chunk.Size
        }
    })

    t.Run("deterministic chunking", func(t *testing.T) {
        chunker1 := NewChunker(256 * 1024)
        chunker2 := NewChunker(256 * 1024)

        // Use deterministic data
        data := bytes.Repeat([]byte("abcdefghijklmnopqrstuvwxyz"), 20000) // ~520KB

        chunks1, err := chunker1.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        chunks2, err := chunker2.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        // Should produce same chunks
        if len(chunks1) != len(chunks2) {
            t.Fatalf("different number of chunks: %d vs %d", len(chunks1), len(chunks2))
        }

        for i := range chunks1 {
            if chunks1[i].Hash != chunks2[i].Hash {
                t.Errorf("chunk %d: different hashes", i)
            }
            if chunks1[i].Size != chunks2[i].Size {
                t.Errorf("chunk %d: different sizes", i)
            }
        }
    })
}

func TestChunkBoundaries(t *testing.T) {
    chunker := NewChunker(256 * 1024) // 256KB average

    // FastCDC uses avg/4 for min and avg*4 for max
    avgSize := int64(256 * 1024)
    minSize := avgSize / 4
    maxSize := avgSize * 4

    // Test that minimum chunk size is respected
    data := make([]byte, minSize+1024)
    if _, err := rand.Read(data); err != nil {
        t.Fatalf("failed to generate random data: %v", err)
    }

    chunks, err := chunker.ChunkReader(bytes.NewReader(data))
    if err != nil {
        t.Fatalf("chunking failed: %v", err)
    }

    for i, chunk := range chunks {
        // Last chunk can be smaller than minimum
        if i < len(chunks)-1 && chunk.Size < minSize {
            t.Errorf("chunk %d size %d is below minimum %d", i, chunk.Size, minSize)
        }
        if chunk.Size > maxSize {
            t.Errorf("chunk %d size %d exceeds maximum %d", i, chunk.Size, maxSize)
        }
    }
}
265	internal/chunker/fastcdc.go	Normal file
@@ -0,0 +1,265 @@
package chunker

import (
    "io"
    "math"
    "sync"
)

// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
// this implementation uses sync.Pool to reuse buffers across files.
type ReusableChunker struct {
    minSize  int
    maxSize  int
    normSize int
    bufSize  int

    maskS uint64
    maskL uint64

    rd io.Reader

    buf    []byte
    cursor int
    offset int
    eof    bool
}

// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
var reusableChunkerPool = sync.Pool{
    New: func() interface{} {
        return &ReusableChunker{}
    },
}

// bufferPools contains pools for different buffer sizes.
// Key is the buffer size.
var bufferPools = sync.Map{}

func getBuffer(size int) []byte {
    poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
        New: func() interface{} {
            buf := make([]byte, size)
            return &buf
        },
    })
    pool := poolI.(*sync.Pool)
    return *pool.Get().(*[]byte)
}

func putBuffer(buf []byte) {
    size := cap(buf)
    poolI, ok := bufferPools.Load(size)
    if ok {
        pool := poolI.(*sync.Pool)
        b := buf[:size]
        pool.Put(&b)
    }
}

// FastCDCChunk represents a chunk from the FastCDC algorithm.
type FastCDCChunk struct {
    Offset      int
    Length      int
    Data        []byte
    Fingerprint uint64
}

// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
    c := reusableChunkerPool.Get().(*ReusableChunker)

    bufSize := maxSize * 2

    // Reuse buffer if it's the right size, otherwise get a new one
    if c.buf == nil || cap(c.buf) != bufSize {
        if c.buf != nil {
            putBuffer(c.buf)
        }
        c.buf = getBuffer(bufSize)
    } else {
        // Restore buffer to full capacity (may have been truncated by previous EOF)
        c.buf = c.buf[:cap(c.buf)]
    }

    bits := int(math.Round(math.Log2(float64(avgSize))))
    normalization := 2
    smallBits := bits + normalization
    largeBits := bits - normalization

    c.minSize = minSize
    c.maxSize = maxSize
    c.normSize = avgSize
    c.bufSize = bufSize
    c.maskS = (1 << smallBits) - 1
    c.maskL = (1 << largeBits) - 1
    c.rd = rd
    c.cursor = bufSize
    c.offset = 0
    c.eof = false

    return c
}

// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
    c.rd = nil
    reusableChunkerPool.Put(c)
}

func (c *ReusableChunker) fillBuffer() error {
    n := len(c.buf) - c.cursor
    if n >= c.maxSize {
        return nil
    }

    // Move all data after the cursor to the start of the buffer
    copy(c.buf[:n], c.buf[c.cursor:])
    c.cursor = 0

    if c.eof {
        c.buf = c.buf[:n]
        return nil
    }

    // Restore buffer to full capacity for reading
    c.buf = c.buf[:c.bufSize]

    // Fill the rest of the buffer
    m, err := io.ReadFull(c.rd, c.buf[n:])
    if err == io.EOF || err == io.ErrUnexpectedEOF {
        c.buf = c.buf[:n+m]
        c.eof = true
    } else if err != nil {
        return err
    }
    return nil
}

// Next returns the next chunk or io.EOF when done.
// The returned Data slice is only valid until the next call to Next.
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
    if err := c.fillBuffer(); err != nil {
        return FastCDCChunk{}, err
    }
    if len(c.buf) == 0 {
        return FastCDCChunk{}, io.EOF
    }

    length, fp := c.nextChunk(c.buf[c.cursor:])

    chunk := FastCDCChunk{
        Offset:      c.offset,
        Length:      length,
        Data:        c.buf[c.cursor : c.cursor+length],
        Fingerprint: fp,
    }

    c.cursor += length
    c.offset += chunk.Length

    return chunk, nil
}

func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
    fp := uint64(0)
    i := c.minSize

    if len(data) <= c.minSize {
        return len(data), fp
    }

    n := min(len(data), c.maxSize)

    for ; i < min(n, c.normSize); i++ {
        fp = (fp << 1) + table[data[i]]
        if (fp & c.maskS) == 0 {
            return i + 1, fp
        }
    }

    for ; i < n; i++ {
        fp = (fp << 1) + table[data[i]]
        if (fp & c.maskL) == 0 {
            return i + 1, fp
        }
    }

    return i, fp
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
    0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
    0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
    0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
    0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
    0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
    0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
    0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
    0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
    0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
    0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
    0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
    0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
    0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
    0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
    0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
    0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
    0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
    0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
    0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
    0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
    0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
    0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
    0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
    0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
    0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
    0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
    0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
    0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
    0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
    0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
    0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
    0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
    0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
    0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
    0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
    0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
    0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
    0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
    0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
    0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
    0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
    0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
    0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
    0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
    0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
    0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
    0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
    0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
    0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
    0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
    0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
    0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
    0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
    0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
    0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
    0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
    0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
    0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
    0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
    0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
    0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
    0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
    0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
    0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}
@@ -2,28 +2,63 @@ package cli

import (
    "context"
    "errors"
    "fmt"
    "os"
    "os/signal"
    "path/filepath"
    "syscall"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/globals"
    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/pidlock"
    "git.eeqj.de/sneak/vaultik/internal/snapshot"
    "git.eeqj.de/sneak/vaultik/internal/storage"
    "git.eeqj.de/sneak/vaultik/internal/vaultik"
    "github.com/adrg/xdg"
    "go.uber.org/fx"
)

// AppOptions contains common options for creating the fx application.
// It includes the configuration file path, logging options, and additional
// fx modules and invocations that should be included in the application.
type AppOptions struct {
    ConfigPath string
    LogOptions log.LogOptions
    Modules    []fx.Option
    Invokes    []fx.Option
}

// setupGlobals sets up the globals with application startup time
func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
    lc.Append(fx.Hook{
        OnStart: func(ctx context.Context) error {
            g.StartTime = time.Now().UTC()
            return nil
        },
    })
}

// NewApp creates a new fx application with common modules.
// It sets up the base modules (config, database, logging, globals) and
// combines them with any additional modules specified in the options.
// The returned fx.App is ready to be started with RunApp.
func NewApp(opts AppOptions) *fx.App {
    baseModules := []fx.Option{
        fx.Supply(config.ConfigPath(opts.ConfigPath)),
        fx.Supply(opts.LogOptions),
        fx.Provide(globals.New),
        fx.Provide(log.New),
        config.Module,
        database.Module,
        log.Module,
        storage.Module,
        snapshot.Module,
        fx.Provide(vaultik.New),
        fx.Invoke(setupGlobals),
        fx.NopLogger,
    }

@@ -33,24 +68,77 @@ func NewApp(opts AppOptions) *fx.App {
    return fx.New(allOptions...)
}

// RunApp starts and stops the fx application within the given context.
// It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
// ensures the application stops cleanly. The function blocks until the
// application completes or is interrupted. Returns an error if startup fails.
func RunApp(ctx context.Context, app *fx.App) error {
    // Set up signal handling for graceful shutdown
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

    // Create a context that will be cancelled on signal
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    // Start the app
    if err := app.Start(ctx); err != nil {
        return fmt.Errorf("failed to start app: %w", err)
    }

    // Handle shutdown
    shutdownComplete := make(chan struct{})
    go func() {
        defer close(shutdownComplete)
        <-sigChan
        log.Notice("Received interrupt signal, shutting down gracefully...")

        // Create a timeout context for shutdown
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer shutdownCancel()

        if err := app.Stop(shutdownCtx); err != nil {
            log.Error("Error during shutdown", "error", err)
        }
    }()

    // Wait for either the signal handler to complete shutdown or the app to request shutdown
    select {
    case <-shutdownComplete:
        // Shutdown completed via signal
        return nil
    case <-ctx.Done():
        // Context cancelled (shouldn't happen in normal operation)
        if err := app.Stop(context.Background()); err != nil {
            log.Error("Error stopping app", "error", err)
        }
        return ctx.Err()
    case <-app.Done():
        // App finished running (e.g., backup completed)
        return nil
    }
}

// RunWithApp is a helper that creates and runs an fx app with the given options.
// It combines NewApp and RunApp into a single convenient function. This is the
// preferred way to run CLI commands that need the full application context.
// It acquires a PID lock before starting to prevent concurrent instances.
func RunWithApp(ctx context.Context, opts AppOptions) error {
    // Acquire PID lock to prevent concurrent instances
    lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
    lock, err := pidlock.Acquire(lockDir)
    if err != nil {
        if errors.Is(err, pidlock.ErrAlreadyRunning) {
            return fmt.Errorf("cannot start: %w", err)
        }
        return fmt.Errorf("failed to acquire lock: %w", err)
    }
    defer func() {
        if err := lock.Release(); err != nil {
            log.Warn("Failed to release PID lock", "error", err)
        }
    }()

    app := NewApp(opts)
    return RunApp(ctx, app)
}
@@ -1,83 +0,0 @@
package cli

import (
    "context"
    "fmt"
    "os"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/globals"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

// BackupOptions contains options for the backup command
type BackupOptions struct {
    ConfigPath string
    Daemon     bool
    Cron       bool
    Prune      bool
}

// NewBackupCommand creates the backup command
func NewBackupCommand() *cobra.Command {
    opts := &BackupOptions{}

    cmd := &cobra.Command{
        Use:   "backup",
        Short: "Perform incremental backup",
        Long: `Backup configured directories using incremental deduplication and encryption.

Config is located at /etc/vaultik/config.yml, but can be overridden by specifying
a path using --config or by setting VAULTIK_CONFIG to a path.`,
        Args: cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // If --config not specified, check environment variable
            if opts.ConfigPath == "" {
                opts.ConfigPath = os.Getenv("VAULTIK_CONFIG")
            }
            // If still not specified, use default
            if opts.ConfigPath == "" {
                defaultConfig := "/etc/vaultik/config.yml"
                if _, err := os.Stat(defaultConfig); err == nil {
                    opts.ConfigPath = defaultConfig
                } else {
                    return fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultConfig)
                }
            }
            return runBackup(cmd.Context(), opts)
        },
    }

    cmd.Flags().StringVar(&opts.ConfigPath, "config", "", "Path to config file")
    cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
    cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
    cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")

    return cmd
}

func runBackup(ctx context.Context, opts *BackupOptions) error {
    return RunWithApp(ctx, AppOptions{
        ConfigPath: opts.ConfigPath,
        Invokes: []fx.Option{
            fx.Invoke(func(g *globals.Globals, cfg *config.Config, repos *database.Repositories) error {
                // TODO: Implement backup logic
                fmt.Printf("Running backup with config: %s\n", opts.ConfigPath)
                fmt.Printf("Version: %s, Commit: %s\n", g.Version, g.Commit)
                fmt.Printf("Index path: %s\n", cfg.IndexPath)
                if opts.Daemon {
                    fmt.Println("Running in daemon mode")
                }
                if opts.Cron {
                    fmt.Println("Running in cron mode")
                }
                if opts.Prune {
                    fmt.Println("Pruning enabled - will delete old snapshots after backup")
                }
                return nil
            }),
        },
    })
}
102	internal/cli/database.go	Normal file
@@ -0,0 +1,102 @@
package cli

import (
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/spf13/cobra"
)

// NewDatabaseCommand creates the database command group
func NewDatabaseCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "database",
		Short: "Manage the local state database",
		Long:  `Commands for managing the local SQLite state database.`,
	}

	cmd.AddCommand(
		newDatabasePurgeCommand(),
	)

	return cmd
}

// newDatabasePurgeCommand creates the database purge command
func newDatabasePurgeCommand() *cobra.Command {
	var force bool

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Delete the local state database",
		Long: `Completely removes the local SQLite state database.

This will erase all local tracking of:
  - File metadata and change detection state
  - Chunk and blob mappings
  - Local snapshot records

The remote storage is NOT affected. After purging, the next backup will
perform a full scan and re-deduplicate against existing remote blobs.

Use --force to skip the confirmation prompt.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Resolve config path
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Load config to get database path
			cfg, err := config.Load(configPath)
			if err != nil {
				return fmt.Errorf("failed to load config: %w", err)
			}

			dbPath := cfg.IndexPath

			// Check if database exists
			if _, err := os.Stat(dbPath); os.IsNotExist(err) {
				fmt.Printf("Database does not exist: %s\n", dbPath)
				return nil
			}

			// Confirm unless --force
			if !force {
				fmt.Printf("This will delete the local state database at:\n  %s\n\n", dbPath)
				fmt.Print("Are you sure? Type 'yes' to confirm: ")
				var confirm string
				if _, err := fmt.Scanln(&confirm); err != nil || confirm != "yes" {
					fmt.Println("Aborted.")
					return nil
				}
			}

			// Delete the database file
			if err := os.Remove(dbPath); err != nil {
				return fmt.Errorf("failed to delete database: %w", err)
			}

			// Also delete WAL and SHM files if they exist
			walPath := dbPath + "-wal"
			shmPath := dbPath + "-shm"
			_ = os.Remove(walPath) // Ignore errors - files may not exist
			_ = os.Remove(shmPath)

			rootFlags := GetRootFlags()
			if !rootFlags.Quiet {
				fmt.Printf("Database purged: %s\n", dbPath)
			}

			log.Info("Local state database purged", "path", dbPath)
			return nil
		},
	}

	cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

	return cmd
}
94	internal/cli/duration.go	Normal file
@@ -0,0 +1,94 @@
package cli

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// parseDuration parses duration strings. Supports standard Go duration format
// (e.g., "3h30m", "1h45m30s") as well as extended units:
//   - d: days (e.g., "30d", "7d")
//   - w: weeks (e.g., "2w", "4w")
//   - mo: months (30 days) (e.g., "6mo", "1mo")
//   - y: years (365 days) (e.g., "1y", "2y")
//
// Can combine units: "1y6mo", "2w3d", "1d12h30m"
func parseDuration(s string) (time.Duration, error) {
	// First try standard Go duration parsing
	if d, err := time.ParseDuration(s); err == nil {
		return d, nil
	}

	// Extended duration parsing
	// Check for negative values
	if strings.HasPrefix(strings.TrimSpace(s), "-") {
		return 0, fmt.Errorf("negative durations are not supported")
	}

	// Pattern matches: number + unit, repeated
	re := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-zA-Z]+)`)
	matches := re.FindAllStringSubmatch(s, -1)

	if len(matches) == 0 {
		return 0, fmt.Errorf("invalid duration format: %q", s)
	}

	var total time.Duration

	for _, match := range matches {
		valueStr := match[1]
		unit := strings.ToLower(match[2])

		value, err := strconv.ParseFloat(valueStr, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid number %q: %w", valueStr, err)
		}

		var d time.Duration
		switch unit {
		// Standard time units
		case "ns", "nanosecond", "nanoseconds":
			d = time.Duration(value)
		case "us", "µs", "microsecond", "microseconds":
			d = time.Duration(value * float64(time.Microsecond))
		case "ms", "millisecond", "milliseconds":
			d = time.Duration(value * float64(time.Millisecond))
		case "s", "sec", "second", "seconds":
			d = time.Duration(value * float64(time.Second))
		case "m", "min", "minute", "minutes":
			d = time.Duration(value * float64(time.Minute))
		case "h", "hr", "hour", "hours":
			d = time.Duration(value * float64(time.Hour))
		// Extended units
		case "d", "day", "days":
			d = time.Duration(value * float64(24*time.Hour))
		case "w", "week", "weeks":
			d = time.Duration(value * float64(7*24*time.Hour))
		case "mo", "month", "months":
			// Using 30 days as approximation
			d = time.Duration(value * float64(30*24*time.Hour))
		case "y", "year", "years":
			// Using 365 days as approximation
			d = time.Duration(value * float64(365*24*time.Hour))
		default:
			// Try parsing as standard Go duration unit
			testStr := fmt.Sprintf("1%s", unit)
			if _, err := time.ParseDuration(testStr); err == nil {
				// It's a valid Go duration unit, parse the full value
				fullStr := fmt.Sprintf("%g%s", value, unit)
				if d, err = time.ParseDuration(fullStr); err != nil {
					return 0, fmt.Errorf("invalid duration %q: %w", fullStr, err)
				}
			} else {
				return 0, fmt.Errorf("unknown time unit %q", unit)
			}
		}

		total += d
	}

	return total, nil
}
263	internal/cli/duration_test.go	Normal file
@@ -0,0 +1,263 @@
package cli

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestParseDuration(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected time.Duration
		wantErr  bool
	}{
		// Standard Go durations
		{name: "standard seconds", input: "30s", expected: 30 * time.Second},
		{name: "standard minutes", input: "45m", expected: 45 * time.Minute},
		{name: "standard hours", input: "2h", expected: 2 * time.Hour},
		{name: "standard combined", input: "3h30m", expected: 3*time.Hour + 30*time.Minute},
		{name: "standard complex", input: "1h45m30s", expected: 1*time.Hour + 45*time.Minute + 30*time.Second},
		{name: "standard with milliseconds", input: "1s500ms", expected: 1*time.Second + 500*time.Millisecond},
		// Extended units - days
		{name: "single day", input: "1d", expected: 24 * time.Hour},
		{name: "multiple days", input: "7d", expected: 7 * 24 * time.Hour},
		{name: "fractional days", input: "1.5d", expected: 36 * time.Hour},
		{name: "days spelled out", input: "3days", expected: 3 * 24 * time.Hour},
		// Extended units - weeks
		{name: "single week", input: "1w", expected: 7 * 24 * time.Hour},
		{name: "multiple weeks", input: "4w", expected: 4 * 7 * 24 * time.Hour},
		{name: "weeks spelled out", input: "2weeks", expected: 2 * 7 * 24 * time.Hour},
		// Extended units - months
		{name: "single month", input: "1mo", expected: 30 * 24 * time.Hour},
		{name: "multiple months", input: "6mo", expected: 6 * 30 * 24 * time.Hour},
		{name: "months spelled out", input: "3months", expected: 3 * 30 * 24 * time.Hour},
		// Extended units - years
		{name: "single year", input: "1y", expected: 365 * 24 * time.Hour},
		{name: "multiple years", input: "2y", expected: 2 * 365 * 24 * time.Hour},
		{name: "years spelled out", input: "1year", expected: 365 * 24 * time.Hour},
		// Combined extended units
		{name: "weeks and days", input: "2w3d", expected: 2*7*24*time.Hour + 3*24*time.Hour},
		{name: "years and months", input: "1y6mo", expected: 365*24*time.Hour + 6*30*24*time.Hour},
		{name: "days and hours", input: "1d12h", expected: 24*time.Hour + 12*time.Hour},
		{name: "complex combination", input: "1y2mo3w4d5h6m7s", expected: 365*24*time.Hour + 2*30*24*time.Hour + 3*7*24*time.Hour + 4*24*time.Hour + 5*time.Hour + 6*time.Minute + 7*time.Second},
		{name: "with spaces", input: "1d 12h 30m", expected: 24*time.Hour + 12*time.Hour + 30*time.Minute},
		// Edge cases
		{name: "zero duration", input: "0s", expected: 0},
		{name: "large duration", input: "10y", expected: 10 * 365 * 24 * time.Hour},
		// Error cases
		{name: "empty string", input: "", wantErr: true},
		{name: "invalid format", input: "abc", wantErr: true},
		{name: "unknown unit", input: "5x", wantErr: true},
		{name: "invalid number", input: "xyzd", wantErr: true},
		{name: "negative not supported", input: "-5d", wantErr: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := parseDuration(tt.input)

			if tt.wantErr {
				assert.Error(t, err, "expected error for input %q", tt.input)
				return
			}

			assert.NoError(t, err, "unexpected error for input %q", tt.input)
			assert.Equal(t, tt.expected, got, "duration mismatch for input %q", tt.input)
		})
	}
}

func TestParseDurationSpecialCases(t *testing.T) {
	// Test that standard Go durations work exactly as expected
	standardDurations := []string{
		"300ms",
		"1.5h",
		"2h45m",
		"72h",
		"1us",
		"1µs",
		"1ns",
	}

	for _, d := range standardDurations {
		expected, err := time.ParseDuration(d)
		assert.NoError(t, err)

		got, err := parseDuration(d)
		assert.NoError(t, err)
		assert.Equal(t, expected, got, "standard duration %q should parse identically", d)
	}
}

func TestParseDurationRealWorldExamples(t *testing.T) {
	// Test real-world snapshot purge scenarios
	tests := []struct {
		description string
		input       string
		olderThan   time.Duration
	}{
		{description: "keep snapshots from last 30 days", input: "30d", olderThan: 30 * 24 * time.Hour},
		{description: "keep snapshots from last 6 months", input: "6mo", olderThan: 6 * 30 * 24 * time.Hour},
		{description: "keep snapshots from last year", input: "1y", olderThan: 365 * 24 * time.Hour},
		{description: "keep snapshots from last week and a half", input: "1w3d", olderThan: 10 * 24 * time.Hour},
		{description: "keep snapshots from last 90 days", input: "90d", olderThan: 90 * 24 * time.Hour},
	}

	for _, tt := range tests {
		t.Run(tt.description, func(t *testing.T) {
			got, err := parseDuration(tt.input)
			assert.NoError(t, err)
			assert.Equal(t, tt.olderThan, got)

			// Verify the duration makes sense for snapshot purging
			assert.Greater(t, got, time.Hour, "snapshot purge duration should be at least an hour")
		})
	}
}
@@ -4,7 +4,9 @@ import (
 	"os"
 )

-// CLIEntry is the main entry point for the CLI application
+// CLIEntry is the main entry point for the CLI application.
+// It creates the root command, executes it, and exits with status 1
+// if an error occurs. This function should be called from main().
 func CLIEntry() {
 	rootCmd := NewRootCommand()
 	if err := rootCmd.Execute(); err != nil {
@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
 	}

 	// Verify all subcommands are registered
-	expectedCommands := []string{"backup", "restore", "prune", "verify", "fetch"}
+	expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
 	for _, expected := range expectedCommands {
 		found := false
 		for _, cmd := range cmd.Commands() {
@@ -32,19 +32,24 @@ func TestCLIEntry(t *testing.T) {
 		}
 	}

-	// Verify backup command has proper flags
-	backupCmd, _, err := cmd.Find([]string{"backup"})
+	// Verify snapshot command has subcommands
+	snapshotCmd, _, err := cmd.Find([]string{"snapshot"})
 	if err != nil {
-		t.Errorf("Failed to find backup command: %v", err)
+		t.Errorf("Failed to find snapshot command: %v", err)
 	} else {
-		if backupCmd.Flag("config") == nil {
-			t.Error("Backup command missing --config flag")
-		}
-		if backupCmd.Flag("daemon") == nil {
-			t.Error("Backup command missing --daemon flag")
-		}
-		if backupCmd.Flag("cron") == nil {
-			t.Error("Backup command missing --cron flag")
-		}
+		// Check snapshot subcommands
+		expectedSubCommands := []string{"create", "list", "purge", "verify"}
+		for _, expected := range expectedSubCommands {
+			found := false
+			for _, subcmd := range snapshotCmd.Commands() {
+				if subcmd.Use == expected || subcmd.Name() == expected {
+					found = true
+					break
+				}
+			}
+			if !found {
+				t.Errorf("Expected snapshot subcommand '%s' not found", expected)
+			}
+		}
 	}
 }
@@ -1,88 +0,0 @@
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// FetchOptions contains options for the fetch command
type FetchOptions struct {
	Bucket     string
	Prefix     string
	SnapshotID string
	FilePath   string
	Target     string
}

// NewFetchCommand creates the fetch command
func NewFetchCommand() *cobra.Command {
	opts := &FetchOptions{}

	cmd := &cobra.Command{
		Use:   "fetch",
		Short: "Extract single file from backup",
		Long:  `Download and decrypt a single file from a backup snapshot`,
		Args:  cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate required flags
			if opts.Bucket == "" {
				return fmt.Errorf("--bucket is required")
			}
			if opts.Prefix == "" {
				return fmt.Errorf("--prefix is required")
			}
			if opts.SnapshotID == "" {
				return fmt.Errorf("--snapshot is required")
			}
			if opts.FilePath == "" {
				return fmt.Errorf("--file is required")
			}
			if opts.Target == "" {
				return fmt.Errorf("--target is required")
			}
			return runFetch(cmd.Context(), opts)
		},
	}

	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
	cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID")
	cmd.Flags().StringVar(&opts.FilePath, "file", "", "Path of file to extract from backup")
	cmd.Flags().StringVar(&opts.Target, "target", "", "Target path for extracted file")

	return cmd
}

func runFetch(ctx context.Context, opts *FetchOptions) error {
	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
	}

	app := fx.New(
		fx.Supply(opts),
		fx.Provide(globals.New),
		// Additional modules will be added here
		fx.Invoke(func(g *globals.Globals) error {
			// TODO: Implement fetch logic
			fmt.Printf("Fetching %s from snapshot %s to %s\n", opts.FilePath, opts.SnapshotID, opts.Target)
			return nil
		}),
		fx.NopLogger,
	)

	if err := app.Start(ctx); err != nil {
		return fmt.Errorf("failed to start fetch: %w", err)
	}
	defer func() {
		if err := app.Stop(ctx); err != nil {
			fmt.Printf("error stopping app: %v\n", err)
		}
	}()

	return nil
}
71	internal/cli/info.go	Normal file
@@ -0,0 +1,71 @@
package cli

import (
	"context"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewInfoCommand creates the info command
func NewInfoCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "info",
		Short: "Display system and configuration information",
		Long: `Shows information about the current vaultik configuration, including:
  - System details (OS, architecture, version)
  - Storage configuration (S3 bucket, endpoint)
  - Backup settings (source directories, compression)
  - Encryption configuration (recipients)
  - Local database statistics`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the app framework
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.ShowInfo(); err != nil {
										if err != context.Canceled {
											log.Error("Failed to show info", "error", err)
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	return cmd
}
@@ -2,77 +2,83 @@ package cli

 import (
 	"context"
-	"fmt"
 	"os"

-	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )

-// PruneOptions contains options for the prune command
-type PruneOptions struct {
-	Bucket string
-	Prefix string
-	DryRun bool
-}
-
 // NewPruneCommand creates the prune command
 func NewPruneCommand() *cobra.Command {
-	opts := &PruneOptions{}
+	opts := &vaultik.PruneOptions{}

 	cmd := &cobra.Command{
 		Use:   "prune",
 		Short: "Remove unreferenced blobs",
-		Long:  `Delete blobs that are no longer referenced by any snapshot`,
+		Long: `Removes blobs that are not referenced by any snapshot.
+
+This command scans all snapshots and their manifests to build a list of
+referenced blobs, then removes any blobs in storage that are not in this list.
+
+Use this command after deleting snapshots with 'vaultik purge' to reclaim
+storage space.`,
 		Args: cobra.NoArgs,
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			return runPrune(cmd.Context(), opts)
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet || opts.JSON,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the prune operation in a goroutine
+								go func() {
+									// Run the prune operation
+									if err := v.PruneBlobs(opts); err != nil {
+										if err != context.Canceled {
+											if !opts.JSON {
+												log.Error("Prune operation failed", "error", err)
+											}
+											os.Exit(1)
+										}
+									}
+
+									// Shutdown the app when prune completes
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping prune operation")
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
 		},
 	}

-	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
-	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
 	cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without actually deleting")
+	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
+	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")

 	return cmd
 }
-
-func runPrune(ctx context.Context, opts *PruneOptions) error {
-	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
-		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
-	}
-
-	app := fx.New(
-		fx.Supply(opts),
-		fx.Provide(globals.New),
-		// Additional modules will be added here
-		fx.Invoke(func(g *globals.Globals) error {
-			// TODO: Implement prune logic
-			fmt.Printf("Pruning bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
-			if opts.DryRun {
-				fmt.Println("Running in dry-run mode")
-			}
-			return nil
-		}),
-		fx.NopLogger,
-	)
-
-	if err := app.Start(ctx); err != nil {
-		return fmt.Errorf("failed to start prune: %w", err)
-	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
-
-	return nil
-}
100	internal/cli/purge.go	Normal file
@@ -0,0 +1,100 @@
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// PurgeOptions contains options for the purge command
type PurgeOptions struct {
	KeepLatest bool
	OlderThan  string
	Force      bool
}

// NewPurgeCommand creates the purge command
func NewPurgeCommand() *cobra.Command {
	opts := &PurgeOptions{}

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Purge old snapshots",
		Long: `Removes snapshots based on age or count criteria.

This command allows you to:
  - Keep only the latest snapshot (--keep-latest)
  - Remove snapshots older than a specific duration (--older-than)

Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate flags
			if !opts.KeepLatest && opts.OlderThan == "" {
				return fmt.Errorf("must specify either --keep-latest or --older-than")
			}
			if opts.KeepLatest && opts.OlderThan != "" {
				return fmt.Errorf("cannot specify both --keep-latest and --older-than")
			}

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the app framework like other commands
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Start the purge operation in a goroutine
								go func() {
									// Run the purge operation
									if err := v.PurgeSnapshots(opts.KeepLatest, opts.OlderThan, opts.Force); err != nil {
										if err != context.Canceled {
											log.Error("Purge operation failed", "error", err)
											os.Exit(1)
										}
									}

									// Shutdown the app when purge completes
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping purge operation")
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot")
	cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6mo, 1y)")
	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")

	return cmd
}
89	internal/cli/remote.go	Normal file
@@ -0,0 +1,89 @@
package cli

import (
	"context"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewRemoteCommand creates the remote command and subcommands
func NewRemoteCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "remote",
		Short: "Remote storage management commands",
		Long:  "Commands for inspecting and managing remote storage",
	}

	// Add subcommands
	cmd.AddCommand(newRemoteInfoCommand())

	return cmd
}

// newRemoteInfoCommand creates the 'remote info' subcommand
func newRemoteInfoCommand() *cobra.Command {
	var jsonOutput bool

	cmd := &cobra.Command{
		Use:   "info",
		Short: "Display remote storage information",
		Long: `Shows detailed information about remote storage, including:
  - Size of all snapshot metadata (per snapshot and total)
  - Count and total size of all blobs
  - Count and size of referenced blobs (from all manifests)
  - Count and size of orphaned blobs (not referenced by any manifest)`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || jsonOutput,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.RemoteInfo(jsonOutput); err != nil {
										if err != context.Canceled {
											if !jsonOutput {
												log.Error("Failed to get remote info", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")

	return cmd
}
@@ -2,20 +2,30 @@ package cli

 import (
 	"context"
-	"fmt"
 	"os"

 	"git.eeqj.de/sneak/vaultik/internal/config"
 	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"git.eeqj.de/sneak/vaultik/internal/storage"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )

 // RestoreOptions contains options for the restore command
 type RestoreOptions struct {
-	Bucket     string
-	Prefix     string
 	SnapshotID string
 	TargetDir  string
+	Paths      []string // Optional paths to restore (empty = all)
+	Verify     bool     // Verify restored files after restore
 }

+// RestoreApp contains all dependencies needed for restore
+type RestoreApp struct {
+	Globals    *globals.Globals
+	Config     *config.Config
+	Storage    storage.Storer
+	Vaultik    *vaultik.Vaultik
+	Shutdowner fx.Shutdowner
+}

 // NewRestoreCommand creates the restore command
@@ -23,61 +33,104 @@ func NewRestoreCommand() *cobra.Command {
 	opts := &RestoreOptions{}

 	cmd := &cobra.Command{
-		Use:   "restore",
+		Use:   "restore <snapshot-id> <target-dir> [paths...]",
 		Short: "Restore files from backup",
-		Long:  `Download and decrypt files from a backup snapshot`,
-		Args:  cobra.NoArgs,
+		Long: `Download and decrypt files from a backup snapshot.
+
+This command will restore files from the specified snapshot to the target directory.
+If no paths are specified, all files are restored.
+If paths are specified, only matching files/directories are restored.
+
+Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
+
+Examples:
+  # Restore entire snapshot
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
+
+  # Restore specific file
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
+
+  # Restore specific directory
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
+
+  # Restore and verify all files
+  vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
+		Args: cobra.MinimumNArgs(2),
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			if opts.SnapshotID == "" {
-				return fmt.Errorf("--snapshot is required")
-			}
-			if opts.TargetDir == "" {
-				return fmt.Errorf("--target is required")
-			}
+			snapshotID := args[0]
+			opts.TargetDir = args[1]
+			if len(args) > 2 {
+				opts.Paths = args[2:]
+			}
+
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet,
+				},
+				Modules: []fx.Option{
+					fx.Provide(fx.Annotate(
+						func(g *globals.Globals, cfg *config.Config,
+							storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
+							return &RestoreApp{
+								Globals:    g,
+								Config:     cfg,
+								Storage:    storer,
+								Vaultik:    v,
+								Shutdowner: shutdowner,
+							}
+						},
+					)),
+				},
+				Invokes: []fx.Option{
+					fx.Invoke(func(app *RestoreApp, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the restore operation in a goroutine
+								go func() {
|
||||
// Run the restore operation
|
||||
restoreOpts := &vaultik.RestoreOptions{
|
||||
SnapshotID: snapshotID,
|
||||
TargetDir: opts.TargetDir,
|
||||
Paths: opts.Paths,
|
||||
Verify: opts.Verify,
|
||||
}
|
||||
return runRestore(cmd.Context(), opts)
|
||||
if err := app.Vaultik.Restore(restoreOpts); err != nil {
|
||||
if err != context.Canceled {
|
||||
log.Error("Restore operation failed", "error", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Shutdown the app when restore completes
|
||||
if err := app.Shutdowner.Shutdown(); err != nil {
|
||||
log.Error("Failed to shutdown", "error", err)
|
||||
}
|
||||
}()
|
||||
return nil
|
||||
},
|
||||
OnStop: func(ctx context.Context) error {
|
||||
log.Debug("Stopping restore operation")
|
||||
app.Vaultik.Cancel()
|
||||
return nil
|
||||
},
|
||||
})
|
||||
}),
|
||||
},
|
||||
})
|
||||
},
|
||||
}
|
||||
|
||||
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
|
||||
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
|
||||
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to restore")
|
||||
cmd.Flags().StringVar(&opts.TargetDir, "target", "", "Target directory for restore")
|
||||
cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
|
||||
|
||||
return cmd
|
||||
}
|
||||
|
||||
func runRestore(ctx context.Context, opts *RestoreOptions) error {
|
||||
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
|
||||
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
|
||||
}
|
||||
|
||||
app := fx.New(
|
||||
fx.Supply(opts),
|
||||
fx.Provide(globals.New),
|
||||
// Additional modules will be added here
|
||||
fx.Invoke(func(g *globals.Globals) error {
|
||||
// TODO: Implement restore logic
|
||||
fmt.Printf("Restoring snapshot %s to %s\n", opts.SnapshotID, opts.TargetDir)
|
||||
return nil
|
||||
}),
|
||||
fx.NopLogger,
|
||||
)
|
||||
|
||||
if err := app.Start(ctx); err != nil {
|
||||
return fmt.Errorf("failed to start restore: %w", err)
|
||||
}
|
||||
defer func() {
|
||||
if err := app.Stop(ctx); err != nil {
|
||||
fmt.Printf("error stopping app: %v\n", err)
|
||||
}
|
||||
}()
|
||||
|
||||
return nil
|
||||
}
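The restore command's new positional-argument contract (snapshot ID, then target directory, then optional path filters, enforced by `cobra.MinimumNArgs(2)`) can be sketched as a plain helper independent of cobra. `parseRestoreArgs` is a hypothetical name for illustration, not a function in vaultik:

```go
package main

import (
	"errors"
	"fmt"
)

// parseRestoreArgs splits restore's positional arguments into the snapshot
// ID, the target directory, and any optional path filters, mirroring the
// "restore <snapshot-id> <target-dir> [paths...]" usage string.
func parseRestoreArgs(args []string) (snapshotID, targetDir string, paths []string, err error) {
	if len(args) < 2 {
		return "", "", nil, errors.New("usage: restore <snapshot-id> <target-dir> [paths...]")
	}
	snapshotID = args[0]
	targetDir = args[1]
	if len(args) > 2 {
		paths = args[2:]
	}
	return snapshotID, targetDir, paths, nil
}

func main() {
	id, dir, paths, err := parseRestoreArgs([]string{
		"myhost_docs_2025-01-01T12:00:00Z", "/restore", "/home/user/important.txt",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(id, dir, paths)
}
```

An empty `paths` slice means "restore everything", matching the command's documented behavior.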
@@ -1,10 +1,26 @@
package cli

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

// NewRootCommand creates the root cobra command
// RootFlags holds global flags that apply to all commands.
// These flags are defined on the root command and inherited by all subcommands.
type RootFlags struct {
    ConfigPath string
    Verbose    bool
    Debug      bool
    Quiet      bool
}

var rootFlags RootFlags

// NewRootCommand creates the root cobra command for the vaultik CLI.
// It sets up the command structure, global flags, and adds all subcommands.
// This is the main entry point for the CLI command hierarchy.
func NewRootCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use: "vaultik",
@@ -15,15 +31,54 @@ on the source system.`,
        SilenceUsage: true,
    }

    // Add global flags
    cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
    cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
    cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
    cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")

    // Add subcommands
    cmd.AddCommand(
        NewBackupCommand(),
        NewRestoreCommand(),
        NewPruneCommand(),
        NewVerifyCommand(),
        NewFetchCommand(),
        SnapshotCmd(),
        NewStoreCommand(),
        NewSnapshotCommand(),
        NewInfoCommand(),
        NewVersionCommand(),
        NewRemoteCommand(),
        NewDatabaseCommand(),
    )

    return cmd
}

// GetRootFlags returns the global flags that were parsed from the command line.
// This allows subcommands to access global flag values like verbosity and config path.
func GetRootFlags() RootFlags {
    return rootFlags
}

// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Returns an error if no valid
// config file can be found through any of these methods.
func ResolveConfigPath() (string, error) {
    // First check global flag
    if rootFlags.ConfigPath != "" {
        return rootFlags.ConfigPath, nil
    }

    // Then check environment variable
    if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
        return envPath, nil
    }

    // Finally check default location
    defaultPath := "/etc/vaultik/config.yml"
    if _, err := os.Stat(defaultPath); err == nil {
        return defaultPath, nil
    }

    return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}
@@ -1,90 +1,467 @@
package cli

import (
    "context"
    "fmt"
    "os"

    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/vaultik"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

func SnapshotCmd() *cobra.Command {
// NewSnapshotCommand creates the snapshot command and subcommands
func NewSnapshotCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "snapshot",
        Short: "Manage snapshots",
        Long:  "Commands for listing, removing, and querying snapshots",
        Short: "Snapshot management commands",
        Long:  "Commands for creating, listing, and managing snapshots",
    }

    cmd.AddCommand(snapshotListCmd())
    cmd.AddCommand(snapshotRmCmd())
    cmd.AddCommand(snapshotLatestCmd())
    // Add subcommands
    cmd.AddCommand(newSnapshotCreateCommand())
    cmd.AddCommand(newSnapshotListCommand())
    cmd.AddCommand(newSnapshotPurgeCommand())
    cmd.AddCommand(newSnapshotVerifyCommand())
    cmd.AddCommand(newSnapshotRemoveCommand())
    cmd.AddCommand(newSnapshotPruneCommand())

    return cmd
}

func snapshotListCmd() *cobra.Command {
    var (
        bucket string
        prefix string
        limit  int
    )
// newSnapshotCreateCommand creates the 'snapshot create' subcommand
func newSnapshotCreateCommand() *cobra.Command {
    opts := &vaultik.SnapshotCreateOptions{}

    cmd := &cobra.Command{
        Use:   "create [snapshot-names...]",
        Short: "Create new snapshots",
        Long: `Creates new snapshots of the configured directories.

If snapshot names are provided, only those snapshots are created.
If no names are provided, all configured snapshots are created.

Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
        Args: cobra.ArbitraryArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // Pass snapshot names from args
            opts.Snapshots = args
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            // Use the backup functionality from cli package
            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Cron:    opts.Cron,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                // Start the snapshot creation in a goroutine
                                go func() {
                                    // Run the snapshot creation
                                    if err := v.CreateSnapshot(opts); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Snapshot creation failed", "error", err)
                                        }
                                    }

                                    // Shutdown the app when snapshot completes
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                log.Debug("Stopping snapshot creation")
                                // Cancel the Vaultik context
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
    cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
    cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
    cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")

    return cmd
}

// newSnapshotListCommand creates the 'snapshot list' subcommand
func newSnapshotListCommand() *cobra.Command {
    var jsonOutput bool

    cmd := &cobra.Command{
        Use:     "list",
        Short:   "List snapshots",
        Long:    "List all snapshots in the bucket, sorted by timestamp",
        Aliases: []string{"ls"},
        Short:   "List all snapshots",
        Long:    "Lists all snapshots with their ID, timestamp, and compressed size",
        Args:    cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if err := v.ListSnapshots(jsonOutput); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to list snapshots", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.Flags().IntVar(&limit, "limit", 10, "Maximum number of snapshots to list")
    cmd.MarkFlagRequired("bucket")
    cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")

    return cmd
}

func snapshotRmCmd() *cobra.Command {
    var (
        bucket   string
        prefix   string
        snapshot string
    )
// newSnapshotPurgeCommand creates the 'snapshot purge' subcommand
func newSnapshotPurgeCommand() *cobra.Command {
    var keepLatest bool
    var olderThan string
    var force bool

    cmd := &cobra.Command{
        Use:   "rm",
        Short: "Remove a snapshot",
        Long:  "Remove a snapshot and optionally its associated blobs",
        Use:   "purge",
        Short: "Purge old snapshots",
        Long:  "Removes snapshots based on age or count criteria",
        Args:  cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            // Validate flags
            if !keepLatest && olderThan == "" {
                return fmt.Errorf("must specify either --keep-latest or --older-than")
            }
            if keepLatest && olderThan != "" {
                return fmt.Errorf("cannot specify both --keep-latest and --older-than")
            }

            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if err := v.PurgeSnapshots(keepLatest, olderThan, force); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to purge snapshots", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.Flags().StringVar(&snapshot, "snapshot", "", "Snapshot ID to remove")
    cmd.MarkFlagRequired("bucket")
    cmd.MarkFlagRequired("snapshot")
    cmd.Flags().BoolVar(&keepLatest, "keep-latest", false, "Keep only the latest snapshot")
    cmd.Flags().StringVar(&olderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
    cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

    return cmd
}

func snapshotLatestCmd() *cobra.Command {
    var (
        bucket string
        prefix string
    )
// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
    opts := &vaultik.VerifyOptions{}

    cmd := &cobra.Command{
        Use:   "latest",
        Short: "Get the latest snapshot ID",
        Long:  "Display the ID of the most recent snapshot",
        Use:   "verify <snapshot-id>",
        Short: "Verify snapshot integrity",
        Long:  "Verifies that all blobs referenced in a snapshot exist",
        Args: func(cmd *cobra.Command, args []string) error {
            if len(args) != 1 {
                _ = cmd.Help()
                if len(args) == 0 {
                    return fmt.Errorf("snapshot ID required")
                }
                return fmt.Errorf("expected 1 argument, got %d", len(args))
            }
            return nil
        },
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            snapshotID := args[0]

            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet || opts.JSON,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    var err error
                                    if opts.Deep {
                                        err = v.RunDeepVerify(snapshotID, opts)
                                    } else {
                                        err = v.VerifySnapshotWithOptions(snapshotID, opts)
                                    }
                                    if err != nil {
                                        if err != context.Canceled {
                                            if !opts.JSON {
                                                log.Error("Verification failed", "error", err)
                                            }
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.MarkFlagRequired("bucket")
    cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Download and verify blob hashes")
    cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

    return cmd
}

// newSnapshotRemoveCommand creates the 'snapshot remove' subcommand
func newSnapshotRemoveCommand() *cobra.Command {
    opts := &vaultik.RemoveOptions{}

    cmd := &cobra.Command{
        Use:     "remove [snapshot-id]",
        Aliases: []string{"rm"},
        Short:   "Remove a snapshot from the local database",
        Long: `Removes a snapshot from the local database.

By default, only removes from the local database. Use --remote to also remove
the snapshot metadata from remote storage.

Note: This does NOT remove blobs. Use 'vaultik prune' to remove orphaned blobs
after removing snapshots.

Use --all --force to remove all snapshots.`,
        Args: func(cmd *cobra.Command, args []string) error {
            all, _ := cmd.Flags().GetBool("all")
            if all {
                if len(args) > 0 {
                    _ = cmd.Help()
                    return fmt.Errorf("--all cannot be used with a snapshot ID")
                }
                return nil
            }
            if len(args) != 1 {
                _ = cmd.Help()
                if len(args) == 0 {
                    return fmt.Errorf("snapshot ID required (or use --all --force)")
                }
                return fmt.Errorf("expected 1 argument, got %d", len(args))
            }
            return nil
        },
        RunE: func(cmd *cobra.Command, args []string) error {
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet || opts.JSON,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    var err error
                                    if opts.All {
                                        _, err = v.RemoveAllSnapshots(opts)
                                    } else {
                                        _, err = v.RemoveSnapshot(args[0], opts)
                                    }
                                    if err != nil {
                                        if err != context.Canceled {
                                            if !opts.JSON {
                                                log.Error("Failed to remove snapshot", "error", err)
                                            }
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().BoolVarP(&opts.Force, "force", "f", false, "Skip confirmation prompt")
    cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be removed without removing")
    cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output result as JSON")
    cmd.Flags().BoolVar(&opts.Remote, "remote", false, "Also remove snapshot metadata from remote storage")
    cmd.Flags().BoolVar(&opts.All, "all", false, "Remove all snapshots (requires --force)")

    return cmd
}

// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "prune",
        Short: "Remove orphaned data from local database",
        Long: `Removes orphaned files, chunks, and blobs from the local database.

This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
        Args: cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if _, err := v.PruneDatabase(); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to prune database", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    return cmd
}
158	internal/cli/store.go	Normal file
@@ -0,0 +1,158 @@
package cli

import (
    "context"
    "fmt"
    "strings"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/storage"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

// StoreApp contains dependencies for store commands
type StoreApp struct {
    Storage    storage.Storer
    Shutdowner fx.Shutdowner
}

// NewStoreCommand creates the store command and subcommands
func NewStoreCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "store",
        Short: "Storage information commands",
        Long:  "Commands for viewing information about the storage backend",
    }

    // Add subcommands
    cmd.AddCommand(newStoreInfoCommand())

    return cmd
}

// newStoreInfoCommand creates the 'store info' subcommand
func newStoreInfoCommand() *cobra.Command {
    return &cobra.Command{
        Use:   "info",
        Short: "Display storage information",
        Long:  "Shows storage configuration and statistics including snapshots and blobs",
        RunE: func(cmd *cobra.Command, args []string) error {
            return runWithApp(cmd.Context(), func(app *StoreApp) error {
                return app.Info(cmd.Context())
            })
        },
    }
}

// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
    // Get storage info
    storageInfo := app.Storage.Info()

    fmt.Printf("Storage Information\n")
    fmt.Printf("==================\n\n")
    fmt.Printf("Storage Configuration:\n")
    fmt.Printf("  Type: %s\n", storageInfo.Type)
    fmt.Printf("  Location: %s\n\n", storageInfo.Location)

    // Count snapshots by listing metadata/ prefix
    snapshotCount := 0
    snapshotCh := app.Storage.ListStream(ctx, "metadata/")
    snapshotDirs := make(map[string]bool)

    for object := range snapshotCh {
        if object.Err != nil {
            return fmt.Errorf("listing snapshots: %w", object.Err)
        }
        // Extract snapshot ID from path like metadata/2024-01-15-143052-hostname/
        parts := strings.Split(object.Key, "/")
        if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
            snapshotDirs[parts[1]] = true
        }
    }
    snapshotCount = len(snapshotDirs)

    // Count blobs and calculate total size by listing blobs/ prefix
    blobCount := 0
    var totalSize int64

    blobCh := app.Storage.ListStream(ctx, "blobs/")
    for object := range blobCh {
        if object.Err != nil {
            return fmt.Errorf("listing blobs: %w", object.Err)
        }
        if !strings.HasSuffix(object.Key, "/") { // Skip directories
            blobCount++
            totalSize += object.Size
        }
    }

    fmt.Printf("Storage Statistics:\n")
    fmt.Printf("  Snapshots: %d\n", snapshotCount)
    fmt.Printf("  Blobs: %d\n", blobCount)
    fmt.Printf("  Total Size: %s\n", formatBytes(totalSize))

    return nil
}

// formatBytes formats bytes into human-readable format
func formatBytes(bytes int64) string {
    const unit = 1024
    if bytes < unit {
        return fmt.Sprintf("%d B", bytes)
    }
    div, exp := int64(unit), 0
    for n := bytes / unit; n >= unit; n /= unit {
        div *= unit
        exp++
    }
    return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

// runWithApp creates the FX app and runs the given function
func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
    var result error
    rootFlags := GetRootFlags()

    // Use unified config resolution
    configPath, err := ResolveConfigPath()
    if err != nil {
        return err
    }

    err = RunWithApp(ctx, AppOptions{
        ConfigPath: configPath,
        LogOptions: log.LogOptions{
            Verbose: rootFlags.Verbose,
            Debug:   rootFlags.Debug,
            Quiet:   rootFlags.Quiet,
        },
        Modules: []fx.Option{
            fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {
                return &StoreApp{
                    Storage:    storer,
                    Shutdowner: shutdowner,
                }
            }),
        },
        Invokes: []fx.Option{
            fx.Invoke(func(app *StoreApp, shutdowner fx.Shutdowner) {
                result = fn(app)
                // Shutdown after command completes
                go func() {
                    time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
                    if err := shutdowner.Shutdown(); err != nil {
                        log.Error("Failed to shutdown", "error", err)
                    }
                }()
            }),
        },
    })

    if err != nil {
        return err
    }
    return result
}
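The `formatBytes` helper introduced in store.go uses binary (1024-based) units with a single decimal place. It can be exercised standalone; this copy reproduces the function as it appears in the diff:

```go
package main

import "fmt"

// formatBytes formats bytes into human-readable format using 1024-based
// units: B, KB, MB, GB, TB, PB, EB.
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(formatBytes(512))     // below one KB, printed as-is
	fmt.Println(formatBytes(1536))    // 1.5 KB
	fmt.Println(formatBytes(1 << 30)) // 1.0 GB
}
```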
10	internal/cli/vaultik_snapshot_types.go	Normal file
@@ -0,0 +1,10 @@
package cli

import "time"

// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
    ID             string    `json:"id"`
    Timestamp      time.Time `json:"timestamp"`
    CompressedSize int64     `json:"compressed_size"`
}
@@ -2,85 +2,97 @@ package cli
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"os"
|
||||
|
||||
"git.eeqj.de/sneak/vaultik/internal/globals"
|
||||
"git.eeqj.de/sneak/vaultik/internal/log"
|
||||
"git.eeqj.de/sneak/vaultik/internal/vaultik"
|
||||
"github.com/spf13/cobra"
|
||||
"go.uber.org/fx"
|
||||
)
|
||||
|
||||
// VerifyOptions contains options for the verify command
|
||||
type VerifyOptions struct {
|
||||
Bucket string
|
||||
Prefix string
|
||||
SnapshotID string
|
||||
Quick bool
|
||||
}
|
||||
|
||||
// NewVerifyCommand creates the verify command
|
||||
func NewVerifyCommand() *cobra.Command {
|
||||
opts := &VerifyOptions{}
	opts := &vaultik.VerifyOptions{}

	cmd := &cobra.Command{
		Use:   "verify",
		Short: "Verify backup integrity",
		Long:  `Check that all referenced blobs exist and verify metadata integrity`,
		Args:  cobra.NoArgs,
		Use:   "verify <snapshot-id>",
		Short: "Verify snapshot integrity",
		Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.

Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs

Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly

The command will fail immediately on any verification error and exit with non-zero status.`,
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate required flags
			if opts.Bucket == "" {
				return fmt.Errorf("--bucket is required")
			snapshotID := args[0]

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}
			if opts.Prefix == "" {
				return fmt.Errorf("--prefix is required")

			// Use the app framework for all verification
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Run the verify operation directly
								go func() {
									var err error
									if opts.Deep {
										err = v.RunDeepVerify(snapshotID, opts)
									} else {
										err = v.VerifySnapshotWithOptions(snapshotID, opts)
									}
									return runVerify(cmd.Context(), opts)

									if err != nil {
										if err != context.Canceled {
											if !opts.JSON {
												log.Error("Verification failed", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping verify operation")
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
	cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to verify (optional, defaults to latest)")
	cmd.Flags().BoolVar(&opts.Quick, "quick", false, "Perform quick verification by checking blob existence and S3 content hashes without downloading")
	cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

	return cmd
}

func runVerify(ctx context.Context, opts *VerifyOptions) error {
	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
	}

	app := fx.New(
		fx.Supply(opts),
		fx.Provide(globals.New),
		// Additional modules will be added here
		fx.Invoke(func(g *globals.Globals) error {
			// TODO: Implement verify logic
			if opts.SnapshotID == "" {
				fmt.Printf("Verifying latest snapshot in bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
			} else {
				fmt.Printf("Verifying snapshot %s in bucket %s with prefix %s\n", opts.SnapshotID, opts.Bucket, opts.Prefix)
			}
			if opts.Quick {
				fmt.Println("Performing quick verification")
			} else {
				fmt.Println("Performing deep verification")
			}
			return nil
		}),
		fx.NopLogger,
	)

	if err := app.Start(ctx); err != nil {
		return fmt.Errorf("failed to start verify: %w", err)
	}
	defer func() {
		if err := app.Stop(ctx); err != nil {
			fmt.Printf("error stopping app: %v\n", err)
		}
	}()

	return nil
}

27	internal/cli/version.go	Normal file
@@ -0,0 +1,27 @@
package cli

import (
	"fmt"
	"runtime"

	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
)

// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "version",
		Short: "Print version information",
		Long:  `Print version, git commit, and build information for vaultik.`,
		Args:  cobra.NoArgs,
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Printf("vaultik %s\n", globals.Version)
			fmt.Printf(" commit: %s\n", globals.Commit)
			fmt.Printf(" go: %s\n", runtime.Version())
			fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
		},
	}

	return cmd
}
@@ -3,30 +3,112 @@ package config
import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"filippo.io/age"
	"git.eeqj.de/sneak/smartconfig"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/adrg/xdg"
	"go.uber.org/fx"
	"gopkg.in/yaml.v3"
)

// Config represents the application configuration
const appName = "berlin.sneak.app.vaultik"

// expandTilde expands ~ at the start of a path to the user's home directory.
func expandTilde(path string) string {
	if path == "~" {
		home, _ := os.UserHomeDir()
		return home
	}
	if strings.HasPrefix(path, "~/") {
		home, _ := os.UserHomeDir()
		return filepath.Join(home, path[2:])
	}
	return path
}

// expandTildeInURL expands ~ in file:// URLs.
func expandTildeInURL(url string) string {
	if strings.HasPrefix(url, "file://~/") {
		home, _ := os.UserHomeDir()
		return "file://" + filepath.Join(home, url[9:])
	}
	return url
}

// SnapshotConfig represents configuration for a named snapshot.
// Each snapshot backs up one or more paths and can have its own exclude patterns
// in addition to the global excludes.
type SnapshotConfig struct {
	Paths   []string `yaml:"paths"`
	Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
}

// GetExcludes returns the combined exclude patterns for a named snapshot.
// It merges global excludes with the snapshot-specific excludes.
func (c *Config) GetExcludes(snapshotName string) []string {
	snap, ok := c.Snapshots[snapshotName]
	if !ok {
		return c.Exclude
	}

	if len(snap.Exclude) == 0 {
		return c.Exclude
	}

	// Combine global and snapshot-specific excludes
	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
	combined = append(combined, c.Exclude...)
	combined = append(combined, snap.Exclude...)
	return combined
}
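The merge behavior is easy to see in isolation. A minimal standalone sketch (trimmed stand-ins for the real `Config`/`SnapshotConfig` types, stdlib only):

```go
package main

import "fmt"

// Trimmed stand-ins for the config types, for illustration only.
type SnapshotConfig struct {
	Paths   []string
	Exclude []string
}

type Config struct {
	Exclude   []string // global excludes
	Snapshots map[string]SnapshotConfig
}

// GetExcludes mirrors the merge logic above: global excludes first,
// then snapshot-specific ones; unknown or exclude-less snapshots
// fall back to the global list.
func (c *Config) GetExcludes(name string) []string {
	snap, ok := c.Snapshots[name]
	if !ok || len(snap.Exclude) == 0 {
		return c.Exclude
	}
	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
	combined = append(combined, c.Exclude...)
	combined = append(combined, snap.Exclude...)
	return combined
}

func main() {
	cfg := &Config{
		Exclude: []string{"*.tmp"},
		Snapshots: map[string]SnapshotConfig{
			"home": {Paths: []string{"/home"}, Exclude: []string{"*.cache"}},
		},
	}
	fmt.Println(cfg.GetExcludes("home"))    // [*.tmp *.cache]
	fmt.Println(cfg.GetExcludes("missing")) // [*.tmp]
}
```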

// SnapshotNames returns the names of all configured snapshots in sorted order.
func (c *Config) SnapshotNames() []string {
	names := make([]string, 0, len(c.Snapshots))
	for name := range c.Snapshots {
		names = append(names, name)
	}
	// Sort for deterministic order
	sort.Strings(names)
	return names
}

// Config represents the application configuration for Vaultik.
// It defines all settings for backup operations, including source directories,
// encryption recipients, storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
	AgeRecipient      string        `yaml:"age_recipient"`
	AgeRecipients     []string      `yaml:"age_recipients"`
	AgeSecretKey      string        `yaml:"age_secret_key"`
	BackupInterval    time.Duration `yaml:"backup_interval"`
	BlobSizeLimit     int64         `yaml:"blob_size_limit"`
	ChunkSize         int64         `yaml:"chunk_size"`
	Exclude           []string      `yaml:"exclude"`
	BlobSizeLimit     Size          `yaml:"blob_size_limit"`
	ChunkSize         Size          `yaml:"chunk_size"`
	Exclude           []string      `yaml:"exclude"` // Global excludes applied to all snapshots
	FullScanInterval  time.Duration `yaml:"full_scan_interval"`
	Hostname          string        `yaml:"hostname"`
	IndexPath         string        `yaml:"index_path"`
	IndexPrefix       string        `yaml:"index_prefix"`
	MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
	S3                S3Config      `yaml:"s3"`
	SourceDirs        []string      `yaml:"source_dirs"`
	Snapshots         map[string]SnapshotConfig `yaml:"snapshots"`
	CompressionLevel  int           `yaml:"compression_level"`

	// StorageURL specifies the storage backend using a URL format.
	// Takes precedence over S3Config if set.
	// Supported formats:
	//   - s3://bucket/prefix?endpoint=host&region=us-east-1
	//   - file:///path/to/backup
	// For S3 URLs, credentials are still read from s3.access_key_id and s3.secret_access_key.
	StorageURL string `yaml:"storage_url"`
}

// S3Config represents S3 storage configuration
// S3Config represents S3 storage configuration for backup storage.
// It supports both AWS S3 and S3-compatible storage services.
// All fields except UseSSL and PartSize are required.
type S3Config struct {
	Endpoint string `yaml:"endpoint"`
	Bucket   string `yaml:"bucket"`
@@ -35,13 +117,17 @@ type S3Config struct {
	SecretAccessKey string `yaml:"secret_access_key"`
	Region          string `yaml:"region"`
	UseSSL          bool   `yaml:"use_ssl"`
	PartSize        int64  `yaml:"part_size"`
	PartSize        Size   `yaml:"part_size"`
}

// ConfigPath wraps the config file path for fx injection
// ConfigPath wraps the config file path for fx dependency injection.
// This type allows the config file path to be injected as a distinct type
// rather than a plain string, avoiding conflicts with other string dependencies.
type ConfigPath string

// New creates a new Config instance
// New creates a new Config instance by loading from the specified path.
// This function is used by the fx dependency injection framework.
// Returns an error if the path is empty or if loading fails.
func New(path ConfigPath) (*Config, error) {
	if path == "" {
		return nil, fmt.Errorf("config path not provided")
@@ -55,32 +141,60 @@ func New(path ConfigPath) (*Config, error) {
	return cfg, nil
}

// Load reads and parses the configuration file
// Load reads and parses the configuration file from the specified path.
// It applies default values for optional fields, performs environment variable
// substitution using smartconfig, and validates the configuration.
// The configuration file should be in YAML format. Returns an error if the file
// cannot be read, parsed, or if validation fails.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	// Load config using smartconfig for interpolation
	sc, err := smartconfig.NewFromConfigPath(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read config file: %w", err)
		return nil, fmt.Errorf("failed to load config file: %w", err)
	}

	cfg := &Config{
		// Set defaults
		BlobSizeLimit:     10 * 1024 * 1024 * 1024, // 10GB
		ChunkSize:         10 * 1024 * 1024,        // 10MB
		BlobSizeLimit:     Size(10 * 1024 * 1024 * 1024), // 10GB
		ChunkSize:         Size(10 * 1024 * 1024),        // 10MB
		BackupInterval:    1 * time.Hour,
		FullScanInterval:  24 * time.Hour,
		MinTimeBetweenRun: 15 * time.Minute,
		IndexPath:         "/var/lib/vaultik/index.sqlite",
		IndexPrefix:       "index/",
		IndexPath:         filepath.Join(xdg.DataHome, appName, "index.sqlite"),
		CompressionLevel:  3,
	}

	if err := yaml.Unmarshal(data, cfg); err != nil {
	// Convert smartconfig data to YAML then unmarshal
	configData := sc.Data()
	yamlBytes, err := yaml.Marshal(configData)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal config data: %w", err)
	}

	if err := yaml.Unmarshal(yamlBytes, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config: %w", err)
	}

	// Expand tilde in all path fields
	cfg.IndexPath = expandTilde(cfg.IndexPath)
	cfg.StorageURL = expandTildeInURL(cfg.StorageURL)

	// Expand tildes in snapshot paths
	for name, snap := range cfg.Snapshots {
		for i, path := range snap.Paths {
			snap.Paths[i] = expandTilde(path)
		}
		cfg.Snapshots[name] = snap
	}

	// Check for environment variable override for IndexPath
	if envIndexPath := os.Getenv("VAULTIK_INDEX_PATH"); envIndexPath != "" {
		cfg.IndexPath = envIndexPath
		cfg.IndexPath = expandTilde(envIndexPath)
	}

	// Check for environment variable override for AgeSecretKey
	if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
		cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
	}

	// Get hostname if not set
@@ -97,7 +211,18 @@ func Load(path string) (*Config, error) {
		cfg.S3.Region = "us-east-1"
	}
	if cfg.S3.PartSize == 0 {
		cfg.S3.PartSize = 5 * 1024 * 1024 // 5MB
		cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
	}

	// Check config file permissions (warn if world or group readable)
	if info, err := os.Stat(path); err == nil {
		mode := info.Mode().Perm()
		if mode&0044 != 0 { // group or world readable
			log.Warn("Config file has insecure permissions (contains S3 credentials)",
				"path", path,
				"mode", fmt.Sprintf("%04o", mode),
				"recommendation", "chmod 600 "+path)
		}
	}

	if err := cfg.Validate(); err != nil {
@@ -107,37 +232,40 @@ func Load(path string) (*Config, error) {
	return cfg, nil
}

// Validate checks if the configuration is valid
// Validate checks if the configuration is valid and complete.
// It ensures all required fields are present and have valid values:
//   - At least one age recipient must be specified
//   - At least one snapshot must be configured with at least one path
//   - Storage must be configured (either storage_url or s3.* fields)
//   - Chunk size must be at least 1MB
//   - Blob size limit must be at least the chunk size
//   - Compression level must be between 1 and 19
// Returns an error describing the first validation failure encountered.
func (c *Config) Validate() error {
	if c.AgeRecipient == "" {
		return fmt.Errorf("age_recipient is required")
	if len(c.AgeRecipients) == 0 {
		return fmt.Errorf("at least one age_recipient is required")
	}

	if len(c.SourceDirs) == 0 {
		return fmt.Errorf("at least one source directory is required")
	if len(c.Snapshots) == 0 {
		return fmt.Errorf("at least one snapshot must be configured")
	}

	if c.S3.Endpoint == "" {
		return fmt.Errorf("s3.endpoint is required")
	for name, snap := range c.Snapshots {
		if len(snap.Paths) == 0 {
			return fmt.Errorf("snapshot %q must have at least one path", name)
		}
	}

	if c.S3.Bucket == "" {
		return fmt.Errorf("s3.bucket is required")
	// Validate storage configuration
	if err := c.validateStorage(); err != nil {
		return err
	}

	if c.S3.AccessKeyID == "" {
		return fmt.Errorf("s3.access_key_id is required")
	}

	if c.S3.SecretAccessKey == "" {
		return fmt.Errorf("s3.secret_access_key is required")
	}

	if c.ChunkSize < 1024*1024 { // 1MB minimum
	if c.ChunkSize.Int64() < 1024*1024 { // 1MB minimum
		return fmt.Errorf("chunk_size must be at least 1MB")
	}

	if c.BlobSizeLimit < c.ChunkSize {
	if c.BlobSizeLimit.Int64() < c.ChunkSize.Int64() {
		return fmt.Errorf("blob_size_limit must be at least chunk_size")
	}

@@ -148,7 +276,71 @@ func (c *Config) Validate() error {
	return nil
}

// Module exports the config module for fx
// validateStorage validates storage configuration.
// If StorageURL is set, it takes precedence. S3 URLs require credentials.
// File URLs don't require any S3 configuration.
// If StorageURL is not set, legacy S3 configuration is required.
func (c *Config) validateStorage() error {
	if c.StorageURL != "" {
		// URL-based configuration
		if strings.HasPrefix(c.StorageURL, "file://") {
			// File storage doesn't need S3 credentials
			return nil
		}
		if strings.HasPrefix(c.StorageURL, "s3://") {
			// S3 storage needs credentials
			if c.S3.AccessKeyID == "" {
				return fmt.Errorf("s3.access_key_id is required for s3:// URLs")
			}
			if c.S3.SecretAccessKey == "" {
				return fmt.Errorf("s3.secret_access_key is required for s3:// URLs")
			}
			return nil
		}
		if strings.HasPrefix(c.StorageURL, "rclone://") {
			// Rclone storage uses rclone's own config
			return nil
		}
		return fmt.Errorf("storage_url must start with s3://, file://, or rclone://")
	}

	// Legacy S3 configuration
	if c.S3.Endpoint == "" {
		return fmt.Errorf("s3.endpoint is required (or set storage_url)")
	}

	if c.S3.Bucket == "" {
		return fmt.Errorf("s3.bucket is required (or set storage_url)")
	}

	if c.S3.AccessKeyID == "" {
		return fmt.Errorf("s3.access_key_id is required")
	}

	if c.S3.SecretAccessKey == "" {
		return fmt.Errorf("s3.secret_access_key is required")
	}

	return nil
}

// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
// the age library's parser, which handles comments and whitespace.
func extractAgeSecretKey(input string) string {
	identities, err := age.ParseIdentities(strings.NewReader(input))
	if err != nil || len(identities) == 0 {
		// Fall back to trimmed input if parsing fails
		return strings.TrimSpace(input)
	}
	// Return the string representation of the first identity
	if id, ok := identities[0].(*age.X25519Identity); ok {
		return id.String()
	}
	return strings.TrimSpace(input)
}

// Module exports the config module for fx dependency injection.
// It provides the Config type to other modules in the application.
var Module = fx.Module("config",
	fx.Provide(New),
)

@@ -6,6 +6,12 @@ import (
	"testing"
)

const (
	TEST_SNEAK_AGE_PUBLIC_KEY        = "age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj"
	TEST_INTEGRATION_AGE_PUBLIC_KEY  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
	TEST_INTEGRATION_AGE_PRIVATE_KEY = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
)

func TestMain(m *testing.M) {
	// Set up test environment
	testConfigPath := filepath.Join("..", "..", "test", "config.yaml")
@@ -32,16 +38,28 @@ func TestConfigLoad(t *testing.T) {
	}

	// Basic validation
	if cfg.AgeRecipient != "age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" {
		t.Errorf("Expected age recipient to be set, got '%s'", cfg.AgeRecipient)
	if len(cfg.AgeRecipients) != 2 {
		t.Errorf("Expected 2 age recipients, got %d", len(cfg.AgeRecipients))
	}
	if cfg.AgeRecipients[0] != TEST_SNEAK_AGE_PUBLIC_KEY {
		t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
	}

	if len(cfg.SourceDirs) != 2 {
		t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
	if len(cfg.Snapshots) != 1 {
		t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
	}

	if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
		t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
	testSnap, ok := cfg.Snapshots["test"]
	if !ok {
		t.Fatal("Expected 'test' snapshot to exist")
	}

	if len(testSnap.Paths) != 2 {
		t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
	}

	if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
		t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
	}

	if cfg.S3.Bucket != "vaultik-test-bucket" {
@@ -65,3 +83,65 @@ func TestConfigFromEnv(t *testing.T) {
		t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
	}
}

// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
func TestExtractAgeSecretKey(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected string
	}{
		{
			name:     "plain key",
			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "key with trailing newline",
			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name: "full age-keygen output",
			input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name: "age-keygen output with extra blank lines",
			input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg

AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5

`,
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "key with leading whitespace",
			input:    "  AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5  ",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "empty input",
			input:    "",
			expected: "",
		},
		{
			name:     "only comments",
			input:    "# this is a comment\n# another comment",
			expected: "# this is a comment\n# another comment",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := extractAgeSecretKey(tt.input)
			if result != tt.expected {
				t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
			}
		})
	}
}

62	internal/config/size.go	Normal file
@@ -0,0 +1,62 @@
package config

import (
	"fmt"

	"github.com/dustin/go-humanize"
)

// Size represents a byte size that can be specified in configuration files.
// It can unmarshal from both numeric values (interpreted as bytes) and
// human-readable strings like "10MB", "2.5GB", or "1TB".
type Size int64

// UnmarshalYAML implements yaml.Unmarshaler for Size, allowing it to be
// parsed from YAML configuration files. It accepts both numeric values
// (interpreted as bytes) and string values with units (e.g., "10MB").
func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Try to unmarshal as int64 first
	var intVal int64
	if err := unmarshal(&intVal); err == nil {
		*s = Size(intVal)
		return nil
	}

	// Try to unmarshal as string
	var strVal string
	if err := unmarshal(&strVal); err != nil {
		return fmt.Errorf("size must be a number or string")
	}

	// Parse the string using go-humanize
	bytes, err := humanize.ParseBytes(strVal)
	if err != nil {
		return fmt.Errorf("invalid size format: %w", err)
	}

	*s = Size(bytes)
	return nil
}

// Int64 returns the size as int64 bytes.
// This is useful when the size needs to be passed to APIs that expect
// a numeric byte count.
func (s Size) Int64() int64 {
	return int64(s)
}

// String returns the size as a human-readable string.
// For example, 1048576 bytes would be formatted as "1.0 MB".
// This implements the fmt.Stringer interface.
func (s Size) String() string {
	return humanize.Bytes(uint64(s))
}

// ParseSize parses a size string into a Size value.
func ParseSize(s string) (Size, error) {
	bytes, err := humanize.ParseBytes(s)
	if err != nil {
		return 0, fmt.Errorf("invalid size format: %w", err)
	}
	return Size(bytes), nil
}
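With `BlobSizeLimit`, `ChunkSize`, and `PartSize` now typed as `Size`, the config file accepts either raw byte counts or human-readable strings. A hypothetical YAML fragment (note that go-humanize's `ParseBytes` treats "MB" as 1000-based and "MiB" as 1024-based):

```yaml
chunk_size: 10485760      # plain integer: interpreted as bytes
blob_size_limit: "10GB"   # SI units: 10 * 1000^3 bytes
s3:
  part_size: "5MiB"       # binary units: 5 * 1024^2 bytes
```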
209	internal/crypto/encryption.go	Normal file
@@ -0,0 +1,209 @@
package crypto

import (
	"bytes"
	"fmt"
	"io"
	"sync"

	"filippo.io/age"
	"go.uber.org/fx"
)

// Encryptor provides thread-safe encryption using the age encryption library.
// It supports encrypting data for multiple recipients simultaneously, allowing
// any of the corresponding private keys to decrypt the data. This is useful
// for backup scenarios where multiple parties should be able to decrypt the data.
type Encryptor struct {
	recipients []age.Recipient
	mu         sync.RWMutex
}

// NewEncryptor creates a new encryptor with the given age public keys.
// Each public key should be a valid age X25519 recipient string (e.g., "age1...").
// At least one recipient must be provided. Returns an error if any of the
// public keys are invalid or if no recipients are specified.
func NewEncryptor(publicKeys []string) (*Encryptor, error) {
	if len(publicKeys) == 0 {
		return nil, fmt.Errorf("at least one recipient is required")
	}

	recipients := make([]age.Recipient, 0, len(publicKeys))
	for _, key := range publicKeys {
		recipient, err := age.ParseX25519Recipient(key)
		if err != nil {
			return nil, fmt.Errorf("parsing age recipient %s: %w", key, err)
		}
		recipients = append(recipients, recipient)
	}

	return &Encryptor{
		recipients: recipients,
	}, nil
}

// Encrypt encrypts data using age encryption for all configured recipients.
// The encrypted data can be decrypted by any of the corresponding private keys.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use EncryptStream or EncryptWriter instead.
func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	var buf bytes.Buffer

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(&buf, recipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encrypted writer: %w", err)
	}

	// Write data
	if _, err := w.Write(data); err != nil {
		return nil, fmt.Errorf("writing encrypted data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("closing encrypted writer: %w", err)
	}

	return buf.Bytes(), nil
}

// EncryptStream encrypts data from reader to writer using age encryption.
// This method is suitable for encrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The encrypted data is written directly to the destination writer.
func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(dst, recipients...)
	if err != nil {
		return fmt.Errorf("creating encrypted writer: %w", err)
	}

	// Copy data
	if _, err := io.Copy(w, src); err != nil {
		return fmt.Errorf("copying encrypted data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return fmt.Errorf("closing encrypted writer: %w", err)
	}

	return nil
}

// EncryptWriter creates a writer that encrypts data written to it.
// All data written to the returned WriteCloser will be encrypted and written
// to the destination writer. The caller must call Close() on the returned
// writer to ensure all encrypted data is properly flushed and finalized.
// This is useful for integrating encryption into existing writer-based pipelines.
func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(dst, recipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encrypted writer: %w", err)
	}

	return w, nil
}

// UpdateRecipients updates the recipients for future encryption operations.
// This method is thread-safe and can be called while other encryption operations
// are in progress. Existing encryption operations will continue with the old
// recipients. At least one recipient must be provided. Returns an error if any
// of the public keys are invalid or if no recipients are specified.
func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
	if len(publicKeys) == 0 {
		return fmt.Errorf("at least one recipient is required")
	}

	recipients := make([]age.Recipient, 0, len(publicKeys))
	for _, key := range publicKeys {
		recipient, err := age.ParseX25519Recipient(key)
		if err != nil {
			return fmt.Errorf("parsing age recipient %s: %w", key, err)
		}
		recipients = append(recipients, recipient)
	}

	e.mu.Lock()
	e.recipients = recipients
	e.mu.Unlock()

	return nil
}

// Decryptor provides thread-safe decryption using the age encryption library.
// It uses a private key to decrypt data that was encrypted for the corresponding
// public key.
type Decryptor struct {
	identity age.Identity
	mu       sync.RWMutex
}

// NewDecryptor creates a new decryptor with the given age private key.
// The private key should be a valid age X25519 identity string.
// Returns an error if the private key is invalid.
func NewDecryptor(privateKey string) (*Decryptor, error) {
	identity, err := age.ParseX25519Identity(privateKey)
	if err != nil {
		return nil, fmt.Errorf("parsing age identity: %w", err)
	}

	return &Decryptor{
		identity: identity,
	}, nil
}

// Decrypt decrypts data using age decryption.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use DecryptStream instead.
func (d *Decryptor) Decrypt(data []byte) ([]byte, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(bytes.NewReader(data), identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	decrypted, err := io.ReadAll(r)
	if err != nil {
		return nil, fmt.Errorf("reading decrypted data: %w", err)
	}

	return decrypted, nil
}

// DecryptStream returns a reader that decrypts data from the provided reader.
// This method is suitable for decrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The caller should close the input reader when done.
func (d *Decryptor) DecryptStream(src io.Reader) (io.Reader, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(src, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	return r, nil
}

// Module exports the crypto module for fx dependency injection.
var Module = fx.Module("crypto")
||||
157	internal/crypto/encryption_test.go	Normal file
@@ -0,0 +1,157 @@
package crypto

import (
	"bytes"
	"testing"

	"filippo.io/age"
)

func TestEncryptor(t *testing.T) {
	// Generate a test key pair
	identity, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity: %v", err)
	}

	publicKey := identity.Recipient().String()

	// Create encryptor
	enc, err := NewEncryptor([]string{publicKey})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Hello, World! This is a test message.")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify it's actually encrypted (should be larger and different)
	if bytes.Equal(plaintext, ciphertext) {
		t.Error("ciphertext equals plaintext")
	}

	// Decrypt to verify
	r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
	if err != nil {
		t.Fatalf("failed to decrypt: %v", err)
	}

	var decrypted bytes.Buffer
	if _, err := decrypted.ReadFrom(r); err != nil {
		t.Fatalf("failed to read decrypted data: %v", err)
	}

	if !bytes.Equal(plaintext, decrypted.Bytes()) {
		t.Error("decrypted data doesn't match original")
	}
}

func TestEncryptorMultipleRecipients(t *testing.T) {
	// Generate three test key pairs
	identity1, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity1: %v", err)
	}
	identity2, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity2: %v", err)
	}
	identity3, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity3: %v", err)
	}

	publicKeys := []string{
		identity1.Recipient().String(),
		identity2.Recipient().String(),
		identity3.Recipient().String(),
	}

	// Create encryptor with multiple recipients
	enc, err := NewEncryptor(publicKeys)
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Secret message for multiple recipients")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify each recipient can decrypt
	identities := []age.Identity{identity1, identity2, identity3}
	for i, identity := range identities {
		r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
		if err != nil {
			t.Fatalf("recipient %d failed to decrypt: %v", i+1, err)
		}

		var decrypted bytes.Buffer
		if _, err := decrypted.ReadFrom(r); err != nil {
			t.Fatalf("recipient %d failed to read decrypted data: %v", i+1, err)
		}

		if !bytes.Equal(plaintext, decrypted.Bytes()) {
			t.Errorf("recipient %d: decrypted data doesn't match original", i+1)
		}
	}
}

func TestEncryptorUpdateRecipients(t *testing.T) {
	// Generate two identities
	identity1, _ := age.GenerateX25519Identity()
	identity2, _ := age.GenerateX25519Identity()

	publicKey1 := identity1.Recipient().String()
	publicKey2 := identity2.Recipient().String()

	// Create encryptor with first key
	enc, err := NewEncryptor([]string{publicKey1})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Encrypt with first key
	plaintext := []byte("test data")
	ciphertext1, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Update to second key
	if err := enc.UpdateRecipients([]string{publicKey2}); err != nil {
		t.Fatalf("failed to update recipients: %v", err)
	}

	// Encrypt with second key
	ciphertext2, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// First ciphertext should only decrypt with first identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity1); err != nil {
		t.Error("failed to decrypt with identity1")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity2); err == nil {
		t.Error("should not decrypt with identity2")
	}

	// Second ciphertext should only decrypt with second identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity2); err != nil {
		t.Error("failed to decrypt with identity2")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity1); err == nil {
		t.Error("should not decrypt with identity1")
	}
}
@@ -16,15 +16,15 @@ func NewBlobChunkRepository(db *DB) *BlobChunkRepository {

 func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobChunk) error {
 	query := `
-		INSERT INTO blob_chunks (blob_hash, chunk_hash, offset, length)
+		INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length)
 		VALUES (?, ?, ?, ?)
 	`

 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = tx.ExecContext(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = r.db.ExecWithLog(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	}

 	if err != nil {
@@ -34,15 +34,15 @@ func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobCh
 	return nil
 }

-func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string) ([]*BlobChunk, error) {
+func (r *BlobChunkRepository) GetByBlobID(ctx context.Context, blobID string) ([]*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
-		WHERE blob_hash = ?
+		WHERE blob_id = ?
 		ORDER BY offset
 	`

-	rows, err := r.db.conn.QueryContext(ctx, query, blobHash)
+	rows, err := r.db.conn.QueryContext(ctx, query, blobID)
 	if err != nil {
 		return nil, fmt.Errorf("querying blob chunks: %w", err)
 	}
@@ -51,7 +51,7 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
 	var blobChunks []*BlobChunk
 	for rows.Next() {
 		var bc BlobChunk
-		err := rows.Scan(&bc.BlobHash, &bc.ChunkHash, &bc.Offset, &bc.Length)
+		err := rows.Scan(&bc.BlobID, &bc.ChunkHash, &bc.Offset, &bc.Length)
 		if err != nil {
 			return nil, fmt.Errorf("scanning blob chunk: %w", err)
 		}
@@ -63,26 +63,90 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string

 func (r *BlobChunkRepository) GetByChunkHash(ctx context.Context, chunkHash string) (*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
 		WHERE chunk_hash = ?
 		LIMIT 1
 	`

+	LogSQL("GetByChunkHash", query, chunkHash)
 	var bc BlobChunk
 	err := r.db.conn.QueryRowContext(ctx, query, chunkHash).Scan(
-		&bc.BlobHash,
+		&bc.BlobID,
 		&bc.ChunkHash,
 		&bc.Offset,
 		&bc.Length,
 	)

 	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHash", "No rows found", chunkHash)
 		return nil, nil
 	}
 	if err != nil {
+		LogSQL("GetByChunkHash", "Error", chunkHash, err)
 		return nil, fmt.Errorf("querying blob chunk: %w", err)
 	}

+	LogSQL("GetByChunkHash", "Found blob", chunkHash, "blob", bc.BlobID)
 	return &bc, nil
 }

+// GetByChunkHashTx retrieves a blob chunk within a transaction
+func (r *BlobChunkRepository) GetByChunkHashTx(ctx context.Context, tx *sql.Tx, chunkHash string) (*BlobChunk, error) {
+	query := `
+		SELECT blob_id, chunk_hash, offset, length
+		FROM blob_chunks
+		WHERE chunk_hash = ?
+		LIMIT 1
+	`
+
+	LogSQL("GetByChunkHashTx", query, chunkHash)
+	var bc BlobChunk
+	err := tx.QueryRowContext(ctx, query, chunkHash).Scan(
+		&bc.BlobID,
+		&bc.ChunkHash,
+		&bc.Offset,
+		&bc.Length,
+	)
+
+	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHashTx", "No rows found", chunkHash)
+		return nil, nil
+	}
+	if err != nil {
+		LogSQL("GetByChunkHashTx", "Error", chunkHash, err)
+		return nil, fmt.Errorf("querying blob chunk: %w", err)
+	}
+
+	LogSQL("GetByChunkHashTx", "Found blob", chunkHash, "blob", bc.BlobID)
+	return &bc, nil
+}
+
+// DeleteOrphaned deletes blob_chunks entries where either the blob or chunk no longer exists
+func (r *BlobChunkRepository) DeleteOrphaned(ctx context.Context) error {
+	// Delete blob_chunks where the blob doesn't exist
+	query1 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM blobs
+			WHERE blobs.id = blob_chunks.blob_id
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query1); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing blobs: %w", err)
+	}
+
+	// Delete blob_chunks where the chunk doesn't exist
+	query2 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM chunks
+			WHERE chunks.chunk_hash = blob_chunks.chunk_hash
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query2); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing chunks: %w", err)
+	}
+
+	return nil
+}
@@ -2,7 +2,11 @@ package database

 import (
 	"context"
+	"strings"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestBlobChunkRepository(t *testing.T) {
@@ -10,78 +14,111 @@ func TestBlobChunkRepository(t *testing.T) {
 	defer cleanup()

 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)

+	// Create blob first
+	blob := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	err := repos.Blobs.Create(ctx, nil, blob)
+	if err != nil {
+		t.Fatalf("failed to create blob: %v", err)
+	}
+
+	// Create chunks
+	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunks {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
+
 	// Test Create
 	bc1 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk1",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk1"),
 		Offset:    0,
 		Length:    1024,
 	}

-	err := repo.Create(ctx, nil, bc1)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
 	if err != nil {
 		t.Fatalf("failed to create blob chunk: %v", err)
 	}

 	// Add more chunks to the same blob
 	bc2 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk2",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk2"),
 		Offset:    1024,
 		Length:    2048,
 	}
-	err = repo.Create(ctx, nil, bc2)
+	err = repos.BlobChunks.Create(ctx, nil, bc2)
 	if err != nil {
 		t.Fatalf("failed to create second blob chunk: %v", err)
 	}

 	bc3 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk3",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk3"),
 		Offset:    3072,
 		Length:    512,
 	}
-	err = repo.Create(ctx, nil, bc3)
+	err = repos.BlobChunks.Create(ctx, nil, bc3)
 	if err != nil {
 		t.Fatalf("failed to create third blob chunk: %v", err)
 	}

-	// Test GetByBlobHash
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	// Test GetByBlobID
+	blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob chunks: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks, got %d", len(chunks))
+	if len(blobChunks) != 3 {
+		t.Errorf("expected 3 chunks, got %d", len(blobChunks))
 	}

 	// Verify order by offset
 	expectedOffsets := []int64{0, 1024, 3072}
-	for i, chunk := range chunks {
-		if chunk.Offset != expectedOffsets[i] {
-			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], chunk.Offset)
+	for i, bc := range blobChunks {
+		if bc.Offset != expectedOffsets[i] {
+			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], bc.Offset)
 		}
 	}

 	// Test GetByChunkHash
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get blob chunk by chunk hash: %v", err)
 	}
 	if bc == nil {
 		t.Fatal("expected blob chunk, got nil")
 	}
-	if bc.BlobHash != "blob1" {
-		t.Errorf("wrong blob hash: expected blob1, got %s", bc.BlobHash)
+	if bc.BlobID != blob.ID {
+		t.Errorf("wrong blob ID: expected %s, got %s", blob.ID, bc.BlobID)
 	}
 	if bc.Offset != 1024 {
 		t.Errorf("wrong offset: expected 1024, got %d", bc.Offset)
 	}

+	// Test duplicate insert (should fail due to primary key constraint)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
+	if err == nil {
+		t.Fatal("duplicate blob_chunk insert should fail due to primary key constraint")
+	}
+	if !strings.Contains(err.Error(), "UNIQUE") && !strings.Contains(err.Error(), "constraint") {
+		t.Fatalf("expected constraint error, got: %v", err)
+	}
+
 	// Test non-existent chunk
-	bc, err = repo.GetByChunkHash(ctx, "nonexistent")
+	bc, err = repos.BlobChunks.GetByChunkHash(ctx, "nonexistent")
 	if err != nil {
 		t.Fatalf("unexpected error: %v", err)
 	}
@@ -95,26 +132,60 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	defer cleanup()

 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)

+	// Create blobs
+	blob1 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	blob2 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob2-hash"),
+		CreatedTS: time.Now(),
+	}
+
+	err := repos.Blobs.Create(ctx, nil, blob1)
+	if err != nil {
+		t.Fatalf("failed to create blob1: %v", err)
+	}
+	err = repos.Blobs.Create(ctx, nil, blob2)
+	if err != nil {
+		t.Fatalf("failed to create blob2: %v", err)
+	}
+
+	// Create chunks
+	chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunkHashes {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
+
 	// Create chunks across multiple blobs
 	// Some chunks are shared between blobs (deduplication scenario)
 	blobChunks := []BlobChunk{
-		{BlobHash: "blob1", ChunkHash: "chunk1", Offset: 0, Length: 1024},
-		{BlobHash: "blob1", ChunkHash: "chunk2", Offset: 1024, Length: 1024},
-		{BlobHash: "blob2", ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
-		{BlobHash: "blob2", ChunkHash: "chunk3", Offset: 1024, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
 	}

 	for _, bc := range blobChunks {
-		err := repo.Create(ctx, nil, &bc)
+		err := repos.BlobChunks.Create(ctx, nil, &bc)
 		if err != nil {
 			t.Fatalf("failed to create blob chunk: %v", err)
 		}
 	}

 	// Verify blob1 chunks
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob1 chunks: %v", err)
 	}
@@ -123,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}

 	// Verify blob2 chunks
-	chunks, err = repo.GetByBlobHash(ctx, "blob2")
+	chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob2 chunks: %v", err)
 	}
@@ -132,7 +203,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}

 	// Verify shared chunk
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get shared chunk: %v", err)
 	}
@@ -140,7 +211,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 		t.Fatal("expected shared chunk, got nil")
 	}
 	// GetByChunkHash returns first match, should be blob1
-	if bc.BlobHash != "blob1" {
-		t.Errorf("expected blob1 for shared chunk, got %s", bc.BlobHash)
+	if bc.BlobID != blob1.ID {
+		t.Errorf("expected %s for shared chunk, got %s", blob1.ID, bc.BlobID)
 	}
 }
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
 )

 type BlobRepository struct {
@@ -17,15 +19,27 @@ func NewBlobRepository(db *DB) *BlobRepository {

 func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) error {
 	query := `
-		INSERT INTO blobs (blob_hash, created_ts)
-		VALUES (?, ?)
+		INSERT INTO blobs (id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts)
+		VALUES (?, ?, ?, ?, ?, ?, ?)
 	`

+	var finishedTS, uploadedTS *int64
+	if blob.FinishedTS != nil {
+		ts := blob.FinishedTS.Unix()
+		finishedTS = &ts
+	}
+	if blob.UploadedTS != nil {
+		ts := blob.UploadedTS.Unix()
+		uploadedTS = &ts
+	}
+
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = tx.ExecContext(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = r.db.ExecWithLog(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	}

 	if err != nil {
@@ -37,17 +51,23 @@ func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) err

 func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
 		WHERE blob_hash = ?
 	`

 	var blob Blob
 	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64
+
 	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
-		&blob.BlobHash,
+		&blob.ID,
+		&blob.Hash,
 		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
 	)

 	if err == sql.ErrNoRows {
@@ -57,40 +77,124 @@ func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, err
 		return nil, fmt.Errorf("querying blob: %w", err)
 	}

-	blob.CreatedTS = time.Unix(createdTSUnix, 0)
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
 	return &blob, nil
 }

-func (r *BlobRepository) List(ctx context.Context, limit, offset int) ([]*Blob, error) {
+// GetByID retrieves a blob by its ID
+func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
-		ORDER BY blob_hash
-		LIMIT ? OFFSET ?
+		WHERE id = ?
 	`

-	rows, err := r.db.conn.QueryContext(ctx, query, limit, offset)
-	if err != nil {
-		return nil, fmt.Errorf("querying blobs: %w", err)
-	}
-	defer CloseRows(rows)
-
-	var blobs []*Blob
-	for rows.Next() {
-		var blob Blob
-		var createdTSUnix int64
-
-		err := rows.Scan(
-			&blob.BlobHash,
-			&createdTSUnix,
-		)
+	var blob Blob
+	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64

+	err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
+		&blob.ID,
+		&blob.Hash,
+		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
+	)
+
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
 	if err != nil {
-		return nil, fmt.Errorf("scanning blob: %w", err)
+		return nil, fmt.Errorf("querying blob: %w", err)
 	}

-	blob.CreatedTS = time.Unix(createdTSUnix, 0)
-	blobs = append(blobs, &blob)
-	}
-
-	return blobs, rows.Err()
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
+	return &blob, nil
+}
+
+// UpdateFinished updates a blob when it's finalized
+func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id string, hash string, uncompressedSize, compressedSize int64) error {
+	query := `
+		UPDATE blobs
+		SET blob_hash = ?, finished_ts = ?, uncompressed_size = ?, compressed_size = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating blob: %w", err)
+	}
+
+	return nil
+}
+
+// UpdateUploaded marks a blob as uploaded
+func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id string) error {
+	query := `
+		UPDATE blobs
+		SET uploaded_ts = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, now, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, now, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("marking blob as uploaded: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteOrphaned deletes blobs that are not referenced by any snapshot
+func (r *BlobRepository) DeleteOrphaned(ctx context.Context) error {
+	query := `
+		DELETE FROM blobs
+		WHERE NOT EXISTS (
+			SELECT 1 FROM snapshot_blobs
+			WHERE snapshot_blobs.blob_id = blobs.id
+		)
+	`
+
+	result, err := r.db.ExecWithLog(ctx, query)
+	if err != nil {
+		return fmt.Errorf("deleting orphaned blobs: %w", err)
+	}
+
+	rowsAffected, _ := result.RowsAffected()
+	if rowsAffected > 0 {
+		log.Debug("Deleted orphaned blobs", "count", rowsAffected)
+	}
+
+	return nil
+}
@@ -4,6 +4,8 @@ import (
 	"context"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestBlobRepository(t *testing.T) {
@@ -15,7 +17,8 @@ func TestBlobRepository(t *testing.T) {

 	// Test Create
 	blob := &Blob{
-		BlobHash:  "blobhash123",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash123"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}

@@ -25,23 +28,36 @@ func TestBlobRepository(t *testing.T) {
 	}

 	// Test GetByHash
-	retrieved, err := repo.GetByHash(ctx, blob.BlobHash)
+	retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
 	if err != nil {
 		t.Fatalf("failed to get blob: %v", err)
 	}
 	if retrieved == nil {
 		t.Fatal("expected blob, got nil")
 	}
-	if retrieved.BlobHash != blob.BlobHash {
-		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.BlobHash, blob.BlobHash)
+	if retrieved.Hash != blob.Hash {
+		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.Hash, blob.Hash)
 	}
 	if !retrieved.CreatedTS.Equal(blob.CreatedTS) {
 		t.Errorf("created timestamp mismatch: got %v, want %v", retrieved.CreatedTS, blob.CreatedTS)
 	}

-	// Test List
+	// Test GetByID
+	retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get blob by ID: %v", err)
+	}
+	if retrievedByID == nil {
+		t.Fatal("expected blob, got nil")
+	}
+	if retrievedByID.ID != blob.ID {
+		t.Errorf("blob ID mismatch: got %s, want %s", retrievedByID.ID, blob.ID)
+	}
+
+	// Test with second blob
 	blob2 := &Blob{
-		BlobHash:  "blobhash456",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash456"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	err = repo.Create(ctx, nil, blob2)
@@ -49,29 +65,45 @@ func TestBlobRepository(t *testing.T) {
 		t.Fatalf("failed to create second blob: %v", err)
 	}

-	blobs, err := repo.List(ctx, 10, 0)
+	// Test UpdateFinished
+	now := time.Now()
+	err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
 	if err != nil {
-		t.Fatalf("failed to list blobs: %v", err)
-	}
-	if len(blobs) != 2 {
-		t.Errorf("expected 2 blobs, got %d", len(blobs))
+		t.Fatalf("failed to update blob as finished: %v", err)
 	}

-	// Test pagination
-	blobs, err = repo.List(ctx, 1, 0)
+	// Verify update
+	updated, err := repo.GetByID(ctx, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with limit: %v", err)
+		t.Fatalf("failed to get updated blob: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with limit, got %d", len(blobs))
+	if updated.FinishedTS == nil {
+		t.Fatal("expected finished timestamp to be set")
 	}
+	if updated.UncompressedSize != 1000 {
+		t.Errorf("expected uncompressed size 1000, got %d", updated.UncompressedSize)
+	}
+	if updated.CompressedSize != 500 {
+		t.Errorf("expected compressed size 500, got %d", updated.CompressedSize)
+	}

-	blobs, err = repo.List(ctx, 1, 1)
+	// Test UpdateUploaded
+	err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with offset: %v", err)
+		t.Fatalf("failed to update blob as uploaded: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with offset, got %d", len(blobs))
+
+	// Verify upload update
+	uploaded, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get uploaded blob: %v", err)
+	}
+	if uploaded.UploadedTS == nil {
+		t.Fatal("expected uploaded timestamp to be set")
+	}
+	// Allow 1 second tolerance for timestamp comparison
+	if uploaded.UploadedTS.Before(now.Add(-1 * time.Second)) {
+		t.Error("uploaded timestamp should be around test time")
+	}
 }

@@ -83,7 +115,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
 	repo := NewBlobRepository(db)

 	blob := &Blob{
-		BlobHash:  "duplicate_blob",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("duplicate_blob"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}

125
internal/database/cascade_debug_test.go
Normal file
125
internal/database/cascade_debug_test.go
Normal file
@@ -0,0 +1,125 @@
package database

import (
	"context"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestCascadeDeleteDebug tests cascade delete with debug output
func TestCascadeDeleteDebug(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Check if foreign keys are enabled
	var fkEnabled int
	err := db.conn.QueryRow("PRAGMA foreign_keys").Scan(&fkEnabled)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Foreign keys enabled: %d", fkEnabled)

	// Create a file
	file := &File{
		Path:  "/cascade-test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err = repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}
	t.Logf("Created file with ID: %s", file.ID)

	// Create chunks and file-chunk mappings
	for i := 0; i < 3; i++ {
		chunk := &Chunk{
			ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunk.ChunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
		t.Logf("Created file chunk mapping: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
	}

	// Verify file chunks exist
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks before delete: %d", len(fileChunks))

	// Check the foreign key constraint
	var fkInfo string
	err = db.conn.QueryRow(`
		SELECT sql FROM sqlite_master
		WHERE type='table' AND name='file_chunks'
	`).Scan(&fkInfo)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("file_chunks table definition:\n%s", fkInfo)

	// Delete the file
	t.Log("Deleting file...")
	err = repos.Files.DeleteByID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file: %v", err)
	}

	// Verify file is gone
	deletedFile, err := repos.Files.GetByID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	if deletedFile != nil {
		t.Error("file should have been deleted")
	} else {
		t.Log("File was successfully deleted")
	}

	// Check file chunks after delete
	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks after delete: %d", len(fileChunks))

	// Manually check the database
	var count int
	err = db.conn.QueryRow("SELECT COUNT(*) FROM file_chunks WHERE file_id = ?", file.ID).Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Manual count of file_chunks for deleted file: %d", count)

	if len(fileChunks) != 0 {
		t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
		// List the remaining chunks
		for _, fc := range fileChunks {
			t.Logf("Remaining chunk: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
		}
	}
}
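The test above only passes if `file_chunks` actually declares `ON DELETE CASCADE` and the connection has foreign-key enforcement turned on (SQLite ships with it off by default, which is why the test prints `PRAGMA foreign_keys` first). The repository's real schema.sql is not part of this diff, so the following is only a hypothetical sketch of the shape the test implies:

```sql
-- Hypothetical sketch; the actual schema.sql in the repo is authoritative.
PRAGMA foreign_keys = ON;

CREATE TABLE files (
    id   TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
    -- mtime, ctime, size, mode, uid, gid, link_target ...
);

CREATE TABLE file_chunks (
    file_id    TEXT NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    idx        INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx)
);
```

With a constraint of this shape in place, `DELETE FROM files WHERE id = ?` removes the matching `file_chunks` rows automatically, which is what the debug logging in the test verifies.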
@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

type ChunkFileRepository struct {
@@ -16,16 +18,16 @@ func NewChunkFileRepository(db *DB) *ChunkFileRepository {

func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkFile) error {
	query := `
		INSERT INTO chunk_files (chunk_hash, file_path, file_offset, length)
		INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length)
		VALUES (?, ?, ?, ?)
		ON CONFLICT(chunk_hash, file_path) DO NOTHING
		ON CONFLICT(chunk_hash, file_id) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
		_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
	} else {
		_, err = r.db.ExecWithLock(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
		_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
	}

	if err != nil {
@@ -35,37 +37,28 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
	return nil
}

func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_path, file_offset, length
		SELECT chunk_hash, file_id, file_offset, length
		FROM chunk_files
		WHERE chunk_hash = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
	if err != nil {
		return nil, fmt.Errorf("querying chunk files: %w", err)
	}
	defer CloseRows(rows)

	var chunkFiles []*ChunkFile
	for rows.Next() {
		var cf ChunkFile
		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk file: %w", err)
		}
		chunkFiles = append(chunkFiles, &cf)
	}

	return chunkFiles, rows.Err()
	return r.scanChunkFiles(rows)
}

func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_path, file_offset, length
		FROM chunk_files
		WHERE file_path = ?
		SELECT cf.chunk_hash, cf.file_id, cf.file_offset, cf.length
		FROM chunk_files cf
		JOIN files f ON cf.file_id = f.id
		WHERE f.path = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, filePath)
@@ -74,15 +67,138 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
	}
	defer CloseRows(rows)

	return r.scanChunkFiles(rows)
}

// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_id, file_offset, length
		FROM chunk_files
		WHERE file_id = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
	if err != nil {
		return nil, fmt.Errorf("querying chunk files: %w", err)
	}
	defer CloseRows(rows)

	return r.scanChunkFiles(rows)
}

// scanChunkFiles is a helper that scans chunk file rows
func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
	var chunkFiles []*ChunkFile
	for rows.Next() {
		var cf ChunkFile
		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
		var chunkHashStr, fileIDStr string
		err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk file: %w", err)
		}
		cf.ChunkHash = types.ChunkHash(chunkHashStr)
		cf.FileID, err = types.ParseFileID(fileIDStr)
		if err != nil {
			return nil, fmt.Errorf("parsing file ID: %w", err)
		}
		chunkFiles = append(chunkFiles, &cf)
	}

	return chunkFiles, rows.Err()
}

// DeleteByFileID deletes all chunk_files entries for a given file ID
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
	query := `DELETE FROM chunk_files WHERE file_id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("deleting chunk files: %w", err)
	}

	return nil
}

// DeleteByFileIDs deletes all chunk_files for multiple files in a single statement.
func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Batch at 500 to stay within SQLite's variable limit
	const batchSize = 500

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
		args := make([]interface{}, len(batch))
		for j, id := range batch {
			args[j] = id.String()
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch deleting chunk_files: %w", err)
		}
	}

	return nil
}

// CreateBatch inserts multiple chunk_files in a single statement for efficiency.
func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
	if len(cfs) == 0 {
		return nil
	}

	// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
	const batchSize = 200

	for i := 0; i < len(cfs); i += batchSize {
		end := i + batchSize
		if end > len(cfs) {
			end = len(cfs)
		}
		batch := cfs[i:end]

		query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
		args := make([]interface{}, 0, len(batch)*4)
		for j, cf := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?, ?)"
			args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
		}
		query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting chunk_files: %w", err)
		}
	}

	return nil
}
@@ -3,6 +3,9 @@ package database
import (
	"context"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestChunkFileRepository(t *testing.T) {
@@ -11,24 +14,68 @@ func TestChunkFileRepository(t *testing.T) {

	ctx := context.Background()
	repo := NewChunkFileRepository(db)
	fileRepo := NewFileRepository(db)
	chunksRepo := NewChunkRepository(db)

	// Create test files first
	testTime := time.Now().Truncate(time.Second)
	file1 := &File{
		Path:       "/file1.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       1024,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err := fileRepo.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}

	file2 := &File{
		Path:       "/file2.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       1024,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err = fileRepo.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create chunk first
	chunk := &Chunk{
		ChunkHash: types.ChunkHash("chunk1"),
		Size:      1024,
	}
	err = chunksRepo.Create(ctx, nil, chunk)
	if err != nil {
		t.Fatalf("failed to create chunk: %v", err)
	}

	// Test Create
	cf1 := &ChunkFile{
		ChunkHash: "chunk1",
		FilePath: "/file1.txt",
		ChunkHash:  types.ChunkHash("chunk1"),
		FileID:     file1.ID,
		FileOffset: 0,
		Length:     1024,
	}

	err := repo.Create(ctx, nil, cf1)
	err = repo.Create(ctx, nil, cf1)
	if err != nil {
		t.Fatalf("failed to create chunk file: %v", err)
	}

	// Add same chunk in different file (deduplication scenario)
	cf2 := &ChunkFile{
		ChunkHash: "chunk1",
		FilePath: "/file2.txt",
		ChunkHash:  types.ChunkHash("chunk1"),
		FileID:     file2.ID,
		FileOffset: 2048,
		Length:     1024,
	}
@@ -50,10 +97,10 @@ func TestChunkFileRepository(t *testing.T) {
	foundFile1 := false
	foundFile2 := false
	for _, cf := range chunkFiles {
		if cf.FilePath == "/file1.txt" && cf.FileOffset == 0 {
		if cf.FileID == file1.ID && cf.FileOffset == 0 {
			foundFile1 = true
		}
		if cf.FilePath == "/file2.txt" && cf.FileOffset == 2048 {
		if cf.FileID == file2.ID && cf.FileOffset == 2048 {
			foundFile2 = true
		}
	}
@@ -61,15 +108,15 @@ func TestChunkFileRepository(t *testing.T) {
		t.Error("not all expected files found")
	}

	// Test GetByFilePath
	chunkFiles, err = repo.GetByFilePath(ctx, "/file1.txt")
	// Test GetByFileID
	chunkFiles, err = repo.GetByFileID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("failed to get chunks by file path: %v", err)
		t.Fatalf("failed to get chunks by file ID: %v", err)
	}
	if len(chunkFiles) != 1 {
		t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
	}
	if chunkFiles[0].ChunkHash != "chunk1" {
	if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
		t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
	}

@@ -86,6 +133,37 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {

	ctx := context.Background()
	repo := NewChunkFileRepository(db)
	fileRepo := NewFileRepository(db)
	chunksRepo := NewChunkRepository(db)

	// Create test files
	testTime := time.Now().Truncate(time.Second)
	file1 := &File{Path: "/file1.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
	file2 := &File{Path: "/file2.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
	file3 := &File{Path: "/file3.txt", MTime: testTime, CTime: testTime, Size: 2048, Mode: 0644, UID: 1000, GID: 1000}

	if err := fileRepo.Create(ctx, nil, file1); err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	if err := fileRepo.Create(ctx, nil, file2); err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}
	if err := fileRepo.Create(ctx, nil, file3); err != nil {
		t.Fatalf("failed to create file3: %v", err)
	}

	// Create chunks first
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
	for _, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err := chunksRepo.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
		}
	}

	// Simulate a scenario where multiple files share chunks
	// File1: chunk1, chunk2, chunk3
@@ -94,16 +172,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {

	chunkFiles := []ChunkFile{
		// File1
		{ChunkHash: "chunk1", FilePath: "/file1.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk2", FilePath: "/file1.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: "chunk3", FilePath: "/file1.txt", FileOffset: 2048, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
		// File2
		{ChunkHash: "chunk2", FilePath: "/file2.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk3", FilePath: "/file2.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: "chunk4", FilePath: "/file2.txt", FileOffset: 2048, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
		// File3
		{ChunkHash: "chunk1", FilePath: "/file3.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk4", FilePath: "/file3.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
	}

	for _, cf := range chunkFiles {
@@ -132,11 +210,11 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
	}

	// Test file2 chunks
	chunks, err := repo.GetByFilePath(ctx, "/file2.txt")
	file2Chunks, err := repo.GetByFileID(ctx, file2.ID)
	if err != nil {
		t.Fatalf("failed to get chunks for file2: %v", err)
	}
	if len(chunks) != 3 {
		t.Errorf("expected 3 chunks for file2, got %d", len(chunks))
	if len(file2Chunks) != 3 {
		t.Errorf("expected 3 chunks for file2, got %d", len(file2Chunks))
	}
}
@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/log"
)

type ChunkRepository struct {
@@ -16,16 +18,16 @@ func NewChunkRepository(db *DB) *ChunkRepository {

func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk) error {
	query := `
		INSERT INTO chunks (chunk_hash, sha256, size)
		VALUES (?, ?, ?)
		INSERT INTO chunks (chunk_hash, size)
		VALUES (?, ?)
		ON CONFLICT(chunk_hash) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.Size)
	} else {
		_, err = r.db.ExecWithLock(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
		_, err = r.db.ExecWithLog(ctx, query, chunk.ChunkHash, chunk.Size)
	}

	if err != nil {
@@ -37,7 +39,7 @@ func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk)

func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, error) {
	query := `
		SELECT chunk_hash, sha256, size
		SELECT chunk_hash, size
		FROM chunks
		WHERE chunk_hash = ?
	`
@@ -46,7 +48,6 @@ func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, e

	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
		&chunk.ChunkHash,
		&chunk.SHA256,
		&chunk.Size,
	)

@@ -66,7 +67,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
	}

	query := `
		SELECT chunk_hash, sha256, size
		SELECT chunk_hash, size
		FROM chunks
		WHERE chunk_hash IN (`

@@ -92,7 +93,6 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.SHA256,
			&chunk.Size,
		)
		if err != nil {
@@ -107,7 +107,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*

func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk, error) {
	query := `
		SELECT c.chunk_hash, c.sha256, c.size
		SELECT c.chunk_hash, c.size
		FROM chunks c
		LEFT JOIN blob_chunks bc ON c.chunk_hash = bc.chunk_hash
		WHERE bc.chunk_hash IS NULL
@@ -127,7 +127,6 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.SHA256,
			&chunk.Size,
		)
		if err != nil {
@@ -139,3 +138,30 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk

	return chunks, rows.Err()
}

// DeleteOrphaned deletes chunks that are not referenced by any file or blob
func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
	query := `
		DELETE FROM chunks
		WHERE NOT EXISTS (
			SELECT 1 FROM file_chunks
			WHERE file_chunks.chunk_hash = chunks.chunk_hash
		)
		AND NOT EXISTS (
			SELECT 1 FROM blob_chunks
			WHERE blob_chunks.chunk_hash = chunks.chunk_hash
		)
	`

	result, err := r.db.ExecWithLog(ctx, query)
	if err != nil {
		return fmt.Errorf("deleting orphaned chunks: %w", err)
	}

	rowsAffected, _ := result.RowsAffected()
	if rowsAffected > 0 {
		log.Debug("Deleted orphaned chunks", "count", rowsAffected)
	}

	return nil
}
internal/database/chunks_ext.go (Normal file, 37 lines)
@@ -0,0 +1,37 @@
package database

import (
	"context"
	"fmt"
)

func (r *ChunkRepository) List(ctx context.Context) ([]*Chunk, error) {
	query := `
		SELECT chunk_hash, size
		FROM chunks
		ORDER BY chunk_hash
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying chunks: %w", err)
	}
	defer CloseRows(rows)

	var chunks []*Chunk
	for rows.Next() {
		var chunk Chunk

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.Size,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk: %w", err)
		}

		chunks = append(chunks, &chunk)
	}

	return chunks, rows.Err()
}
@@ -3,6 +3,8 @@ package database
import (
	"context"
	"testing"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestChunkRepository(t *testing.T) {
@@ -14,8 +16,7 @@ func TestChunkRepository(t *testing.T) {

	// Test Create
	chunk := &Chunk{
		ChunkHash: "chunkhash123",
		SHA256: "sha256hash123",
		ChunkHash: types.ChunkHash("chunkhash123"),
		Size:      4096,
	}

@@ -25,7 +26,7 @@ func TestChunkRepository(t *testing.T) {
	}

	// Test GetByHash
	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
	if err != nil {
		t.Fatalf("failed to get chunk: %v", err)
	}
@@ -35,9 +36,6 @@ func TestChunkRepository(t *testing.T) {
	if retrieved.ChunkHash != chunk.ChunkHash {
		t.Errorf("chunk hash mismatch: got %s, want %s", retrieved.ChunkHash, chunk.ChunkHash)
	}
	if retrieved.SHA256 != chunk.SHA256 {
		t.Errorf("sha256 mismatch: got %s, want %s", retrieved.SHA256, chunk.SHA256)
	}
	if retrieved.Size != chunk.Size {
		t.Errorf("size mismatch: got %d, want %d", retrieved.Size, chunk.Size)
	}
@@ -50,8 +48,7 @@ func TestChunkRepository(t *testing.T) {

	// Test GetByHashes
	chunk2 := &Chunk{
		ChunkHash: "chunkhash456",
		SHA256: "sha256hash456",
		ChunkHash: types.ChunkHash("chunkhash456"),
		Size:      8192,
	}
	err = repo.Create(ctx, nil, chunk2)
@@ -59,7 +56,7 @@ func TestChunkRepository(t *testing.T) {
		t.Fatalf("failed to create second chunk: %v", err)
	}

	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
	if err != nil {
		t.Fatalf("failed to get chunks by hashes: %v", err)
	}
@@ -1,143 +1,239 @@
// Package database provides the local SQLite index for Vaultik backup operations.
// The database tracks files, chunks, and their associations with blobs.
//
// Blobs in Vaultik are the final storage units uploaded to S3. Each blob is a
// large (up to 10GB) file containing many compressed and encrypted chunks from
// multiple source files. Blobs are content-addressed, meaning their filename
// is derived from their SHA256 hash after compression and encryption.
//
// The database does not support migrations. If the schema changes, delete
// the local database and perform a full backup to recreate it.
package database

import (
	"context"
	"database/sql"
	_ "embed"
	"fmt"
	"sync"
	"os"
	"strings"

	"git.eeqj.de/sneak/vaultik/internal/log"
	_ "modernc.org/sqlite"
)

//go:embed schema.sql
var schemaSQL string

// DB represents the Vaultik local index database connection.
// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
// The database enables incremental backups by detecting changed files and
// supports deduplication by tracking which chunks are already stored in blobs.
// Write operations are synchronized through a mutex to ensure thread safety.
type DB struct {
	conn *sql.DB
	writeLock sync.Mutex
	path string
}

// New creates a new database connection at the specified path.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
// opening a database with journal/WAL files present.
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
	conn, err := sql.Open("sqlite", path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=5000")
	if err != nil {
		return nil, fmt.Errorf("opening database: %w", err)
	log.Debug("Opening database connection", "path", path)

	// Note: We do NOT delete journal/WAL files before opening.
	// SQLite handles crash recovery automatically when the database is opened.
	// Deleting these files would corrupt the database after an unclean shutdown.

	// First attempt with standard WAL mode
	log.Debug("Attempting to open database with WAL mode", "path", path)
	conn, err := sql.Open(
		"sqlite",
		path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL&_foreign_keys=ON",
	)
	if err == nil {
		// Set connection pool settings
		// SQLite can handle multiple readers but only one writer at a time.
		// Setting MaxOpenConns to 1 ensures all writes are serialized through
		// a single connection, preventing SQLITE_BUSY errors.
		conn.SetMaxOpenConns(1)
		conn.SetMaxIdleConns(1)

		if err := conn.PingContext(ctx); err == nil {
			// Success on first try
			log.Debug("Database opened successfully with WAL mode", "path", path)

			// Enable foreign keys explicitly
			if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
				log.Warn("Failed to enable foreign keys", "error", err)
			}

			db := &DB{conn: conn, path: path}
			if err := db.createSchema(ctx); err != nil {
				_ = conn.Close()
				return nil, fmt.Errorf("creating schema: %w", err)
			}
			return db, nil
		}
		log.Debug("Failed to ping database, closing connection", "path", path, "error", err)
		_ = conn.Close()
	}

	// If first attempt failed, try with TRUNCATE mode to clear any locks
	log.Info(
		"Database appears locked, attempting recovery with TRUNCATE mode",
		"path", path,
	)
	conn, err = sql.Open(
		"sqlite",
		path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000&_foreign_keys=ON",
	)
	if err != nil {
		return nil, fmt.Errorf("opening database in recovery mode: %w", err)
	}

	// Set connection pool settings
	// SQLite can handle multiple readers but only one writer at a time.
	// Setting MaxOpenConns to 1 ensures all writes are serialized through
	// a single connection, preventing SQLITE_BUSY errors.
	conn.SetMaxOpenConns(1)
	conn.SetMaxIdleConns(1)

	if err := conn.PingContext(ctx); err != nil {
		if closeErr := conn.Close(); closeErr != nil {
			Fatal("failed to close database connection: %v", closeErr)
		}
		return nil, fmt.Errorf("pinging database: %w", err)
		log.Debug("Failed to ping database in recovery mode, closing", "path", path, "error", err)
		_ = conn.Close()
		return nil, fmt.Errorf(
			"database still locked after recovery attempt: %w",
			err,
		)
	}

	db := &DB{conn: conn}
	if err := db.createSchema(ctx); err != nil {
		if closeErr := conn.Close(); closeErr != nil {
			Fatal("failed to close database connection: %v", closeErr)
	log.Debug("Database opened in TRUNCATE mode", "path", path)

	// Switch back to WAL mode
	log.Debug("Switching database back to WAL mode", "path", path)
	if _, err := conn.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
		log.Warn("Failed to switch back to WAL mode", "path", path, "error", err)
	}

	// Ensure foreign keys are enabled
	if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys=ON"); err != nil {
		log.Warn("Failed to enable foreign keys", "path", path, "error", err)
	}

	db := &DB{conn: conn, path: path}
	if err := db.createSchema(ctx); err != nil {
		_ = conn.Close()
		return nil, fmt.Errorf("creating schema: %w", err)
	}

	log.Debug("Database connection established successfully", "path", path)
	return db, nil
}
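Condensed, the recovery path in `New` amounts to this PRAGMA sequence, issued partly via DSN parameters and partly via `ExecContext` (a sketch of the net effect, echoing the code's own rationale):

```sql
-- Fallback open: TRUNCATE journal mode, per the code's comment, clears
-- a stale lock/journal state that blocked the initial WAL-mode open.
PRAGMA journal_mode = TRUNCATE;
-- Once the database opens cleanly, switch back to WAL for normal operation.
PRAGMA journal_mode = WAL;
-- Foreign keys are off by default in SQLite and must be enabled per connection.
PRAGMA foreign_keys = ON;
```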
// Close closes the database connection.
|
||||
// It ensures all pending operations are completed before closing.
|
||||
// Returns an error if the database connection cannot be closed properly.
|
||||
func (db *DB) Close() error {
|
||||
log.Debug("Closing database connection", "path", db.path)
|
||||
if err := db.conn.Close(); err != nil {
|
||||
Fatal("failed to close database: %v", err)
|
||||
log.Error("Failed to close database", "path", db.path, "error", err)
|
||||
return fmt.Errorf("failed to close database: %w", err)
|
||||
}
|
||||
log.Debug("Database connection closed successfully", "path", db.path)
|
||||
return nil
|
||||
}
|
||||
|
||||
// Conn returns the underlying *sql.DB connection.
|
||||
// This should be used sparingly and primarily for read operations.
|
||||
// For write operations, prefer using the ExecWithLog method.
|
||||
func (db *DB) Conn() *sql.DB {
|
||||
return db.conn
|
||||
}
|
||||
|
||||
func (db *DB) BeginTx(ctx context.Context, opts *sql.TxOptions) (*sql.Tx, error) {
|
||||
// Path returns the path to the database file.
|
||||
func (db *DB) Path() string {
|
||||
return db.path
|
||||
}
|
||||
|
||||
// BeginTx starts a new database transaction with the given options.
|
||||
// The caller is responsible for committing or rolling back the transaction.
|
||||
// For write transactions, consider using the Repositories.WithTx method instead,
|
||||
// which handles locking and rollback automatically.
|
||||
func (db *DB) BeginTx(
|
||||
ctx context.Context,
|
||||
opts *sql.TxOptions,
|
||||
) (*sql.Tx, error) {
|
||||
return db.conn.BeginTx(ctx, opts)
|
||||
}
|
||||
|
||||
// LockForWrite acquires the write lock
|
||||
func (db *DB) LockForWrite() {
|
||||
db.writeLock.Lock()
|
||||
}
|
||||
// Note: LockForWrite and UnlockWrite methods have been removed.
|
||||
// SQLite handles its own locking internally, so explicit locking is not needed.
|
||||
|
||||
// UnlockWrite releases the write lock
|
||||
func (db *DB) UnlockWrite() {
|
||||
db.writeLock.Unlock()
|
||||
}
|
||||
|
||||
// ExecWithLog executes a write query with SQL logging.
// SQLite handles its own locking internally, so we just pass through to ExecContext.
// The query and args parameters follow the same format as sql.DB.ExecContext.
func (db *DB) ExecWithLog(
	ctx context.Context,
	query string,
	args ...interface{},
) (sql.Result, error) {
	LogSQL("Execute", query, args...)
	return db.conn.ExecContext(ctx, query, args...)
}

// QueryRowWithLog executes a query that returns at most one row with SQL logging.
// This is useful for queries that modify data and return values (e.g., INSERT ... RETURNING).
// SQLite handles its own locking internally.
// The query and args parameters follow the same format as sql.DB.QueryRowContext.
func (db *DB) QueryRowWithLog(
	ctx context.Context,
	query string,
	args ...interface{},
) *sql.Row {
	LogSQL("QueryRow", query, args...)
	return db.conn.QueryRowContext(ctx, query, args...)
}
// createSchema creates all tables if they do not already exist.
func (db *DB) createSchema(ctx context.Context) error {
	schema := `
	CREATE TABLE IF NOT EXISTS files (
		path TEXT PRIMARY KEY,
		mtime INTEGER NOT NULL,
		ctime INTEGER NOT NULL,
		size INTEGER NOT NULL,
		mode INTEGER NOT NULL,
		uid INTEGER NOT NULL,
		gid INTEGER NOT NULL,
		link_target TEXT
	);

	CREATE TABLE IF NOT EXISTS file_chunks (
		path TEXT NOT NULL,
		idx INTEGER NOT NULL,
		chunk_hash TEXT NOT NULL,
		PRIMARY KEY (path, idx)
	);

	CREATE TABLE IF NOT EXISTS chunks (
		chunk_hash TEXT PRIMARY KEY,
		sha256 TEXT NOT NULL,
		size INTEGER NOT NULL
	);

	CREATE TABLE IF NOT EXISTS blobs (
		blob_hash TEXT PRIMARY KEY,
		created_ts INTEGER NOT NULL
	);

	CREATE TABLE IF NOT EXISTS blob_chunks (
		blob_hash TEXT NOT NULL,
		chunk_hash TEXT NOT NULL,
		offset INTEGER NOT NULL,
		length INTEGER NOT NULL,
		PRIMARY KEY (blob_hash, chunk_hash)
	);

	CREATE TABLE IF NOT EXISTS chunk_files (
		chunk_hash TEXT NOT NULL,
		file_path TEXT NOT NULL,
		file_offset INTEGER NOT NULL,
		length INTEGER NOT NULL,
		PRIMARY KEY (chunk_hash, file_path)
	);

	CREATE TABLE IF NOT EXISTS snapshots (
		id TEXT PRIMARY KEY,
		hostname TEXT NOT NULL,
		vaultik_version TEXT NOT NULL,
		created_ts INTEGER NOT NULL,
		file_count INTEGER NOT NULL,
		chunk_count INTEGER NOT NULL,
		blob_count INTEGER NOT NULL,
		total_size INTEGER NOT NULL,
		blob_size INTEGER NOT NULL,
		compression_ratio REAL NOT NULL
	);
	`

	_, err := db.conn.ExecContext(ctx, schema)
	return err
}

// NewTestDB creates an in-memory SQLite database for testing purposes.
// The database is automatically initialized with the schema and is ready for use.
// Each call creates a new independent database instance.
func NewTestDB() (*DB, error) {
	return New(context.Background(), ":memory:")
}
// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
// For example, repeatPlaceholder(2) returns ", ?, ?".
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}
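As a usage sketch (the `buildInQuery` wrapper below is illustrative, not part of the repository code), a caller supplies one leading `?` and lets the helper fill in the rest of the IN list:

```go
package main

import (
	"fmt"
	"strings"
)

// repeatPlaceholder mirrors the helper above: ", ?" repeated n times.
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}

// buildInQuery is a hypothetical caller: one leading "?" for the first
// value, then repeatPlaceholder for the remaining n-1 values.
func buildInQuery(table, column string, n int) string {
	return fmt.Sprintf("DELETE FROM %s WHERE %s IN (?%s)",
		table, column, repeatPlaceholder(n-1))
}

func main() {
	// → DELETE FROM file_chunks WHERE file_id IN (?, ?, ?)
	fmt.Println(buildInQuery("file_chunks", "file_id", 3))
}
```

This is why DeleteByFileIDs below passes `len(batch)-1`: the first placeholder is written literally in the query string.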

// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.
//
// The operation parameter describes the type of SQL operation (e.g., "Execute", "Query").
// The query parameter is the SQL statement being executed.
// The args parameter contains the query arguments that will be interpolated.
func LogSQL(operation, query string, args ...interface{}) {
	if strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
		log.Debug(
			"SQL "+operation,
			"query",
			strings.TrimSpace(query),
			"args",
			fmt.Sprintf("%v", args),
		)
	}
}

@@ -67,21 +67,26 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
	}()

	// Test concurrent writes
	type result struct {
		index int
		err   error
	}
	results := make(chan result, 10)

	for i := 0; i < 10; i++ {
		go func(i int) {
			_, err := db.ExecWithLog(ctx, "INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)",
				fmt.Sprintf("hash%d", i), i*1024)
			results <- result{index: i, err: err}
		}(i)
	}

	// Wait for all goroutines and check results
	for i := 0; i < 10; i++ {
		r := <-results
		if r.err != nil {
			t.Fatalf("concurrent insert %d failed: %v", r.index, r.err)
		}
	}

	// Verify all inserts succeeded

@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

type FileChunkRepository struct {

@@ -16,16 +18,16 @@ func NewFileChunkRepository(db *DB) *FileChunkRepository {

func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
	query := `
		INSERT INTO file_chunks (file_id, idx, chunk_hash)
		VALUES (?, ?, ?)
		ON CONFLICT(file_id, idx) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
	}

	if err != nil {

@@ -37,10 +39,11 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh

func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
	query := `
		SELECT fc.file_id, fc.idx, fc.chunk_hash
		FROM file_chunks fc
		JOIN files f ON fc.file_id = f.id
		WHERE f.path = ?
		ORDER BY fc.idx
	`

	rows, err := r.db.conn.QueryContext(ctx, query, path)

@@ -49,13 +52,64 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
	}
	defer CloseRows(rows)

	return r.scanFileChunks(rows)
}

// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
	query := `
		SELECT file_id, idx, chunk_hash
		FROM file_chunks
		WHERE file_id = ?
		ORDER BY idx
	`

	rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
	if err != nil {
		return nil, fmt.Errorf("querying file chunks: %w", err)
	}
	defer CloseRows(rows)

	return r.scanFileChunks(rows)
}

// GetByPathTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
	query := `
		SELECT fc.file_id, fc.idx, fc.chunk_hash
		FROM file_chunks fc
		JOIN files f ON fc.file_id = f.id
		WHERE f.path = ?
		ORDER BY fc.idx
	`

	LogSQL("GetByPathTx", query, path)
	rows, err := tx.QueryContext(ctx, query, path)
	if err != nil {
		return nil, fmt.Errorf("querying file chunks: %w", err)
	}
	defer CloseRows(rows)

	fileChunks, err := r.scanFileChunks(rows)
	LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
	return fileChunks, err
}

// scanFileChunks is a helper that scans file chunk rows
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
	var fileChunks []*FileChunk
	for rows.Next() {
		var fc FileChunk
		var fileIDStr, chunkHashStr string
		err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
		if err != nil {
			return nil, fmt.Errorf("scanning file chunk: %w", err)
		}
		fc.FileID, err = types.ParseFileID(fileIDStr)
		if err != nil {
			return nil, fmt.Errorf("parsing file ID: %w", err)
		}
		fc.ChunkHash = types.ChunkHash(chunkHashStr)
		fileChunks = append(fileChunks, &fc)
	}

@@ -63,13 +117,13 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
}

func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
	query := `DELETE FROM file_chunks WHERE file_id = (SELECT id FROM files WHERE path = ?)`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, path)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, path)
	}

	if err != nil {

@@ -78,3 +132,117 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path

	return nil
}

// DeleteByFileID deletes all chunks for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
	query := `DELETE FROM file_chunks WHERE file_id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("deleting file chunks: %w", err)
	}

	return nil
}

// DeleteByFileIDs deletes all chunks for multiple files in a single statement.
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Batch at 500 to stay within SQLite's variable limit
	const batchSize = 500

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
		args := make([]interface{}, len(batch))
		for j, id := range batch {
			args[j] = id.String()
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch deleting file_chunks: %w", err)
		}
	}

	return nil
}

// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
// Batches are automatically split to stay within SQLite's variable limit.
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
	if len(fcs) == 0 {
		return nil
	}

	// SQLite has a limit on variables (typically 999 or 32766).
	// Each FileChunk has 3 values, so batch at 300 to be safe.
	const batchSize = 300

	for i := 0; i < len(fcs); i += batchSize {
		end := i + batchSize
		if end > len(fcs) {
			end = len(fcs)
		}
		batch := fcs[i:end]

		// Build the query with multiple value sets
		query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
		args := make([]interface{}, 0, len(batch)*3)
		for j, fc := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?)"
			args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
		}
		query += " ON CONFLICT(file_id, idx) DO NOTHING"

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting file_chunks: %w", err)
		}
	}

	return nil
}

// GetByFile is an alias for GetByPath for compatibility
func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
	LogSQL("GetByFile", "Starting", path)
	result, err := r.GetByPath(ctx, path)
	LogSQL("GetByFile", "Complete", path, "count", len(result))
	return result, err
}

// GetByFileTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByFileTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
	LogSQL("GetByFileTx", "Starting", path)
	result, err := r.GetByPathTx(ctx, tx, path)
	LogSQL("GetByFileTx", "Complete", path, "count", len(result))
	return result, err
}

@@ -4,6 +4,9 @@ import (
	"context"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestFileChunkRepository(t *testing.T) {

@@ -12,24 +15,56 @@ func TestFileChunkRepository(t *testing.T) {

	ctx := context.Background()
	repo := NewFileChunkRepository(db)
	fileRepo := NewFileRepository(db)

	// Create test file first
	testTime := time.Now().Truncate(time.Second)
	file := &File{
		Path:       "/test/file.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       3072,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err := fileRepo.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create chunks first
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
	chunkRepo := NewChunkRepository(db)
	for _, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err = chunkRepo.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
		}
	}

	// Test Create
	fc1 := &FileChunk{
		FileID:    file.ID,
		Idx:       0,
		ChunkHash: types.ChunkHash("chunk1"),
	}

	err = repo.Create(ctx, nil, fc1)
	if err != nil {
		t.Fatalf("failed to create file chunk: %v", err)
	}

	// Add more chunks for the same file
	fc2 := &FileChunk{
		FileID:    file.ID,
		Idx:       1,
		ChunkHash: types.ChunkHash("chunk2"),
	}
	err = repo.Create(ctx, nil, fc2)
	if err != nil {

@@ -37,26 +72,26 @@ func TestFileChunkRepository(t *testing.T) {
	}

	fc3 := &FileChunk{
		FileID:    file.ID,
		Idx:       2,
		ChunkHash: types.ChunkHash("chunk3"),
	}
	err = repo.Create(ctx, nil, fc3)
	if err != nil {
		t.Fatalf("failed to create third file chunk: %v", err)
	}

	// Test GetByFile
	fileChunks, err := repo.GetByFile(ctx, "/test/file.txt")
	if err != nil {
		t.Fatalf("failed to get file chunks: %v", err)
	}
	if len(fileChunks) != 3 {
		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
	}

	// Verify order
	for i, chunk := range fileChunks {
		if chunk.Idx != i {
			t.Errorf("wrong chunk order: expected idx %d, got %d", i, chunk.Idx)
		}

@@ -68,18 +103,18 @@ func TestFileChunkRepository(t *testing.T) {
		t.Fatalf("failed to create duplicate file chunk: %v", err)
	}

	// Test DeleteByFileID
	err = repo.DeleteByFileID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file chunks: %v", err)
	}

	fileChunks, err = repo.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get deleted file chunks: %v", err)
	}
	if len(fileChunks) != 0 {
		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
	}
}

@@ -89,15 +124,54 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {

	ctx := context.Background()
	repo := NewFileChunkRepository(db)
	fileRepo := NewFileRepository(db)

	// Create test files
	testTime := time.Now().Truncate(time.Second)
	filePaths := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
	files := make([]*File, len(filePaths))

	for i, path := range filePaths {
		file := &File{
			Path:       types.FilePath(path),
			MTime:      testTime,
			CTime:      testTime,
			Size:       2048,
			Mode:       0644,
			UID:        1000,
			GID:        1000,
			LinkTarget: "",
		}
		err := fileRepo.Create(ctx, nil, file)
		if err != nil {
			t.Fatalf("failed to create file %s: %v", path, err)
		}
		files[i] = file
	}

	// Create all chunks first
	chunkRepo := NewChunkRepository(db)
	for i := range files {
		for j := 0; j < 2; j++ {
			chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
			chunk := &Chunk{
				ChunkHash: chunkHash,
				Size:      1024,
			}
			err := chunkRepo.Create(ctx, nil, chunk)
			if err != nil {
				t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
			}
		}
	}

	// Create chunks for multiple files
	for i, file := range files {
		for j := 0; j < 2; j++ {
			fc := &FileChunk{
				FileID:    file.ID,
				Idx:       j,
				ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
			}
			err := repo.Create(ctx, nil, fc)
			if err != nil {

@@ -107,13 +181,13 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
	}

	// Verify each file has correct chunks
	for i, file := range files {
		chunks, err := repo.GetByFileID(ctx, file.ID)
		if err != nil {
			t.Fatalf("failed to get chunks for file %d: %v", i, err)
		}
		if len(chunks) != 2 {
			t.Errorf("expected 2 chunks for file %d, got %d", i, len(chunks))
		}
	}
}

@@ -5,6 +5,9 @@ import (
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"git.eeqj.de/sneak/vaultik/internal/log"
|
||||
"git.eeqj.de/sneak/vaultik/internal/types"
|
||||
)
|
||||
|
||||
type FileRepository struct {
|
||||
@@ -16,10 +19,16 @@ func NewFileRepository(db *DB) *FileRepository {
|
||||
}
|
||||
|
||||
func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
|
||||
// Generate UUID if not provided
|
||||
if file.ID.IsZero() {
|
||||
file.ID = types.NewFileID()
|
||||
}
|
||||
|
||||
query := `
|
||||
INSERT INTO files (path, mtime, ctime, size, mode, uid, gid, link_target)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
||||
INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
ON CONFLICT(path) DO UPDATE SET
|
||||
source_path = excluded.source_path,
|
||||
mtime = excluded.mtime,
|
||||
ctime = excluded.ctime,
|
||||
size = excluded.size,
|
||||
@@ -27,43 +36,78 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
|
||||
uid = excluded.uid,
|
||||
gid = excluded.gid,
|
||||
link_target = excluded.link_target
|
||||
RETURNING id
|
||||
`
|
||||
|
||||
var idStr string
|
||||
var err error
|
||||
if tx != nil {
|
||||
_, err = tx.ExecContext(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
|
||||
LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
|
||||
err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
|
||||
} else {
|
||||
_, err = r.db.ExecWithLock(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
|
||||
err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
|
||||
}
|
||||
|
||||
if err != nil {
|
||||
return fmt.Errorf("inserting file: %w", err)
|
||||
}
|
||||
|
||||
// Parse the returned ID
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
|
||||
query := `
|
||||
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE path = ?
|
||||
`
|
||||
|
||||
var file File
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
|
||||
&file.Path,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
return file, nil
|
||||
}
|
||||
|
||||
// GetByID retrieves a file by its UUID
|
||||
func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
|
||||
query := `
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE id = ?
|
||||
`
|
||||
|
||||
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
return file, nil
|
||||
}
|
||||
|
||||
func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
|
||||
query := `
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE path = ?
|
||||
`
|
||||
|
||||
LogSQL("GetByPathTx QueryRowContext", query, path)
|
||||
file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
|
||||
LogSQL("GetByPathTx Scan complete", query, path)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
@@ -72,10 +116,80 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
file.MTime = time.Unix(mtimeUnix, 0)
|
||||
file.CTime = time.Unix(ctimeUnix, 0)
|
||||
return file, nil
|
||||
}
|
||||
|
||||
// scanFile is a helper that scans a single file row
|
||||
func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
|
||||
var file File
|
||||
var idStr, pathStr, sourcePathStr string
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := row.Scan(
|
||||
&idStr,
|
||||
&pathStr,
|
||||
&sourcePathStr,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
file.Path = types.FilePath(pathStr)
|
||||
file.SourcePath = types.SourcePath(sourcePathStr)
|
||||
file.MTime = time.Unix(mtimeUnix, 0).UTC()
|
||||
file.CTime = time.Unix(ctimeUnix, 0).UTC()
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = linkTarget.String
|
||||
file.LinkTarget = types.FilePath(linkTarget.String)
|
||||
}
|
||||
|
||||
return &file, nil
|
||||
}
|
||||
|
||||
// scanFileRows is a helper that scans a file row from rows iterator
|
||||
func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
|
||||
var file File
|
||||
var idStr, pathStr, sourcePathStr string
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := rows.Scan(
|
||||
&idStr,
|
||||
&pathStr,
|
||||
&sourcePathStr,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
file.Path = types.FilePath(pathStr)
|
||||
file.SourcePath = types.SourcePath(sourcePathStr)
|
||||
file.MTime = time.Unix(mtimeUnix, 0).UTC()
|
||||
file.CTime = time.Unix(ctimeUnix, 0).UTC()
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = types.FilePath(linkTarget.String)
|
||||
}
|
||||
|
||||
return &file, nil
|
||||
@@ -83,7 +197,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
|
||||
|
||||
func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
|
||||
query := `
|
||||
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE mtime >= ?
|
||||
ORDER BY path
|
||||
@@ -97,31 +211,11 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
|
||||
|
||||
var files []*File
|
||||
for rows.Next() {
|
||||
var file File
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := rows.Scan(
|
||||
&file.Path,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
file, err := r.scanFileRows(rows)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("scanning file: %w", err)
|
||||
}
|
||||
|
||||
file.MTime = time.Unix(mtimeUnix, 0)
|
||||
file.CTime = time.Unix(ctimeUnix, 0)
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = linkTarget.String
|
||||
}
|
||||
|
||||
files = append(files, &file)
|
||||
files = append(files, file)
|
||||
}
|
||||
|
||||
return files, rows.Err()
|
||||
@@ -134,7 +228,7 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
|
||||
if tx != nil {
|
||||
_, err = tx.ExecContext(ctx, query, path)
|
||||
} else {
|
||||
_, err = r.db.ExecWithLock(ctx, query, path)
|
||||
_, err = r.db.ExecWithLog(ctx, query, path)
|
||||
}
|
||||
|
||||
if err != nil {
|
||||
@@ -143,3 +237,146 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
|
||||
|
||||
return nil
|
||||
}
|
||||

// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
	query := `DELETE FROM files WHERE id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, id.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, id.String())
	}

	if err != nil {
		return fmt.Errorf("deleting file: %w", err)
	}

	return nil
}

// ListByPrefix returns all files whose path begins with the given prefix
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
	query := `
		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
		FROM files
		WHERE path LIKE ? || '%'
		ORDER BY path
	`

	rows, err := r.db.conn.QueryContext(ctx, query, prefix)
	if err != nil {
		return nil, fmt.Errorf("querying files: %w", err)
	}
	defer CloseRows(rows)

	var files []*File
	for rows.Next() {
		file, err := r.scanFileRows(rows)
		if err != nil {
			return nil, fmt.Errorf("scanning file: %w", err)
		}
		files = append(files, file)
	}

	return files, rows.Err()
}

// ListAll returns all files in the database
func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
	query := `
		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
		FROM files
		ORDER BY path
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying files: %w", err)
	}
	defer CloseRows(rows)

	var files []*File
	for rows.Next() {
		file, err := r.scanFileRows(rows)
		if err != nil {
			return nil, fmt.Errorf("scanning file: %w", err)
		}
		files = append(files, file)
	}

	return files, rows.Err()
}

// CreateBatch inserts or updates multiple files in a single statement for efficiency.
// File IDs must be pre-generated before calling this method.
func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
	if len(files) == 0 {
		return nil
	}

	// Each File has 10 values, so batch at 100 to be safe with SQLite's variable limit
	const batchSize = 100

	for i := 0; i < len(files); i += batchSize {
		end := i + batchSize
		if end > len(files) {
			end = len(files)
		}
		batch := files[i:end]

		query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
		args := make([]interface{}, 0, len(batch)*10)
		for j, f := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
			args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
		}
		query += ` ON CONFLICT(path) DO UPDATE SET
			source_path = excluded.source_path,
			mtime = excluded.mtime,
			ctime = excluded.ctime,
			size = excluded.size,
			mode = excluded.mode,
			uid = excluded.uid,
			gid = excluded.gid,
			link_target = excluded.link_target`

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting files: %w", err)
		}
	}

	return nil
}

// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
	query := `
		DELETE FROM files
		WHERE NOT EXISTS (
			SELECT 1 FROM snapshot_files
			WHERE snapshot_files.file_id = files.id
		)
	`

	result, err := r.db.ExecWithLog(ctx, query)
	if err != nil {
		return fmt.Errorf("deleting orphaned files: %w", err)
	}

	rowsAffected, _ := result.RowsAffected()
	if rowsAffected > 0 {
		log.Debug("Deleted orphaned files", "count", rowsAffected)
	}

	return nil
}
@@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test GetByPath
-	retrieved, err := repo.GetByPath(ctx, file.Path)
+	retrieved, err := repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get file: %v", err)
 	}
@@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
 		t.Fatalf("failed to update file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get updated file: %v", err)
 	}
@@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test Delete
-	err = repo.Delete(ctx, nil, file.Path)
+	err = repo.Delete(ctx, nil, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to delete file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("error getting deleted file: %v", err)
 	}
@@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
 		t.Fatalf("failed to create symlink: %v", err)
 	}

-	retrieved, err := repo.GetByPath(ctx, symlink.Path)
+	retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get symlink: %v", err)
 	}
@@ -1,70 +1,125 @@
 // Package database provides data models and repository interfaces for the Vaultik backup system.
 // It includes types for files, chunks, blobs, snapshots, and their relationships.
 package database

-import "time"
+import (
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
+)

-// File represents a file record in the database
+// File represents a file or directory in the backup system.
+// It stores metadata about files including timestamps, permissions, ownership,
+// and symlink targets. This information is used to restore files with their
+// original attributes.
 type File struct {
-	Path  string
-	MTime time.Time
-	CTime time.Time
+	ID         types.FileID     // UUID primary key
+	Path       types.FilePath   // Absolute path of the file
+	SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
+	MTime      time.Time        // Last modification time
+	CTime      time.Time        // Creation/change time (platform-specific: birth time on macOS, inode change time on Linux)
 	Size       int64
 	Mode       uint32
 	UID        uint32
 	GID        uint32
-	LinkTarget string // empty for regular files, target path for symlinks
+	LinkTarget types.FilePath // empty for regular files, target path for symlinks
 }

-// IsSymlink returns true if this file is a symbolic link
+// IsSymlink returns true if this file is a symbolic link.
+// A file is considered a symlink if it has a non-empty LinkTarget.
 func (f *File) IsSymlink() bool {
 	return f.LinkTarget != ""
 }

-// FileChunk represents the mapping between files and chunks
+// FileChunk represents the mapping between files and their constituent chunks.
+// Large files are split into multiple chunks for efficient deduplication and storage.
+// The Idx field maintains the order of chunks within a file.
 type FileChunk struct {
-	Path      string
+	FileID    types.FileID
 	Idx       int
-	ChunkHash string
+	ChunkHash types.ChunkHash
 }

-// Chunk represents a chunk record in the database
+// Chunk represents a data chunk in the deduplication system.
+// Files are split into chunks which are content-addressed by their hash.
+// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
 type Chunk struct {
-	ChunkHash string
-	SHA256    string
+	ChunkHash types.ChunkHash
 	Size      int64
 }

-// Blob represents a blob record in the database
+// Blob represents a blob record in the database.
+// A blob is Vaultik's final storage unit - a large file (up to 10GB) containing
+// many compressed and encrypted chunks from multiple source files.
+// Blobs are content-addressed, meaning their filename in S3 is derived from
+// the SHA256 hash of their compressed and encrypted content.
+// The blob creation process is: chunks are accumulated -> compressed with zstd
+// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
 type Blob struct {
-	BlobHash  string
-	CreatedTS time.Time
+	ID               types.BlobID   // UUID assigned when blob creation starts
+	Hash             types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
+	CreatedTS        time.Time      // When blob creation started
+	FinishedTS       *time.Time     // When blob was finalized (nil if still packing)
+	UncompressedSize int64          // Total size of raw chunks before compression
+	CompressedSize   int64          // Size after compression and encryption
+	UploadedTS       *time.Time     // When blob was uploaded to S3 (nil if not uploaded)
 }

-// BlobChunk represents the mapping between blobs and chunks
+// BlobChunk represents the mapping between blobs and the chunks they contain.
+// This allows tracking which chunks are stored in which blobs, along with
+// their position and size within the blob. The offset and length fields
+// enable extracting specific chunks from a blob without processing the entire blob.
 type BlobChunk struct {
-	BlobHash  string
-	ChunkHash string
+	BlobID    types.BlobID
+	ChunkHash types.ChunkHash
 	Offset    int64
 	Length    int64
 }

-// ChunkFile represents the reverse mapping of chunks to files
+// ChunkFile represents the reverse mapping showing which files contain a specific chunk.
+// This is used during deduplication to identify all files that share a chunk,
+// which is important for garbage collection and integrity verification.
 type ChunkFile struct {
-	ChunkHash string
-	FilePath  string
+	ChunkHash  types.ChunkHash
+	FileID     types.FileID
 	FileOffset int64
 	Length     int64
 }

 // Snapshot represents a snapshot record in the database
 type Snapshot struct {
-	ID             string
-	Hostname       string
-	VaultikVersion string
-	CreatedTS      time.Time
+	ID                 types.SnapshotID
+	Hostname           types.Hostname
+	VaultikVersion     types.Version
+	VaultikGitRevision types.GitRevision
+	StartedAt          time.Time
+	CompletedAt        *time.Time // nil if still in progress
 	FileCount  int64
 	ChunkCount int64
 	BlobCount  int64
 	TotalSize  int64 // Total size of all referenced files
 	BlobSize   int64 // Total size of all referenced blobs (compressed and encrypted)
-	CompressionRatio float64 // Compression ratio (BlobSize / TotalSize)
+	BlobUncompressedSize int64   // Total uncompressed size of all referenced blobs
+	CompressionRatio     float64 // Compression ratio (BlobSize / BlobUncompressedSize)
+	CompressionLevel     int     // Compression level used for this snapshot
+	UploadBytes          int64   // Total bytes uploaded during this snapshot
+	UploadDurationMs     int64   // Total milliseconds spent uploading to S3
 }

+// IsComplete returns true if the snapshot has completed
+func (s *Snapshot) IsComplete() bool {
+	return s.CompletedAt != nil
+}
+
+// SnapshotFile represents the mapping between snapshots and files
+type SnapshotFile struct {
+	SnapshotID types.SnapshotID
+	FileID     types.FileID
+}
+
+// SnapshotBlob represents the mapping between snapshots and blobs
+type SnapshotBlob struct {
+	SnapshotID types.SnapshotID
+	BlobID     types.BlobID
+	BlobHash   types.BlobHash // Denormalized for easier manifest generation
+}
@@ -7,6 +7,7 @@ import (
 	"path/filepath"

 	"git.eeqj.de/sneak/vaultik/internal/config"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"go.uber.org/fx"
 )

@@ -32,7 +33,13 @@ func provideDatabase(lc fx.Lifecycle, cfg *config.Config) (*DB, error) {

 	lc.Append(fx.Hook{
 		OnStop: func(ctx context.Context) error {
-			return db.Close()
+			log.Debug("Database module OnStop hook called")
+			if err := db.Close(); err != nil {
+				log.Error("Failed to close database in OnStop hook", "error", err)
+				return err
+			}
+			log.Debug("Database closed successfully in OnStop hook")
+			return nil
 		},
 	})

@@ -6,6 +6,9 @@ import (
	"fmt"
)

// Repositories provides access to all database repositories.
// It serves as a centralized access point for all database operations
// and manages transaction coordination across repositories.
type Repositories struct {
	db    *DB
	Files *FileRepository
@@ -15,8 +18,11 @@ type Repositories struct {
	BlobChunks *BlobChunkRepository
	ChunkFiles *ChunkFileRepository
	Snapshots  *SnapshotRepository
	Uploads    *UploadRepository
}

// NewRepositories creates a new Repositories instance with all repository types.
// Each repository shares the same database connection for coordinated transactions.
func NewRepositories(db *DB) *Repositories {
	return &Repositories{
		db: db,
@@ -27,20 +33,26 @@ func NewRepositories(db *DB) *Repositories {
		BlobChunks: NewBlobChunkRepository(db),
		ChunkFiles: NewChunkFileRepository(db),
		Snapshots:  NewSnapshotRepository(db),
		Uploads:    NewUploadRepository(db.conn),
	}
}

// TxFunc is a function that executes within a database transaction.
// The transaction is automatically committed if the function returns nil,
// or rolled back if it returns an error.
type TxFunc func(ctx context.Context, tx *sql.Tx) error

// WithTx executes a function within a write transaction.
// SQLite handles its own locking internally, so no explicit locking is needed.
// The transaction is automatically committed on success or rolled back on error.
// This method should be used for all write operations to ensure atomicity.
func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
	// Acquire write lock for the entire transaction
	r.db.LockForWrite()
	defer r.db.UnlockWrite()

	LogSQL("WithTx", "Beginning transaction", "")
	tx, err := r.db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("beginning transaction: %w", err)
	}
	LogSQL("WithTx", "Transaction started", "")

	defer func() {
		if p := recover(); p != nil {
@@ -63,6 +75,15 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
	return tx.Commit()
}

// DB returns the underlying database for direct queries
func (r *Repositories) DB() *DB {
	return r.db
}

// WithReadTx executes a function within a read-only transaction.
// Read transactions can run concurrently with other read transactions
// but will be blocked by write transactions. The transaction is
// automatically committed on success or rolled back on error.
func (r *Repositories) WithReadTx(ctx context.Context, fn TxFunc) error {
	opts := &sql.TxOptions{
		ReadOnly: true,
@@ -6,6 +6,8 @@ import (
 	"fmt"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestRepositoriesTransaction(t *testing.T) {
@@ -33,8 +35,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Create chunks
 	chunk1 := &Chunk{
-		ChunkHash: "tx_chunk1",
-		SHA256:    "tx_sha1",
+		ChunkHash: types.ChunkHash("tx_chunk1"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@@ -42,8 +43,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	chunk2 := &Chunk{
-		ChunkHash: "tx_chunk2",
-		SHA256:    "tx_sha2",
+		ChunkHash: types.ChunkHash("tx_chunk2"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@@ -52,7 +52,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Map chunks to file
 	fc1 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       0,
 		ChunkHash: chunk1.ChunkHash,
 	}
@@ -61,7 +61,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	fc2 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       1,
 		ChunkHash: chunk2.ChunkHash,
 	}
@@ -71,7 +71,8 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Create blob
 	blob := &Blob{
-		BlobHash:  "tx_blob1",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("tx_blob1"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@@ -80,7 +81,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Map chunks to blob
 	bc1 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk1.ChunkHash,
 		Offset:    0,
 		Length:    512,
@@ -90,7 +91,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	bc2 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk2.ChunkHash,
 		Offset:    512,
 		Length:    512,
@@ -115,7 +116,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 		t.Error("expected file after transaction")
 	}

-	chunks, err := repos.FileChunks.GetByPath(ctx, "/test/tx_file.txt")
+	chunks, err := repos.FileChunks.GetByFile(ctx, "/test/tx_file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
@@ -157,8 +158,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {

 	// Create a chunk
 	chunk := &Chunk{
-		ChunkHash: "rollback_chunk",
-		SHA256:    "rollback_sha",
+		ChunkHash: types.ChunkHash("rollback_chunk"),
 		Size:      1024,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {
@@ -217,7 +217,7 @@ func TestRepositoriesReadTransaction(t *testing.T) {
 	var retrievedFile *File
 	err = repos.WithReadTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
 		var err error
-		retrievedFile, err = repos.Files.GetByPath(ctx, "/test/read_file.txt")
+		retrievedFile, err = repos.Files.GetByPathTx(ctx, tx, "/test/read_file.txt")
 		if err != nil {
 			return err
 		}
874
internal/database/repository_comprehensive_test.go
Normal file
@@ -0,0 +1,874 @@
package database

import (
	"context"
	"database/sql"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
func TestFileRepositoryUUIDGeneration(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewFileRepository(db)

	// Create multiple files
	files := []*File{
		{
			Path:  "/file1.txt",
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		},
		{
			Path:  "/file2.txt",
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  2048,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		},
	}

	uuids := make(map[string]bool)
	for _, file := range files {
		err := repo.Create(ctx, nil, file)
		if err != nil {
			t.Fatalf("failed to create file: %v", err)
		}

		// Check UUID was generated
		if file.ID.IsZero() {
			t.Error("file ID was not generated")
		}

		// Check UUID is unique
		if uuids[file.ID.String()] {
			t.Errorf("duplicate UUID generated: %s", file.ID)
		}
		uuids[file.ID.String()] = true
	}
}

// TestFileRepositoryGetByID tests retrieving files by UUID
func TestFileRepositoryGetByID(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewFileRepository(db)

	// Create a file
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repo.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Retrieve by ID
	retrieved, err := repo.GetByID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file by ID: %v", err)
	}

	if retrieved.ID != file.ID {
		t.Errorf("ID mismatch: expected %s, got %s", file.ID, retrieved.ID)
	}
	if retrieved.Path != file.Path {
		t.Errorf("Path mismatch: expected %s, got %s", file.Path, retrieved.Path)
	}

	// Test non-existent ID
	nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
	nonExistent, err := repo.GetByID(ctx, nonExistentID)
	if err != nil {
		t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
	}
	if nonExistent != nil {
		t.Error("expected nil for non-existent ID")
	}
}

// TestOrphanedFileCleanup tests the cleanup of orphaned files
func TestOrphanedFileCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create files
	file1 := &File{
		Path:  "/orphaned.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	file2 := &File{
		Path:  "/referenced.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  2048,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repos.Files.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	err = repos.Files.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create a snapshot and reference only file2
	snapshot := &Snapshot{
		ID:        "test-snapshot",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Add file2 to snapshot
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
	if err != nil {
		t.Fatalf("failed to add file to snapshot: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Files.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned files: %v", err)
	}

	// Check that orphaned file is gone
	orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if orphanedFile != nil {
		t.Error("orphaned file should have been deleted")
	}

	// Check that referenced file still exists
	referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if referencedFile == nil {
		t.Error("referenced file should not have been deleted")
	}
}

// TestOrphanedChunkCleanup tests the cleanup of orphaned chunks
func TestOrphanedChunkCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create chunks
	chunk1 := &Chunk{
		ChunkHash: types.ChunkHash("orphaned-chunk"),
		Size:      1024,
	}
	chunk2 := &Chunk{
		ChunkHash: types.ChunkHash("referenced-chunk"),
		Size:      1024,
	}

	err := repos.Chunks.Create(ctx, nil, chunk1)
	if err != nil {
		t.Fatalf("failed to create chunk1: %v", err)
	}
	err = repos.Chunks.Create(ctx, nil, chunk2)
	if err != nil {
		t.Fatalf("failed to create chunk2: %v", err)
	}

	// Create a file and reference only chunk2
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err = repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create file-chunk mapping only for chunk2
	fc := &FileChunk{
		FileID:    file.ID,
		Idx:       0,
		ChunkHash: chunk2.ChunkHash,
	}
	err = repos.FileChunks.Create(ctx, nil, fc)
	if err != nil {
		t.Fatalf("failed to create file chunk: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Chunks.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned chunks: %v", err)
	}

	// Check that orphaned chunk is gone
	orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
	if err != nil {
		t.Fatalf("error getting chunk: %v", err)
	}
	if orphanedChunk != nil {
		t.Error("orphaned chunk should have been deleted")
	}

	// Check that referenced chunk still exists
	referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
	if err != nil {
		t.Fatalf("error getting chunk: %v", err)
	}
	if referencedChunk == nil {
		t.Error("referenced chunk should not have been deleted")
	}
}

// TestOrphanedBlobCleanup tests the cleanup of orphaned blobs
func TestOrphanedBlobCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create blobs
	blob1 := &Blob{
		ID:        types.NewBlobID(),
		Hash:      types.BlobHash("orphaned-blob"),
		CreatedTS: time.Now().Truncate(time.Second),
	}
	blob2 := &Blob{
		ID:        types.NewBlobID(),
		Hash:      types.BlobHash("referenced-blob"),
		CreatedTS: time.Now().Truncate(time.Second),
	}

	err := repos.Blobs.Create(ctx, nil, blob1)
	if err != nil {
		t.Fatalf("failed to create blob1: %v", err)
	}
	err = repos.Blobs.Create(ctx, nil, blob2)
	if err != nil {
		t.Fatalf("failed to create blob2: %v", err)
	}

	// Create a snapshot and reference only blob2
	snapshot := &Snapshot{
		ID:        "test-snapshot",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Add blob2 to snapshot
	err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
	if err != nil {
		t.Fatalf("failed to add blob to snapshot: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Blobs.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned blobs: %v", err)
	}

	// Check that orphaned blob is gone
	orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
	if err != nil {
		t.Fatalf("error getting blob: %v", err)
	}
	if orphanedBlob != nil {
		t.Error("orphaned blob should have been deleted")
	}

	// Check that referenced blob still exists
	referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
	if err != nil {
		t.Fatalf("error getting blob: %v", err)
	}
	if referencedBlob == nil {
		t.Error("referenced blob should not have been deleted")
	}
}

// TestFileChunkRepositoryWithUUIDs tests file-chunk relationships with UUIDs
func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create a file
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  3072,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err := repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create chunks
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
	for i, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		// Create file-chunk mapping
		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
	}

	// Test GetByFileID
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file chunks: %v", err)
	}
	if len(fileChunks) != 3 {
		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
	}

	// Test DeleteByFileID
	err = repos.FileChunks.DeleteByFileID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file chunks: %v", err)
	}

	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file chunks after delete: %v", err)
	}
	if len(fileChunks) != 0 {
		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
	}
}

// TestChunkFileRepositoryWithUUIDs tests chunk-file relationships with UUIDs
func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create files
	file1 := &File{
		Path:  "/file1.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	file2 := &File{
		Path:  "/file2.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repos.Files.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	err = repos.Files.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create a chunk that appears in both files (deduplication)
	chunk := &Chunk{
		ChunkHash: types.ChunkHash("shared-chunk"),
		Size:      1024,
	}
	err = repos.Chunks.Create(ctx, nil, chunk)
	if err != nil {
		t.Fatalf("failed to create chunk: %v", err)
	}

	// Create chunk-file mappings
	cf1 := &ChunkFile{
		ChunkHash:  chunk.ChunkHash,
		FileID:     file1.ID,
		FileOffset: 0,
		Length:     1024,
	}
	cf2 := &ChunkFile{
		ChunkHash:  chunk.ChunkHash,
		FileID:     file2.ID,
		FileOffset: 512,
		Length:     1024,
	}

	err = repos.ChunkFiles.Create(ctx, nil, cf1)
	if err != nil {
		t.Fatalf("failed to create chunk file 1: %v", err)
	}
	err = repos.ChunkFiles.Create(ctx, nil, cf2)
	if err != nil {
		t.Fatalf("failed to create chunk file 2: %v", err)
	}

	// Test GetByChunkHash
	chunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
	if err != nil {
		t.Fatalf("failed to get chunk files: %v", err)
	}
	if len(chunkFiles) != 2 {
		t.Errorf("expected 2 files for chunk, got %d", len(chunkFiles))
	}

	// Test GetByFileID
	chunkFiles, err = repos.ChunkFiles.GetByFileID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("failed to get chunks by file ID: %v", err)
	}
	if len(chunkFiles) != 1 {
		t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
	}
}

// TestSnapshotRepositoryExtendedFields tests snapshot with version and git revision
func TestSnapshotRepositoryExtendedFields(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewSnapshotRepository(db)

	// Create snapshot with extended fields
	snapshot := &Snapshot{
		ID:                   "test-20250722-120000Z",
		Hostname:             "test-host",
		VaultikVersion:       "0.0.1",
		VaultikGitRevision:   "abc123def456",
		StartedAt:            time.Now(),
		CompletedAt:          nil,
		FileCount:            100,
		ChunkCount:           200,
		BlobCount:            50,
		TotalSize:            1024 * 1024,
		BlobSize:             512 * 1024,
		BlobUncompressedSize: 1024 * 1024,
		CompressionLevel:     6,
		CompressionRatio:     2.0,
		UploadDurationMs:     5000,
	}

	err := repo.Create(ctx, nil, snapshot)
	if err != nil {
|
||||
t.Fatalf("failed to create snapshot: %v", err)
|
||||
}
|
||||
|
||||
// Retrieve and verify
|
||||
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
|
||||
if err != nil {
|
||||
t.Fatalf("failed to get snapshot: %v", err)
|
||||
}
|
||||
|
||||
if retrieved.VaultikVersion != snapshot.VaultikVersion {
|
||||
t.Errorf("version mismatch: expected %s, got %s", snapshot.VaultikVersion, retrieved.VaultikVersion)
|
||||
}
|
||||
if retrieved.VaultikGitRevision != snapshot.VaultikGitRevision {
|
||||
t.Errorf("git revision mismatch: expected %s, got %s", snapshot.VaultikGitRevision, retrieved.VaultikGitRevision)
|
||||
}
|
||||
if retrieved.CompressionLevel != snapshot.CompressionLevel {
|
||||
t.Errorf("compression level mismatch: expected %d, got %d", snapshot.CompressionLevel, retrieved.CompressionLevel)
|
||||
}
|
||||
if retrieved.BlobUncompressedSize != snapshot.BlobUncompressedSize {
|
||||
t.Errorf("uncompressed size mismatch: expected %d, got %d", snapshot.BlobUncompressedSize, retrieved.BlobUncompressedSize)
|
||||
}
|
||||
if retrieved.UploadDurationMs != snapshot.UploadDurationMs {
|
||||
t.Errorf("upload duration mismatch: expected %d, got %d", snapshot.UploadDurationMs, retrieved.UploadDurationMs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestComplexOrphanedDataScenario tests a complex scenario with multiple relationships
|
||||
func TestComplexOrphanedDataScenario(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Create snapshots
|
||||
snapshot1 := &Snapshot{
|
||||
ID: "snapshot1",
|
||||
Hostname: "host1",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
snapshot2 := &Snapshot{
|
||||
ID: "snapshot2",
|
||||
Hostname: "host1",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
|
||||
err := repos.Snapshots.Create(ctx, nil, snapshot1)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create snapshot1: %v", err)
|
||||
}
|
||||
err = repos.Snapshots.Create(ctx, nil, snapshot2)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create snapshot2: %v", err)
|
||||
}
|
||||
|
||||
// Create files
|
||||
files := make([]*File, 3)
|
||||
for i := range files {
|
||||
files[i] = &File{
|
||||
Path: types.FilePath(fmt.Sprintf("/file%d.txt", i)),
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err = repos.Files.Create(ctx, nil, files[i])
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file%d: %v", i, err)
|
||||
}
|
||||
}
|
||||
|
||||
// Add files to snapshots
|
||||
// Snapshot1: file0, file1
|
||||
// Snapshot2: file1, file2
|
||||
// file0: only in snapshot1
|
||||
// file1: in both snapshots
|
||||
// file2: only in snapshot2
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Delete snapshot1
|
||||
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Run orphaned cleanup
|
||||
err = repos.Files.DeleteOrphaned(ctx)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Check results
|
||||
// file0 should be deleted (only in deleted snapshot)
|
||||
file0, err := repos.Files.GetByID(ctx, files[0].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file0: %v", err)
|
||||
}
|
||||
if file0 != nil {
|
||||
t.Error("file0 should have been deleted")
|
||||
}
|
||||
|
||||
// file1 should exist (still in snapshot2)
|
||||
file1, err := repos.Files.GetByID(ctx, files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file1: %v", err)
|
||||
}
|
||||
if file1 == nil {
|
||||
t.Error("file1 should still exist")
|
||||
}
|
||||
|
||||
// file2 should exist (still in snapshot2)
|
||||
file2, err := repos.Files.GetByID(ctx, files[2].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file2: %v", err)
|
||||
}
|
||||
if file2 == nil {
|
||||
t.Error("file2 should still exist")
|
||||
}
|
||||
}
|
||||
|
||||
// TestCascadeDelete tests that cascade deletes work properly
|
||||
func TestCascadeDelete(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Create a file
|
||||
file := &File{
|
||||
Path: "/cascade-test.txt",
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err := repos.Files.Create(ctx, nil, file)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file: %v", err)
|
||||
}
|
||||
|
||||
// Create chunks and file-chunk mappings
|
||||
for i := 0; i < 3; i++ {
|
||||
chunk := &Chunk{
|
||||
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
|
||||
Size: 1024,
|
||||
}
|
||||
err = repos.Chunks.Create(ctx, nil, chunk)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create chunk: %v", err)
|
||||
}
|
||||
|
||||
fc := &FileChunk{
|
||||
FileID: file.ID,
|
||||
Idx: i,
|
||||
ChunkHash: chunk.ChunkHash,
|
||||
}
|
||||
err = repos.FileChunks.Create(ctx, nil, fc)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file chunk: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Verify file chunks exist
|
||||
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(fileChunks) != 3 {
|
||||
t.Errorf("expected 3 file chunks, got %d", len(fileChunks))
|
||||
}
|
||||
|
||||
// Delete the file
|
||||
err = repos.Files.DeleteByID(ctx, nil, file.ID)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to delete file: %v", err)
|
||||
}
|
||||
|
||||
// Verify file chunks were cascade deleted
|
||||
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(fileChunks) != 0 {
|
||||
t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
|
||||
}
|
||||
}
|
||||
|
||||
// TestTransactionIsolation tests that transactions properly isolate changes
|
||||
func TestTransactionIsolation(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Start a transaction
|
||||
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||
// Create a file within the transaction
|
||||
file := &File{
|
||||
Path: "/tx-test.txt",
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err := repos.Files.Create(ctx, tx, file)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Within the same transaction, we should be able to query it
|
||||
// Note: This would require modifying GetByPath to accept a tx parameter
|
||||
// For now, we'll just test that rollback works
|
||||
|
||||
// Return an error to trigger rollback
|
||||
return fmt.Errorf("intentional rollback")
|
||||
})
|
||||
|
||||
if err == nil {
|
||||
t.Fatal("expected error from transaction")
|
||||
}
|
||||
|
||||
// Verify the file was not created (transaction rolled back)
|
||||
files, err := repos.Files.ListByPrefix(ctx, "/tx-test")
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(files) != 0 {
|
||||
t.Error("file should not exist after rollback")
|
||||
}
|
||||
}
|
||||
|
||||
// TestConcurrentOrphanedCleanup tests that concurrent cleanup operations don't interfere
|
||||
func TestConcurrentOrphanedCleanup(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Set a 5-second busy timeout to handle concurrent operations
|
||||
if _, err := db.conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
|
||||
t.Fatalf("failed to set busy timeout: %v", err)
|
||||
}
|
||||
|
||||
// Create a snapshot
|
||||
snapshot := &Snapshot{
|
||||
ID: "concurrent-test",
|
||||
Hostname: "test-host",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
err := repos.Snapshots.Create(ctx, nil, snapshot)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Create many files, some orphaned
|
||||
for i := 0; i < 20; i++ {
|
||||
file := &File{
|
||||
Path: types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err = repos.Files.Create(ctx, nil, file)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Add even-numbered files to snapshot
|
||||
if i%2 == 0 {
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
    // Run multiple cleanup operations concurrently
    // Note: SQLite serializes writes, so without the busy timeout set above
    // some of these would fail with SQLITE_BUSY; with it, all should succeed
    done := make(chan error, 3)
    for i := 0; i < 3; i++ {
        go func() {
            done <- repos.Files.DeleteOrphaned(ctx)
        }()
    }

    // Wait for all to complete
    for i := 0; i < 3; i++ {
        err := <-done
        if err != nil {
            t.Errorf("cleanup %d failed: %v", i, err)
        }
    }

    // Verify correct files were deleted
    files, err := repos.Files.ListByPrefix(ctx, "/concurrent-")
    if err != nil {
        t.Fatal(err)
    }

    // Should have 10 files remaining (even numbered)
    if len(files) != 10 {
        t.Errorf("expected 10 files remaining, got %d", len(files))
    }

    // Verify all remaining files are even-numbered
    for _, file := range files {
        var num int
        _, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
        if err != nil {
            t.Logf("failed to parse file number from %s: %v", file.Path, err)
        }
        if num%2 != 0 {
            t.Errorf("odd-numbered file %s should have been deleted", file.Path)
        }
    }
}
internal/database/repository_debug_test.go (new file, +165 lines)
package database

import (
    "context"
    "testing"
    "time"
)

// TestOrphanedFileCleanupDebug tests orphaned file cleanup with debug output
func TestOrphanedFileCleanupDebug(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create files
    file1 := &File{
        Path:  "/orphaned.txt",
        MTime: time.Now().Truncate(time.Second),
        CTime: time.Now().Truncate(time.Second),
        Size:  1024,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }
    file2 := &File{
        Path:  "/referenced.txt",
        MTime: time.Now().Truncate(time.Second),
        CTime: time.Now().Truncate(time.Second),
        Size:  2048,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }

    err := repos.Files.Create(ctx, nil, file1)
    if err != nil {
        t.Fatalf("failed to create file1: %v", err)
    }
    t.Logf("Created file1 with ID: %s", file1.ID)

    err = repos.Files.Create(ctx, nil, file2)
    if err != nil {
        t.Fatalf("failed to create file2: %v", err)
    }
    t.Logf("Created file2 with ID: %s", file2.ID)

    // Create a snapshot and reference only file2
    snapshot := &Snapshot{
        ID:        "test-snapshot",
        Hostname:  "test-host",
        StartedAt: time.Now(),
    }
    err = repos.Snapshots.Create(ctx, nil, snapshot)
    if err != nil {
        t.Fatalf("failed to create snapshot: %v", err)
    }
    t.Logf("Created snapshot: %s", snapshot.ID)

    // Check snapshot_files before adding
    var count int
    err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("snapshot_files count before add: %d", count)

    // Add file2 to snapshot
    err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
    if err != nil {
        t.Fatalf("failed to add file to snapshot: %v", err)
    }
    t.Logf("Added file2 to snapshot")

    // Check snapshot_files after adding
    err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("snapshot_files count after add: %d", count)

    // Check which files are referenced
    rows, err := db.conn.Query("SELECT file_id FROM snapshot_files")
    if err != nil {
        t.Fatal(err)
    }
    defer func() {
        if err := rows.Close(); err != nil {
            t.Logf("failed to close rows: %v", err)
        }
    }()
    t.Log("Files in snapshot_files:")
    for rows.Next() {
        var fileID string
        if err := rows.Scan(&fileID); err != nil {
            t.Fatal(err)
        }
        t.Logf("  - %s", fileID)
    }

    // Check files before cleanup
    err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("Files count before cleanup: %d", count)

    // Run orphaned cleanup
    err = repos.Files.DeleteOrphaned(ctx)
    if err != nil {
        t.Fatalf("failed to delete orphaned files: %v", err)
    }
    t.Log("Ran orphaned cleanup")

    // Check files after cleanup
    err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("Files count after cleanup: %d", count)

    // List remaining files
    files, err := repos.Files.ListByPrefix(ctx, "/")
    if err != nil {
        t.Fatal(err)
    }
    t.Log("Remaining files:")
    for _, f := range files {
        t.Logf("  - ID: %s, Path: %s", f.ID, f.Path)
    }

    // Check that orphaned file is gone
    orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
    if err != nil {
        t.Fatalf("error getting file: %v", err)
    }
    if orphanedFile != nil {
        t.Error("orphaned file should have been deleted")
        // Let's check why it wasn't deleted
        var exists bool
        err = db.conn.QueryRow(`
            SELECT EXISTS(
                SELECT 1 FROM snapshot_files
                WHERE file_id = ?
            )`, file1.ID).Scan(&exists)
        if err != nil {
            t.Fatal(err)
        }
        t.Logf("File1 exists in snapshot_files: %v", exists)
    } else {
        t.Log("Orphaned file was correctly deleted")
    }

    // Check that referenced file still exists
    referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
    if err != nil {
        t.Fatalf("error getting file: %v", err)
    }
    if referencedFile == nil {
        t.Error("referenced file should not have been deleted")
    } else {
        t.Log("Referenced file correctly remains")
    }
}
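The cleanup exercised above treats a file as orphaned when no `snapshot_files` row references it. The repository's actual SQL is not part of this diff; as a rough in-memory sketch of the assumed semantics (the `deleteOrphaned` helper here is hypothetical, standing in for a `DELETE FROM files WHERE id NOT IN (SELECT file_id FROM snapshot_files)` style query):

```go
package main

import "fmt"

// deleteOrphaned drops every file ID that no snapshot references,
// mirroring the assumed NOT IN subquery semantics of DeleteOrphaned.
func deleteOrphaned(files map[string]bool, referenced map[string]bool) {
	for id := range files {
		if !referenced[id] {
			delete(files, id)
		}
	}
}

func main() {
	files := map[string]bool{"orphaned": true, "referenced": true}
	refs := map[string]bool{"referenced": true}
	deleteOrphaned(files, refs)
	fmt.Println(len(files), files["referenced"]) // 1 true
}
```

This also explains why `file1` above survives only if something still references it in `snapshot_files`, which is exactly what the debug query at the end checks.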
internal/database/repository_edge_cases_test.go (new file, +543 lines)
package database

import (
    "context"
    "fmt"
    "strings"
    "testing"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/types"
)

// TestFileRepositoryEdgeCases tests edge cases for file repository
func TestFileRepositoryEdgeCases(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repo := NewFileRepository(db)

    tests := []struct {
        name    string
        file    *File
        wantErr bool
        errMsg  string
    }{
        {
            name: "empty path",
            file: &File{
                Path:  "",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false, // Empty strings are allowed, only NULL is not allowed
        },
        {
            name: "very long path",
            file: &File{
                Path:  types.FilePath("/" + strings.Repeat("a", 4096)),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "path with special characters",
            file: &File{
                Path:  "/test/file with spaces and 特殊文字.txt",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "zero size file",
            file: &File{
                Path:  "/empty.txt",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  0,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "symlink with target",
            file: &File{
                Path:       "/link",
                MTime:      time.Now(),
                CTime:      time.Now(),
                Size:       0,
                Mode:       0777 | 0120000, // symlink mode
                UID:        1000,
                GID:        1000,
                LinkTarget: "/target",
            },
            wantErr: false,
        },
    }

    for i, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Add a unique suffix to paths to avoid UNIQUE constraint violations
            if tt.file.Path != "" {
                tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
            }

            err := repo.Create(ctx, nil, tt.file)
            if (err != nil) != tt.wantErr {
                t.Errorf("Create() error = %v, wantErr %v", err, tt.wantErr)
            }
            if err != nil && tt.errMsg != "" && !strings.Contains(err.Error(), tt.errMsg) {
                t.Errorf("Create() error = %v, want error containing %q", err, tt.errMsg)
            }
        })
    }
}

// TestDuplicateHandling tests handling of duplicate entries
func TestDuplicateHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test duplicate file paths - Create uses UPSERT logic
    t.Run("duplicate file paths", func(t *testing.T) {
        file1 := &File{
            Path:  "/duplicate.txt",
            MTime: time.Now(),
            CTime: time.Now(),
            Size:  1024,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }
        file2 := &File{
            Path:  "/duplicate.txt", // Same path
            MTime: time.Now().Add(time.Hour),
            CTime: time.Now().Add(time.Hour),
            Size:  2048,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }

        err := repos.Files.Create(ctx, nil, file1)
        if err != nil {
            t.Fatalf("failed to create file1: %v", err)
        }
        originalID := file1.ID

        // Create with same path should update the existing record (UPSERT behavior)
        err = repos.Files.Create(ctx, nil, file2)
        if err != nil {
            t.Fatalf("failed to create file2: %v", err)
        }

        // Verify the file was updated, not duplicated
        retrievedFile, err := repos.Files.GetByPath(ctx, "/duplicate.txt")
        if err != nil {
            t.Fatalf("failed to retrieve file: %v", err)
        }

        // The file should have been updated with file2's data
        if retrievedFile.Size != 2048 {
            t.Errorf("expected size 2048, got %d", retrievedFile.Size)
        }

        // ID might be different due to the UPSERT
        if retrievedFile.ID != file2.ID {
            t.Logf("File ID changed from %s to %s during upsert", originalID, retrievedFile.ID)
        }
    })

    // Test duplicate chunk hashes
    t.Run("duplicate chunk hashes", func(t *testing.T) {
        chunk := &Chunk{
            ChunkHash: types.ChunkHash("duplicate-chunk"),
            Size:      1024,
        }

        err := repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Fatalf("failed to create chunk: %v", err)
        }

        // Creating the same chunk again should be idempotent (ON CONFLICT DO NOTHING)
        err = repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Errorf("duplicate chunk creation should be idempotent, got error: %v", err)
        }
    })

    // Test duplicate file-chunk mappings
    t.Run("duplicate file-chunk mappings", func(t *testing.T) {
        file := &File{
            Path:  "/test-dup-fc.txt",
            MTime: time.Now(),
            CTime: time.Now(),
            Size:  1024,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }
        err := repos.Files.Create(ctx, nil, file)
        if err != nil {
            t.Fatal(err)
        }

        chunk := &Chunk{
            ChunkHash: types.ChunkHash("test-chunk-dup"),
            Size:      1024,
        }
        err = repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Fatal(err)
        }

        fc := &FileChunk{
            FileID:    file.ID,
            Idx:       0,
            ChunkHash: chunk.ChunkHash,
        }

        err = repos.FileChunks.Create(ctx, nil, fc)
        if err != nil {
            t.Fatal(err)
        }

        // Creating the same mapping again should be idempotent
        err = repos.FileChunks.Create(ctx, nil, fc)
        if err != nil {
            t.Error("file-chunk creation should be idempotent")
        }
    })
}

// TestNullHandling tests handling of NULL values
func TestNullHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test file with no link target
    t.Run("file without link target", func(t *testing.T) {
        file := &File{
            Path:       "/regular.txt",
            MTime:      time.Now(),
            CTime:      time.Now(),
            Size:       1024,
            Mode:       0644,
            UID:        1000,
            GID:        1000,
            LinkTarget: "", // Should be stored as NULL
        }

        err := repos.Files.Create(ctx, nil, file)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Files.GetByID(ctx, file.ID)
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.LinkTarget != "" {
            t.Errorf("expected empty link target, got %q", retrieved.LinkTarget)
        }
    })

    // Test snapshot with NULL completed_at
    t.Run("incomplete snapshot", func(t *testing.T) {
        snapshot := &Snapshot{
            ID:          "incomplete-test",
            Hostname:    "test-host",
            StartedAt:   time.Now(),
            CompletedAt: nil, // Should remain NULL until completed
        }

        err := repos.Snapshots.Create(ctx, nil, snapshot)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.CompletedAt != nil {
            t.Error("expected nil CompletedAt for incomplete snapshot")
        }
    })

    // Test blob with NULL uploaded_ts
    t.Run("blob not uploaded", func(t *testing.T) {
        blob := &Blob{
            ID:         types.NewBlobID(),
            Hash:       types.BlobHash("test-hash"),
            CreatedTS:  time.Now(),
            UploadedTS: nil, // Not uploaded yet
        }

        err := repos.Blobs.Create(ctx, nil, blob)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.UploadedTS != nil {
            t.Error("expected nil UploadedTS for non-uploaded blob")
        }
    })
}

// TestLargeDatasets tests operations with large amounts of data
func TestLargeDatasets(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping large dataset test in short mode")
    }

    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create a snapshot
    snapshot := &Snapshot{
        ID:        "large-dataset-test",
        Hostname:  "test-host",
        StartedAt: time.Now(),
    }
    err := repos.Snapshots.Create(ctx, nil, snapshot)
    if err != nil {
        t.Fatal(err)
    }

    // Create many files
    const fileCount = 1000
    fileIDs := make([]types.FileID, fileCount)

    t.Run("create many files", func(t *testing.T) {
        start := time.Now()
        for i := 0; i < fileCount; i++ {
            file := &File{
                Path:  types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  int64(i * 1024),
                Mode:  0644,
                UID:   uint32(1000 + (i % 10)),
                GID:   uint32(1000 + (i % 10)),
            }
            err := repos.Files.Create(ctx, nil, file)
            if err != nil {
                t.Fatalf("failed to create file %d: %v", i, err)
            }
            fileIDs[i] = file.ID

            // Add half to snapshot
            if i%2 == 0 {
                err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
                if err != nil {
                    t.Fatal(err)
                }
            }
        }
        t.Logf("Created %d files in %v", fileCount, time.Since(start))
    })

    // Test ListByPrefix performance
    t.Run("list by prefix performance", func(t *testing.T) {
        start := time.Now()
        files, err := repos.Files.ListByPrefix(ctx, "/large/")
        if err != nil {
            t.Fatal(err)
        }
        if len(files) != fileCount {
            t.Errorf("expected %d files, got %d", fileCount, len(files))
        }
        t.Logf("Listed %d files in %v", len(files), time.Since(start))
    })

    // Test orphaned cleanup performance
    t.Run("orphaned cleanup performance", func(t *testing.T) {
        start := time.Now()
        err := repos.Files.DeleteOrphaned(ctx)
        if err != nil {
            t.Fatal(err)
        }
        t.Logf("Cleaned up orphaned files in %v", time.Since(start))

        // Verify correct number remain
        files, err := repos.Files.ListByPrefix(ctx, "/large/")
        if err != nil {
            t.Fatal(err)
        }
        if len(files) != fileCount/2 {
            t.Errorf("expected %d files after cleanup, got %d", fileCount/2, len(files))
        }
    })
}

// TestErrorPropagation tests that errors are properly propagated
func TestErrorPropagation(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test GetByID with non-existent ID
    t.Run("GetByID non-existent", func(t *testing.T) {
        file, err := repos.Files.GetByID(ctx, types.NewFileID())
        if err != nil {
            t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
        }
        if file != nil {
            t.Error("expected nil file for non-existent ID")
        }
    })

    // Test GetByPath with non-existent path
    t.Run("GetByPath non-existent", func(t *testing.T) {
        file, err := repos.Files.GetByPath(ctx, "/non/existent/path.txt")
        if err != nil {
            t.Errorf("GetByPath should not return error for non-existent path, got: %v", err)
        }
        if file != nil {
            t.Error("expected nil file for non-existent path")
        }
    })

    // Test invalid foreign key reference
    t.Run("invalid foreign key", func(t *testing.T) {
        fc := &FileChunk{
            FileID:    types.NewFileID(),
            Idx:       0,
            ChunkHash: types.ChunkHash("some-chunk"),
        }
        err := repos.FileChunks.Create(ctx, nil, fc)
        if err == nil {
            // Fatal, not Error: err.Error() below would panic on a nil error
            t.Fatal("expected error for invalid foreign key")
        }
        if !strings.Contains(err.Error(), "FOREIGN KEY") {
            t.Errorf("expected foreign key error, got: %v", err)
        }
    })
}

// TestQueryInjection tests that the system is safe from SQL injection
func TestQueryInjection(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test various injection attempts
    injectionTests := []string{
        "'; DROP TABLE files; --",
        "' OR '1'='1",
        "'; DELETE FROM files WHERE '1'='1'; --",
        `test'); DROP TABLE files; --`,
    }

    for _, injection := range injectionTests {
        t.Run("injection attempt", func(t *testing.T) {
            // Try injection in file path
            file := &File{
                Path:  types.FilePath(injection),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            }
            _ = repos.Files.Create(ctx, nil, file)
            // Should either succeed (treating as normal string) or fail with constraint
            // but should NOT execute the injected SQL

            // Verify tables still exist
            var count int
            err := db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
            if err != nil {
                t.Fatal("files table was damaged by injection")
            }
        })
    }
}

// TestTimezoneHandling tests that times are properly handled in UTC
func TestTimezoneHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create file with specific timezone
    loc, err := time.LoadLocation("America/New_York")
    if err != nil {
        t.Skip("timezone not available")
    }

    // Use Truncate to remove sub-second precision since we store as Unix timestamps
    nyTime := time.Now().In(loc).Truncate(time.Second)
    file := &File{
        Path:  "/timezone-test.txt",
        MTime: nyTime,
        CTime: nyTime,
        Size:  1024,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }

    err = repos.Files.Create(ctx, nil, file)
    if err != nil {
        t.Fatal(err)
    }

    // Retrieve and verify times are in UTC
    retrieved, err := repos.Files.GetByID(ctx, file.ID)
    if err != nil {
        t.Fatal(err)
    }

    // Check that times are equivalent (same instant)
    if !retrieved.MTime.Equal(nyTime) {
        t.Error("time was not preserved correctly")
    }

    // Check that retrieved time is in UTC
    if retrieved.MTime.Location() != time.UTC {
        t.Error("retrieved time is not in UTC")
    }
}
137 internal/database/schema.sql Normal file
@@ -0,0 +1,137 @@
-- Vaultik Database Schema
-- Note: This database does not support migrations. If the schema changes,
-- delete the local database and perform a full backup to recreate it.

-- Files table: stores metadata about files in the filesystem
CREATE TABLE IF NOT EXISTS files (
    id TEXT PRIMARY KEY, -- UUID
    path TEXT NOT NULL UNIQUE,
    source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    link_target TEXT
);

-- Create index on path for efficient lookups
CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);

-- File chunks table: maps files to their constituent chunks
CREATE TABLE IF NOT EXISTS file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);

-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

-- Blobs table: stores packed, compressed, and encrypted blob information
CREATE TABLE IF NOT EXISTS blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT UNIQUE,
    created_ts INTEGER NOT NULL,
    finished_ts INTEGER,
    uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compressed_size INTEGER NOT NULL DEFAULT 0,
    uploaded_ts INTEGER
);

-- Blob chunks table: maps chunks to the blobs that contain them
CREATE TABLE IF NOT EXISTS blob_chunks (
    blob_id TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_id, chunk_hash),
    FOREIGN KEY (blob_id) REFERENCES blobs(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);

-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id),
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);

-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    vaultik_git_revision TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL DEFAULT 0,
    chunk_count INTEGER NOT NULL DEFAULT 0,
    blob_count INTEGER NOT NULL DEFAULT 0,
    total_size INTEGER NOT NULL DEFAULT 0,
    blob_size INTEGER NOT NULL DEFAULT 0,
    blob_uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compression_ratio REAL NOT NULL DEFAULT 1.0,
    compression_level INTEGER NOT NULL DEFAULT 3,
    upload_bytes INTEGER NOT NULL DEFAULT 0,
    upload_duration_ms INTEGER NOT NULL DEFAULT 0
);

-- Snapshot files table: maps snapshots to files
CREATE TABLE IF NOT EXISTS snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (file_id) REFERENCES files(id)
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);

-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (blob_id) REFERENCES blobs(id)
);

-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);

-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    snapshot_id TEXT NOT NULL,
    uploaded_at INTEGER NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);

-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
11 internal/database/schema/008_uploads.sql Normal file
@@ -0,0 +1,11 @@
-- Track blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    uploaded_at TIMESTAMP NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash)
);

CREATE INDEX idx_uploads_uploaded_at ON uploads(uploaded_at);
CREATE INDEX idx_uploads_duration ON uploads(duration_ms);
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 type SnapshotRepository struct {
@@ -17,17 +19,27 @@ func NewSnapshotRepository(db *DB) *SnapshotRepository {

 func (r *SnapshotRepository) Create(ctx context.Context, tx *sql.Tx, snapshot *Snapshot) error {
 	query := `
-		INSERT INTO snapshots (id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio)
-		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+		INSERT INTO snapshots (id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms)
+		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
 	`
+
+	var completedAt *int64
+	if snapshot.CompletedAt != nil {
+		ts := snapshot.CompletedAt.Unix()
+		completedAt = &ts
+	}

 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = r.db.ExecWithLog(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	}

 	if err != nil {
@@ -58,7 +70,7 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
+		_, err = r.db.ExecWithLog(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	}

 	if err != nil {
@@ -68,27 +80,83 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	return nil
 }

+// UpdateExtendedStats updates extended statistics for a snapshot
+func (r *SnapshotRepository) UpdateExtendedStats(ctx context.Context, tx *sql.Tx, snapshotID string, blobUncompressedSize int64, compressionLevel int, uploadDurationMs int64) error {
+	// Calculate compression ratio based on uncompressed vs compressed sizes
+	var compressionRatio float64
+	if blobUncompressedSize > 0 {
+		// Get current blob_size from DB to calculate ratio
+		var blobSize int64
+		queryGet := `SELECT blob_size FROM snapshots WHERE id = ?`
+		if tx != nil {
+			err := tx.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		} else {
+			err := r.db.conn.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		}
+		compressionRatio = float64(blobSize) / float64(blobUncompressedSize)
+	} else {
+		compressionRatio = 1.0
+	}
+
+	query := `
+		UPDATE snapshots
+		SET blob_uncompressed_size = ?,
+		    compression_ratio = ?,
+		    compression_level = ?,
+		    upload_bytes = blob_size,
+		    upload_duration_ms = ?
+		WHERE id = ?
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating extended stats: %w", err)
+	}
+	return nil
+}

 func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms
 		FROM snapshots
 		WHERE id = ?
 	`

 	var snapshot Snapshot
-	var createdTSUnix int64
+	var startedAtUnix int64
+	var completedAtUnix *int64

 	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(
 		&snapshot.ID,
 		&snapshot.Hostname,
 		&snapshot.VaultikVersion,
-		&createdTSUnix,
+		&snapshot.VaultikGitRevision,
+		&startedAtUnix,
+		&completedAtUnix,
 		&snapshot.FileCount,
 		&snapshot.ChunkCount,
 		&snapshot.BlobCount,
 		&snapshot.TotalSize,
 		&snapshot.BlobSize,
+		&snapshot.BlobUncompressedSize,
 		&snapshot.CompressionRatio,
+		&snapshot.CompressionLevel,
+		&snapshot.UploadBytes,
+		&snapshot.UploadDurationMs,
 	)

 	if err == sql.ErrNoRows {
@@ -98,16 +166,20 @@ func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*S
 		return nil, fmt.Errorf("querying snapshot: %w", err)
 	}

-	snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+	snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
+	if completedAtUnix != nil {
+		t := time.Unix(*completedAtUnix, 0).UTC()
+		snapshot.CompletedAt = &t
+	}

 	return &snapshot, nil
 }

 func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
 		FROM snapshots
-		ORDER BY created_ts DESC
+		ORDER BY started_at DESC
 		LIMIT ?
 	`
@@ -120,13 +192,16 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 	var snapshots []*Snapshot
 	for rows.Next() {
 		var snapshot Snapshot
-		var createdTSUnix int64
+		var startedAtUnix int64
+		var completedAtUnix *int64

 		err := rows.Scan(
 			&snapshot.ID,
 			&snapshot.Hostname,
 			&snapshot.VaultikVersion,
-			&createdTSUnix,
+			&snapshot.VaultikGitRevision,
+			&startedAtUnix,
+			&completedAtUnix,
 			&snapshot.FileCount,
 			&snapshot.ChunkCount,
 			&snapshot.BlobCount,
@@ -138,10 +213,336 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 			return nil, fmt.Errorf("scanning snapshot: %w", err)
 		}

-		snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+		snapshot.StartedAt = time.Unix(startedAtUnix, 0)
+		if completedAtUnix != nil {
+			t := time.Unix(*completedAtUnix, 0)
+			snapshot.CompletedAt = &t
+		}

 		snapshots = append(snapshots, &snapshot)
 	}

 	return snapshots, rows.Err()
 }

// MarkComplete marks a snapshot as completed with the current timestamp
func (r *SnapshotRepository) MarkComplete(ctx context.Context, tx *sql.Tx, snapshotID string) error {
	query := `
		UPDATE snapshots
		SET completed_at = ?
		WHERE id = ?
	`

	completedAt := time.Now().UTC().Unix()

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, completedAt, snapshotID)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, completedAt, snapshotID)
	}

	if err != nil {
		return fmt.Errorf("marking snapshot complete: %w", err)
	}

	return nil
}

// AddFile adds a file to a snapshot by path
func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID string, filePath string) error {
	query := `
		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
		SELECT ?, id FROM files WHERE path = ?
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, filePath)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, filePath)
	}

	if err != nil {
		return fmt.Errorf("adding file to snapshot: %w", err)
	}

	return nil
}

// AddFileByID adds a file to a snapshot by file ID
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
	query := `
		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
		VALUES (?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("adding file to snapshot: %w", err)
	}

	return nil
}

// AddFilesByIDBatch adds multiple files to a snapshot using batched multi-row inserts
func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Each entry binds 2 values, so batch at 400 rows (800 parameters) to stay
	// safely under SQLite's bound-parameter limit
	const batchSize = 400

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
		args := make([]interface{}, 0, len(batch)*2)
		for j, fileID := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?)"
			args = append(args, snapshotID, fileID.String())
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch adding files to snapshot: %w", err)
		}
	}

	return nil
}

// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
	query := `
		INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
		VALUES (?, ?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
	}

	if err != nil {
		return fmt.Errorf("adding blob to snapshot: %w", err)
	}

	return nil
}

// GetBlobHashes returns all blob hashes for a snapshot
func (r *SnapshotRepository) GetBlobHashes(ctx context.Context, snapshotID string) ([]string, error) {
	query := `
		SELECT sb.blob_hash
		FROM snapshot_blobs sb
		WHERE sb.snapshot_id = ?
		ORDER BY sb.blob_hash
	`

	rows, err := r.db.conn.QueryContext(ctx, query, snapshotID)
	if err != nil {
		return nil, fmt.Errorf("querying blob hashes: %w", err)
	}
	defer CloseRows(rows)

	var blobs []string
	for rows.Next() {
		var blobHash string
		if err := rows.Scan(&blobHash); err != nil {
			return nil, fmt.Errorf("scanning blob hash: %w", err)
		}
		blobs = append(blobs, blobHash)
	}

	return blobs, rows.Err()
}

// GetSnapshotTotalCompressedSize returns the total compressed size of all blobs referenced by a snapshot
func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context, snapshotID string) (int64, error) {
	query := `
		SELECT COALESCE(SUM(b.compressed_size), 0)
		FROM snapshot_blobs sb
		JOIN blobs b ON sb.blob_hash = b.blob_hash
		WHERE sb.snapshot_id = ?
	`

	var totalSize int64
	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
	if err != nil {
		return 0, fmt.Errorf("querying total compressed size: %w", err)
	}

	return totalSize, nil
}

// GetIncompleteSnapshots returns all snapshots that haven't been completed
func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
	query := `
		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
		FROM snapshots
		WHERE completed_at IS NULL
		ORDER BY started_at DESC
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
	}
	defer CloseRows(rows)

	var snapshots []*Snapshot
	for rows.Next() {
		var snapshot Snapshot
		var startedAtUnix int64
		var completedAtUnix *int64

		err := rows.Scan(
			&snapshot.ID,
			&snapshot.Hostname,
			&snapshot.VaultikVersion,
			&snapshot.VaultikGitRevision,
			&startedAtUnix,
			&completedAtUnix,
			&snapshot.FileCount,
			&snapshot.ChunkCount,
			&snapshot.BlobCount,
			&snapshot.TotalSize,
			&snapshot.BlobSize,
			&snapshot.CompressionRatio,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning snapshot: %w", err)
		}

		// Normalize to UTC, matching GetIncompleteByHostname below
		snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
		if completedAtUnix != nil {
			t := time.Unix(*completedAtUnix, 0).UTC()
			snapshot.CompletedAt = &t
		}

		snapshots = append(snapshots, &snapshot)
	}

	return snapshots, rows.Err()
}

// GetIncompleteByHostname returns all incomplete snapshots for a specific hostname
func (r *SnapshotRepository) GetIncompleteByHostname(ctx context.Context, hostname string) ([]*Snapshot, error) {
	query := `
		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
		FROM snapshots
		WHERE completed_at IS NULL AND hostname = ?
		ORDER BY started_at DESC
	`

	rows, err := r.db.conn.QueryContext(ctx, query, hostname)
	if err != nil {
		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
	}
	defer CloseRows(rows)

	var snapshots []*Snapshot
	for rows.Next() {
		var snapshot Snapshot
		var startedAtUnix int64
		var completedAtUnix *int64

		err := rows.Scan(
			&snapshot.ID,
			&snapshot.Hostname,
			&snapshot.VaultikVersion,
			&snapshot.VaultikGitRevision,
			&startedAtUnix,
			&completedAtUnix,
			&snapshot.FileCount,
			&snapshot.ChunkCount,
			&snapshot.BlobCount,
			&snapshot.TotalSize,
			&snapshot.BlobSize,
			&snapshot.CompressionRatio,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning snapshot: %w", err)
		}

		snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
		if completedAtUnix != nil {
			t := time.Unix(*completedAtUnix, 0).UTC()
			snapshot.CompletedAt = &t
		}

		snapshots = append(snapshots, &snapshot)
	}

	return snapshots, rows.Err()
}

// Delete removes a snapshot record
func (r *SnapshotRepository) Delete(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshots WHERE id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot: %w", err)
	}

	return nil
}

// DeleteSnapshotFiles removes all snapshot_files entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotFiles(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshot_files WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot files: %w", err)
	}

	return nil
}

// DeleteSnapshotBlobs removes all snapshot_blobs entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotBlobs(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshot_blobs WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot blobs: %w", err)
	}

	return nil
}

// DeleteSnapshotUploads removes all uploads entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotUploads(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM uploads WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot uploads: %w", err)
	}

	return nil
}

@@ -6,6 +6,8 @@ import (
 	"math"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 const (
@@ -30,7 +32,8 @@ func TestSnapshotRepository(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
@@ -45,7 +48,7 @@ func TestSnapshotRepository(t *testing.T) {
 	}

 	// Test GetByID
-	retrieved, err := repo.GetByID(ctx, snapshot.ID)
+	retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get snapshot: %v", err)
 	}
@@ -63,12 +66,12 @@ func TestSnapshotRepository(t *testing.T) {
 	}

 	// Test UpdateCounts
-	err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
+	err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
 	if err != nil {
 		t.Fatalf("failed to update counts: %v", err)
 	}

-	retrieved, err = repo.GetByID(ctx, snapshot.ID)
+	retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get updated snapshot: %v", err)
 	}
@@ -96,10 +99,11 @@ func TestSnapshotRepository(t *testing.T) {
 	// Add more snapshots
 	for i := 2; i <= 5; i++ {
 		s := &Snapshot{
-			ID:             fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
+			ID:             types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
 			Hostname:       "test-host",
 			VaultikVersion: "1.0.0",
-			CreatedTS:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			StartedAt:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			CompletedAt:    nil,
 			FileCount:      int64(100 * i),
 			ChunkCount:     int64(500 * i),
 			BlobCount:      int64(10 * i),
@@ -121,7 +125,7 @@ func TestSnapshotRepository(t *testing.T) {

 	// Verify order (most recent first)
 	for i := 0; i < len(recent)-1; i++ {
-		if recent[i].CreatedTS.Before(recent[i+1].CreatedTS) {
+		if recent[i].StartedAt.Before(recent[i+1].StartedAt) {
 			t.Error("snapshots not in descending order")
 		}
 	}
@@ -162,7 +166,8 @@ func TestSnapshotRepositoryDuplicate(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
147 internal/database/uploads.go Normal file
@@ -0,0 +1,147 @@
package database

import (
	"context"
	"database/sql"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/log"
)

// Upload represents a blob upload record
type Upload struct {
	BlobHash   string
	SnapshotID string
	UploadedAt time.Time
	Size       int64
	DurationMs int64
}

// UploadRepository handles upload records
type UploadRepository struct {
	conn *sql.DB
}

// NewUploadRepository creates a new upload repository
func NewUploadRepository(conn *sql.DB) *UploadRepository {
	return &UploadRepository{conn: conn}
}

// Create inserts a new upload record
func (r *UploadRepository) Create(ctx context.Context, tx *sql.Tx, upload *Upload) error {
	query := `
		INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms)
		VALUES (?, ?, ?, ?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	} else {
		_, err = r.conn.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	}

	return err
}

// GetByBlobHash retrieves an upload record by blob hash
func (r *UploadRepository) GetByBlobHash(ctx context.Context, blobHash string) (*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		WHERE blob_hash = ?
	`

	var upload Upload
	err := r.conn.QueryRowContext(ctx, query, blobHash).Scan(
		&upload.BlobHash,
		&upload.UploadedAt,
		&upload.Size,
		&upload.DurationMs,
	)

	if err == sql.ErrNoRows {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}

	return &upload, nil
}

// GetRecentUploads retrieves recent uploads ordered by upload time
func (r *UploadRepository) GetRecentUploads(ctx context.Context, limit int) ([]*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		ORDER BY uploaded_at DESC
		LIMIT ?
	`

	rows, err := r.conn.QueryContext(ctx, query, limit)
	if err != nil {
		return nil, err
	}
	defer func() {
		if err := rows.Close(); err != nil {
			log.Error("failed to close rows", "error", err)
		}
	}()

	var uploads []*Upload
	for rows.Next() {
		var upload Upload
		if err := rows.Scan(&upload.BlobHash, &upload.UploadedAt, &upload.Size, &upload.DurationMs); err != nil {
			return nil, err
		}
		uploads = append(uploads, &upload)
	}

	return uploads, rows.Err()
}

// GetUploadStats returns aggregate statistics for uploads
func (r *UploadRepository) GetUploadStats(ctx context.Context, since time.Time) (*UploadStats, error) {
	query := `
		SELECT
			COUNT(*) as count,
			COALESCE(SUM(size), 0) as total_size,
			COALESCE(AVG(duration_ms), 0) as avg_duration_ms,
			COALESCE(MIN(duration_ms), 0) as min_duration_ms,
			COALESCE(MAX(duration_ms), 0) as max_duration_ms
		FROM uploads
		WHERE uploaded_at >= ?
	`

	var stats UploadStats
	err := r.conn.QueryRowContext(ctx, query, since).Scan(
		&stats.Count,
		&stats.TotalSize,
		&stats.AvgDurationMs,
		&stats.MinDurationMs,
		&stats.MaxDurationMs,
	)

	return &stats, err
}

// UploadStats contains aggregate upload statistics
type UploadStats struct {
	Count         int64
	TotalSize     int64
	AvgDurationMs float64
	MinDurationMs int64
	MaxDurationMs int64
}

// GetCountBySnapshot returns the count of uploads for a specific snapshot
func (r *UploadRepository) GetCountBySnapshot(ctx context.Context, snapshotID string) (int64, error) {
|
||||
query := `SELECT COUNT(*) FROM uploads WHERE snapshot_id = ?`
|
||||
var count int64
|
||||
err := r.conn.QueryRowContext(ctx, query, snapshotID).Scan(&count)
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
return count, nil
|
||||
}
|
||||
@@ -4,13 +4,16 @@ import (
	"time"
)

// these get populated from main() and copied into the Globals object.
var (
	Appname string = "vaultik"
	Version string = "dev"
	Commit  string = "unknown"
)

// Appname is the application name, populated from main().
var Appname string = "vaultik"

// Version is the application version, populated from main().
var Version string = "dev"

// Commit is the git commit hash, populated from main().
var Commit string = "unknown"

// Globals contains application-wide configuration and metadata.
type Globals struct {
	Appname string
	Version string
@@ -18,13 +21,11 @@ type Globals struct {
	StartTime time.Time
}

// New creates and returns a new Globals instance initialized with the package-level variables.
func New() (*Globals, error) {
	n := &Globals{
	return &Globals{
		Appname:   Appname,
		Version:   Version,
		Commit:    Commit,
		StartTime: time.Now(),
	}

	return n, nil
	}, nil
}
@@ -2,16 +2,15 @@ package globals

import (
	"testing"

	"go.uber.org/fx"
	"go.uber.org/fx/fxtest"
)

// TestGlobalsNew ensures the globals package initializes correctly
func TestGlobalsNew(t *testing.T) {
	app := fxtest.New(t,
		fx.Provide(New),
		fx.Invoke(func(g *Globals) {
	g, err := New()
	if err != nil {
		t.Fatalf("Failed to create Globals: %v", err)
	}

	if g == nil {
		t.Fatal("Globals instance is nil")
	}
@@ -28,9 +27,4 @@ func TestGlobalsNew(t *testing.T) {
	if g.Commit == "" {
		t.Error("Commit should not be empty")
	}
		}),
	)

	app.RequireStart()
	app.RequireStop()
}
182
internal/log/log.go
Normal file
@@ -0,0 +1,182 @@
package log

import (
	"context"
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
	"runtime"
	"strings"

	"golang.org/x/term"
)

// LogLevel represents the logging level.
type LogLevel int

const (
	// LevelFatal represents a fatal error level that will exit the program.
	LevelFatal LogLevel = iota
	// LevelError represents an error level.
	LevelError
	// LevelWarn represents a warning level.
	LevelWarn
	// LevelNotice represents a notice level (mapped to Info in slog).
	LevelNotice
	// LevelInfo represents an informational level.
	LevelInfo
	// LevelDebug represents a debug level.
	LevelDebug
)

// Config holds logger configuration.
type Config struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}

var logger *slog.Logger

// Initialize sets up the global logger based on the provided configuration.
func Initialize(cfg Config) {
	// Determine log level based on configuration
	var level slog.Level

	if cfg.Cron || cfg.Quiet {
		// In quiet/cron mode, only show errors
		level = slog.LevelError
	} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
		level = slog.LevelDebug
	} else if cfg.Verbose {
		level = slog.LevelInfo
	} else {
		level = slog.LevelWarn
	}

	// Create handler with appropriate level
	opts := &slog.HandlerOptions{
		Level: level,
	}

	// Check if stdout is a TTY
	if term.IsTerminal(int(os.Stdout.Fd())) {
		// Use colorized TTY handler
		logger = slog.New(NewTTYHandler(os.Stdout, opts))
	} else {
		// Use JSON format for non-TTY output
		logger = slog.New(slog.NewJSONHandler(os.Stdout, opts))
	}

	// Set as default logger
	slog.SetDefault(logger)
}

// getCaller returns the caller information as a "file:line" string.
func getCaller(skip int) string {
	_, file, line, ok := runtime.Caller(skip)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// Fatal logs a fatal error message and exits the program with code 1.
func Fatal(msg string, args ...any) {
	if logger != nil {
		// Add caller info to args
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
	os.Exit(1)
}

// Fatalf logs a formatted fatal error message and exits the program with code 1.
func Fatalf(format string, args ...any) {
	Fatal(fmt.Sprintf(format, args...))
}

// Error logs an error message.
func Error(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
}

// Errorf logs a formatted error message.
func Errorf(format string, args ...any) {
	Error(fmt.Sprintf(format, args...))
}

// Warn logs a warning message.
func Warn(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Warn(msg, args...)
	}
}

// Warnf logs a formatted warning message.
func Warnf(format string, args ...any) {
	Warn(fmt.Sprintf(format, args...))
}

// Notice logs a notice message (mapped to Info level).
func Notice(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Noticef logs a formatted notice message.
func Noticef(format string, args ...any) {
	Notice(fmt.Sprintf(format, args...))
}

// Info logs an informational message.
func Info(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Infof logs a formatted informational message.
func Infof(format string, args ...any) {
	Info(fmt.Sprintf(format, args...))
}

// Debug logs a debug message.
func Debug(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Debug(msg, args...)
	}
}

// Debugf logs a formatted debug message.
func Debugf(format string, args ...any) {
	Debug(fmt.Sprintf(format, args...))
}

// With returns a logger with additional context attributes.
func With(args ...any) *slog.Logger {
	if logger != nil {
		return logger.With(args...)
	}
	return slog.Default()
}

// WithContext returns the global logger. The context is currently unused.
func WithContext(_ context.Context) *slog.Logger {
	return logger
}

// Logger returns the underlying slog.Logger instance.
func Logger() *slog.Logger {
	return logger
}
25
internal/log/module.go
Normal file
@@ -0,0 +1,25 @@
package log

import (
	"go.uber.org/fx"
)

// Module exports logging functionality for dependency injection.
var Module = fx.Module("log",
	fx.Invoke(func(cfg Config) {
		Initialize(cfg)
	}),
)

// New creates a new logger configuration from provided options.
func New(opts LogOptions) Config {
	return Config(opts)
}

// LogOptions are provided by the CLI.
type LogOptions struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}
140
internal/log/tty_handler.go
Normal file
@@ -0,0 +1,140 @@
package log

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"sync"
	"time"
)

// ANSI color codes
const (
	colorReset  = "\033[0m"
	colorRed    = "\033[31m"
	colorYellow = "\033[33m"
	colorBlue   = "\033[34m"
	colorGray   = "\033[90m"
	colorGreen  = "\033[32m"
	colorCyan   = "\033[36m"
	colorBold   = "\033[1m"
)

// TTYHandler is a custom slog handler for TTY output with colors.
type TTYHandler struct {
	opts slog.HandlerOptions
	mu   sync.Mutex
	out  io.Writer
}

// NewTTYHandler creates a new TTY handler with colored output.
func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
	if opts == nil {
		opts = &slog.HandlerOptions{}
	}
	return &TTYHandler{
		out:  out,
		opts: *opts,
	}
}

// Enabled reports whether the handler handles records at the given level.
func (h *TTYHandler) Enabled(_ context.Context, level slog.Level) bool {
	// opts.Level may be nil when the zero-value HandlerOptions is used;
	// fall back to slog's default of Info instead of dereferencing nil.
	minLevel := slog.LevelInfo
	if h.opts.Level != nil {
		minLevel = h.opts.Level.Level()
	}
	return level >= minLevel
}

// Handle writes the log record to the output with color formatting.
func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
	h.mu.Lock()
	defer h.mu.Unlock()

	// Format timestamp
	timestamp := r.Time.Format("15:04:05")

	// Level and color
	level := r.Level.String()
	var levelColor string
	switch r.Level {
	case slog.LevelDebug:
		levelColor = colorGray
		level = "DEBUG"
	case slog.LevelInfo:
		levelColor = colorGreen
		level = "INFO "
	case slog.LevelWarn:
		levelColor = colorYellow
		level = "WARN "
	case slog.LevelError:
		levelColor = colorRed
		level = "ERROR"
	default:
		levelColor = colorReset
	}

	// Print main message
	_, _ = fmt.Fprintf(h.out, "%s%s%s %s%s%s %s%s%s",
		colorGray, timestamp, colorReset,
		levelColor, level, colorReset,
		colorBold, r.Message, colorReset)

	// Print attributes
	r.Attrs(func(a slog.Attr) bool {
		value := a.Value.String()
		// Special handling for certain attribute types
		switch a.Value.Kind() {
		case slog.KindDuration:
			if d, ok := a.Value.Any().(time.Duration); ok {
				value = formatDuration(d)
			}
		case slog.KindInt64:
			if a.Key == "bytes" {
				value = formatBytes(a.Value.Int64())
			}
		}

		_, _ = fmt.Fprintf(h.out, " %s%s%s=%s%s%s",
			colorCyan, a.Key, colorReset,
			colorBlue, value, colorReset)
		return true
	})

	_, _ = fmt.Fprintln(h.out)
	return nil
}

// WithAttrs returns a new handler with the given attributes.
// Simplified for now: attributes are ignored and the same handler is returned.
func (h *TTYHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return h
}

// WithGroup returns a new handler with the given group name.
// Simplified for now: the group is ignored and the same handler is returned.
func (h *TTYHandler) WithGroup(name string) slog.Handler {
	return h
}

// formatDuration formats a duration in a human-readable way.
func formatDuration(d time.Duration) string {
	if d < time.Millisecond {
		return fmt.Sprintf("%dµs", d.Microseconds())
	} else if d < time.Second {
		return fmt.Sprintf("%dms", d.Milliseconds())
	} else if d < time.Minute {
		return fmt.Sprintf("%.1fs", d.Seconds())
	}
	return d.String()
}

// formatBytes formats bytes in a human-readable way using binary (1024-based) units.
func formatBytes(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}
108
internal/pidlock/pidlock.go
Normal file
@@ -0,0 +1,108 @@
// Package pidlock provides process-level locking using PID files.
// It prevents multiple instances of vaultik from running simultaneously,
// which would cause database locking conflicts.
package pidlock

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// ErrAlreadyRunning indicates another vaultik instance is running.
var ErrAlreadyRunning = errors.New("another vaultik instance is already running")

// Lock represents an acquired PID lock.
type Lock struct {
	path string
}

// Acquire attempts to acquire a PID lock in the specified directory.
// If the lock file exists and the process is still running, it returns
// ErrAlreadyRunning with details about the existing process.
// On success, it writes the current PID to the lock file and returns
// a Lock that must be released with Release().
func Acquire(lockDir string) (*Lock, error) {
	// Ensure lock directory exists
	if err := os.MkdirAll(lockDir, 0700); err != nil {
		return nil, fmt.Errorf("creating lock directory: %w", err)
	}

	lockPath := filepath.Join(lockDir, "vaultik.pid")

	// Check for existing lock
	existingPID, err := readPIDFile(lockPath)
	if err == nil {
		// Lock file exists, check if process is running
		if isProcessRunning(existingPID) {
			return nil, fmt.Errorf("%w (PID %d)", ErrAlreadyRunning, existingPID)
		}
		// Process is not running, stale lock file - we can take over
	}

	// Write our PID
	pid := os.Getpid()
	if err := os.WriteFile(lockPath, []byte(strconv.Itoa(pid)), 0600); err != nil {
		return nil, fmt.Errorf("writing PID file: %w", err)
	}

	return &Lock{path: lockPath}, nil
}

// Release removes the PID lock file.
// It is safe to call Release multiple times.
func (l *Lock) Release() error {
	if l == nil || l.path == "" {
		return nil
	}

	// Verify we still own the lock (our PID is in the file)
	existingPID, err := readPIDFile(l.path)
	if err != nil {
		// File already gone or unreadable - that's fine
		return nil
	}

	if existingPID != os.Getpid() {
		// Someone else wrote to our lock file - don't remove it
		return nil
	}

	if err := os.Remove(l.path); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("removing PID file: %w", err)
	}

	l.path = "" // Prevent double-release
	return nil
}

// readPIDFile reads and parses the PID from a lock file.
func readPIDFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}

	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("parsing PID: %w", err)
	}

	return pid, nil
}

// isProcessRunning checks if a process with the given PID is running.
func isProcessRunning(pid int) bool {
	process, err := os.FindProcess(pid)
	if err != nil {
		return false
	}

	// On Unix, FindProcess always succeeds. We need to send signal 0 to check.
	err = process.Signal(syscall.Signal(0))
	return err == nil
}
108
internal/pidlock/pidlock_test.go
Normal file
@@ -0,0 +1,108 @@
package pidlock

import (
	"os"
	"path/filepath"
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestAcquireAndRelease(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire lock
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)

	// Verify PID file exists with our PID
	data, err := os.ReadFile(filepath.Join(tmpDir, "vaultik.pid"))
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)

	// Release lock
	err = lock.Release()
	require.NoError(t, err)

	// Verify PID file is gone
	_, err = os.Stat(filepath.Join(tmpDir, "vaultik.pid"))
	assert.True(t, os.IsNotExist(err))
}

func TestAcquireBlocksSecondInstance(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire first lock
	lock1, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock1)
	defer func() { _ = lock1.Release() }()

	// Try to acquire second lock - should fail
	lock2, err := Acquire(tmpDir)
	assert.ErrorIs(t, err, ErrAlreadyRunning)
	assert.Nil(t, lock2)
}

func TestAcquireWithStaleLock(t *testing.T) {
	tmpDir := t.TempDir()

	// Write a stale PID file (PID that doesn't exist)
	stalePID := 999999999 // Unlikely to be a real process
	pidPath := filepath.Join(tmpDir, "vaultik.pid")
	err := os.WriteFile(pidPath, []byte(strconv.Itoa(stalePID)), 0600)
	require.NoError(t, err)

	// Should be able to acquire lock (stale lock is cleaned up)
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify our PID is now in the file
	data, err := os.ReadFile(pidPath)
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)
}

func TestReleaseIsIdempotent(t *testing.T) {
	tmpDir := t.TempDir()

	lock, err := Acquire(tmpDir)
	require.NoError(t, err)

	// Release multiple times - should not error
	err = lock.Release()
	require.NoError(t, err)

	err = lock.Release()
	require.NoError(t, err)
}

func TestReleaseNilLock(t *testing.T) {
	var lock *Lock
	err := lock.Release()
	assert.NoError(t, err)
}

func TestAcquireCreatesDirectory(t *testing.T) {
	tmpDir := t.TempDir()
	nestedDir := filepath.Join(tmpDir, "nested", "dir")

	lock, err := Acquire(nestedDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify directory was created
	info, err := os.Stat(nestedDir)
	require.NoError(t, err)
	assert.True(t, info.IsDir())
}
334
internal/s3/client.go
Normal file
@@ -0,0 +1,334 @@
package s3
|
||||
|
||||
import (
|
||||
"context"
|
||||
"io"
|
||||
"sync/atomic"
|
||||
|
||||
"github.com/aws/aws-sdk-go-v2/aws"
|
||||
"github.com/aws/aws-sdk-go-v2/config"
|
||||
"github.com/aws/aws-sdk-go-v2/credentials"
|
||||
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
|
||||
"github.com/aws/aws-sdk-go-v2/service/s3"
|
||||
"github.com/aws/smithy-go/logging"
|
||||
)
|
||||
|
||||
// Client wraps the AWS S3 client for vaultik operations.
|
||||
// It provides a simplified interface for S3 operations with automatic
|
||||
// prefix handling and connection management. All operations are performed
|
||||
// within the configured bucket and prefix.
|
||||
type Client struct {
|
||||
s3Client *s3.Client
|
||||
bucket string
|
||||
prefix string
|
||||
endpoint string
|
||||
}
|
||||
|
||||
// Config contains S3 client configuration.
|
||||
// All fields are required except Prefix, which defaults to an empty string.
|
||||
// The Endpoint field should include the protocol (http:// or https://).
|
||||
type Config struct {
|
||||
Endpoint string
|
||||
Bucket string
|
||||
Prefix string
|
||||
AccessKeyID string
|
||||
SecretAccessKey string
|
||||
Region string
|
||||
}
|
||||
|
||||
// nopLogger is a logger that discards all output.
|
||||
// Used to suppress SDK warnings about checksums.
|
||||
type nopLogger struct{}
|
||||
|
||||
func (nopLogger) Logf(classification logging.Classification, format string, v ...interface{}) {}
|
||||
|
||||
// NewClient creates a new S3 client with the provided configuration.
|
||||
// It establishes a connection to the S3-compatible storage service and
|
||||
// validates the credentials. The client uses static credentials and
|
||||
// path-style URLs for compatibility with various S3-compatible services.
|
||||
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
|
||||
// Create AWS config with a nop logger to suppress SDK warnings
|
||||
awsCfg, err := config.LoadDefaultConfig(ctx,
|
||||
config.WithRegion(cfg.Region),
|
||||
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
|
||||
cfg.AccessKeyID,
|
||||
cfg.SecretAccessKey,
|
||||
"",
|
||||
)),
|
||||
config.WithLogger(nopLogger{}),
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
// Configure custom endpoint if provided
|
||||
s3Opts := func(o *s3.Options) {
|
||||
if cfg.Endpoint != "" {
|
||||
o.BaseEndpoint = aws.String(cfg.Endpoint)
|
||||
o.UsePathStyle = true
|
||||
}
|
||||
}
|
||||
|
||||
s3Client := s3.NewFromConfig(awsCfg, s3Opts)
|
||||
|
||||
return &Client{
|
||||
s3Client: s3Client,
|
||||
bucket: cfg.Bucket,
|
||||
prefix: cfg.Prefix,
|
||||
endpoint: cfg.Endpoint,
|
||||
}, nil
|
||||
}
|
||||
|
||||
// PutObject uploads an object to S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// The data parameter should be a reader containing the object data.
|
||||
// Returns an error if the upload fails.
|
||||
func (c *Client) PutObject(ctx context.Context, key string, data io.Reader) error {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.PutObject(ctx, &s3.PutObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
Body: data,
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// ProgressCallback is called during upload progress with bytes uploaded so far.
|
||||
// The callback should return an error to cancel the upload.
|
||||
type ProgressCallback func(bytesUploaded int64) error
|
||||
|
||||
// PutObjectWithProgress uploads an object to S3 with progress tracking.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// The size parameter must be the exact size of the data to upload.
|
||||
// The progress callback is called periodically with the number of bytes uploaded.
|
||||
// Returns an error if the upload fails.
|
||||
func (c *Client) PutObjectWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
|
||||
fullKey := c.prefix + key
|
||||
|
||||
// Create an uploader with the S3 client
|
||||
uploader := manager.NewUploader(c.s3Client, func(u *manager.Uploader) {
|
||||
// Set part size to 10MB for better progress granularity
|
||||
u.PartSize = 10 * 1024 * 1024
|
||||
})
|
||||
|
||||
// Create a progress reader that tracks upload progress
|
||||
pr := &progressReader{
|
||||
reader: data,
|
||||
size: size,
|
||||
callback: progress,
|
||||
read: 0,
|
||||
}
|
||||
|
||||
// Upload the file
|
||||
_, err := uploader.Upload(ctx, &s3.PutObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
Body: pr,
|
||||
})
|
||||
|
||||
return err
|
||||
}
|
||||
|
||||
// GetObject downloads an object from S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// Returns a ReadCloser containing the object data. The caller must
|
||||
// close the returned reader when done to avoid resource leaks.
|
||||
func (c *Client) GetObject(ctx context.Context, key string) (io.ReadCloser, error) {
|
||||
fullKey := c.prefix + key
|
||||
result, err := c.s3Client.GetObject(ctx, &s3.GetObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return result.Body, nil
|
||||
}
|
||||
|
||||
// DeleteObject removes an object from S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// No error is returned if the object doesn't exist.
|
||||
func (c *Client) DeleteObject(ctx context.Context, key string) error {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.DeleteObject(ctx, &s3.DeleteObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// ListObjects lists all objects with the given prefix.
|
||||
// The prefix is combined with the client's configured prefix.
|
||||
// Returns a slice of object keys with the base prefix removed.
|
||||
// This method loads all matching keys into memory, so use
|
||||
// ListObjectsStream for large result sets.
|
||||
func (c *Client) ListObjects(ctx context.Context, prefix string) ([]string, error) {
|
||||
fullPrefix := c.prefix + prefix
|
||||
|
||||
var keys []string
|
||||
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Prefix: aws.String(fullPrefix),
|
||||
})
|
||||
|
||||
for paginator.HasMorePages() {
|
||||
page, err := paginator.NextPage(ctx)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
for _, obj := range page.Contents {
|
||||
if obj.Key != nil {
|
||||
// Remove the base prefix from the key
|
||||
key := *obj.Key
|
||||
if len(key) > len(c.prefix) {
|
||||
key = key[len(c.prefix):]
|
||||
}
|
||||
keys = append(keys, key)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return keys, nil
|
||||
}
|
||||
|
||||
// HeadObject checks if an object exists in S3.
|
||||
// Returns true if the object exists, false otherwise.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// Note: This method returns false for any error, not just "not found".
|
||||
func (c *Client) HeadObject(ctx context.Context, key string) (bool, error) {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
if err != nil {
|
||||
// Check if it's a not found error
|
||||
// TODO: Add proper error type checking
|
||||
return false, nil
|
||||
}
|
||||
return true, nil
|
||||
}
|
||||
|
||||
// ObjectInfo contains information about an S3 object.
|
||||
// It is used by ListObjectsStream to return object metadata
|
||||
// along with any errors encountered during listing.
|
||||
type ObjectInfo struct {
|
||||
Key string
|
||||
Size int64
|
||||
Err error
|
||||
}
|
||||
|
||||
// ListObjectsStream lists objects with the given prefix and returns a channel.
|
||||
// This method is preferred for large result sets as it streams results
|
||||
// instead of loading everything into memory. The channel is closed when
|
||||
// listing is complete or an error occurs. If an error occurs, it will be
|
||||
// sent as the last item with the Err field set. The recursive parameter
|
||||
// is currently unused but reserved for future use.
|
||||
func (c *Client) ListObjectsStream(ctx context.Context, prefix string, recursive bool) <-chan ObjectInfo {
|
||||
ch := make(chan ObjectInfo)
|
||||
|
||||
go func() {
|
||||
defer close(ch)
|
||||
|
||||
fullPrefix := c.prefix + prefix
|
||||
|
||||
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Prefix: aws.String(fullPrefix),
|
||||
})
|
||||
|
||||
for paginator.HasMorePages() {
|
||||
page, err := paginator.NextPage(ctx)
|
||||
if err != nil {
|
||||
ch <- ObjectInfo{Err: err}
|
||||
return
|
||||
}
|
||||
|
||||
for _, obj := range page.Contents {
|
||||
if obj.Key != nil && obj.Size != nil {
|
||||
// Remove the base prefix from the key
|
||||
key := *obj.Key
|
||||
if len(key) > len(c.prefix) {
|
||||
key = key[len(c.prefix):]
|
||||
}
|
||||
ch <- ObjectInfo{
|
||||
Key: key,
|
||||
Size: *obj.Size,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
return ch
|
||||
}

// StatObject returns information about an object without downloading it.
// The key is automatically prefixed with the configured prefix.
// Returns an ObjectInfo struct with the object's metadata.
// Returns an error if the object doesn't exist or if the operation fails.
func (c *Client) StatObject(ctx context.Context, key string) (*ObjectInfo, error) {
    fullKey := c.prefix + key
    result, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
        Bucket: aws.String(c.bucket),
        Key:    aws.String(fullKey),
    })
    if err != nil {
        return nil, err
    }

    size := int64(0)
    if result.ContentLength != nil {
        size = *result.ContentLength
    }

    return &ObjectInfo{
        Key:  key,
        Size: size,
    }, nil
}

// RemoveObject deletes an object from S3 (alias for DeleteObject).
// This method exists for API compatibility and simply calls DeleteObject.
func (c *Client) RemoveObject(ctx context.Context, key string) error {
    return c.DeleteObject(ctx, key)
}

// BucketName returns the configured S3 bucket name.
// This is useful for displaying configuration information.
func (c *Client) BucketName() string {
    return c.bucket
}

// Endpoint returns the S3 endpoint URL.
// If no custom endpoint was configured, returns the default AWS S3 endpoint.
// This is useful for displaying configuration information.
func (c *Client) Endpoint() string {
    if c.endpoint == "" {
        return "s3.amazonaws.com"
    }
    return c.endpoint
}

// progressReader wraps an io.Reader to track reading progress.
type progressReader struct {
    reader   io.Reader
    size     int64
    read     int64
    callback ProgressCallback
}

// Read implements io.Reader, updating the byte count atomically and
// invoking the progress callback after each successful read.
func (pr *progressReader) Read(p []byte) (int, error) {
    n, err := pr.reader.Read(p)
    if n > 0 {
        atomic.AddInt64(&pr.read, int64(n))
        if pr.callback != nil {
            if callbackErr := pr.callback(atomic.LoadInt64(&pr.read)); callbackErr != nil {
                return n, callbackErr
            }
        }
    }
    return n, err
}
98
internal/s3/client_test.go
Normal file
@@ -0,0 +1,98 @@
package s3_test

import (
    "bytes"
    "context"
    "io"
    "testing"

    "git.eeqj.de/sneak/vaultik/internal/s3"
)

func TestClient(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()

    // Create client
    client, err := s3.NewClient(ctx, s3.Config{
        Endpoint:        testEndpoint,
        Bucket:          testBucket,
        Prefix:          "test-prefix/",
        AccessKeyID:     testAccessKey,
        SecretAccessKey: testSecretKey,
        Region:          testRegion,
    })
    if err != nil {
        t.Fatalf("failed to create client: %v", err)
    }

    // Test PutObject
    testKey := "foo/bar.txt"
    testData := []byte("test data")
    err = client.PutObject(ctx, testKey, bytes.NewReader(testData))
    if err != nil {
        t.Fatalf("failed to put object: %v", err)
    }

    // Test GetObject
    reader, err := client.GetObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to get object: %v", err)
    }
    defer func() {
        if err := reader.Close(); err != nil {
            t.Errorf("failed to close reader: %v", err)
        }
    }()

    data, err := io.ReadAll(reader)
    if err != nil {
        t.Fatalf("failed to read data: %v", err)
    }

    if !bytes.Equal(data, testData) {
        t.Errorf("data mismatch: got %q, want %q", data, testData)
    }

    // Test HeadObject
    exists, err := client.HeadObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to head object: %v", err)
    }
    if !exists {
        t.Error("expected object to exist")
    }

    // Test ListObjects
    keys, err := client.ListObjects(ctx, "foo/")
    if err != nil {
        t.Fatalf("failed to list objects: %v", err)
    }
    // Fail fast here: indexing keys[0] below would panic on an empty slice.
    if len(keys) != 1 {
        t.Fatalf("expected 1 key, got %d", len(keys))
    }
    if keys[0] != testKey {
        t.Errorf("unexpected key: got %s, want %s", keys[0], testKey)
    }

    // Test DeleteObject
    err = client.DeleteObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to delete object: %v", err)
    }

    // Verify deletion
    exists, err = client.HeadObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to head object after deletion: %v", err)
    }
    if exists {
        t.Error("expected object to not exist after deletion")
    }
}
42
internal/s3/module.go
Normal file
@@ -0,0 +1,42 @@
package s3

import (
    "context"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "go.uber.org/fx"
)

// Module exports S3 functionality as an fx module.
// It provides automatic dependency injection for the S3 client,
// configuring it based on the application's configuration settings.
var Module = fx.Module("s3",
    fx.Provide(
        provideClient,
    ),
)

func provideClient(lc fx.Lifecycle, cfg *config.Config) (*Client, error) {
    ctx := context.Background()

    client, err := NewClient(ctx, Config{
        Endpoint:        cfg.S3.Endpoint,
        Bucket:          cfg.S3.Bucket,
        Prefix:          cfg.S3.Prefix,
        AccessKeyID:     cfg.S3.AccessKeyID,
        SecretAccessKey: cfg.S3.SecretAccessKey,
        Region:          cfg.S3.Region,
    })
    if err != nil {
        return nil, err
    }

    lc.Append(fx.Hook{
        OnStop: func(ctx context.Context) error {
            // The S3 client doesn't need explicit cleanup
            return nil
        },
    })

    return client, nil
}
306
internal/s3/s3_test.go
Normal file
@@ -0,0 +1,306 @@
package s3_test

import (
    "bytes"
    "context"
    "fmt"
    "io"
    "net/http"
    "os"
    "path/filepath"
    "testing"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/smithy-go/logging"
    "github.com/johannesboyne/gofakes3"
    "github.com/johannesboyne/gofakes3/backend/s3mem"
)

const (
    testBucket    = "test-bucket"
    testRegion    = "us-east-1"
    testAccessKey = "test-access-key"
    testSecretKey = "test-secret-key"
    testEndpoint  = "http://localhost:9999"
)

// TestServer represents an in-process S3-compatible test server
type TestServer struct {
    server   *http.Server
    backend  gofakes3.Backend
    s3Client *s3.Client
    tempDir  string
    logBuf   *bytes.Buffer
}

// NewTestServer creates and starts a new test server
func NewTestServer(t *testing.T) *TestServer {
    // Create temp directory for any file operations
    tempDir, err := os.MkdirTemp("", "vaultik-s3-test-*")
    if err != nil {
        t.Fatalf("failed to create temp dir: %v", err)
    }

    // Create in-memory backend
    backend := s3mem.New()
    faker := gofakes3.New(backend)

    // Create HTTP server
    server := &http.Server{
        Addr:    "localhost:9999",
        Handler: faker.Server(),
    }

    // Start server in background
    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            t.Logf("test server error: %v", err)
        }
    }()

    // Wait for server to be ready
    time.Sleep(100 * time.Millisecond)

    // Create a buffer to capture logs
    logBuf := &bytes.Buffer{}

    // Create S3 client with custom logger
    cfg, err := config.LoadDefaultConfig(context.Background(),
        config.WithRegion(testRegion),
        config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
            testAccessKey,
            testSecretKey,
            "",
        )),
        config.WithClientLogMode(aws.LogRetries|aws.LogRequestWithBody|aws.LogResponseWithBody),
        config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
            // Capture logs to buffer instead of stdout
            fmt.Fprintf(logBuf, "SDK %s %s %s\n",
                time.Now().Format("2006/01/02 15:04:05"),
                string(classification),
                fmt.Sprintf(format, v...))
        })),
    )
    if err != nil {
        t.Fatalf("failed to create AWS config: %v", err)
    }

    s3Client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String(testEndpoint)
        o.UsePathStyle = true
    })

    ts := &TestServer{
        server:   server,
        backend:  backend,
        s3Client: s3Client,
        tempDir:  tempDir,
        logBuf:   logBuf,
    }

    // Register cleanup to show logs on test failure
    t.Cleanup(func() {
        if t.Failed() && logBuf.Len() > 0 {
            t.Logf("S3 SDK Debug Output:\n%s", logBuf.String())
        }
    })

    // Create test bucket
    _, err = s3Client.CreateBucket(context.Background(), &s3.CreateBucketInput{
        Bucket: aws.String(testBucket),
    })
    if err != nil {
        t.Fatalf("failed to create test bucket: %v", err)
    }

    return ts
}

// Cleanup shuts down the server and removes the temp directory
func (ts *TestServer) Cleanup() error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    if err := ts.server.Shutdown(ctx); err != nil {
        return err
    }

    return os.RemoveAll(ts.tempDir)
}

// Client returns the S3 client configured for the test server
func (ts *TestServer) Client() *s3.Client {
    return ts.s3Client
}

// TestBasicS3Operations tests basic store and retrieve operations
func TestBasicS3Operations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test data
    testKey := "test/file.txt"
    testData := []byte("Hello, S3 test!")

    // Put object
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(testKey),
        Body:   bytes.NewReader(testData),
    })
    if err != nil {
        t.Fatalf("failed to put object: %v", err)
    }

    // Get object
    result, err := client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(testKey),
    })
    if err != nil {
        t.Fatalf("failed to get object: %v", err)
    }
    defer func() {
        if err := result.Body.Close(); err != nil {
            t.Errorf("failed to close body: %v", err)
        }
    }()

    // Read and verify data
    data, err := io.ReadAll(result.Body)
    if err != nil {
        t.Fatalf("failed to read object body: %v", err)
    }

    if !bytes.Equal(data, testData) {
        t.Errorf("retrieved data mismatch: got %q, want %q", data, testData)
    }
}

// TestBlobOperations tests blob storage patterns for vaultik
func TestBlobOperations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test blob storage with prefix structure
    blobHash := "aabbccddee112233445566778899aabbccddee11"
    blobKey := filepath.Join("blobs", blobHash[:2], blobHash[2:4], blobHash+".zst.age")
    blobData := []byte("compressed and encrypted blob data")

    // Store blob
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
        Body:   bytes.NewReader(blobData),
    })
    if err != nil {
        t.Fatalf("failed to store blob: %v", err)
    }

    // List objects with prefix
    listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
        Bucket: aws.String(testBucket),
        Prefix: aws.String("blobs/aa/"),
    })
    if err != nil {
        t.Fatalf("failed to list objects: %v", err)
    }

    // Fail fast here: indexing Contents[0] below would panic on an empty list.
    if len(listResult.Contents) != 1 {
        t.Fatalf("expected 1 object, got %d", len(listResult.Contents))
    }

    if listResult.Contents[0].Key != nil && *listResult.Contents[0].Key != blobKey {
        t.Errorf("unexpected key: got %s, want %s", *listResult.Contents[0].Key, blobKey)
    }

    // Delete blob
    _, err = client.DeleteObject(ctx, &s3.DeleteObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
    })
    if err != nil {
        t.Fatalf("failed to delete blob: %v", err)
    }

    // Verify deletion
    _, err = client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
    })
    if err == nil {
        t.Error("expected error getting deleted object, got nil")
    }
}

// TestMetadataOperations tests metadata storage patterns
func TestMetadataOperations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test metadata storage
    snapshotID := "2024-01-01T12:00:00Z"
    metadataKey := filepath.Join("metadata", snapshotID+".sqlite.age")
    metadataData := []byte("encrypted sqlite database")

    // Store metadata
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(metadataKey),
        Body:   bytes.NewReader(metadataData),
    })
    if err != nil {
        t.Fatalf("failed to store metadata: %v", err)
    }

    // Store manifest
    manifestKey := filepath.Join("metadata", snapshotID+".manifest.json.zst")
    manifestData := []byte(`{"snapshot_id":"2024-01-01T12:00:00Z","blob_hashes":["hash1","hash2"]}`)

    _, err = client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(manifestKey),
        Body:   bytes.NewReader(manifestData),
    })
    if err != nil {
        t.Fatalf("failed to store manifest: %v", err)
    }

    // List metadata objects
    listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
        Bucket: aws.String(testBucket),
        Prefix: aws.String("metadata/"),
    })
    if err != nil {
        t.Fatalf("failed to list metadata: %v", err)
    }

    if len(listResult.Contents) != 2 {
        t.Errorf("expected 2 metadata objects, got %d", len(listResult.Contents))
    }
}
534
internal/snapshot/backup_test.go
Normal file
@@ -0,0 +1,534 @@
package snapshot

import (
    "context"
    "crypto/sha256"
    "database/sql"
    "fmt"
    "io"
    "io/fs"
    "os"
    "path/filepath"
    "testing"
    "testing/fstest"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/types"
)

// MockS3Client is a mock implementation of S3 operations for testing
type MockS3Client struct {
    storage map[string][]byte
}

func NewMockS3Client() *MockS3Client {
    return &MockS3Client{
        storage: make(map[string][]byte),
    }
}

func (m *MockS3Client) PutBlob(ctx context.Context, hash string, data []byte) error {
    m.storage[hash] = data
    return nil
}

func (m *MockS3Client) GetBlob(ctx context.Context, hash string) ([]byte, error) {
    data, ok := m.storage[hash]
    if !ok {
        return nil, fmt.Errorf("blob not found: %s", hash)
    }
    return data, nil
}

func (m *MockS3Client) BlobExists(ctx context.Context, hash string) (bool, error) {
    _, ok := m.storage[hash]
    return ok, nil
}

func (m *MockS3Client) CreateBucket(ctx context.Context, bucket string) error {
    return nil
}

func TestBackupWithInMemoryFS(t *testing.T) {
    // Create a temporary directory for the database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")

    // Create test filesystem
    testFS := fstest.MapFS{
        "file1.txt": &fstest.MapFile{
            Data:    []byte("Hello, World!"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "dir1/file2.txt": &fstest.MapFile{
            Data:    []byte("This is a test file with some content."),
            Mode:    0755,
            ModTime: time.Now(),
        },
        "dir1/subdir/file3.txt": &fstest.MapFile{
            Data:    []byte("Another file in a subdirectory."),
            Mode:    0600,
            ModTime: time.Now(),
        },
        "largefile.bin": &fstest.MapFile{
            Data:    generateLargeFileContent(10 * 1024 * 1024), // 10MB file with varied content
            Mode:    0644,
            ModTime: time.Now(),
        },
    }

    // Initialize the database
    ctx := context.Background()
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("Failed to create database: %v", err)
    }
    defer func() {
        if err := db.Close(); err != nil {
            t.Logf("Failed to close database: %v", err)
        }
    }()

    repos := database.NewRepositories(db)

    // Create mock S3 client
    s3Client := NewMockS3Client()

    // Run backup
    backupEngine := &BackupEngine{
        repos:    repos,
        s3Client: s3Client,
    }

    snapshotID, err := backupEngine.Backup(ctx, testFS, ".")
    if err != nil {
        t.Fatalf("Backup failed: %v", err)
    }

    // Verify snapshot was created
    snapshot, err := repos.Snapshots.GetByID(ctx, snapshotID)
    if err != nil {
        t.Fatalf("Failed to get snapshot: %v", err)
    }

    if snapshot == nil {
        t.Fatal("Snapshot not found")
    }

    if snapshot.FileCount == 0 {
        t.Error("Expected snapshot to have files")
    }

    // Verify files in database
    files, err := repos.Files.ListByPrefix(ctx, "")
    if err != nil {
        t.Fatalf("Failed to list files: %v", err)
    }

    expectedFiles := map[string]bool{
        "file1.txt":             true,
        "dir1/file2.txt":        true,
        "dir1/subdir/file3.txt": true,
        "largefile.bin":         true,
    }

    if len(files) != len(expectedFiles) {
        t.Errorf("Expected %d files, got %d", len(expectedFiles), len(files))
    }

    for _, file := range files {
        if !expectedFiles[file.Path.String()] {
            t.Errorf("Unexpected file in database: %s", file.Path)
        }
        delete(expectedFiles, file.Path.String())

        // Verify file metadata
        fsFile := testFS[file.Path.String()]
        if fsFile == nil {
            t.Errorf("File %s not found in test filesystem", file.Path)
            continue
        }

        if file.Size != int64(len(fsFile.Data)) {
            t.Errorf("File %s: expected size %d, got %d", file.Path, len(fsFile.Data), file.Size)
        }

        if file.Mode != uint32(fsFile.Mode) {
            t.Errorf("File %s: expected mode %o, got %o", file.Path, fsFile.Mode, file.Mode)
        }
    }

    if len(expectedFiles) > 0 {
        t.Errorf("Files not found in database: %v", expectedFiles)
    }

    // Verify chunks
    chunks, err := repos.Chunks.List(ctx)
    if err != nil {
        t.Fatalf("Failed to list chunks: %v", err)
    }

    if len(chunks) == 0 {
        t.Error("No chunks found in database")
    }

    // The large file should create 10 chunks (10MB / 1MB chunk size),
    // plus chunks for the small files
    minExpectedChunks := 10 + 3
    if len(chunks) < minExpectedChunks {
        t.Errorf("Expected at least %d chunks, got %d", minExpectedChunks, len(chunks))
    }

    // Verify at least one blob was created and uploaded.
    // We can't list blobs directly, but we can check via snapshot blobs.
    blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
    if err != nil {
        t.Fatalf("Failed to get blob hashes: %v", err)
    }
    if len(blobHashes) == 0 {
        t.Error("Expected at least one blob to be created")
    }

    for _, blobHash := range blobHashes {
        // Check blob exists in mock S3
        exists, err := s3Client.BlobExists(ctx, blobHash)
        if err != nil {
            t.Errorf("Failed to check blob %s: %v", blobHash, err)
        }
        if !exists {
            t.Errorf("Blob %s not found in S3", blobHash)
        }
    }
}

func TestBackupDeduplication(t *testing.T) {
    // Create a temporary directory for the database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")

    // Create test filesystem with duplicate content
    testFS := fstest.MapFS{
        "file1.txt": &fstest.MapFile{
            Data:    []byte("Duplicate content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "file2.txt": &fstest.MapFile{
            Data:    []byte("Duplicate content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "file3.txt": &fstest.MapFile{
            Data:    []byte("Unique content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
    }

    // Initialize the database
    ctx := context.Background()
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("Failed to create database: %v", err)
    }
    defer func() {
        if err := db.Close(); err != nil {
            t.Logf("Failed to close database: %v", err)
        }
    }()

    repos := database.NewRepositories(db)

    // Create mock S3 client
    s3Client := NewMockS3Client()

    // Run backup
    backupEngine := &BackupEngine{
        repos:    repos,
        s3Client: s3Client,
    }

    _, err = backupEngine.Backup(ctx, testFS, ".")
    if err != nil {
        t.Fatalf("Backup failed: %v", err)
    }

    // Verify deduplication
    chunks, err := repos.Chunks.List(ctx)
    if err != nil {
        t.Fatalf("Failed to list chunks: %v", err)
    }

    // Should have only 2 unique chunks (duplicate content + unique content)
    if len(chunks) != 2 {
        t.Errorf("Expected 2 unique chunks, got %d", len(chunks))
    }

    // Verify chunk references
    for _, chunk := range chunks {
        files, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
        if err != nil {
            t.Errorf("Failed to get files for chunk %s: %v", chunk.ChunkHash, err)
        }

        // The duplicate content chunk should be referenced by 2 files
        if chunk.Size == int64(len("Duplicate content")) && len(files) != 2 {
            t.Errorf("Expected duplicate chunk to be referenced by 2 files, got %d", len(files))
        }
    }
}

// BackupEngine performs backup operations
type BackupEngine struct {
    repos    *database.Repositories
    s3Client interface {
        PutBlob(ctx context.Context, hash string, data []byte) error
        BlobExists(ctx context.Context, hash string) (bool, error)
    }
}

// Backup performs a backup of the given filesystem
func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (string, error) {
    // Create a new snapshot
    hostname, _ := os.Hostname()
    snapshotID := time.Now().Format(time.RFC3339)
    snapshot := &database.Snapshot{
        ID:             types.SnapshotID(snapshotID),
        Hostname:       types.Hostname(hostname),
        VaultikVersion: "test",
        StartedAt:      time.Now(),
        CompletedAt:    nil,
    }

    // Create initial snapshot record
    err := b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        return b.repos.Snapshots.Create(ctx, tx, snapshot)
    })
    if err != nil {
        return "", err
    }

    // Track counters
    var fileCount, chunkCount, blobCount, totalSize, blobSize int64

    // Track which chunks we've seen to handle deduplication
    processedChunks := make(map[string]bool)

    // Scan the filesystem and process files
    err = fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }

        // Skip directories
        if d.IsDir() {
            return nil
        }

        // Get file info
        info, err := d.Info()
        if err != nil {
            return err
        }

        // Handle symlinks
        if info.Mode()&fs.ModeSymlink != 0 {
            // For testing, skip symlinks since fstest doesn't support them well
            return nil
        }

        // Create file record in a short transaction
        file := &database.File{
            Path:  types.FilePath(path),
            Size:  info.Size(),
            Mode:  uint32(info.Mode()),
            MTime: info.ModTime(),
            CTime: fileCTime(info), // platform-specific: birth time on macOS, inode change time on Linux
            UID:   1000,            // Default UID for test
            GID:   1000,            // Default GID for test
        }
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            return b.repos.Files.Create(ctx, tx, file)
        })
        if err != nil {
            return err
        }

        fileCount++
        totalSize += info.Size()

        // Read and process file in chunks
        f, err := fsys.Open(path)
        if err != nil {
            return err
        }
        defer func() {
            if err := f.Close(); err != nil {
                // Log but don't fail; we may already be unwinding an error
                fmt.Fprintf(os.Stderr, "Failed to close file: %v\n", err)
            }
        }()

        // Process file in chunks
        chunkIndex := 0
        buffer := make([]byte, defaultChunkSize)

        for {
            n, err := f.Read(buffer)
            if err != nil && err != io.EOF {
                return err
            }
            if n == 0 {
                break
            }

            chunkData := buffer[:n]
            chunkHash := calculateHash(chunkData)

            // Check if chunk already exists (outside of transaction)
            existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
            if existingChunk == nil {
                // Create new chunk in a short transaction
                err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                    chunk := &database.Chunk{
                        ChunkHash: types.ChunkHash(chunkHash),
                        Size:      int64(n),
                    }
                    return b.repos.Chunks.Create(ctx, tx, chunk)
                })
                if err != nil {
                    return err
                }
                processedChunks[chunkHash] = true
            }

            // Create file-chunk mapping in a short transaction
            err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                fileChunk := &database.FileChunk{
                    FileID:    file.ID,
                    Idx:       chunkIndex,
                    ChunkHash: types.ChunkHash(chunkHash),
                }
                return b.repos.FileChunks.Create(ctx, tx, fileChunk)
            })
            if err != nil {
                return err
            }

            // Create chunk-file mapping in a short transaction
            err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                chunkFile := &database.ChunkFile{
                    ChunkHash:  types.ChunkHash(chunkHash),
                    FileID:     file.ID,
                    FileOffset: int64(chunkIndex * defaultChunkSize),
                    Length:     int64(n),
                }
                return b.repos.ChunkFiles.Create(ctx, tx, chunkFile)
            })
            if err != nil {
                return err
            }

            chunkIndex++
        }

        return nil
    })

    if err != nil {
        return "", err
    }

    // After all files are processed, create blobs for new chunks
    for chunkHash := range processedChunks {
        // Get chunk data (outside of transaction)
        chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
        if err != nil {
            return "", err
        }

        chunkCount++

        // In a real system, blobs would contain multiple chunks and be encrypted.
        // For testing, create a blob with a "blob-" prefix to differentiate.
        blobHash := "blob-" + chunkHash

        // For the test, use dummy data since we don't retain the original
        dummyData := []byte(chunkHash)

        // Upload to S3 as a blob
        if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
            return "", err
        }

        // Create blob entry in a short transaction
        blobID := types.NewBlobID()
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            blob := &database.Blob{
                ID:        blobID,
                Hash:      types.BlobHash(blobHash),
                CreatedTS: time.Now(),
            }
            return b.repos.Blobs.Create(ctx, tx, blob)
        })
        if err != nil {
            return "", err
        }

        blobCount++
        blobSize += chunk.Size

        // Create blob-chunk mapping in a short transaction
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            blobChunk := &database.BlobChunk{
                BlobID:    blobID,
                ChunkHash: types.ChunkHash(chunkHash),
                Offset:    0,
                Length:    chunk.Size,
            }
            return b.repos.BlobChunks.Create(ctx, tx, blobChunk)
        })
        if err != nil {
            return "", err
        }

        // Add blob to snapshot in a short transaction
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
        })
        if err != nil {
            return "", err
        }
    }

    // Update snapshot with final counts
    err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        return b.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID, fileCount, chunkCount, blobCount, totalSize, blobSize)
    })
    if err != nil {
        return "", err
    }

    return snapshotID, nil
}

func calculateHash(data []byte) string {
    h := sha256.New()
    h.Write(data)
    return fmt.Sprintf("%x", h.Sum(nil))
}

func generateLargeFileContent(size int) []byte {
    data := make([]byte, size)
    // Fill with a pattern that changes every chunk to avoid deduplication
    for i := 0; i < size; i++ {
        chunkNum := i / defaultChunkSize
        data[i] = byte((i + chunkNum) % 256)
    }
    return data
}

const defaultChunkSize = 1024 * 1024 // 1MB chunks
26
internal/snapshot/ctime_darwin.go
Normal file
@@ -0,0 +1,26 @@
package snapshot

import (
    "os"
    "syscall"
    "time"
)

// fileCTime returns the file creation time (birth time) on macOS.
//
// On macOS/Darwin, "ctime" refers to the file's birth time (when the file
// was first created on disk). This is stored in the Birthtimespec field of
// the syscall.Stat_t structure.
//
// This differs from Linux, where "ctime" means inode change time (the last
// time file metadata was modified). See ctime_linux.go for details.
//
// If the underlying stat information is unavailable (e.g. when using a
// virtual filesystem like afero.MemMapFs), this falls back to mtime.
func fileCTime(info os.FileInfo) time.Time {
    stat, ok := info.Sys().(*syscall.Stat_t)
    if !ok {
        return info.ModTime()
    }
    return time.Unix(stat.Birthtimespec.Sec, stat.Birthtimespec.Nsec).UTC()
}
|
||||
28
internal/snapshot/ctime_linux.go
Normal file
@@ -0,0 +1,28 @@
package snapshot

import (
	"os"
	"syscall"
	"time"
)

// fileCTime returns the inode change time on Linux.
//
// On Linux, "ctime" refers to the inode change time — the last time the
// file's metadata (permissions, ownership, link count, etc.) was modified.
// This is NOT the file creation time; Linux did not expose birth time until
// the statx(2) syscall was added in kernel 4.11, and Go's syscall package
// does not yet surface it.
//
// This differs from macOS/Darwin where "ctime" means birth time (file
// creation time). See ctime_darwin.go for details.
//
// If the underlying stat information is unavailable (e.g. when using a
// virtual filesystem like afero.MemMapFs), this falls back to mtime.
func fileCTime(info os.FileInfo) time.Time {
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return info.ModTime()
	}
	return time.Unix(stat.Ctim.Sec, stat.Ctim.Nsec).UTC()
}
133
internal/snapshot/ctime_test.go
Normal file
@@ -0,0 +1,133 @@
package snapshot

import (
	"os"
	"path/filepath"
	"testing"
	"time"
)

func TestFileCTime_RealFile(t *testing.T) {
	// Create a temporary file
	dir := t.TempDir()
	path := filepath.Join(dir, "testfile.txt")

	if err := os.WriteFile(path, []byte("hello"), 0644); err != nil {
		t.Fatal(err)
	}

	info, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}

	ctime := fileCTime(info)

	// ctime should be a valid time (not zero)
	if ctime.IsZero() {
		t.Fatal("fileCTime returned zero time")
	}

	// ctime should be close to now (within a few seconds)
	diff := time.Since(ctime)
	if diff < 0 || diff > 5*time.Second {
		t.Fatalf("fileCTime returned unexpected time: %v (diff from now: %v)", ctime, diff)
	}

	// ctime should not equal mtime exactly in all cases, but for a freshly
	// created file they should be very close
	mtime := info.ModTime()
	ctimeMtimeDiff := ctime.Sub(mtime)
	if ctimeMtimeDiff < 0 {
		ctimeMtimeDiff = -ctimeMtimeDiff
	}
	// For a freshly created file, ctime and mtime should be within 1 second
	if ctimeMtimeDiff > time.Second {
		t.Fatalf("ctime and mtime differ by too much for a new file: ctime=%v, mtime=%v, diff=%v",
			ctime, mtime, ctimeMtimeDiff)
	}
}

func TestFileCTime_AfterMtimeChange(t *testing.T) {
	// Create a temporary file
	dir := t.TempDir()
	path := filepath.Join(dir, "testfile.txt")

	if err := os.WriteFile(path, []byte("hello"), 0644); err != nil {
		t.Fatal(err)
	}

	// Get initial ctime
	info1, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}
	ctime1 := fileCTime(info1)

	// Change mtime to a time in the past
	pastTime := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	if err := os.Chtimes(path, pastTime, pastTime); err != nil {
		t.Fatal(err)
	}

	// Get new stats
	info2, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}
	ctime2 := fileCTime(info2)
	mtime2 := info2.ModTime()

	// mtime should now be in the past
	if mtime2.Year() != 2020 {
		t.Fatalf("mtime not set correctly: %v", mtime2)
	}

	// On macOS: ctime (birth time) should remain unchanged since birth time
	// doesn't change when mtime is updated.
	// On Linux: ctime (inode change time) will be updated to ~now because
	// changing mtime is a metadata change.
	// Either way, ctime should NOT equal the past mtime we just set.
	if ctime2.Equal(pastTime) {
		t.Fatal("ctime should not equal the artificially set past mtime")
	}

	// ctime should still be a recent time (the original creation time or
	// the metadata change time, depending on platform)
	_ = ctime1 // used for reference; both platforms will have a recent ctime2
	if time.Since(ctime2) > 10*time.Second {
		t.Fatalf("ctime is unexpectedly old: %v", ctime2)
	}
}

// mockFileInfo is a FileInfo without a *syscall.Stat_t (e.g. afero.MemMapFs);
// TestFileCTime_FallbackToMtime verifies the fallback to mtime in that case.
type mockFileInfo struct {
	name    string
	size    int64
	mode    os.FileMode
	modTime time.Time
	isDir   bool
}

func (m *mockFileInfo) Name() string       { return m.name }
func (m *mockFileInfo) Size() int64        { return m.size }
func (m *mockFileInfo) Mode() os.FileMode  { return m.mode }
func (m *mockFileInfo) ModTime() time.Time { return m.modTime }
func (m *mockFileInfo) IsDir() bool        { return m.isDir }
func (m *mockFileInfo) Sys() interface{}   { return nil } // No syscall.Stat_t

func TestFileCTime_FallbackToMtime(t *testing.T) {
	now := time.Now().UTC().Truncate(time.Second)
	info := &mockFileInfo{
		name:    "test.txt",
		size:    100,
		mode:    0644,
		modTime: now,
	}

	ctime := fileCTime(info)
	if !ctime.Equal(now) {
		t.Fatalf("expected fallback to mtime %v, got %v", now, ctime)
	}
}
454
internal/snapshot/exclude_test.go
Normal file
@@ -0,0 +1,454 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"path/filepath"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/require"
)

func setupExcludeTestFS(t *testing.T) afero.Fs {
	t.Helper()

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create test directory structure:
	// /backup/
	//   file1.txt (should be backed up)
	//   file2.log (should be excluded if *.log is in patterns)
	//   .git/
	//     config (should be excluded if .git is in patterns)
	//     objects/
	//       pack/
	//         data.pack (should be excluded if .git is in patterns)
	//   src/
	//     main.go (should be backed up)
	//     test.go (should be backed up)
	//   node_modules/
	//     package/
	//       index.js (should be excluded if node_modules is in patterns)
	//   cache/
	//     temp.dat (should be excluded if cache/ is in patterns)
	//   build/
	//     output.bin (should be excluded if build is in patterns)
	//   docs/
	//     readme.md (should be backed up)
	//   .DS_Store (should be excluded if .DS_Store is in patterns)
	//   thumbs.db (should be excluded if thumbs.db is in patterns)

	files := map[string]string{
		"/backup/file1.txt":                     "content1",
		"/backup/file2.log":                     "log content",
		"/backup/.git/config":                   "git config",
		"/backup/.git/objects/pack/data.pack":   "pack data",
		"/backup/src/main.go":                   "package main",
		"/backup/src/test.go":                   "package main_test",
		"/backup/node_modules/package/index.js": "module.exports = {}",
		"/backup/cache/temp.dat":                "cached data",
		"/backup/build/output.bin":              "binary data",
		"/backup/docs/readme.md":                "# Documentation",
		"/backup/.DS_Store":                     "ds store data",
		"/backup/thumbs.db":                     "thumbs data",
		"/backup/src/.hidden":                   "hidden file",
		"/backup/important.log.bak":             "backup of log",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}
func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
	t.Helper()

	// Initialize logger
	log.Initialize(log.Config{})

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)

	repos := database.NewRepositories(db)

	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        64 * 1024,
		Repositories:     repos,
		MaxBlobSize:      1024 * 1024,
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
		Exclude:          excludePatterns,
	})

	cleanup := func() {
		_ = db.Close()
	}

	return scanner, repos, cleanup
}

func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
	t.Helper()
	err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snap := &database.Snapshot{
			ID:               types.SnapshotID(snapshotID),
			Hostname:         "test-host",
			VaultikVersion:   "test",
			StartedAt:        time.Now(),
			CompletedAt:      nil,
			FileCount:        0,
			ChunkCount:       0,
			BlobCount:        0,
			TotalSize:        0,
			BlobSize:         0,
			CompressionRatio: 1.0,
		}
		return repos.Snapshots.Create(ctx, tx, snap)
	})
	require.NoError(t, err)
}
func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should have scanned files but NOT .git directory contents
	// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
	// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
	// src/.hidden, important.log.bak
	// Excluded: .git/config, .git/objects/pack/data.pack
	require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}

func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude file2.log but NOT important.log.bak (different extension)
	// Total files: 14, excluded: 1 (file2.log)
	require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}

func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude node_modules/package/index.js
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}

func TestExcludePatterns_MultiplePatterns(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
	// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
	require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}

func TestExcludePatterns_NoExclusions(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should scan all 14 files
	require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}

func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude: .git/*, .DS_Store, src/.hidden
	// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
	require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}

func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude .git/objects/pack/data.pack
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}

func TestExcludePatterns_ExactFileName(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude thumbs.db and .DS_Store
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}

func TestExcludePatterns_CaseSensitive(t *testing.T) {
	// Pattern matching should be case-sensitive
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db
	// All 14 files should be scanned
	require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}

func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Some users might add trailing slashes to directory patterns
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude cache/temp.dat and build/output.bin
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}

func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Exclude .hidden file specifically in src directory
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude only src/.hidden
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}
// setupAnchoredTestFS creates a filesystem for testing anchored patterns
// Source dir: /backup
// Structure:
//
//	/backup/
//	  projectname/
//	    file.txt (should be excluded with /projectname)
//	  otherproject/
//	    projectname/
//	      file.txt (should NOT be excluded with /projectname, only with projectname)
//	  src/
//	    file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
	t.Helper()

	fs := afero.NewMemMapFs()

	files := map[string]string{
		"/backup/projectname/file.txt":              "root project file",
		"/backup/otherproject/projectname/file.txt": "nested project file",
		"/backup/src/file.go":                       "source file",
		"/backup/file.txt":                          "root file",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}

func TestExcludePatterns_AnchoredPattern(t *testing.T) {
	// Pattern starting with / should only match from root of source dir
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /projectname should ONLY exclude /backup/projectname/file.txt (1 file)
	// /backup/otherproject/projectname/file.txt should NOT be excluded
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}

func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
	// Pattern without leading / should match anywhere in path
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// projectname (without /) should exclude BOTH:
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 2
	require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}

func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
	// Anchored pattern with glob
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /src/*.go should exclude /backup/src/file.go
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}

func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
	// Anchored pattern for exact file at root
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /file.txt should ONLY exclude /backup/file.txt
	// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}

func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
	// Unanchored pattern for file should match anywhere
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// file.txt should exclude ALL file.txt files:
	// - /backup/file.txt
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 3
	require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}
238
internal/snapshot/file_change_test.go
Normal file
@@ -0,0 +1,238 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestFileContentChange verifies that when a file's content changes,
// the old chunks are properly disassociated
func TestFileContentChange(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial file
	err := afero.WriteFile(fs, "/test.txt", []byte("Initial content"), 0644)
	require.NoError(t, err)

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan - should create chunks for initial content
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	t.Logf("First scan: %d files scanned", result1.FilesScanned)

	// Get file chunks from first scan
	fileChunks1, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks1, 1) // Small file = 1 chunk
	oldChunkHash := fileChunks1[0].ChunkHash

	// Get chunk files from first scan
	chunkFiles1, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles1, 1)

	// Modify the file
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/test.txt", []byte("Modified content with different data"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan - should create new chunks and remove old associations
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)
	t.Logf("Second scan: %d files scanned", result2.FilesScanned)

	// Get file chunks from second scan
	fileChunks2, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks2, 1) // Still 1 chunk but different hash
	newChunkHash := fileChunks2[0].ChunkHash

	// Verify the chunk hashes are different
	assert.NotEqual(t, oldChunkHash, newChunkHash, "Chunk hash should change when content changes")

	// Get chunk files from second scan
	chunkFiles2, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles2, 1)
	assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)

	// Verify old chunk still exists (it's still valid data)
	oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, oldChunk)

	// Verify new chunk exists
	newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, newChunk)

	// Verify that chunk_files for old chunk no longer references this file
	oldChunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, oldChunkHash)
	require.NoError(t, err)
	for _, cf := range oldChunkFiles {
		file, err := repos.Files.GetByID(ctx, cf.FileID)
		require.NoError(t, err)
		assert.NotEqual(t, "/test.txt", file.Path, "Old chunk should not be associated with the modified file")
	}
}
// TestMultipleFileChanges verifies handling of multiple file changes in one scan
func TestMultipleFileChanges(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial files
	files := map[string]string{
		"/file1.txt": "Content 1",
		"/file2.txt": "Content 2",
		"/file3.txt": "Content 3",
	}

	for path, content := range files {
		err := afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
	}

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	// Only regular files are counted, not directories
	assert.Equal(t, 3, result1.FilesScanned)

	// Modify two files
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/file1.txt", []byte("Modified content 1"), 0644)
	require.NoError(t, err)
	err = afero.WriteFile(fs, "/file3.txt", []byte("Modified content 3"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)

	// Only regular files are counted, not directories
	assert.Equal(t, 3, result2.FilesScanned)

	// Verify each file has exactly one set of chunks
	for path := range files {
		fileChunks, err := repos.FileChunks.GetByPath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, fileChunks, 1, "File %s should have exactly 1 chunk association", path)

		chunkFiles, err := repos.ChunkFiles.GetByFilePath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, chunkFiles, 1, "File %s should have exactly 1 chunk-file association", path)
	}
}
70 internal/snapshot/manifest.go Normal file
@@ -0,0 +1,70 @@
package snapshot

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// Manifest represents the structure of a snapshot's blob manifest
type Manifest struct {
	SnapshotID          string     `json:"snapshot_id"`
	Timestamp           string     `json:"timestamp"`
	BlobCount           int        `json:"blob_count"`
	TotalCompressedSize int64      `json:"total_compressed_size"`
	Blobs               []BlobInfo `json:"blobs"`
}

// BlobInfo represents information about a single blob in the manifest
type BlobInfo struct {
	Hash           string `json:"hash"`
	CompressedSize int64  `json:"compressed_size"`
}

// DecodeManifest decodes a manifest from a reader containing compressed JSON
func DecodeManifest(r io.Reader) (*Manifest, error) {
	// Decompress using zstd
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("creating zstd reader: %w", err)
	}
	defer zr.Close()

	// Decode JSON manifest
	var manifest Manifest
	if err := json.NewDecoder(zr).Decode(&manifest); err != nil {
		return nil, fmt.Errorf("decoding manifest: %w", err)
	}

	return &manifest, nil
}

// EncodeManifest encodes a manifest to compressed JSON
func EncodeManifest(manifest *Manifest, compressionLevel int) ([]byte, error) {
	// Marshal to JSON
	jsonData, err := json.MarshalIndent(manifest, "", " ")
	if err != nil {
		return nil, fmt.Errorf("marshaling manifest: %w", err)
	}

	// Compress using zstd
	var compressedBuf bytes.Buffer
	writer, err := zstd.NewWriter(&compressedBuf, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)))
	if err != nil {
		return nil, fmt.Errorf("creating zstd writer: %w", err)
	}

	if _, err := writer.Write(jsonData); err != nil {
		_ = writer.Close()
		return nil, fmt.Errorf("writing compressed data: %w", err)
	}

	if err := writer.Close(); err != nil {
		return nil, fmt.Errorf("closing zstd writer: %w", err)
	}

	return compressedBuf.Bytes(), nil
}
53 internal/snapshot/module.go Normal file
@@ -0,0 +1,53 @@
package snapshot

import (
	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/storage"
	"github.com/spf13/afero"
	"go.uber.org/fx"
)

// ScannerParams holds parameters for scanner creation
type ScannerParams struct {
	EnableProgress bool
	Fs             afero.Fs
	Exclude        []string // Exclude patterns (combined global + snapshot-specific)
	SkipErrors     bool     // Skip file read errors (log loudly but continue)
}

// Module exports backup functionality as an fx module.
// It provides a ScannerFactory that can create Scanner instances
// with custom parameters while sharing common dependencies.
var Module = fx.Module("backup",
	fx.Provide(
		provideScannerFactory,
		NewSnapshotManager,
	),
)

// ScannerFactory creates scanners with custom parameters
type ScannerFactory func(params ScannerParams) *Scanner

func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
	return func(params ScannerParams) *Scanner {
		// Use provided excludes, or fall back to global config excludes
		excludes := params.Exclude
		if len(excludes) == 0 {
			excludes = cfg.Exclude
		}

		return NewScanner(ScannerConfig{
			FS:               params.Fs,
			ChunkSize:        cfg.ChunkSize.Int64(),
			Repositories:     repos,
			Storage:          storer,
			MaxBlobSize:      cfg.BlobSizeLimit.Int64(),
			CompressionLevel: cfg.CompressionLevel,
			AgeRecipients:    cfg.AgeRecipients,
			EnableProgress:   params.EnableProgress,
			Exclude:          excludes,
			SkipErrors:       params.SkipErrors,
		})
	}
}
Some files were not shown because too many files have changed in this diff