# Compare commits

Comparing `9c072166fa...fix/ctime-` — 65 commits:

| SHA1 |
|---|
| 25860c03a9 |
| c24e7e6360 |
| 7a5943958d |
| d8a51804d2 |
| 76f4421eb3 |
| 53ac868c5d |
| 8c4ea2b870 |
| 597b560398 |
| 1e2eced092 |
| 815b35c7ae |
| 9c66674683 |
| 49de277648 |
| ed5d777d05 |
| 76e047bbb2 |
| 2e7356dd85 |
| 70d4fe2aa0 |
| 2f249e3ddd |
| 3f834f1c9c |
| 9879668c31 |
| 0a0d9f33b0 |
| df0e8c275b |
| ddc23f8057 |
| cafb3d45b8 |
| d77ac18aaa |
| 825f25da58 |
| 162d76bb38 |
| bfd7334221 |
| 9b32bf0846 |
| 8adc668fa6 |
| 441c441eca |
| 4d9f912a5f |
| 46c2ea3079 |
| 470bf648c4 |
| bdaaadf990 |
| 417b25a5f5 |
| 2afd54d693 |
| 05286bed01 |
| f2c120f026 |
| bbe09ec5b5 |
| 43a69c2cfb |
| 899448e1da |
| 24c5e8c5a6 |
| 40fff09594 |
| 8a8651c690 |
| a1d559c30d |
| 88e2508dc7 |
| c3725e745e |
| badc0c07e0 |
| cda0cf865a |
| 0736bd070b |
| d7cd9aac27 |
| bb38f8c5d6 |
| e29a995120 |
| 5c70405a85 |
| a544fa80f2 |
| c07d8eec0a |
| 0cbb5aa0a6 |
| fb220685a2 |
| 1d027bde57 |
| bb2292de7f |
| d3afa65420 |
| 78af626759 |
| 86b533d6ee |
| 26db096913 |
| 36c59cb7b3 |
---

**.dockerignore** (new file, +8 lines)

```
.git
.gitea
*.md
LICENSE
vaultik
coverage.out
coverage.html
.DS_Store
```
---

**.gitea/workflows/check.yml** (new file, +14 lines)

```yaml
name: check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      # actions/checkout v4, 2024-09-16
      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5
      - name: Build and check
        run: docker build .
```
---

**ARCHITECTURE.md** (new file, +380 lines)

# Vaultik Architecture

This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.

## Overview

Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using [uber-go/fx](https://github.com/uber-go/fx).

## Data Flow

```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │  Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │  Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │  Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │  Uploads blobs to remote storage
└─────────────────┘
```

## Data Model

### Core Entities

The database tracks five primary entities and their relationships:

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Snapshot   │────▶│     File     │────▶│    Chunk     │
└──────────────┘     └──────────────┘     └──────────────┘
        │                                        │
        ▼                                        ▼
┌──────────────┐                          ┌──────────────┐
│     Blob     │◀─────────────────────────│  BlobChunk   │
└──────────────┘                          └──────────────┘
```

### Entity Descriptions

#### File (`database.File`)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)

#### Chunk (`database.Chunk`)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- `ChunkHash`: SHA256 hash of chunk content (primary key)
- `Size`: Chunk size in bytes

Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for a 64KB average).

#### FileChunk (`database.FileChunk`)
Maps files to their constituent chunks:
- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk

#### Blob (`database.Blob`)
The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:
- `ID`: UUID assigned at creation
- `Hash`: SHA256 of final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps

Blob creation process:
1. Chunks are accumulated (up to MaxBlobSize, typically 10GB)
2. Compressed with zstd
3. Encrypted with age (recipients configured in config)
4. SHA256 hash computed → becomes filename in S3
5. Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
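The key layout in step 5 can be sketched in Go. This is an illustrative helper (the `blobKey` function name is hypothetical, not the actual Vaultik code); it shards blob keys by the first two hex byte pairs of the content hash so no single S3 prefix accumulates all objects:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey derives the S3 object key for a finished blob from the SHA256
// hash of its compressed+encrypted content, using the first two hex byte
// pairs as directory shards: blobs/{hash[0:2]}/{hash[2:4]}/{hash}.
func blobKey(content []byte) string {
	sum := sha256.Sum256(content)
	hash := hex.EncodeToString(sum[:])
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	fmt.Println(blobKey([]byte("example blob bytes")))
}
```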
#### BlobChunk (`database.BlobChunk`)
Maps chunks to their position within blobs:
- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size

#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until the snapshot finishes successfully
#### SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.

### Relationship Summary

```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File     1──────────▶ N FileChunk    N ◀────────── 1 Chunk
Blob     1──────────▶ N BlobChunk    N ◀────────── 1 Chunk
```
## Type Instantiation

### Application Startup

The CLI uses fx for dependency injection. Here's the instantiation order:

```go
// cli/app.go: NewApp()
fx.New(
	fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
	fx.Supply(opts.LogOptions),                    // 2. Log options
	fx.Provide(globals.New),                       // 3. Globals
	fx.Provide(log.New),                           // 4. Logger config
	config.Module,                                 // 5. Config
	database.Module,                               // 6. Database + Repositories
	log.Module,                                    // 7. Logger initialization
	s3.Module,                                     // 8. S3 client
	snapshot.Module,                               // 9. SnapshotManager + ScannerFactory
	fx.Provide(vaultik.New),                       // 10. Vaultik orchestrator
)
```
### Key Type Instantiation Points

#### 1. Config (`config.Config`)
- **Created by**: `config.Module` via `config.LoadConfig()`
- **When**: Application startup (fx DI)
- **Contains**: All configuration from the YAML file (S3 credentials, encryption keys, paths, etc.)

#### 2. Database (`database.DB`)
- **Created by**: `database.Module` via `database.New()`
- **When**: Application startup (fx DI)
- **Contains**: SQLite connection, path reference

#### 3. Repositories (`database.Repositories`)
- **Created by**: `database.Module` via `database.NewRepositories()`
- **When**: Application startup (fx DI)
- **Contains**: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)

#### 4. Vaultik (`vaultik.Vaultik`)
- **Created by**: `vaultik.New(VaultikParams)`
- **When**: Application startup (fx DI)
- **Contains**: All dependencies for backup operations

```go
type Vaultik struct {
	Globals         *globals.Globals
	Config          *config.Config
	DB              *database.DB
	Repositories    *database.Repositories
	S3Client        *s3.Client
	ScannerFactory  snapshot.ScannerFactory
	SnapshotManager *snapshot.SnapshotManager
	Shutdowner      fx.Shutdowner
	Fs              afero.Fs

	ctx    context.Context
	cancel context.CancelFunc
}
```
#### 5. SnapshotManager (`snapshot.SnapshotManager`)
- **Created by**: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- **When**: Application startup (fx DI)
- **Responsibility**: Creates/completes snapshots, exports metadata to S3

#### 6. Scanner (`snapshot.Scanner`)
- **Created by**: `ScannerFactory(ScannerParams)`
- **When**: Each `CreateSnapshot()` call
- **Contains**: Chunker, Packer, progress reporter

```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
	EnableProgress: !opts.Cron,
	Fs:             v.Fs,
})
```

#### 7. Chunker (`chunker.Chunker`)
- **Created by**: `chunker.NewChunker(avgChunkSize)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `avgChunkSize`: From config (typically 64KB)
  - `minChunkSize`: avgChunkSize / 4
  - `maxChunkSize`: avgChunkSize * 4
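The derived bounds can be sketched as follows (an illustrative helper, not the actual `chunker` package code); with the typical 64KB average this yields the 16KB-256KB range mentioned above:

```go
package main

import "fmt"

// chunkBounds derives the FastCDC min/max chunk sizes from the
// configured average, following the avg/4 and avg*4 rule.
func chunkBounds(avgChunkSize int) (min, max int) {
	return avgChunkSize / 4, avgChunkSize * 4
}

func main() {
	min, max := chunkBounds(64 * 1024)
	fmt.Printf("min=%dKB max=%dKB\n", min/1024, max/1024) // min=16KB max=256KB
}
```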
#### 8. Packer (`blob.Packer`)
- **Created by**: `blob.NewPacker(PackerConfig)`
- **When**: Inside `snapshot.NewScanner()`
- **Configuration**:
  - `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
  - `CompressionLevel`: zstd level (1-19)
  - `Recipients`: age public keys for encryption

```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
	MaxBlobSize:      cfg.MaxBlobSize,
	CompressionLevel: cfg.CompressionLevel,
	Recipients:       cfg.AgeRecipients,
	Repositories:     cfg.Repositories,
	Fs:               cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```
## Module Responsibilities

### `internal/cli`
Entry point for the fx application. Combines all modules and handles signal interrupts.

Key functions:
- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper

### `internal/vaultik`
Main orchestrator containing all dependencies and command implementations.

Key methods:
- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots

### `internal/chunker`
Content-defined chunking using the FastCDC algorithm.

Key types:
- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize

Key methods:
- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)

### `internal/blob`
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.

Key types:
- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload

Key methods:
- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload
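The accumulate-then-finalize pattern behind these methods can be sketched with a simplified stand-in. This toy `packer` type is hypothetical and omits what the real `blob.Packer` does per blob (compression, encryption, hashing, mutex protection); it only shows how chunks fill the current blob until `MaxBlobSize` would be exceeded, at which point the blob is handed to the registered handler:

```go
package main

import "fmt"

// packer is a toy stand-in for blob.Packer: it accumulates chunk data
// and hands off a "blob" to onBlob whenever adding the next chunk would
// exceed maxBlobSize.
type packer struct {
	maxBlobSize int
	current     [][]byte
	currentSize int
	onBlob      func(chunks [][]byte) // stand-in for SetBlobHandler callback
}

func (p *packer) AddChunk(data []byte) {
	if p.currentSize+len(data) > p.maxBlobSize && len(p.current) > 0 {
		p.finalize()
	}
	p.current = append(p.current, data)
	p.currentSize += len(data)
}

// Flush finalizes any in-progress blob, as the real Flush() does.
func (p *packer) Flush() {
	if len(p.current) > 0 {
		p.finalize()
	}
}

func (p *packer) finalize() {
	p.onBlob(p.current)
	p.current, p.currentSize = nil, 0
}

func main() {
	blobs := 0
	p := &packer{maxBlobSize: 100, onBlob: func(chunks [][]byte) { blobs++ }}
	for i := 0; i < 5; i++ {
		p.AddChunk(make([]byte, 40)) // two 40-byte chunks fit per 100-byte blob
	}
	p.Flush()
	fmt.Println("blobs:", blobs) // blobs: 3
}
```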
### `internal/snapshot`

#### Scanner
Orchestrates the backup process for a directory.

Key methods:
- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation

Scan phases:
1. **Phase 0**: Detect deleted files from previous snapshots
2. **Phase 1**: Walk directory, identify files needing processing
3. **Phase 2**: Process files (chunk → pack → upload)

#### SnapshotManager
Manages snapshot lifecycle and metadata export.

Key methods:
- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots

### `internal/database`
SQLite database for the local index. Single-writer mode for thread safety.

Key types:
- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces

Repository interfaces:
- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)

## Snapshot Creation Flow

```
CreateSnapshot(opts)
  │
  ├─► CleanupIncompleteSnapshots()        // Critical: avoid dedup errors
  │
  ├─► SnapshotManager.CreateSnapshot()    // Create DB record
  │
  ├─► For each source directory:
  │     │
  │     ├─► scanner.Scan(ctx, path, snapshotID)
  │     │     │
  │     │     ├─► Phase 0: detectDeletedFiles()
  │     │     │
  │     │     ├─► Phase 1: scanPhase()
  │     │     │     Walk directory
  │     │     │     Check file metadata changes
  │     │     │     Build list of files to process
  │     │     │
  │     │     └─► Phase 2: processPhase()
  │     │           For each file:
  │     │             chunker.ChunkReaderStreaming()
  │     │             For each chunk:
  │     │               packer.AddChunk()
  │     │               If blob full → FinalizeBlob()
  │     │                 → handleBlobReady()
  │     │                 → s3Client.PutObjectWithProgress()
  │     │           packer.Flush()        // Final blob
  │     │
  │     └─► Accumulate statistics
  │
  ├─► SnapshotManager.UpdateSnapshotStatsExtended()
  │
  ├─► SnapshotManager.CompleteSnapshot()
  │
  └─► SnapshotManager.ExportSnapshotMetadata()
        │
        ├─► Copy database to temp file
        ├─► Clean to only current snapshot data
        ├─► Dump to SQL
        ├─► Compress with zstd
        ├─► Encrypt with age
        ├─► Upload db.zst.age to S3
        └─► Upload manifest.json.zst to S3
```
## Deduplication Strategy

1. **File-level**: Files unchanged since the last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)

2. **Chunk-level**: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.

3. **Blob-level**: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
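The file-level check can be sketched as a straight comparison of the listed metadata fields (the `fileMeta` struct here is an illustrative stand-in, not Vaultik's actual types):

```go
package main

import "fmt"

// fileMeta holds the fields compared for file-level deduplication:
// if none of them changed since the last snapshot, the file's chunks
// are assumed unchanged and the file is skipped entirely.
type fileMeta struct {
	size      int64
	mtimeUnix int64
	mode      uint32
	uid, gid  int
}

// unchanged reports whether the file can be skipped.
func unchanged(prev, cur fileMeta) bool {
	return prev == cur
}

func main() {
	prev := fileMeta{size: 1024, mtimeUnix: 1700000000, mode: 0644, uid: 1000, gid: 1000}
	cur := prev
	fmt.Println(unchanged(prev, cur)) // true

	cur.mtimeUnix++ // a touched mtime forces re-chunking
	fmt.Println(unchanged(prev, cur)) // false
}
```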
## Storage Layout in S3

```
bucket/
├── blobs/
│   └── {hash[0:2]}/
│       └── {hash[2:4]}/
│           └── {full-hash}          # Compressed+encrypted blob
│
└── metadata/
    └── {snapshot-id}/
        ├── db.zst.age               # Encrypted database dump
        └── manifest.json.zst        # Blob list (for verification)
```

## Thread Safety

- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses the `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.
---

**CLAUDE.md** (+16 lines)

```
@@ -10,6 +10,9 @@ Read the rules in AGENTS.md and follow them.

  corporate advertising for Anthropic and is therefore completely
  unacceptable in commit messages.

* NEVER use `git add -A`. Always add only the files you intentionally
  changed.

* Tests should always be run before committing code. No commits should be
  made that do not pass tests.

@@ -26,3 +29,16 @@ Read the rules in AGENTS.md and follow them.

* Do not stop working on a task until you have reached the definition of
  done provided to you in the initial instruction. Don't do part or most of
  the work, do all of the work until the criteria for done are met.

* We do not need to support migrations; schema upgrades can be handled by
  deleting the local state file and doing a full backup to re-create it.

* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
  estimate about 4 seconds per gigabyte of backup time.

* When running tests, don't run individual tests, or grep the output. Run
  the entire test suite every time and read the full output.

* When running tests, don't run individual tests, or try to grep the output.
  Never run "go test". Only ever run "make test" to run the full test
  suite, and examine the full output.
```
---

**DESIGN.md** (deleted, -385 lines)

# vaultik: Design Document

`vaultik` is a secure backup tool written in Go. It performs streaming backups using content-defined chunking, blob grouping, asymmetric encryption, and object storage. The system is designed for environments where the backup source host cannot store secrets and cannot retrieve or decrypt any data from the destination.

The source host is **stateful**: it maintains a local SQLite index to detect changes, deduplicate content, and track uploads across backup runs. All remote storage is encrypted and append-only. Pruning of unreferenced data is done from a trusted host with access to decryption keys, as even the metadata indices are encrypted in the blob store.

---

## Why

ANOTHER backup tool??

Other backup tools like `restic`, `borg`, and `duplicity` are designed for environments where the source host can store secrets and has access to decryption keys. I don't want to store backup decryption keys on my hosts, only public keys for encryption.

My requirements are:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* S3-compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## Design Goals

1. Backups must require only a public key on the source host.
2. No secrets or private keys may exist on the source system.
3. Obviously, restore must be possible using **only** the backup bucket and a private key.
4. Prune must be possible, although this requires a private key and so must be done on a different host.
5. All encryption is done using [`age`](https://github.com/FiloSottile/age) (X25519, XChaCha20-Poly1305).
6. Compression uses `zstd` at a configurable level.
7. Files are chunked, and multiple chunks are packed into encrypted blobs. This reduces the number of objects in the blob store for filesystems with many small files.
8. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
9. If a snapshot metadata file exceeds a configured size threshold, it is chunked into multiple encrypted `.age` parts, to support large filesystems.
10. The CLI interface is structured using `cobra`.

---

## S3 Bucket Layout

S3 stores only four things:

1) Blobs: encrypted, compressed packs of file chunks.
2) Metadata: encrypted SQLite databases containing the current state of the filesystem at the time of the snapshot.
3) Metadata hashes: encrypted hashes of the metadata SQLite databases.
4) Blob manifests: unencrypted compressed JSON files listing all blob hashes referenced in the snapshot, enabling pruning without decryption.

```
s3://<bucket>/<prefix>/
├── blobs/
│   ├── <aa>/<bb>/<full_blob_hash>.zst.age
├── metadata/
│   ├── <snapshot_id>.sqlite.age
│   ├── <snapshot_id>.sqlite.00.age
│   ├── <snapshot_id>.sqlite.01.age
│   ├── <snapshot_id>.manifest.json.zst
```
To retrieve a given file, you would:

* fetch `metadata/<snapshot_id>.sqlite.age` or `metadata/<snapshot_id>.sqlite.{seq}.age`
* fetch `metadata/<snapshot_id>.hash.age`
* decrypt the metadata SQLite database using the private key and reconstruct the full database file
* verify that the hash of the decrypted database matches the decrypted hash
* query the database for the file in question
* determine all chunks for the file
* for each chunk, look up the metadata for all blobs in the db
* fetch each blob from `blobs/<aa>/<bb>/<blob_hash>.zst.age`
* decrypt each blob using the private key
* decompress each blob using `zstd`
* reconstruct the file from the set of file chunks stored in the blobs

If clever, it may be possible to do this chunk by chunk without touching disk (except for the output file), as each uncompressed blob should fit in memory (<10GB).

### Path Rules

* `<snapshot_id>`: UTC timestamp in ISO 8601 format, e.g. `2023-10-01T12:00:00Z`. These are lexicographically sortable.
* `blobs/<aa>/<bb>/...`: where `aa` and `bb` are the first 2 hex bytes of the blob hash.

### Blob Manifest Format

The `<snapshot_id>.manifest.json.zst` file is an unencrypted, compressed JSON file containing:

```json
{
  "snapshot_id": "2023-10-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0...",
    ...
  ]
}
```
This allows pruning operations to determine which blobs are referenced without requiring decryption keys.

---

## 3. Local SQLite Index Schema (source host)

```sql
CREATE TABLE files (
    path TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL
);

-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
    path TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (path, idx)
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    blob_hash TEXT PRIMARY KEY,
    final_hash TEXT NOT NULL,
    created_ts INTEGER NOT NULL
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

-- Reverse mapping: tracks which files contain a given chunk
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_path)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    created_ts INTEGER NOT NULL,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL
);
```
---

## 4. Snapshot Metadata Schema (stored in S3)

Identical schema to the local index, filtered to live snapshot state. Stored as a SQLite DB, compressed with `zstd`, encrypted with `age`. If larger than a configured `chunk_size`, it is split and uploaded as:

```
metadata/<snapshot_id>.sqlite.00.age
metadata/<snapshot_id>.sqlite.01.age
...
```

---

## 5. Data Flow

### 5.1 Backup

1. Load config
2. Open local SQLite index
3. Walk source directories:
   * For each file:
     * Check mtime and size in index
     * If changed or new:
       * Chunk file
       * For each chunk:
         * Hash with SHA256
         * Check if already uploaded
         * If not:
           * Add chunk to blob packer
           * Record file-chunk mapping in index
4. When a blob reaches the threshold size (e.g. 1GB):
   * Compress with `zstd`
   * Encrypt with `age`
   * Upload to: `s3://<bucket>/<prefix>/blobs/<aa>/<bb>/<hash>.zst.age`
   * Record blob-chunk layout in local index
5. Once all files are processed:
   * Build snapshot SQLite DB from index delta
   * Compress + encrypt
   * If larger than `chunk_size`, split into parts
   * Upload to: `s3://<bucket>/<prefix>/metadata/<snapshot_id>.sqlite(.xx).age`
6. Create a snapshot record in the local index that lists:
   * snapshot ID
   * hostname
   * vaultik version
   * timestamp
   * counts of files, chunks, and blobs
   * list of all blobs referenced in the snapshot (some new, some old) for efficient pruning later
7. Create snapshot database for upload
8. Calculate checksum of snapshot database
9. Compress, encrypt, split, and upload to S3
10. Encrypt the hash of the snapshot database to the backup age key
11. Upload the encrypted hash to S3 as `metadata/<snapshot_id>.hash.age`
12. Create blob manifest JSON listing all blob hashes referenced in the snapshot
13. Compress the manifest with zstd and upload as `metadata/<snapshot_id>.manifest.json.zst`
14. Optionally prune remote blobs that are no longer referenced in the snapshot, based on the local state db
### 5.2 Manual Prune

1. List all objects under `metadata/`
2. Determine the latest valid `snapshot_id` by timestamp
3. Download and decompress the latest `<snapshot_id>.manifest.json.zst`
4. Extract the set of referenced blob hashes from the manifest (no decryption needed)
5. List all blob objects under `blobs/`
6. For each blob:
   * If the hash is not in the manifest:
     * Issue `DeleteObject` to remove it
### 5.3 Verify

Verify runs on a host that has no state, but access to the bucket.

1. Fetch the latest metadata snapshot files from S3
2. Fetch the latest metadata db hash from S3
3. Decrypt the hash using the private key
4. Decrypt the metadata SQLite database chunks using the private key and reassemble the snapshot db file
5. Calculate the SHA256 hash of the decrypted snapshot database
6. Verify the db file hash matches the decrypted hash
7. For each blob in the snapshot:
   * Fetch the blob metadata from the snapshot db
   * Ensure the blob exists in S3
   * Check the S3 content hash matches the expected blob hash
   * If not using --quick mode:
     * Download and decrypt the blob
     * Decompress and verify chunk hashes match metadata

---

## 6. CLI Commands

```
vaultik backup [--config <path>] [--cron] [--daemon] [--prune]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik snapshot list --bucket <bucket> --prefix <prefix> [--limit <n>]
vaultik snapshot rm --bucket <bucket> --prefix <prefix> --snapshot <id>
vaultik snapshot latest --bucket <bucket> --prefix <prefix>
```

* `VAULTIK_PRIVATE_KEY` is required for the `restore`, `prune`, `verify`, and `fetch` commands.
* It is passed via an environment variable containing the age private key.
---

## 7. Function and Method Signatures

### 7.1 CLI

```go
func RootCmd() *cobra.Command
func backupCmd() *cobra.Command
func restoreCmd() *cobra.Command
func pruneCmd() *cobra.Command
func verifyCmd() *cobra.Command
```

### 7.2 Configuration

```go
type Config struct {
	BackupPubKey      string        // age recipient
	BackupInterval    time.Duration // used in daemon mode, irrelevant for cron mode
	BlobSizeLimit     int64         // default 10GB
	ChunkSize         int64         // default 10MB
	Exclude           []string      // list of regexes of files to exclude from backup, absolute paths
	Hostname          string
	IndexPath         string        // path to local SQLite index db, default /var/lib/vaultik/index.db
	MetadataPrefix    string        // S3 prefix for metadata, default "metadata/"
	MinTimeBetweenRun time.Duration // minimum time between backup runs, default 1 hour - for daemon mode
	S3                S3Config      // S3 configuration
	ScanInterval      time.Duration // interval for a full stat() scan of source dirs, default 24h
	SourceDirs        []string      // list of source directories to back up, absolute paths
}

type S3Config struct {
	Endpoint        string
	Bucket          string
	Prefix          string
	AccessKeyID     string
	SecretAccessKey string
	Region          string
}

func Load(path string) (*Config, error)
```

### 7.3 Index

```go
type Index struct {
	db *sql.DB
}

func OpenIndex(path string) (*Index, error)

func (ix *Index) LookupFile(path string, mtime int64, size int64) ([]string, bool, error)
func (ix *Index) SaveFile(path string, mtime int64, size int64, chunkHashes []string) error
func (ix *Index) AddChunk(chunkHash string, size int64) error
func (ix *Index) MarkBlob(blobHash, finalHash string, created time.Time) error
func (ix *Index) MapChunkToBlob(blobHash, chunkHash string, offset, length int64) error
func (ix *Index) MapChunkToFile(chunkHash, filePath string, offset, length int64) error
```

### 7.4 Blob Packing

```go
type BlobWriter struct {
	// internal buffer, current size, encrypted writer, etc.
}

func NewBlobWriter(...) *BlobWriter
func (bw *BlobWriter) AddChunk(chunk []byte, chunkHash string) error
func (bw *BlobWriter) Flush() (finalBlobHash string, err error)
```

### 7.5 Metadata

```go
func BuildSnapshotMetadata(ix *Index, snapshotID string) (sqlitePath string, err error)
func EncryptAndUploadMetadata(path string, cfg *Config, snapshotID string) error
```

### 7.6 Prune

```go
func RunPrune(bucket, prefix, privateKey string) error
```
61
Dockerfile
Normal file
@@ -0,0 +1,61 @@
# Lint stage
# golangci/golangci-lint:v2.11.3-alpine, 2026-03-17
FROM golangci/golangci-lint:v2.11.3-alpine@sha256:b1c3de5862ad0a95b4e45a993b0f00415835d687e4f12c845c7493b86c13414e AS lint

RUN apk add --no-cache make build-base

WORKDIR /src

# Copy go mod files first for better layer caching
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Run formatting check and linter
RUN make fmt-check
RUN make lint

# Build stage
# golang:1.26.1-alpine, 2026-03-17
FROM golang:1.26.1-alpine@sha256:2389ebfa5b7f43eeafbd6be0c3700cc46690ef842ad962f6c5bd6be49ed82039 AS builder

# Depend on lint stage passing
COPY --from=lint /src/go.sum /dev/null

ARG VERSION=dev

# Install build dependencies for CGO (mattn/go-sqlite3) and sqlite3 CLI (tests)
RUN apk add --no-cache make build-base sqlite

WORKDIR /src

# Copy go mod files first for better layer caching
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Run tests
RUN make test

# Build with CGO enabled (required for mattn/go-sqlite3)
RUN CGO_ENABLED=1 go build -ldflags "-X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=${VERSION}' -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(git rev-parse HEAD 2>/dev/null || echo unknown)'" -o /vaultik ./cmd/vaultik

# Runtime stage
# alpine:3.21, 2026-02-25
FROM alpine:3.21@sha256:c3f8e73fdb79deaebaa2037150150191b9dcbfba68b4a46d70103204c53f4709

RUN apk add --no-cache ca-certificates sqlite

# Copy binary from builder
COPY --from=builder /vaultik /usr/local/bin/vaultik

# Create non-root user
RUN adduser -D -H -s /sbin/nologin vaultik

USER vaultik

ENTRYPOINT ["/usr/local/bin/vaultik"]
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Jeffrey Paul <sneak@sneak.berlin>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
55
Makefile
@@ -1,26 +1,25 @@
-.PHONY: test fmt lint build clean all
+.PHONY: test fmt lint fmt-check check build clean all docker hooks
 
-# Version number
-VERSION := 0.0.1
-
+# Build variables
+VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
-COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
+GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
 
 # Linker flags
 LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(COMMIT)'
+           -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
 
 # Default target
-all: test
+all: vaultik
 
 # Run tests
-test: lint fmt-check
-	go test -v ./...
+test:
+	go test -race -timeout 30s ./...
 
-# Check if code is formatted
+# Check if code is formatted (read-only)
 fmt-check:
-	@if [ -n "$$(go fmt ./...)" ]; then \
-		echo "Error: Code is not formatted. Run 'make fmt' to fix."; \
-		exit 1; \
-	fi
+	@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
 
 # Format code
 fmt:
@@ -28,22 +27,17 @@ fmt:
 
 # Run linter
 lint:
-	golangci-lint run
+	golangci-lint run ./...
 
 # Build binary
-build:
-	go build -ldflags "$(LDFLAGS)" -o vaultik ./cmd/vaultik
+vaultik: internal/*/*.go cmd/vaultik/*.go
+	go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
 
 # Clean build artifacts
 clean:
 	rm -f vaultik
 	go clean
 
 # Install dependencies
 deps:
 	go mod download
 	go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
 
 # Run tests with coverage
 test-coverage:
 	go test -v -coverprofile=coverage.out ./...
@@ -52,3 +46,24 @@ test-coverage:
 
 # Run integration tests
 test-integration:
 	go test -v -tags=integration ./...
+
+local:
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug list 2>&1
+	VAULTIK_CONFIG=$(HOME)/etc/vaultik/config.yml ./vaultik snapshot --debug create 2>&1
+
+install: vaultik
+	cp ./vaultik $(HOME)/bin/
+
+# Run all checks (formatting, linting, tests) without modifying files
+check: fmt-check lint test
+
+# Build Docker image
+docker:
+	docker build -t vaultik .
+
+# Install pre-commit hook
+hooks:
+	@printf '#!/bin/sh\nset -e\n' > .git/hooks/pre-commit
+	@printf 'go mod tidy\ngo fmt ./...\ngit diff --exit-code -- go.mod go.sum || { echo "go mod tidy changed files; please stage and retry"; exit 1; }\n' >> .git/hooks/pre-commit
+	@printf 'make check\n' >> .git/hooks/pre-commit
+	@chmod +x .git/hooks/pre-commit
556
PROCESS.md
Normal file
@@ -0,0 +1,556 @@
# Vaultik Snapshot Creation Process

This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.

## Database Schema Overview

### Tables and Foreign Key Dependencies

```
┌─────────────────────────────────────────────────────────────────────────┐
│ FOREIGN KEY GRAPH │
│ │
│ snapshots ◄────── snapshot_files ────────► files │
│ │ │ │
│ └───────── snapshot_blobs ────────► blobs │ │
│ │ │ │
│ │ ├──► file_chunks ◄── chunks│
│ │ │ ▲ │
│ │ └──► chunk_files ────┘ │
│ │ │
│ └──► blob_chunks ─────────────┘│
│ │
│ uploads ───────► blobs.blob_hash │
│ └──────────► snapshots.id │
└─────────────────────────────────────────────────────────────────────────┘
```

### Critical Constraint: `chunks` Must Exist First

These tables reference `chunks.chunk_hash` **without CASCADE**:

- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`

**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
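The constraint can be demonstrated directly with the `sqlite3` CLI. The two-table schema below is illustrative, not the real one; with `PRAGMA foreign_keys = ON`, inserting a `file_chunks` row before its `chunks` row fails with a FOREIGN KEY constraint error:

```shell
sqlite3 :memory: <<'SQL'
PRAGMA foreign_keys = ON;
CREATE TABLE chunks (chunk_hash TEXT PRIMARY KEY, size INTEGER NOT NULL);
CREATE TABLE file_chunks (
  file_id    TEXT NOT NULL,
  idx        INTEGER NOT NULL,
  chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash)
);
-- Fails: the referenced chunks row does not exist yet.
INSERT INTO file_chunks VALUES ('f1', 0, 'deadbeef');
SQL
```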
### Order of Operations Required by Schema

```
1. snapshots      (created first, before scan)
2. blobs          (created when packer starts new blob)
3. chunks         (created during file processing)
4. blob_chunks    (created immediately after chunk added to packer)
5. files          (created after file fully chunked)
6. file_chunks    (created with file record)
7. chunk_files    (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads       (created after blob uploaded)
```

---

## Snapshot Creation Phases

### Phase 0: Initialization

**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table

**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
    COMMIT
```

---

### Phase 1: Scan Directory

**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing

**Transactions:**
```
(None during scan - all in-memory)
```
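The classification step can be sketched as follows. The names `knownFile` and `classify` are illustrative, not the actual scanner identifiers; the real code compares against the in-memory `knownFiles` map loaded in Phase 0:

```go
package main

import "fmt"

// knownFile holds the fields Phase 1 compares: mtime and size as recorded
// in the local files table. Field names are illustrative.
type knownFile struct {
	mtime int64
	size  int64
}

// classify implements the Phase 1 decision: unchanged if metadata matches,
// modified if the path is known but metadata differs, new otherwise.
func classify(known map[string]knownFile, path string, mtime, size int64) string {
	kf, ok := known[path]
	switch {
	case !ok:
		return "new"
	case kf.mtime == mtime && kf.size == size:
		return "unchanged"
	default:
		return "modified"
	}
}

func main() {
	known := map[string]knownFile{
		"/etc/hosts":  {mtime: 100, size: 220},
		"/etc/passwd": {mtime: 100, size: 900},
	}
	fmt.Println(classify(known, "/etc/hosts", 100, 220))  // unchanged
	fmt.Println(classify(known, "/etc/passwd", 200, 905)) // modified
	fmt.Println(classify(known, "/etc/group", 100, 300))  // new
}
```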
---

### Phase 1b: Associate Unchanged Files

**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000

**Transactions:**
```
For each batch of 1000 file IDs:
T: BEGIN
   INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
   ... (up to 1000 inserts)
   COMMIT
```

---

### Phase 2: Process Files

For each file that needs processing:

#### Step 2a: Open and Chunk File

**Location:** `processFileStreaming()`

For each chunk produced by content-defined chunking:

##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
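The in-memory lookup can be sketched as a mutex-guarded set. Names here are illustrative, not the actual scanner fields; the important property is that `addKnownChunk` is only called after the chunk row has been committed, so a positive `chunkExists` answer implies the FK target is durable:

```go
package main

import (
	"fmt"
	"sync"
)

// chunkCache sketches the in-memory known-chunks set behind chunkExists /
// addKnownChunk. An RWMutex keeps lookups cheap under concurrent scanning.
type chunkCache struct {
	mu     sync.RWMutex
	hashes map[string]struct{}
}

func newChunkCache() *chunkCache {
	return &chunkCache{hashes: make(map[string]struct{})}
}

func (c *chunkCache) chunkExists(hash string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.hashes[hash]
	return ok
}

// addKnownChunk is called only after the chunk row has been committed.
func (c *chunkCache) addKnownChunk(hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hashes[hash] = struct{}{}
}

func main() {
	c := newChunkCache()
	fmt.Println(c.chunkExists("abc")) // false
	c.addKnownChunk("abc")
	fmt.Println(c.chunkExists("abc")) // true
}
```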
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
    return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns

// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```

**Transaction:**
```
T_chunk: BEGIN
    INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
    COMMIT
```

##### Step 2a-3: Add Chunk to Packer

```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```

**Inside packer.AddChunk → addChunkToCurrentBlob():**

```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
    blobChunk := &database.BlobChunk{
        BlobID:    p.currentBlob.id,
        ChunkHash: chunk.Hash,
        Offset:    offset,
        Length:    chunkSize,
    }
    err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
        return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
    })
    // COMMIT immediately
}
```

**Transaction:**
```
T_blob_chunk: BEGIN
    INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
    COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.

---

#### Step 2b: Blob Size Limit Handling

If adding a chunk would exceed the blob size limit:

```go
if err == blob.ErrBlobSizeLimitExceeded {
    if err := s.packer.FinalizeBlob(); err != nil { ... }
    // Retry adding the chunk
    if err := s.packer.AddChunk(...); err != nil { ... }
}
```

**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
    UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
    COMMIT
```

Then the blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)

T_blob_uploaded: BEGIN
    UPDATE blobs SET uploaded_ts=? WHERE id=?
    INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
    INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
    COMMIT
```

---
#### Step 2c: Queue File for Batch Insertion

After all chunks for a file are processed:

```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))

// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
    file:       fileToProcess.File,
    fileChunks: fileChunks,
    chunkFiles: chunkFiles,
})
```

**No transaction yet** - this just appends to the `pendingFiles` slice.

If `len(pendingFiles) >= fileBatchSize (100)`, this triggers `flushPendingFiles()`.

---

#### Step 2d: Flush Pending Files

**Location:** `flushPendingFiles()` - called when the batch is full or at the end of processing

```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    for _, data := range files {
        // 1. Create file record
        s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE

        // 2. Delete old associations
        s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
        s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)

        // 3. Create file_chunks records
        for _, fc := range data.fileChunks {
            s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
        }

        // 4. Create chunk_files records
        for _, cf := range data.chunkFiles {
            s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
        }

        // 5. Add file to snapshot
        s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
    }
    return nil
})
// COMMIT (all or nothing for the batch)
```

**Transaction:**
```
T_files_batch: BEGIN
    -- For each file in batch:
    INSERT OR REPLACE INTO files (...) VALUES (...)
    DELETE FROM file_chunks WHERE file_id = ?
    DELETE FROM chunk_files WHERE file_id = ?
    INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?) -- FK: chunks
    INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...) -- FK: chunks
    INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
    -- Repeat for each file
    COMMIT
```

**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.

---

### Phase 2 End: Final Flush

```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }

// Final packer flush
s.packer.Flush()
```

---
## The Current Bug

### Problem

The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, and if the chunks have not been committed yet, the FK constraint fails.

### Why It's Happening

Looking at the sequence:

1. Process file A, chunk X
2. Create chunk X in DB (transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)

The chunks ARE being created individually. But something is going wrong.

### Actual Issue

Wait - let me re-read the code. The issue is:

In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
    FileID:    fileToProcess.File.ID,
    Idx:       ci.fileChunk.Idx,
    ChunkHash: ci.fileChunk.ChunkHash,
}
```

The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!

Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
    return file, true // file.ID is empty string!
}

// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```

**For NEW files, `file.ID` is empty!**

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```

But `data.fileChunks` was built with the EMPTY ID!

### The Real Problem

For new files:

1. `checkFileInMemory` creates a file record with an empty ID
2. `processFileStreaming` queues file_chunks with an empty `FileID`
3. `flushPendingFiles` creates the file (generating an ID), but the queued file_chunks still carry the empty `FileID`

Wait, but `Files.Create` should be INSERT OR REPLACE by path, and the file struct should get updated... Let me check.

Actually, looking more carefully at the code path - the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID. The ID isn't updated after the file is created.

Hmm, but looking at the current code:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```

And in `checkFileInMemory` for new files, we create a file struct but don't set the ID. However, looking at the database repository, `Files.Create` should be doing `INSERT OR REPLACE` and the ID should be pre-generated...

Let me check if IDs are being generated. Looking at the File struct usage, it seems like UUIDs should be generated somewhere...

Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```

Error 787 is SQLite's extended result code `SQLITE_CONSTRAINT_FOREIGNKEY`. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.

So the chunks are NOT in the database when we try to insert file_chunks. Let me trace through more carefully...

---
## Transaction Timing Issue

The problem is transaction visibility in SQLite.

Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:

1. Chunk transactions commit one at a time
2. The file batch transaction runs later

If chunks are being inserted but something goes wrong with transaction isolation, the file batch might not see them.

But actually SQLite in WAL mode should have SERIALIZABLE isolation by default, so committed transactions should be visible.

Let me check if the in-memory cache is masking a database problem...

Actually, wait. Let me re-check the current broken code more carefully. The issue might be simpler.

---

## Current Code Flow Analysis

Looking at `processFileStreaming` in the current broken state:

```go
// For each chunk:
if !chunkExists {
    err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
        dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
        return s.repos.Chunks.Create(txCtx, tx, dbChunk)
    })
    // ... check error ...
    s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```

Then at the end of the function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
    file:       fileToProcess.File,
    fileChunks: fileChunks,
    chunkFiles: chunkFiles,
})
```

At the end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```

The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.

Unless... there's a bug in how the chunks are being referenced. Let me check if the chunk_hash values are correct.

Or... maybe the test database is being recreated between operations somehow?

Actually, let me check the test setup. Maybe the issue is specific to the test environment.

---
## Summary of Object Lifecycle

| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |

---

## Root Cause Analysis

After detailed analysis, I believe the issue is one of the following:

### Hypothesis 1: File ID Not Set

Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
    return file, true // file.ID is empty string!
}
```

For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // Empty for new files!
    ...
}
```

The `FileID` in the built `fileChunks` slice is empty.

Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
    s.repos.FileChunks.Create(...) // Uses empty FileID
}
```

**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
    ID:   uuid.New().String(), // Generate ID immediately
    Path: path,
    ...
}
```

### Hypothesis 2: Transaction Isolation

SQLite with a single connection pool (`MaxOpenConns(1)`) should serialize all transactions. Committed data should be visible to subsequent transactions.

However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.

## Recommended Fix

**Step 1: Generate file IDs upfront**

In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
    ID:   uuid.New().String(), // Always generate ID
    Path: path,
    ...
}
```

This ensures `file.ID` is set when building the `fileChunks` and `chunkFiles` slices.

**Step 2: Verify by reverting to per-file transactions**

If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:

```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})

// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    // Create file
    s.repos.Files.Create(txCtx, tx, fileToProcess.File)
    // Delete old associations
    s.repos.FileChunks.DeleteByFileID(...)
    s.repos.ChunkFiles.DeleteByFileID(...)
    // Create new associations
    for _, fc := range fileChunks {
        s.repos.FileChunks.Create(...)
    }
    for _, cf := range chunkFiles {
        s.repos.ChunkFiles.Create(...)
    }
    // Add to snapshot
    s.repos.Snapshots.AddFileByID(...)
    return nil
})
```

**Step 3: If batching is still desired**

After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.
484
README.md
@@ -1,11 +1,64 @@
-# vaultik
+# vaultik (ваултик)
 
-`vaultik` is a incremental backup daemon written in Go. It
-encrypts data using an `age` public key and uploads each encrypted blob
-directly to a remote S3-compatible object store. It requires no private
-keys, secrets, or credentials stored on the backed-up system.
+WIP: pre-1.0, some functions may not be fully implemented yet
 
----
+`vaultik` is an incremental backup daemon written in Go. It encrypts data
+using an `age` public key and uploads each encrypted blob directly to a
+remote S3-compatible object store. It requires no private keys, secrets, or
+credentials (other than those required to PUT to encrypted object storage,
+such as S3 API keys) stored on the backed-up system.
+
+It includes table-stakes features such as:
+
+* modern encryption (the excellent `age`)
+* deduplication
+* incremental backups
+* modern multithreaded zstd compression with configurable levels
+* content-addressed immutable storage
+* local state tracking in standard SQLite database, enables write-only
+  incremental backups to destination
+* no mutable remote metadata
+* no plaintext file paths or metadata stored in remote
+* does not create huge numbers of small files (to keep S3 operation counts
+  down) even if the source system has many small files
+
+## why
+
+Existing backup software fails under one or more of these conditions:
+
+* Requires secrets (passwords, private keys) on the source system, which
+  compromises encrypted backups in the case of host system compromise
+* Depends on symmetric encryption unsuitable for zero-trust environments
+* Creates one-blob-per-file, which results in excessive S3 operation counts
+* is slow
+
+Other backup tools like `restic`, `borg`, and `duplicity` are designed for
+environments where the source host can store secrets and has access to
+decryption keys. I don't want to store backup decryption keys on my hosts,
+only public keys for encryption.
+
+My requirements are:
+
+* open source
+* no passphrases or private keys on the source host
+* incremental
+* compressed
+* encrypted
+* s3 compatible without an intermediate step or tool
+
+Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
+
+## design goals
+
+1. Backups must require only a public key on the source host.
+1. No secrets or private keys may exist on the source system.
+1. Restore must be possible using **only** the backup bucket and a private key.
+1. Prune must be possible (requires private key, done on different hosts).
+1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
+1. Compression uses `zstd` at a configurable level.
+1. Files are chunked, and multiple chunks are packed into encrypted blobs
+   to reduce object count for filesystems with many small files.
+1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
 
 ## what
@@ -13,29 +66,12 @@ keys, secrets, or credentials stored on the backed-up system.
 
 content-addressable chunk map of changed files using deterministic chunking.
 Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
 encrypted with `age`, and uploaded directly to remote storage under a
-content-addressed S3 path.
+content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
+database of metadata is created, encrypted, and uploaded alongside the
+blobs.
 
-No plaintext file contents ever hit disk. No private key is needed or stored
-locally. All encrypted data is streaming-processed and immediately discarded
-once uploaded. Metadata is encrypted and pushed with the same mechanism.
-
-## why
-
-Existing backup software fails under one or more of these conditions:
-
-* Requires secrets (passwords, private keys) on the source system
-* Depends on symmetric encryption unsuitable for zero-trust environments
-* Stages temporary archives or repositories
-* Writes plaintext metadata or plaintext file paths
-
-`vaultik` addresses all of these by using:
-
-* Public-key-only encryption (via `age`) requires no secrets (other than
-  bucket access key) on the source system
-* Blob-level deduplication and batching
-* Local state cache for incremental detection
-* S3-native chunked upload interface
-* Self-contained encrypted snapshot metadata
+No plaintext file contents ever hit disk. No private key or secret
+passphrase is needed or stored locally.
 
 ## how
@@ -45,23 +81,38 @@ Existing backup software fails under one or more of these conditions:
 
 go install git.eeqj.de/sneak/vaultik@latest
 ```
 
-2. **generate keypair**
+1. **generate keypair**
 
 ```sh
 age-keygen -o agekey.txt
 grep 'public key:' agekey.txt
 ```
 
-3. **write config**
+1. **write config**
 
 ```yaml
-source_dirs:
+# Named snapshots - each snapshot can contain multiple paths
+snapshots:
+  system:
+    paths:
+      - /etc
+      - /home/user/data
+      - /var/lib
+    exclude:
+      - '*.cache' # Snapshot-specific exclusions
+  home:
+    paths:
+      - /home/user/documents
+      - /home/user/photos
+
+# Global exclusions (apply to all snapshots)
 exclude:
   - '*.log'
   - '*.tmp'
-age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+  - '.git'
+  - 'node_modules'
+
+age_recipients:
+  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
 s3:
   endpoint: https://s3.example.com
   bucket: vaultik-data
@@ -69,28 +120,24 @@ Existing backup software fails under one or more of these conditions:
|
||||
access_key_id: ...
|
||||
secret_access_key: ...
|
||||
region: us-east-1
|
||||
backup_interval: 1h # only used in daemon mode, not for --cron mode
|
||||
full_scan_interval: 24h # normally we use inotify to mark dirty, but
|
||||
# every 24h we do a full stat() scan
|
||||
min_time_between_run: 15m # again, only for daemon mode
|
||||
index_path: /var/lib/vaultik/index.sqlite
|
||||
backup_interval: 1h
|
||||
full_scan_interval: 24h
|
||||
min_time_between_run: 15m
|
||||
chunk_size: 10MB
|
||||
blob_size_limit: 10GB
|
||||
index_prefix: index/
|
||||
blob_size_limit: 1GB
|
||||
```
|
||||
|
||||
4. **run**
|
||||
1. **run**
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml
|
||||
```
|
||||
# Create all configured snapshots
|
||||
vaultik --config /etc/vaultik.yaml snapshot create
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml --cron # silent unless error
|
||||
```
|
||||
# Create specific snapshots by name
|
||||
vaultik --config /etc/vaultik.yaml snapshot create home system
|
||||
|
||||
```sh
|
||||
vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
|
||||
# Silent mode for cron
|
||||
vaultik --config /etc/vaultik.yaml snapshot create --cron
|
||||
```
|
||||
|
||||
---
|
||||
@@ -100,54 +147,233 @@ Existing backup software fails under one or more of these conditions:

### commands

```sh
vaultik backup [--config <path>] [--cron] [--daemon]
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```

### environment

* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file. If set, `vaultik backup` can be run without specifying the config file path.
* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.

### command details

**backup**: Perform incremental backup of configured directories
**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* `--config`: Override config file path
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup

**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt

**snapshot prune**: Clean orphaned data from local database

**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, mtime, and ownership (ownership requires root)
* Handles symlinks and directories
* Note: ctime cannot be restored (see [platform notes](#platform-specific-ctime-semantics))

**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs

**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename

**info**: Display system and configuration information

**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes

**store info**: Display S3 bucket configuration and storage statistics

---

## architecture

### s3 bucket layout

```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    ├── <snapshot_id>/
    │   ├── db.zst.age
    │   └── manifest.json.zst
```

* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning

### blob manifest format

The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:

```json
{
  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0..."
  ]
}
```

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).

### local sqlite schema

```sql
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    source_path TEXT NOT NULL DEFAULT '',
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    link_target TEXT
);

CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT NOT NULL UNIQUE,
    uncompressed INTEGER NOT NULL,
    compressed INTEGER NOT NULL,
    uploaded_at INTEGER
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL,
    total_size INTEGER NOT NULL,
    blob_size INTEGER NOT NULL,
    compression_ratio REAL NOT NULL
);

CREATE TABLE snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id)
);

CREATE TABLE snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id)
);
```

### data flow

#### backup

1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support

#### restore

1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid (ctime cannot be restored; see platform notes below)

### platform-specific ctime semantics

The `ctime` field in the files table stores a platform-dependent timestamp:

* **macOS (Darwin)**: `ctime` is the file's **birth time** — when the file was
  first created on disk. This value never changes after file creation, even if
  the file's content or metadata is modified.

* **Linux**: `ctime` is the **inode change time** — the last time the file's
  metadata (permissions, ownership, link count, etc.) was modified. This is NOT
  the file creation time. Linux did not expose birth time (via `statx(2)`) until
  kernel 4.11, and Go's `syscall` package does not yet surface it.

**Restore limitation**: `ctime` cannot be restored on either platform. On Linux,
the kernel manages the inode change time and userspace cannot set it. On macOS,
there is no standard POSIX API to set birth time. The `ctime` value is preserved
in the snapshot database for informational/forensic purposes only.

#### prune

1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set

### chunking

* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency

@@ -158,19 +384,13 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Each blob encrypted independently
* Metadata databases also encrypted

### storage
### compression

* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency

### state tracking

* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode

---

## does not

@@ -180,8 +400,6 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
* Require a symmetric passphrase or password
* Trust the source system with anything

---

## does

* Incremental deduplicated backup
@@ -193,90 +411,22 @@ vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]

---

## restore
## requirements

`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:

* `VAULTIK_PRIVATE_KEY`
* The bucket

The entire system is restore-only from object storage.

---

## features

### daemon mode

* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)

### cron mode

* Single backup run
* Silent output unless errors
* Ideal for scheduled backups

### metadata integrity

* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems

### exclusion patterns

* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk

## prune

Run `vaultik prune` on a machine with the private key. It:

* Downloads the most recent snapshot
* Decrypts metadata
* Lists referenced blobs
* Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.

---
* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

## license

WTFPL — see LICENSE.

---

## security considerations

* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix

## performance

* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels

## requirements

* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index

[MIT](https://opensource.org/license/mit/)

## author

sneak
[sneak@sneak.berlin](mailto:sneak@sneak.berlin)
[https://sneak.berlin](https://sneak.berlin)

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)
212 TODO.md
@@ -1,112 +1,128 @@

# Implementation TODO
# Vaultik 1.0 TODO

## Local Index Database
1. Implement SQLite schema creation
1. Create Index type with all database operations
1. Add transaction support and proper locking
1. Implement file tracking (save, lookup, delete)
1. Implement chunk tracking and deduplication
1. Implement blob tracking and chunk-to-blob mapping
1. Write tests for all index operations

Linear list of tasks to complete before 1.0 release.

## Chunking and Hashing
1. Implement Rabin fingerprint chunker
1. Create streaming chunk processor
1. Implement SHA256 hashing for chunks
1. Add configurable chunk size parameters
1. Write tests for chunking consistency

## Rclone Storage Backend (Complete)

## Compression and Encryption
1. Implement zstd compression wrapper
1. Integrate age encryption library
1. Create Encryptor type for public key encryption
1. Create Decryptor type for private key decryption
1. Implement streaming encrypt/decrypt pipelines
1. Write tests for compression and encryption

Add rclone as a storage backend via Go library import, allowing vaultik to use any of rclone's 70+ supported cloud storage providers.

## Blob Packing
1. Implement BlobWriter with size limits
1. Add chunk accumulation and flushing
1. Create blob hash calculation
1. Implement proper error handling and rollback
1. Write tests for blob packing scenarios

**Configuration:**
```yaml
storage_url: "rclone://myremote/path/to/backups"
```
User must have rclone configured separately (via `rclone config`).

## S3 Operations
1. Integrate MinIO client library
1. Implement S3Client wrapper type
1. Add multipart upload support for large blobs
1. Implement retry logic with exponential backoff
1. Add connection pooling and timeout handling
1. Write tests using MinIO container

**Implementation Steps:**
1. [x] Add rclone dependency to go.mod
2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
   - `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
   - `Put` / `PutWithProgress` - use `operations.Rcat()`
   - `Get` - use `fs.NewObject()` then `obj.Open()`
   - `Stat` - use `fs.NewObject()` for size/metadata
   - `Delete` - use `obj.Remove()`
   - `List` / `ListStream` - use `operations.ListFn()`
   - `Info` - return remote name
3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
5. [x] Test with real rclone remote

## Backup Command - Basic
1. Implement directory walking with exclusion patterns
1. Add file change detection using index
1. Integrate chunking pipeline for changed files
1. Implement blob upload coordination
1. Add progress reporting to stderr
1. Write integration tests for backup

**Error Mapping:**
- `fs.ErrorObjectNotFound` → `ErrNotFound`
- `fs.ErrorDirNotFound` → `ErrNotFound`
- `fs.ErrorNotFoundInConfigFile` → `ErrRemoteNotFound` (new)

## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations

---

## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore

## CLI Polish (Priority)

## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios

1. Improve error messages throughout
   - Ensure all errors include actionable context
   - Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")

## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification

## Security (Priority)

## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command

1. Audit encryption implementation
   - Verify age encryption is used correctly
   - Ensure no plaintext leaks in logs or errors
   - Verify blob hashes are computed correctly

## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior

1. Secure memory handling for secrets
   - Clear S3 credentials from memory after client init
   - Document that age_secret_key is env-var only (already implemented)

## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode

## Testing

## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline

1. Write integration tests for restore command

1. Write end-to-end integration test
   - Create backup
   - Verify backup
   - Restore backup
   - Compare restored files to originals

1. Add tests for edge cases
   - Empty directories
   - Symlinks
   - Special characters in filenames
   - Very large files (multi-GB)
   - Many small files (100k+)

1. Add tests for error conditions
   - Network failures during upload
   - Disk full during restore
   - Corrupted blobs
   - Missing blobs

## Performance

1. Profile and optimize restore performance
   - Parallel blob downloads
   - Streaming decompression/decryption
   - Efficient chunk reassembly

1. Add bandwidth limiting option
   - `--bwlimit` flag for upload/download speed limiting

## Documentation

1. Add man page or --help improvements
   - Detailed help for each command
   - Examples in help output

## Final Polish

1. Ensure version is set correctly in releases

1. Create release process
   - Binary releases for supported platforms
   - Checksums for binaries
   - Release notes template

1. Final code review
   - Remove debug statements
   - Ensure consistent code style

1. Tag and release v1.0.0

---

## Post-1.0 (Daemon Mode)

1. Implement inotify file watcher for Linux
   - Watch source directories for changes
   - Track dirty paths in memory

1. Implement FSEvents watcher for macOS
   - Watch source directories for changes
   - Track dirty paths in memory

1. Implement backup scheduler in daemon mode
   - Respect backup_interval config
   - Trigger backup when dirty paths exist and interval elapsed
   - Implement full_scan_interval for periodic full scans

1. Add proper signal handling for daemon
   - Graceful shutdown on SIGTERM/SIGINT
   - Complete in-progress backup before exit

1. Write tests for daemon mode
@@ -1,9 +1,41 @@
package main

import (
	"os"
	"runtime"
	"runtime/pprof"

	"git.eeqj.de/sneak/vaultik/internal/cli"
)

func main() {
	// CPU profiling: set VAULTIK_CPUPROFILE=/path/to/cpu.prof
	if cpuProfile := os.Getenv("VAULTIK_CPUPROFILE"); cpuProfile != "" {
		f, err := os.Create(cpuProfile)
		if err != nil {
			panic("could not create CPU profile: " + err.Error())
		}
		defer func() { _ = f.Close() }()
		if err := pprof.StartCPUProfile(f); err != nil {
			panic("could not start CPU profile: " + err.Error())
		}
		defer pprof.StopCPUProfile()
	}

	// Memory profiling: set VAULTIK_MEMPROFILE=/path/to/mem.prof
	if memProfile := os.Getenv("VAULTIK_MEMPROFILE"); memProfile != "" {
		defer func() {
			f, err := os.Create(memProfile)
			if err != nil {
				panic("could not create memory profile: " + err.Error())
			}
			defer func() { _ = f.Close() }()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				panic("could not write memory profile: " + err.Error())
			}
		}()
	}

	cli.CLIEntry()
}
332 config.example.yml Normal file
@@ -0,0 +1,332 @@

# vaultik configuration file example
# This file shows all available configuration options with their default values
# Copy this file and uncomment/modify the values you need

# Age recipient public keys for encryption
# This is REQUIRED - backups are encrypted to these public keys
# Generate with: age-keygen | grep "public key"
age_recipients:
  - age1cj2k2addawy294f6k2gr2mf9gps9r3syplryxca3nvxj3daqm96qfp84tz

# Named snapshots - each snapshot can contain multiple paths
# Each snapshot gets its own ID and can have snapshot-specific excludes
snapshots:
  testing:
    paths:
      - ~/dev/vaultik
  apps:
    paths:
      - /Applications
    exclude:
      - "/App Store.app"
      - "/Apps.app"
      - "/Automator.app"
      - "/Books.app"
      - "/Calculator.app"
      - "/Calendar.app"
      - "/Chess.app"
      - "/Clock.app"
      - "/Contacts.app"
      - "/Dictionary.app"
      - "/FaceTime.app"
      - "/FindMy.app"
      - "/Font Book.app"
      - "/Freeform.app"
      - "/Games.app"
      - "/GarageBand.app"
      - "/Home.app"
      - "/Image Capture.app"
      - "/Image Playground.app"
      - "/Journal.app"
      - "/Keynote.app"
      - "/Mail.app"
      - "/Maps.app"
      - "/Messages.app"
      - "/Mission Control.app"
      - "/Music.app"
      - "/News.app"
      - "/Notes.app"
      - "/Numbers.app"
      - "/Pages.app"
      - "/Passwords.app"
      - "/Phone.app"
      - "/Photo Booth.app"
      - "/Photos.app"
      - "/Podcasts.app"
      - "/Preview.app"
      - "/QuickTime Player.app"
      - "/Reminders.app"
      - "/Safari.app"
      - "/Shortcuts.app"
      - "/Siri.app"
      - "/Stickies.app"
      - "/Stocks.app"
      - "/System Settings.app"
      - "/TV.app"
      - "/TextEdit.app"
      - "/Time Machine.app"
      - "/Tips.app"
      - "/Utilities/Activity Monitor.app"
      - "/Utilities/AirPort Utility.app"
      - "/Utilities/Audio MIDI Setup.app"
      - "/Utilities/Bluetooth File Exchange.app"
      - "/Utilities/Boot Camp Assistant.app"
      - "/Utilities/ColorSync Utility.app"
      - "/Utilities/Console.app"
      - "/Utilities/Digital Color Meter.app"
      - "/Utilities/Disk Utility.app"
      - "/Utilities/Grapher.app"
      - "/Utilities/Magnifier.app"
      - "/Utilities/Migration Assistant.app"
      - "/Utilities/Print Center.app"
      - "/Utilities/Screen Sharing.app"
      - "/Utilities/Screenshot.app"
      - "/Utilities/Script Editor.app"
      - "/Utilities/System Information.app"
      - "/Utilities/Terminal.app"
      - "/Utilities/VoiceOver Utility.app"
      - "/VoiceMemos.app"
      - "/Weather.app"
      - "/iMovie.app"
      - "/iPhone Mirroring.app"
  home:
    paths:
      - "~"
    exclude:
      - "/.Trash"
      - "/tmp"
      - "/Library/Caches"
      - "/Library/Accounts"
      - "/Library/AppleMediaServices"
      - "/Library/Application Support/AddressBook"
      - "/Library/Application Support/CallHistoryDB"
      - "/Library/Application Support/CallHistoryTransactions"
      - "/Library/Application Support/DifferentialPrivacy"
      - "/Library/Application Support/FaceTime"
      - "/Library/Application Support/FileProvider"
      - "/Library/Application Support/Knowledge"
      - "/Library/Application Support/com.apple.TCC"
      - "/Library/Application Support/com.apple.avfoundation/Frecents"
      - "/Library/Application Support/com.apple.sharedfilelist"
      - "/Library/Assistant/SiriVocabulary"
      - "/Library/Autosave Information"
      - "/Library/Biome"
      - "/Library/ContainerManager"
      - "/Library/Containers/com.apple.Home"
      - "/Library/Containers/com.apple.Maps/Data/Maps"
      - "/Library/Containers/com.apple.MobileSMS"
      - "/Library/Containers/com.apple.Notes"
      - "/Library/Containers/com.apple.Safari"
      - "/Library/Containers/com.apple.Safari.WebApp"
      - "/Library/Containers/com.apple.VoiceMemos"
      - "/Library/Containers/com.apple.archiveutility"
      - "/Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents"
      - "/Library/Containers/com.apple.mail"
      - "/Library/Containers/com.apple.news"
      - "/Library/Containers/com.apple.stocks"
      - "/Library/Cookies"
      - "/Library/CoreFollowUp"
      - "/Library/Daemon Containers"
      - "/Library/DoNotDisturb"
      - "/Library/DuetExpertCenter"
      - "/Library/Group Containers/com.apple.Home.group"
      - "/Library/Group Containers/com.apple.MailPersonaStorage"
      - "/Library/Group Containers/com.apple.PreviewLegacySignaturesConversion"
      - "/Library/Group Containers/com.apple.bird"
      - "/Library/Group Containers/com.apple.stickersd.group"
      - "/Library/Group Containers/com.apple.systempreferences.cache"
      - "/Library/Group Containers/group.com.apple.AppleSpell"
      - "/Library/Group Containers/group.com.apple.ArchiveUtility.PKSignedContainer"
      - "/Library/Group Containers/group.com.apple.DeviceActivity"
      - "/Library/Group Containers/group.com.apple.Journal"
      - "/Library/Group Containers/group.com.apple.ManagedSettings"
      - "/Library/Group Containers/group.com.apple.PegasusConfiguration"
      - "/Library/Group Containers/group.com.apple.Safari.SandboxBroker"
      - "/Library/Group Containers/group.com.apple.SiriTTS"
      - "/Library/Group Containers/group.com.apple.UserNotifications"
      - "/Library/Group Containers/group.com.apple.VoiceMemos.shared"
      - "/Library/Group Containers/group.com.apple.accessibility.voicebanking"
      - "/Library/Group Containers/group.com.apple.amsondevicestoraged"
      - "/Library/Group Containers/group.com.apple.appstoreagent"
      - "/Library/Group Containers/group.com.apple.calendar"
      - "/Library/Group Containers/group.com.apple.chronod"
      - "/Library/Group Containers/group.com.apple.contacts"
      - "/Library/Group Containers/group.com.apple.controlcenter"
      - "/Library/Group Containers/group.com.apple.corerepair"
      - "/Library/Group Containers/group.com.apple.coreservices.useractivityd"
      - "/Library/Group Containers/group.com.apple.energykit"
      - "/Library/Group Containers/group.com.apple.feedback"
      - "/Library/Group Containers/group.com.apple.feedbacklogger"
      - "/Library/Group Containers/group.com.apple.findmy.findmylocateagent"
      - "/Library/Group Containers/group.com.apple.iCloudDrive"
      - "/Library/Group Containers/group.com.apple.icloud.fmfcore"
      - "/Library/Group Containers/group.com.apple.icloud.fmipcore"
      - "/Library/Group Containers/group.com.apple.icloud.searchpartyuseragent"
      - "/Library/Group Containers/group.com.apple.liveactivitiesd"
      - "/Library/Group Containers/group.com.apple.loginwindow.persistent-apps"
      - "/Library/Group Containers/group.com.apple.mail"
      - "/Library/Group Containers/group.com.apple.mlhost"
      - "/Library/Group Containers/group.com.apple.moments"
|
||||
- "/Library/Group Containers/group.com.apple.news"
|
||||
- "/Library/Group Containers/group.com.apple.newsd"
|
||||
- "/Library/Group Containers/group.com.apple.notes"
|
||||
- "/Library/Group Containers/group.com.apple.notes.import"
|
||||
- "/Library/Group Containers/group.com.apple.photolibraryd.private"
|
||||
- "/Library/Group Containers/group.com.apple.portrait.BackgroundReplacement"
|
||||
- "/Library/Group Containers/group.com.apple.printtool"
|
||||
- "/Library/Group Containers/group.com.apple.private.translation"
|
||||
- "/Library/Group Containers/group.com.apple.reminders"
|
||||
- "/Library/Group Containers/group.com.apple.replicatord"
|
||||
- "/Library/Group Containers/group.com.apple.scopedbookmarkagent"
|
||||
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
|
||||
- "/Library/Group Containers/group.com.apple.sharingd"
|
||||
- "/Library/Group Containers/group.com.apple.shortcuts"
|
||||
- "/Library/Group Containers/group.com.apple.siri.inference"
|
||||
- "/Library/Group Containers/group.com.apple.siri.referenceResolution"
|
||||
- "/Library/Group Containers/group.com.apple.siri.remembers"
|
||||
- "/Library/Group Containers/group.com.apple.siri.userfeedbacklearning"
|
||||
- "/Library/Group Containers/group.com.apple.spotlight"
|
||||
- "/Library/Group Containers/group.com.apple.stocks"
|
||||
- "/Library/Group Containers/group.com.apple.stocks-news"
|
||||
- "/Library/Group Containers/group.com.apple.studentd"
|
||||
- "/Library/Group Containers/group.com.apple.swtransparency"
|
||||
- "/Library/Group Containers/group.com.apple.telephonyutilities.callservicesd"
|
||||
- "/Library/Group Containers/group.com.apple.tips"
|
||||
- "/Library/Group Containers/group.com.apple.tipsnext"
|
||||
- "/Library/Group Containers/group.com.apple.transparency"
|
||||
- "/Library/Group Containers/group.com.apple.usernoted"
|
||||
- "/Library/Group Containers/group.com.apple.weather"
|
||||
- "/Library/HomeKit"
|
||||
- "/Library/IdentityServices"
|
||||
- "/Library/IntelligencePlatform"
|
||||
- "/Library/Mail"
|
||||
- "/Library/Messages"
|
||||
- "/Library/Metadata/CoreSpotlight"
|
||||
- "/Library/Metadata/com.apple.IntelligentSuggestions"
|
||||
- "/Library/PersonalizationPortrait"
|
||||
- "/Library/Safari"
|
||||
- "/Library/Sharing"
|
||||
- "/Library/Shortcuts"
|
||||
- "/Library/StatusKit"
|
||||
- "/Library/Suggestions"
|
||||
- "/Library/Trial"
|
||||
- "/Library/Weather"
|
||||
- "/Library/com.apple.aiml.instrumentation"
|
||||
- "/Movies/TV"
|
||||
system:
  paths:
    - /
  exclude:
    # Virtual/transient filesystems
    - /proc
    - /sys
    - /dev
    - /run
    - /tmp
    - /var/tmp
    - /var/run
    - /var/lock
    - /var/cache
    - /media
    - /mnt
    # Swap
    - /swapfile
    - /swap.img
    # Package manager caches
    - /var/cache/apt
    - /var/cache/yum
    - /var/cache/dnf
    - /var/cache/pacman
    # Trash
    - "*/.local/share/Trash"

dev:
  paths:
    - /Users/user/dev
  exclude:
    - "**/node_modules"
    - "**/target"
    - "**/build"
    - "**/__pycache__"
    - "**/*.pyc"
    - "**/.venv"
    - "**/vendor"

# Global patterns to exclude from all backups
exclude:
  - "*.tmp"

# Storage URL - use either this OR the s3 section below
# Supports: s3://bucket/prefix, file:///path, rclone://remote/path
storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"

# S3-compatible storage configuration
#s3:
# # S3-compatible endpoint URL
# # Examples: https://s3.amazonaws.com, https://storage.googleapis.com
# endpoint: http://10.100.205.122:8333
#
# # Bucket name where backups will be stored
# bucket: testbucket
#
# # Prefix (folder) within the bucket for this host's backups
# # Useful for organizing backups from multiple hosts
# # Default: empty (root of bucket)
# #prefix: "hosts/myserver/"
#
# # S3 access credentials
# access_key_id: Z9GT22M9YFU08WRMC5D4
# secret_access_key: Pi0tPKjFbN4rZlRhcA4zBtEkib04yy2WcIzI+AXk
#
# # S3 region
# # Default: us-east-1
# #region: us-east-1
#
# # Use SSL/TLS for S3 connections
# # Default: true
# #use_ssl: true
#
# # Part size for multipart uploads
# # Minimum 5MB, affects memory usage during upload
# # Supports: 5MB, 10M, 100MiB, etc.
# # Default: 5MB
# #part_size: 5MB

# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc.
# Default: 1h
#backup_interval: 1h

# How often to do a full filesystem scan in daemon mode
# Between full scans, inotify is used to detect changes
# Default: 24h
#full_scan_interval: 24h

# Minimum time between backup runs in daemon mode
# Prevents backups from running too frequently
# Default: 15m
#min_time_between_run: 15m

# Path to local SQLite index database
# This database tracks file state for incremental backups
# Default: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite

# Average chunk size for content-defined chunking
# Smaller chunks = better deduplication but more metadata
# Supports: 10MB, 5M, 1GB, 500KB, 64MiB, etc.
# Default: 10MB
#chunk_size: 10MB

# Maximum blob size
# Multiple chunks are packed into blobs up to this size
# Supports: 1GB, 10G, 500MB, 1GiB, etc.
# Default: 10GB
#blob_size_limit: 10GB

# Compression level (1-19)
# Higher = better compression but slower
# Default: 3
compression_level: 5

# Hostname to use in backup metadata
# Default: system hostname
#hostname: myserver
docs/DATAMODEL.md (new file, 268 lines)
@@ -0,0 +1,268 @@

# Vaultik Data Model

## Overview

Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.

**Important Notes:**
- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.

## Database Tables

### 1. `files`
Stores metadata about files in the filesystem being backed up.

**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID for the file record
- `path` (TEXT NOT NULL UNIQUE) - Absolute file path
- `mtime` (INTEGER NOT NULL) - Modification time as Unix timestamp
- `ctime` (INTEGER NOT NULL) - Change time as Unix timestamp
- `size` (INTEGER NOT NULL) - File size in bytes
- `mode` (INTEGER NOT NULL) - Unix file permissions and type
- `uid` (INTEGER NOT NULL) - User ID of file owner
- `gid` (INTEGER NOT NULL) - Group ID of file owner
- `link_target` (TEXT) - Symlink target path (NULL for regular files)

**Indexes:**
- `idx_files_path` on `path` for efficient lookups

**Purpose:** Tracks file metadata to detect changes between backup runs. Used for incremental backup decisions. The UUID primary key provides stable references that don't change if files are moved.

### 2. `chunks`
Stores information about content-defined chunks created from files.

**Columns:**
- `chunk_hash` (TEXT PRIMARY KEY) - SHA256 hash of chunk content
- `size` (INTEGER NOT NULL) - Chunk size in bytes

**Purpose:** Enables deduplication by tracking unique chunks across all files.

### 3. `file_chunks`
Maps files to their constituent chunks in order.

**Columns:**
- `file_id` (TEXT) - File ID (FK to files.id)
- `idx` (INTEGER) - Chunk index within file (0-based)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- PRIMARY KEY (`file_id`, `idx`)

**Purpose:** Allows reconstruction of files from chunks during restore.
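The `file_chunks` ordering is what makes restore possible: reading rows ordered by `idx` and concatenating the corresponding chunk contents yields the original file. A minimal sketch of that reassembly step follows; the `fileChunkRow` type and `reassemble` helper are illustrative names, not vaultik's actual implementation, and the chunk contents are assumed to have been fetched and decrypted already.

```go
package main

import (
	"bytes"
	"fmt"
)

// fileChunkRow mirrors one row of the file_chunks table: a chunk hash at a
// given 0-based index within the file. (Illustrative type, not vaultik's.)
type fileChunkRow struct {
	Idx       int
	ChunkHash string
}

// reassemble concatenates chunk contents in idx order, as a restore would
// after loading rows with
//   SELECT idx, chunk_hash FROM file_chunks WHERE file_id = ? ORDER BY idx
func reassemble(rows []fileChunkRow, chunkData map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	for i, r := range rows {
		if r.Idx != i {
			return nil, fmt.Errorf("missing chunk at index %d", i)
		}
		data, ok := chunkData[r.ChunkHash]
		if !ok {
			return nil, fmt.Errorf("chunk %s not available", r.ChunkHash)
		}
		buf.Write(data)
	}
	return buf.Bytes(), nil
}

func main() {
	rows := []fileChunkRow{{0, "aa"}, {1, "bb"}}
	chunks := map[string][]byte{"aa": []byte("hello "), "bb": []byte("world")}
	out, err := reassemble(rows, chunks)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // hello world
}
```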

### 4. `chunk_files`
Reverse mapping showing which files contain each chunk.

**Columns:**
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `file_id` (TEXT) - File ID (FK to files.id)
- `file_offset` (INTEGER) - Byte offset of chunk within file
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`chunk_hash`, `file_id`)

**Purpose:** Supports efficient queries for chunk usage and deduplication statistics.

### 5. `blobs`
Stores information about packed, compressed, and encrypted blob files.

**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID assigned when blob creation starts
- `blob_hash` (TEXT UNIQUE) - SHA256 hash of final blob (NULL until finalized)
- `created_ts` (INTEGER NOT NULL) - Creation timestamp
- `finished_ts` (INTEGER) - Finalization timestamp (NULL if in progress)
- `uncompressed_size` (INTEGER NOT NULL DEFAULT 0) - Total size of chunks before compression
- `compressed_size` (INTEGER NOT NULL DEFAULT 0) - Size after compression and encryption
- `uploaded_ts` (INTEGER) - Upload completion timestamp (NULL if not uploaded)

**Purpose:** Tracks blob lifecycle from creation through upload. The UUID primary key allows immediate association of chunks with blobs.

### 6. `blob_chunks`
Maps chunks to the blobs that contain them.

**Columns:**
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `offset` (INTEGER) - Byte offset of chunk within blob (before compression)
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`blob_id`, `chunk_hash`)

**Purpose:** Enables chunk retrieval from blobs during restore operations.

### 7. `snapshots`
Tracks backup snapshots.

**Columns:**
- `id` (TEXT PRIMARY KEY) - Snapshot ID (format: hostname-YYYYMMDD-HHMMSSZ)
- `hostname` (TEXT) - Hostname where backup was created
- `vaultik_version` (TEXT) - Version of Vaultik used
- `vaultik_git_revision` (TEXT) - Git revision of Vaultik used
- `started_at` (INTEGER) - Start timestamp
- `completed_at` (INTEGER) - Completion timestamp (NULL if in progress)
- `file_count` (INTEGER) - Number of files in snapshot
- `chunk_count` (INTEGER) - Number of unique chunks
- `blob_count` (INTEGER) - Number of blobs referenced
- `total_size` (INTEGER) - Total size of all files
- `blob_size` (INTEGER) - Total size of all blobs (compressed)
- `blob_uncompressed_size` (INTEGER) - Total uncompressed size of all referenced blobs
- `compression_ratio` (REAL) - Compression ratio achieved
- `compression_level` (INTEGER) - Compression level used for this snapshot
- `upload_bytes` (INTEGER) - Total bytes uploaded during this snapshot
- `upload_duration_ms` (INTEGER) - Total milliseconds spent uploading to S3

**Purpose:** Provides snapshot metadata and statistics including version tracking for compatibility.

### 8. `snapshot_files`
Maps snapshots to the files they contain.

**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `file_id` (TEXT) - File ID (FK to files.id)
- PRIMARY KEY (`snapshot_id`, `file_id`)

**Purpose:** Records which files are included in each snapshot.

### 9. `snapshot_blobs`
Maps snapshots to the blobs they reference.

**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `blob_hash` (TEXT) - Denormalized blob hash for manifest generation
- PRIMARY KEY (`snapshot_id`, `blob_id`)

**Purpose:** Tracks blob dependencies for snapshots and enables manifest generation.

### 10. `uploads`
Tracks blob upload metrics.

**Columns:**
- `blob_hash` (TEXT PRIMARY KEY) - Hash of uploaded blob
- `snapshot_id` (TEXT NOT NULL) - The snapshot that triggered this upload (FK to snapshots.id)
- `uploaded_at` (INTEGER) - Upload timestamp
- `size` (INTEGER) - Size of uploaded blob
- `duration_ms` (INTEGER) - Upload duration in milliseconds

**Purpose:** Performance monitoring and tracking which blobs were newly created (uploaded) during each snapshot.

## Data Flow and Operations

### 1. Backup Process

1. **File Scanning**
   - `INSERT OR REPLACE INTO files` - Update file metadata
   - `SELECT * FROM files WHERE path = ?` - Check if file has changed
   - `INSERT INTO snapshot_files` - Add file to current snapshot

2. **Chunking** (for changed files)
   - `INSERT OR IGNORE INTO chunks` - Store new chunks
   - `INSERT INTO file_chunks` - Map chunks to file
   - `INSERT INTO chunk_files` - Create reverse mapping

3. **Blob Packing**
   - `INSERT INTO blobs` - Create blob record with UUID (blob_hash NULL)
   - `INSERT INTO blob_chunks` - Associate chunks with blob immediately
   - `UPDATE blobs SET blob_hash = ?, finished_ts = ?` - Finalize blob after packing

4. **Upload**
   - `UPDATE blobs SET uploaded_ts = ?` - Mark blob as uploaded
   - `INSERT INTO uploads` - Record upload metrics with snapshot_id
   - `INSERT INTO snapshot_blobs` - Associate blob with snapshot

5. **Snapshot Completion**
   - `UPDATE snapshots SET completed_at = ?, stats...` - Finalize snapshot
   - Generate and upload blob manifest from `snapshot_blobs`

### 2. Incremental Backup

1. **Change Detection**
   - `SELECT * FROM files WHERE path = ?` - Get previous file metadata
   - Compare mtime, size, mode to detect changes
   - Skip unchanged files but still add to `snapshot_files`

2. **Chunk Reuse**
   - `SELECT * FROM blob_chunks WHERE chunk_hash = ?` - Find existing chunks
   - `INSERT INTO snapshot_blobs` - Reference existing blobs for unchanged files

### 3. Snapshot Metadata Export

After a snapshot is completed:
1. Copy database to temporary file
2. Clean temporary database to contain only current snapshot data
3. Export to SQL dump using sqlite3
4. Compress with zstd and encrypt with age
5. Upload to S3 as `metadata/{snapshot-id}/db.zst.age`
6. Generate blob manifest and upload as `metadata/{snapshot-id}/manifest.json.zst`

### 4. Restore Process

The restore process doesn't use the local database. Instead:
1. Downloads snapshot metadata from S3
2. Downloads required blobs based on manifest
3. Reconstructs files from decrypted and decompressed chunks

### 5. Pruning

1. **Identify Unreferenced Blobs**
   - Query blobs not referenced by any remaining snapshot
   - Delete from S3 and local database
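Logically, identifying unreferenced blobs is a set difference: everything in storage minus everything any manifest still names. A small sketch of that computation, with illustrative names (not vaultik's implementation):

```go
package main

import "fmt"

// unreferenced returns blob hashes present in storage but absent from
// every snapshot manifest — the candidates a prune may delete.
func unreferenced(stored []string, manifests [][]string) []string {
	referenced := make(map[string]bool)
	for _, m := range manifests {
		for _, h := range m {
			referenced[h] = true
		}
	}
	var orphans []string
	for _, h := range stored {
		if !referenced[h] {
			orphans = append(orphans, h)
		}
	}
	return orphans
}

func main() {
	stored := []string{"aaaa", "bbbb", "cccc"}
	manifests := [][]string{{"aaaa"}, {"aaaa", "cccc"}}
	fmt.Println(unreferenced(stored, manifests)) // [bbbb]
}
```

Because the decision depends only on blob hashes, the prune can run against the unencrypted manifests without any decryption keys.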

### 6. Incomplete Snapshot Cleanup

Before each backup:
1. Query incomplete snapshots (where `completed_at IS NULL`)
2. Check if metadata exists in S3
3. If no metadata, delete snapshot and all associations
4. Clean up orphaned files, chunks, and blobs

## Repository Pattern

Vaultik uses a repository pattern for database access:

- `FileRepository` - CRUD operations for files and file metadata
- `ChunkRepository` - CRUD operations for content chunks
- `FileChunkRepository` - Manage file-to-chunk mappings
- `ChunkFileRepository` - Manage chunk-to-file reverse mappings
- `BlobRepository` - Manage blob lifecycle (creation, finalization, upload)
- `BlobChunkRepository` - Manage blob-to-chunk associations
- `SnapshotRepository` - Manage snapshots and their relationships
- `UploadRepository` - Track blob upload metrics

Each repository provides methods like:
- `Create()` - Insert new record
- `GetByID()` / `GetByPath()` / `GetByHash()` - Retrieve records
- `Update()` - Update existing records
- `Delete()` - Remove records
- Specialized queries for each entity type (e.g., `DeleteOrphaned()`, `GetIncompleteByHostname()`)

## Transaction Management

All database operations that modify multiple tables are wrapped in transactions:

```go
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
	// Multiple repository operations using tx
	return nil
})
```

This ensures consistency, which is especially important for operations like:
- Creating file-chunk mappings
- Associating chunks with blobs
- Updating snapshot statistics

## Performance Considerations

1. **Indexes**:
   - Primary keys are automatically indexed
   - `idx_files_path` on `files(path)` for efficient file lookups

2. **Prepared Statements**: All queries use prepared statements for performance and security

3. **Batch Operations**: Where possible, operations are batched within transactions

4. **Write-Ahead Logging**: SQLite WAL mode is enabled for better concurrency

## Data Integrity

1. **Foreign Keys**: Enforced through CASCADE DELETE and application-level repository methods
2. **Unique Constraints**: Chunk hashes, file paths, and blob hashes are unique
3. **Null Handling**: Nullable fields clearly indicate in-progress operations
4. **Timestamp Tracking**: All major operations record timestamps for auditing
docs/REPOSTRUCTURE.md (new file, 143 lines)
@@ -0,0 +1,143 @@

# Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

## Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:
1. **Blobs** - The actual backup data (content-addressed, encrypted)
2. **Metadata** - Snapshot information and manifests (partially encrypted)

## Directory Structure

```
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
```

## Blobs Directory (`blobs/`)

### Structure
- **Path format**: `blobs/<first-2-chars>/<next-2-chars>/<full-hash>`
- **Example**: `blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678`
- **Sharding**: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects

### Content
- **What it contains**: Packed collections of content-defined chunks from files
- **Format**: Zstandard compressed, then Age encrypted
- **Encryption**: Always encrypted with Age using the configured recipients
- **Naming**: Content-addressed using SHA256 hash of the encrypted blob

### Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

## Metadata Directory (`metadata/`)

Each snapshot has its own subdirectory named with the snapshot ID.

### Snapshot ID Format
- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
- **Example**: `laptop-20240115-143052Z`
- **Components**:
  - Hostname (may contain hyphens)
  - Date in YYYYMMDD format
  - Time in HHMMSSZ format (Z indicates UTC)

### Files in Each Snapshot Directory

#### `db.zst.age` - Encrypted Database Dump
- **What it contains**: Complete SQLite database dump for this snapshot
- **Format**: SQL dump → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership

#### `manifest.json.zst` - Unencrypted Blob Manifest
- **What it contains**: JSON list of all blob hashes referenced by this snapshot
- **Format**: JSON → Zstandard compressed (NOT encrypted)
- **Encryption**: NOT encrypted
- **Purpose**: Enables pruning operations without requiring decryption keys
- **Structure**:
```json
{
  "snapshot_id": "laptop-20240115-143052Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
```

### Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
1. **Pruning operations** - Identifying unreferenced blobs for deletion
2. **Storage analysis** - Understanding space usage without decryption
3. **Verification** - Checking blob existence without decryption
4. **Cross-snapshot deduplication analysis** - Finding shared blobs between snapshots

The manifest only contains blob hashes, not file names or any other sensitive information.

## Security Considerations

### What's Encrypted
- **All file content** (in blobs)
- **All file metadata** (paths, permissions, timestamps, ownership in db.zst.age)
- **File-to-chunk mappings** (in db.zst.age)

### What's Not Encrypted
- **Blob hashes** (in manifest.json.zst)
- **Snapshot IDs** (directory names)
- **Blob count per snapshot** (in manifest.json.zst)

### Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob

An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files

## Consistency Guarantees

1. **Blobs are immutable** - Once written, a blob is never modified
2. **Blobs are written before metadata** - A snapshot's metadata is only written after all its blobs are successfully uploaded
3. **Metadata is written atomically** - Both db.zst.age and manifest.json.zst are written as complete files
4. **Snapshots are marked complete in local DB only after metadata upload** - Ensures consistency between local and remote state

## Pruning Safety

The prune operation is safe because:
1. It only deletes blobs not referenced in any manifest
2. Manifests are unencrypted and can be read without keys
3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs

## Restoration Requirements

To restore from a backup, you need:
1. **The Age private key** - To decrypt blobs and database
2. **The snapshot metadata** - Both files from the snapshot's metadata directory
3. **All referenced blobs** - As listed in the manifest

The restoration process:
1. Download and decrypt the database dump to understand file structure
2. Download and decrypt the required blobs
3. Reconstruct files from their chunks
4. Restore file metadata (permissions, timestamps, etc.)
go.mod (295 changed lines)
@@ -1,28 +1,305 @@
module git.eeqj.de/sneak/vaultik

go 1.24.4
go 1.26.1

require (
	github.com/spf13/cobra v1.9.1
	filippo.io/age v1.2.1
	git.eeqj.de/sneak/smartconfig v1.0.0
	github.com/adrg/xdg v0.5.3
	github.com/aws/aws-sdk-go-v2 v1.39.6
	github.com/aws/aws-sdk-go-v2/config v1.31.17
	github.com/aws/aws-sdk-go-v2/credentials v1.18.21
	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.20.4
	github.com/aws/aws-sdk-go-v2/service/s3 v1.90.0
	github.com/aws/smithy-go v1.23.2
	github.com/dustin/go-humanize v1.0.1
	github.com/gobwas/glob v0.2.3
	github.com/google/uuid v1.6.0
	github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
	github.com/klauspost/compress v1.18.1
	github.com/mattn/go-sqlite3 v1.14.29
	github.com/rclone/rclone v1.72.1
	github.com/schollz/progressbar/v3 v3.19.0
	github.com/spf13/afero v1.15.0
	github.com/spf13/cobra v1.10.1
	github.com/stretchr/testify v1.11.1
	go.uber.org/fx v1.24.0
	golang.org/x/term v0.37.0
	gopkg.in/yaml.v3 v3.0.1
	modernc.org/sqlite v1.38.0
)

require (
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/google/uuid v1.6.0 // indirect
	cloud.google.com/go/auth v0.17.0 // indirect
	cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
	cloud.google.com/go/compute/metadata v0.9.0 // indirect
	cloud.google.com/go/iam v1.5.2 // indirect
	cloud.google.com/go/secretmanager v1.15.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/keyvault/azsecrets v0.12.0 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3 // indirect
	github.com/Azure/azure-sdk-for-go/sdk/storage/azfile v1.5.3 // indirect
	github.com/Azure/go-ntlmssp v0.0.2-0.20251110135918-10b7b7e7cd26 // indirect
	github.com/AzureAD/microsoft-authentication-library-for-go v1.6.0 // indirect
	github.com/Files-com/files-sdk-go/v3 v3.2.264 // indirect
	github.com/IBM/go-sdk-core/v5 v5.21.0 // indirect
	github.com/Max-Sum/base32768 v0.0.0-20230304063302-18e6ce5945fd // indirect
	github.com/Microsoft/go-winio v0.6.2 // indirect
	github.com/ProtonMail/bcrypt v0.0.0-20211005172633-e235017c1baf // indirect
	github.com/ProtonMail/gluon v0.17.1-0.20230724134000-308be39be96e // indirect
	github.com/ProtonMail/go-crypto v1.3.0 // indirect
	github.com/ProtonMail/go-mime v0.0.0-20230322103455-7d82a3887f2f // indirect
	github.com/ProtonMail/go-srp v0.0.7 // indirect
	github.com/ProtonMail/gopenpgp/v2 v2.9.0 // indirect
	github.com/PuerkitoBio/goquery v1.10.3 // indirect
	github.com/a1ex3/zstd-seekable-format-go/pkg v0.10.0 // indirect
	github.com/abbot/go-http-auth v0.4.0 // indirect
	github.com/anchore/go-lzo v0.1.0 // indirect
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc // indirect
	github.com/armon/go-metrics v0.4.1 // indirect
	github.com/aws/aws-sdk-go v1.44.256 // indirect
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.3 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.13 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.4 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.13 // indirect
	github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.35.8 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.30.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.5 // indirect
	github.com/aws/aws-sdk-go-v2/service/sts v1.39.1 // indirect
	github.com/bahlo/generic-list-go v0.2.0 // indirect
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/boombuler/barcode v1.1.0 // indirect
	github.com/bradenaw/juniper v0.15.3 // indirect
	github.com/bradfitz/iter v0.0.0-20191230175014-e8f45d346db8 // indirect
	github.com/buengese/sgzip v0.1.1 // indirect
	github.com/buger/jsonparser v1.1.1 // indirect
	github.com/calebcase/tmpfile v1.0.3 // indirect
	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 // indirect
	github.com/clipperhouse/stringish v0.1.1 // indirect
	github.com/clipperhouse/uax29/v2 v2.3.0 // indirect
	github.com/cloudflare/circl v1.6.1 // indirect
	github.com/cloudinary/cloudinary-go/v2 v2.13.0 // indirect
	github.com/cloudsoda/go-smb2 v0.0.0-20250228001242-d4c70e6251cc // indirect
	github.com/cloudsoda/sddl v0.0.0-20250224235906-926454e91efc // indirect
	github.com/colinmarc/hdfs/v2 v2.4.0 // indirect
	github.com/coreos/go-semver v0.3.1 // indirect
	github.com/coreos/go-systemd/v22 v22.6.0 // indirect
	github.com/creasty/defaults v1.8.0 // indirect
	github.com/cronokirby/saferith v0.33.0 // indirect
	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
	github.com/diskfs/go-diskfs v1.7.0 // indirect
	github.com/dropbox/dropbox-sdk-go-unofficial/v6 v6.0.5 // indirect
	github.com/ebitengine/purego v0.9.1 // indirect
	github.com/emersion/go-message v0.18.2 // indirect
	github.com/emersion/go-vcard v0.0.0-20241024213814-c9703dde27ff // indirect
	github.com/emicklei/go-restful/v3 v3.11.0 // indirect
	github.com/fatih/color v1.16.0 // indirect
	github.com/felixge/httpsnoop v1.0.4 // indirect
	github.com/flynn/noise v1.1.0 // indirect
	github.com/fxamacker/cbor/v2 v2.7.0 // indirect
	github.com/gabriel-vasile/mimetype v1.4.11 // indirect
	github.com/geoffgarside/ber v1.2.0 // indirect
	github.com/go-chi/chi/v5 v5.2.3 // indirect
	github.com/go-darwin/apfs v0.0.0-20211011131704-f84b94dbf348 // indirect
|
||||
github.com/go-git/go-billy/v5 v5.6.2 // indirect
|
||||
github.com/go-jose/go-jose/v4 v4.1.2 // indirect
|
||||
github.com/go-logr/logr v1.4.3 // indirect
|
||||
github.com/go-logr/stdr v1.2.2 // indirect
|
||||
github.com/go-ole/go-ole v1.3.0 // indirect
|
||||
github.com/go-openapi/errors v0.22.4 // indirect
|
||||
github.com/go-openapi/jsonpointer v0.21.0 // indirect
|
||||
github.com/go-openapi/jsonreference v0.20.2 // indirect
|
||||
github.com/go-openapi/strfmt v0.25.0 // indirect
|
||||
github.com/go-openapi/swag v0.23.0 // indirect
|
||||
github.com/go-playground/locales v0.14.1 // indirect
|
||||
github.com/go-playground/universal-translator v0.18.1 // indirect
|
||||
github.com/go-playground/validator/v10 v10.28.0 // indirect
|
||||
github.com/go-resty/resty/v2 v2.16.5 // indirect
|
||||
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
|
||||
github.com/gofrs/flock v0.13.0 // indirect
|
||||
github.com/gogo/protobuf v1.3.2 // indirect
|
||||
github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
|
||||
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
|
||||
github.com/golang/protobuf v1.5.4 // indirect
|
||||
github.com/google/btree v1.1.3 // indirect
|
||||
github.com/google/gnostic-models v0.6.9 // indirect
|
||||
github.com/google/go-cmp v0.7.0 // indirect
|
||||
github.com/google/s2a-go v0.1.9 // indirect
|
||||
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
|
||||
github.com/googleapis/gax-go/v2 v2.15.0 // indirect
|
||||
github.com/gopherjs/gopherjs v1.17.2 // indirect
|
||||
github.com/gorilla/schema v1.4.1 // indirect
|
||||
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
|
||||
github.com/hashicorp/consul/api v1.32.1 // indirect
|
||||
github.com/hashicorp/errwrap v1.1.0 // indirect
|
||||
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
|
||||
github.com/hashicorp/go-hclog v1.6.3 // indirect
|
||||
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
|
||||
github.com/hashicorp/go-multierror v1.1.1 // indirect
|
||||
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
|
||||
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
|
||||
github.com/hashicorp/go-secure-stdlib/parseutil v0.1.6 // indirect
|
||||
github.com/hashicorp/go-secure-stdlib/strutil v0.1.2 // indirect
|
||||
github.com/hashicorp/go-sockaddr v1.0.2 // indirect
|
||||
github.com/hashicorp/go-uuid v1.0.3 // indirect
|
||||
github.com/hashicorp/golang-lru v0.5.4 // indirect
|
||||
github.com/hashicorp/hcl v1.0.1-vault-7 // indirect
|
||||
github.com/hashicorp/serf v0.10.1 // indirect
|
||||
github.com/hashicorp/vault/api v1.20.0 // indirect
|
||||
github.com/henrybear327/Proton-API-Bridge v1.0.0 // indirect
|
||||
github.com/henrybear327/go-proton-api v1.0.0 // indirect
|
||||
github.com/inconshreveable/mousetrap v1.1.0 // indirect
|
||||
github.com/jcmturner/aescts/v2 v2.0.0 // indirect
|
||||
github.com/jcmturner/dnsutils/v2 v2.0.0 // indirect
|
||||
github.com/jcmturner/gofork v1.7.6 // indirect
|
||||
github.com/jcmturner/goidentity/v6 v6.0.1 // indirect
|
||||
github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
|
||||
github.com/jcmturner/rpc/v2 v2.0.3 // indirect
|
||||
github.com/jlaffaye/ftp v0.2.1-0.20240918233326-1b970516f5d3 // indirect
|
||||
github.com/josharian/intern v1.0.0 // indirect
|
||||
github.com/json-iterator/go v1.1.12 // indirect
|
||||
github.com/jtolds/gls v4.20.0+incompatible // indirect
|
||||
github.com/jtolio/noiseconn v0.0.0-20231127013910-f6d9ecbf1de7 // indirect
|
||||
github.com/jzelinskie/whirlpool v0.0.0-20201016144138-0675e54bb004 // indirect
|
||||
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
|
||||
github.com/koofr/go-httpclient v0.0.0-20240520111329-e20f8f203988 // indirect
|
||||
github.com/koofr/go-koofrclient v0.0.0-20221207135200-cbd7fc9ad6a6 // indirect
|
||||
github.com/kr/fs v0.1.0 // indirect
|
||||
github.com/kylelemons/godebug v1.1.0 // indirect
|
||||
github.com/lanrat/extsort v1.4.2 // indirect
|
||||
github.com/leodido/go-urn v1.4.0 // indirect
|
||||
github.com/lpar/date v1.0.0 // indirect
|
||||
github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
|
||||
github.com/mailru/easyjson v0.9.1 // indirect
|
||||
github.com/mattn/go-colorable v0.1.14 // indirect
|
||||
github.com/mattn/go-isatty v0.0.20 // indirect
|
||||
github.com/mattn/go-runewidth v0.0.19 // indirect
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
|
||||
github.com/mitchellh/go-homedir v1.1.0 // indirect
|
||||
github.com/mitchellh/mapstructure v1.5.0 // indirect
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
|
||||
github.com/modern-go/reflect2 v1.0.2 // indirect
|
||||
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
|
||||
github.com/ncruces/go-strftime v0.1.9 // indirect
|
||||
github.com/ncw/swift/v2 v2.0.5 // indirect
|
||||
github.com/oklog/ulid v1.3.1 // indirect
|
||||
github.com/onsi/ginkgo/v2 v2.23.3 // indirect
|
||||
github.com/oracle/oci-go-sdk/v65 v65.104.0 // indirect
|
||||
github.com/panjf2000/ants/v2 v2.11.3 // indirect
|
||||
github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
|
||||
github.com/pengsrc/go-shared v0.2.1-0.20190131101655-1999055a4a14 // indirect
|
||||
github.com/peterh/liner v1.2.2 // indirect
|
||||
github.com/pierrec/lz4/v4 v4.1.22 // indirect
|
||||
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
|
||||
github.com/pkg/errors v0.9.1 // indirect
|
||||
github.com/pkg/sftp v1.13.10 // indirect
|
||||
github.com/pkg/xattr v0.4.12 // indirect
|
||||
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
|
||||
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
|
||||
github.com/pquerna/otp v1.5.0 // indirect
|
||||
github.com/prometheus/client_golang v1.23.2 // indirect
|
||||
github.com/prometheus/client_model v0.6.2 // indirect
|
||||
github.com/prometheus/common v0.67.2 // indirect
|
||||
github.com/prometheus/procfs v0.19.2 // indirect
|
||||
github.com/putdotio/go-putio/putio v0.0.0-20200123120452-16d982cac2b8 // indirect
|
||||
github.com/relvacode/iso8601 v1.7.0 // indirect
|
||||
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
|
||||
github.com/spf13/pflag v1.0.6 // indirect
|
||||
github.com/rfjakob/eme v1.1.2 // indirect
|
||||
github.com/rivo/uniseg v0.4.7 // indirect
|
||||
github.com/ryanuber/go-glob v1.0.0 // indirect
|
||||
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
|
||||
github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect
|
||||
github.com/samber/lo v1.52.0 // indirect
|
||||
github.com/shirou/gopsutil/v4 v4.25.10 // indirect
|
||||
github.com/sirupsen/logrus v1.9.4-0.20230606125235-dd1b4c2e81af // indirect
|
||||
github.com/skratchdot/open-golang v0.0.0-20200116055534-eef842397966 // indirect
|
||||
github.com/smarty/assertions v1.16.0 // indirect
|
||||
github.com/sony/gobreaker v1.0.0 // indirect
|
||||
github.com/spacemonkeygo/monkit/v3 v3.0.25-0.20251022131615-eb24eb109368 // indirect
|
||||
github.com/spf13/pflag v1.0.10 // indirect
|
||||
github.com/t3rm1n4l/go-mega v0.0.0-20251031123324-a804aaa87491 // indirect
|
||||
github.com/tidwall/gjson v1.18.0 // indirect
|
||||
github.com/tidwall/match v1.1.1 // indirect
|
||||
github.com/tidwall/pretty v1.2.0 // indirect
|
||||
github.com/tklauser/go-sysconf v0.3.15 // indirect
|
||||
github.com/tklauser/numcpus v0.10.0 // indirect
|
||||
github.com/ulikunitz/xz v0.5.15 // indirect
|
||||
github.com/unknwon/goconfig v1.0.0 // indirect
|
||||
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
|
||||
github.com/x448/float16 v0.8.4 // indirect
|
||||
github.com/xanzy/ssh-agent v0.3.3 // indirect
|
||||
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
|
||||
github.com/yunify/qingstor-sdk-go/v3 v3.2.0 // indirect
|
||||
github.com/yusufpapurcu/wmi v1.2.4 // indirect
|
||||
github.com/zeebo/blake3 v0.2.4 // indirect
|
||||
github.com/zeebo/errs v1.4.0 // indirect
|
||||
github.com/zeebo/xxh3 v1.0.2 // indirect
|
||||
go.etcd.io/bbolt v1.4.3 // indirect
|
||||
go.etcd.io/etcd/api/v3 v3.6.2 // indirect
|
||||
go.etcd.io/etcd/client/pkg/v3 v3.6.2 // indirect
|
||||
go.etcd.io/etcd/client/v3 v3.6.2 // indirect
|
||||
go.mongodb.org/mongo-driver v1.17.6 // indirect
|
||||
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
|
||||
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
|
||||
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
|
||||
go.opentelemetry.io/otel v1.38.0 // indirect
|
||||
go.opentelemetry.io/otel/metric v1.38.0 // indirect
|
||||
go.opentelemetry.io/otel/trace v1.38.0 // indirect
|
||||
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
|
||||
go.uber.org/dig v1.19.0 // indirect
|
||||
go.uber.org/multierr v1.10.0 // indirect
|
||||
go.uber.org/zap v1.26.0 // indirect
|
||||
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
|
||||
golang.org/x/sys v0.33.0 // indirect
|
||||
go.uber.org/multierr v1.11.0 // indirect
|
||||
go.uber.org/zap v1.27.0 // indirect
|
||||
go.yaml.in/yaml/v2 v2.4.3 // indirect
|
||||
golang.org/x/crypto v0.45.0 // indirect
|
||||
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
|
||||
golang.org/x/net v0.47.0 // indirect
|
||||
golang.org/x/oauth2 v0.33.0 // indirect
|
||||
golang.org/x/sync v0.18.0 // indirect
|
||||
golang.org/x/sys v0.38.0 // indirect
|
||||
golang.org/x/text v0.31.0 // indirect
|
||||
golang.org/x/time v0.14.0 // indirect
|
||||
golang.org/x/tools v0.38.0 // indirect
|
||||
google.golang.org/api v0.255.0 // indirect
|
||||
google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
|
||||
google.golang.org/genproto/googleapis/api v0.0.0-20250804133106-a7a43d27e69b // indirect
|
||||
google.golang.org/genproto/googleapis/rpc v0.0.0-20251103181224-f26f9409b101 // indirect
|
||||
google.golang.org/grpc v1.76.0 // indirect
|
||||
google.golang.org/protobuf v1.36.10 // indirect
|
||||
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
|
||||
gopkg.in/inf.v0 v0.9.1 // indirect
|
||||
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
|
||||
gopkg.in/validator.v2 v2.0.1 // indirect
|
||||
gopkg.in/yaml.v2 v2.4.0 // indirect
|
||||
k8s.io/api v0.33.3 // indirect
|
||||
k8s.io/apimachinery v0.33.3 // indirect
|
||||
k8s.io/client-go v0.33.3 // indirect
|
||||
k8s.io/klog/v2 v2.130.1 // indirect
|
||||
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
|
||||
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
|
||||
modernc.org/libc v1.65.10 // indirect
|
||||
modernc.org/mathutil v1.7.1 // indirect
|
||||
modernc.org/memory v1.11.0 // indirect
|
||||
moul.io/http2curl/v2 v2.3.0 // indirect
|
||||
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
|
||||
sigs.k8s.io/randfill v1.0.0 // indirect
|
||||
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
|
||||
sigs.k8s.io/yaml v1.6.0 // indirect
|
||||
storj.io/common v0.0.0-20251107171817-6221ae45072c // indirect
|
||||
storj.io/drpc v0.0.35-0.20250513201419-f7819ea69b55 // indirect
|
||||
storj.io/eventkit v0.0.0-20250410172343-61f26d3de156 // indirect
|
||||
storj.io/infectious v0.0.2 // indirect
|
||||
storj.io/picobuf v0.0.4 // indirect
|
||||
storj.io/uplink v1.13.1 // indirect
|
||||
)

6	internal/blob/errors.go	Normal file
@@ -0,0 +1,6 @@
package blob

import "errors"

// ErrBlobSizeLimitExceeded is returned when adding a chunk would exceed the blob size limit.
var ErrBlobSizeLimitExceeded = errors.New("adding chunk would exceed blob size limit")

555	internal/blob/packer.go	Normal file
@@ -0,0 +1,555 @@
// Package blob handles the creation of blobs - the final storage units for Vaultik.
// A blob is a large file (up to 10GB) containing many compressed and encrypted chunks
// from multiple source files. Blobs are content-addressed, meaning their filename
// is derived from the SHA256 hash of their compressed and encrypted content.
//
// The blob creation process:
//  1. Chunks are accumulated from multiple files
//  2. The collection is compressed using zstd
//  3. The compressed data is encrypted using age
//  4. The encrypted blob is hashed to create its content-addressed name
//  5. The blob is uploaded to S3 using the hash as the filename
//
// This design optimizes storage efficiency by batching many small chunks into
// larger blobs, reducing the number of S3 operations and associated costs.
package blob

import (
	"context"
	"database/sql"
	"encoding/hex"
	"fmt"
	"io"
	"sync"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/blobgen"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/google/uuid"
	"github.com/spf13/afero"
)
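
The content-addressed naming in step 4 can be sketched with the standard library alone; `blobName` below is a hypothetical helper for illustration, not part of this package:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobName derives the storage key for a finished blob from the SHA256
// of its final (compressed and encrypted) bytes, as in step 4 above.
func blobName(encrypted []byte) string {
	sum := sha256.Sum256(encrypted)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Identical content always yields the same name, so re-uploading
	// the same blob is a no-op in a content-addressed store.
	fmt.Println(blobName([]byte("hello")))
	// → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
}
```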

// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
// The handler receives a BlobWithReader containing the blob metadata and a reader for
// the compressed and encrypted blob content. The handler is responsible for uploading
// the blob to storage and cleaning up any temporary files.
type BlobHandler func(blob *BlobWithReader) error

// PackerConfig holds configuration for creating a Packer.
// All fields except BlobHandler are required.
type PackerConfig struct {
	MaxBlobSize      int64                  // Maximum size of a blob before forcing finalization
	CompressionLevel int                    // Zstd compression level (1-19, higher = better compression)
	Recipients       []string               // Age recipients for encryption
	Repositories     *database.Repositories // Database repositories for tracking blob metadata
	BlobHandler      BlobHandler            // Optional callback when blob is ready for upload
	Fs               afero.Fs               // Filesystem for temporary files
}

// PendingChunk represents a chunk waiting to be inserted into the database.
type PendingChunk struct {
	Hash string
	Size int64
}

// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
type Packer struct {
	maxBlobSize      int64
	compressionLevel int
	recipients       []string               // Age recipients for encryption
	blobHandler      BlobHandler            // Called when blob is ready
	repos            *database.Repositories // For creating blob records
	fs               afero.Fs               // Filesystem for temporary files

	// Mutex for thread-safe blob creation
	mu sync.Mutex

	// Current blob being packed
	currentBlob   *blobInProgress
	finishedBlobs []*FinishedBlob // Only used if no handler provided

	// Pending chunks to be inserted when blob finalizes
	pendingChunks []PendingChunk
}

// blobInProgress represents a blob being assembled
type blobInProgress struct {
	id        string          // UUID of the blob
	chunks    []*chunkInfo    // Track chunk metadata
	chunkSet  map[string]bool // Track unique chunks in this blob
	tempFile  afero.File      // Temporary file for encrypted compressed data
	writer    *blobgen.Writer // Unified compression/encryption/hashing writer
	startTime time.Time
	size      int64 // Current uncompressed size
}

// ChunkRef represents a chunk to be added to a blob.
// The Hash is the content-addressed identifier (SHA256) of the chunk,
// and Data contains the raw chunk bytes. After adding to a blob,
// the Data can be safely discarded as it's written to the blob immediately.
type ChunkRef struct {
	Hash string // SHA256 hash of the chunk data
	Data []byte // Raw chunk content
}

// chunkInfo tracks chunk metadata in a blob
type chunkInfo struct {
	Hash   string
	Offset int64
	Size   int64
}

// FinishedBlob represents a completed blob ready for storage
type FinishedBlob struct {
	ID           string
	Hash         string
	Data         []byte // Compressed data
	Chunks       []*BlobChunkRef
	CreatedTS    time.Time
	Uncompressed int64
	Compressed   int64
}

// BlobChunkRef represents a chunk's position within a blob
type BlobChunkRef struct {
	ChunkHash string
	Offset    int64
	Length    int64
}

// BlobWithReader wraps a FinishedBlob with its data reader
type BlobWithReader struct {
	*FinishedBlob
	Reader              io.ReadSeeker
	TempFile            afero.File // Optional, only set for disk-based blobs
	InsertedChunkHashes []string   // Chunk hashes that were inserted to DB with this blob
}

// NewPacker creates a new blob packer that accumulates chunks into blobs.
// The packer rejects chunks that would push a blob past MaxBlobSize (see
// AddChunk); the caller finalizes the blob and retries the chunk.
// Returns an error if required configuration fields are missing or invalid.
func NewPacker(cfg PackerConfig) (*Packer, error) {
	if len(cfg.Recipients) == 0 {
		return nil, fmt.Errorf("recipients are required - blobs must be encrypted")
	}
	if cfg.MaxBlobSize <= 0 {
		return nil, fmt.Errorf("max blob size must be positive")
	}
	if cfg.Fs == nil {
		return nil, fmt.Errorf("filesystem is required")
	}
	return &Packer{
		maxBlobSize:      cfg.MaxBlobSize,
		compressionLevel: cfg.CompressionLevel,
		recipients:       cfg.Recipients,
		blobHandler:      cfg.BlobHandler,
		repos:            cfg.Repositories,
		fs:               cfg.Fs,
		finishedBlobs:    make([]*FinishedBlob, 0),
	}, nil
}

// SetBlobHandler sets the handler to be called when a blob is finalized.
// The handler is responsible for uploading the blob to storage.
// If no handler is set, finalized blobs are stored in memory and can be
// retrieved with GetFinishedBlobs().
func (p *Packer) SetBlobHandler(handler BlobHandler) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.blobHandler = handler
}

// AddPendingChunk queues a chunk to be inserted into the database when the
// current blob is finalized. This batches chunk inserts to reduce transaction
// overhead. Thread-safe.
func (p *Packer) AddPendingChunk(hash string, size int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pendingChunks = append(p.pendingChunks, PendingChunk{Hash: hash, Size: size})
}

// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
// The chunk data is written immediately and can be garbage collected after this call.
// Thread-safe.
func (p *Packer) AddChunk(chunk *ChunkRef) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	// Initialize new blob if needed
	if p.currentBlob == nil {
		if err := p.startNewBlob(); err != nil {
			return fmt.Errorf("starting new blob: %w", err)
		}
	}

	// Check if adding this chunk would exceed the blob size limit.
	// Use a conservative estimate (assume no compression), and skip
	// the check if the chunk already exists in this blob.
	if !p.currentBlob.chunkSet[chunk.Hash] {
		currentSize := p.currentBlob.size
		newSize := currentSize + int64(len(chunk.Data))

		if newSize > p.maxBlobSize && len(p.currentBlob.chunks) > 0 {
			// Signal the caller to finalize and retry this chunk
			return ErrBlobSizeLimitExceeded
		}
	}

	// Add chunk to current blob
	return p.addChunkToCurrentBlob(chunk)
}

// Flush finalizes any in-progress blob, compressing, encrypting, and hashing it.
// This should be called after all chunks have been added to ensure no data is lost.
// If a BlobHandler is set, it will be called with the finalized blob.
// Thread-safe.
func (p *Packer) Flush() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob != nil && len(p.currentBlob.chunks) > 0 {
		if err := p.finalizeCurrentBlob(); err != nil {
			return fmt.Errorf("finalizing blob: %w", err)
		}
	}

	return nil
}

// FinalizeBlob finalizes the current blob being assembled.
// This compresses the accumulated chunks, encrypts the result, and computes
// the content-addressed hash. The finalized blob is either passed to the
// BlobHandler (if set) or stored internally.
// The caller must retry any chunk whose addition triggered
// ErrBlobSizeLimitExceeded. Thread-safe.
func (p *Packer) FinalizeBlob() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.currentBlob == nil {
		return nil
	}

	return p.finalizeCurrentBlob()
}

// GetFinishedBlobs returns all completed blobs and clears the internal list.
// This is only used when no BlobHandler is set. After calling this method,
// the caller is responsible for uploading the blobs to storage.
// Thread-safe.
func (p *Packer) GetFinishedBlobs() []*FinishedBlob {
	p.mu.Lock()
	defer p.mu.Unlock()

	blobs := p.finishedBlobs
	p.finishedBlobs = make([]*FinishedBlob, 0)
	return blobs
}

// startNewBlob initializes a new blob (must be called with lock held)
func (p *Packer) startNewBlob() error {
	// Generate UUID for the blob
	blobID := uuid.New().String()

	// Create blob record in database
	if p.repos != nil {
		blobIDTyped, err := types.ParseBlobID(blobID)
		if err != nil {
			return fmt.Errorf("parsing blob ID: %w", err)
		}
		blob := &database.Blob{
			ID:               blobIDTyped,
			Hash:             types.BlobHash("temp-placeholder-" + blobID), // Temporary placeholder until finalized
			CreatedTS:        time.Now().UTC(),
			FinishedTS:       nil,
			UncompressedSize: 0,
			CompressedSize:   0,
			UploadedTS:       nil,
		}
		if err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return p.repos.Blobs.Create(ctx, tx, blob)
		}); err != nil {
			return fmt.Errorf("creating blob record: %w", err)
		}
	}

	// Create temporary file
	tempFile, err := afero.TempFile(p.fs, "", "vaultik-blob-*.tmp")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}

	// Create blobgen writer for unified compression/encryption/hashing
	writer, err := blobgen.NewWriter(tempFile, p.compressionLevel, p.recipients)
	if err != nil {
		_ = tempFile.Close()
		_ = p.fs.Remove(tempFile.Name())
		return fmt.Errorf("creating blobgen writer: %w", err)
	}

	p.currentBlob = &blobInProgress{
		id:        blobID,
		chunks:    make([]*chunkInfo, 0),
		chunkSet:  make(map[string]bool),
		startTime: time.Now().UTC(),
		tempFile:  tempFile,
		writer:    writer,
		size:      0,
	}

	log.Debug("Created new blob container", "blob_id", blobID, "temp_file", tempFile.Name())
	return nil
}

// addChunkToCurrentBlob adds a chunk to the current blob (must be called with lock held)
func (p *Packer) addChunkToCurrentBlob(chunk *ChunkRef) error {
	// Skip if chunk already in current blob
	if p.currentBlob.chunkSet[chunk.Hash] {
		log.Debug("Skipping duplicate chunk already in current blob", "chunk_hash", chunk.Hash)
		return nil
	}

	// Track offset before writing
	offset := p.currentBlob.size

	// Write to the blobgen writer (compression -> encryption -> disk)
	if _, err := p.currentBlob.writer.Write(chunk.Data); err != nil {
		return fmt.Errorf("writing to blob stream: %w", err)
	}

	// Track chunk info
	chunkSize := int64(len(chunk.Data))
	chunkInfo := &chunkInfo{
		Hash:   chunk.Hash,
		Offset: offset,
		Size:   chunkSize,
	}
	p.currentBlob.chunks = append(p.currentBlob.chunks, chunkInfo)
	p.currentBlob.chunkSet[chunk.Hash] = true

	// Note: blob_chunk records are inserted in batch when the blob is
	// finalized to reduce transaction overhead. The chunk info is already
	// stored in p.currentBlob.chunks for later insertion.

	// Update total size
	p.currentBlob.size += chunkSize

	log.Debug("Added chunk to blob container",
		"blob_id", p.currentBlob.id,
		"chunk_hash", chunk.Hash,
		"chunk_size", len(chunk.Data),
		"offset", offset,
		"blob_chunks", len(p.currentBlob.chunks),
		"uncompressed_size", p.currentBlob.size)

	return nil
}

// finalizeCurrentBlob completes the current blob (must be called with lock held)
func (p *Packer) finalizeCurrentBlob() error {
	if p.currentBlob == nil {
		return nil
	}

	// Close blobgen writer to flush all data
	if err := p.currentBlob.writer.Close(); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("closing blobgen writer: %w", err)
	}

	// Sync file to ensure all data is written
	if err := p.currentBlob.tempFile.Sync(); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("syncing temp file: %w", err)
	}

	// Get the final size (encrypted if applicable)
	finalSize, err := p.currentBlob.tempFile.Seek(0, io.SeekCurrent)
	if err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("getting file size: %w", err)
	}

	// Reset to beginning for reading
	if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
		p.cleanupTempFile()
		return fmt.Errorf("seeking to start: %w", err)
	}

	// Get hash from blobgen writer (of final encrypted data)
	finalHash := p.currentBlob.writer.Sum256()
	blobHash := hex.EncodeToString(finalHash)

	// Create chunk references with offsets
	chunkRefs := make([]*BlobChunkRef, 0, len(p.currentBlob.chunks))
	for _, chunk := range p.currentBlob.chunks {
		chunkRefs = append(chunkRefs, &BlobChunkRef{
			ChunkHash: chunk.Hash,
			Offset:    chunk.Offset,
			Length:    chunk.Size,
		})
	}

	// Get pending chunks (will be inserted to DB and reported to handler)
	chunksToInsert := p.pendingChunks
	p.pendingChunks = nil // Clear pending list

	// Insert pending chunks, blob_chunks, and update blob in a single transaction
	if p.repos != nil {
		blobIDTyped, parseErr := types.ParseBlobID(p.currentBlob.id)
		if parseErr != nil {
			p.cleanupTempFile()
			return fmt.Errorf("parsing blob ID: %w", parseErr)
		}
		err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			// First insert all pending chunks (required for blob_chunks FK)
			for _, chunk := range chunksToInsert {
				dbChunk := &database.Chunk{
					ChunkHash: types.ChunkHash(chunk.Hash),
					Size:      chunk.Size,
				}
				if err := p.repos.Chunks.Create(ctx, tx, dbChunk); err != nil {
					return fmt.Errorf("creating chunk: %w", err)
				}
			}

			// Insert all blob_chunk records in batch
			for _, chunk := range p.currentBlob.chunks {
				blobChunk := &database.BlobChunk{
					BlobID:    blobIDTyped,
					ChunkHash: types.ChunkHash(chunk.Hash),
					Offset:    chunk.Offset,
					Length:    chunk.Size,
				}
				if err := p.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
					return fmt.Errorf("creating blob_chunk: %w", err)
				}
			}

			// Update blob record with final hash and sizes
			return p.repos.Blobs.UpdateFinished(ctx, tx, p.currentBlob.id, blobHash,
				p.currentBlob.size, finalSize)
		})
		if err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("finalizing blob transaction: %w", err)
		}

		log.Debug("Committed blob transaction",
			"chunks_inserted", len(chunksToInsert),
			"blob_chunks_inserted", len(p.currentBlob.chunks))
	}

	// Create finished blob
	finished := &FinishedBlob{
		ID:           p.currentBlob.id,
		Hash:         blobHash,
		Data:         nil, // We don't load data into memory anymore
		Chunks:       chunkRefs,
		CreatedTS:    p.currentBlob.startTime,
		Uncompressed: p.currentBlob.size,
		Compressed:   finalSize,
	}

	compressionRatio := float64(finished.Compressed) / float64(finished.Uncompressed)
	log.Info("Finalized blob (compressed and encrypted)",
		"hash", blobHash,
		"chunks", len(chunkRefs),
		"uncompressed", finished.Uncompressed,
		"compressed", finished.Compressed,
		"ratio", fmt.Sprintf("%.2f", compressionRatio),
		"duration", time.Since(p.currentBlob.startTime))

	// Collect inserted chunk hashes for the scanner to track
	var insertedChunkHashes []string
	for _, chunk := range chunksToInsert {
		insertedChunkHashes = append(insertedChunkHashes, chunk.Hash)
	}

	// Call blob handler if set
	if p.blobHandler != nil {
		// Reset file position for handler
		if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("seeking for handler: %w", err)
		}

		// Create a blob reader that includes the data stream
		blobWithReader := &BlobWithReader{
			FinishedBlob:        finished,
			Reader:              p.currentBlob.tempFile,
			TempFile:            p.currentBlob.tempFile,
			InsertedChunkHashes: insertedChunkHashes,
		}

		if err := p.blobHandler(blobWithReader); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("blob handler failed: %w", err)
		}
		// Note: blob handler is responsible for closing/cleaning up the temp file
		p.currentBlob = nil
	} else {
		log.Debug("No blob handler callback configured", "blob_hash", blobHash[:8]+"...")
		// No handler, need to read data for legacy behavior
		if _, err := p.currentBlob.tempFile.Seek(0, io.SeekStart); err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("seeking to read data: %w", err)
		}

		data, err := io.ReadAll(p.currentBlob.tempFile)
		if err != nil {
			p.cleanupTempFile()
			return fmt.Errorf("reading blob data: %w", err)
		}
		finished.Data = data

		p.finishedBlobs = append(p.finishedBlobs, finished)

		// Cleanup
		p.cleanupTempFile()
		p.currentBlob = nil
	}

	return nil
}

// cleanupTempFile removes the temporary file
|
||||
func (p *Packer) cleanupTempFile() {
|
||||
if p.currentBlob != nil && p.currentBlob.tempFile != nil {
|
||||
name := p.currentBlob.tempFile.Name()
|
||||
_ = p.currentBlob.tempFile.Close()
|
||||
_ = p.fs.Remove(name)
|
||||
}
|
||||
}
|
||||
|
||||
// PackChunks is a convenience method to pack multiple chunks at once
|
||||
func (p *Packer) PackChunks(chunks []*ChunkRef) error {
|
||||
for _, chunk := range chunks {
|
||||
err := p.AddChunk(chunk)
|
||||
if err == ErrBlobSizeLimitExceeded {
|
||||
// Finalize current blob and retry
|
||||
if err := p.FinalizeBlob(); err != nil {
|
||||
return fmt.Errorf("finalizing blob before retry: %w", err)
|
||||
}
|
||||
// Retry the chunk
|
||||
if err := p.AddChunk(chunk); err != nil {
|
||||
return fmt.Errorf("adding chunk %s after finalize: %w", chunk.Hash, err)
|
||||
}
|
||||
} else if err != nil {
|
||||
return fmt.Errorf("adding chunk %s: %w", chunk.Hash, err)
|
||||
}
|
||||
}
|
||||
|
||||
return p.Flush()
|
||||
}
|
||||
385
internal/blob/packer_test.go
Normal file
@@ -0,0 +1,385 @@
package blob

import (
	"bytes"
	"context"
	"crypto/sha256"
	"database/sql"
	"encoding/hex"
	"io"
	"testing"

	"filippo.io/age"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/klauspost/compress/zstd"
	"github.com/spf13/afero"
)

const (
	// Test key from test/insecure-integration-test.key
	testPrivateKey = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
	testPublicKey  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
)

func TestPacker(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Parse test identity
	identity, err := age.ParseX25519Identity(testPrivateKey)
	if err != nil {
		t.Fatalf("failed to parse test identity: %v", err)
	}

	t.Run("single chunk creates single blob", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create a chunk
		data := []byte("Hello, World!")
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get finished blobs
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]
		if len(blob.Chunks) != 1 {
			t.Errorf("expected 1 chunk in blob, got %d", len(blob.Chunks))
		}

		// Note: Very small data may not compress well
		t.Logf("Compression: %d -> %d bytes", blob.Uncompressed, blob.Compressed)

		// Decrypt the blob data
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		// Decompress the decrypted data
		reader, err := zstd.NewReader(decrypted)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := io.Copy(&decompressed, reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decompressed data doesn't match original")
		}
	})

	t.Run("multiple chunks packed together", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create multiple small chunks
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000)
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		// Add all chunks
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Should have one blob with all chunks
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		if len(blobs[0].Chunks) != 10 {
			t.Errorf("expected 10 chunks in blob, got %d", len(blobs[0].Chunks))
		}

		// Verify offsets are correct
		expectedOffset := int64(0)
		for i, chunkRef := range blobs[0].Chunks {
			if chunkRef.Offset != expectedOffset {
				t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunkRef.Offset)
			}
			if chunkRef.Length != 1000 {
				t.Errorf("chunk %d: expected length 1000, got %d", i, chunkRef.Length)
			}
			expectedOffset += chunkRef.Length
		}
	})

	t.Run("blob size limit enforced", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Small blob size limit to force multiple blobs
		cfg := PackerConfig{
			MaxBlobSize:      5000, // 5KB max
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create chunks that will exceed the limit
		chunks := make([]*ChunkRef, 10)
		for i := 0; i < 10; i++ {
			data := bytes.Repeat([]byte{byte(i)}, 1000) // 1KB each
			hash := sha256.Sum256(data)
			hashStr := hex.EncodeToString(hash[:])

			// Create chunk in database first
			dbChunk := &database.Chunk{
				ChunkHash: types.ChunkHash(hashStr),
				Size:      int64(len(data)),
			}
			err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
				return repos.Chunks.Create(ctx, tx, dbChunk)
			})
			if err != nil {
				t.Fatalf("failed to create chunk in db: %v", err)
			}

			chunks[i] = &ChunkRef{
				Hash: hashStr,
				Data: data,
			}
		}

		blobCount := 0

		// Add chunks and handle size limit errors
		for _, chunk := range chunks {
			err := packer.AddChunk(chunk)
			if err == ErrBlobSizeLimitExceeded {
				// Finalize current blob
				if err := packer.FinalizeBlob(); err != nil {
					t.Fatalf("failed to finalize blob: %v", err)
				}
				blobCount++
				// Retry adding the chunk
				if err := packer.AddChunk(chunk); err != nil {
					t.Fatalf("failed to add chunk after finalize: %v", err)
				}
			} else if err != nil {
				t.Fatalf("failed to add chunk: %v", err)
			}
		}

		// Flush remaining
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get all blobs
		blobs := packer.GetFinishedBlobs()
		totalBlobs := blobCount + len(blobs)

		// Should have multiple blobs due to size limit
		if totalBlobs < 2 {
			t.Errorf("expected multiple blobs due to size limit, got %d", totalBlobs)
		}

		// Verify each blob respects size limit (approximately)
		for _, blob := range blobs {
			if blob.Compressed > 6000 { // Allow some overhead
				t.Errorf("blob size %d exceeds limit", blob.Compressed)
			}
		}
	})

	t.Run("with encryption", func(t *testing.T) {
		// Create test database
		db, err := database.NewTestDB()
		if err != nil {
			t.Fatalf("failed to create test db: %v", err)
		}
		defer func() { _ = db.Close() }()
		repos := database.NewRepositories(db)

		// Generate test identity (using the one from parent test)
		cfg := PackerConfig{
			MaxBlobSize:      10 * 1024 * 1024, // 10MB
			CompressionLevel: 3,
			Recipients:       []string{testPublicKey},
			Repositories:     repos,
			Fs:               afero.NewMemMapFs(),
		}
		packer, err := NewPacker(cfg)
		if err != nil {
			t.Fatalf("failed to create packer: %v", err)
		}

		// Create test data
		data := bytes.Repeat([]byte("Test data for encryption!"), 100)
		hash := sha256.Sum256(data)
		hashStr := hex.EncodeToString(hash[:])

		// Create chunk in database first
		dbChunk := &database.Chunk{
			ChunkHash: types.ChunkHash(hashStr),
			Size:      int64(len(data)),
		}
		err = repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
			return repos.Chunks.Create(ctx, tx, dbChunk)
		})
		if err != nil {
			t.Fatalf("failed to create chunk in db: %v", err)
		}

		chunk := &ChunkRef{
			Hash: hashStr,
			Data: data,
		}

		// Add chunk and flush
		if err := packer.AddChunk(chunk); err != nil {
			t.Fatalf("failed to add chunk: %v", err)
		}
		if err := packer.Flush(); err != nil {
			t.Fatalf("failed to flush: %v", err)
		}

		// Get blob
		blobs := packer.GetFinishedBlobs()
		if len(blobs) != 1 {
			t.Fatalf("expected 1 blob, got %d", len(blobs))
		}

		blob := blobs[0]

		// Decrypt the blob
		decrypted, err := age.Decrypt(bytes.NewReader(blob.Data), identity)
		if err != nil {
			t.Fatalf("failed to decrypt blob: %v", err)
		}

		var decryptedData bytes.Buffer
		if _, err := decryptedData.ReadFrom(decrypted); err != nil {
			t.Fatalf("failed to read decrypted data: %v", err)
		}

		// Decompress
		reader, err := zstd.NewReader(&decryptedData)
		if err != nil {
			t.Fatalf("failed to create decompressor: %v", err)
		}
		defer reader.Close()

		var decompressed bytes.Buffer
		if _, err := decompressed.ReadFrom(reader); err != nil {
			t.Fatalf("failed to decompress: %v", err)
		}

		// Verify data
		if !bytes.Equal(decompressed.Bytes(), data) {
			t.Error("decrypted and decompressed data doesn't match original")
		}
	})
}
74
internal/blobgen/compress.go
Normal file
@@ -0,0 +1,74 @@
package blobgen

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"io"
)

// CompressResult contains the results of compression
type CompressResult struct {
	Data             []byte
	UncompressedSize int64
	CompressedSize   int64
	SHA256           string
}

// CompressData compresses and encrypts data, returning the result with hash
func CompressData(data []byte, compressionLevel int, recipients []string) (*CompressResult, error) {
	var buf bytes.Buffer

	// Create writer
	w, err := NewWriter(&buf, compressionLevel, recipients)
	if err != nil {
		return nil, fmt.Errorf("creating writer: %w", err)
	}

	// Write data
	if _, err := w.Write(data); err != nil {
		_ = w.Close()
		return nil, fmt.Errorf("writing data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("closing writer: %w", err)
	}

	return &CompressResult{
		Data:             buf.Bytes(),
		UncompressedSize: int64(len(data)),
		CompressedSize:   int64(buf.Len()),
		SHA256:           hex.EncodeToString(w.Sum256()),
	}, nil
}

// CompressStream compresses and encrypts from reader to writer, returning hash
func CompressStream(dst io.Writer, src io.Reader, compressionLevel int, recipients []string) (written int64, hash string, err error) {
	// Create writer
	w, err := NewWriter(dst, compressionLevel, recipients)
	if err != nil {
		return 0, "", fmt.Errorf("creating writer: %w", err)
	}

	closed := false
	defer func() {
		if !closed {
			_ = w.Close()
		}
	}()

	// Copy data
	if _, err := io.Copy(w, src); err != nil {
		return 0, "", fmt.Errorf("copying data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return 0, "", fmt.Errorf("closing writer: %w", err)
	}
	closed = true

	return w.BytesWritten(), hex.EncodeToString(w.Sum256()), nil
}
64
internal/blobgen/compress_test.go
Normal file
@@ -0,0 +1,64 @@
package blobgen

import (
	"bytes"
	"crypto/rand"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// testRecipient is a static age recipient for tests.
const testRecipient = "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

// TestCompressStreamNoDoubleClose is a regression test for issue #28.
// It verifies that CompressStream does not panic or return an error due to
// double-closing the underlying blobgen.Writer. Before the fix in PR #33,
// the explicit Close() on the happy path combined with defer Close() would
// cause a double close.
func TestCompressStreamNoDoubleClose(t *testing.T) {
	input := []byte("regression test data for issue #28 double-close fix")
	var buf bytes.Buffer

	written, hash, err := CompressStream(&buf, bytes.NewReader(input), 3, []string{testRecipient})
	require.NoError(t, err, "CompressStream should not return an error")
	assert.True(t, written > 0, "expected bytes written > 0")
	assert.NotEmpty(t, hash, "expected non-empty hash")
	assert.True(t, buf.Len() > 0, "expected non-empty output")
}

// TestCompressStreamLargeInput exercises CompressStream with a larger payload
// to ensure no double-close issues surface under heavier I/O.
func TestCompressStreamLargeInput(t *testing.T) {
	data := make([]byte, 512*1024) // 512 KB
	_, err := rand.Read(data)
	require.NoError(t, err)

	var buf bytes.Buffer
	written, hash, err := CompressStream(&buf, bytes.NewReader(data), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, written > 0)
	assert.NotEmpty(t, hash)
}

// TestCompressStreamEmptyInput verifies CompressStream handles empty input
// without double-close issues.
func TestCompressStreamEmptyInput(t *testing.T) {
	var buf bytes.Buffer
	_, hash, err := CompressStream(&buf, strings.NewReader(""), 3, []string{testRecipient})
	require.NoError(t, err)
	assert.NotEmpty(t, hash)
}

// TestCompressDataNoDoubleClose mirrors the stream test for CompressData,
// ensuring the explicit Close + error-path Close pattern is also safe.
func TestCompressDataNoDoubleClose(t *testing.T) {
	input := []byte("CompressData regression test for double-close")
	result, err := CompressData(input, 3, []string{testRecipient})
	require.NoError(t, err)
	assert.True(t, result.CompressedSize > 0)
	assert.True(t, result.UncompressedSize == int64(len(input)))
	assert.NotEmpty(t, result.SHA256)
}
73
internal/blobgen/reader.go
Normal file
@@ -0,0 +1,73 @@
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Reader wraps decompression and decryption with SHA256 verification
type Reader struct {
	reader       io.Reader
	decompressor *zstd.Decoder
	decryptor    io.Reader
	hasher       hash.Hash
	teeReader    io.Reader
	bytesRead    int64
}

// NewReader creates a new Reader that decrypts, decompresses, and verifies data
func NewReader(r io.Reader, identity age.Identity) (*Reader, error) {
	// Create decryption reader
	decReader, err := age.Decrypt(r, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decryption reader: %w", err)
	}

	// Create decompression reader
	decompressor, err := zstd.NewReader(decReader)
	if err != nil {
		return nil, fmt.Errorf("creating decompression reader: %w", err)
	}

	// Create SHA256 hasher
	hasher := sha256.New()

	// Create tee reader that reads from decompressor and writes to hasher
	teeReader := io.TeeReader(decompressor, hasher)

	return &Reader{
		reader:       r,
		decompressor: decompressor,
		decryptor:    decReader,
		hasher:       hasher,
		teeReader:    teeReader,
	}, nil
}

// Read implements io.Reader
func (r *Reader) Read(p []byte) (n int, err error) {
	n, err = r.teeReader.Read(p)
	r.bytesRead += int64(n)
	return n, err
}

// Close closes the decompressor
func (r *Reader) Close() error {
	r.decompressor.Close()
	return nil
}

// Sum256 returns the SHA256 hash of all data read
func (r *Reader) Sum256() []byte {
	return r.hasher.Sum(nil)
}

// BytesRead returns the number of uncompressed bytes read
func (r *Reader) BytesRead() int64 {
	return r.bytesRead
}
127
internal/blobgen/writer.go
Normal file
@@ -0,0 +1,127 @@
package blobgen

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"runtime"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// Writer wraps compression and encryption with SHA256 hashing.
// Data flows: input -> tee(hasher, compressor -> encryptor -> destination)
// The hash is computed on the uncompressed input for deterministic content-addressing.
type Writer struct {
	teeWriter        io.Writer      // Tee to hasher and compressor
	compressor       *zstd.Encoder  // Compression layer
	encryptor        io.WriteCloser // Encryption layer
	hasher           hash.Hash      // SHA256 hasher (on uncompressed input)
	compressionLevel int
	bytesWritten     int64
}

// NewWriter creates a new Writer that compresses, encrypts, and hashes data.
// The hash is computed on the uncompressed input for deterministic content-addressing.
func NewWriter(w io.Writer, compressionLevel int, recipients []string) (*Writer, error) {
	// Validate compression level
	if err := validateCompressionLevel(compressionLevel); err != nil {
		return nil, err
	}

	// Create SHA256 hasher for the uncompressed input
	hasher := sha256.New()

	// Parse recipients
	var ageRecipients []age.Recipient
	for _, recipient := range recipients {
		r, err := age.ParseX25519Recipient(recipient)
		if err != nil {
			return nil, fmt.Errorf("parsing recipient %s: %w", recipient, err)
		}
		ageRecipients = append(ageRecipients, r)
	}

	// Create encryption writer that outputs to destination
	encWriter, err := age.Encrypt(w, ageRecipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encryption writer: %w", err)
	}

	// Calculate compression concurrency: CPUs - 2, minimum 1
	concurrency := runtime.NumCPU() - 2
	if concurrency < 1 {
		concurrency = 1
	}

	// Create compression writer with encryption as destination
	compressor, err := zstd.NewWriter(encWriter,
		zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)),
		zstd.WithEncoderConcurrency(concurrency),
	)
	if err != nil {
		_ = encWriter.Close()
		return nil, fmt.Errorf("creating compression writer: %w", err)
	}

	// Create tee writer: input goes to both hasher and compressor
	teeWriter := io.MultiWriter(hasher, compressor)

	return &Writer{
		teeWriter:        teeWriter,
		compressor:       compressor,
		encryptor:        encWriter,
		hasher:           hasher,
		compressionLevel: compressionLevel,
	}, nil
}

// Write implements io.Writer
func (w *Writer) Write(p []byte) (n int, err error) {
	n, err = w.teeWriter.Write(p)
	w.bytesWritten += int64(n)
	return n, err
}

// Close closes all layers and returns any errors
func (w *Writer) Close() error {
	// Close compressor first
	if err := w.compressor.Close(); err != nil {
		return fmt.Errorf("closing compressor: %w", err)
	}

	// Then close encryptor
	if err := w.encryptor.Close(); err != nil {
		return fmt.Errorf("closing encryptor: %w", err)
	}

	return nil
}

// Sum256 returns the double SHA256 hash of the uncompressed input data.
// Double hashing (SHA256(SHA256(data))) prevents information leakage about
// the plaintext - an attacker cannot confirm existence of known content
// by computing its hash and checking for a matching blob filename.
func (w *Writer) Sum256() []byte {
	// First hash: SHA256(plaintext)
	firstHash := w.hasher.Sum(nil)
	// Second hash: SHA256(firstHash) - this is the blob ID
	secondHash := sha256.Sum256(firstHash)
	return secondHash[:]
}

// BytesWritten returns the number of uncompressed bytes written
func (w *Writer) BytesWritten() int64 {
	return w.bytesWritten
}

func validateCompressionLevel(level int) error {
	// Zstd compression levels: 1-19 (default is 3)
	// SpeedFastest = 1, SpeedDefault = 3, SpeedBetterCompression = 7, SpeedBestCompression = 11
	if level < 1 || level > 19 {
		return fmt.Errorf("invalid compression level %d: must be between 1 and 19", level)
	}
	return nil
}
105
internal/blobgen/writer_test.go
Normal file
@@ -0,0 +1,105 @@
package blobgen

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestWriterHashIsDoubleHash verifies that Writer.Sum256() returns
// the double hash SHA256(SHA256(plaintext)) for security.
// Double hashing prevents attackers from confirming existence of known content.
func TestWriterHashIsDoubleHash(t *testing.T) {
	// Test data - random data that doesn't compress well
	testData := make([]byte, 1024*1024) // 1MB
	_, err := rand.Read(testData)
	require.NoError(t, err)

	// Test recipient (generated with age-keygen)
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create a buffer to capture the encrypted output
	var encryptedBuf bytes.Buffer

	// Create blobgen writer
	writer, err := NewWriter(&encryptedBuf, 3, []string{testRecipient})
	require.NoError(t, err)

	// Write test data
	n, err := writer.Write(testData)
	require.NoError(t, err)
	assert.Equal(t, len(testData), n)

	// Close to flush all data
	err = writer.Close()
	require.NoError(t, err)

	// Get the hash from the writer
	writerHash := hex.EncodeToString(writer.Sum256())

	// Calculate the expected double hash: SHA256(SHA256(plaintext))
	firstHash := sha256.Sum256(testData)
	secondHash := sha256.Sum256(firstHash[:])
	expectedDoubleHash := hex.EncodeToString(secondHash[:])

	// Also compute single hash to verify it's different
	singleHashStr := hex.EncodeToString(firstHash[:])

	t.Logf("Input size: %d bytes", len(testData))
	t.Logf("Single hash (SHA256(data)): %s", singleHashStr)
	t.Logf("Double hash (SHA256(SHA256(data))): %s", expectedDoubleHash)
	t.Logf("Writer hash: %s", writerHash)

	// The writer hash should match the double hash
	assert.Equal(t, expectedDoubleHash, writerHash,
		"Writer.Sum256() should return SHA256(SHA256(plaintext)) for security")

	// Verify it's NOT the single hash (would leak information)
	assert.NotEqual(t, singleHashStr, writerHash,
		"Writer hash should not be single hash (would allow content confirmation attacks)")
}

// TestWriterDeterministicHash verifies that the same input always produces
// the same hash, even with non-deterministic encryption.
func TestWriterDeterministicHash(t *testing.T) {
	// Test data
	testData := []byte("Hello, World! This is test data for deterministic hashing.")

	// Test recipient
	testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"

	// Create two writers and verify they produce the same hash
	var buf1, buf2 bytes.Buffer

	writer1, err := NewWriter(&buf1, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer1.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer1.Close())

	writer2, err := NewWriter(&buf2, 3, []string{testRecipient})
	require.NoError(t, err)
	_, err = writer2.Write(testData)
	require.NoError(t, err)
	require.NoError(t, writer2.Close())

	hash1 := hex.EncodeToString(writer1.Sum256())
	hash2 := hex.EncodeToString(writer2.Sum256())

	// Hashes should be identical (deterministic)
	assert.Equal(t, hash1, hash2, "Same input should produce same hash")

	// Encrypted outputs should be different (non-deterministic encryption)
	assert.NotEqual(t, buf1.Bytes(), buf2.Bytes(),
		"Encrypted outputs should differ due to non-deterministic encryption")

	t.Logf("Hash 1: %s", hash1)
	t.Logf("Hash 2: %s", hash2)
	t.Logf("Encrypted size 1: %d bytes", buf1.Len())
	t.Logf("Encrypted size 2: %d bytes", buf2.Len())
}
153
internal/chunker/chunker.go
Normal file
@@ -0,0 +1,153 @@
|
||||
package chunker

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
// Each chunk is identified by its SHA256 hash and contains the raw data along with
// its position and size information from the original file.
type Chunk struct {
    Hash   string // Content hash of the chunk
    Data   []byte // Chunk data
    Offset int64  // Offset in the original file
    Size   int64  // Size of the chunk
}

// Chunker provides content-defined chunking using the FastCDC algorithm.
// It splits data into variable-sized chunks based on content patterns, ensuring
// that identical data sequences produce identical chunks regardless of their
// position in the file. This enables efficient deduplication.
type Chunker struct {
    avgChunkSize int
    minChunkSize int
    maxChunkSize int
}

// NewChunker creates a new chunker with the specified average chunk size.
// The actual chunk sizes will vary between avgChunkSize/4 and avgChunkSize*4
// as recommended by the FastCDC algorithm. Typical values for avgChunkSize
// are 64KB (65536), 256KB (262144), or 1MB (1048576).
func NewChunker(avgChunkSize int64) *Chunker {
    // FastCDC recommends min = avg/4 and max = avg*4
    return &Chunker{
        avgChunkSize: int(avgChunkSize),
        minChunkSize: int(avgChunkSize / 4),
        maxChunkSize: int(avgChunkSize * 4),
    }
}

// ChunkReader splits the reader into content-defined chunks and returns all chunks at once.
// This method loads all chunk data into memory, so it should only be used for
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
    chunker := AcquireReusableChunker(r, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
    defer chunker.Release()

    var chunks []Chunk
    offset := int64(0)

    for {
        chunk, err := chunker.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, fmt.Errorf("reading chunk: %w", err)
        }

        // Calculate hash
        hash := sha256.Sum256(chunk.Data)

        // Make a copy of the data since the chunker reuses the buffer
        chunkData := make([]byte, len(chunk.Data))
        copy(chunkData, chunk.Data)

        chunks = append(chunks, Chunk{
            Hash:   hex.EncodeToString(hash[:]),
            Data:   chunkData,
            Offset: offset,
            Size:   int64(len(chunk.Data)),
        })

        offset += int64(len(chunk.Data))
    }

    return chunks, nil
}

// ChunkCallback is a function called for each chunk as it's processed.
// The callback receives a Chunk containing the hash, data, offset, and size.
// If the callback returns an error, chunk processing stops and the error is propagated.
type ChunkCallback func(chunk Chunk) error

// ChunkReaderStreaming splits the reader into chunks and calls the callback for each chunk.
// This is the preferred method for processing large files or streams as it doesn't
// accumulate all chunks in memory. The callback is invoked for each chunk as it's
// produced, allowing for streaming processing and immediate storage or transmission.
// Returns the SHA256 hash of the entire file content and an error if chunking fails,
// reading fails, or if the callback returns an error.
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (string, error) {
    // Create a tee reader to calculate full file hash while chunking
    fileHasher := sha256.New()
    teeReader := io.TeeReader(r, fileHasher)

    chunker := AcquireReusableChunker(teeReader, c.minChunkSize, c.avgChunkSize, c.maxChunkSize)
    defer chunker.Release()

    offset := int64(0)

    for {
        chunk, err := chunker.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            return "", fmt.Errorf("reading chunk: %w", err)
        }

        // Calculate chunk hash
        hash := sha256.Sum256(chunk.Data)

        // Pass the data directly - caller must process it before we call Next() again
        // (chunker reuses its internal buffer, but since we process synchronously
        // and completely before continuing, no copy is needed)
        if err := callback(Chunk{
            Hash:   hex.EncodeToString(hash[:]),
            Data:   chunk.Data,
            Offset: offset,
            Size:   int64(len(chunk.Data)),
        }); err != nil {
            return "", fmt.Errorf("callback error: %w", err)
        }

        offset += int64(len(chunk.Data))
    }

    // Return the full file hash
    return hex.EncodeToString(fileHasher.Sum(nil)), nil
}

// ChunkFile splits a file into content-defined chunks by reading the entire file.
// This is a convenience method that opens the file and passes it to ChunkReader.
// For large files, consider using ChunkReaderStreaming with a file handle instead.
// Returns an error if the file cannot be opened or if chunking fails.
func (c *Chunker) ChunkFile(path string) ([]Chunk, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, fmt.Errorf("opening file: %w", err)
    }
    defer func() {
        if err := file.Close(); err != nil && err.Error() != "invalid argument" {
            // Log error or handle as needed
            _ = err
        }
    }()

    return c.ChunkReader(file)
}
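Note the defensive copy in `ChunkReader`: `ReusableChunker.Next` returns a slice into its internal, reused buffer, so any chunk retained past the next `Next` call must be copied. A stand-alone sketch of that aliasing pitfall (independent of the vaultik packages):

```go
package main

import "fmt"

func main() {
	// One shared buffer, reused for every "chunk", like the
	// ReusableChunker's internal buffer.
	buf := make([]byte, 4)

	emit := func(b byte) []byte {
		for i := range buf {
			buf[i] = b
		}
		return buf // aliases the shared buffer
	}

	aliased := emit('A')                      // still points into buf
	copied := append([]byte(nil), aliased...) // defensive copy

	_ = emit('B') // next "chunk" overwrites the shared buffer

	fmt.Println(string(aliased)) // BBBB - stale view of the buffer
	fmt.Println(string(copied))  // AAAA - survived the reuse
}
```

This is also why `ChunkReaderStreaming` can skip the copy: the callback runs synchronously and must finish with the data before `Next` is called again.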
77	internal/chunker/chunker_isolated_test.go	Normal file
@@ -0,0 +1,77 @@
package chunker

import (
    "bytes"
    "testing"
)

func TestChunkerExpectedChunkCount(t *testing.T) {
    tests := []struct {
        name         string
        fileSize     int
        avgChunkSize int64
        minExpected  int
        maxExpected  int
    }{
        {
            name:         "1MB file with 64KB average",
            fileSize:     1024 * 1024,
            avgChunkSize: 64 * 1024,
            minExpected:  8,  // At least half the expected count
            maxExpected:  32, // At most double the expected count
        },
        {
            name:         "10MB file with 256KB average",
            fileSize:     10 * 1024 * 1024,
            avgChunkSize: 256 * 1024,
            minExpected:  10, // FastCDC may produce larger chunks
            maxExpected:  80,
        },
        {
            name:         "512KB file with 64KB average",
            fileSize:     512 * 1024,
            avgChunkSize: 64 * 1024,
            minExpected:  4, // ~8 expected
            maxExpected:  16,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            chunker := NewChunker(tt.avgChunkSize)

            // Create data with some variation to trigger chunk boundaries
            data := make([]byte, tt.fileSize)
            for i := 0; i < len(data); i++ {
                // Use a pattern that should create boundaries
                data[i] = byte((i * 17) ^ (i >> 5))
            }

            chunks, err := chunker.ChunkReader(bytes.NewReader(data))
            if err != nil {
                t.Fatalf("chunking failed: %v", err)
            }

            t.Logf("Created %d chunks for %d bytes with %d average chunk size",
                len(chunks), tt.fileSize, tt.avgChunkSize)

            if len(chunks) < tt.minExpected {
                t.Errorf("too few chunks: got %d, expected at least %d",
                    len(chunks), tt.minExpected)
            }
            if len(chunks) > tt.maxExpected {
                t.Errorf("too many chunks: got %d, expected at most %d",
                    len(chunks), tt.maxExpected)
            }

            // Verify chunks reconstruct to original
            var reconstructed []byte
            for _, chunk := range chunks {
                reconstructed = append(reconstructed, chunk.Data...)
            }
            if !bytes.Equal(data, reconstructed) {
                t.Error("reconstructed data doesn't match original")
            }
        })
    }
}
128	internal/chunker/chunker_test.go	Normal file
@@ -0,0 +1,128 @@
package chunker

import (
    "bytes"
    "crypto/rand"
    "testing"
)

func TestChunker(t *testing.T) {
    t.Run("small file produces single chunk", func(t *testing.T) {
        chunker := NewChunker(1024 * 1024)         // 1MB average
        data := bytes.Repeat([]byte("hello"), 100) // 500 bytes

        chunks, err := chunker.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        if len(chunks) != 1 {
            t.Errorf("expected 1 chunk, got %d", len(chunks))
        }

        if chunks[0].Size != int64(len(data)) {
            t.Errorf("expected chunk size %d, got %d", len(data), chunks[0].Size)
        }
    })

    t.Run("large file produces multiple chunks", func(t *testing.T) {
        chunker := NewChunker(256 * 1024) // 256KB average chunk size

        // Generate 2MB of random data
        data := make([]byte, 2*1024*1024)
        if _, err := rand.Read(data); err != nil {
            t.Fatalf("failed to generate random data: %v", err)
        }

        chunks, err := chunker.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        // Should produce multiple chunks - with FastCDC we expect around 8 chunks for 2MB with 256KB average
        if len(chunks) < 4 || len(chunks) > 16 {
            t.Errorf("expected 4-16 chunks, got %d", len(chunks))
        }

        // Verify chunks reconstruct original data
        var reconstructed []byte
        for _, chunk := range chunks {
            reconstructed = append(reconstructed, chunk.Data...)
        }

        if !bytes.Equal(data, reconstructed) {
            t.Error("reconstructed data doesn't match original")
        }

        // Verify offsets
        var expectedOffset int64
        for i, chunk := range chunks {
            if chunk.Offset != expectedOffset {
                t.Errorf("chunk %d: expected offset %d, got %d", i, expectedOffset, chunk.Offset)
            }
            expectedOffset += chunk.Size
        }
    })

    t.Run("deterministic chunking", func(t *testing.T) {
        chunker1 := NewChunker(256 * 1024)
        chunker2 := NewChunker(256 * 1024)

        // Use deterministic data
        data := bytes.Repeat([]byte("abcdefghijklmnopqrstuvwxyz"), 20000) // ~520KB

        chunks1, err := chunker1.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        chunks2, err := chunker2.ChunkReader(bytes.NewReader(data))
        if err != nil {
            t.Fatalf("chunking failed: %v", err)
        }

        // Should produce same chunks
        if len(chunks1) != len(chunks2) {
            t.Fatalf("different number of chunks: %d vs %d", len(chunks1), len(chunks2))
        }

        for i := range chunks1 {
            if chunks1[i].Hash != chunks2[i].Hash {
                t.Errorf("chunk %d: different hashes", i)
            }
            if chunks1[i].Size != chunks2[i].Size {
                t.Errorf("chunk %d: different sizes", i)
            }
        }
    })
}

func TestChunkBoundaries(t *testing.T) {
    chunker := NewChunker(256 * 1024) // 256KB average

    // FastCDC uses avg/4 for min and avg*4 for max
    avgSize := int64(256 * 1024)
    minSize := avgSize / 4
    maxSize := avgSize * 4

    // Test that minimum chunk size is respected
    data := make([]byte, minSize+1024)
    if _, err := rand.Read(data); err != nil {
        t.Fatalf("failed to generate random data: %v", err)
    }

    chunks, err := chunker.ChunkReader(bytes.NewReader(data))
    if err != nil {
        t.Fatalf("chunking failed: %v", err)
    }

    for i, chunk := range chunks {
        // Last chunk can be smaller than minimum
        if i < len(chunks)-1 && chunk.Size < minSize {
            t.Errorf("chunk %d size %d is below minimum %d", i, chunk.Size, minSize)
        }
        if chunk.Size > maxSize {
            t.Errorf("chunk %d size %d exceeds maximum %d", i, chunk.Size, maxSize)
        }
    }
}
265	internal/chunker/fastcdc.go	Normal file
@@ -0,0 +1,265 @@
package chunker

import (
    "io"
    "math"
    "sync"
)

// ReusableChunker implements FastCDC with reusable buffers to minimize allocations.
// Unlike the upstream fastcdc-go library which allocates a new buffer per file,
// this implementation uses sync.Pool to reuse buffers across files.
type ReusableChunker struct {
    minSize  int
    maxSize  int
    normSize int
    bufSize  int

    maskS uint64
    maskL uint64

    rd io.Reader

    buf    []byte
    cursor int
    offset int
    eof    bool
}

// reusableChunkerPool pools ReusableChunker instances to avoid allocations.
var reusableChunkerPool = sync.Pool{
    New: func() interface{} {
        return &ReusableChunker{}
    },
}

// bufferPools contains pools for different buffer sizes.
// Key is the buffer size.
var bufferPools = sync.Map{}

func getBuffer(size int) []byte {
    poolI, _ := bufferPools.LoadOrStore(size, &sync.Pool{
        New: func() interface{} {
            buf := make([]byte, size)
            return &buf
        },
    })
    pool := poolI.(*sync.Pool)
    return *pool.Get().(*[]byte)
}

func putBuffer(buf []byte) {
    size := cap(buf)
    poolI, ok := bufferPools.Load(size)
    if ok {
        pool := poolI.(*sync.Pool)
        b := buf[:size]
        pool.Put(&b)
    }
}

// FastCDCChunk represents a chunk from the FastCDC algorithm.
type FastCDCChunk struct {
    Offset      int
    Length      int
    Data        []byte
    Fingerprint uint64
}

// AcquireReusableChunker gets a chunker from the pool and initializes it for the given reader.
func AcquireReusableChunker(rd io.Reader, minSize, avgSize, maxSize int) *ReusableChunker {
    c := reusableChunkerPool.Get().(*ReusableChunker)

    bufSize := maxSize * 2

    // Reuse buffer if it's the right size, otherwise get a new one
    if c.buf == nil || cap(c.buf) != bufSize {
        if c.buf != nil {
            putBuffer(c.buf)
        }
        c.buf = getBuffer(bufSize)
    } else {
        // Restore buffer to full capacity (may have been truncated by previous EOF)
        c.buf = c.buf[:cap(c.buf)]
    }

    bits := int(math.Round(math.Log2(float64(avgSize))))
    normalization := 2
    smallBits := bits + normalization
    largeBits := bits - normalization

    c.minSize = minSize
    c.maxSize = maxSize
    c.normSize = avgSize
    c.bufSize = bufSize
    c.maskS = (1 << smallBits) - 1
    c.maskL = (1 << largeBits) - 1
    c.rd = rd
    c.cursor = bufSize
    c.offset = 0
    c.eof = false

    return c
}

// Release returns the chunker to the pool for reuse.
func (c *ReusableChunker) Release() {
    c.rd = nil
    reusableChunkerPool.Put(c)
}

func (c *ReusableChunker) fillBuffer() error {
    n := len(c.buf) - c.cursor
    if n >= c.maxSize {
        return nil
    }

    // Move all data after the cursor to the start of the buffer
    copy(c.buf[:n], c.buf[c.cursor:])
    c.cursor = 0

    if c.eof {
        c.buf = c.buf[:n]
        return nil
    }

    // Restore buffer to full capacity for reading
    c.buf = c.buf[:c.bufSize]

    // Fill the rest of the buffer
    m, err := io.ReadFull(c.rd, c.buf[n:])
    if err == io.EOF || err == io.ErrUnexpectedEOF {
        c.buf = c.buf[:n+m]
        c.eof = true
    } else if err != nil {
        return err
    }
    return nil
}

// Next returns the next chunk or io.EOF when done.
// The returned Data slice is only valid until the next call to Next.
func (c *ReusableChunker) Next() (FastCDCChunk, error) {
    if err := c.fillBuffer(); err != nil {
        return FastCDCChunk{}, err
    }
    if len(c.buf) == 0 {
        return FastCDCChunk{}, io.EOF
    }

    length, fp := c.nextChunk(c.buf[c.cursor:])

    chunk := FastCDCChunk{
        Offset:      c.offset,
        Length:      length,
        Data:        c.buf[c.cursor : c.cursor+length],
        Fingerprint: fp,
    }

    c.cursor += length
    c.offset += chunk.Length

    return chunk, nil
}

func (c *ReusableChunker) nextChunk(data []byte) (int, uint64) {
    fp := uint64(0)
    i := c.minSize

    if len(data) <= c.minSize {
        return len(data), fp
    }

    n := min(len(data), c.maxSize)

    for ; i < min(n, c.normSize); i++ {
        fp = (fp << 1) + table[data[i]]
        if (fp & c.maskS) == 0 {
            return i + 1, fp
        }
    }

    for ; i < n; i++ {
        fp = (fp << 1) + table[data[i]]
        if (fp & c.maskL) == 0 {
            return i + 1, fp
        }
    }

    return i, fp
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

// 256 random uint64s for the rolling hash function (from FastCDC paper)
var table = [256]uint64{
    0xe80e8d55032474b3, 0x11b25b61f5924e15, 0x03aa5bd82a9eb669, 0xc45a153ef107a38c,
    0xeac874b86f0f57b9, 0xa5ccedec95ec79c7, 0xe15a3320ad42ac0a, 0x5ed3583fa63cec15,
    0xcd497bf624a4451d, 0xf9ade5b059683605, 0x773940c03fb11ca1, 0xa36b16e4a6ae15b2,
    0x67afd1adb5a89eac, 0xc44c75ee32f0038e, 0x2101790f365c0967, 0x76415c64a222fc4a,
    0x579929249a1e577a, 0xe4762fc41fdbf750, 0xea52198e57dfcdcc, 0xe2535aafe30b4281,
    0xcb1a1bd6c77c9056, 0x5a1aa9bfc4612a62, 0x15a728aef8943eb5, 0x2f8f09738a8ec8d9,
    0x200f3dec9fac8074, 0x0fa9a7b1e0d318df, 0x06c0804ffd0d8e3a, 0x630cbc412669dd25,
    0x10e34f85f4b10285, 0x2a6fe8164b9b6410, 0xcacb57d857d55810, 0x77f8a3a36ff11b46,
    0x66af517e0dc3003e, 0x76c073c789b4009a, 0x853230dbb529f22a, 0x1e9e9c09a1f77e56,
    0x1e871223802ee65d, 0x37fe4588718ff813, 0x10088539f30db464, 0x366f7470b80b72d1,
    0x33f2634d9a6b31db, 0xd43917751d69ea18, 0xa0f492bc1aa7b8de, 0x3f94e5a8054edd20,
    0xedfd6e25eb8b1dbf, 0x759517a54f196a56, 0xe81d5006ec7b6b17, 0x8dd8385fa894a6b7,
    0x45f4d5467b0d6f91, 0xa1f894699de22bc8, 0x33829d09ef93e0fe, 0x3e29e250caed603c,
    0xf7382cba7f63a45e, 0x970f95412bb569d1, 0xc7fcea456d356b4b, 0x723042513f3e7a57,
    0x17ae7688de3596f1, 0x27ac1fcd7cd23c1a, 0xf429beeb78b3f71f, 0xd0780692fb93a3f9,
    0x9f507e28a7c9842f, 0x56001ad536e433ae, 0x7e1dd1ecf58be306, 0x15fee353aa233fc6,
    0xb033a0730b7638e8, 0xeb593ad6bd2406d1, 0x7c86502574d0f133, 0xce3b008d4ccb4be7,
    0xf8566e3d383594c8, 0xb2c261e9b7af4429, 0xf685e7e253799dbb, 0x05d33ed60a494cbc,
    0xeaf88d55a4cb0d1a, 0x3ee9368a902415a1, 0x8980fe6a8493a9a4, 0x358ed008cb448631,
    0xd0cb7e37b46824b8, 0xe9bc375c0bc94f84, 0xea0bf1d8e6b55bb3, 0xb66a60d0f9f6f297,
    0x66db2cc4807b3758, 0x7e4e014afbca8b4d, 0xa5686a4938b0c730, 0xa5f0d7353d623316,
    0x26e38c349242d5e8, 0xeeefa80a29858e30, 0x8915cb912aa67386, 0x4b957a47bfc420d4,
    0xbb53d051a895f7e1, 0x09f5e3235f6911ce, 0x416b98e695cfb7ce, 0x97a08183344c5c86,
    0xbf68e0791839a861, 0xea05dde59ed3ed56, 0x0ca732280beda160, 0xac748ed62fe7f4e2,
    0xc686da075cf6e151, 0xe1ba5658f4af05c8, 0xe9ff09fbeb67cc35, 0xafaea9470323b28d,
    0x0291e8db5bb0ac2a, 0x342072a9bbee77ae, 0x03147eed6b3d0a9c, 0x21379d4de31dbadb,
    0x2388d965226fb986, 0x52c96988bfebabfa, 0xa6fc29896595bc2d, 0x38fa4af70aa46b8b,
    0xa688dd13939421ee, 0x99d5275d9b1415da, 0x453d31bb4fe73631, 0xde51debc1fbe3356,
    0x75a3c847a06c622f, 0xe80e32755d272579, 0x5444052250d8ec0d, 0x8f17dfda19580a3b,
    0xf6b3e9363a185e42, 0x7a42adec6868732f, 0x32cb6a07629203a2, 0x1eca8957defe56d9,
    0x9fa85e4bc78ff9ed, 0x20ff07224a499ca7, 0x3fa6295ff9682c70, 0xe3d5b1e3ce993eff,
    0xa341209362e0b79a, 0x64bd9eae5712ffe8, 0xceebb537babbd12a, 0x5586ef404315954f,
    0x46c3085c938ab51a, 0xa82ccb9199907cee, 0x8c51b6690a3523c8, 0xc4dbd4c9ae518332,
    0x979898dbb23db7b2, 0x1b5b585e6f672a9d, 0xce284da7c4903810, 0x841166e8bb5f1c4f,
    0xb7d884a3fceca7d0, 0xa76468f5a4572374, 0xc10c45f49ee9513d, 0x68f9a5663c1908c9,
    0x0095a13476a6339d, 0xd1d7516ffbe9c679, 0xfd94ab0c9726f938, 0x627468bbdb27c959,
    0xedc3f8988e4a8c9a, 0x58efd33f0dfaa499, 0x21e37d7e2ef4ac8b, 0x297f9ab5586259c6,
    0xda3ba4dc6cb9617d, 0xae11d8d9de2284d2, 0xcfeed88cb3729865, 0xefc2f9e4f03e2633,
    0x8226393e8f0855a4, 0xd6e25fd7acf3a767, 0x435784c3bfd6d14a, 0xf97142e6343fe757,
    0xd73b9fe826352f85, 0x6c3ac444b5b2bd76, 0xd8e88f3e9fd4a3fd, 0x31e50875c36f3460,
    0xa824f1bf88cf4d44, 0x54a4d2c8f5f25899, 0xbff254637ce3b1e6, 0xa02cfe92561b3caa,
    0x7bedb4edee9f0af7, 0x879c0620ac49a102, 0xa12c4ccd23b332e7, 0x09a5ff47bf94ed1e,
    0x7b62f43cd3046fa0, 0xaa3af0476b9c2fb9, 0x22e55301abebba8e, 0x3a6035c42747bd58,
    0x1705373106c8ec07, 0xb1f660de828d0628, 0x065fe82d89ca563d, 0xf555c2d8074d516d,
    0x6bb6c186b423ee99, 0x54a807be6f3120a8, 0x8a3c7fe2f88860b8, 0xbeffc344f5118e81,
    0xd686e80b7d1bd268, 0x661aef4ef5e5e88b, 0x5bf256c654cd1dda, 0x9adb1ab85d7640f4,
    0x68449238920833a2, 0x843279f4cebcb044, 0xc8710cdefa93f7bb, 0x236943294538f3e6,
    0x80d7d136c486d0b4, 0x61653956b28851d3, 0x3f843be9a9a956b5, 0xf73cfbbf137987e5,
    0xcf0cb6dee8ceac2c, 0x50c401f52f185cae, 0xbdbe89ce735c4c1c, 0xeef3ade9c0570bc7,
    0xbe8b066f8f64cbf6, 0x5238d6131705dcb9, 0x20219086c950e9f6, 0x634468d9ed74de02,
    0x0aba4b3d705c7fa5, 0x3374416f725a6672, 0xe7378bdf7beb3bc6, 0x0f7b6a1b1cee565b,
    0x234e4c41b0c33e64, 0x4efa9a0c3f21fe28, 0x1167fc551643e514, 0x9f81a69d3eb01fa4,
    0xdb75c22b12306ed0, 0xe25055d738fc9686, 0x9f9f167a3f8507bb, 0x195f8336d3fbe4d3,
    0x8442b6feffdcb6f6, 0x1e07ed24746ffde9, 0x140e31462d555266, 0x8bd0ce515ae1406e,
    0x2c0be0042b5584b3, 0x35a23d0e15d45a60, 0xc14f1ba147d9bc83, 0xbbf168691264b23f,
    0xad2cc7b57e589ade, 0x9501963154c7815c, 0x9664afa6b8d67d47, 0x7f9e5101fea0a81c,
    0x45ecffb610d25bfd, 0x3157f7aecf9b6ab3, 0xc43ca6f88d87501d, 0x9576ff838dee38dc,
    0x93f21afe0ce1c7d7, 0xceac699df343d8f9, 0x2fec49e29f03398d, 0x8805ccd5730281ed,
    0xf9fc16fc750a8e59, 0x35308cc771adf736, 0x4a57b7c9ee2b7def, 0x03a4c6cdc937a02a,
    0x6c9a8a269fc8c4fc, 0x4681decec7a03f43, 0x342eecded1353ef9, 0x8be0552d8413a867,
    0xc7b4ac51beda8be8, 0xebcc64fb719842c0, 0xde8e4c7fb6d40c1c, 0xcc8263b62f9738b1,
    0xd3cfc0f86511929a, 0x466024ce8bb226ea, 0x459ff690253a3c18, 0x98b27e9d91284c9c,
    0x75c3ae8aa3af373d, 0xfbf8f8e79a866ffc, 0x32327f59d0662799, 0x8228b57e729e9830,
    0x065ceb7a18381b58, 0xd2177671a31dc5ff, 0x90cd801f2f8701f9, 0x9d714428471c65fe,
}
@@ -2,28 +2,63 @@ package cli

import (
    "context"
    "errors"
    "fmt"
    "os"
    "os/signal"
    "path/filepath"
    "syscall"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/globals"
    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/pidlock"
    "git.eeqj.de/sneak/vaultik/internal/snapshot"
    "git.eeqj.de/sneak/vaultik/internal/storage"
    "git.eeqj.de/sneak/vaultik/internal/vaultik"
    "github.com/adrg/xdg"
    "go.uber.org/fx"
)

// AppOptions contains common options for creating the fx application.
// It includes the configuration file path, logging options, and additional
// fx modules and invocations that should be included in the application.
type AppOptions struct {
    ConfigPath string
    LogOptions log.LogOptions
    Modules    []fx.Option
    Invokes    []fx.Option
}

// setupGlobals sets up the globals with application startup time
func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
    lc.Append(fx.Hook{
        OnStart: func(ctx context.Context) error {
            g.StartTime = time.Now().UTC()
            return nil
        },
    })
}

// NewApp creates a new fx application with common modules.
// It sets up the base modules (config, database, logging, globals) and
// combines them with any additional modules specified in the options.
// The returned fx.App is ready to be started with RunApp.
func NewApp(opts AppOptions) *fx.App {
    baseModules := []fx.Option{
        fx.Supply(config.ConfigPath(opts.ConfigPath)),
        fx.Supply(opts.LogOptions),
        fx.Provide(globals.New),
        fx.Provide(log.New),
        config.Module,
        database.Module,
        log.Module,
        storage.Module,
        snapshot.Module,
        fx.Provide(vaultik.New),
        fx.Invoke(setupGlobals),
        fx.NopLogger,
    }

@@ -33,24 +68,77 @@ func NewApp(opts AppOptions) *fx.App {
    return fx.New(allOptions...)
}

// RunApp starts and stops the fx application within the given context.
// It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
// ensures the application stops cleanly. The function blocks until the
// application completes or is interrupted. Returns an error if startup fails.
func RunApp(ctx context.Context, app *fx.App) error {
    // Set up signal handling for graceful shutdown
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

    // Create a context that will be cancelled on signal
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    // Start the app
    if err := app.Start(ctx); err != nil {
        return fmt.Errorf("failed to start app: %w", err)
    }

    // Handle shutdown
    shutdownComplete := make(chan struct{})
    go func() {
        defer close(shutdownComplete)
        <-sigChan
        log.Notice("Received interrupt signal, shutting down gracefully...")

        // Create a timeout context for shutdown
        shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer shutdownCancel()

        if err := app.Stop(shutdownCtx); err != nil {
            log.Error("Error during shutdown", "error", err)
        }
    }()

    // Wait for either the signal handler to complete shutdown or the app to request shutdown
    select {
    case <-shutdownComplete:
        // Shutdown completed via signal
        return nil
    case <-ctx.Done():
        // Context cancelled (shouldn't happen in normal operation)
        if err := app.Stop(context.Background()); err != nil {
            log.Error("Error stopping app", "error", err)
        }
        return ctx.Err()
    case <-app.Done():
        // App finished running (e.g., backup completed)
        return nil
    }
}

// RunWithApp is a helper that creates and runs an fx app with the given options.
// It combines NewApp and RunApp into a single convenient function. This is the
// preferred way to run CLI commands that need the full application context.
// It acquires a PID lock before starting to prevent concurrent instances.
func RunWithApp(ctx context.Context, opts AppOptions) error {
    // Acquire PID lock to prevent concurrent instances
    lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
    lock, err := pidlock.Acquire(lockDir)
    if err != nil {
        if errors.Is(err, pidlock.ErrAlreadyRunning) {
            return fmt.Errorf("cannot start: %w", err)
        }
        return fmt.Errorf("failed to acquire lock: %w", err)
    }
    defer func() {
        if err := lock.Release(); err != nil {
            log.Warn("Failed to release PID lock", "error", err)
        }
    }()

    app := NewApp(opts)
    return RunApp(ctx, app)
}
@@ -1,83 +0,0 @@
package cli

import (
    "context"
    "fmt"
    "os"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/globals"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

// BackupOptions contains options for the backup command
type BackupOptions struct {
    ConfigPath string
    Daemon     bool
    Cron       bool
    Prune      bool
}

// NewBackupCommand creates the backup command
func NewBackupCommand() *cobra.Command {
    opts := &BackupOptions{}

    cmd := &cobra.Command{
        Use:   "backup",
        Short: "Perform incremental backup",
        Long: `Backup configured directories using incremental deduplication and encryption.

Config is located at /etc/vaultik/config.yml, but can be overridden by specifying
a path using --config or by setting VAULTIK_CONFIG to a path.`,
        Args: cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // If --config not specified, check environment variable
            if opts.ConfigPath == "" {
                opts.ConfigPath = os.Getenv("VAULTIK_CONFIG")
            }
            // If still not specified, use default
            if opts.ConfigPath == "" {
                defaultConfig := "/etc/vaultik/config.yml"
                if _, err := os.Stat(defaultConfig); err == nil {
                    opts.ConfigPath = defaultConfig
                } else {
                    return fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultConfig)
                }
            }
            return runBackup(cmd.Context(), opts)
        },
    }

    cmd.Flags().StringVar(&opts.ConfigPath, "config", "", "Path to config file")
    cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
    cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
    cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")

    return cmd
}

func runBackup(ctx context.Context, opts *BackupOptions) error {
    return RunWithApp(ctx, AppOptions{
        ConfigPath: opts.ConfigPath,
        Invokes: []fx.Option{
            fx.Invoke(func(g *globals.Globals, cfg *config.Config, repos *database.Repositories) error {
                // TODO: Implement backup logic
                fmt.Printf("Running backup with config: %s\n", opts.ConfigPath)
                fmt.Printf("Version: %s, Commit: %s\n", g.Version, g.Commit)
                fmt.Printf("Index path: %s\n", cfg.IndexPath)
                if opts.Daemon {
                    fmt.Println("Running in daemon mode")
                }
                if opts.Cron {
                    fmt.Println("Running in cron mode")
                }
                if opts.Prune {
                    fmt.Println("Pruning enabled - will delete old snapshots after backup")
                }
                return nil
            }),
        },
    })
}
102	internal/cli/database.go	Normal file
@@ -0,0 +1,102 @@
package cli

import (
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/spf13/cobra"
)

// NewDatabaseCommand creates the database command group
func NewDatabaseCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "database",
		Short: "Manage the local state database",
		Long:  `Commands for managing the local SQLite state database.`,
	}

	cmd.AddCommand(
		newDatabasePurgeCommand(),
	)

	return cmd
}

// newDatabasePurgeCommand creates the database purge command
func newDatabasePurgeCommand() *cobra.Command {
	var force bool

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Delete the local state database",
		Long: `Completely removes the local SQLite state database.

This will erase all local tracking of:
  - File metadata and change detection state
  - Chunk and blob mappings
  - Local snapshot records

The remote storage is NOT affected. After purging, the next backup will
perform a full scan and re-deduplicate against existing remote blobs.

Use --force to skip the confirmation prompt.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Resolve config path
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Load config to get database path
			cfg, err := config.Load(configPath)
			if err != nil {
				return fmt.Errorf("failed to load config: %w", err)
			}

			dbPath := cfg.IndexPath

			// Check if database exists
			if _, err := os.Stat(dbPath); os.IsNotExist(err) {
				fmt.Printf("Database does not exist: %s\n", dbPath)
				return nil
			}

			// Confirm unless --force
			if !force {
				fmt.Printf("This will delete the local state database at:\n  %s\n\n", dbPath)
				fmt.Print("Are you sure? Type 'yes' to confirm: ")
				var confirm string
				if _, err := fmt.Scanln(&confirm); err != nil || confirm != "yes" {
					fmt.Println("Aborted.")
					return nil
				}
			}

			// Delete the database file
			if err := os.Remove(dbPath); err != nil {
				return fmt.Errorf("failed to delete database: %w", err)
			}

			// Also delete WAL and SHM files if they exist
			walPath := dbPath + "-wal"
			shmPath := dbPath + "-shm"
			_ = os.Remove(walPath) // Ignore errors - files may not exist
			_ = os.Remove(shmPath)

			rootFlags := GetRootFlags()
			if !rootFlags.Quiet {
				fmt.Printf("Database purged: %s\n", dbPath)
			}

			log.Info("Local state database purged", "path", dbPath)
			return nil
		},
	}

	cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

	return cmd
}
94	internal/cli/duration.go	Normal file
@@ -0,0 +1,94 @@
package cli

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// parseDuration parses duration strings. Supports standard Go duration format
// (e.g., "3h30m", "1h45m30s") as well as extended units:
//   - d: days (e.g., "30d", "7d")
//   - w: weeks (e.g., "2w", "4w")
//   - mo: months (30 days) (e.g., "6mo", "1mo")
//   - y: years (365 days) (e.g., "1y", "2y")
//
// Can combine units: "1y6mo", "2w3d", "1d12h30m"
func parseDuration(s string) (time.Duration, error) {
	// First try standard Go duration parsing
	if d, err := time.ParseDuration(s); err == nil {
		return d, nil
	}

	// Extended duration parsing
	// Check for negative values
	if strings.HasPrefix(strings.TrimSpace(s), "-") {
		return 0, fmt.Errorf("negative durations are not supported")
	}

	// Pattern matches: number + unit, repeated
	re := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-zA-Z]+)`)
	matches := re.FindAllStringSubmatch(s, -1)

	if len(matches) == 0 {
		return 0, fmt.Errorf("invalid duration format: %q", s)
	}

	var total time.Duration

	for _, match := range matches {
		valueStr := match[1]
		unit := strings.ToLower(match[2])

		value, err := strconv.ParseFloat(valueStr, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid number %q: %w", valueStr, err)
		}

		var d time.Duration
		switch unit {
		// Standard time units
		case "ns", "nanosecond", "nanoseconds":
			d = time.Duration(value)
		case "us", "µs", "microsecond", "microseconds":
			d = time.Duration(value * float64(time.Microsecond))
		case "ms", "millisecond", "milliseconds":
			d = time.Duration(value * float64(time.Millisecond))
		case "s", "sec", "second", "seconds":
			d = time.Duration(value * float64(time.Second))
		case "m", "min", "minute", "minutes":
			d = time.Duration(value * float64(time.Minute))
		case "h", "hr", "hour", "hours":
			d = time.Duration(value * float64(time.Hour))
		// Extended units
		case "d", "day", "days":
			d = time.Duration(value * float64(24*time.Hour))
		case "w", "week", "weeks":
			d = time.Duration(value * float64(7*24*time.Hour))
		case "mo", "month", "months":
			// Using 30 days as approximation
			d = time.Duration(value * float64(30*24*time.Hour))
		case "y", "year", "years":
			// Using 365 days as approximation
			d = time.Duration(value * float64(365*24*time.Hour))
		default:
			// Try parsing as standard Go duration unit
			testStr := fmt.Sprintf("1%s", unit)
			if _, err := time.ParseDuration(testStr); err == nil {
				// It's a valid Go duration unit, parse the full value
				fullStr := fmt.Sprintf("%g%s", value, unit)
				if d, err = time.ParseDuration(fullStr); err != nil {
					return 0, fmt.Errorf("invalid duration %q: %w", fullStr, err)
				}
			} else {
				return 0, fmt.Errorf("unknown time unit %q", unit)
			}
		}

		total += d
	}

	return total, nil
}
263	internal/cli/duration_test.go	Normal file
@@ -0,0 +1,263 @@
package cli

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestParseDuration(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected time.Duration
		wantErr  bool
	}{
		// Standard Go durations
		{name: "standard seconds", input: "30s", expected: 30 * time.Second},
		{name: "standard minutes", input: "45m", expected: 45 * time.Minute},
		{name: "standard hours", input: "2h", expected: 2 * time.Hour},
		{name: "standard combined", input: "3h30m", expected: 3*time.Hour + 30*time.Minute},
		{name: "standard complex", input: "1h45m30s", expected: 1*time.Hour + 45*time.Minute + 30*time.Second},
		{name: "standard with milliseconds", input: "1s500ms", expected: 1*time.Second + 500*time.Millisecond},
		// Extended units - days
		{name: "single day", input: "1d", expected: 24 * time.Hour},
		{name: "multiple days", input: "7d", expected: 7 * 24 * time.Hour},
		{name: "fractional days", input: "1.5d", expected: 36 * time.Hour},
		{name: "days spelled out", input: "3days", expected: 3 * 24 * time.Hour},
		// Extended units - weeks
		{name: "single week", input: "1w", expected: 7 * 24 * time.Hour},
		{name: "multiple weeks", input: "4w", expected: 4 * 7 * 24 * time.Hour},
		{name: "weeks spelled out", input: "2weeks", expected: 2 * 7 * 24 * time.Hour},
		// Extended units - months
		{name: "single month", input: "1mo", expected: 30 * 24 * time.Hour},
		{name: "multiple months", input: "6mo", expected: 6 * 30 * 24 * time.Hour},
		{name: "months spelled out", input: "3months", expected: 3 * 30 * 24 * time.Hour},
		// Extended units - years
		{name: "single year", input: "1y", expected: 365 * 24 * time.Hour},
		{name: "multiple years", input: "2y", expected: 2 * 365 * 24 * time.Hour},
		{name: "years spelled out", input: "1year", expected: 365 * 24 * time.Hour},
		// Combined extended units
		{name: "weeks and days", input: "2w3d", expected: 2*7*24*time.Hour + 3*24*time.Hour},
		{name: "years and months", input: "1y6mo", expected: 365*24*time.Hour + 6*30*24*time.Hour},
		{name: "days and hours", input: "1d12h", expected: 24*time.Hour + 12*time.Hour},
		{name: "complex combination", input: "1y2mo3w4d5h6m7s", expected: 365*24*time.Hour + 2*30*24*time.Hour + 3*7*24*time.Hour + 4*24*time.Hour + 5*time.Hour + 6*time.Minute + 7*time.Second},
		{name: "with spaces", input: "1d 12h 30m", expected: 24*time.Hour + 12*time.Hour + 30*time.Minute},
		// Edge cases
		{name: "zero duration", input: "0s", expected: 0},
		{name: "large duration", input: "10y", expected: 10 * 365 * 24 * time.Hour},
		// Error cases
		{name: "empty string", input: "", wantErr: true},
		{name: "invalid format", input: "abc", wantErr: true},
		{name: "unknown unit", input: "5x", wantErr: true},
		{name: "invalid number", input: "xyzd", wantErr: true},
		{name: "negative not supported", input: "-5d", wantErr: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got, err := parseDuration(tt.input)

			if tt.wantErr {
				assert.Error(t, err, "expected error for input %q", tt.input)
				return
			}

			assert.NoError(t, err, "unexpected error for input %q", tt.input)
			assert.Equal(t, tt.expected, got, "duration mismatch for input %q", tt.input)
		})
	}
}

func TestParseDurationSpecialCases(t *testing.T) {
	// Test that standard Go durations work exactly as expected
	standardDurations := []string{
		"300ms",
		"1.5h",
		"2h45m",
		"72h",
		"1us",
		"1µs",
		"1ns",
	}

	for _, d := range standardDurations {
		expected, err := time.ParseDuration(d)
		assert.NoError(t, err)

		got, err := parseDuration(d)
		assert.NoError(t, err)
		assert.Equal(t, expected, got, "standard duration %q should parse identically", d)
	}
}

func TestParseDurationRealWorldExamples(t *testing.T) {
	// Test real-world snapshot purge scenarios
	tests := []struct {
		description string
		input       string
		olderThan   time.Duration
	}{
		{description: "keep snapshots from last 30 days", input: "30d", olderThan: 30 * 24 * time.Hour},
		{description: "keep snapshots from last 6 months", input: "6mo", olderThan: 6 * 30 * 24 * time.Hour},
		{description: "keep snapshots from last year", input: "1y", olderThan: 365 * 24 * time.Hour},
		{description: "keep snapshots from last week and a half", input: "1w3d", olderThan: 10 * 24 * time.Hour},
		{description: "keep snapshots from last 90 days", input: "90d", olderThan: 90 * 24 * time.Hour},
	}

	for _, tt := range tests {
		t.Run(tt.description, func(t *testing.T) {
			got, err := parseDuration(tt.input)
			assert.NoError(t, err)
			assert.Equal(t, tt.olderThan, got)

			// Verify the duration makes sense for snapshot purging
			assert.Greater(t, got, time.Hour, "snapshot purge duration should be at least an hour")
		})
	}
}
@@ -4,7 +4,9 @@ import (
 	"os"
 )

-// CLIEntry is the main entry point for the CLI application
+// CLIEntry is the main entry point for the CLI application.
+// It creates the root command, executes it, and exits with status 1
+// if an error occurs. This function should be called from main().
 func CLIEntry() {
 	rootCmd := NewRootCommand()
 	if err := rootCmd.Execute(); err != nil {
@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
 	}

 	// Verify all subcommands are registered
-	expectedCommands := []string{"backup", "restore", "prune", "verify", "fetch"}
+	expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
 	for _, expected := range expectedCommands {
 		found := false
 		for _, cmd := range cmd.Commands() {
@@ -32,19 +32,24 @@ func TestCLIEntry(t *testing.T) {
 		}
 	}

-	// Verify backup command has proper flags
-	backupCmd, _, err := cmd.Find([]string{"backup"})
+	// Verify snapshot command has subcommands
+	snapshotCmd, _, err := cmd.Find([]string{"snapshot"})
 	if err != nil {
-		t.Errorf("Failed to find backup command: %v", err)
+		t.Errorf("Failed to find snapshot command: %v", err)
 	} else {
-		if backupCmd.Flag("config") == nil {
-			t.Error("Backup command missing --config flag")
-		}
-		if backupCmd.Flag("daemon") == nil {
-			t.Error("Backup command missing --daemon flag")
-		}
-		if backupCmd.Flag("cron") == nil {
-			t.Error("Backup command missing --cron flag")
-		}
+		// Check snapshot subcommands
+		expectedSubCommands := []string{"create", "list", "purge", "verify"}
+		for _, expected := range expectedSubCommands {
+			found := false
+			for _, subcmd := range snapshotCmd.Commands() {
+				if subcmd.Use == expected || subcmd.Name() == expected {
+					found = true
+					break
+				}
+			}
+			if !found {
+				t.Errorf("Expected snapshot subcommand '%s' not found", expected)
+			}
+		}
 	}
 }
@@ -1,88 +0,0 @@
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// FetchOptions contains options for the fetch command
type FetchOptions struct {
	Bucket     string
	Prefix     string
	SnapshotID string
	FilePath   string
	Target     string
}

// NewFetchCommand creates the fetch command
func NewFetchCommand() *cobra.Command {
	opts := &FetchOptions{}

	cmd := &cobra.Command{
		Use:   "fetch",
		Short: "Extract single file from backup",
		Long:  `Download and decrypt a single file from a backup snapshot`,
		Args:  cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate required flags
			if opts.Bucket == "" {
				return fmt.Errorf("--bucket is required")
			}
			if opts.Prefix == "" {
				return fmt.Errorf("--prefix is required")
			}
			if opts.SnapshotID == "" {
				return fmt.Errorf("--snapshot is required")
			}
			if opts.FilePath == "" {
				return fmt.Errorf("--file is required")
			}
			if opts.Target == "" {
				return fmt.Errorf("--target is required")
			}
			return runFetch(cmd.Context(), opts)
		},
	}

	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
	cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID")
	cmd.Flags().StringVar(&opts.FilePath, "file", "", "Path of file to extract from backup")
	cmd.Flags().StringVar(&opts.Target, "target", "", "Target path for extracted file")

	return cmd
}

func runFetch(ctx context.Context, opts *FetchOptions) error {
	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
	}

	app := fx.New(
		fx.Supply(opts),
		fx.Provide(globals.New),
		// Additional modules will be added here
		fx.Invoke(func(g *globals.Globals) error {
			// TODO: Implement fetch logic
			fmt.Printf("Fetching %s from snapshot %s to %s\n", opts.FilePath, opts.SnapshotID, opts.Target)
			return nil
		}),
		fx.NopLogger,
	)

	if err := app.Start(ctx); err != nil {
		return fmt.Errorf("failed to start fetch: %w", err)
	}
	defer func() {
		if err := app.Stop(ctx); err != nil {
			fmt.Printf("error stopping app: %v\n", err)
		}
	}()

	return nil
}
71	internal/cli/info.go	Normal file
@@ -0,0 +1,71 @@
package cli

import (
	"context"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewInfoCommand creates the info command
func NewInfoCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "info",
		Short: "Display system and configuration information",
		Long: `Shows information about the current vaultik configuration, including:
  - System details (OS, architecture, version)
  - Storage configuration (S3 bucket, endpoint)
  - Backup settings (source directories, compression)
  - Encryption configuration (recipients)
  - Local database statistics`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the app framework
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.ShowInfo(); err != nil {
										if err != context.Canceled {
											log.Error("Failed to show info", "error", err)
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	return cmd
}
@@ -2,77 +2,83 @@ package cli

 import (
 	"context"
-	"fmt"
 	"os"

-	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )

-// PruneOptions contains options for the prune command
-type PruneOptions struct {
-	Bucket string
-	Prefix string
-	DryRun bool
-}
-
 // NewPruneCommand creates the prune command
 func NewPruneCommand() *cobra.Command {
-	opts := &PruneOptions{}
+	opts := &vaultik.PruneOptions{}

 	cmd := &cobra.Command{
 		Use:   "prune",
 		Short: "Remove unreferenced blobs",
-		Long:  `Delete blobs that are no longer referenced by any snapshot`,
+		Long: `Removes blobs that are not referenced by any snapshot.
+
+This command scans all snapshots and their manifests to build a list of
+referenced blobs, then removes any blobs in storage that are not in this list.
+
+Use this command after deleting snapshots with 'vaultik purge' to reclaim
+storage space.`,
 		Args: cobra.NoArgs,
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			return runPrune(cmd.Context(), opts)
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet || opts.JSON,
+				},
+				Modules: []fx.Option{},
+				Invokes: []fx.Option{
+					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the prune operation in a goroutine
+								go func() {
+									// Run the prune operation
+									if err := v.PruneBlobs(opts); err != nil {
+										if err != context.Canceled {
+											if !opts.JSON {
+												log.Error("Prune operation failed", "error", err)
+											}
+											os.Exit(1)
+										}
+									}
+
+									// Shutdown the app when prune completes
+									if err := v.Shutdowner.Shutdown(); err != nil {
+										log.Error("Failed to shutdown", "error", err)
+									}
+								}()
+								return nil
+							},
+							OnStop: func(ctx context.Context) error {
+								log.Debug("Stopping prune operation")
+								v.Cancel()
+								return nil
+							},
+						})
+					}),
+				},
+			})
 		},
 	}

-	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
-	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
 	cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be deleted without actually deleting")
+	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
+	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output pruning stats as JSON")

 	return cmd
 }
-
-func runPrune(ctx context.Context, opts *PruneOptions) error {
-	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
-		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
-	}
-
-	app := fx.New(
-		fx.Supply(opts),
-		fx.Provide(globals.New),
-		// Additional modules will be added here
-		fx.Invoke(func(g *globals.Globals) error {
-			// TODO: Implement prune logic
-			fmt.Printf("Pruning bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
-			if opts.DryRun {
-				fmt.Println("Running in dry-run mode")
-			}
-			return nil
-		}),
-		fx.NopLogger,
-	)
-
-	if err := app.Start(ctx); err != nil {
-		return fmt.Errorf("failed to start prune: %w", err)
-	}
-	defer func() {
-		if err := app.Stop(ctx); err != nil {
-			fmt.Printf("error stopping app: %v\n", err)
-		}
-	}()
-
-	return nil
-}
100	internal/cli/purge.go	Normal file
@@ -0,0 +1,100 @@
package cli

import (
	"context"
	"fmt"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// PurgeOptions contains options for the purge command
type PurgeOptions struct {
	KeepLatest bool
	OlderThan  string
	Force      bool
}

// NewPurgeCommand creates the purge command
func NewPurgeCommand() *cobra.Command {
	opts := &PurgeOptions{}

	cmd := &cobra.Command{
		Use:   "purge",
		Short: "Purge old snapshots",
		Long: `Removes snapshots based on age or count criteria.

This command allows you to:
  - Keep only the latest snapshot (--keep-latest)
  - Remove snapshots older than a specific duration (--older-than)

Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate flags
			if !opts.KeepLatest && opts.OlderThan == "" {
				return fmt.Errorf("must specify either --keep-latest or --older-than")
			}
			if opts.KeepLatest && opts.OlderThan != "" {
				return fmt.Errorf("cannot specify both --keep-latest and --older-than")
			}

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			// Use the app framework like other commands
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Start the purge operation in a goroutine
								go func() {
									// Run the purge operation
									if err := v.PurgeSnapshots(opts.KeepLatest, opts.OlderThan, opts.Force); err != nil {
										if err != context.Canceled {
											log.Error("Purge operation failed", "error", err)
											os.Exit(1)
										}
									}

									// Shutdown the app when purge completes
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping purge operation")
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot")
	cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6mo, 1y)")
	cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")

	return cmd
}
89	internal/cli/remote.go	Normal file
@@ -0,0 +1,89 @@
package cli

import (
	"context"
	"os"

	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/vaultik"
	"github.com/spf13/cobra"
	"go.uber.org/fx"
)

// NewRemoteCommand creates the remote command and subcommands
func NewRemoteCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "remote",
		Short: "Remote storage management commands",
		Long:  "Commands for inspecting and managing remote storage",
	}

	// Add subcommands
	cmd.AddCommand(newRemoteInfoCommand())

	return cmd
}

// newRemoteInfoCommand creates the 'remote info' subcommand
func newRemoteInfoCommand() *cobra.Command {
	var jsonOutput bool

	cmd := &cobra.Command{
		Use:   "info",
		Short: "Display remote storage information",
		Long: `Shows detailed information about remote storage, including:
  - Size of all snapshot metadata (per snapshot and total)
  - Count and total size of all blobs
  - Count and size of referenced blobs (from all manifests)
  - Count and size of orphaned blobs (not referenced by any manifest)`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}

			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || jsonOutput,
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								go func() {
									if err := v.RemoteInfo(jsonOutput); err != nil {
										if err != context.Canceled {
											if !jsonOutput {
												log.Error("Failed to get remote info", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")

	return cmd
}
@@ -2,20 +2,30 @@ package cli

 import (
 	"context"
-	"fmt"
 	"os"

 	"git.eeqj.de/sneak/vaultik/internal/config"
 	"git.eeqj.de/sneak/vaultik/internal/globals"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"git.eeqj.de/sneak/vaultik/internal/storage"
+	"git.eeqj.de/sneak/vaultik/internal/vaultik"
 	"github.com/spf13/cobra"
 	"go.uber.org/fx"
 )

 // RestoreOptions contains options for the restore command
 type RestoreOptions struct {
-	Bucket     string
-	Prefix     string
 	SnapshotID string
 	TargetDir  string
+	Paths      []string // Optional paths to restore (empty = all)
+	Verify     bool     // Verify restored files after restore
 }

+// RestoreApp contains all dependencies needed for restore
+type RestoreApp struct {
+	Globals    *globals.Globals
+	Config     *config.Config
+	Storage    storage.Storer
+	Vaultik    *vaultik.Vaultik
+	Shutdowner fx.Shutdowner
+}

 // NewRestoreCommand creates the restore command
@@ -23,61 +33,104 @@ func NewRestoreCommand() *cobra.Command {
 	opts := &RestoreOptions{}

 	cmd := &cobra.Command{
-		Use:   "restore",
+		Use:   "restore <snapshot-id> <target-dir> [paths...]",
 		Short: "Restore files from backup",
-		Long:  `Download and decrypt files from a backup snapshot`,
-		Args:  cobra.NoArgs,
+		Long: `Download and decrypt files from a backup snapshot.
+
+This command will restore files from the specified snapshot to the target directory.
+If no paths are specified, all files are restored.
+If paths are specified, only matching files/directories are restored.
+
+Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age private key.
+
+Examples:
+  # Restore entire snapshot
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
+
+  # Restore specific file
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
+
+  # Restore specific directory
+  vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
+
+  # Restore and verify all files
+  vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
+		Args: cobra.MinimumNArgs(2),
 		RunE: func(cmd *cobra.Command, args []string) error {
-			// Validate required flags
-			if opts.Bucket == "" {
-				return fmt.Errorf("--bucket is required")
-			}
-			if opts.Prefix == "" {
-				return fmt.Errorf("--prefix is required")
-			}
-			if opts.SnapshotID == "" {
-				return fmt.Errorf("--snapshot is required")
-			}
-			if opts.TargetDir == "" {
-				return fmt.Errorf("--target is required")
-			}
+			snapshotID := args[0]
+			opts.TargetDir = args[1]
+			if len(args) > 2 {
+				opts.Paths = args[2:]
+			}
+
+			// Use unified config resolution
+			configPath, err := ResolveConfigPath()
+			if err != nil {
+				return err
+			}
+
+			// Use the app framework like other commands
+			rootFlags := GetRootFlags()
+			return RunWithApp(cmd.Context(), AppOptions{
+				ConfigPath: configPath,
+				LogOptions: log.LogOptions{
+					Verbose: rootFlags.Verbose,
+					Debug:   rootFlags.Debug,
+					Quiet:   rootFlags.Quiet,
+				},
+				Modules: []fx.Option{
+					fx.Provide(fx.Annotate(
+						func(g *globals.Globals, cfg *config.Config,
+							storer storage.Storer, v *vaultik.Vaultik, shutdowner fx.Shutdowner) *RestoreApp {
+							return &RestoreApp{
+								Globals:    g,
+								Config:     cfg,
+								Storage:    storer,
+								Vaultik:    v,
+								Shutdowner: shutdowner,
+							}
+						},
+					)),
+				},
+				Invokes: []fx.Option{
+					fx.Invoke(func(app *RestoreApp, lc fx.Lifecycle) {
+						lc.Append(fx.Hook{
+							OnStart: func(ctx context.Context) error {
+								// Start the restore operation in a goroutine
+								go func() {
|
||||
// Run the restore operation
|
||||
restoreOpts := &vaultik.RestoreOptions{
|
||||
SnapshotID: snapshotID,
|
||||
TargetDir: opts.TargetDir,
|
||||
Paths: opts.Paths,
|
||||
Verify: opts.Verify,
|
||||
}
|
||||
return runRestore(cmd.Context(), opts)
|
||||
if err := app.Vaultik.Restore(restoreOpts); err != nil {
|
||||
if err != context.Canceled {
|
||||
log.Error("Restore operation failed", "error", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Shutdown the app when restore completes
|
||||
if err := app.Shutdowner.Shutdown(); err != nil {
|
||||
log.Error("Failed to shutdown", "error", err)
|
||||
}
|
||||
}()
|
||||
return nil
|
||||
},
|
||||
OnStop: func(ctx context.Context) error {
|
||||
log.Debug("Stopping restore operation")
|
||||
app.Vaultik.Cancel()
|
||||
return nil
|
||||
},
|
||||
})
|
||||
}),
|
||||
},
|
||||
})
|
||||
},
|
||||
}
|
||||
|
||||
cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
|
||||
cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
|
||||
cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to restore")
|
||||
cmd.Flags().StringVar(&opts.TargetDir, "target", "", "Target directory for restore")
|
||||
cmd.Flags().BoolVar(&opts.Verify, "verify", false, "Verify restored files by checking chunk hashes")
|
||||
|
||||
return cmd
|
||||
}
|
||||
|
||||
func runRestore(ctx context.Context, opts *RestoreOptions) error {
|
||||
if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
|
||||
return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
|
||||
}
|
||||
|
||||
app := fx.New(
|
||||
fx.Supply(opts),
|
||||
fx.Provide(globals.New),
|
||||
// Additional modules will be added here
|
||||
fx.Invoke(func(g *globals.Globals) error {
|
||||
// TODO: Implement restore logic
|
||||
fmt.Printf("Restoring snapshot %s to %s\n", opts.SnapshotID, opts.TargetDir)
|
||||
return nil
|
||||
}),
|
||||
fx.NopLogger,
|
||||
)
|
||||
|
||||
if err := app.Start(ctx); err != nil {
|
||||
return fmt.Errorf("failed to start restore: %w", err)
|
||||
}
|
||||
defer func() {
|
||||
if err := app.Stop(ctx); err != nil {
|
||||
fmt.Printf("error stopping app: %v\n", err)
|
||||
}
|
||||
}()
|
||||
|
||||
return nil
|
||||
}
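The restore command's new positional-argument contract (snapshot ID, then target directory, then optional path filters, enforced by `cobra.MinimumNArgs(2)`) can be sketched as a plain helper independent of cobra. `parseRestoreArgs` is a hypothetical name for illustration, not a function in vaultik:

```go
package main

import (
	"errors"
	"fmt"
)

// parseRestoreArgs splits restore's positional arguments into the snapshot
// ID, the target directory, and any optional path filters, mirroring the
// "restore <snapshot-id> <target-dir> [paths...]" usage string.
func parseRestoreArgs(args []string) (snapshotID, targetDir string, paths []string, err error) {
	if len(args) < 2 {
		return "", "", nil, errors.New("usage: restore <snapshot-id> <target-dir> [paths...]")
	}
	snapshotID = args[0]
	targetDir = args[1]
	if len(args) > 2 {
		paths = args[2:]
	}
	return snapshotID, targetDir, paths, nil
}

func main() {
	id, dir, paths, err := parseRestoreArgs([]string{
		"myhost_docs_2025-01-01T12:00:00Z", "/restore", "/home/user/important.txt",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(id, dir, paths)
}
```

An empty `paths` slice means "restore everything", matching the command's documented behavior.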
@@ -1,10 +1,26 @@
package cli

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

// NewRootCommand creates the root cobra command
// RootFlags holds global flags that apply to all commands.
// These flags are defined on the root command and inherited by all subcommands.
type RootFlags struct {
    ConfigPath string
    Verbose    bool
    Debug      bool
    Quiet      bool
}

var rootFlags RootFlags

// NewRootCommand creates the root cobra command for the vaultik CLI.
// It sets up the command structure, global flags, and adds all subcommands.
// This is the main entry point for the CLI command hierarchy.
func NewRootCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use: "vaultik",
@@ -15,15 +31,54 @@ on the source system.`,
        SilenceUsage: true,
    }

    // Add global flags
    cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
    cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
    cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
    cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")

    // Add subcommands
    cmd.AddCommand(
        NewBackupCommand(),
        NewRestoreCommand(),
        NewPruneCommand(),
        NewVerifyCommand(),
        NewFetchCommand(),
        SnapshotCmd(),
        NewStoreCommand(),
        NewSnapshotCommand(),
        NewInfoCommand(),
        NewVersionCommand(),
        NewRemoteCommand(),
        NewDatabaseCommand(),
    )

    return cmd
}

// GetRootFlags returns the global flags that were parsed from the command line.
// This allows subcommands to access global flag values like verbosity and config path.
func GetRootFlags() RootFlags {
    return rootFlags
}

// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Returns an error if no valid
// config file can be found through any of these methods.
func ResolveConfigPath() (string, error) {
    // First check global flag
    if rootFlags.ConfigPath != "" {
        return rootFlags.ConfigPath, nil
    }

    // Then check environment variable
    if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
        return envPath, nil
    }

    // Finally check default location
    defaultPath := "/etc/vaultik/config.yml"
    if _, err := os.Stat(defaultPath); err == nil {
        return defaultPath, nil
    }

    return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}
@@ -1,90 +1,467 @@
package cli

import (
    "context"
    "fmt"
    "os"

    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/vaultik"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

func SnapshotCmd() *cobra.Command {
// NewSnapshotCommand creates the snapshot command and subcommands
func NewSnapshotCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "snapshot",
        Short: "Manage snapshots",
        Long:  "Commands for listing, removing, and querying snapshots",
        Short: "Snapshot management commands",
        Long:  "Commands for creating, listing, and managing snapshots",
    }

    cmd.AddCommand(snapshotListCmd())
    cmd.AddCommand(snapshotRmCmd())
    cmd.AddCommand(snapshotLatestCmd())
    // Add subcommands
    cmd.AddCommand(newSnapshotCreateCommand())
    cmd.AddCommand(newSnapshotListCommand())
    cmd.AddCommand(newSnapshotPurgeCommand())
    cmd.AddCommand(newSnapshotVerifyCommand())
    cmd.AddCommand(newSnapshotRemoveCommand())
    cmd.AddCommand(newSnapshotPruneCommand())

    return cmd
}

func snapshotListCmd() *cobra.Command {
    var (
        bucket string
        prefix string
        limit  int
    )
// newSnapshotCreateCommand creates the 'snapshot create' subcommand
func newSnapshotCreateCommand() *cobra.Command {
    opts := &vaultik.SnapshotCreateOptions{}

    cmd := &cobra.Command{
        Use:   "create [snapshot-names...]",
        Short: "Create new snapshots",
        Long: `Creates new snapshots of the configured directories.

If snapshot names are provided, only those snapshots are created.
If no names are provided, all configured snapshots are created.

Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
        Args: cobra.ArbitraryArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // Pass snapshot names from args
            opts.Snapshots = args
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            // Use the backup functionality from cli package
            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Cron:    opts.Cron,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                // Start the snapshot creation in a goroutine
                                go func() {
                                    // Run the snapshot creation
                                    if err := v.CreateSnapshot(opts); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Snapshot creation failed", "error", err)
                                        }
                                    }

                                    // Shutdown the app when snapshot completes
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                log.Debug("Stopping snapshot creation")
                                // Cancel the Vaultik context
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
    cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
    cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
    cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")

    return cmd
}

// newSnapshotListCommand creates the 'snapshot list' subcommand
func newSnapshotListCommand() *cobra.Command {
    var jsonOutput bool

    cmd := &cobra.Command{
        Use:     "list",
        Short:   "List snapshots",
        Long:    "List all snapshots in the bucket, sorted by timestamp",
        Aliases: []string{"ls"},
        Short:   "List all snapshots",
        Long:    "Lists all snapshots with their ID, timestamp, and compressed size",
        Args:    cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if err := v.ListSnapshots(jsonOutput); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to list snapshots", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.Flags().IntVar(&limit, "limit", 10, "Maximum number of snapshots to list")
    cmd.MarkFlagRequired("bucket")
    cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")

    return cmd
}

func snapshotRmCmd() *cobra.Command {
    var (
        bucket   string
        prefix   string
        snapshot string
    )
// newSnapshotPurgeCommand creates the 'snapshot purge' subcommand
func newSnapshotPurgeCommand() *cobra.Command {
    var keepLatest bool
    var olderThan string
    var force bool

    cmd := &cobra.Command{
        Use:   "rm",
        Short: "Remove a snapshot",
        Long:  "Remove a snapshot and optionally its associated blobs",
        Use:   "purge",
        Short: "Purge old snapshots",
        Long:  "Removes snapshots based on age or count criteria",
        Args:  cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            // Validate flags
            if !keepLatest && olderThan == "" {
                return fmt.Errorf("must specify either --keep-latest or --older-than")
            }
            if keepLatest && olderThan != "" {
                return fmt.Errorf("cannot specify both --keep-latest and --older-than")
            }

            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if err := v.PurgeSnapshots(keepLatest, olderThan, force); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to purge snapshots", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.Flags().StringVar(&snapshot, "snapshot", "", "Snapshot ID to remove")
    cmd.MarkFlagRequired("bucket")
    cmd.MarkFlagRequired("snapshot")
    cmd.Flags().BoolVar(&keepLatest, "keep-latest", false, "Keep only the latest snapshot")
    cmd.Flags().StringVar(&olderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
    cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")

    return cmd
}

func snapshotLatestCmd() *cobra.Command {
    var (
        bucket string
        prefix string
    )
// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
    opts := &vaultik.VerifyOptions{}

    cmd := &cobra.Command{
        Use:   "latest",
        Short: "Get the latest snapshot ID",
        Long:  "Display the ID of the most recent snapshot",
        Use:   "verify <snapshot-id>",
        Short: "Verify snapshot integrity",
        Long:  "Verifies that all blobs referenced in a snapshot exist",
        Args: func(cmd *cobra.Command, args []string) error {
            if len(args) != 1 {
                _ = cmd.Help()
                if len(args) == 0 {
                    return fmt.Errorf("snapshot ID required")
                }
                return fmt.Errorf("expected 1 argument, got %d", len(args))
            }
            return nil
        },
        RunE: func(cmd *cobra.Command, args []string) error {
            panic("unimplemented")
            snapshotID := args[0]

            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet || opts.JSON,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    var err error
                                    if opts.Deep {
                                        err = v.RunDeepVerify(snapshotID, opts)
                                    } else {
                                        err = v.VerifySnapshotWithOptions(snapshotID, opts)
                                    }
                                    if err != nil {
                                        if err != context.Canceled {
                                            if !opts.JSON {
                                                log.Error("Verification failed", "error", err)
                                            }
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
    cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
    cmd.MarkFlagRequired("bucket")
    cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Download and verify blob hashes")
    cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

    return cmd
}

// newSnapshotRemoveCommand creates the 'snapshot remove' subcommand
func newSnapshotRemoveCommand() *cobra.Command {
    opts := &vaultik.RemoveOptions{}

    cmd := &cobra.Command{
        Use:     "remove [snapshot-id]",
        Aliases: []string{"rm"},
        Short:   "Remove a snapshot from the local database",
        Long: `Removes a snapshot from the local database.

By default, only removes from the local database. Use --remote to also remove
the snapshot metadata from remote storage.

Note: This does NOT remove blobs. Use 'vaultik prune' to remove orphaned blobs
after removing snapshots.

Use --all --force to remove all snapshots.`,
        Args: func(cmd *cobra.Command, args []string) error {
            all, _ := cmd.Flags().GetBool("all")
            if all {
                if len(args) > 0 {
                    _ = cmd.Help()
                    return fmt.Errorf("--all cannot be used with a snapshot ID")
                }
                return nil
            }
            if len(args) != 1 {
                _ = cmd.Help()
                if len(args) == 0 {
                    return fmt.Errorf("snapshot ID required (or use --all --force)")
                }
                return fmt.Errorf("expected 1 argument, got %d", len(args))
            }
            return nil
        },
        RunE: func(cmd *cobra.Command, args []string) error {
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet || opts.JSON,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    var err error
                                    if opts.All {
                                        _, err = v.RemoveAllSnapshots(opts)
                                    } else {
                                        _, err = v.RemoveSnapshot(args[0], opts)
                                    }
                                    if err != nil {
                                        if err != context.Canceled {
                                            if !opts.JSON {
                                                log.Error("Failed to remove snapshot", "error", err)
                                            }
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    cmd.Flags().BoolVarP(&opts.Force, "force", "f", false, "Skip confirmation prompt")
    cmd.Flags().BoolVar(&opts.DryRun, "dry-run", false, "Show what would be removed without removing")
    cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output result as JSON")
    cmd.Flags().BoolVar(&opts.Remote, "remote", false, "Also remove snapshot metadata from remote storage")
    cmd.Flags().BoolVar(&opts.All, "all", false, "Remove all snapshots (requires --force)")

    return cmd
}

// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "prune",
        Short: "Remove orphaned data from local database",
        Long: `Removes orphaned files, chunks, and blobs from the local database.

This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
        Args: cobra.NoArgs,
        RunE: func(cmd *cobra.Command, args []string) error {
            // Use unified config resolution
            configPath, err := ResolveConfigPath()
            if err != nil {
                return err
            }

            rootFlags := GetRootFlags()
            return RunWithApp(cmd.Context(), AppOptions{
                ConfigPath: configPath,
                LogOptions: log.LogOptions{
                    Verbose: rootFlags.Verbose,
                    Debug:   rootFlags.Debug,
                    Quiet:   rootFlags.Quiet,
                },
                Modules: []fx.Option{},
                Invokes: []fx.Option{
                    fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
                        lc.Append(fx.Hook{
                            OnStart: func(ctx context.Context) error {
                                go func() {
                                    if _, err := v.PruneDatabase(); err != nil {
                                        if err != context.Canceled {
                                            log.Error("Failed to prune database", "error", err)
                                            os.Exit(1)
                                        }
                                    }
                                    if err := v.Shutdowner.Shutdown(); err != nil {
                                        log.Error("Failed to shutdown", "error", err)
                                    }
                                }()
                                return nil
                            },
                            OnStop: func(ctx context.Context) error {
                                v.Cancel()
                                return nil
                            },
                        })
                    }),
                },
            })
        },
    }

    return cmd
}
158	internal/cli/store.go	Normal file
@@ -0,0 +1,158 @@
package cli

import (
    "context"
    "fmt"
    "strings"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/log"
    "git.eeqj.de/sneak/vaultik/internal/storage"
    "github.com/spf13/cobra"
    "go.uber.org/fx"
)

// StoreApp contains dependencies for store commands
type StoreApp struct {
    Storage    storage.Storer
    Shutdowner fx.Shutdowner
}

// NewStoreCommand creates the store command and subcommands
func NewStoreCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "store",
        Short: "Storage information commands",
        Long:  "Commands for viewing information about the storage backend",
    }

    // Add subcommands
    cmd.AddCommand(newStoreInfoCommand())

    return cmd
}

// newStoreInfoCommand creates the 'store info' subcommand
func newStoreInfoCommand() *cobra.Command {
    return &cobra.Command{
        Use:   "info",
        Short: "Display storage information",
        Long:  "Shows storage configuration and statistics including snapshots and blobs",
        RunE: func(cmd *cobra.Command, args []string) error {
            return runWithApp(cmd.Context(), func(app *StoreApp) error {
                return app.Info(cmd.Context())
            })
        },
    }
}

// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
    // Get storage info
    storageInfo := app.Storage.Info()

    fmt.Printf("Storage Information\n")
    fmt.Printf("==================\n\n")
    fmt.Printf("Storage Configuration:\n")
    fmt.Printf("  Type: %s\n", storageInfo.Type)
    fmt.Printf("  Location: %s\n\n", storageInfo.Location)

    // Count snapshots by listing metadata/ prefix
    snapshotCount := 0
    snapshotCh := app.Storage.ListStream(ctx, "metadata/")
    snapshotDirs := make(map[string]bool)

    for object := range snapshotCh {
        if object.Err != nil {
            return fmt.Errorf("listing snapshots: %w", object.Err)
        }
        // Extract snapshot ID from path like metadata/2024-01-15-143052-hostname/
        parts := strings.Split(object.Key, "/")
        if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
            snapshotDirs[parts[1]] = true
        }
    }
    snapshotCount = len(snapshotDirs)

    // Count blobs and calculate total size by listing blobs/ prefix
    blobCount := 0
    var totalSize int64

    blobCh := app.Storage.ListStream(ctx, "blobs/")
    for object := range blobCh {
        if object.Err != nil {
            return fmt.Errorf("listing blobs: %w", object.Err)
        }
        if !strings.HasSuffix(object.Key, "/") { // Skip directories
            blobCount++
            totalSize += object.Size
        }
    }

    fmt.Printf("Storage Statistics:\n")
    fmt.Printf("  Snapshots: %d\n", snapshotCount)
    fmt.Printf("  Blobs: %d\n", blobCount)
    fmt.Printf("  Total Size: %s\n", formatBytes(totalSize))

    return nil
}

// formatBytes formats bytes into human-readable format
func formatBytes(bytes int64) string {
    const unit = 1024
    if bytes < unit {
        return fmt.Sprintf("%d B", bytes)
    }
    div, exp := int64(unit), 0
    for n := bytes / unit; n >= unit; n /= unit {
        div *= unit
        exp++
    }
    return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

// runWithApp creates the FX app and runs the given function
func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
    var result error
    rootFlags := GetRootFlags()

    // Use unified config resolution
    configPath, err := ResolveConfigPath()
    if err != nil {
        return err
    }

    err = RunWithApp(ctx, AppOptions{
        ConfigPath: configPath,
        LogOptions: log.LogOptions{
            Verbose: rootFlags.Verbose,
            Debug:   rootFlags.Debug,
            Quiet:   rootFlags.Quiet,
        },
        Modules: []fx.Option{
            fx.Provide(func(storer storage.Storer, shutdowner fx.Shutdowner) *StoreApp {
                return &StoreApp{
                    Storage:    storer,
                    Shutdowner: shutdowner,
                }
            }),
        },
        Invokes: []fx.Option{
            fx.Invoke(func(app *StoreApp, shutdowner fx.Shutdowner) {
                result = fn(app)
                // Shutdown after command completes
                go func() {
                    time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
                    if err := shutdowner.Shutdown(); err != nil {
                        log.Error("Failed to shutdown", "error", err)
                    }
                }()
            }),
        },
    })

    if err != nil {
        return err
    }
    return result
}
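The `formatBytes` helper introduced in store.go uses binary (1024-based) units with a single decimal place. It can be exercised standalone; this copy reproduces the function as it appears in the diff:

```go
package main

import "fmt"

// formatBytes formats bytes into human-readable format using 1024-based
// units: B, KB, MB, GB, TB, PB, EB.
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(formatBytes(512))     // below one KB, printed as-is
	fmt.Println(formatBytes(1536))    // 1.5 KB
	fmt.Println(formatBytes(1 << 30)) // 1.0 GB
}
```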
10	internal/cli/vaultik_snapshot_types.go	Normal file
@@ -0,0 +1,10 @@
package cli

import "time"

// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
    ID             string    `json:"id"`
    Timestamp      time.Time `json:"timestamp"`
    CompressedSize int64     `json:"compressed_size"`
}
@@ -2,85 +2,97 @@ package cli
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"os"
|
||||
|
||||
"git.eeqj.de/sneak/vaultik/internal/globals"
|
||||
"git.eeqj.de/sneak/vaultik/internal/log"
|
||||
"git.eeqj.de/sneak/vaultik/internal/vaultik"
|
||||
"github.com/spf13/cobra"
|
||||
"go.uber.org/fx"
|
||||
)
|
||||
|
||||
// VerifyOptions contains options for the verify command
|
||||
type VerifyOptions struct {
|
||||
Bucket string
|
||||
Prefix string
|
||||
SnapshotID string
|
||||
Quick bool
|
||||
}
|
||||
|
||||
// NewVerifyCommand creates the verify command
|
||||
func NewVerifyCommand() *cobra.Command {
|
||||
opts := &VerifyOptions{}
	opts := &vaultik.VerifyOptions{}

	cmd := &cobra.Command{
		Use:   "verify",
		Short: "Verify backup integrity",
		Long:  `Check that all referenced blobs exist and verify metadata integrity`,
		Args:  cobra.NoArgs,
		Use:   "verify <snapshot-id>",
		Short: "Verify snapshot integrity",
		Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.

Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs

Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly

The command will fail immediately on any verification error and exit with non-zero status.`,
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			// Validate required flags
			if opts.Bucket == "" {
				return fmt.Errorf("--bucket is required")
			snapshotID := args[0]

			// Use unified config resolution
			configPath, err := ResolveConfigPath()
			if err != nil {
				return err
			}
			if opts.Prefix == "" {
				return fmt.Errorf("--prefix is required")

			// Use the app framework for all verification
			rootFlags := GetRootFlags()
			return RunWithApp(cmd.Context(), AppOptions{
				ConfigPath: configPath,
				LogOptions: log.LogOptions{
					Verbose: rootFlags.Verbose,
					Debug:   rootFlags.Debug,
					Quiet:   rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
				},
				Modules: []fx.Option{},
				Invokes: []fx.Option{
					fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
						lc.Append(fx.Hook{
							OnStart: func(ctx context.Context) error {
								// Run the verify operation directly
								go func() {
									var err error
									if opts.Deep {
										err = v.RunDeepVerify(snapshotID, opts)
									} else {
										err = v.VerifySnapshotWithOptions(snapshotID, opts)
									}
									return runVerify(cmd.Context(), opts)

									if err != nil {
										if err != context.Canceled {
											if !opts.JSON {
												log.Error("Verification failed", "error", err)
											}
											os.Exit(1)
										}
									}
									if err := v.Shutdowner.Shutdown(); err != nil {
										log.Error("Failed to shutdown", "error", err)
									}
								}()
								return nil
							},
							OnStop: func(ctx context.Context) error {
								log.Debug("Stopping verify operation")
								v.Cancel()
								return nil
							},
						})
					}),
				},
			})
		},
	}

	cmd.Flags().StringVar(&opts.Bucket, "bucket", "", "S3 bucket name")
	cmd.Flags().StringVar(&opts.Prefix, "prefix", "", "S3 prefix")
	cmd.Flags().StringVar(&opts.SnapshotID, "snapshot", "", "Snapshot ID to verify (optional, defaults to latest)")
	cmd.Flags().BoolVar(&opts.Quick, "quick", false, "Perform quick verification by checking blob existence and S3 content hashes without downloading")
	cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
	cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")

	return cmd
}

func runVerify(ctx context.Context, opts *VerifyOptions) error {
	if os.Getenv("VAULTIK_PRIVATE_KEY") == "" {
		return fmt.Errorf("VAULTIK_PRIVATE_KEY environment variable must be set")
	}

	app := fx.New(
		fx.Supply(opts),
		fx.Provide(globals.New),
		// Additional modules will be added here
		fx.Invoke(func(g *globals.Globals) error {
			// TODO: Implement verify logic
			if opts.SnapshotID == "" {
				fmt.Printf("Verifying latest snapshot in bucket %s with prefix %s\n", opts.Bucket, opts.Prefix)
			} else {
				fmt.Printf("Verifying snapshot %s in bucket %s with prefix %s\n", opts.SnapshotID, opts.Bucket, opts.Prefix)
			}
			if opts.Quick {
				fmt.Println("Performing quick verification")
			} else {
				fmt.Println("Performing deep verification")
			}
			return nil
		}),
		fx.NopLogger,
	)

	if err := app.Start(ctx); err != nil {
		return fmt.Errorf("failed to start verify: %w", err)
	}
	defer func() {
		if err := app.Stop(ctx); err != nil {
			fmt.Printf("error stopping app: %v\n", err)
		}
	}()

	return nil
}

27	internal/cli/version.go	Normal file
@@ -0,0 +1,27 @@
package cli

import (
	"fmt"
	"runtime"

	"git.eeqj.de/sneak/vaultik/internal/globals"
	"github.com/spf13/cobra"
)

// NewVersionCommand creates the version command
func NewVersionCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "version",
		Short: "Print version information",
		Long:  `Print version, git commit, and build information for vaultik.`,
		Args:  cobra.NoArgs,
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Printf("vaultik %s\n", globals.Version)
			fmt.Printf(" commit: %s\n", globals.Commit)
			fmt.Printf(" go: %s\n", runtime.Version())
			fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
		},
	}

	return cmd
}
@@ -3,30 +3,112 @@ package config
import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"time"

	"filippo.io/age"
	"git.eeqj.de/sneak/smartconfig"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"github.com/adrg/xdg"
	"go.uber.org/fx"
	"gopkg.in/yaml.v3"
)

// Config represents the application configuration
const appName = "berlin.sneak.app.vaultik"

// expandTilde expands ~ at the start of a path to the user's home directory.
func expandTilde(path string) string {
	if path == "~" {
		home, _ := os.UserHomeDir()
		return home
	}
	if strings.HasPrefix(path, "~/") {
		home, _ := os.UserHomeDir()
		return filepath.Join(home, path[2:])
	}
	return path
}

// expandTildeInURL expands ~ in file:// URLs.
func expandTildeInURL(url string) string {
	if strings.HasPrefix(url, "file://~/") {
		home, _ := os.UserHomeDir()
		return "file://" + filepath.Join(home, url[9:])
	}
	return url
}

// SnapshotConfig represents configuration for a named snapshot.
// Each snapshot backs up one or more paths and can have its own exclude patterns
// in addition to the global excludes.
type SnapshotConfig struct {
	Paths   []string `yaml:"paths"`
	Exclude []string `yaml:"exclude"` // Additional excludes for this snapshot
}

// GetExcludes returns the combined exclude patterns for a named snapshot.
// It merges global excludes with the snapshot-specific excludes.
func (c *Config) GetExcludes(snapshotName string) []string {
	snap, ok := c.Snapshots[snapshotName]
	if !ok {
		return c.Exclude
	}

	if len(snap.Exclude) == 0 {
		return c.Exclude
	}

	// Combine global and snapshot-specific excludes
	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
	combined = append(combined, c.Exclude...)
	combined = append(combined, snap.Exclude...)
	return combined
}
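The merge behavior is easy to see in isolation. A minimal standalone sketch (trimmed stand-ins for the real `Config`/`SnapshotConfig` types, stdlib only):

```go
package main

import "fmt"

// Trimmed stand-ins for the config types, for illustration only.
type SnapshotConfig struct {
	Paths   []string
	Exclude []string
}

type Config struct {
	Exclude   []string // global excludes
	Snapshots map[string]SnapshotConfig
}

// GetExcludes mirrors the merge logic above: global excludes first,
// then snapshot-specific ones; unknown or exclude-less snapshots
// fall back to the global list.
func (c *Config) GetExcludes(name string) []string {
	snap, ok := c.Snapshots[name]
	if !ok || len(snap.Exclude) == 0 {
		return c.Exclude
	}
	combined := make([]string, 0, len(c.Exclude)+len(snap.Exclude))
	combined = append(combined, c.Exclude...)
	combined = append(combined, snap.Exclude...)
	return combined
}

func main() {
	cfg := &Config{
		Exclude: []string{"*.tmp"},
		Snapshots: map[string]SnapshotConfig{
			"home": {Paths: []string{"/home"}, Exclude: []string{"*.cache"}},
		},
	}
	fmt.Println(cfg.GetExcludes("home"))    // [*.tmp *.cache]
	fmt.Println(cfg.GetExcludes("missing")) // [*.tmp]
}
```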

// SnapshotNames returns the names of all configured snapshots in sorted order.
func (c *Config) SnapshotNames() []string {
	names := make([]string, 0, len(c.Snapshots))
	for name := range c.Snapshots {
		names = append(names, name)
	}
	// Sort for deterministic order
	sort.Strings(names)
	return names
}

// Config represents the application configuration for Vaultik.
// It defines all settings for backup operations, including source directories,
// encryption recipients, storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
	AgeRecipient      string        `yaml:"age_recipient"`
	AgeRecipients     []string      `yaml:"age_recipients"`
	AgeSecretKey      string        `yaml:"age_secret_key"`
	BackupInterval    time.Duration `yaml:"backup_interval"`
	BlobSizeLimit     int64         `yaml:"blob_size_limit"`
	ChunkSize         int64         `yaml:"chunk_size"`
	Exclude           []string      `yaml:"exclude"`
	BlobSizeLimit     Size          `yaml:"blob_size_limit"`
	ChunkSize         Size          `yaml:"chunk_size"`
	Exclude           []string      `yaml:"exclude"` // Global excludes applied to all snapshots
	FullScanInterval  time.Duration `yaml:"full_scan_interval"`
	Hostname          string        `yaml:"hostname"`
	IndexPath         string        `yaml:"index_path"`
	IndexPrefix       string        `yaml:"index_prefix"`
	MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
	S3                S3Config      `yaml:"s3"`
	SourceDirs        []string      `yaml:"source_dirs"`
	Snapshots         map[string]SnapshotConfig `yaml:"snapshots"`
	CompressionLevel  int           `yaml:"compression_level"`

	// StorageURL specifies the storage backend using a URL format.
	// Takes precedence over S3Config if set.
	// Supported formats:
	//   - s3://bucket/prefix?endpoint=host&region=us-east-1
	//   - file:///path/to/backup
	// For S3 URLs, credentials are still read from s3.access_key_id and s3.secret_access_key.
	StorageURL string `yaml:"storage_url"`
}

// S3Config represents S3 storage configuration
// S3Config represents S3 storage configuration for backup storage.
// It supports both AWS S3 and S3-compatible storage services.
// All fields except UseSSL and PartSize are required.
type S3Config struct {
	Endpoint string `yaml:"endpoint"`
	Bucket   string `yaml:"bucket"`
@@ -35,13 +117,17 @@ type S3Config struct {
	SecretAccessKey string `yaml:"secret_access_key"`
	Region          string `yaml:"region"`
	UseSSL          bool   `yaml:"use_ssl"`
	PartSize        int64  `yaml:"part_size"`
	PartSize        Size   `yaml:"part_size"`
}

// ConfigPath wraps the config file path for fx injection
// ConfigPath wraps the config file path for fx dependency injection.
// This type allows the config file path to be injected as a distinct type
// rather than a plain string, avoiding conflicts with other string dependencies.
type ConfigPath string

// New creates a new Config instance
// New creates a new Config instance by loading from the specified path.
// This function is used by the fx dependency injection framework.
// Returns an error if the path is empty or if loading fails.
func New(path ConfigPath) (*Config, error) {
	if path == "" {
		return nil, fmt.Errorf("config path not provided")
@@ -55,32 +141,60 @@ func New(path ConfigPath) (*Config, error) {
	return cfg, nil
}

// Load reads and parses the configuration file
// Load reads and parses the configuration file from the specified path.
// It applies default values for optional fields, performs environment variable
// substitution using smartconfig, and validates the configuration.
// The configuration file should be in YAML format. Returns an error if the file
// cannot be read, parsed, or if validation fails.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	// Load config using smartconfig for interpolation
	sc, err := smartconfig.NewFromConfigPath(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read config file: %w", err)
		return nil, fmt.Errorf("failed to load config file: %w", err)
	}

	cfg := &Config{
		// Set defaults
		BlobSizeLimit:     10 * 1024 * 1024 * 1024, // 10GB
		ChunkSize:         10 * 1024 * 1024,        // 10MB
		BlobSizeLimit:     Size(10 * 1024 * 1024 * 1024), // 10GB
		ChunkSize:         Size(10 * 1024 * 1024),        // 10MB
		BackupInterval:    1 * time.Hour,
		FullScanInterval:  24 * time.Hour,
		MinTimeBetweenRun: 15 * time.Minute,
		IndexPath:         "/var/lib/vaultik/index.sqlite",
		IndexPrefix:       "index/",
		IndexPath:         filepath.Join(xdg.DataHome, appName, "index.sqlite"),
		CompressionLevel:  3,
	}

	if err := yaml.Unmarshal(data, cfg); err != nil {
	// Convert smartconfig data to YAML then unmarshal
	configData := sc.Data()
	yamlBytes, err := yaml.Marshal(configData)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal config data: %w", err)
	}

	if err := yaml.Unmarshal(yamlBytes, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config: %w", err)
	}

	// Expand tilde in all path fields
	cfg.IndexPath = expandTilde(cfg.IndexPath)
	cfg.StorageURL = expandTildeInURL(cfg.StorageURL)

	// Expand tildes in snapshot paths
	for name, snap := range cfg.Snapshots {
		for i, path := range snap.Paths {
			snap.Paths[i] = expandTilde(path)
		}
		cfg.Snapshots[name] = snap
	}

	// Check for environment variable override for IndexPath
	if envIndexPath := os.Getenv("VAULTIK_INDEX_PATH"); envIndexPath != "" {
		cfg.IndexPath = envIndexPath
		cfg.IndexPath = expandTilde(envIndexPath)
	}

	// Check for environment variable override for AgeSecretKey
	if envAgeSecretKey := os.Getenv("VAULTIK_AGE_SECRET_KEY"); envAgeSecretKey != "" {
		cfg.AgeSecretKey = extractAgeSecretKey(envAgeSecretKey)
	}

	// Get hostname if not set
@@ -97,7 +211,18 @@ func Load(path string) (*Config, error) {
		cfg.S3.Region = "us-east-1"
	}
	if cfg.S3.PartSize == 0 {
		cfg.S3.PartSize = 5 * 1024 * 1024 // 5MB
		cfg.S3.PartSize = Size(5 * 1024 * 1024) // 5MB
	}

	// Check config file permissions (warn if world or group readable)
	if info, err := os.Stat(path); err == nil {
		mode := info.Mode().Perm()
		if mode&0044 != 0 { // group or world readable
			log.Warn("Config file has insecure permissions (contains S3 credentials)",
				"path", path,
				"mode", fmt.Sprintf("%04o", mode),
				"recommendation", "chmod 600 "+path)
		}
	}

	if err := cfg.Validate(); err != nil {
@@ -107,37 +232,40 @@ func Load(path string) (*Config, error) {
	return cfg, nil
}

// Validate checks if the configuration is valid
// Validate checks if the configuration is valid and complete.
// It ensures all required fields are present and have valid values:
//   - At least one age recipient must be specified
//   - At least one snapshot must be configured with at least one path
//   - Storage must be configured (either storage_url or s3.* fields)
//   - Chunk size must be at least 1MB
//   - Blob size limit must be at least the chunk size
//   - Compression level must be between 1 and 19
// Returns an error describing the first validation failure encountered.
func (c *Config) Validate() error {
	if c.AgeRecipient == "" {
		return fmt.Errorf("age_recipient is required")
	if len(c.AgeRecipients) == 0 {
		return fmt.Errorf("at least one age_recipient is required")
	}

	if len(c.SourceDirs) == 0 {
		return fmt.Errorf("at least one source directory is required")
	if len(c.Snapshots) == 0 {
		return fmt.Errorf("at least one snapshot must be configured")
	}

	if c.S3.Endpoint == "" {
		return fmt.Errorf("s3.endpoint is required")
	for name, snap := range c.Snapshots {
		if len(snap.Paths) == 0 {
			return fmt.Errorf("snapshot %q must have at least one path", name)
		}
	}

	if c.S3.Bucket == "" {
		return fmt.Errorf("s3.bucket is required")
	// Validate storage configuration
	if err := c.validateStorage(); err != nil {
		return err
	}

	if c.S3.AccessKeyID == "" {
		return fmt.Errorf("s3.access_key_id is required")
	}

	if c.S3.SecretAccessKey == "" {
		return fmt.Errorf("s3.secret_access_key is required")
	}

	if c.ChunkSize < 1024*1024 { // 1MB minimum
	if c.ChunkSize.Int64() < 1024*1024 { // 1MB minimum
		return fmt.Errorf("chunk_size must be at least 1MB")
	}

	if c.BlobSizeLimit < c.ChunkSize {
	if c.BlobSizeLimit.Int64() < c.ChunkSize.Int64() {
		return fmt.Errorf("blob_size_limit must be at least chunk_size")
	}

@@ -148,7 +276,71 @@ func (c *Config) Validate() error {
	return nil
}

// Module exports the config module for fx
// validateStorage validates storage configuration.
// If StorageURL is set, it takes precedence. S3 URLs require credentials.
// File URLs don't require any S3 configuration.
// If StorageURL is not set, legacy S3 configuration is required.
func (c *Config) validateStorage() error {
	if c.StorageURL != "" {
		// URL-based configuration
		if strings.HasPrefix(c.StorageURL, "file://") {
			// File storage doesn't need S3 credentials
			return nil
		}
		if strings.HasPrefix(c.StorageURL, "s3://") {
			// S3 storage needs credentials
			if c.S3.AccessKeyID == "" {
				return fmt.Errorf("s3.access_key_id is required for s3:// URLs")
			}
			if c.S3.SecretAccessKey == "" {
				return fmt.Errorf("s3.secret_access_key is required for s3:// URLs")
			}
			return nil
		}
		if strings.HasPrefix(c.StorageURL, "rclone://") {
			// Rclone storage uses rclone's own config
			return nil
		}
		return fmt.Errorf("storage_url must start with s3://, file://, or rclone://")
	}

	// Legacy S3 configuration
	if c.S3.Endpoint == "" {
		return fmt.Errorf("s3.endpoint is required (or set storage_url)")
	}

	if c.S3.Bucket == "" {
		return fmt.Errorf("s3.bucket is required (or set storage_url)")
	}

	if c.S3.AccessKeyID == "" {
		return fmt.Errorf("s3.access_key_id is required")
	}

	if c.S3.SecretAccessKey == "" {
		return fmt.Errorf("s3.secret_access_key is required")
	}

	return nil
}

// extractAgeSecretKey extracts the AGE-SECRET-KEY from the input using
// the age library's parser, which handles comments and whitespace.
func extractAgeSecretKey(input string) string {
	identities, err := age.ParseIdentities(strings.NewReader(input))
	if err != nil || len(identities) == 0 {
		// Fall back to trimmed input if parsing fails
		return strings.TrimSpace(input)
	}
	// Return the string representation of the first identity
	if id, ok := identities[0].(*age.X25519Identity); ok {
		return id.String()
	}
	return strings.TrimSpace(input)
}

// Module exports the config module for fx dependency injection.
// It provides the Config type to other modules in the application.
var Module = fx.Module("config",
	fx.Provide(New),
)

@@ -6,6 +6,12 @@ import (
	"testing"
)

const (
	TEST_SNEAK_AGE_PUBLIC_KEY        = "age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj"
	TEST_INTEGRATION_AGE_PUBLIC_KEY  = "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
	TEST_INTEGRATION_AGE_PRIVATE_KEY = "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
)

func TestMain(m *testing.M) {
	// Set up test environment
	testConfigPath := filepath.Join("..", "..", "test", "config.yaml")
@@ -32,16 +38,28 @@ func TestConfigLoad(t *testing.T) {
	}

	// Basic validation
	if cfg.AgeRecipient != "age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" {
		t.Errorf("Expected age recipient to be set, got '%s'", cfg.AgeRecipient)
	if len(cfg.AgeRecipients) != 2 {
		t.Errorf("Expected 2 age recipients, got %d", len(cfg.AgeRecipients))
	}
	if cfg.AgeRecipients[0] != TEST_SNEAK_AGE_PUBLIC_KEY {
		t.Errorf("Expected first age recipient to be %s, got '%s'", TEST_SNEAK_AGE_PUBLIC_KEY, cfg.AgeRecipients[0])
	}

	if len(cfg.SourceDirs) != 2 {
		t.Errorf("Expected 2 source dirs, got %d", len(cfg.SourceDirs))
	if len(cfg.Snapshots) != 1 {
		t.Errorf("Expected 1 snapshot, got %d", len(cfg.Snapshots))
	}

	if cfg.SourceDirs[0] != "/tmp/vaultik-test-source" {
		t.Errorf("Expected first source dir to be '/tmp/vaultik-test-source', got '%s'", cfg.SourceDirs[0])
	testSnap, ok := cfg.Snapshots["test"]
	if !ok {
		t.Fatal("Expected 'test' snapshot to exist")
	}

	if len(testSnap.Paths) != 2 {
		t.Errorf("Expected 2 paths in test snapshot, got %d", len(testSnap.Paths))
	}

	if testSnap.Paths[0] != "/tmp/vaultik-test-source" {
		t.Errorf("Expected first path to be '/tmp/vaultik-test-source', got '%s'", testSnap.Paths[0])
	}

	if cfg.S3.Bucket != "vaultik-test-bucket" {
@@ -65,3 +83,65 @@ func TestConfigFromEnv(t *testing.T) {
		t.Errorf("Config file does not exist at path from VAULTIK_CONFIG: %s", configPath)
	}
}

// TestExtractAgeSecretKey tests extraction of AGE-SECRET-KEY from various inputs
func TestExtractAgeSecretKey(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected string
	}{
		{
			name:     "plain key",
			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "key with trailing newline",
			input:    "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5\n",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name: "full age-keygen output",
			input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg
AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5
`,
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name: "age-keygen output with extra blank lines",
			input: `# created: 2025-01-14T12:00:00Z
# public key: age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg

AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5

`,
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "key with leading whitespace",
			input:    "  AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5  ",
			expected: "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5",
		},
		{
			name:     "empty input",
			input:    "",
			expected: "",
		},
		{
			name:     "only comments",
			input:    "# this is a comment\n# another comment",
			expected: "# this is a comment\n# another comment",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := extractAgeSecretKey(tt.input)
			if result != tt.expected {
				t.Errorf("extractAgeSecretKey(%q) = %q, want %q", tt.input, result, tt.expected)
			}
		})
	}
}

62	internal/config/size.go	Normal file
@@ -0,0 +1,62 @@
package config

import (
	"fmt"

	"github.com/dustin/go-humanize"
)

// Size represents a byte size that can be specified in configuration files.
// It can unmarshal from both numeric values (interpreted as bytes) and
// human-readable strings like "10MB", "2.5GB", or "1TB".
type Size int64

// UnmarshalYAML implements yaml.Unmarshaler for Size, allowing it to be
// parsed from YAML configuration files. It accepts both numeric values
// (interpreted as bytes) and string values with units (e.g., "10MB").
func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Try to unmarshal as int64 first
	var intVal int64
	if err := unmarshal(&intVal); err == nil {
		*s = Size(intVal)
		return nil
	}

	// Try to unmarshal as string
	var strVal string
	if err := unmarshal(&strVal); err != nil {
		return fmt.Errorf("size must be a number or string")
	}

	// Parse the string using go-humanize
	bytes, err := humanize.ParseBytes(strVal)
	if err != nil {
		return fmt.Errorf("invalid size format: %w", err)
	}

	*s = Size(bytes)
	return nil
}

// Int64 returns the size as int64 bytes.
// This is useful when the size needs to be passed to APIs that expect
// a numeric byte count.
func (s Size) Int64() int64 {
	return int64(s)
}

// String returns the size as a human-readable string.
// For example, 1048576 bytes would be formatted as "1.0 MB".
// This implements the fmt.Stringer interface.
func (s Size) String() string {
	return humanize.Bytes(uint64(s))
}

// ParseSize parses a size string into a Size value.
func ParseSize(s string) (Size, error) {
	bytes, err := humanize.ParseBytes(s)
	if err != nil {
		return 0, fmt.Errorf("invalid size format: %w", err)
	}
	return Size(bytes), nil
}
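With `BlobSizeLimit`, `ChunkSize`, and `PartSize` now typed as `Size`, the config file accepts either raw byte counts or human-readable strings. A hypothetical YAML fragment (note that go-humanize's `ParseBytes` treats "MB" as 1000-based and "MiB" as 1024-based):

```yaml
chunk_size: 10485760      # plain integer: interpreted as bytes
blob_size_limit: "10GB"   # SI units: 10 * 1000^3 bytes
s3:
  part_size: "5MiB"       # binary units: 5 * 1024^2 bytes
```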
209	internal/crypto/encryption.go	Normal file
@@ -0,0 +1,209 @@
package crypto

import (
	"bytes"
	"fmt"
	"io"
	"sync"

	"filippo.io/age"
	"go.uber.org/fx"
)

// Encryptor provides thread-safe encryption using the age encryption library.
// It supports encrypting data for multiple recipients simultaneously, allowing
// any of the corresponding private keys to decrypt the data. This is useful
// for backup scenarios where multiple parties should be able to decrypt the data.
type Encryptor struct {
	recipients []age.Recipient
	mu         sync.RWMutex
}

// NewEncryptor creates a new encryptor with the given age public keys.
// Each public key should be a valid age X25519 recipient string (e.g., "age1...").
// At least one recipient must be provided. Returns an error if any of the
// public keys are invalid or if no recipients are specified.
func NewEncryptor(publicKeys []string) (*Encryptor, error) {
	if len(publicKeys) == 0 {
		return nil, fmt.Errorf("at least one recipient is required")
	}

	recipients := make([]age.Recipient, 0, len(publicKeys))
	for _, key := range publicKeys {
		recipient, err := age.ParseX25519Recipient(key)
		if err != nil {
			return nil, fmt.Errorf("parsing age recipient %s: %w", key, err)
		}
		recipients = append(recipients, recipient)
	}

	return &Encryptor{
		recipients: recipients,
	}, nil
}

// Encrypt encrypts data using age encryption for all configured recipients.
// The encrypted data can be decrypted by any of the corresponding private keys.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use EncryptStream or EncryptWriter instead.
func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	var buf bytes.Buffer

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(&buf, recipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encrypted writer: %w", err)
	}

	// Write data
	if _, err := w.Write(data); err != nil {
		return nil, fmt.Errorf("writing encrypted data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("closing encrypted writer: %w", err)
	}

	return buf.Bytes(), nil
}

// EncryptStream encrypts data from reader to writer using age encryption.
// This method is suitable for encrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The encrypted data is written directly to the destination writer.
func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(dst, recipients...)
	if err != nil {
		return fmt.Errorf("creating encrypted writer: %w", err)
	}

	// Copy data
	if _, err := io.Copy(w, src); err != nil {
		return fmt.Errorf("copying encrypted data: %w", err)
	}

	// Close to flush
	if err := w.Close(); err != nil {
		return fmt.Errorf("closing encrypted writer: %w", err)
	}

	return nil
}

// EncryptWriter creates a writer that encrypts data written to it.
// All data written to the returned WriteCloser will be encrypted and written
// to the destination writer. The caller must call Close() on the returned
// writer to ensure all encrypted data is properly flushed and finalized.
// This is useful for integrating encryption into existing writer-based pipelines.
func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
	e.mu.RLock()
	recipients := e.recipients
	e.mu.RUnlock()

	// Create encrypted writer for all recipients
	w, err := age.Encrypt(dst, recipients...)
	if err != nil {
		return nil, fmt.Errorf("creating encrypted writer: %w", err)
	}

	return w, nil
}

// UpdateRecipients updates the recipients for future encryption operations.
// This method is thread-safe and can be called while other encryption operations
// are in progress. Existing encryption operations will continue with the old
// recipients. At least one recipient must be provided. Returns an error if any
// of the public keys are invalid or if no recipients are specified.
func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
	if len(publicKeys) == 0 {
		return fmt.Errorf("at least one recipient is required")
	}

	recipients := make([]age.Recipient, 0, len(publicKeys))
	for _, key := range publicKeys {
		recipient, err := age.ParseX25519Recipient(key)
		if err != nil {
			return fmt.Errorf("parsing age recipient %s: %w", key, err)
		}
		recipients = append(recipients, recipient)
	}

	e.mu.Lock()
	e.recipients = recipients
	e.mu.Unlock()

	return nil
}

// Decryptor provides thread-safe decryption using the age encryption library.
// It uses a private key to decrypt data that was encrypted for the corresponding
// public key.
type Decryptor struct {
	identity age.Identity
	mu       sync.RWMutex
}

// NewDecryptor creates a new decryptor with the given age private key.
// The private key should be a valid age X25519 identity string.
// Returns an error if the private key is invalid.
func NewDecryptor(privateKey string) (*Decryptor, error) {
	identity, err := age.ParseX25519Identity(privateKey)
	if err != nil {
		return nil, fmt.Errorf("parsing age identity: %w", err)
	}

	return &Decryptor{
		identity: identity,
	}, nil
}

// Decrypt decrypts data using age decryption.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use DecryptStream instead.
func (d *Decryptor) Decrypt(data []byte) ([]byte, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(bytes.NewReader(data), identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	decrypted, err := io.ReadAll(r)
	if err != nil {
		return nil, fmt.Errorf("reading decrypted data: %w", err)
	}

	return decrypted, nil
}

// DecryptStream returns a reader that decrypts data from the provided reader.
// This method is suitable for decrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The caller should close the input reader when done.
func (d *Decryptor) DecryptStream(src io.Reader) (io.Reader, error) {
	d.mu.RLock()
	identity := d.identity
	d.mu.RUnlock()

	r, err := age.Decrypt(src, identity)
	if err != nil {
		return nil, fmt.Errorf("creating decrypted reader: %w", err)
	}

	return r, nil
}

// Module exports the crypto module for fx dependency injection.
var Module = fx.Module("crypto")
||||
157	internal/crypto/encryption_test.go	Normal file
@@ -0,0 +1,157 @@
package crypto

import (
	"bytes"
	"testing"

	"filippo.io/age"
)

func TestEncryptor(t *testing.T) {
	// Generate a test key pair
	identity, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity: %v", err)
	}

	publicKey := identity.Recipient().String()

	// Create encryptor
	enc, err := NewEncryptor([]string{publicKey})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Hello, World! This is a test message.")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify it's actually encrypted (should be larger and different)
	if bytes.Equal(plaintext, ciphertext) {
		t.Error("ciphertext equals plaintext")
	}

	// Decrypt to verify
	r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
	if err != nil {
		t.Fatalf("failed to decrypt: %v", err)
	}

	var decrypted bytes.Buffer
	if _, err := decrypted.ReadFrom(r); err != nil {
		t.Fatalf("failed to read decrypted data: %v", err)
	}

	if !bytes.Equal(plaintext, decrypted.Bytes()) {
		t.Error("decrypted data doesn't match original")
	}
}

func TestEncryptorMultipleRecipients(t *testing.T) {
	// Generate three test key pairs
	identity1, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity1: %v", err)
	}
	identity2, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity2: %v", err)
	}
	identity3, err := age.GenerateX25519Identity()
	if err != nil {
		t.Fatalf("failed to generate identity3: %v", err)
	}

	publicKeys := []string{
		identity1.Recipient().String(),
		identity2.Recipient().String(),
		identity3.Recipient().String(),
	}

	// Create encryptor with multiple recipients
	enc, err := NewEncryptor(publicKeys)
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Test data
	plaintext := []byte("Secret message for multiple recipients")

	// Encrypt
	ciphertext, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Verify each recipient can decrypt
	identities := []age.Identity{identity1, identity2, identity3}
	for i, identity := range identities {
		r, err := age.Decrypt(bytes.NewReader(ciphertext), identity)
		if err != nil {
			t.Fatalf("recipient %d failed to decrypt: %v", i+1, err)
		}

		var decrypted bytes.Buffer
		if _, err := decrypted.ReadFrom(r); err != nil {
			t.Fatalf("recipient %d failed to read decrypted data: %v", i+1, err)
		}

		if !bytes.Equal(plaintext, decrypted.Bytes()) {
			t.Errorf("recipient %d: decrypted data doesn't match original", i+1)
		}
	}
}

func TestEncryptorUpdateRecipients(t *testing.T) {
	// Generate two identities
	identity1, _ := age.GenerateX25519Identity()
	identity2, _ := age.GenerateX25519Identity()

	publicKey1 := identity1.Recipient().String()
	publicKey2 := identity2.Recipient().String()

	// Create encryptor with first key
	enc, err := NewEncryptor([]string{publicKey1})
	if err != nil {
		t.Fatalf("failed to create encryptor: %v", err)
	}

	// Encrypt with first key
	plaintext := []byte("test data")
	ciphertext1, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// Update to second key
	if err := enc.UpdateRecipients([]string{publicKey2}); err != nil {
		t.Fatalf("failed to update recipients: %v", err)
	}

	// Encrypt with second key
	ciphertext2, err := enc.Encrypt(plaintext)
	if err != nil {
		t.Fatalf("failed to encrypt: %v", err)
	}

	// First ciphertext should only decrypt with first identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity1); err != nil {
		t.Error("failed to decrypt with identity1")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext1), identity2); err == nil {
		t.Error("should not decrypt with identity2")
	}

	// Second ciphertext should only decrypt with second identity
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity2); err != nil {
		t.Error("failed to decrypt with identity2")
	}
	if _, err := age.Decrypt(bytes.NewReader(ciphertext2), identity1); err == nil {
		t.Error("should not decrypt with identity1")
	}
}
@@ -16,15 +16,15 @@ func NewBlobChunkRepository(db *DB) *BlobChunkRepository {

 func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobChunk) error {
 	query := `
-		INSERT INTO blob_chunks (blob_hash, chunk_hash, offset, length)
+		INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length)
 		VALUES (?, ?, ?, ?)
 	`

 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = tx.ExecContext(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, bc.BlobHash, bc.ChunkHash, bc.Offset, bc.Length)
+		_, err = r.db.ExecWithLog(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
 	}

 	if err != nil {
@@ -34,15 +34,15 @@ func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobCh
 	return nil
 }

-func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string) ([]*BlobChunk, error) {
+func (r *BlobChunkRepository) GetByBlobID(ctx context.Context, blobID string) ([]*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
-		WHERE blob_hash = ?
+		WHERE blob_id = ?
 		ORDER BY offset
 	`

-	rows, err := r.db.conn.QueryContext(ctx, query, blobHash)
+	rows, err := r.db.conn.QueryContext(ctx, query, blobID)
 	if err != nil {
 		return nil, fmt.Errorf("querying blob chunks: %w", err)
 	}
@@ -51,7 +51,7 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string
 	var blobChunks []*BlobChunk
 	for rows.Next() {
 		var bc BlobChunk
-		err := rows.Scan(&bc.BlobHash, &bc.ChunkHash, &bc.Offset, &bc.Length)
+		err := rows.Scan(&bc.BlobID, &bc.ChunkHash, &bc.Offset, &bc.Length)
 		if err != nil {
 			return nil, fmt.Errorf("scanning blob chunk: %w", err)
 		}
@@ -63,26 +63,90 @@ func (r *BlobChunkRepository) GetByBlobHash(ctx context.Context, blobHash string

 func (r *BlobChunkRepository) GetByChunkHash(ctx context.Context, chunkHash string) (*BlobChunk, error) {
 	query := `
-		SELECT blob_hash, chunk_hash, offset, length
+		SELECT blob_id, chunk_hash, offset, length
 		FROM blob_chunks
 		WHERE chunk_hash = ?
 		LIMIT 1
 	`

+	LogSQL("GetByChunkHash", query, chunkHash)
 	var bc BlobChunk
 	err := r.db.conn.QueryRowContext(ctx, query, chunkHash).Scan(
-		&bc.BlobHash,
+		&bc.BlobID,
 		&bc.ChunkHash,
 		&bc.Offset,
 		&bc.Length,
 	)

 	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHash", "No rows found", chunkHash)
 		return nil, nil
 	}
 	if err != nil {
+		LogSQL("GetByChunkHash", "Error", chunkHash, err)
 		return nil, fmt.Errorf("querying blob chunk: %w", err)
 	}

+	LogSQL("GetByChunkHash", "Found blob", chunkHash, "blob", bc.BlobID)
 	return &bc, nil
 }

+// GetByChunkHashTx retrieves a blob chunk within a transaction
+func (r *BlobChunkRepository) GetByChunkHashTx(ctx context.Context, tx *sql.Tx, chunkHash string) (*BlobChunk, error) {
+	query := `
+		SELECT blob_id, chunk_hash, offset, length
+		FROM blob_chunks
+		WHERE chunk_hash = ?
+		LIMIT 1
+	`
+
+	LogSQL("GetByChunkHashTx", query, chunkHash)
+	var bc BlobChunk
+	err := tx.QueryRowContext(ctx, query, chunkHash).Scan(
+		&bc.BlobID,
+		&bc.ChunkHash,
+		&bc.Offset,
+		&bc.Length,
+	)
+
+	if err == sql.ErrNoRows {
+		LogSQL("GetByChunkHashTx", "No rows found", chunkHash)
+		return nil, nil
+	}
+	if err != nil {
+		LogSQL("GetByChunkHashTx", "Error", chunkHash, err)
+		return nil, fmt.Errorf("querying blob chunk: %w", err)
+	}
+
+	LogSQL("GetByChunkHashTx", "Found blob", chunkHash, "blob", bc.BlobID)
+	return &bc, nil
+}
+
+// DeleteOrphaned deletes blob_chunks entries where either the blob or chunk no longer exists
+func (r *BlobChunkRepository) DeleteOrphaned(ctx context.Context) error {
+	// Delete blob_chunks where the blob doesn't exist
+	query1 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM blobs
+			WHERE blobs.id = blob_chunks.blob_id
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query1); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing blobs: %w", err)
+	}
+
+	// Delete blob_chunks where the chunk doesn't exist
+	query2 := `
+		DELETE FROM blob_chunks
+		WHERE NOT EXISTS (
+			SELECT 1 FROM chunks
+			WHERE chunks.chunk_hash = blob_chunks.chunk_hash
+		)
+	`
+	if _, err := r.db.ExecWithLog(ctx, query2); err != nil {
+		return fmt.Errorf("deleting blob_chunks with missing chunks: %w", err)
+	}
+
+	return nil
+}
@@ -2,7 +2,11 @@ package database

 import (
 	"context"
+	"strings"
 	"testing"
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestBlobChunkRepository(t *testing.T) {
@@ -10,78 +14,111 @@ func TestBlobChunkRepository(t *testing.T) {
 	defer cleanup()

 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)

+	// Create blob first
+	blob := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	err := repos.Blobs.Create(ctx, nil, blob)
+	if err != nil {
+		t.Fatalf("failed to create blob: %v", err)
+	}
+
+	// Create chunks
+	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunks {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
+
 	// Test Create
 	bc1 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk1",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk1"),
 		Offset:    0,
 		Length:    1024,
 	}

-	err := repo.Create(ctx, nil, bc1)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
 	if err != nil {
 		t.Fatalf("failed to create blob chunk: %v", err)
 	}

 	// Add more chunks to the same blob
 	bc2 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk2",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk2"),
 		Offset:    1024,
 		Length:    2048,
 	}
-	err = repo.Create(ctx, nil, bc2)
+	err = repos.BlobChunks.Create(ctx, nil, bc2)
 	if err != nil {
 		t.Fatalf("failed to create second blob chunk: %v", err)
 	}

 	bc3 := &BlobChunk{
-		BlobHash:  "blob1",
-		ChunkHash: "chunk3",
+		BlobID:    blob.ID,
+		ChunkHash: types.ChunkHash("chunk3"),
 		Offset:    3072,
 		Length:    512,
 	}
-	err = repo.Create(ctx, nil, bc3)
+	err = repos.BlobChunks.Create(ctx, nil, bc3)
 	if err != nil {
 		t.Fatalf("failed to create third blob chunk: %v", err)
 	}

-	// Test GetByBlobHash
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	// Test GetByBlobID
+	blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob chunks: %v", err)
 	}
-	if len(chunks) != 3 {
-		t.Errorf("expected 3 chunks, got %d", len(chunks))
+	if len(blobChunks) != 3 {
+		t.Errorf("expected 3 chunks, got %d", len(blobChunks))
 	}

 	// Verify order by offset
 	expectedOffsets := []int64{0, 1024, 3072}
-	for i, chunk := range chunks {
-		if chunk.Offset != expectedOffsets[i] {
-			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], chunk.Offset)
+	for i, bc := range blobChunks {
+		if bc.Offset != expectedOffsets[i] {
+			t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], bc.Offset)
 		}
 	}

 	// Test GetByChunkHash
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get blob chunk by chunk hash: %v", err)
 	}
 	if bc == nil {
 		t.Fatal("expected blob chunk, got nil")
 	}
-	if bc.BlobHash != "blob1" {
-		t.Errorf("wrong blob hash: expected blob1, got %s", bc.BlobHash)
+	if bc.BlobID != blob.ID {
+		t.Errorf("wrong blob ID: expected %s, got %s", blob.ID, bc.BlobID)
 	}
 	if bc.Offset != 1024 {
 		t.Errorf("wrong offset: expected 1024, got %d", bc.Offset)
 	}

+	// Test duplicate insert (should fail due to primary key constraint)
+	err = repos.BlobChunks.Create(ctx, nil, bc1)
+	if err == nil {
+		t.Fatal("duplicate blob_chunk insert should fail due to primary key constraint")
+	}
+	if !strings.Contains(err.Error(), "UNIQUE") && !strings.Contains(err.Error(), "constraint") {
+		t.Fatalf("expected constraint error, got: %v", err)
+	}
+
 	// Test non-existent chunk
-	bc, err = repo.GetByChunkHash(ctx, "nonexistent")
+	bc, err = repos.BlobChunks.GetByChunkHash(ctx, "nonexistent")
 	if err != nil {
 		t.Fatalf("unexpected error: %v", err)
 	}
@@ -95,26 +132,60 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	defer cleanup()

 	ctx := context.Background()
-	repo := NewBlobChunkRepository(db)
+	repos := NewRepositories(db)

+	// Create blobs
+	blob1 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob1-hash"),
+		CreatedTS: time.Now(),
+	}
+	blob2 := &Blob{
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blob2-hash"),
+		CreatedTS: time.Now(),
+	}
+
+	err := repos.Blobs.Create(ctx, nil, blob1)
+	if err != nil {
+		t.Fatalf("failed to create blob1: %v", err)
+	}
+	err = repos.Blobs.Create(ctx, nil, blob2)
+	if err != nil {
+		t.Fatalf("failed to create blob2: %v", err)
+	}
+
+	// Create chunks
+	chunkHashes := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
+	for _, chunkHash := range chunkHashes {
+		chunk := &Chunk{
+			ChunkHash: chunkHash,
+			Size:      1024,
+		}
+		err = repos.Chunks.Create(ctx, nil, chunk)
+		if err != nil {
+			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
+		}
+	}
+
 	// Create chunks across multiple blobs
 	// Some chunks are shared between blobs (deduplication scenario)
 	blobChunks := []BlobChunk{
-		{BlobHash: "blob1", ChunkHash: "chunk1", Offset: 0, Length: 1024},
-		{BlobHash: "blob1", ChunkHash: "chunk2", Offset: 1024, Length: 1024},
-		{BlobHash: "blob2", ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
-		{BlobHash: "blob2", ChunkHash: "chunk3", Offset: 1024, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk1"), Offset: 0, Length: 1024},
+		{BlobID: blob1.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 1024, Length: 1024},
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk2"), Offset: 0, Length: 1024}, // chunk2 is shared
+		{BlobID: blob2.ID, ChunkHash: types.ChunkHash("chunk3"), Offset: 1024, Length: 1024},
 	}

 	for _, bc := range blobChunks {
-		err := repo.Create(ctx, nil, &bc)
+		err := repos.BlobChunks.Create(ctx, nil, &bc)
 		if err != nil {
 			t.Fatalf("failed to create blob chunk: %v", err)
 		}
 	}

 	// Verify blob1 chunks
-	chunks, err := repo.GetByBlobHash(ctx, "blob1")
+	chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob1 chunks: %v", err)
 	}
@@ -123,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}

 	// Verify blob2 chunks
-	chunks, err = repo.GetByBlobHash(ctx, "blob2")
+	chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get blob2 chunks: %v", err)
 	}
@@ -132,7 +203,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 	}

 	// Verify shared chunk
-	bc, err := repo.GetByChunkHash(ctx, "chunk2")
+	bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
 	if err != nil {
 		t.Fatalf("failed to get shared chunk: %v", err)
 	}
@@ -140,7 +211,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
 		t.Fatal("expected shared chunk, got nil")
 	}
 	// GetByChunkHash returns first match, should be blob1
-	if bc.BlobHash != "blob1" {
-		t.Errorf("expected blob1 for shared chunk, got %s", bc.BlobHash)
+	if bc.BlobID != blob1.ID {
+		t.Errorf("expected %s for shared chunk, got %s", blob1.ID, bc.BlobID)
 	}
 }
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/log"
 )

 type BlobRepository struct {
@@ -17,15 +19,27 @@ func NewBlobRepository(db *DB) *BlobRepository {

 func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) error {
 	query := `
-		INSERT INTO blobs (blob_hash, created_ts)
-		VALUES (?, ?)
+		INSERT INTO blobs (id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts)
+		VALUES (?, ?, ?, ?, ?, ?, ?)
 	`

+	var finishedTS, uploadedTS *int64
+	if blob.FinishedTS != nil {
+		ts := blob.FinishedTS.Unix()
+		finishedTS = &ts
+	}
+	if blob.UploadedTS != nil {
+		ts := blob.UploadedTS.Unix()
+		uploadedTS = &ts
+	}
+
 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = tx.ExecContext(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, blob.BlobHash, blob.CreatedTS.Unix())
+		_, err = r.db.ExecWithLog(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
+			finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
 	}

 	if err != nil {
@@ -37,17 +51,23 @@ func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) err

 func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
 		WHERE blob_hash = ?
 	`

 	var blob Blob
 	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64
+
 	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
-		&blob.BlobHash,
+		&blob.ID,
+		&blob.Hash,
 		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
 	)

 	if err == sql.ErrNoRows {
@@ -57,40 +77,124 @@ func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, err
 		return nil, fmt.Errorf("querying blob: %w", err)
 	}

-	blob.CreatedTS = time.Unix(createdTSUnix, 0)
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
 	return &blob, nil
 }

-func (r *BlobRepository) List(ctx context.Context, limit, offset int) ([]*Blob, error) {
+// GetByID retrieves a blob by its ID
+func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error) {
 	query := `
-		SELECT blob_hash, created_ts
+		SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
 		FROM blobs
-		ORDER BY blob_hash
-		LIMIT ? OFFSET ?
+		WHERE id = ?
 	`

-	rows, err := r.db.conn.QueryContext(ctx, query, limit, offset)
-	if err != nil {
-		return nil, fmt.Errorf("querying blobs: %w", err)
-	}
-	defer CloseRows(rows)
-
-	var blobs []*Blob
-	for rows.Next() {
-		var blob Blob
-		var createdTSUnix int64
-
-		err := rows.Scan(
-			&blob.BlobHash,
-			&createdTSUnix,
-		)
+	var blob Blob
+	var createdTSUnix int64
+	var finishedTSUnix, uploadedTSUnix sql.NullInt64

+	err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
+		&blob.ID,
+		&blob.Hash,
+		&createdTSUnix,
+		&finishedTSUnix,
+		&blob.UncompressedSize,
+		&blob.CompressedSize,
+		&uploadedTSUnix,
+	)
+
+	if err == sql.ErrNoRows {
+		return nil, nil
+	}
 	if err != nil {
-		return nil, fmt.Errorf("scanning blob: %w", err)
+		return nil, fmt.Errorf("querying blob: %w", err)
 	}

-	blob.CreatedTS = time.Unix(createdTSUnix, 0)
-	blobs = append(blobs, &blob)
-	}
-
-	return blobs, rows.Err()
+	blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
+	if finishedTSUnix.Valid {
+		ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
+		blob.FinishedTS = &ts
+	}
+	if uploadedTSUnix.Valid {
+		ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
+		blob.UploadedTS = &ts
+	}
+	return &blob, nil
+}
+
+// UpdateFinished updates a blob when it's finalized
+func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id string, hash string, uncompressedSize, compressedSize int64) error {
+	query := `
+		UPDATE blobs
+		SET blob_hash = ?, finished_ts = ?, uncompressed_size = ?, compressed_size = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, hash, now, uncompressedSize, compressedSize, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating blob: %w", err)
+	}
+
+	return nil
+}
+
+// UpdateUploaded marks a blob as uploaded
+func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id string) error {
+	query := `
+		UPDATE blobs
+		SET uploaded_ts = ?
+		WHERE id = ?
+	`
+
+	now := time.Now().UTC().Unix()
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, now, id)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, now, id)
+	}
+
+	if err != nil {
+		return fmt.Errorf("marking blob as uploaded: %w", err)
+	}
+
+	return nil
+}
+
+// DeleteOrphaned deletes blobs that are not referenced by any snapshot
+func (r *BlobRepository) DeleteOrphaned(ctx context.Context) error {
+	query := `
+		DELETE FROM blobs
+		WHERE NOT EXISTS (
+			SELECT 1 FROM snapshot_blobs
+			WHERE snapshot_blobs.blob_id = blobs.id
+		)
+	`
+
+	result, err := r.db.ExecWithLog(ctx, query)
+	if err != nil {
+		return fmt.Errorf("deleting orphaned blobs: %w", err)
+	}
+
+	rowsAffected, _ := result.RowsAffected()
+	if rowsAffected > 0 {
+		log.Debug("Deleted orphaned blobs", "count", rowsAffected)
+	}
+
+	return nil
+}
@@ -4,6 +4,8 @@ import (
 	"context"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestBlobRepository(t *testing.T) {
@@ -15,7 +17,8 @@ func TestBlobRepository(t *testing.T) {

 	// Test Create
 	blob := &Blob{
-		BlobHash:  "blobhash123",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash123"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}

@@ -25,23 +28,36 @@ func TestBlobRepository(t *testing.T) {
 	}

 	// Test GetByHash
-	retrieved, err := repo.GetByHash(ctx, blob.BlobHash)
+	retrieved, err := repo.GetByHash(ctx, blob.Hash.String())
 	if err != nil {
 		t.Fatalf("failed to get blob: %v", err)
 	}
 	if retrieved == nil {
 		t.Fatal("expected blob, got nil")
 	}
-	if retrieved.BlobHash != blob.BlobHash {
-		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.BlobHash, blob.BlobHash)
+	if retrieved.Hash != blob.Hash {
+		t.Errorf("blob hash mismatch: got %s, want %s", retrieved.Hash, blob.Hash)
 	}
 	if !retrieved.CreatedTS.Equal(blob.CreatedTS) {
 		t.Errorf("created timestamp mismatch: got %v, want %v", retrieved.CreatedTS, blob.CreatedTS)
 	}

-	// Test List
+	// Test GetByID
+	retrievedByID, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get blob by ID: %v", err)
+	}
+	if retrievedByID == nil {
+		t.Fatal("expected blob, got nil")
+	}
+	if retrievedByID.ID != blob.ID {
+		t.Errorf("blob ID mismatch: got %s, want %s", retrievedByID.ID, blob.ID)
+	}
+
+	// Test with second blob
 	blob2 := &Blob{
-		BlobHash:  "blobhash456",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("blobhash456"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	err = repo.Create(ctx, nil, blob2)
@@ -49,29 +65,45 @@ func TestBlobRepository(t *testing.T) {
 		t.Fatalf("failed to create second blob: %v", err)
 	}

-	blobs, err := repo.List(ctx, 10, 0)
+	// Test UpdateFinished
+	now := time.Now()
+	err = repo.UpdateFinished(ctx, nil, blob.ID.String(), blob.Hash.String(), 1000, 500)
 	if err != nil {
-		t.Fatalf("failed to list blobs: %v", err)
-	}
-	if len(blobs) != 2 {
-		t.Errorf("expected 2 blobs, got %d", len(blobs))
+		t.Fatalf("failed to update blob as finished: %v", err)
 	}

-	// Test pagination
-	blobs, err = repo.List(ctx, 1, 0)
+	// Verify update
+	updated, err := repo.GetByID(ctx, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with limit: %v", err)
+		t.Fatalf("failed to get updated blob: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with limit, got %d", len(blobs))
+	if updated.FinishedTS == nil {
+		t.Fatal("expected finished timestamp to be set")
 	}
+	if updated.UncompressedSize != 1000 {
+		t.Errorf("expected uncompressed size 1000, got %d", updated.UncompressedSize)
+	}
+	if updated.CompressedSize != 500 {
+		t.Errorf("expected compressed size 500, got %d", updated.CompressedSize)
+	}

-	blobs, err = repo.List(ctx, 1, 1)
+	// Test UpdateUploaded
+	err = repo.UpdateUploaded(ctx, nil, blob.ID.String())
 	if err != nil {
-		t.Fatalf("failed to list blobs with offset: %v", err)
+		t.Fatalf("failed to update blob as uploaded: %v", err)
 	}
-	if len(blobs) != 1 {
-		t.Errorf("expected 1 blob with offset, got %d", len(blobs))
+
+	// Verify upload update
+	uploaded, err := repo.GetByID(ctx, blob.ID.String())
+	if err != nil {
+		t.Fatalf("failed to get uploaded blob: %v", err)
+	}
+	if uploaded.UploadedTS == nil {
+		t.Fatal("expected uploaded timestamp to be set")
+	}
+	// Allow 1 second tolerance for timestamp comparison
+	if uploaded.UploadedTS.Before(now.Add(-1 * time.Second)) {
+		t.Error("uploaded timestamp should be around test time")
+	}
 }

@@ -83,7 +115,8 @@ func TestBlobRepositoryDuplicate(t *testing.T) {
 	repo := NewBlobRepository(db)

 	blob := &Blob{
-		BlobHash:  "duplicate_blob",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("duplicate_blob"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}

125
internal/database/cascade_debug_test.go
Normal file
125
internal/database/cascade_debug_test.go
Normal file
@@ -0,0 +1,125 @@
package database

import (
	"context"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestCascadeDeleteDebug tests cascade delete with debug output
func TestCascadeDeleteDebug(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Check if foreign keys are enabled
	var fkEnabled int
	err := db.conn.QueryRow("PRAGMA foreign_keys").Scan(&fkEnabled)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Foreign keys enabled: %d", fkEnabled)

	// Create a file
	file := &File{
		Path:  "/cascade-test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err = repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}
	t.Logf("Created file with ID: %s", file.ID)

	// Create chunks and file-chunk mappings
	for i := 0; i < 3; i++ {
		chunk := &Chunk{
			ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunk.ChunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
		t.Logf("Created file chunk mapping: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
	}

	// Verify file chunks exist
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks before delete: %d", len(fileChunks))

	// Check the foreign key constraint
	var fkInfo string
	err = db.conn.QueryRow(`
		SELECT sql FROM sqlite_master
		WHERE type='table' AND name='file_chunks'
	`).Scan(&fkInfo)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("file_chunks table definition:\n%s", fkInfo)

	// Delete the file
	t.Log("Deleting file...")
	err = repos.Files.DeleteByID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file: %v", err)
	}

	// Verify file is gone
	deletedFile, err := repos.Files.GetByID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	if deletedFile != nil {
		t.Error("file should have been deleted")
	} else {
		t.Log("File was successfully deleted")
	}

	// Check file chunks after delete
	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("File chunks after delete: %d", len(fileChunks))

	// Manually check the database
	var count int
	err = db.conn.QueryRow("SELECT COUNT(*) FROM file_chunks WHERE file_id = ?", file.ID).Scan(&count)
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("Manual count of file_chunks for deleted file: %d", count)

	if len(fileChunks) != 0 {
		t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
		// List the remaining chunks
		for _, fc := range fileChunks {
			t.Logf("Remaining chunk: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
		}
	}
}
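The test above only passes if `file_chunks` actually declares `ON DELETE CASCADE` and the connection has foreign-key enforcement turned on (SQLite ships with it off by default, which is why the test prints `PRAGMA foreign_keys` first). The repository's real schema.sql is not part of this diff, so the following is only a hypothetical sketch of the shape the test implies:

```sql
-- Hypothetical sketch; the actual schema.sql in the repo is authoritative.
PRAGMA foreign_keys = ON;

CREATE TABLE files (
    id   TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
    -- mtime, ctime, size, mode, uid, gid, link_target ...
);

CREATE TABLE file_chunks (
    file_id    TEXT NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    idx        INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx)
);
```

With a constraint of this shape in place, `DELETE FROM files WHERE id = ?` removes the matching `file_chunks` rows automatically, which is what the debug logging in the test verifies.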
@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

type ChunkFileRepository struct {
@@ -16,16 +18,16 @@ func NewChunkFileRepository(db *DB) *ChunkFileRepository {

func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkFile) error {
	query := `
		INSERT INTO chunk_files (chunk_hash, file_path, file_offset, length)
		INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length)
		VALUES (?, ?, ?, ?)
		ON CONFLICT(chunk_hash, file_path) DO NOTHING
		ON CONFLICT(chunk_hash, file_id) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
		_, err = tx.ExecContext(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
	} else {
		_, err = r.db.ExecWithLock(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
		_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
	}

	if err != nil {
@@ -35,37 +37,28 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
	return nil
}

func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash types.ChunkHash) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_path, file_offset, length
		SELECT chunk_hash, file_id, file_offset, length
		FROM chunk_files
		WHERE chunk_hash = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash)
	rows, err := r.db.conn.QueryContext(ctx, query, chunkHash.String())
	if err != nil {
		return nil, fmt.Errorf("querying chunk files: %w", err)
	}
	defer CloseRows(rows)

	var chunkFiles []*ChunkFile
	for rows.Next() {
		var cf ChunkFile
		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk file: %w", err)
		}
		chunkFiles = append(chunkFiles, &cf)
	}

	return chunkFiles, rows.Err()
	return r.scanChunkFiles(rows)
}

func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_path, file_offset, length
		FROM chunk_files
		WHERE file_path = ?
		SELECT cf.chunk_hash, cf.file_id, cf.file_offset, cf.length
		FROM chunk_files cf
		JOIN files f ON cf.file_id = f.id
		WHERE f.path = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, filePath)
@@ -74,15 +67,138 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
	}
	defer CloseRows(rows)

	return r.scanChunkFiles(rows)
}

// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*ChunkFile, error) {
	query := `
		SELECT chunk_hash, file_id, file_offset, length
		FROM chunk_files
		WHERE file_id = ?
	`

	rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
	if err != nil {
		return nil, fmt.Errorf("querying chunk files: %w", err)
	}
	defer CloseRows(rows)

	return r.scanChunkFiles(rows)
}

// scanChunkFiles is a helper that scans chunk file rows
func (r *ChunkFileRepository) scanChunkFiles(rows *sql.Rows) ([]*ChunkFile, error) {
	var chunkFiles []*ChunkFile
	for rows.Next() {
		var cf ChunkFile
		err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
		var chunkHashStr, fileIDStr string
		err := rows.Scan(&chunkHashStr, &fileIDStr, &cf.FileOffset, &cf.Length)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk file: %w", err)
		}
		cf.ChunkHash = types.ChunkHash(chunkHashStr)
		cf.FileID, err = types.ParseFileID(fileIDStr)
		if err != nil {
			return nil, fmt.Errorf("parsing file ID: %w", err)
		}
		chunkFiles = append(chunkFiles, &cf)
	}

	return chunkFiles, rows.Err()
}

// DeleteByFileID deletes all chunk_files entries for a given file ID
func (r *ChunkFileRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
	query := `DELETE FROM chunk_files WHERE file_id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("deleting chunk files: %w", err)
	}

	return nil
}

// DeleteByFileIDs deletes all chunk_files for multiple files in a single statement.
func (r *ChunkFileRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Batch at 500 to stay within SQLite's variable limit
	const batchSize = 500

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "DELETE FROM chunk_files WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
		args := make([]interface{}, len(batch))
		for j, id := range batch {
			args[j] = id.String()
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch deleting chunk_files: %w", err)
		}
	}

	return nil
}

// CreateBatch inserts multiple chunk_files in a single statement for efficiency.
func (r *ChunkFileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, cfs []ChunkFile) error {
	if len(cfs) == 0 {
		return nil
	}

	// Each ChunkFile has 4 values, so batch at 200 to be safe with SQLite's variable limit
	const batchSize = 200

	for i := 0; i < len(cfs); i += batchSize {
		end := i + batchSize
		if end > len(cfs) {
			end = len(cfs)
		}
		batch := cfs[i:end]

		query := "INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length) VALUES "
		args := make([]interface{}, 0, len(batch)*4)
		for j, cf := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?, ?)"
			args = append(args, cf.ChunkHash.String(), cf.FileID.String(), cf.FileOffset, cf.Length)
		}
		query += " ON CONFLICT(chunk_hash, file_id) DO NOTHING"

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting chunk_files: %w", err)
		}
	}

	return nil
}
@@ -3,6 +3,9 @@ package database
import (
	"context"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestChunkFileRepository(t *testing.T) {
@@ -11,24 +14,68 @@ func TestChunkFileRepository(t *testing.T) {

	ctx := context.Background()
	repo := NewChunkFileRepository(db)
	fileRepo := NewFileRepository(db)
	chunksRepo := NewChunkRepository(db)

	// Create test files first
	testTime := time.Now().Truncate(time.Second)
	file1 := &File{
		Path:       "/file1.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       1024,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err := fileRepo.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}

	file2 := &File{
		Path:       "/file2.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       1024,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err = fileRepo.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create chunk first
	chunk := &Chunk{
		ChunkHash: types.ChunkHash("chunk1"),
		Size:      1024,
	}
	err = chunksRepo.Create(ctx, nil, chunk)
	if err != nil {
		t.Fatalf("failed to create chunk: %v", err)
	}

	// Test Create
	cf1 := &ChunkFile{
		ChunkHash: "chunk1",
		FilePath: "/file1.txt",
		ChunkHash:  types.ChunkHash("chunk1"),
		FileID:     file1.ID,
		FileOffset: 0,
		Length:     1024,
	}

	err := repo.Create(ctx, nil, cf1)
	err = repo.Create(ctx, nil, cf1)
	if err != nil {
		t.Fatalf("failed to create chunk file: %v", err)
	}

	// Add same chunk in different file (deduplication scenario)
	cf2 := &ChunkFile{
		ChunkHash: "chunk1",
		FilePath: "/file2.txt",
		ChunkHash:  types.ChunkHash("chunk1"),
		FileID:     file2.ID,
		FileOffset: 2048,
		Length:     1024,
	}
@@ -50,10 +97,10 @@ func TestChunkFileRepository(t *testing.T) {
	foundFile1 := false
	foundFile2 := false
	for _, cf := range chunkFiles {
		if cf.FilePath == "/file1.txt" && cf.FileOffset == 0 {
		if cf.FileID == file1.ID && cf.FileOffset == 0 {
			foundFile1 = true
		}
		if cf.FilePath == "/file2.txt" && cf.FileOffset == 2048 {
		if cf.FileID == file2.ID && cf.FileOffset == 2048 {
			foundFile2 = true
		}
	}
@@ -61,15 +108,15 @@ func TestChunkFileRepository(t *testing.T) {
		t.Error("not all expected files found")
	}

	// Test GetByFilePath
	chunkFiles, err = repo.GetByFilePath(ctx, "/file1.txt")
	// Test GetByFileID
	chunkFiles, err = repo.GetByFileID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("failed to get chunks by file path: %v", err)
		t.Fatalf("failed to get chunks by file ID: %v", err)
	}
	if len(chunkFiles) != 1 {
		t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
	}
	if chunkFiles[0].ChunkHash != "chunk1" {
	if chunkFiles[0].ChunkHash != types.ChunkHash("chunk1") {
		t.Errorf("wrong chunk hash: expected chunk1, got %s", chunkFiles[0].ChunkHash)
	}

@@ -86,6 +133,37 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {

	ctx := context.Background()
	repo := NewChunkFileRepository(db)
	fileRepo := NewFileRepository(db)
	chunksRepo := NewChunkRepository(db)

	// Create test files
	testTime := time.Now().Truncate(time.Second)
	file1 := &File{Path: "/file1.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
	file2 := &File{Path: "/file2.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
	file3 := &File{Path: "/file3.txt", MTime: testTime, CTime: testTime, Size: 2048, Mode: 0644, UID: 1000, GID: 1000}

	if err := fileRepo.Create(ctx, nil, file1); err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	if err := fileRepo.Create(ctx, nil, file2); err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}
	if err := fileRepo.Create(ctx, nil, file3); err != nil {
		t.Fatalf("failed to create file3: %v", err)
	}

	// Create chunks first
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3", "chunk4"}
	for _, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err := chunksRepo.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
		}
	}

	// Simulate a scenario where multiple files share chunks
	// File1: chunk1, chunk2, chunk3
@@ -94,16 +172,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {

	chunkFiles := []ChunkFile{
		// File1
		{ChunkHash: "chunk1", FilePath: "/file1.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk2", FilePath: "/file1.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: "chunk3", FilePath: "/file1.txt", FileOffset: 2048, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk1"), FileID: file1.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk2"), FileID: file1.ID, FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk3"), FileID: file1.ID, FileOffset: 2048, Length: 1024},
		// File2
		{ChunkHash: "chunk2", FilePath: "/file2.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk3", FilePath: "/file2.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: "chunk4", FilePath: "/file2.txt", FileOffset: 2048, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk2"), FileID: file2.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk3"), FileID: file2.ID, FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk4"), FileID: file2.ID, FileOffset: 2048, Length: 1024},
		// File3
		{ChunkHash: "chunk1", FilePath: "/file3.txt", FileOffset: 0, Length: 1024},
		{ChunkHash: "chunk4", FilePath: "/file3.txt", FileOffset: 1024, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk1"), FileID: file3.ID, FileOffset: 0, Length: 1024},
		{ChunkHash: types.ChunkHash("chunk4"), FileID: file3.ID, FileOffset: 1024, Length: 1024},
	}

	for _, cf := range chunkFiles {
@@ -132,11 +210,11 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
	}

	// Test file2 chunks
	chunks, err := repo.GetByFilePath(ctx, "/file2.txt")
	file2Chunks, err := repo.GetByFileID(ctx, file2.ID)
	if err != nil {
		t.Fatalf("failed to get chunks for file2: %v", err)
	}
	if len(chunks) != 3 {
		t.Errorf("expected 3 chunks for file2, got %d", len(chunks))
	if len(file2Chunks) != 3 {
		t.Errorf("expected 3 chunks for file2, got %d", len(file2Chunks))
	}
}
@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/log"
)

type ChunkRepository struct {
@@ -16,16 +18,16 @@ func NewChunkRepository(db *DB) *ChunkRepository {

func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk) error {
	query := `
		INSERT INTO chunks (chunk_hash, sha256, size)
		VALUES (?, ?, ?)
		INSERT INTO chunks (chunk_hash, size)
		VALUES (?, ?)
		ON CONFLICT(chunk_hash) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
		_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.Size)
	} else {
		_, err = r.db.ExecWithLock(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
		_, err = r.db.ExecWithLog(ctx, query, chunk.ChunkHash, chunk.Size)
	}

	if err != nil {
@@ -37,7 +39,7 @@ func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk)

func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, error) {
	query := `
		SELECT chunk_hash, sha256, size
		SELECT chunk_hash, size
		FROM chunks
		WHERE chunk_hash = ?
	`
@@ -46,7 +48,6 @@ func (r *ChunkRepository) GetByHash(ctx context.Context, hash string) (*Chunk, e

	err := r.db.conn.QueryRowContext(ctx, query, hash).Scan(
		&chunk.ChunkHash,
		&chunk.SHA256,
		&chunk.Size,
	)

@@ -66,7 +67,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*
	}

	query := `
		SELECT chunk_hash, sha256, size
		SELECT chunk_hash, size
		FROM chunks
		WHERE chunk_hash IN (`

@@ -92,7 +93,6 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.SHA256,
			&chunk.Size,
		)
		if err != nil {
@@ -107,7 +107,7 @@ func (r *ChunkRepository) GetByHashes(ctx context.Context, hashes []string) ([]*

func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk, error) {
	query := `
		SELECT c.chunk_hash, c.sha256, c.size
		SELECT c.chunk_hash, c.size
		FROM chunks c
		LEFT JOIN blob_chunks bc ON c.chunk_hash = bc.chunk_hash
		WHERE bc.chunk_hash IS NULL
@@ -127,7 +127,6 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.SHA256,
			&chunk.Size,
		)
		if err != nil {
@@ -139,3 +138,30 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk

	return chunks, rows.Err()
}

// DeleteOrphaned deletes chunks that are not referenced by any file or blob
func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
	query := `
		DELETE FROM chunks
		WHERE NOT EXISTS (
			SELECT 1 FROM file_chunks
			WHERE file_chunks.chunk_hash = chunks.chunk_hash
		)
		AND NOT EXISTS (
			SELECT 1 FROM blob_chunks
			WHERE blob_chunks.chunk_hash = chunks.chunk_hash
		)
	`

	result, err := r.db.ExecWithLog(ctx, query)
	if err != nil {
		return fmt.Errorf("deleting orphaned chunks: %w", err)
	}

	rowsAffected, _ := result.RowsAffected()
	if rowsAffected > 0 {
		log.Debug("Deleted orphaned chunks", "count", rowsAffected)
	}

	return nil
}
internal/database/chunks_ext.go (Normal file, 37 lines)
@@ -0,0 +1,37 @@
package database

import (
	"context"
	"fmt"
)

func (r *ChunkRepository) List(ctx context.Context) ([]*Chunk, error) {
	query := `
		SELECT chunk_hash, size
		FROM chunks
		ORDER BY chunk_hash
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying chunks: %w", err)
	}
	defer CloseRows(rows)

	var chunks []*Chunk
	for rows.Next() {
		var chunk Chunk

		err := rows.Scan(
			&chunk.ChunkHash,
			&chunk.Size,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning chunk: %w", err)
		}

		chunks = append(chunks, &chunk)
	}

	return chunks, rows.Err()
}
@@ -3,6 +3,8 @@ package database
import (
	"context"
	"testing"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestChunkRepository(t *testing.T) {
@@ -14,8 +16,7 @@ func TestChunkRepository(t *testing.T) {

	// Test Create
	chunk := &Chunk{
		ChunkHash: "chunkhash123",
		SHA256: "sha256hash123",
		ChunkHash: types.ChunkHash("chunkhash123"),
		Size:      4096,
	}

@@ -25,7 +26,7 @@ func TestChunkRepository(t *testing.T) {
	}

	// Test GetByHash
	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash)
	retrieved, err := repo.GetByHash(ctx, chunk.ChunkHash.String())
	if err != nil {
		t.Fatalf("failed to get chunk: %v", err)
	}
@@ -35,9 +36,6 @@ func TestChunkRepository(t *testing.T) {
	if retrieved.ChunkHash != chunk.ChunkHash {
		t.Errorf("chunk hash mismatch: got %s, want %s", retrieved.ChunkHash, chunk.ChunkHash)
	}
	if retrieved.SHA256 != chunk.SHA256 {
		t.Errorf("sha256 mismatch: got %s, want %s", retrieved.SHA256, chunk.SHA256)
	}
	if retrieved.Size != chunk.Size {
		t.Errorf("size mismatch: got %d, want %d", retrieved.Size, chunk.Size)
	}
@@ -50,8 +48,7 @@ func TestChunkRepository(t *testing.T) {

	// Test GetByHashes
	chunk2 := &Chunk{
		ChunkHash: "chunkhash456",
		SHA256: "sha256hash456",
		ChunkHash: types.ChunkHash("chunkhash456"),
		Size:      8192,
	}
	err = repo.Create(ctx, nil, chunk2)
@@ -59,7 +56,7 @@ func TestChunkRepository(t *testing.T) {
		t.Fatalf("failed to create second chunk: %v", err)
	}

	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash, chunk2.ChunkHash})
	chunks, err := repo.GetByHashes(ctx, []string{chunk.ChunkHash.String(), chunk2.ChunkHash.String()})
	if err != nil {
		t.Fatalf("failed to get chunks by hashes: %v", err)
	}
@@ -1,143 +1,239 @@
// Package database provides the local SQLite index for Vaultik backup operations.
// The database tracks files, chunks, and their associations with blobs.
//
// Blobs in Vaultik are the final storage units uploaded to S3. Each blob is a
// large (up to 10GB) file containing many compressed and encrypted chunks from
// multiple source files. Blobs are content-addressed, meaning their filename
// is derived from their SHA256 hash after compression and encryption.
//
// The database does not support migrations. If the schema changes, delete
// the local database and perform a full backup to recreate it.
package database

import (
	"context"
	"database/sql"
	_ "embed"
	"fmt"
	"sync"
	"os"
	"strings"

	"git.eeqj.de/sneak/vaultik/internal/log"
	_ "modernc.org/sqlite"
)

//go:embed schema.sql
var schemaSQL string

// DB represents the Vaultik local index database connection.
// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
// The database enables incremental backups by detecting changed files and
// supports deduplication by tracking which chunks are already stored in blobs.
// Write operations are synchronized through a mutex to ensure thread safety.
type DB struct {
	conn *sql.DB
	writeLock sync.Mutex
	path string
}

// New creates a new database connection at the specified path.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
// opening a database with journal/WAL files present.
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
	conn, err := sql.Open("sqlite", path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=5000")
	if err != nil {
		return nil, fmt.Errorf("opening database: %w", err)
	log.Debug("Opening database connection", "path", path)

	// Note: We do NOT delete journal/WAL files before opening.
	// SQLite handles crash recovery automatically when the database is opened.
	// Deleting these files would corrupt the database after an unclean shutdown.

	// First attempt with standard WAL mode
	log.Debug("Attempting to open database with WAL mode", "path", path)
	conn, err := sql.Open(
		"sqlite",
		path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL&_foreign_keys=ON",
	)
	if err == nil {
		// Set connection pool settings
		// SQLite can handle multiple readers but only one writer at a time.
		// Setting MaxOpenConns to 1 ensures all writes are serialized through
		// a single connection, preventing SQLITE_BUSY errors.
		conn.SetMaxOpenConns(1)
		conn.SetMaxIdleConns(1)

		if err := conn.PingContext(ctx); err == nil {
			// Success on first try
			log.Debug("Database opened successfully with WAL mode", "path", path)

			// Enable foreign keys explicitly
			if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
				log.Warn("Failed to enable foreign keys", "error", err)
			}

			db := &DB{conn: conn, path: path}
			if err := db.createSchema(ctx); err != nil {
				_ = conn.Close()
				return nil, fmt.Errorf("creating schema: %w", err)
			}
			return db, nil
		}
		log.Debug("Failed to ping database, closing connection", "path", path, "error", err)
		_ = conn.Close()
	}

	// If first attempt failed, try with TRUNCATE mode to clear any locks
	log.Info(
		"Database appears locked, attempting recovery with TRUNCATE mode",
		"path", path,
	)
	conn, err = sql.Open(
		"sqlite",
		path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000&_foreign_keys=ON",
	)
	if err != nil {
		return nil, fmt.Errorf("opening database in recovery mode: %w", err)
	}

	// Set connection pool settings
	// SQLite can handle multiple readers but only one writer at a time.
	// Setting MaxOpenConns to 1 ensures all writes are serialized through
	// a single connection, preventing SQLITE_BUSY errors.
	conn.SetMaxOpenConns(1)
	conn.SetMaxIdleConns(1)

	if err := conn.PingContext(ctx); err != nil {
		if closeErr := conn.Close(); closeErr != nil {
			Fatal("failed to close database connection: %v", closeErr)
		}
		return nil, fmt.Errorf("pinging database: %w", err)
		log.Debug("Failed to ping database in recovery mode, closing", "path", path, "error", err)
		_ = conn.Close()
		return nil, fmt.Errorf(
			"database still locked after recovery attempt: %w",
			err,
		)
	}

	db := &DB{conn: conn}
	if err := db.createSchema(ctx); err != nil {
		if closeErr := conn.Close(); closeErr != nil {
			Fatal("failed to close database connection: %v", closeErr)
	log.Debug("Database opened in TRUNCATE mode", "path", path)

	// Switch back to WAL mode
	log.Debug("Switching database back to WAL mode", "path", path)
	if _, err := conn.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
		log.Warn("Failed to switch back to WAL mode", "path", path, "error", err)
	}

	// Ensure foreign keys are enabled
	if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys=ON"); err != nil {
		log.Warn("Failed to enable foreign keys", "path", path, "error", err)
	}

	db := &DB{conn: conn, path: path}
	if err := db.createSchema(ctx); err != nil {
		_ = conn.Close()
		return nil, fmt.Errorf("creating schema: %w", err)
	}

	log.Debug("Database connection established successfully", "path", path)
	return db, nil
}
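Condensed, the recovery path in `New` amounts to this PRAGMA sequence, issued partly via DSN parameters and partly via `ExecContext` (a sketch of the net effect, echoing the code's own rationale):

```sql
-- Fallback open: TRUNCATE journal mode, per the code's comment, clears
-- a stale lock/journal state that blocked the initial WAL-mode open.
PRAGMA journal_mode = TRUNCATE;
-- Once the database opens cleanly, switch back to WAL for normal operation.
PRAGMA journal_mode = WAL;
-- Foreign keys are off by default in SQLite and must be enabled per connection.
PRAGMA foreign_keys = ON;
```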
// Close closes the database connection.
|
||||
// It ensures all pending operations are completed before closing.
|
||||
// Returns an error if the database connection cannot be closed properly.
|
||||
func (db *DB) Close() error {
|
||||
log.Debug("Closing database connection", "path", db.path)
|
||||
if err := db.conn.Close(); err != nil {
|
||||
Fatal("failed to close database: %v", err)
|
||||
log.Error("Failed to close database", "path", db.path, "error", err)
|
||||
return fmt.Errorf("failed to close database: %w", err)
|
||||
}
|
||||
log.Debug("Database connection closed successfully", "path", db.path)
|
||||
return nil
|
||||
}
|
||||
|
||||
// Conn returns the underlying *sql.DB connection.
|
||||
// This should be used sparingly and primarily for read operations.
|
||||
// For write operations, prefer using the ExecWithLog method.
|
||||
func (db *DB) Conn() *sql.DB {
|
||||
return db.conn
|
||||
}
|
||||
|
||||
func (db *DB) BeginTx(ctx context.Context, opts *sql.TxOptions) (*sql.Tx, error) {
|
||||
// Path returns the path to the database file.
|
||||
func (db *DB) Path() string {
|
||||
return db.path
|
||||
}
|
||||
|
||||
// BeginTx starts a new database transaction with the given options.
|
||||
// The caller is responsible for committing or rolling back the transaction.
|
||||
// For write transactions, consider using the Repositories.WithTx method instead,
|
||||
// which handles locking and rollback automatically.
|
||||
func (db *DB) BeginTx(
|
||||
ctx context.Context,
|
||||
opts *sql.TxOptions,
|
||||
) (*sql.Tx, error) {
|
||||
return db.conn.BeginTx(ctx, opts)
|
||||
}
|
||||
|
||||
// LockForWrite acquires the write lock
|
||||
func (db *DB) LockForWrite() {
|
||||
db.writeLock.Lock()
|
||||
}
|
||||
// Note: LockForWrite and UnlockWrite methods have been removed.
|
||||
// SQLite handles its own locking internally, so explicit locking is not needed.
|
||||
|
||||
// UnlockWrite releases the write lock
|
||||
func (db *DB) UnlockWrite() {
|
||||
db.writeLock.Unlock()
|
||||
}
|
||||
|
||||
// ExecWithLog executes a write query with SQL logging.
// SQLite handles its own locking internally, so we just pass through to ExecContext.
// The query and args parameters follow the same format as sql.DB.ExecContext.
func (db *DB) ExecWithLog(
	ctx context.Context,
	query string,
	args ...interface{},
) (sql.Result, error) {
	LogSQL("Execute", query, args...)
	return db.conn.ExecContext(ctx, query, args...)
}

// QueryRowWithLog executes a query that returns at most one row with SQL logging.
// This is useful for queries that modify data and return values (e.g., INSERT ... RETURNING).
// SQLite handles its own locking internally.
// The query and args parameters follow the same format as sql.DB.QueryRowContext.
func (db *DB) QueryRowWithLog(
	ctx context.Context,
	query string,
	args ...interface{},
) *sql.Row {
	LogSQL("QueryRow", query, args...)
	return db.conn.QueryRowContext(ctx, query, args...)
}
// createSchema creates all tables if they do not already exist.
func (db *DB) createSchema(ctx context.Context) error {
	schema := `
	CREATE TABLE IF NOT EXISTS files (
		path TEXT PRIMARY KEY,
		mtime INTEGER NOT NULL,
		ctime INTEGER NOT NULL,
		size INTEGER NOT NULL,
		mode INTEGER NOT NULL,
		uid INTEGER NOT NULL,
		gid INTEGER NOT NULL,
		link_target TEXT
	);

	CREATE TABLE IF NOT EXISTS file_chunks (
		path TEXT NOT NULL,
		idx INTEGER NOT NULL,
		chunk_hash TEXT NOT NULL,
		PRIMARY KEY (path, idx)
	);

	CREATE TABLE IF NOT EXISTS chunks (
		chunk_hash TEXT PRIMARY KEY,
		sha256 TEXT NOT NULL,
		size INTEGER NOT NULL
	);

	CREATE TABLE IF NOT EXISTS blobs (
		blob_hash TEXT PRIMARY KEY,
		created_ts INTEGER NOT NULL
	);

	CREATE TABLE IF NOT EXISTS blob_chunks (
		blob_hash TEXT NOT NULL,
		chunk_hash TEXT NOT NULL,
		offset INTEGER NOT NULL,
		length INTEGER NOT NULL,
		PRIMARY KEY (blob_hash, chunk_hash)
	);

	CREATE TABLE IF NOT EXISTS chunk_files (
		chunk_hash TEXT NOT NULL,
		file_path TEXT NOT NULL,
		file_offset INTEGER NOT NULL,
		length INTEGER NOT NULL,
		PRIMARY KEY (chunk_hash, file_path)
	);

	CREATE TABLE IF NOT EXISTS snapshots (
		id TEXT PRIMARY KEY,
		hostname TEXT NOT NULL,
		vaultik_version TEXT NOT NULL,
		created_ts INTEGER NOT NULL,
		file_count INTEGER NOT NULL,
		chunk_count INTEGER NOT NULL,
		blob_count INTEGER NOT NULL,
		total_size INTEGER NOT NULL,
		blob_size INTEGER NOT NULL,
		compression_ratio REAL NOT NULL
	);
	`

	_, err := db.conn.ExecContext(ctx, schema)
	return err
}

// NewTestDB creates an in-memory SQLite database for testing purposes.
// The database is automatically initialized with the schema and is ready for use.
// Each call creates a new independent database instance.
func NewTestDB() (*DB, error) {
	return New(context.Background(), ":memory:")
}
// repeatPlaceholder generates a string of ", ?" repeated n times for IN clause construction.
// For example, repeatPlaceholder(2) returns ", ?, ?".
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}
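As a usage sketch (the `buildInQuery` wrapper below is illustrative, not part of the repository code), a caller supplies one leading `?` and lets the helper fill in the rest of the IN list:

```go
package main

import (
	"fmt"
	"strings"
)

// repeatPlaceholder mirrors the helper above: ", ?" repeated n times.
func repeatPlaceholder(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat(", ?", n)
}

// buildInQuery is a hypothetical caller: one leading "?" for the first
// value, then repeatPlaceholder for the remaining n-1 values.
func buildInQuery(table, column string, n int) string {
	return fmt.Sprintf("DELETE FROM %s WHERE %s IN (?%s)",
		table, column, repeatPlaceholder(n-1))
}

func main() {
	// → DELETE FROM file_chunks WHERE file_id IN (?, ?, ?)
	fmt.Println(buildInQuery("file_chunks", "file_id", 3))
}
```

This is why DeleteByFileIDs below passes `len(batch)-1`: the first placeholder is written literally in the query string.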

// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.
//
// The operation parameter describes the type of SQL operation (e.g., "Execute", "Query").
// The query parameter is the SQL statement being executed.
// The args parameter contains the query arguments that will be interpolated.
func LogSQL(operation, query string, args ...interface{}) {
	if strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
		log.Debug(
			"SQL "+operation,
			"query",
			strings.TrimSpace(query),
			"args",
			fmt.Sprintf("%v", args),
		)
	}
}

@@ -67,21 +67,26 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
	}()

	// Test concurrent writes
	type result struct {
		index int
		err   error
	}
	results := make(chan result, 10)

	for i := 0; i < 10; i++ {
		go func(i int) {
			_, err := db.ExecWithLog(ctx, "INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)",
				fmt.Sprintf("hash%d", i), i*1024)
			results <- result{index: i, err: err}
		}(i)
	}

	// Wait for all goroutines and check results
	for i := 0; i < 10; i++ {
		r := <-results
		if r.err != nil {
			t.Fatalf("concurrent insert %d failed: %v", r.index, r.err)
		}
	}

	// Verify all inserts succeeded

@@ -4,6 +4,8 @@ import (
	"context"
	"database/sql"
	"fmt"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

type FileChunkRepository struct {

@@ -16,16 +18,16 @@ func NewFileChunkRepository(db *DB) *FileChunkRepository {

func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
	query := `
		INSERT INTO file_chunks (file_id, idx, chunk_hash)
		VALUES (?, ?, ?)
		ON CONFLICT(file_id, idx) DO NOTHING
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
	}

	if err != nil {

@@ -37,10 +39,11 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh

func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
	query := `
		SELECT fc.file_id, fc.idx, fc.chunk_hash
		FROM file_chunks fc
		JOIN files f ON fc.file_id = f.id
		WHERE f.path = ?
		ORDER BY fc.idx
	`

	rows, err := r.db.conn.QueryContext(ctx, query, path)

@@ -49,13 +52,64 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
	}
	defer CloseRows(rows)

	return r.scanFileChunks(rows)
}

// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID types.FileID) ([]*FileChunk, error) {
	query := `
		SELECT file_id, idx, chunk_hash
		FROM file_chunks
		WHERE file_id = ?
		ORDER BY idx
	`

	rows, err := r.db.conn.QueryContext(ctx, query, fileID.String())
	if err != nil {
		return nil, fmt.Errorf("querying file chunks: %w", err)
	}
	defer CloseRows(rows)

	return r.scanFileChunks(rows)
}

// GetByPathTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
	query := `
		SELECT fc.file_id, fc.idx, fc.chunk_hash
		FROM file_chunks fc
		JOIN files f ON fc.file_id = f.id
		WHERE f.path = ?
		ORDER BY fc.idx
	`

	LogSQL("GetByPathTx", query, path)
	rows, err := tx.QueryContext(ctx, query, path)
	if err != nil {
		return nil, fmt.Errorf("querying file chunks: %w", err)
	}
	defer CloseRows(rows)

	fileChunks, err := r.scanFileChunks(rows)
	LogSQL("GetByPathTx", "Complete", path, "count", len(fileChunks))
	return fileChunks, err
}

// scanFileChunks is a helper that scans file chunk rows
func (r *FileChunkRepository) scanFileChunks(rows *sql.Rows) ([]*FileChunk, error) {
	var fileChunks []*FileChunk
	for rows.Next() {
		var fc FileChunk
		var fileIDStr, chunkHashStr string
		err := rows.Scan(&fileIDStr, &fc.Idx, &chunkHashStr)
		if err != nil {
			return nil, fmt.Errorf("scanning file chunk: %w", err)
		}
		fc.FileID, err = types.ParseFileID(fileIDStr)
		if err != nil {
			return nil, fmt.Errorf("parsing file ID: %w", err)
		}
		fc.ChunkHash = types.ChunkHash(chunkHashStr)
		fileChunks = append(fileChunks, &fc)
	}

@@ -63,13 +117,13 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
}

func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
	query := `DELETE FROM file_chunks WHERE file_id = (SELECT id FROM files WHERE path = ?)`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, path)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, path)
	}

	if err != nil {

@@ -78,3 +132,117 @@ func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path

	return nil
}

// DeleteByFileID deletes all chunks for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID types.FileID) error {
	query := `DELETE FROM file_chunks WHERE file_id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("deleting file chunks: %w", err)
	}

	return nil
}

// DeleteByFileIDs deletes all chunks for multiple files in a single statement.
func (r *FileChunkRepository) DeleteByFileIDs(ctx context.Context, tx *sql.Tx, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Batch at 500 to stay within SQLite's variable limit
	const batchSize = 500

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "DELETE FROM file_chunks WHERE file_id IN (?" + repeatPlaceholder(len(batch)-1) + ")"
		args := make([]interface{}, len(batch))
		for j, id := range batch {
			args[j] = id.String()
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch deleting file_chunks: %w", err)
		}
	}

	return nil
}

// CreateBatch inserts multiple file_chunks in a single statement for efficiency.
// Batches are automatically split to stay within SQLite's variable limit.
func (r *FileChunkRepository) CreateBatch(ctx context.Context, tx *sql.Tx, fcs []FileChunk) error {
	if len(fcs) == 0 {
		return nil
	}

	// SQLite has a limit on variables (typically 999 or 32766).
	// Each FileChunk has 3 values, so batch at 300 to be safe.
	const batchSize = 300

	for i := 0; i < len(fcs); i += batchSize {
		end := i + batchSize
		if end > len(fcs) {
			end = len(fcs)
		}
		batch := fcs[i:end]

		// Build the query with multiple value sets
		query := "INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES "
		args := make([]interface{}, 0, len(batch)*3)
		for j, fc := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?)"
			args = append(args, fc.FileID.String(), fc.Idx, fc.ChunkHash.String())
		}
		query += " ON CONFLICT(file_id, idx) DO NOTHING"

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting file_chunks: %w", err)
		}
	}

	return nil
}

// GetByFile is an alias for GetByPath for compatibility
func (r *FileChunkRepository) GetByFile(ctx context.Context, path string) ([]*FileChunk, error) {
	LogSQL("GetByFile", "Starting", path)
	result, err := r.GetByPath(ctx, path)
	LogSQL("GetByFile", "Complete", path, "count", len(result))
	return result, err
}

// GetByFileTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByFileTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
	LogSQL("GetByFileTx", "Starting", path)
	result, err := r.GetByPathTx(ctx, tx, path)
	LogSQL("GetByFileTx", "Complete", path, "count", len(result))
	return result, err
}

@@ -4,6 +4,9 @@ import (
	"context"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

func TestFileChunkRepository(t *testing.T) {

@@ -12,24 +15,56 @@ func TestFileChunkRepository(t *testing.T) {

	ctx := context.Background()
	repo := NewFileChunkRepository(db)
	fileRepo := NewFileRepository(db)

	// Create test file first
	testTime := time.Now().Truncate(time.Second)
	file := &File{
		Path:       "/test/file.txt",
		MTime:      testTime,
		CTime:      testTime,
		Size:       3072,
		Mode:       0644,
		UID:        1000,
		GID:        1000,
		LinkTarget: "",
	}
	err := fileRepo.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create chunks first
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
	chunkRepo := NewChunkRepository(db)
	for _, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err = chunkRepo.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
		}
	}

	// Test Create
	fc1 := &FileChunk{
		FileID:    file.ID,
		Idx:       0,
		ChunkHash: types.ChunkHash("chunk1"),
	}

	err = repo.Create(ctx, nil, fc1)
	if err != nil {
		t.Fatalf("failed to create file chunk: %v", err)
	}

	// Add more chunks for the same file
	fc2 := &FileChunk{
		FileID:    file.ID,
		Idx:       1,
		ChunkHash: types.ChunkHash("chunk2"),
	}
	err = repo.Create(ctx, nil, fc2)
	if err != nil {

@@ -37,26 +72,26 @@ func TestFileChunkRepository(t *testing.T) {
	}

	fc3 := &FileChunk{
		FileID:    file.ID,
		Idx:       2,
		ChunkHash: types.ChunkHash("chunk3"),
	}
	err = repo.Create(ctx, nil, fc3)
	if err != nil {
		t.Fatalf("failed to create third file chunk: %v", err)
	}

	// Test GetByFile
	fileChunks, err := repo.GetByFile(ctx, "/test/file.txt")
	if err != nil {
		t.Fatalf("failed to get file chunks: %v", err)
	}
	if len(fileChunks) != 3 {
		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
	}

	// Verify order
	for i, chunk := range fileChunks {
		if chunk.Idx != i {
			t.Errorf("wrong chunk order: expected idx %d, got %d", i, chunk.Idx)
		}

@@ -68,18 +103,18 @@ func TestFileChunkRepository(t *testing.T) {
		t.Fatalf("failed to create duplicate file chunk: %v", err)
	}

	// Test DeleteByFileID
	err = repo.DeleteByFileID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file chunks: %v", err)
	}

	fileChunks, err = repo.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get deleted file chunks: %v", err)
	}
	if len(fileChunks) != 0 {
		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
	}
}

@@ -89,15 +124,54 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {

	ctx := context.Background()
	repo := NewFileChunkRepository(db)
	fileRepo := NewFileRepository(db)

	// Create test files
	testTime := time.Now().Truncate(time.Second)
	filePaths := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
	files := make([]*File, len(filePaths))

	for i, path := range filePaths {
		file := &File{
			Path:       types.FilePath(path),
			MTime:      testTime,
			CTime:      testTime,
			Size:       2048,
			Mode:       0644,
			UID:        1000,
			GID:        1000,
			LinkTarget: "",
		}
		err := fileRepo.Create(ctx, nil, file)
		if err != nil {
			t.Fatalf("failed to create file %s: %v", path, err)
		}
		files[i] = file
	}

	// Create all chunks first
	chunkRepo := NewChunkRepository(db)
	for i := range files {
		for j := 0; j < 2; j++ {
			chunkHash := types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j))
			chunk := &Chunk{
				ChunkHash: chunkHash,
				Size:      1024,
			}
			err := chunkRepo.Create(ctx, nil, chunk)
			if err != nil {
				t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
			}
		}
	}

	// Create chunks for multiple files
	for i, file := range files {
		for j := 0; j < 2; j++ {
			fc := &FileChunk{
				FileID:    file.ID,
				Idx:       j,
				ChunkHash: types.ChunkHash(fmt.Sprintf("file%d_chunk%d", i, j)),
			}
			err := repo.Create(ctx, nil, fc)
			if err != nil {

@@ -107,13 +181,13 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
	}

	// Verify each file has correct chunks
	for i, file := range files {
		chunks, err := repo.GetByFileID(ctx, file.ID)
		if err != nil {
			t.Fatalf("failed to get chunks for file %d: %v", i, err)
		}
		if len(chunks) != 2 {
			t.Errorf("expected 2 chunks for file %d, got %d", i, len(chunks))
		}
	}
}

@@ -5,6 +5,9 @@ import (
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"git.eeqj.de/sneak/vaultik/internal/log"
|
||||
"git.eeqj.de/sneak/vaultik/internal/types"
|
||||
)
|
||||
|
||||
type FileRepository struct {
|
||||
@@ -16,10 +19,16 @@ func NewFileRepository(db *DB) *FileRepository {
|
||||
}
|
||||
|
||||
func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
|
||||
// Generate UUID if not provided
|
||||
if file.ID.IsZero() {
|
||||
file.ID = types.NewFileID()
|
||||
}
|
||||
|
||||
query := `
|
||||
INSERT INTO files (path, mtime, ctime, size, mode, uid, gid, link_target)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
||||
INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
ON CONFLICT(path) DO UPDATE SET
|
||||
source_path = excluded.source_path,
|
||||
mtime = excluded.mtime,
|
||||
ctime = excluded.ctime,
|
||||
size = excluded.size,
|
||||
@@ -27,43 +36,78 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
|
||||
uid = excluded.uid,
|
||||
gid = excluded.gid,
|
||||
link_target = excluded.link_target
|
||||
RETURNING id
|
||||
`
|
||||
|
||||
var idStr string
|
||||
var err error
|
||||
if tx != nil {
|
||||
_, err = tx.ExecContext(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
|
||||
LogSQL("Execute", query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String())
|
||||
err = tx.QueryRowContext(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
|
||||
} else {
|
||||
_, err = r.db.ExecWithLock(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
|
||||
err = r.db.QueryRowWithLog(ctx, query, file.ID.String(), file.Path.String(), file.SourcePath.String(), file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget.String()).Scan(&idStr)
|
||||
}
|
||||
|
||||
if err != nil {
|
||||
return fmt.Errorf("inserting file: %w", err)
|
||||
}
|
||||
|
||||
// Parse the returned ID
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
|
||||
query := `
|
||||
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE path = ?
|
||||
`
|
||||
|
||||
var file File
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, path))
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
|
||||
&file.Path,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
return file, nil
|
||||
}
|
||||
|
||||
// GetByID retrieves a file by its UUID
|
||||
func (r *FileRepository) GetByID(ctx context.Context, id types.FileID) (*File, error) {
|
||||
query := `
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE id = ?
|
||||
`
|
||||
|
||||
file, err := r.scanFile(r.db.conn.QueryRowContext(ctx, query, id.String()))
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
return file, nil
|
||||
}
|
||||
|
||||
func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
|
||||
query := `
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE path = ?
|
||||
`
|
||||
|
||||
LogSQL("GetByPathTx QueryRowContext", query, path)
|
||||
file, err := r.scanFile(tx.QueryRowContext(ctx, query, path))
|
||||
LogSQL("GetByPathTx Scan complete", query, path)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
@@ -72,10 +116,80 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
|
||||
return nil, fmt.Errorf("querying file: %w", err)
|
||||
}
|
||||
|
||||
file.MTime = time.Unix(mtimeUnix, 0)
|
||||
file.CTime = time.Unix(ctimeUnix, 0)
|
||||
return file, nil
|
||||
}
|
||||
|
||||
// scanFile is a helper that scans a single file row
|
||||
func (r *FileRepository) scanFile(row *sql.Row) (*File, error) {
|
||||
var file File
|
||||
var idStr, pathStr, sourcePathStr string
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := row.Scan(
|
||||
&idStr,
|
||||
&pathStr,
|
||||
&sourcePathStr,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
file.Path = types.FilePath(pathStr)
|
||||
file.SourcePath = types.SourcePath(sourcePathStr)
|
||||
file.MTime = time.Unix(mtimeUnix, 0).UTC()
|
||||
file.CTime = time.Unix(ctimeUnix, 0).UTC()
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = linkTarget.String
|
||||
file.LinkTarget = types.FilePath(linkTarget.String)
|
||||
}
|
||||
|
||||
return &file, nil
|
||||
}
|
||||
|
||||
// scanFileRows is a helper that scans a file row from rows iterator
|
||||
func (r *FileRepository) scanFileRows(rows *sql.Rows) (*File, error) {
|
||||
var file File
|
||||
var idStr, pathStr, sourcePathStr string
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := rows.Scan(
|
||||
&idStr,
|
||||
&pathStr,
|
||||
&sourcePathStr,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
file.ID, err = types.ParseFileID(idStr)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("parsing file ID: %w", err)
|
||||
}
|
||||
file.Path = types.FilePath(pathStr)
|
||||
file.SourcePath = types.SourcePath(sourcePathStr)
|
||||
file.MTime = time.Unix(mtimeUnix, 0).UTC()
|
||||
file.CTime = time.Unix(ctimeUnix, 0).UTC()
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = types.FilePath(linkTarget.String)
|
||||
}
|
||||
|
||||
return &file, nil
|
||||
@@ -83,7 +197,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
|
||||
|
||||
func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
|
||||
query := `
|
||||
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
|
||||
FROM files
|
||||
WHERE mtime >= ?
|
||||
ORDER BY path
|
||||
@@ -97,31 +211,11 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
|
||||
|
||||
var files []*File
|
||||
for rows.Next() {
|
||||
var file File
|
||||
var mtimeUnix, ctimeUnix int64
|
||||
var linkTarget sql.NullString
|
||||
|
||||
err := rows.Scan(
|
||||
&file.Path,
|
||||
&mtimeUnix,
|
||||
&ctimeUnix,
|
||||
&file.Size,
|
||||
&file.Mode,
|
||||
&file.UID,
|
||||
&file.GID,
|
||||
&linkTarget,
|
||||
)
|
||||
file, err := r.scanFileRows(rows)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("scanning file: %w", err)
|
||||
}
|
||||
|
||||
file.MTime = time.Unix(mtimeUnix, 0)
|
||||
file.CTime = time.Unix(ctimeUnix, 0)
|
||||
if linkTarget.Valid {
|
||||
file.LinkTarget = linkTarget.String
|
||||
}
|
||||
|
||||
files = append(files, &file)
|
||||
files = append(files, file)
|
||||
}
|
||||
|
||||
return files, rows.Err()
|
||||
@@ -134,7 +228,7 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
|
||||
if tx != nil {
|
||||
_, err = tx.ExecContext(ctx, query, path)
|
||||
} else {
|
||||
_, err = r.db.ExecWithLock(ctx, query, path)
|
||||
_, err = r.db.ExecWithLog(ctx, query, path)
|
||||
}
|
||||
|
||||
if err != nil {
|
||||
@@ -143,3 +237,146 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
|
||||
|
||||
return nil
|
||||
}
|
||||

// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id types.FileID) error {
	query := `DELETE FROM files WHERE id = ?`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, id.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, id.String())
	}

	if err != nil {
		return fmt.Errorf("deleting file: %w", err)
	}

	return nil
}

// ListByPrefix returns all files whose path begins with the given prefix
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
	query := `
		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
		FROM files
		WHERE path LIKE ? || '%'
		ORDER BY path
	`

	rows, err := r.db.conn.QueryContext(ctx, query, prefix)
	if err != nil {
		return nil, fmt.Errorf("querying files: %w", err)
	}
	defer CloseRows(rows)

	var files []*File
	for rows.Next() {
		file, err := r.scanFileRows(rows)
		if err != nil {
			return nil, fmt.Errorf("scanning file: %w", err)
		}
		files = append(files, file)
	}

	return files, rows.Err()
}

// ListAll returns all files in the database
func (r *FileRepository) ListAll(ctx context.Context) ([]*File, error) {
	query := `
		SELECT id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target
		FROM files
		ORDER BY path
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying files: %w", err)
	}
	defer CloseRows(rows)

	var files []*File
	for rows.Next() {
		file, err := r.scanFileRows(rows)
		if err != nil {
			return nil, fmt.Errorf("scanning file: %w", err)
		}
		files = append(files, file)
	}

	return files, rows.Err()
}

// CreateBatch inserts or updates multiple files in a single statement for efficiency.
// File IDs must be pre-generated before calling this method.
func (r *FileRepository) CreateBatch(ctx context.Context, tx *sql.Tx, files []*File) error {
	if len(files) == 0 {
		return nil
	}

	// Each File has 10 values, so batch at 100 to be safe with SQLite's variable limit
	const batchSize = 100

	for i := 0; i < len(files); i += batchSize {
		end := i + batchSize
		if end > len(files) {
			end = len(files)
		}
		batch := files[i:end]

		query := `INSERT INTO files (id, path, source_path, mtime, ctime, size, mode, uid, gid, link_target) VALUES `
		args := make([]interface{}, 0, len(batch)*10)
		for j, f := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
			args = append(args, f.ID.String(), f.Path.String(), f.SourcePath.String(), f.MTime.Unix(), f.CTime.Unix(), f.Size, f.Mode, f.UID, f.GID, f.LinkTarget.String())
		}
		query += ` ON CONFLICT(path) DO UPDATE SET
			source_path = excluded.source_path,
			mtime = excluded.mtime,
			ctime = excluded.ctime,
			size = excluded.size,
			mode = excluded.mode,
			uid = excluded.uid,
			gid = excluded.gid,
			link_target = excluded.link_target`

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch inserting files: %w", err)
		}
	}

	return nil
}

// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
	query := `
		DELETE FROM files
		WHERE NOT EXISTS (
			SELECT 1 FROM snapshot_files
			WHERE snapshot_files.file_id = files.id
		)
	`

	result, err := r.db.ExecWithLog(ctx, query)
	if err != nil {
		return fmt.Errorf("deleting orphaned files: %w", err)
	}

	rowsAffected, _ := result.RowsAffected()
	if rowsAffected > 0 {
		log.Debug("Deleted orphaned files", "count", rowsAffected)
	}

	return nil
}
@@ -53,7 +53,7 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test GetByPath
-	retrieved, err := repo.GetByPath(ctx, file.Path)
+	retrieved, err := repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get file: %v", err)
 	}
@@ -81,7 +81,7 @@ func TestFileRepository(t *testing.T) {
 		t.Fatalf("failed to update file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get updated file: %v", err)
 	}
@@ -99,12 +99,12 @@ func TestFileRepository(t *testing.T) {
 	}

 	// Test Delete
-	err = repo.Delete(ctx, nil, file.Path)
+	err = repo.Delete(ctx, nil, file.Path.String())
 	if err != nil {
 		t.Fatalf("failed to delete file: %v", err)
 	}

-	retrieved, err = repo.GetByPath(ctx, file.Path)
+	retrieved, err = repo.GetByPath(ctx, file.Path.String())
 	if err != nil {
 		t.Fatalf("error getting deleted file: %v", err)
 	}
@@ -137,7 +137,7 @@ func TestFileRepositorySymlink(t *testing.T) {
 		t.Fatalf("failed to create symlink: %v", err)
 	}

-	retrieved, err := repo.GetByPath(ctx, symlink.Path)
+	retrieved, err := repo.GetByPath(ctx, symlink.Path.String())
 	if err != nil {
 		t.Fatalf("failed to get symlink: %v", err)
 	}
@@ -1,70 +1,125 @@
 // Package database provides data models and repository interfaces for the Vaultik backup system.
 // It includes types for files, chunks, blobs, snapshots, and their relationships.
 package database

-import "time"
+import (
+	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
+)

-// File represents a file record in the database
+// File represents a file or directory in the backup system.
+// It stores metadata about files including timestamps, permissions, ownership,
+// and symlink targets. This information is used to restore files with their
+// original attributes.
 type File struct {
-	Path  string
-	MTime time.Time
-	CTime time.Time
+	ID         types.FileID     // UUID primary key
+	Path       types.FilePath   // Absolute path of the file
+	SourcePath types.SourcePath // The source directory this file came from (for restore path stripping)
+	MTime      time.Time        // Last modification time
+	CTime      time.Time        // Creation/change time (platform-specific: birth time on macOS, inode change time on Linux)
 	Size       int64
 	Mode       uint32
 	UID        uint32
 	GID        uint32
-	LinkTarget string // empty for regular files, target path for symlinks
+	LinkTarget types.FilePath // empty for regular files, target path for symlinks
 }

-// IsSymlink returns true if this file is a symbolic link
+// IsSymlink returns true if this file is a symbolic link.
+// A file is considered a symlink if it has a non-empty LinkTarget.
 func (f *File) IsSymlink() bool {
 	return f.LinkTarget != ""
 }

-// FileChunk represents the mapping between files and chunks
+// FileChunk represents the mapping between files and their constituent chunks.
+// Large files are split into multiple chunks for efficient deduplication and storage.
+// The Idx field maintains the order of chunks within a file.
 type FileChunk struct {
-	Path      string
+	FileID    types.FileID
 	Idx       int
-	ChunkHash string
+	ChunkHash types.ChunkHash
 }

-// Chunk represents a chunk record in the database
+// Chunk represents a data chunk in the deduplication system.
+// Files are split into chunks which are content-addressed by their hash.
+// The ChunkHash is the SHA256 hash of the chunk content, used for deduplication.
 type Chunk struct {
-	ChunkHash string
-	SHA256    string
+	ChunkHash types.ChunkHash
 	Size      int64
 }

-// Blob represents a blob record in the database
+// Blob represents a blob record in the database.
+// A blob is Vaultik's final storage unit - a large file (up to 10GB) containing
+// many compressed and encrypted chunks from multiple source files.
+// Blobs are content-addressed, meaning their filename in S3 is derived from
+// the SHA256 hash of their compressed and encrypted content.
+// The blob creation process is: chunks are accumulated -> compressed with zstd
+// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
 type Blob struct {
-	BlobHash  string
-	CreatedTS time.Time
+	ID               types.BlobID   // UUID assigned when blob creation starts
+	Hash             types.BlobHash // SHA256 of final compressed+encrypted content (empty until finalized)
+	CreatedTS        time.Time      // When blob creation started
+	FinishedTS       *time.Time     // When blob was finalized (nil if still packing)
+	UncompressedSize int64          // Total size of raw chunks before compression
+	CompressedSize   int64          // Size after compression and encryption
+	UploadedTS       *time.Time     // When blob was uploaded to S3 (nil if not uploaded)
 }

-// BlobChunk represents the mapping between blobs and chunks
+// BlobChunk represents the mapping between blobs and the chunks they contain.
+// This allows tracking which chunks are stored in which blobs, along with
+// their position and size within the blob. The offset and length fields
+// enable extracting specific chunks from a blob without processing the entire blob.
 type BlobChunk struct {
-	BlobHash  string
-	ChunkHash string
+	BlobID    types.BlobID
+	ChunkHash types.ChunkHash
 	Offset    int64
 	Length    int64
 }

-// ChunkFile represents the reverse mapping of chunks to files
+// ChunkFile represents the reverse mapping showing which files contain a specific chunk.
+// This is used during deduplication to identify all files that share a chunk,
+// which is important for garbage collection and integrity verification.
 type ChunkFile struct {
-	ChunkHash string
-	FilePath  string
+	ChunkHash  types.ChunkHash
+	FileID     types.FileID
 	FileOffset int64
 	Length     int64
 }

 // Snapshot represents a snapshot record in the database
 type Snapshot struct {
-	ID             string
-	Hostname       string
-	VaultikVersion string
-	CreatedTS      time.Time
+	ID                 types.SnapshotID
+	Hostname           types.Hostname
+	VaultikVersion     types.Version
+	VaultikGitRevision types.GitRevision
+	StartedAt          time.Time
+	CompletedAt        *time.Time // nil if still in progress
 	FileCount  int64
 	ChunkCount int64
 	BlobCount  int64
 	TotalSize  int64 // Total size of all referenced files
 	BlobSize   int64 // Total size of all referenced blobs (compressed and encrypted)
-	CompressionRatio float64 // Compression ratio (BlobSize / TotalSize)
+	BlobUncompressedSize int64   // Total uncompressed size of all referenced blobs
+	CompressionRatio     float64 // Compression ratio (BlobSize / BlobUncompressedSize)
+	CompressionLevel     int     // Compression level used for this snapshot
+	UploadBytes          int64   // Total bytes uploaded during this snapshot
+	UploadDurationMs     int64   // Total milliseconds spent uploading to S3
 }

+// IsComplete returns true if the snapshot has completed
+func (s *Snapshot) IsComplete() bool {
+	return s.CompletedAt != nil
+}
+
+// SnapshotFile represents the mapping between snapshots and files
+type SnapshotFile struct {
+	SnapshotID types.SnapshotID
+	FileID     types.FileID
+}
+
+// SnapshotBlob represents the mapping between snapshots and blobs
+type SnapshotBlob struct {
+	SnapshotID types.SnapshotID
+	BlobID     types.BlobID
+	BlobHash   types.BlobHash // Denormalized for easier manifest generation
+}
@@ -7,6 +7,7 @@ import (
 	"path/filepath"

 	"git.eeqj.de/sneak/vaultik/internal/config"
+	"git.eeqj.de/sneak/vaultik/internal/log"
 	"go.uber.org/fx"
 )

@@ -32,7 +33,13 @@ func provideDatabase(lc fx.Lifecycle, cfg *config.Config) (*DB, error) {

 	lc.Append(fx.Hook{
 		OnStop: func(ctx context.Context) error {
-			return db.Close()
+			log.Debug("Database module OnStop hook called")
+			if err := db.Close(); err != nil {
+				log.Error("Failed to close database in OnStop hook", "error", err)
+				return err
+			}
+			log.Debug("Database closed successfully in OnStop hook")
+			return nil
 		},
 	})

@@ -6,6 +6,9 @@ import (
	"fmt"
)

// Repositories provides access to all database repositories.
// It serves as a centralized access point for all database operations
// and manages transaction coordination across repositories.
type Repositories struct {
	db    *DB
	Files *FileRepository
@@ -15,8 +18,11 @@ type Repositories struct {
	BlobChunks *BlobChunkRepository
	ChunkFiles *ChunkFileRepository
	Snapshots  *SnapshotRepository
	Uploads    *UploadRepository
}

// NewRepositories creates a new Repositories instance with all repository types.
// Each repository shares the same database connection for coordinated transactions.
func NewRepositories(db *DB) *Repositories {
	return &Repositories{
		db: db,
@@ -27,20 +33,26 @@ func NewRepositories(db *DB) *Repositories {
		BlobChunks: NewBlobChunkRepository(db),
		ChunkFiles: NewChunkFileRepository(db),
		Snapshots:  NewSnapshotRepository(db),
		Uploads:    NewUploadRepository(db.conn),
	}
}

// TxFunc is a function that executes within a database transaction.
// The transaction is automatically committed if the function returns nil,
// or rolled back if it returns an error.
type TxFunc func(ctx context.Context, tx *sql.Tx) error

// WithTx executes a function within a write transaction.
// SQLite handles its own locking internally, so no explicit locking is needed.
// The transaction is automatically committed on success or rolled back on error.
// This method should be used for all write operations to ensure atomicity.
func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
	// Acquire write lock for the entire transaction
	r.db.LockForWrite()
	defer r.db.UnlockWrite()

	LogSQL("WithTx", "Beginning transaction", "")
	tx, err := r.db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("beginning transaction: %w", err)
	}
	LogSQL("WithTx", "Transaction started", "")

	defer func() {
		if p := recover(); p != nil {
@@ -63,6 +75,15 @@ func (r *Repositories) WithTx(ctx context.Context, fn TxFunc) error {
	return tx.Commit()
}

// DB returns the underlying database for direct queries
func (r *Repositories) DB() *DB {
	return r.db
}

// WithReadTx executes a function within a read-only transaction.
// Read transactions can run concurrently with other read transactions
// but will be blocked by write transactions. The transaction is
// automatically committed on success or rolled back on error.
func (r *Repositories) WithReadTx(ctx context.Context, fn TxFunc) error {
	opts := &sql.TxOptions{
		ReadOnly: true,
@@ -6,6 +6,8 @@ import (
 	"fmt"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 func TestRepositoriesTransaction(t *testing.T) {
@@ -33,8 +35,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Create chunks
 	chunk1 := &Chunk{
-		ChunkHash: "tx_chunk1",
-		SHA256:    "tx_sha1",
+		ChunkHash: types.ChunkHash("tx_chunk1"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk1); err != nil {
@@ -42,8 +43,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	chunk2 := &Chunk{
-		ChunkHash: "tx_chunk2",
-		SHA256:    "tx_sha2",
+		ChunkHash: types.ChunkHash("tx_chunk2"),
 		Size:      512,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk2); err != nil {
@@ -52,7 +52,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Map chunks to file
 	fc1 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       0,
 		ChunkHash: chunk1.ChunkHash,
 	}
@@ -61,7 +61,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	fc2 := &FileChunk{
-		Path:      file.Path,
+		FileID:    file.ID,
 		Idx:       1,
 		ChunkHash: chunk2.ChunkHash,
 	}
@@ -71,7 +71,8 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Create blob
 	blob := &Blob{
-		BlobHash:  "tx_blob1",
+		ID:        types.NewBlobID(),
+		Hash:      types.BlobHash("tx_blob1"),
 		CreatedTS: time.Now().Truncate(time.Second),
 	}
 	if err := repos.Blobs.Create(ctx, tx, blob); err != nil {
@@ -80,7 +81,7 @@ func TestRepositoriesTransaction(t *testing.T) {

 	// Map chunks to blob
 	bc1 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk1.ChunkHash,
 		Offset:    0,
 		Length:    512,
@@ -90,7 +91,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 	}

 	bc2 := &BlobChunk{
-		BlobHash:  blob.BlobHash,
+		BlobID:    blob.ID,
 		ChunkHash: chunk2.ChunkHash,
 		Offset:    512,
 		Length:    512,
@@ -115,7 +116,7 @@ func TestRepositoriesTransaction(t *testing.T) {
 		t.Error("expected file after transaction")
 	}

-	chunks, err := repos.FileChunks.GetByPath(ctx, "/test/tx_file.txt")
+	chunks, err := repos.FileChunks.GetByFile(ctx, "/test/tx_file.txt")
 	if err != nil {
 		t.Fatalf("failed to get file chunks: %v", err)
 	}
@@ -157,8 +158,7 @@ func TestRepositoriesTransactionRollback(t *testing.T) {

 	// Create a chunk
 	chunk := &Chunk{
-		ChunkHash: "rollback_chunk",
-		SHA256:    "rollback_sha",
+		ChunkHash: types.ChunkHash("rollback_chunk"),
 		Size:      1024,
 	}
 	if err := repos.Chunks.Create(ctx, tx, chunk); err != nil {
@@ -217,7 +217,7 @@ func TestRepositoriesReadTransaction(t *testing.T) {
 	var retrievedFile *File
 	err = repos.WithReadTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
 		var err error
-		retrievedFile, err = repos.Files.GetByPath(ctx, "/test/read_file.txt")
+		retrievedFile, err = repos.Files.GetByPathTx(ctx, tx, "/test/read_file.txt")
 		if err != nil {
 			return err
 		}
874
internal/database/repository_comprehensive_test.go
Normal file
@@ -0,0 +1,874 @@
package database

import (
	"context"
	"database/sql"
	"fmt"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/types"
)

// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs
func TestFileRepositoryUUIDGeneration(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewFileRepository(db)

	// Create multiple files
	files := []*File{
		{
			Path:  "/file1.txt",
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  1024,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		},
		{
			Path:  "/file2.txt",
			MTime: time.Now().Truncate(time.Second),
			CTime: time.Now().Truncate(time.Second),
			Size:  2048,
			Mode:  0644,
			UID:   1000,
			GID:   1000,
		},
	}

	uuids := make(map[string]bool)
	for _, file := range files {
		err := repo.Create(ctx, nil, file)
		if err != nil {
			t.Fatalf("failed to create file: %v", err)
		}

		// Check UUID was generated
		if file.ID.IsZero() {
			t.Error("file ID was not generated")
		}

		// Check UUID is unique
		if uuids[file.ID.String()] {
			t.Errorf("duplicate UUID generated: %s", file.ID)
		}
		uuids[file.ID.String()] = true
	}
}

// TestFileRepositoryGetByID tests retrieving files by UUID
func TestFileRepositoryGetByID(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewFileRepository(db)

	// Create a file
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repo.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Retrieve by ID
	retrieved, err := repo.GetByID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file by ID: %v", err)
	}

	if retrieved.ID != file.ID {
		t.Errorf("ID mismatch: expected %s, got %s", file.ID, retrieved.ID)
	}
	if retrieved.Path != file.Path {
		t.Errorf("Path mismatch: expected %s, got %s", file.Path, retrieved.Path)
	}

	// Test non-existent ID
	nonExistentID := types.NewFileID() // Generate a new UUID that won't exist in the database
	nonExistent, err := repo.GetByID(ctx, nonExistentID)
	if err != nil {
		t.Fatalf("GetByID should not return error for non-existent ID: %v", err)
	}
	if nonExistent != nil {
		t.Error("expected nil for non-existent ID")
	}
}

// TestOrphanedFileCleanup tests the cleanup of orphaned files
func TestOrphanedFileCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create files
	file1 := &File{
		Path:  "/orphaned.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	file2 := &File{
		Path:  "/referenced.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  2048,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repos.Files.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	err = repos.Files.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create a snapshot and reference only file2
	snapshot := &Snapshot{
		ID:        "test-snapshot",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Add file2 to snapshot
	err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
	if err != nil {
		t.Fatalf("failed to add file to snapshot: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Files.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned files: %v", err)
	}

	// Check that orphaned file is gone
	orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if orphanedFile != nil {
		t.Error("orphaned file should have been deleted")
	}

	// Check that referenced file still exists
	referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
	if err != nil {
		t.Fatalf("error getting file: %v", err)
	}
	if referencedFile == nil {
		t.Error("referenced file should not have been deleted")
	}
}

// TestOrphanedChunkCleanup tests the cleanup of orphaned chunks
func TestOrphanedChunkCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create chunks
	chunk1 := &Chunk{
		ChunkHash: types.ChunkHash("orphaned-chunk"),
		Size:      1024,
	}
	chunk2 := &Chunk{
		ChunkHash: types.ChunkHash("referenced-chunk"),
		Size:      1024,
	}

	err := repos.Chunks.Create(ctx, nil, chunk1)
	if err != nil {
		t.Fatalf("failed to create chunk1: %v", err)
	}
	err = repos.Chunks.Create(ctx, nil, chunk2)
	if err != nil {
		t.Fatalf("failed to create chunk2: %v", err)
	}

	// Create a file and reference only chunk2
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err = repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create file-chunk mapping only for chunk2
	fc := &FileChunk{
		FileID:    file.ID,
		Idx:       0,
		ChunkHash: chunk2.ChunkHash,
	}
	err = repos.FileChunks.Create(ctx, nil, fc)
	if err != nil {
		t.Fatalf("failed to create file chunk: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Chunks.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned chunks: %v", err)
	}

	// Check that orphaned chunk is gone
	orphanedChunk, err := repos.Chunks.GetByHash(ctx, chunk1.ChunkHash.String())
	if err != nil {
		t.Fatalf("error getting chunk: %v", err)
	}
	if orphanedChunk != nil {
		t.Error("orphaned chunk should have been deleted")
	}

	// Check that referenced chunk still exists
	referencedChunk, err := repos.Chunks.GetByHash(ctx, chunk2.ChunkHash.String())
	if err != nil {
		t.Fatalf("error getting chunk: %v", err)
	}
	if referencedChunk == nil {
		t.Error("referenced chunk should not have been deleted")
	}
}

// TestOrphanedBlobCleanup tests the cleanup of orphaned blobs
func TestOrphanedBlobCleanup(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create blobs
	blob1 := &Blob{
		ID:        types.NewBlobID(),
		Hash:      types.BlobHash("orphaned-blob"),
		CreatedTS: time.Now().Truncate(time.Second),
	}
	blob2 := &Blob{
		ID:        types.NewBlobID(),
		Hash:      types.BlobHash("referenced-blob"),
		CreatedTS: time.Now().Truncate(time.Second),
	}

	err := repos.Blobs.Create(ctx, nil, blob1)
	if err != nil {
		t.Fatalf("failed to create blob1: %v", err)
	}
	err = repos.Blobs.Create(ctx, nil, blob2)
	if err != nil {
		t.Fatalf("failed to create blob2: %v", err)
	}

	// Create a snapshot and reference only blob2
	snapshot := &Snapshot{
		ID:        "test-snapshot",
		Hostname:  "test-host",
		StartedAt: time.Now(),
	}
	err = repos.Snapshots.Create(ctx, nil, snapshot)
	if err != nil {
		t.Fatalf("failed to create snapshot: %v", err)
	}

	// Add blob2 to snapshot
	err = repos.Snapshots.AddBlob(ctx, nil, snapshot.ID.String(), blob2.ID, blob2.Hash)
	if err != nil {
		t.Fatalf("failed to add blob to snapshot: %v", err)
	}

	// Run orphaned cleanup
	err = repos.Blobs.DeleteOrphaned(ctx)
	if err != nil {
		t.Fatalf("failed to delete orphaned blobs: %v", err)
	}

	// Check that orphaned blob is gone
	orphanedBlob, err := repos.Blobs.GetByID(ctx, blob1.ID.String())
	if err != nil {
		t.Fatalf("error getting blob: %v", err)
	}
	if orphanedBlob != nil {
		t.Error("orphaned blob should have been deleted")
	}

	// Check that referenced blob still exists
	referencedBlob, err := repos.Blobs.GetByID(ctx, blob2.ID.String())
	if err != nil {
		t.Fatalf("error getting blob: %v", err)
	}
	if referencedBlob == nil {
		t.Error("referenced blob should not have been deleted")
	}
}

// TestFileChunkRepositoryWithUUIDs tests file-chunk relationships with UUIDs
func TestFileChunkRepositoryWithUUIDs(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create a file
	file := &File{
		Path:  "/test.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  3072,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	err := repos.Files.Create(ctx, nil, file)
	if err != nil {
		t.Fatalf("failed to create file: %v", err)
	}

	// Create chunks
	chunks := []types.ChunkHash{"chunk1", "chunk2", "chunk3"}
	for i, chunkHash := range chunks {
		chunk := &Chunk{
			ChunkHash: chunkHash,
			Size:      1024,
		}
		err = repos.Chunks.Create(ctx, nil, chunk)
		if err != nil {
			t.Fatalf("failed to create chunk: %v", err)
		}

		// Create file-chunk mapping
		fc := &FileChunk{
			FileID:    file.ID,
			Idx:       i,
			ChunkHash: chunkHash,
		}
		err = repos.FileChunks.Create(ctx, nil, fc)
		if err != nil {
			t.Fatalf("failed to create file chunk: %v", err)
		}
	}

	// Test GetByFileID
	fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file chunks: %v", err)
	}
	if len(fileChunks) != 3 {
		t.Errorf("expected 3 chunks, got %d", len(fileChunks))
	}

	// Test DeleteByFileID
	err = repos.FileChunks.DeleteByFileID(ctx, nil, file.ID)
	if err != nil {
		t.Fatalf("failed to delete file chunks: %v", err)
	}

	fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
	if err != nil {
		t.Fatalf("failed to get file chunks after delete: %v", err)
	}
	if len(fileChunks) != 0 {
		t.Errorf("expected 0 chunks after delete, got %d", len(fileChunks))
	}
}

// TestChunkFileRepositoryWithUUIDs tests chunk-file relationships with UUIDs
func TestChunkFileRepositoryWithUUIDs(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repos := NewRepositories(db)

	// Create files
	file1 := &File{
		Path:  "/file1.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}
	file2 := &File{
		Path:  "/file2.txt",
		MTime: time.Now().Truncate(time.Second),
		CTime: time.Now().Truncate(time.Second),
		Size:  1024,
		Mode:  0644,
		UID:   1000,
		GID:   1000,
	}

	err := repos.Files.Create(ctx, nil, file1)
	if err != nil {
		t.Fatalf("failed to create file1: %v", err)
	}
	err = repos.Files.Create(ctx, nil, file2)
	if err != nil {
		t.Fatalf("failed to create file2: %v", err)
	}

	// Create a chunk that appears in both files (deduplication)
	chunk := &Chunk{
		ChunkHash: types.ChunkHash("shared-chunk"),
		Size:      1024,
	}
	err = repos.Chunks.Create(ctx, nil, chunk)
	if err != nil {
		t.Fatalf("failed to create chunk: %v", err)
	}

	// Create chunk-file mappings
	cf1 := &ChunkFile{
		ChunkHash:  chunk.ChunkHash,
		FileID:     file1.ID,
		FileOffset: 0,
		Length:     1024,
	}
	cf2 := &ChunkFile{
		ChunkHash:  chunk.ChunkHash,
		FileID:     file2.ID,
		FileOffset: 512,
		Length:     1024,
	}

	err = repos.ChunkFiles.Create(ctx, nil, cf1)
	if err != nil {
		t.Fatalf("failed to create chunk file 1: %v", err)
	}
	err = repos.ChunkFiles.Create(ctx, nil, cf2)
	if err != nil {
		t.Fatalf("failed to create chunk file 2: %v", err)
	}

	// Test GetByChunkHash
	chunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
	if err != nil {
		t.Fatalf("failed to get chunk files: %v", err)
	}
	if len(chunkFiles) != 2 {
		t.Errorf("expected 2 files for chunk, got %d", len(chunkFiles))
	}

	// Test GetByFileID
	chunkFiles, err = repos.ChunkFiles.GetByFileID(ctx, file1.ID)
	if err != nil {
		t.Fatalf("failed to get chunks by file ID: %v", err)
	}
	if len(chunkFiles) != 1 {
		t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
	}
}

// TestSnapshotRepositoryExtendedFields tests snapshot with version and git revision
func TestSnapshotRepositoryExtendedFields(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()
	repo := NewSnapshotRepository(db)

	// Create snapshot with extended fields
	snapshot := &Snapshot{
		ID:                   "test-20250722-120000Z",
		Hostname:             "test-host",
		VaultikVersion:       "0.0.1",
		VaultikGitRevision:   "abc123def456",
		StartedAt:            time.Now(),
		CompletedAt:          nil,
		FileCount:            100,
		ChunkCount:           200,
		BlobCount:            50,
		TotalSize:            1024 * 1024,
		BlobSize:             512 * 1024,
		BlobUncompressedSize: 1024 * 1024,
		CompressionLevel:     6,
		CompressionRatio:     2.0,
		UploadDurationMs:     5000,
	}

	err := repo.Create(ctx, nil, snapshot)
	if err != nil {
|
||||
t.Fatalf("failed to create snapshot: %v", err)
|
||||
}
|
||||
|
||||
// Retrieve and verify
|
||||
retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
|
||||
if err != nil {
|
||||
t.Fatalf("failed to get snapshot: %v", err)
|
||||
}
|
||||
|
||||
if retrieved.VaultikVersion != snapshot.VaultikVersion {
|
||||
t.Errorf("version mismatch: expected %s, got %s", snapshot.VaultikVersion, retrieved.VaultikVersion)
|
||||
}
|
||||
if retrieved.VaultikGitRevision != snapshot.VaultikGitRevision {
|
||||
t.Errorf("git revision mismatch: expected %s, got %s", snapshot.VaultikGitRevision, retrieved.VaultikGitRevision)
|
||||
}
|
||||
if retrieved.CompressionLevel != snapshot.CompressionLevel {
|
||||
t.Errorf("compression level mismatch: expected %d, got %d", snapshot.CompressionLevel, retrieved.CompressionLevel)
|
||||
}
|
||||
if retrieved.BlobUncompressedSize != snapshot.BlobUncompressedSize {
|
||||
t.Errorf("uncompressed size mismatch: expected %d, got %d", snapshot.BlobUncompressedSize, retrieved.BlobUncompressedSize)
|
||||
}
|
||||
if retrieved.UploadDurationMs != snapshot.UploadDurationMs {
|
||||
t.Errorf("upload duration mismatch: expected %d, got %d", snapshot.UploadDurationMs, retrieved.UploadDurationMs)
|
||||
}
|
||||
}
|
||||
|
||||
// TestComplexOrphanedDataScenario tests a complex scenario with multiple relationships
|
||||
func TestComplexOrphanedDataScenario(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Create snapshots
|
||||
snapshot1 := &Snapshot{
|
||||
ID: "snapshot1",
|
||||
Hostname: "host1",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
snapshot2 := &Snapshot{
|
||||
ID: "snapshot2",
|
||||
Hostname: "host1",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
|
||||
err := repos.Snapshots.Create(ctx, nil, snapshot1)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create snapshot1: %v", err)
|
||||
}
|
||||
err = repos.Snapshots.Create(ctx, nil, snapshot2)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create snapshot2: %v", err)
|
||||
}
|
||||
|
||||
// Create files
|
||||
files := make([]*File, 3)
|
||||
for i := range files {
|
||||
files[i] = &File{
|
||||
Path: types.FilePath(fmt.Sprintf("/file%d.txt", i)),
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err = repos.Files.Create(ctx, nil, files[i])
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file%d: %v", i, err)
|
||||
}
|
||||
}
|
||||
|
||||
// Add files to snapshots
|
||||
// Snapshot1: file0, file1
|
||||
// Snapshot2: file1, file2
|
||||
// file0: only in snapshot1
|
||||
// file1: in both snapshots
|
||||
// file2: only in snapshot2
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[0].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot1.ID.String(), files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot2.ID.String(), files[2].ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Delete snapshot1
|
||||
err = repos.Snapshots.DeleteSnapshotFiles(ctx, snapshot1.ID.String())
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err = repos.Snapshots.Delete(ctx, snapshot1.ID.String())
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Run orphaned cleanup
|
||||
err = repos.Files.DeleteOrphaned(ctx)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Check results
|
||||
// file0 should be deleted (only in deleted snapshot)
|
||||
file0, err := repos.Files.GetByID(ctx, files[0].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file0: %v", err)
|
||||
}
|
||||
if file0 != nil {
|
||||
t.Error("file0 should have been deleted")
|
||||
}
|
||||
|
||||
// file1 should exist (still in snapshot2)
|
||||
file1, err := repos.Files.GetByID(ctx, files[1].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file1: %v", err)
|
||||
}
|
||||
if file1 == nil {
|
||||
t.Error("file1 should still exist")
|
||||
}
|
||||
|
||||
// file2 should exist (still in snapshot2)
|
||||
file2, err := repos.Files.GetByID(ctx, files[2].ID)
|
||||
if err != nil {
|
||||
t.Fatalf("error getting file2: %v", err)
|
||||
}
|
||||
if file2 == nil {
|
||||
t.Error("file2 should still exist")
|
||||
}
|
||||
}
|
||||
|
||||
// TestCascadeDelete tests that cascade deletes work properly
|
||||
func TestCascadeDelete(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Create a file
|
||||
file := &File{
|
||||
Path: "/cascade-test.txt",
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err := repos.Files.Create(ctx, nil, file)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file: %v", err)
|
||||
}
|
||||
|
||||
// Create chunks and file-chunk mappings
|
||||
for i := 0; i < 3; i++ {
|
||||
chunk := &Chunk{
|
||||
ChunkHash: types.ChunkHash(fmt.Sprintf("cascade-chunk-%d", i)),
|
||||
Size: 1024,
|
||||
}
|
||||
err = repos.Chunks.Create(ctx, nil, chunk)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create chunk: %v", err)
|
||||
}
|
||||
|
||||
fc := &FileChunk{
|
||||
FileID: file.ID,
|
||||
Idx: i,
|
||||
ChunkHash: chunk.ChunkHash,
|
||||
}
|
||||
err = repos.FileChunks.Create(ctx, nil, fc)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to create file chunk: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Verify file chunks exist
|
||||
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(fileChunks) != 3 {
|
||||
t.Errorf("expected 3 file chunks, got %d", len(fileChunks))
|
||||
}
|
||||
|
||||
// Delete the file
|
||||
err = repos.Files.DeleteByID(ctx, nil, file.ID)
|
||||
if err != nil {
|
||||
t.Fatalf("failed to delete file: %v", err)
|
||||
}
|
||||
|
||||
// Verify file chunks were cascade deleted
|
||||
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(fileChunks) != 0 {
|
||||
t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
|
||||
}
|
||||
}
|
||||
|
||||
// TestTransactionIsolation tests that transactions properly isolate changes
|
||||
func TestTransactionIsolation(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Start a transaction
|
||||
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
|
||||
// Create a file within the transaction
|
||||
file := &File{
|
||||
Path: "/tx-test.txt",
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err := repos.Files.Create(ctx, tx, file)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Within the same transaction, we should be able to query it
|
||||
// Note: This would require modifying GetByPath to accept a tx parameter
|
||||
// For now, we'll just test that rollback works
|
||||
|
||||
// Return an error to trigger rollback
|
||||
return fmt.Errorf("intentional rollback")
|
||||
})
|
||||
|
||||
if err == nil {
|
||||
t.Fatal("expected error from transaction")
|
||||
}
|
||||
|
||||
// Verify the file was not created (transaction rolled back)
|
||||
files, err := repos.Files.ListByPrefix(ctx, "/tx-test")
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(files) != 0 {
|
||||
t.Error("file should not exist after rollback")
|
||||
}
|
||||
}
|
||||
|
||||
// TestConcurrentOrphanedCleanup tests that concurrent cleanup operations don't interfere
|
||||
func TestConcurrentOrphanedCleanup(t *testing.T) {
|
||||
db, cleanup := setupTestDB(t)
|
||||
defer cleanup()
|
||||
|
||||
ctx := context.Background()
|
||||
repos := NewRepositories(db)
|
||||
|
||||
// Set a 5-second busy timeout to handle concurrent operations
|
||||
if _, err := db.conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
|
||||
t.Fatalf("failed to set busy timeout: %v", err)
|
||||
}
|
||||
|
||||
// Create a snapshot
|
||||
snapshot := &Snapshot{
|
||||
ID: "concurrent-test",
|
||||
Hostname: "test-host",
|
||||
StartedAt: time.Now(),
|
||||
}
|
||||
err := repos.Snapshots.Create(ctx, nil, snapshot)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Create many files, some orphaned
|
||||
for i := 0; i < 20; i++ {
|
||||
file := &File{
|
||||
Path: types.FilePath(fmt.Sprintf("/concurrent-%d.txt", i)),
|
||||
MTime: time.Now().Truncate(time.Second),
|
||||
CTime: time.Now().Truncate(time.Second),
|
||||
Size: 1024,
|
||||
Mode: 0644,
|
||||
UID: 1000,
|
||||
GID: 1000,
|
||||
}
|
||||
err = repos.Files.Create(ctx, nil, file)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Add even-numbered files to snapshot
|
||||
if i%2 == 0 {
|
||||
err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
    // Run multiple cleanup operations concurrently
    // Note: SQLite serializes writes, so without the busy timeout set above
    // some of these would fail with SQLITE_BUSY; with it, all should succeed
    done := make(chan error, 3)
    for i := 0; i < 3; i++ {
        go func() {
            done <- repos.Files.DeleteOrphaned(ctx)
        }()
    }

    // Wait for all to complete
    for i := 0; i < 3; i++ {
        err := <-done
        if err != nil {
            t.Errorf("cleanup %d failed: %v", i, err)
        }
    }

    // Verify correct files were deleted
    files, err := repos.Files.ListByPrefix(ctx, "/concurrent-")
    if err != nil {
        t.Fatal(err)
    }

    // Should have 10 files remaining (even numbered)
    if len(files) != 10 {
        t.Errorf("expected 10 files remaining, got %d", len(files))
    }

    // Verify all remaining files are even-numbered
    for _, file := range files {
        var num int
        _, err := fmt.Sscanf(file.Path.String(), "/concurrent-%d.txt", &num)
        if err != nil {
            t.Logf("failed to parse file number from %s: %v", file.Path, err)
        }
        if num%2 != 0 {
            t.Errorf("odd-numbered file %s should have been deleted", file.Path)
        }
    }
}
internal/database/repository_debug_test.go (new file, +165 lines)
package database

import (
    "context"
    "testing"
    "time"
)

// TestOrphanedFileCleanupDebug tests orphaned file cleanup with debug output
func TestOrphanedFileCleanupDebug(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create files
    file1 := &File{
        Path:  "/orphaned.txt",
        MTime: time.Now().Truncate(time.Second),
        CTime: time.Now().Truncate(time.Second),
        Size:  1024,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }
    file2 := &File{
        Path:  "/referenced.txt",
        MTime: time.Now().Truncate(time.Second),
        CTime: time.Now().Truncate(time.Second),
        Size:  2048,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }

    err := repos.Files.Create(ctx, nil, file1)
    if err != nil {
        t.Fatalf("failed to create file1: %v", err)
    }
    t.Logf("Created file1 with ID: %s", file1.ID)

    err = repos.Files.Create(ctx, nil, file2)
    if err != nil {
        t.Fatalf("failed to create file2: %v", err)
    }
    t.Logf("Created file2 with ID: %s", file2.ID)

    // Create a snapshot and reference only file2
    snapshot := &Snapshot{
        ID:        "test-snapshot",
        Hostname:  "test-host",
        StartedAt: time.Now(),
    }
    err = repos.Snapshots.Create(ctx, nil, snapshot)
    if err != nil {
        t.Fatalf("failed to create snapshot: %v", err)
    }
    t.Logf("Created snapshot: %s", snapshot.ID)

    // Check snapshot_files before adding
    var count int
    err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("snapshot_files count before add: %d", count)

    // Add file2 to snapshot
    err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file2.ID)
    if err != nil {
        t.Fatalf("failed to add file to snapshot: %v", err)
    }
    t.Logf("Added file2 to snapshot")

    // Check snapshot_files after adding
    err = db.conn.QueryRow("SELECT COUNT(*) FROM snapshot_files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("snapshot_files count after add: %d", count)

    // Check which files are referenced
    rows, err := db.conn.Query("SELECT file_id FROM snapshot_files")
    if err != nil {
        t.Fatal(err)
    }
    defer func() {
        if err := rows.Close(); err != nil {
            t.Logf("failed to close rows: %v", err)
        }
    }()
    t.Log("Files in snapshot_files:")
    for rows.Next() {
        var fileID string
        if err := rows.Scan(&fileID); err != nil {
            t.Fatal(err)
        }
        t.Logf("  - %s", fileID)
    }

    // Check files before cleanup
    err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("Files count before cleanup: %d", count)

    // Run orphaned cleanup
    err = repos.Files.DeleteOrphaned(ctx)
    if err != nil {
        t.Fatalf("failed to delete orphaned files: %v", err)
    }
    t.Log("Ran orphaned cleanup")

    // Check files after cleanup
    err = db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
    if err != nil {
        t.Fatal(err)
    }
    t.Logf("Files count after cleanup: %d", count)

    // List remaining files
    files, err := repos.Files.ListByPrefix(ctx, "/")
    if err != nil {
        t.Fatal(err)
    }
    t.Log("Remaining files:")
    for _, f := range files {
        t.Logf("  - ID: %s, Path: %s", f.ID, f.Path)
    }

    // Check that orphaned file is gone
    orphanedFile, err := repos.Files.GetByID(ctx, file1.ID)
    if err != nil {
        t.Fatalf("error getting file: %v", err)
    }
    if orphanedFile != nil {
        t.Error("orphaned file should have been deleted")
        // Let's check why it wasn't deleted
        var exists bool
        err = db.conn.QueryRow(`
            SELECT EXISTS(
                SELECT 1 FROM snapshot_files
                WHERE file_id = ?
            )`, file1.ID).Scan(&exists)
        if err != nil {
            t.Fatal(err)
        }
        t.Logf("File1 exists in snapshot_files: %v", exists)
    } else {
        t.Log("Orphaned file was correctly deleted")
    }

    // Check that referenced file still exists
    referencedFile, err := repos.Files.GetByID(ctx, file2.ID)
    if err != nil {
        t.Fatalf("error getting file: %v", err)
    }
    if referencedFile == nil {
        t.Error("referenced file should not have been deleted")
    } else {
        t.Log("Referenced file correctly remains")
    }
}
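The cleanup exercised above treats a file as orphaned when no `snapshot_files` row references it. The repository's actual SQL is not part of this diff; as a rough in-memory sketch of the assumed semantics (the `deleteOrphaned` helper here is hypothetical, standing in for a `DELETE FROM files WHERE id NOT IN (SELECT file_id FROM snapshot_files)` style query):

```go
package main

import "fmt"

// deleteOrphaned drops every file ID that no snapshot references,
// mirroring the assumed NOT IN subquery semantics of DeleteOrphaned.
func deleteOrphaned(files map[string]bool, referenced map[string]bool) {
	for id := range files {
		if !referenced[id] {
			delete(files, id)
		}
	}
}

func main() {
	files := map[string]bool{"orphaned": true, "referenced": true}
	refs := map[string]bool{"referenced": true}
	deleteOrphaned(files, refs)
	fmt.Println(len(files), files["referenced"]) // 1 true
}
```

This also explains why `file1` above survives only if something still references it in `snapshot_files`, which is exactly what the debug query at the end checks.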
internal/database/repository_edge_cases_test.go (new file, +543 lines)
package database

import (
    "context"
    "fmt"
    "strings"
    "testing"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/types"
)

// TestFileRepositoryEdgeCases tests edge cases for file repository
func TestFileRepositoryEdgeCases(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repo := NewFileRepository(db)

    tests := []struct {
        name    string
        file    *File
        wantErr bool
        errMsg  string
    }{
        {
            name: "empty path",
            file: &File{
                Path:  "",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false, // Empty strings are allowed, only NULL is not allowed
        },
        {
            name: "very long path",
            file: &File{
                Path:  types.FilePath("/" + strings.Repeat("a", 4096)),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "path with special characters",
            file: &File{
                Path:  "/test/file with spaces and 特殊文字.txt",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "zero size file",
            file: &File{
                Path:  "/empty.txt",
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  0,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            },
            wantErr: false,
        },
        {
            name: "symlink with target",
            file: &File{
                Path:       "/link",
                MTime:      time.Now(),
                CTime:      time.Now(),
                Size:       0,
                Mode:       0777 | 0120000, // symlink mode
                UID:        1000,
                GID:        1000,
                LinkTarget: "/target",
            },
            wantErr: false,
        },
    }

    for i, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Add a unique suffix to paths to avoid UNIQUE constraint violations
            if tt.file.Path != "" {
                tt.file.Path = types.FilePath(fmt.Sprintf("%s_%d_%d", tt.file.Path, i, time.Now().UnixNano()))
            }

            err := repo.Create(ctx, nil, tt.file)
            if (err != nil) != tt.wantErr {
                t.Errorf("Create() error = %v, wantErr %v", err, tt.wantErr)
            }
            if err != nil && tt.errMsg != "" && !strings.Contains(err.Error(), tt.errMsg) {
                t.Errorf("Create() error = %v, want error containing %q", err, tt.errMsg)
            }
        })
    }
}

// TestDuplicateHandling tests handling of duplicate entries
func TestDuplicateHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test duplicate file paths - Create uses UPSERT logic
    t.Run("duplicate file paths", func(t *testing.T) {
        file1 := &File{
            Path:  "/duplicate.txt",
            MTime: time.Now(),
            CTime: time.Now(),
            Size:  1024,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }
        file2 := &File{
            Path:  "/duplicate.txt", // Same path
            MTime: time.Now().Add(time.Hour),
            CTime: time.Now().Add(time.Hour),
            Size:  2048,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }

        err := repos.Files.Create(ctx, nil, file1)
        if err != nil {
            t.Fatalf("failed to create file1: %v", err)
        }
        originalID := file1.ID

        // Create with same path should update the existing record (UPSERT behavior)
        err = repos.Files.Create(ctx, nil, file2)
        if err != nil {
            t.Fatalf("failed to create file2: %v", err)
        }

        // Verify the file was updated, not duplicated
        retrievedFile, err := repos.Files.GetByPath(ctx, "/duplicate.txt")
        if err != nil {
            t.Fatalf("failed to retrieve file: %v", err)
        }

        // The file should have been updated with file2's data
        if retrievedFile.Size != 2048 {
            t.Errorf("expected size 2048, got %d", retrievedFile.Size)
        }

        // ID might be different due to the UPSERT
        if retrievedFile.ID != file2.ID {
            t.Logf("File ID changed from %s to %s during upsert", originalID, retrievedFile.ID)
        }
    })

    // Test duplicate chunk hashes
    t.Run("duplicate chunk hashes", func(t *testing.T) {
        chunk := &Chunk{
            ChunkHash: types.ChunkHash("duplicate-chunk"),
            Size:      1024,
        }

        err := repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Fatalf("failed to create chunk: %v", err)
        }

        // Creating the same chunk again should be idempotent (ON CONFLICT DO NOTHING)
        err = repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Errorf("duplicate chunk creation should be idempotent, got error: %v", err)
        }
    })

    // Test duplicate file-chunk mappings
    t.Run("duplicate file-chunk mappings", func(t *testing.T) {
        file := &File{
            Path:  "/test-dup-fc.txt",
            MTime: time.Now(),
            CTime: time.Now(),
            Size:  1024,
            Mode:  0644,
            UID:   1000,
            GID:   1000,
        }
        err := repos.Files.Create(ctx, nil, file)
        if err != nil {
            t.Fatal(err)
        }

        chunk := &Chunk{
            ChunkHash: types.ChunkHash("test-chunk-dup"),
            Size:      1024,
        }
        err = repos.Chunks.Create(ctx, nil, chunk)
        if err != nil {
            t.Fatal(err)
        }

        fc := &FileChunk{
            FileID:    file.ID,
            Idx:       0,
            ChunkHash: chunk.ChunkHash,
        }

        err = repos.FileChunks.Create(ctx, nil, fc)
        if err != nil {
            t.Fatal(err)
        }

        // Creating the same mapping again should be idempotent
        err = repos.FileChunks.Create(ctx, nil, fc)
        if err != nil {
            t.Error("file-chunk creation should be idempotent")
        }
    })
}

// TestNullHandling tests handling of NULL values
func TestNullHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test file with no link target
    t.Run("file without link target", func(t *testing.T) {
        file := &File{
            Path:       "/regular.txt",
            MTime:      time.Now(),
            CTime:      time.Now(),
            Size:       1024,
            Mode:       0644,
            UID:        1000,
            GID:        1000,
            LinkTarget: "", // Should be stored as NULL
        }

        err := repos.Files.Create(ctx, nil, file)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Files.GetByID(ctx, file.ID)
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.LinkTarget != "" {
            t.Errorf("expected empty link target, got %q", retrieved.LinkTarget)
        }
    })

    // Test snapshot with NULL completed_at
    t.Run("incomplete snapshot", func(t *testing.T) {
        snapshot := &Snapshot{
            ID:          "incomplete-test",
            Hostname:    "test-host",
            StartedAt:   time.Now(),
            CompletedAt: nil, // Should remain NULL until completed
        }

        err := repos.Snapshots.Create(ctx, nil, snapshot)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Snapshots.GetByID(ctx, snapshot.ID.String())
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.CompletedAt != nil {
            t.Error("expected nil CompletedAt for incomplete snapshot")
        }
    })

    // Test blob with NULL uploaded_ts
    t.Run("blob not uploaded", func(t *testing.T) {
        blob := &Blob{
            ID:         types.NewBlobID(),
            Hash:       types.BlobHash("test-hash"),
            CreatedTS:  time.Now(),
            UploadedTS: nil, // Not uploaded yet
        }

        err := repos.Blobs.Create(ctx, nil, blob)
        if err != nil {
            t.Fatal(err)
        }

        retrieved, err := repos.Blobs.GetByID(ctx, blob.ID.String())
        if err != nil {
            t.Fatal(err)
        }

        if retrieved.UploadedTS != nil {
            t.Error("expected nil UploadedTS for non-uploaded blob")
        }
    })
}

// TestLargeDatasets tests operations with large amounts of data
func TestLargeDatasets(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping large dataset test in short mode")
    }

    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create a snapshot
    snapshot := &Snapshot{
        ID:        "large-dataset-test",
        Hostname:  "test-host",
        StartedAt: time.Now(),
    }
    err := repos.Snapshots.Create(ctx, nil, snapshot)
    if err != nil {
        t.Fatal(err)
    }

    // Create many files
    const fileCount = 1000
    fileIDs := make([]types.FileID, fileCount)

    t.Run("create many files", func(t *testing.T) {
        start := time.Now()
        for i := 0; i < fileCount; i++ {
            file := &File{
                Path:  types.FilePath(fmt.Sprintf("/large/file%05d.txt", i)),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  int64(i * 1024),
                Mode:  0644,
                UID:   uint32(1000 + (i % 10)),
                GID:   uint32(1000 + (i % 10)),
            }
            err := repos.Files.Create(ctx, nil, file)
            if err != nil {
                t.Fatalf("failed to create file %d: %v", i, err)
            }
            fileIDs[i] = file.ID

            // Add half to snapshot
            if i%2 == 0 {
                err = repos.Snapshots.AddFileByID(ctx, nil, snapshot.ID.String(), file.ID)
                if err != nil {
                    t.Fatal(err)
                }
            }
        }
        t.Logf("Created %d files in %v", fileCount, time.Since(start))
    })

    // Test ListByPrefix performance
    t.Run("list by prefix performance", func(t *testing.T) {
        start := time.Now()
        files, err := repos.Files.ListByPrefix(ctx, "/large/")
        if err != nil {
            t.Fatal(err)
        }
        if len(files) != fileCount {
            t.Errorf("expected %d files, got %d", fileCount, len(files))
        }
        t.Logf("Listed %d files in %v", len(files), time.Since(start))
    })

    // Test orphaned cleanup performance
    t.Run("orphaned cleanup performance", func(t *testing.T) {
        start := time.Now()
        err := repos.Files.DeleteOrphaned(ctx)
        if err != nil {
            t.Fatal(err)
        }
        t.Logf("Cleaned up orphaned files in %v", time.Since(start))

        // Verify correct number remain
        files, err := repos.Files.ListByPrefix(ctx, "/large/")
        if err != nil {
            t.Fatal(err)
        }
        if len(files) != fileCount/2 {
            t.Errorf("expected %d files after cleanup, got %d", fileCount/2, len(files))
        }
    })
}

// TestErrorPropagation tests that errors are properly propagated
func TestErrorPropagation(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test GetByID with non-existent ID
    t.Run("GetByID non-existent", func(t *testing.T) {
        file, err := repos.Files.GetByID(ctx, types.NewFileID())
        if err != nil {
            t.Errorf("GetByID should not return error for non-existent ID, got: %v", err)
        }
        if file != nil {
            t.Error("expected nil file for non-existent ID")
        }
    })

    // Test GetByPath with non-existent path
    t.Run("GetByPath non-existent", func(t *testing.T) {
        file, err := repos.Files.GetByPath(ctx, "/non/existent/path.txt")
        if err != nil {
            t.Errorf("GetByPath should not return error for non-existent path, got: %v", err)
        }
        if file != nil {
            t.Error("expected nil file for non-existent path")
        }
    })

    // Test invalid foreign key reference
    t.Run("invalid foreign key", func(t *testing.T) {
        fc := &FileChunk{
            FileID:    types.NewFileID(),
            Idx:       0,
            ChunkHash: types.ChunkHash("some-chunk"),
        }
        err := repos.FileChunks.Create(ctx, nil, fc)
        if err == nil {
            // Fatal, not Error: err.Error() below would panic on a nil error
            t.Fatal("expected error for invalid foreign key")
        }
        if !strings.Contains(err.Error(), "FOREIGN KEY") {
            t.Errorf("expected foreign key error, got: %v", err)
        }
    })
}

// TestQueryInjection tests that the system is safe from SQL injection
func TestQueryInjection(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Test various injection attempts
    injectionTests := []string{
        "'; DROP TABLE files; --",
        "' OR '1'='1",
        "'; DELETE FROM files WHERE '1'='1'; --",
        `test'); DROP TABLE files; --`,
    }

    for _, injection := range injectionTests {
        t.Run("injection attempt", func(t *testing.T) {
            // Try injection in file path
            file := &File{
                Path:  types.FilePath(injection),
                MTime: time.Now(),
                CTime: time.Now(),
                Size:  1024,
                Mode:  0644,
                UID:   1000,
                GID:   1000,
            }
            _ = repos.Files.Create(ctx, nil, file)
            // Should either succeed (treating as normal string) or fail with constraint
            // but should NOT execute the injected SQL

            // Verify tables still exist
            var count int
            err := db.conn.QueryRow("SELECT COUNT(*) FROM files").Scan(&count)
            if err != nil {
                t.Fatal("files table was damaged by injection")
            }
        })
    }
}

// TestTimezoneHandling tests that times are properly handled in UTC
func TestTimezoneHandling(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()
    repos := NewRepositories(db)

    // Create file with specific timezone
    loc, err := time.LoadLocation("America/New_York")
    if err != nil {
        t.Skip("timezone not available")
    }

    // Use Truncate to remove sub-second precision since we store as Unix timestamps
    nyTime := time.Now().In(loc).Truncate(time.Second)
    file := &File{
        Path:  "/timezone-test.txt",
        MTime: nyTime,
        CTime: nyTime,
        Size:  1024,
        Mode:  0644,
        UID:   1000,
        GID:   1000,
    }

    err = repos.Files.Create(ctx, nil, file)
    if err != nil {
        t.Fatal(err)
    }

    // Retrieve and verify times are in UTC
    retrieved, err := repos.Files.GetByID(ctx, file.ID)
    if err != nil {
        t.Fatal(err)
    }

    // Check that times are equivalent (same instant)
    if !retrieved.MTime.Equal(nyTime) {
        t.Error("time was not preserved correctly")
    }

    // Check that retrieved time is in UTC
    if retrieved.MTime.Location() != time.UTC {
        t.Error("retrieved time is not in UTC")
    }
}
137 internal/database/schema.sql Normal file
@@ -0,0 +1,137 @@
-- Vaultik Database Schema
-- Note: This database does not support migrations. If the schema changes,
-- delete the local database and perform a full backup to recreate it.

-- Files table: stores metadata about files in the filesystem
CREATE TABLE IF NOT EXISTS files (
    id TEXT PRIMARY KEY, -- UUID
    path TEXT NOT NULL UNIQUE,
    source_path TEXT NOT NULL DEFAULT '', -- The source directory this file came from (for restore path stripping)
    mtime INTEGER NOT NULL,
    ctime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    link_target TEXT
);

-- Create index on path for efficient lookups
CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);

-- File chunks table: maps files to their constituent chunks
CREATE TABLE IF NOT EXISTS file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_file_chunks_chunk_hash ON file_chunks(chunk_hash);

-- Chunks table: stores unique content-defined chunks
CREATE TABLE IF NOT EXISTS chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

-- Blobs table: stores packed, compressed, and encrypted blob information
CREATE TABLE IF NOT EXISTS blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT UNIQUE,
    created_ts INTEGER NOT NULL,
    finished_ts INTEGER,
    uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compressed_size INTEGER NOT NULL DEFAULT 0,
    uploaded_ts INTEGER
);

-- Blob chunks table: maps chunks to the blobs that contain them
CREATE TABLE IF NOT EXISTS blob_chunks (
    blob_id TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_id, chunk_hash),
    FOREIGN KEY (blob_id) REFERENCES blobs(id) ON DELETE CASCADE,
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash)
);

-- Index for efficient chunk lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_blob_chunks_chunk_hash ON blob_chunks(chunk_hash);

-- Chunk files table: reverse mapping of chunks to files
CREATE TABLE IF NOT EXISTS chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id),
    FOREIGN KEY (chunk_hash) REFERENCES chunks(chunk_hash),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_chunk_files_file_id ON chunk_files(file_id);

-- Snapshots table: tracks backup snapshots
CREATE TABLE IF NOT EXISTS snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    vaultik_git_revision TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL DEFAULT 0,
    chunk_count INTEGER NOT NULL DEFAULT 0,
    blob_count INTEGER NOT NULL DEFAULT 0,
    total_size INTEGER NOT NULL DEFAULT 0,
    blob_size INTEGER NOT NULL DEFAULT 0,
    blob_uncompressed_size INTEGER NOT NULL DEFAULT 0,
    compression_ratio REAL NOT NULL DEFAULT 1.0,
    compression_level INTEGER NOT NULL DEFAULT 3,
    upload_bytes INTEGER NOT NULL DEFAULT 0,
    upload_duration_ms INTEGER NOT NULL DEFAULT 0
);

-- Snapshot files table: maps snapshots to files
CREATE TABLE IF NOT EXISTS snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (file_id) REFERENCES files(id)
);

-- Index for efficient file lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_files_file_id ON snapshot_files(file_id);

-- Snapshot blobs table: maps snapshots to blobs
CREATE TABLE IF NOT EXISTS snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
    FOREIGN KEY (blob_id) REFERENCES blobs(id)
);

-- Index for efficient blob lookups (used in orphan detection)
CREATE INDEX IF NOT EXISTS idx_snapshot_blobs_blob_id ON snapshot_blobs(blob_id);

-- Uploads table: tracks blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    snapshot_id TEXT NOT NULL,
    uploaded_at INTEGER NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash),
    FOREIGN KEY (snapshot_id) REFERENCES snapshots(id)
);

-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
11 internal/database/schema/008_uploads.sql Normal file
@@ -0,0 +1,11 @@
-- Track blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
    blob_hash TEXT PRIMARY KEY,
    uploaded_at TIMESTAMP NOT NULL,
    size INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash)
);

CREATE INDEX idx_uploads_uploaded_at ON uploads(uploaded_at);
CREATE INDEX idx_uploads_duration ON uploads(duration_ms);
@@ -5,6 +5,8 @@ import (
 	"database/sql"
 	"fmt"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 type SnapshotRepository struct {
@@ -17,17 +19,27 @@ func NewSnapshotRepository(db *DB) *SnapshotRepository {

 func (r *SnapshotRepository) Create(ctx context.Context, tx *sql.Tx, snapshot *Snapshot) error {
 	query := `
-		INSERT INTO snapshots (id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio)
-		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+		INSERT INTO snapshots (id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms)
+		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
 	`
+
+	var completedAt *int64
+	if snapshot.CompletedAt != nil {
+		ts := snapshot.CompletedAt.Unix()
+		completedAt = &ts
+	}

 	var err error
 	if tx != nil {
-		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = tx.ExecContext(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.CreatedTS.Unix(),
-			snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.CompressionRatio)
+		_, err = r.db.ExecWithLog(ctx, query, snapshot.ID, snapshot.Hostname, snapshot.VaultikVersion, snapshot.VaultikGitRevision, snapshot.StartedAt.Unix(),
+			completedAt, snapshot.FileCount, snapshot.ChunkCount, snapshot.BlobCount, snapshot.TotalSize, snapshot.BlobSize, snapshot.BlobUncompressedSize,
+			snapshot.CompressionRatio, snapshot.CompressionLevel, snapshot.UploadBytes, snapshot.UploadDurationMs)
 	}

 	if err != nil {
@@ -58,7 +70,7 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	if tx != nil {
 		_, err = tx.ExecContext(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	} else {
-		_, err = r.db.ExecWithLock(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
+		_, err = r.db.ExecWithLog(ctx, query, fileCount, chunkCount, blobCount, totalSize, blobSize, compressionRatio, snapshotID)
 	}

 	if err != nil {
@@ -68,27 +80,83 @@ func (r *SnapshotRepository) UpdateCounts(ctx context.Context, tx *sql.Tx, snaps
 	return nil
 }

+// UpdateExtendedStats updates extended statistics for a snapshot
+func (r *SnapshotRepository) UpdateExtendedStats(ctx context.Context, tx *sql.Tx, snapshotID string, blobUncompressedSize int64, compressionLevel int, uploadDurationMs int64) error {
+	// Calculate compression ratio based on uncompressed vs compressed sizes
+	var compressionRatio float64
+	if blobUncompressedSize > 0 {
+		// Get current blob_size from DB to calculate ratio
+		var blobSize int64
+		queryGet := `SELECT blob_size FROM snapshots WHERE id = ?`
+		if tx != nil {
+			err := tx.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		} else {
+			err := r.db.conn.QueryRowContext(ctx, queryGet, snapshotID).Scan(&blobSize)
+			if err != nil {
+				return fmt.Errorf("getting blob size: %w", err)
+			}
+		}
+		compressionRatio = float64(blobSize) / float64(blobUncompressedSize)
+	} else {
+		compressionRatio = 1.0
+	}
+
+	query := `
+		UPDATE snapshots
+		SET blob_uncompressed_size = ?,
+		    compression_ratio = ?,
+		    compression_level = ?,
+		    upload_bytes = blob_size,
+		    upload_duration_ms = ?
+		WHERE id = ?
+	`
+
+	var err error
+	if tx != nil {
+		_, err = tx.ExecContext(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	} else {
+		_, err = r.db.ExecWithLog(ctx, query, blobUncompressedSize, compressionRatio, compressionLevel, uploadDurationMs, snapshotID)
+	}
+
+	if err != nil {
+		return fmt.Errorf("updating extended stats: %w", err)
+	}
+	return nil
+}

 func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at,
+			file_count, chunk_count, blob_count, total_size, blob_size, blob_uncompressed_size,
+			compression_ratio, compression_level, upload_bytes, upload_duration_ms
 		FROM snapshots
 		WHERE id = ?
 	`

 	var snapshot Snapshot
-	var createdTSUnix int64
+	var startedAtUnix int64
+	var completedAtUnix *int64

 	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(
 		&snapshot.ID,
 		&snapshot.Hostname,
 		&snapshot.VaultikVersion,
-		&createdTSUnix,
+		&snapshot.VaultikGitRevision,
+		&startedAtUnix,
+		&completedAtUnix,
 		&snapshot.FileCount,
 		&snapshot.ChunkCount,
 		&snapshot.BlobCount,
 		&snapshot.TotalSize,
 		&snapshot.BlobSize,
+		&snapshot.BlobUncompressedSize,
 		&snapshot.CompressionRatio,
+		&snapshot.CompressionLevel,
+		&snapshot.UploadBytes,
+		&snapshot.UploadDurationMs,
 	)

 	if err == sql.ErrNoRows {
@@ -98,16 +166,20 @@ func (r *SnapshotRepository) GetByID(ctx context.Context, snapshotID string) (*S
 		return nil, fmt.Errorf("querying snapshot: %w", err)
 	}

-	snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+	snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
+	if completedAtUnix != nil {
+		t := time.Unix(*completedAtUnix, 0).UTC()
+		snapshot.CompletedAt = &t
+	}

 	return &snapshot, nil
 }

 func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snapshot, error) {
 	query := `
-		SELECT id, hostname, vaultik_version, created_ts, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
+		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
 		FROM snapshots
-		ORDER BY created_ts DESC
+		ORDER BY started_at DESC
 		LIMIT ?
 	`
@@ -120,13 +192,16 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 	var snapshots []*Snapshot
 	for rows.Next() {
 		var snapshot Snapshot
-		var createdTSUnix int64
+		var startedAtUnix int64
+		var completedAtUnix *int64

 		err := rows.Scan(
 			&snapshot.ID,
 			&snapshot.Hostname,
 			&snapshot.VaultikVersion,
-			&createdTSUnix,
+			&snapshot.VaultikGitRevision,
+			&startedAtUnix,
+			&completedAtUnix,
 			&snapshot.FileCount,
 			&snapshot.ChunkCount,
 			&snapshot.BlobCount,
@@ -138,10 +213,336 @@ func (r *SnapshotRepository) ListRecent(ctx context.Context, limit int) ([]*Snap
 			return nil, fmt.Errorf("scanning snapshot: %w", err)
 		}

-		snapshot.CreatedTS = time.Unix(createdTSUnix, 0)
+		snapshot.StartedAt = time.Unix(startedAtUnix, 0)
+		if completedAtUnix != nil {
+			t := time.Unix(*completedAtUnix, 0)
+			snapshot.CompletedAt = &t
+		}

 		snapshots = append(snapshots, &snapshot)
 	}

 	return snapshots, rows.Err()
 }

// MarkComplete marks a snapshot as completed with the current timestamp
func (r *SnapshotRepository) MarkComplete(ctx context.Context, tx *sql.Tx, snapshotID string) error {
	query := `
		UPDATE snapshots
		SET completed_at = ?
		WHERE id = ?
	`

	completedAt := time.Now().UTC().Unix()

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, completedAt, snapshotID)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, completedAt, snapshotID)
	}

	if err != nil {
		return fmt.Errorf("marking snapshot complete: %w", err)
	}

	return nil
}

// AddFile adds a file to a snapshot by path
func (r *SnapshotRepository) AddFile(ctx context.Context, tx *sql.Tx, snapshotID string, filePath string) error {
	query := `
		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
		SELECT ?, id FROM files WHERE path = ?
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, filePath)
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, filePath)
	}

	if err != nil {
		return fmt.Errorf("adding file to snapshot: %w", err)
	}

	return nil
}

// AddFileByID adds a file to a snapshot by file ID
func (r *SnapshotRepository) AddFileByID(ctx context.Context, tx *sql.Tx, snapshotID string, fileID types.FileID) error {
	query := `
		INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id)
		VALUES (?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, fileID.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, fileID.String())
	}

	if err != nil {
		return fmt.Errorf("adding file to snapshot: %w", err)
	}

	return nil
}

// AddFilesByIDBatch adds multiple files to a snapshot using batched multi-row inserts
func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx, snapshotID string, fileIDs []types.FileID) error {
	if len(fileIDs) == 0 {
		return nil
	}

	// Each entry binds 2 values, so batch at 400 rows (800 parameters) to stay
	// safely under SQLite's bound-parameter limit
	const batchSize = 400

	for i := 0; i < len(fileIDs); i += batchSize {
		end := i + batchSize
		if end > len(fileIDs) {
			end = len(fileIDs)
		}
		batch := fileIDs[i:end]

		query := "INSERT OR IGNORE INTO snapshot_files (snapshot_id, file_id) VALUES "
		args := make([]interface{}, 0, len(batch)*2)
		for j, fileID := range batch {
			if j > 0 {
				query += ", "
			}
			query += "(?, ?)"
			args = append(args, snapshotID, fileID.String())
		}

		var err error
		if tx != nil {
			_, err = tx.ExecContext(ctx, query, args...)
		} else {
			_, err = r.db.ExecWithLog(ctx, query, args...)
		}
		if err != nil {
			return fmt.Errorf("batch adding files to snapshot: %w", err)
		}
	}

	return nil
}

// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
	query := `
		INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
		VALUES (?, ?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, snapshotID, blobID.String(), blobHash.String())
	} else {
		_, err = r.db.ExecWithLog(ctx, query, snapshotID, blobID.String(), blobHash.String())
	}

	if err != nil {
		return fmt.Errorf("adding blob to snapshot: %w", err)
	}

	return nil
}

// GetBlobHashes returns all blob hashes for a snapshot
func (r *SnapshotRepository) GetBlobHashes(ctx context.Context, snapshotID string) ([]string, error) {
	query := `
		SELECT sb.blob_hash
		FROM snapshot_blobs sb
		WHERE sb.snapshot_id = ?
		ORDER BY sb.blob_hash
	`

	rows, err := r.db.conn.QueryContext(ctx, query, snapshotID)
	if err != nil {
		return nil, fmt.Errorf("querying blob hashes: %w", err)
	}
	defer CloseRows(rows)

	var blobs []string
	for rows.Next() {
		var blobHash string
		if err := rows.Scan(&blobHash); err != nil {
			return nil, fmt.Errorf("scanning blob hash: %w", err)
		}
		blobs = append(blobs, blobHash)
	}

	return blobs, rows.Err()
}

// GetSnapshotTotalCompressedSize returns the total compressed size of all blobs referenced by a snapshot
func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context, snapshotID string) (int64, error) {
	query := `
		SELECT COALESCE(SUM(b.compressed_size), 0)
		FROM snapshot_blobs sb
		JOIN blobs b ON sb.blob_hash = b.blob_hash
		WHERE sb.snapshot_id = ?
	`

	var totalSize int64
	err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
	if err != nil {
		return 0, fmt.Errorf("querying total compressed size: %w", err)
	}

	return totalSize, nil
}

// GetIncompleteSnapshots returns all snapshots that haven't been completed
func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
	query := `
		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
		FROM snapshots
		WHERE completed_at IS NULL
		ORDER BY started_at DESC
	`

	rows, err := r.db.conn.QueryContext(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
	}
	defer CloseRows(rows)

	var snapshots []*Snapshot
	for rows.Next() {
		var snapshot Snapshot
		var startedAtUnix int64
		var completedAtUnix *int64

		err := rows.Scan(
			&snapshot.ID,
			&snapshot.Hostname,
			&snapshot.VaultikVersion,
			&snapshot.VaultikGitRevision,
			&startedAtUnix,
			&completedAtUnix,
			&snapshot.FileCount,
			&snapshot.ChunkCount,
			&snapshot.BlobCount,
			&snapshot.TotalSize,
			&snapshot.BlobSize,
			&snapshot.CompressionRatio,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning snapshot: %w", err)
		}

		// Normalize to UTC, matching GetIncompleteByHostname below
		snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
		if completedAtUnix != nil {
			t := time.Unix(*completedAtUnix, 0).UTC()
			snapshot.CompletedAt = &t
		}

		snapshots = append(snapshots, &snapshot)
	}

	return snapshots, rows.Err()
}

// GetIncompleteByHostname returns all incomplete snapshots for a specific hostname
func (r *SnapshotRepository) GetIncompleteByHostname(ctx context.Context, hostname string) ([]*Snapshot, error) {
	query := `
		SELECT id, hostname, vaultik_version, vaultik_git_revision, started_at, completed_at, file_count, chunk_count, blob_count, total_size, blob_size, compression_ratio
		FROM snapshots
		WHERE completed_at IS NULL AND hostname = ?
		ORDER BY started_at DESC
	`

	rows, err := r.db.conn.QueryContext(ctx, query, hostname)
	if err != nil {
		return nil, fmt.Errorf("querying incomplete snapshots: %w", err)
	}
	defer CloseRows(rows)

	var snapshots []*Snapshot
	for rows.Next() {
		var snapshot Snapshot
		var startedAtUnix int64
		var completedAtUnix *int64

		err := rows.Scan(
			&snapshot.ID,
			&snapshot.Hostname,
			&snapshot.VaultikVersion,
			&snapshot.VaultikGitRevision,
			&startedAtUnix,
			&completedAtUnix,
			&snapshot.FileCount,
			&snapshot.ChunkCount,
			&snapshot.BlobCount,
			&snapshot.TotalSize,
			&snapshot.BlobSize,
			&snapshot.CompressionRatio,
		)
		if err != nil {
			return nil, fmt.Errorf("scanning snapshot: %w", err)
		}

		snapshot.StartedAt = time.Unix(startedAtUnix, 0).UTC()
		if completedAtUnix != nil {
			t := time.Unix(*completedAtUnix, 0).UTC()
			snapshot.CompletedAt = &t
		}

		snapshots = append(snapshots, &snapshot)
	}

	return snapshots, rows.Err()
}

// Delete removes a snapshot record
func (r *SnapshotRepository) Delete(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshots WHERE id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot: %w", err)
	}

	return nil
}

// DeleteSnapshotFiles removes all snapshot_files entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotFiles(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshot_files WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot files: %w", err)
	}

	return nil
}

// DeleteSnapshotBlobs removes all snapshot_blobs entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotBlobs(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM snapshot_blobs WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot blobs: %w", err)
	}

	return nil
}

// DeleteSnapshotUploads removes all uploads entries for a snapshot
func (r *SnapshotRepository) DeleteSnapshotUploads(ctx context.Context, snapshotID string) error {
	query := `DELETE FROM uploads WHERE snapshot_id = ?`

	_, err := r.db.ExecWithLog(ctx, query, snapshotID)
	if err != nil {
		return fmt.Errorf("deleting snapshot uploads: %w", err)
	}

	return nil
}

@@ -6,6 +6,8 @@ import (
 	"math"
 	"testing"
 	"time"
+
+	"git.eeqj.de/sneak/vaultik/internal/types"
 )

 const (
@@ -30,7 +32,8 @@ func TestSnapshotRepository(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
@@ -45,7 +48,7 @@ func TestSnapshotRepository(t *testing.T) {
 	}

 	// Test GetByID
-	retrieved, err := repo.GetByID(ctx, snapshot.ID)
+	retrieved, err := repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get snapshot: %v", err)
 	}
@@ -63,12 +66,12 @@ func TestSnapshotRepository(t *testing.T) {
 	}

 	// Test UpdateCounts
-	err = repo.UpdateCounts(ctx, nil, snapshot.ID, 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
+	err = repo.UpdateCounts(ctx, nil, snapshot.ID.String(), 200, 1000, 20, twoHundredMebibytes, sixtyMebibytes)
 	if err != nil {
 		t.Fatalf("failed to update counts: %v", err)
 	}

-	retrieved, err = repo.GetByID(ctx, snapshot.ID)
+	retrieved, err = repo.GetByID(ctx, snapshot.ID.String())
 	if err != nil {
 		t.Fatalf("failed to get updated snapshot: %v", err)
 	}
@@ -96,10 +99,11 @@ func TestSnapshotRepository(t *testing.T) {
 	// Add more snapshots
 	for i := 2; i <= 5; i++ {
 		s := &Snapshot{
-			ID:             fmt.Sprintf("2024-01-0%dT12:00:00Z", i),
+			ID:             types.SnapshotID(fmt.Sprintf("2024-01-0%dT12:00:00Z", i)),
 			Hostname:       "test-host",
 			VaultikVersion: "1.0.0",
-			CreatedTS:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			StartedAt:      time.Now().Add(time.Duration(i) * time.Hour).Truncate(time.Second),
+			CompletedAt:    nil,
 			FileCount:      int64(100 * i),
 			ChunkCount:     int64(500 * i),
 			BlobCount:      int64(10 * i),
@@ -121,7 +125,7 @@ func TestSnapshotRepository(t *testing.T) {

 	// Verify order (most recent first)
 	for i := 0; i < len(recent)-1; i++ {
-		if recent[i].CreatedTS.Before(recent[i+1].CreatedTS) {
+		if recent[i].StartedAt.Before(recent[i+1].StartedAt) {
 			t.Error("snapshots not in descending order")
 		}
 	}
@@ -162,7 +166,8 @@ func TestSnapshotRepositoryDuplicate(t *testing.T) {
 		ID:             "2024-01-01T12:00:00Z",
 		Hostname:       "test-host",
 		VaultikVersion: "1.0.0",
-		CreatedTS:      time.Now().Truncate(time.Second),
+		StartedAt:      time.Now().Truncate(time.Second),
+		CompletedAt:    nil,
 		FileCount:      100,
 		ChunkCount:     500,
 		BlobCount:      10,
147 internal/database/uploads.go Normal file
@@ -0,0 +1,147 @@
package database

import (
	"context"
	"database/sql"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/log"
)

// Upload represents a blob upload record
type Upload struct {
	BlobHash   string
	SnapshotID string
	UploadedAt time.Time
	Size       int64
	DurationMs int64
}

// UploadRepository handles upload records
type UploadRepository struct {
	conn *sql.DB
}

// NewUploadRepository creates a new upload repository
func NewUploadRepository(conn *sql.DB) *UploadRepository {
	return &UploadRepository{conn: conn}
}

// Create inserts a new upload record
func (r *UploadRepository) Create(ctx context.Context, tx *sql.Tx, upload *Upload) error {
	query := `
		INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms)
		VALUES (?, ?, ?, ?, ?)
	`

	var err error
	if tx != nil {
		_, err = tx.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	} else {
		_, err = r.conn.ExecContext(ctx, query, upload.BlobHash, upload.SnapshotID, upload.UploadedAt, upload.Size, upload.DurationMs)
	}

	return err
}

// GetByBlobHash retrieves an upload record by blob hash
func (r *UploadRepository) GetByBlobHash(ctx context.Context, blobHash string) (*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		WHERE blob_hash = ?
	`

	var upload Upload
	err := r.conn.QueryRowContext(ctx, query, blobHash).Scan(
		&upload.BlobHash,
		&upload.UploadedAt,
		&upload.Size,
		&upload.DurationMs,
	)

	if err == sql.ErrNoRows {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}

	return &upload, nil
}

// GetRecentUploads retrieves recent uploads ordered by upload time
func (r *UploadRepository) GetRecentUploads(ctx context.Context, limit int) ([]*Upload, error) {
	query := `
		SELECT blob_hash, uploaded_at, size, duration_ms
		FROM uploads
		ORDER BY uploaded_at DESC
		LIMIT ?
	`

	rows, err := r.conn.QueryContext(ctx, query, limit)
	if err != nil {
		return nil, err
	}
	defer func() {
		if err := rows.Close(); err != nil {
			log.Error("failed to close rows", "error", err)
		}
	}()

	var uploads []*Upload
	for rows.Next() {
		var upload Upload
		if err := rows.Scan(&upload.BlobHash, &upload.UploadedAt, &upload.Size, &upload.DurationMs); err != nil {
			return nil, err
		}
		uploads = append(uploads, &upload)
	}

	return uploads, rows.Err()
}

// GetUploadStats returns aggregate statistics for uploads
func (r *UploadRepository) GetUploadStats(ctx context.Context, since time.Time) (*UploadStats, error) {
	query := `
		SELECT
			COUNT(*) as count,
			COALESCE(SUM(size), 0) as total_size,
			COALESCE(AVG(duration_ms), 0) as avg_duration_ms,
			COALESCE(MIN(duration_ms), 0) as min_duration_ms,
			COALESCE(MAX(duration_ms), 0) as max_duration_ms
		FROM uploads
		WHERE uploaded_at >= ?
	`

	var stats UploadStats
	err := r.conn.QueryRowContext(ctx, query, since).Scan(
		&stats.Count,
		&stats.TotalSize,
		&stats.AvgDurationMs,
		&stats.MinDurationMs,
		&stats.MaxDurationMs,
	)

	return &stats, err
}

// UploadStats contains aggregate upload statistics
type UploadStats struct {
	Count         int64
	TotalSize     int64
	AvgDurationMs float64
	MinDurationMs int64
	MaxDurationMs int64
}

// GetCountBySnapshot returns the count of uploads for a specific snapshot
func (r *UploadRepository) GetCountBySnapshot(ctx context.Context, snapshotID string) (int64, error) {
|
||||
query := `SELECT COUNT(*) FROM uploads WHERE snapshot_id = ?`
|
||||
var count int64
|
||||
err := r.conn.QueryRowContext(ctx, query, snapshotID).Scan(&count)
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
return count, nil
|
||||
}
|
||||
@@ -4,13 +4,16 @@ import (
	"time"
)

// these get populated from main() and copied into the Globals object.
var (
	Appname string = "vaultik"
	Version string = "dev"
	Commit  string = "unknown"
)

// Appname is the application name, populated from main().
var Appname string = "vaultik"

// Version is the application version, populated from main().
var Version string = "dev"

// Commit is the git commit hash, populated from main().
var Commit string = "unknown"

// Globals contains application-wide configuration and metadata.
type Globals struct {
	Appname string
	Version string
@@ -18,13 +21,11 @@ type Globals struct {
	StartTime time.Time
}

// New creates and returns a new Globals instance initialized with the package-level variables.
func New() (*Globals, error) {
	n := &Globals{
	return &Globals{
		Appname:   Appname,
		Version:   Version,
		Commit:    Commit,
		StartTime: time.Now(),
	}

	return n, nil
	}, nil
}
@@ -2,16 +2,15 @@ package globals

import (
	"testing"

	"go.uber.org/fx"
	"go.uber.org/fx/fxtest"
)

// TestGlobalsNew ensures the globals package initializes correctly
func TestGlobalsNew(t *testing.T) {
	app := fxtest.New(t,
		fx.Provide(New),
		fx.Invoke(func(g *Globals) {
	g, err := New()
	if err != nil {
		t.Fatalf("Failed to create Globals: %v", err)
	}

	if g == nil {
		t.Fatal("Globals instance is nil")
	}
@@ -28,9 +27,4 @@ func TestGlobalsNew(t *testing.T) {
	if g.Commit == "" {
		t.Error("Commit should not be empty")
	}
		}),
	)

	app.RequireStart()
	app.RequireStop()
}
182
internal/log/log.go
Normal file
@@ -0,0 +1,182 @@
package log

import (
	"context"
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
	"runtime"
	"strings"

	"golang.org/x/term"
)

// LogLevel represents the logging level.
type LogLevel int

const (
	// LevelFatal represents a fatal error level that will exit the program.
	LevelFatal LogLevel = iota
	// LevelError represents an error level.
	LevelError
	// LevelWarn represents a warning level.
	LevelWarn
	// LevelNotice represents a notice level (mapped to Info in slog).
	LevelNotice
	// LevelInfo represents an informational level.
	LevelInfo
	// LevelDebug represents a debug level.
	LevelDebug
)

// Config holds logger configuration.
type Config struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}

var logger *slog.Logger

// Initialize sets up the global logger based on the provided configuration.
func Initialize(cfg Config) {
	// Determine log level based on configuration
	var level slog.Level

	if cfg.Cron || cfg.Quiet {
		// In quiet/cron mode, only show errors
		level = slog.LevelError
	} else if cfg.Debug || strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
		level = slog.LevelDebug
	} else if cfg.Verbose {
		level = slog.LevelInfo
	} else {
		level = slog.LevelWarn
	}

	// Create handler with appropriate level
	opts := &slog.HandlerOptions{
		Level: level,
	}

	// Check if stdout is a TTY
	if term.IsTerminal(int(os.Stdout.Fd())) {
		// Use colorized TTY handler
		logger = slog.New(NewTTYHandler(os.Stdout, opts))
	} else {
		// Use JSON format for non-TTY output
		logger = slog.New(slog.NewJSONHandler(os.Stdout, opts))
	}

	// Set as default logger
	slog.SetDefault(logger)
}

// getCaller returns the caller information as a "file:line" string.
func getCaller(skip int) string {
	_, file, line, ok := runtime.Caller(skip)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// Fatal logs a fatal error message and exits the program with code 1.
func Fatal(msg string, args ...any) {
	if logger != nil {
		// Add caller info to args
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
	os.Exit(1)
}

// Fatalf logs a formatted fatal error message and exits the program with code 1.
func Fatalf(format string, args ...any) {
	Fatal(fmt.Sprintf(format, args...))
}

// Error logs an error message.
func Error(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Error(msg, args...)
	}
}

// Errorf logs a formatted error message.
func Errorf(format string, args ...any) {
	Error(fmt.Sprintf(format, args...))
}

// Warn logs a warning message.
func Warn(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Warn(msg, args...)
	}
}

// Warnf logs a formatted warning message.
func Warnf(format string, args ...any) {
	Warn(fmt.Sprintf(format, args...))
}

// Notice logs a notice message (mapped to Info level).
func Notice(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Noticef logs a formatted notice message.
func Noticef(format string, args ...any) {
	Notice(fmt.Sprintf(format, args...))
}

// Info logs an informational message.
func Info(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Info(msg, args...)
	}
}

// Infof logs a formatted informational message.
func Infof(format string, args ...any) {
	Info(fmt.Sprintf(format, args...))
}

// Debug logs a debug message.
func Debug(msg string, args ...any) {
	if logger != nil {
		args = append(args, "caller", getCaller(2))
		logger.Debug(msg, args...)
	}
}

// Debugf logs a formatted debug message.
func Debugf(format string, args ...any) {
	Debug(fmt.Sprintf(format, args...))
}

// With returns a logger with additional context attributes.
func With(args ...any) *slog.Logger {
	if logger != nil {
		return logger.With(args...)
	}
	return slog.Default()
}

// WithContext returns the global logger. The context is currently unused.
func WithContext(_ context.Context) *slog.Logger {
	return logger
}

// Logger returns the underlying slog.Logger instance.
func Logger() *slog.Logger {
	return logger
}
25
internal/log/module.go
Normal file
@@ -0,0 +1,25 @@
package log

import (
	"go.uber.org/fx"
)

// Module exports logging functionality for dependency injection.
var Module = fx.Module("log",
	fx.Invoke(func(cfg Config) {
		Initialize(cfg)
	}),
)

// New creates a new logger configuration from provided options.
func New(opts LogOptions) Config {
	return Config(opts)
}

// LogOptions are provided by the CLI.
type LogOptions struct {
	Verbose bool
	Debug   bool
	Cron    bool
	Quiet   bool
}
140
internal/log/tty_handler.go
Normal file
@@ -0,0 +1,140 @@
package log

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"sync"
	"time"
)

// ANSI color codes
const (
	colorReset  = "\033[0m"
	colorRed    = "\033[31m"
	colorYellow = "\033[33m"
	colorBlue   = "\033[34m"
	colorGray   = "\033[90m"
	colorGreen  = "\033[32m"
	colorCyan   = "\033[36m"
	colorBold   = "\033[1m"
)

// TTYHandler is a custom slog handler for TTY output with colors.
type TTYHandler struct {
	opts slog.HandlerOptions
	mu   sync.Mutex
	out  io.Writer
}

// NewTTYHandler creates a new TTY handler with colored output.
func NewTTYHandler(out io.Writer, opts *slog.HandlerOptions) *TTYHandler {
	if opts == nil {
		opts = &slog.HandlerOptions{}
	}
	return &TTYHandler{
		out:  out,
		opts: *opts,
	}
}

// Enabled reports whether the handler handles records at the given level.
func (h *TTYHandler) Enabled(_ context.Context, level slog.Level) bool {
	// opts.Level may be nil when the zero-value HandlerOptions is used;
	// fall back to slog's default of Info instead of dereferencing nil.
	minLevel := slog.LevelInfo
	if h.opts.Level != nil {
		minLevel = h.opts.Level.Level()
	}
	return level >= minLevel
}

// Handle writes the log record to the output with color formatting.
func (h *TTYHandler) Handle(_ context.Context, r slog.Record) error {
	h.mu.Lock()
	defer h.mu.Unlock()

	// Format timestamp
	timestamp := r.Time.Format("15:04:05")

	// Level and color
	level := r.Level.String()
	var levelColor string
	switch r.Level {
	case slog.LevelDebug:
		levelColor = colorGray
		level = "DEBUG"
	case slog.LevelInfo:
		levelColor = colorGreen
		level = "INFO "
	case slog.LevelWarn:
		levelColor = colorYellow
		level = "WARN "
	case slog.LevelError:
		levelColor = colorRed
		level = "ERROR"
	default:
		levelColor = colorReset
	}

	// Print main message
	_, _ = fmt.Fprintf(h.out, "%s%s%s %s%s%s %s%s%s",
		colorGray, timestamp, colorReset,
		levelColor, level, colorReset,
		colorBold, r.Message, colorReset)

	// Print attributes
	r.Attrs(func(a slog.Attr) bool {
		value := a.Value.String()
		// Special handling for certain attribute types
		switch a.Value.Kind() {
		case slog.KindDuration:
			if d, ok := a.Value.Any().(time.Duration); ok {
				value = formatDuration(d)
			}
		case slog.KindInt64:
			if a.Key == "bytes" {
				value = formatBytes(a.Value.Int64())
			}
		}

		_, _ = fmt.Fprintf(h.out, " %s%s%s=%s%s%s",
			colorCyan, a.Key, colorReset,
			colorBlue, value, colorReset)
		return true
	})

	_, _ = fmt.Fprintln(h.out)
	return nil
}

// WithAttrs returns a new handler with the given attributes.
// Simplified for now: attributes are ignored and the same handler is returned.
func (h *TTYHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return h
}

// WithGroup returns a new handler with the given group name.
// Simplified for now: the group is ignored and the same handler is returned.
func (h *TTYHandler) WithGroup(name string) slog.Handler {
	return h
}

// formatDuration formats a duration in a human-readable way.
func formatDuration(d time.Duration) string {
	if d < time.Millisecond {
		return fmt.Sprintf("%dµs", d.Microseconds())
	} else if d < time.Second {
		return fmt.Sprintf("%dms", d.Milliseconds())
	} else if d < time.Minute {
		return fmt.Sprintf("%.1fs", d.Seconds())
	}
	return d.String()
}

// formatBytes formats bytes in a human-readable way using binary (1024-based) units.
func formatBytes(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}
108
internal/pidlock/pidlock.go
Normal file
@@ -0,0 +1,108 @@
// Package pidlock provides process-level locking using PID files.
// It prevents multiple instances of vaultik from running simultaneously,
// which would cause database locking conflicts.
package pidlock

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// ErrAlreadyRunning indicates another vaultik instance is running.
var ErrAlreadyRunning = errors.New("another vaultik instance is already running")

// Lock represents an acquired PID lock.
type Lock struct {
	path string
}

// Acquire attempts to acquire a PID lock in the specified directory.
// If the lock file exists and the process is still running, it returns
// ErrAlreadyRunning with details about the existing process.
// On success, it writes the current PID to the lock file and returns
// a Lock that must be released with Release().
func Acquire(lockDir string) (*Lock, error) {
	// Ensure lock directory exists
	if err := os.MkdirAll(lockDir, 0700); err != nil {
		return nil, fmt.Errorf("creating lock directory: %w", err)
	}

	lockPath := filepath.Join(lockDir, "vaultik.pid")

	// Check for existing lock
	existingPID, err := readPIDFile(lockPath)
	if err == nil {
		// Lock file exists, check if process is running
		if isProcessRunning(existingPID) {
			return nil, fmt.Errorf("%w (PID %d)", ErrAlreadyRunning, existingPID)
		}
		// Process is not running, stale lock file - we can take over
	}

	// Write our PID
	pid := os.Getpid()
	if err := os.WriteFile(lockPath, []byte(strconv.Itoa(pid)), 0600); err != nil {
		return nil, fmt.Errorf("writing PID file: %w", err)
	}

	return &Lock{path: lockPath}, nil
}

// Release removes the PID lock file.
// It is safe to call Release multiple times.
func (l *Lock) Release() error {
	if l == nil || l.path == "" {
		return nil
	}

	// Verify we still own the lock (our PID is in the file)
	existingPID, err := readPIDFile(l.path)
	if err != nil {
		// File already gone or unreadable - that's fine
		return nil
	}

	if existingPID != os.Getpid() {
		// Someone else wrote to our lock file - don't remove it
		return nil
	}

	if err := os.Remove(l.path); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("removing PID file: %w", err)
	}

	l.path = "" // Prevent double-release
	return nil
}

// readPIDFile reads and parses the PID from a lock file.
func readPIDFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}

	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("parsing PID: %w", err)
	}

	return pid, nil
}

// isProcessRunning checks if a process with the given PID is running.
func isProcessRunning(pid int) bool {
	process, err := os.FindProcess(pid)
	if err != nil {
		return false
	}

	// On Unix, FindProcess always succeeds. We need to send signal 0 to check.
	err = process.Signal(syscall.Signal(0))
	return err == nil
}
108
internal/pidlock/pidlock_test.go
Normal file
@@ -0,0 +1,108 @@
package pidlock

import (
	"os"
	"path/filepath"
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestAcquireAndRelease(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire lock
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)

	// Verify PID file exists with our PID
	data, err := os.ReadFile(filepath.Join(tmpDir, "vaultik.pid"))
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)

	// Release lock
	err = lock.Release()
	require.NoError(t, err)

	// Verify PID file is gone
	_, err = os.Stat(filepath.Join(tmpDir, "vaultik.pid"))
	assert.True(t, os.IsNotExist(err))
}

func TestAcquireBlocksSecondInstance(t *testing.T) {
	tmpDir := t.TempDir()

	// Acquire first lock
	lock1, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock1)
	defer func() { _ = lock1.Release() }()

	// Try to acquire second lock - should fail
	lock2, err := Acquire(tmpDir)
	assert.ErrorIs(t, err, ErrAlreadyRunning)
	assert.Nil(t, lock2)
}

func TestAcquireWithStaleLock(t *testing.T) {
	tmpDir := t.TempDir()

	// Write a stale PID file (PID that doesn't exist)
	stalePID := 999999999 // Unlikely to be a real process
	pidPath := filepath.Join(tmpDir, "vaultik.pid")
	err := os.WriteFile(pidPath, []byte(strconv.Itoa(stalePID)), 0600)
	require.NoError(t, err)

	// Should be able to acquire lock (stale lock is cleaned up)
	lock, err := Acquire(tmpDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify our PID is now in the file
	data, err := os.ReadFile(pidPath)
	require.NoError(t, err)
	pid, err := strconv.Atoi(string(data))
	require.NoError(t, err)
	assert.Equal(t, os.Getpid(), pid)
}

func TestReleaseIsIdempotent(t *testing.T) {
	tmpDir := t.TempDir()

	lock, err := Acquire(tmpDir)
	require.NoError(t, err)

	// Release multiple times - should not error
	err = lock.Release()
	require.NoError(t, err)

	err = lock.Release()
	require.NoError(t, err)
}

func TestReleaseNilLock(t *testing.T) {
	var lock *Lock
	err := lock.Release()
	assert.NoError(t, err)
}

func TestAcquireCreatesDirectory(t *testing.T) {
	tmpDir := t.TempDir()
	nestedDir := filepath.Join(tmpDir, "nested", "dir")

	lock, err := Acquire(nestedDir)
	require.NoError(t, err)
	require.NotNil(t, lock)
	defer func() { _ = lock.Release() }()

	// Verify directory was created
	info, err := os.Stat(nestedDir)
	require.NoError(t, err)
	assert.True(t, info.IsDir())
}
334
internal/s3/client.go
Normal file
@@ -0,0 +1,334 @@
package s3
|
||||
|
||||
import (
|
||||
"context"
|
||||
"io"
|
||||
"sync/atomic"
|
||||
|
||||
"github.com/aws/aws-sdk-go-v2/aws"
|
||||
"github.com/aws/aws-sdk-go-v2/config"
|
||||
"github.com/aws/aws-sdk-go-v2/credentials"
|
||||
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
|
||||
"github.com/aws/aws-sdk-go-v2/service/s3"
|
||||
"github.com/aws/smithy-go/logging"
|
||||
)
|
||||
|
||||
// Client wraps the AWS S3 client for vaultik operations.
|
||||
// It provides a simplified interface for S3 operations with automatic
|
||||
// prefix handling and connection management. All operations are performed
|
||||
// within the configured bucket and prefix.
|
||||
type Client struct {
|
||||
s3Client *s3.Client
|
||||
bucket string
|
||||
prefix string
|
||||
endpoint string
|
||||
}
|
||||
|
||||
// Config contains S3 client configuration.
|
||||
// All fields are required except Prefix, which defaults to an empty string.
|
||||
// The Endpoint field should include the protocol (http:// or https://).
|
||||
type Config struct {
|
||||
Endpoint string
|
||||
Bucket string
|
||||
Prefix string
|
||||
AccessKeyID string
|
||||
SecretAccessKey string
|
||||
Region string
|
||||
}
|
||||
|
||||
// nopLogger is a logger that discards all output.
|
||||
// Used to suppress SDK warnings about checksums.
|
||||
type nopLogger struct{}
|
||||
|
||||
func (nopLogger) Logf(classification logging.Classification, format string, v ...interface{}) {}
|
||||
|
||||
// NewClient creates a new S3 client with the provided configuration.
|
||||
// It establishes a connection to the S3-compatible storage service and
|
||||
// validates the credentials. The client uses static credentials and
|
||||
// path-style URLs for compatibility with various S3-compatible services.
|
||||
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
|
||||
// Create AWS config with a nop logger to suppress SDK warnings
|
||||
awsCfg, err := config.LoadDefaultConfig(ctx,
|
||||
config.WithRegion(cfg.Region),
|
||||
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
|
||||
cfg.AccessKeyID,
|
||||
cfg.SecretAccessKey,
|
||||
"",
|
||||
)),
|
||||
config.WithLogger(nopLogger{}),
|
||||
)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
// Configure custom endpoint if provided
|
||||
s3Opts := func(o *s3.Options) {
|
||||
if cfg.Endpoint != "" {
|
||||
o.BaseEndpoint = aws.String(cfg.Endpoint)
|
||||
o.UsePathStyle = true
|
||||
}
|
||||
}
|
||||
|
||||
s3Client := s3.NewFromConfig(awsCfg, s3Opts)
|
||||
|
||||
return &Client{
|
||||
s3Client: s3Client,
|
||||
bucket: cfg.Bucket,
|
||||
prefix: cfg.Prefix,
|
||||
endpoint: cfg.Endpoint,
|
||||
}, nil
|
||||
}
|
||||
|
||||
// PutObject uploads an object to S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// The data parameter should be a reader containing the object data.
|
||||
// Returns an error if the upload fails.
|
||||
func (c *Client) PutObject(ctx context.Context, key string, data io.Reader) error {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.PutObject(ctx, &s3.PutObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
Body: data,
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// ProgressCallback is called during upload progress with bytes uploaded so far.
|
||||
// The callback should return an error to cancel the upload.
|
||||
type ProgressCallback func(bytesUploaded int64) error
|
||||
|
||||
// PutObjectWithProgress uploads an object to S3 with progress tracking.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// The size parameter must be the exact size of the data to upload.
|
||||
// The progress callback is called periodically with the number of bytes uploaded.
|
||||
// Returns an error if the upload fails.
|
||||
func (c *Client) PutObjectWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress ProgressCallback) error {
|
||||
fullKey := c.prefix + key
|
||||
|
||||
// Create an uploader with the S3 client
|
||||
uploader := manager.NewUploader(c.s3Client, func(u *manager.Uploader) {
|
||||
// Set part size to 10MB for better progress granularity
|
||||
u.PartSize = 10 * 1024 * 1024
|
||||
})
|
||||
|
||||
// Create a progress reader that tracks upload progress
|
||||
pr := &progressReader{
|
||||
reader: data,
|
||||
size: size,
|
||||
callback: progress,
|
||||
read: 0,
|
||||
}
|
||||
|
||||
// Upload the file
|
||||
_, err := uploader.Upload(ctx, &s3.PutObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
Body: pr,
|
||||
})
|
||||
|
||||
return err
|
||||
}
|
||||
|
||||
// GetObject downloads an object from S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// Returns a ReadCloser containing the object data. The caller must
|
||||
// close the returned reader when done to avoid resource leaks.
|
||||
func (c *Client) GetObject(ctx context.Context, key string) (io.ReadCloser, error) {
|
||||
fullKey := c.prefix + key
|
||||
result, err := c.s3Client.GetObject(ctx, &s3.GetObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return result.Body, nil
|
||||
}
|
||||
|
||||
// DeleteObject removes an object from S3 with the specified key.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// No error is returned if the object doesn't exist.
|
||||
func (c *Client) DeleteObject(ctx context.Context, key string) error {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.DeleteObject(ctx, &s3.DeleteObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// ListObjects lists all objects with the given prefix.
|
||||
// The prefix is combined with the client's configured prefix.
|
||||
// Returns a slice of object keys with the base prefix removed.
|
||||
// This method loads all matching keys into memory, so use
|
||||
// ListObjectsStream for large result sets.
|
||||
func (c *Client) ListObjects(ctx context.Context, prefix string) ([]string, error) {
|
||||
fullPrefix := c.prefix + prefix
|
||||
|
||||
var keys []string
|
||||
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Prefix: aws.String(fullPrefix),
|
||||
})
|
||||
|
||||
for paginator.HasMorePages() {
|
||||
page, err := paginator.NextPage(ctx)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
for _, obj := range page.Contents {
|
||||
if obj.Key != nil {
|
||||
// Remove the base prefix from the key
|
||||
key := *obj.Key
|
||||
if len(key) > len(c.prefix) {
|
||||
key = key[len(c.prefix):]
|
||||
}
|
||||
keys = append(keys, key)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return keys, nil
|
||||
}
|
||||
|
||||
// HeadObject checks if an object exists in S3.
|
||||
// Returns true if the object exists, false otherwise.
|
||||
// The key is automatically prefixed with the configured prefix.
|
||||
// Note: This method returns false for any error, not just "not found".
|
||||
func (c *Client) HeadObject(ctx context.Context, key string) (bool, error) {
|
||||
fullKey := c.prefix + key
|
||||
_, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Key: aws.String(fullKey),
|
||||
})
|
||||
if err != nil {
|
||||
// Check if it's a not found error
|
||||
// TODO: Add proper error type checking
|
||||
return false, nil
|
||||
}
|
||||
return true, nil
|
||||
}
|
||||
|
||||
// ObjectInfo contains information about an S3 object.
|
||||
// It is used by ListObjectsStream to return object metadata
|
||||
// along with any errors encountered during listing.
|
||||
type ObjectInfo struct {
|
||||
Key string
|
||||
Size int64
|
||||
Err error
|
||||
}
|
||||
|
||||
// ListObjectsStream lists objects with the given prefix and returns a channel.
|
||||
// This method is preferred for large result sets as it streams results
|
||||
// instead of loading everything into memory. The channel is closed when
|
||||
// listing is complete or an error occurs. If an error occurs, it will be
|
||||
// sent as the last item with the Err field set. The recursive parameter
|
||||
// is currently unused but reserved for future use.
|
||||
func (c *Client) ListObjectsStream(ctx context.Context, prefix string, recursive bool) <-chan ObjectInfo {
|
||||
ch := make(chan ObjectInfo)
|
||||
|
||||
go func() {
|
||||
defer close(ch)
|
||||
|
||||
fullPrefix := c.prefix + prefix
|
||||
|
||||
paginator := s3.NewListObjectsV2Paginator(c.s3Client, &s3.ListObjectsV2Input{
|
||||
Bucket: aws.String(c.bucket),
|
||||
Prefix: aws.String(fullPrefix),
|
||||
})
|
||||
|
||||
for paginator.HasMorePages() {
|
||||
page, err := paginator.NextPage(ctx)
|
||||
if err != nil {
|
||||
ch <- ObjectInfo{Err: err}
|
||||
return
|
||||
}
|
||||
|
||||
for _, obj := range page.Contents {
|
||||
if obj.Key != nil && obj.Size != nil {
|
||||
// Remove the base prefix from the key
|
||||
key := *obj.Key
|
||||
if len(key) > len(c.prefix) {
|
||||
key = key[len(c.prefix):]
|
||||
}
|
||||
ch <- ObjectInfo{
|
||||
Key: key,
|
||||
Size: *obj.Size,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
return ch
|
||||
}

// StatObject returns information about an object without downloading it.
// The key is automatically prefixed with the configured prefix.
// Returns an ObjectInfo struct with the object's metadata.
// Returns an error if the object doesn't exist or if the operation fails.
func (c *Client) StatObject(ctx context.Context, key string) (*ObjectInfo, error) {
    fullKey := c.prefix + key
    result, err := c.s3Client.HeadObject(ctx, &s3.HeadObjectInput{
        Bucket: aws.String(c.bucket),
        Key:    aws.String(fullKey),
    })
    if err != nil {
        return nil, err
    }

    size := int64(0)
    if result.ContentLength != nil {
        size = *result.ContentLength
    }

    return &ObjectInfo{
        Key:  key,
        Size: size,
    }, nil
}

// RemoveObject deletes an object from S3 (alias for DeleteObject).
// This method exists for API compatibility and simply calls DeleteObject.
func (c *Client) RemoveObject(ctx context.Context, key string) error {
    return c.DeleteObject(ctx, key)
}

// BucketName returns the configured S3 bucket name.
// This is useful for displaying configuration information.
func (c *Client) BucketName() string {
    return c.bucket
}

// Endpoint returns the S3 endpoint URL.
// If no custom endpoint was configured, returns the default AWS S3 endpoint.
// This is useful for displaying configuration information.
func (c *Client) Endpoint() string {
    if c.endpoint == "" {
        return "s3.amazonaws.com"
    }
    return c.endpoint
}

// progressReader wraps an io.Reader to track reading progress.
type progressReader struct {
    reader   io.Reader
    size     int64
    read     int64
    callback ProgressCallback
}

// Read implements io.Reader, updating the byte count atomically and
// invoking the progress callback after each successful read.
func (pr *progressReader) Read(p []byte) (int, error) {
    n, err := pr.reader.Read(p)
    if n > 0 {
        atomic.AddInt64(&pr.read, int64(n))
        if pr.callback != nil {
            if callbackErr := pr.callback(atomic.LoadInt64(&pr.read)); callbackErr != nil {
                return n, callbackErr
            }
        }
    }
    return n, err
}
98
internal/s3/client_test.go
Normal file
@@ -0,0 +1,98 @@
package s3_test

import (
    "bytes"
    "context"
    "io"
    "testing"

    "git.eeqj.de/sneak/vaultik/internal/s3"
)

func TestClient(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()

    // Create client
    client, err := s3.NewClient(ctx, s3.Config{
        Endpoint:        testEndpoint,
        Bucket:          testBucket,
        Prefix:          "test-prefix/",
        AccessKeyID:     testAccessKey,
        SecretAccessKey: testSecretKey,
        Region:          testRegion,
    })
    if err != nil {
        t.Fatalf("failed to create client: %v", err)
    }

    // Test PutObject
    testKey := "foo/bar.txt"
    testData := []byte("test data")
    err = client.PutObject(ctx, testKey, bytes.NewReader(testData))
    if err != nil {
        t.Fatalf("failed to put object: %v", err)
    }

    // Test GetObject
    reader, err := client.GetObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to get object: %v", err)
    }
    defer func() {
        if err := reader.Close(); err != nil {
            t.Errorf("failed to close reader: %v", err)
        }
    }()

    data, err := io.ReadAll(reader)
    if err != nil {
        t.Fatalf("failed to read data: %v", err)
    }

    if !bytes.Equal(data, testData) {
        t.Errorf("data mismatch: got %q, want %q", data, testData)
    }

    // Test HeadObject
    exists, err := client.HeadObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to head object: %v", err)
    }
    if !exists {
        t.Error("expected object to exist")
    }

    // Test ListObjects
    keys, err := client.ListObjects(ctx, "foo/")
    if err != nil {
        t.Fatalf("failed to list objects: %v", err)
    }
    // Fail fast here: indexing keys[0] below would panic on an empty slice.
    if len(keys) != 1 {
        t.Fatalf("expected 1 key, got %d", len(keys))
    }
    if keys[0] != testKey {
        t.Errorf("unexpected key: got %s, want %s", keys[0], testKey)
    }

    // Test DeleteObject
    err = client.DeleteObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to delete object: %v", err)
    }

    // Verify deletion
    exists, err = client.HeadObject(ctx, testKey)
    if err != nil {
        t.Fatalf("failed to head object after deletion: %v", err)
    }
    if exists {
        t.Error("expected object to not exist after deletion")
    }
}
42
internal/s3/module.go
Normal file
@@ -0,0 +1,42 @@
package s3

import (
    "context"

    "git.eeqj.de/sneak/vaultik/internal/config"
    "go.uber.org/fx"
)

// Module exports S3 functionality as an fx module.
// It provides automatic dependency injection for the S3 client,
// configuring it based on the application's configuration settings.
var Module = fx.Module("s3",
    fx.Provide(
        provideClient,
    ),
)

func provideClient(lc fx.Lifecycle, cfg *config.Config) (*Client, error) {
    ctx := context.Background()

    client, err := NewClient(ctx, Config{
        Endpoint:        cfg.S3.Endpoint,
        Bucket:          cfg.S3.Bucket,
        Prefix:          cfg.S3.Prefix,
        AccessKeyID:     cfg.S3.AccessKeyID,
        SecretAccessKey: cfg.S3.SecretAccessKey,
        Region:          cfg.S3.Region,
    })
    if err != nil {
        return nil, err
    }

    lc.Append(fx.Hook{
        OnStop: func(ctx context.Context) error {
            // The S3 client doesn't need explicit cleanup
            return nil
        },
    })

    return client, nil
}
306
internal/s3/s3_test.go
Normal file
@@ -0,0 +1,306 @@
package s3_test

import (
    "bytes"
    "context"
    "fmt"
    "io"
    "net/http"
    "os"
    "path/filepath"
    "testing"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/aws/smithy-go/logging"
    "github.com/johannesboyne/gofakes3"
    "github.com/johannesboyne/gofakes3/backend/s3mem"
)

const (
    testBucket    = "test-bucket"
    testRegion    = "us-east-1"
    testAccessKey = "test-access-key"
    testSecretKey = "test-secret-key"
    testEndpoint  = "http://localhost:9999"
)

// TestServer represents an in-process S3-compatible test server
type TestServer struct {
    server   *http.Server
    backend  gofakes3.Backend
    s3Client *s3.Client
    tempDir  string
    logBuf   *bytes.Buffer
}

// NewTestServer creates and starts a new test server
func NewTestServer(t *testing.T) *TestServer {
    // Create temp directory for any file operations
    tempDir, err := os.MkdirTemp("", "vaultik-s3-test-*")
    if err != nil {
        t.Fatalf("failed to create temp dir: %v", err)
    }

    // Create in-memory backend
    backend := s3mem.New()
    faker := gofakes3.New(backend)

    // Create HTTP server
    server := &http.Server{
        Addr:    "localhost:9999",
        Handler: faker.Server(),
    }

    // Start server in background
    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            t.Logf("test server error: %v", err)
        }
    }()

    // Wait for server to be ready
    time.Sleep(100 * time.Millisecond)

    // Create a buffer to capture logs
    logBuf := &bytes.Buffer{}

    // Create S3 client with custom logger
    cfg, err := config.LoadDefaultConfig(context.Background(),
        config.WithRegion(testRegion),
        config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
            testAccessKey,
            testSecretKey,
            "",
        )),
        config.WithClientLogMode(aws.LogRetries|aws.LogRequestWithBody|aws.LogResponseWithBody),
        config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
            // Capture logs to buffer instead of stdout
            fmt.Fprintf(logBuf, "SDK %s %s %s\n",
                time.Now().Format("2006/01/02 15:04:05"),
                string(classification),
                fmt.Sprintf(format, v...))
        })),
    )
    if err != nil {
        t.Fatalf("failed to create AWS config: %v", err)
    }

    s3Client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String(testEndpoint)
        o.UsePathStyle = true
    })

    ts := &TestServer{
        server:   server,
        backend:  backend,
        s3Client: s3Client,
        tempDir:  tempDir,
        logBuf:   logBuf,
    }

    // Register cleanup to show logs on test failure
    t.Cleanup(func() {
        if t.Failed() && logBuf.Len() > 0 {
            t.Logf("S3 SDK Debug Output:\n%s", logBuf.String())
        }
    })

    // Create test bucket
    _, err = s3Client.CreateBucket(context.Background(), &s3.CreateBucketInput{
        Bucket: aws.String(testBucket),
    })
    if err != nil {
        t.Fatalf("failed to create test bucket: %v", err)
    }

    return ts
}

// Cleanup shuts down the server and removes the temp directory
func (ts *TestServer) Cleanup() error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    if err := ts.server.Shutdown(ctx); err != nil {
        return err
    }

    return os.RemoveAll(ts.tempDir)
}

// Client returns the S3 client configured for the test server
func (ts *TestServer) Client() *s3.Client {
    return ts.s3Client
}

// TestBasicS3Operations tests basic store and retrieve operations
func TestBasicS3Operations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test data
    testKey := "test/file.txt"
    testData := []byte("Hello, S3 test!")

    // Put object
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(testKey),
        Body:   bytes.NewReader(testData),
    })
    if err != nil {
        t.Fatalf("failed to put object: %v", err)
    }

    // Get object
    result, err := client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(testKey),
    })
    if err != nil {
        t.Fatalf("failed to get object: %v", err)
    }
    defer func() {
        if err := result.Body.Close(); err != nil {
            t.Errorf("failed to close body: %v", err)
        }
    }()

    // Read and verify data
    data, err := io.ReadAll(result.Body)
    if err != nil {
        t.Fatalf("failed to read object body: %v", err)
    }

    if !bytes.Equal(data, testData) {
        t.Errorf("retrieved data mismatch: got %q, want %q", data, testData)
    }
}

// TestBlobOperations tests blob storage patterns for vaultik
func TestBlobOperations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test blob storage with prefix structure
    blobHash := "aabbccddee112233445566778899aabbccddee11"
    blobKey := filepath.Join("blobs", blobHash[:2], blobHash[2:4], blobHash+".zst.age")
    blobData := []byte("compressed and encrypted blob data")

    // Store blob
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
        Body:   bytes.NewReader(blobData),
    })
    if err != nil {
        t.Fatalf("failed to store blob: %v", err)
    }

    // List objects with prefix
    listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
        Bucket: aws.String(testBucket),
        Prefix: aws.String("blobs/aa/"),
    })
    if err != nil {
        t.Fatalf("failed to list objects: %v", err)
    }

    // Fail fast here: indexing Contents[0] below would panic on an empty list.
    if len(listResult.Contents) != 1 {
        t.Fatalf("expected 1 object, got %d", len(listResult.Contents))
    }

    if listResult.Contents[0].Key != nil && *listResult.Contents[0].Key != blobKey {
        t.Errorf("unexpected key: got %s, want %s", *listResult.Contents[0].Key, blobKey)
    }

    // Delete blob
    _, err = client.DeleteObject(ctx, &s3.DeleteObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
    })
    if err != nil {
        t.Fatalf("failed to delete blob: %v", err)
    }

    // Verify deletion
    _, err = client.GetObject(ctx, &s3.GetObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(blobKey),
    })
    if err == nil {
        t.Error("expected error getting deleted object, got nil")
    }
}

// TestMetadataOperations tests metadata storage patterns
func TestMetadataOperations(t *testing.T) {
    ts := NewTestServer(t)
    defer func() {
        if err := ts.Cleanup(); err != nil {
            t.Errorf("cleanup failed: %v", err)
        }
    }()

    ctx := context.Background()
    client := ts.Client()

    // Test metadata storage
    snapshotID := "2024-01-01T12:00:00Z"
    metadataKey := filepath.Join("metadata", snapshotID+".sqlite.age")
    metadataData := []byte("encrypted sqlite database")

    // Store metadata
    _, err := client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(metadataKey),
        Body:   bytes.NewReader(metadataData),
    })
    if err != nil {
        t.Fatalf("failed to store metadata: %v", err)
    }

    // Store manifest
    manifestKey := filepath.Join("metadata", snapshotID+".manifest.json.zst")
    manifestData := []byte(`{"snapshot_id":"2024-01-01T12:00:00Z","blob_hashes":["hash1","hash2"]}`)

    _, err = client.PutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String(testBucket),
        Key:    aws.String(manifestKey),
        Body:   bytes.NewReader(manifestData),
    })
    if err != nil {
        t.Fatalf("failed to store manifest: %v", err)
    }

    // List metadata objects
    listResult, err := client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
        Bucket: aws.String(testBucket),
        Prefix: aws.String("metadata/"),
    })
    if err != nil {
        t.Fatalf("failed to list metadata: %v", err)
    }

    if len(listResult.Contents) != 2 {
        t.Errorf("expected 2 metadata objects, got %d", len(listResult.Contents))
    }
}
534
internal/snapshot/backup_test.go
Normal file
@@ -0,0 +1,534 @@
package snapshot

import (
    "context"
    "crypto/sha256"
    "database/sql"
    "fmt"
    "io"
    "io/fs"
    "os"
    "path/filepath"
    "testing"
    "testing/fstest"
    "time"

    "git.eeqj.de/sneak/vaultik/internal/database"
    "git.eeqj.de/sneak/vaultik/internal/types"
)

// MockS3Client is a mock implementation of S3 operations for testing
type MockS3Client struct {
    storage map[string][]byte
}

func NewMockS3Client() *MockS3Client {
    return &MockS3Client{
        storage: make(map[string][]byte),
    }
}

func (m *MockS3Client) PutBlob(ctx context.Context, hash string, data []byte) error {
    m.storage[hash] = data
    return nil
}

func (m *MockS3Client) GetBlob(ctx context.Context, hash string) ([]byte, error) {
    data, ok := m.storage[hash]
    if !ok {
        return nil, fmt.Errorf("blob not found: %s", hash)
    }
    return data, nil
}

func (m *MockS3Client) BlobExists(ctx context.Context, hash string) (bool, error) {
    _, ok := m.storage[hash]
    return ok, nil
}

func (m *MockS3Client) CreateBucket(ctx context.Context, bucket string) error {
    return nil
}

func TestBackupWithInMemoryFS(t *testing.T) {
    // Create a temporary directory for the database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")

    // Create test filesystem
    testFS := fstest.MapFS{
        "file1.txt": &fstest.MapFile{
            Data:    []byte("Hello, World!"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "dir1/file2.txt": &fstest.MapFile{
            Data:    []byte("This is a test file with some content."),
            Mode:    0755,
            ModTime: time.Now(),
        },
        "dir1/subdir/file3.txt": &fstest.MapFile{
            Data:    []byte("Another file in a subdirectory."),
            Mode:    0600,
            ModTime: time.Now(),
        },
        "largefile.bin": &fstest.MapFile{
            Data:    generateLargeFileContent(10 * 1024 * 1024), // 10MB file with varied content
            Mode:    0644,
            ModTime: time.Now(),
        },
    }

    // Initialize the database
    ctx := context.Background()
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("Failed to create database: %v", err)
    }
    defer func() {
        if err := db.Close(); err != nil {
            t.Logf("Failed to close database: %v", err)
        }
    }()

    repos := database.NewRepositories(db)

    // Create mock S3 client
    s3Client := NewMockS3Client()

    // Run backup
    backupEngine := &BackupEngine{
        repos:    repos,
        s3Client: s3Client,
    }

    snapshotID, err := backupEngine.Backup(ctx, testFS, ".")
    if err != nil {
        t.Fatalf("Backup failed: %v", err)
    }

    // Verify snapshot was created
    snapshot, err := repos.Snapshots.GetByID(ctx, snapshotID)
    if err != nil {
        t.Fatalf("Failed to get snapshot: %v", err)
    }

    if snapshot == nil {
        t.Fatal("Snapshot not found")
    }

    if snapshot.FileCount == 0 {
        t.Error("Expected snapshot to have files")
    }

    // Verify files in database
    files, err := repos.Files.ListByPrefix(ctx, "")
    if err != nil {
        t.Fatalf("Failed to list files: %v", err)
    }

    expectedFiles := map[string]bool{
        "file1.txt":             true,
        "dir1/file2.txt":        true,
        "dir1/subdir/file3.txt": true,
        "largefile.bin":         true,
    }

    if len(files) != len(expectedFiles) {
        t.Errorf("Expected %d files, got %d", len(expectedFiles), len(files))
    }

    for _, file := range files {
        if !expectedFiles[file.Path.String()] {
            t.Errorf("Unexpected file in database: %s", file.Path)
        }
        delete(expectedFiles, file.Path.String())

        // Verify file metadata
        fsFile := testFS[file.Path.String()]
        if fsFile == nil {
            t.Errorf("File %s not found in test filesystem", file.Path)
            continue
        }

        if file.Size != int64(len(fsFile.Data)) {
            t.Errorf("File %s: expected size %d, got %d", file.Path, len(fsFile.Data), file.Size)
        }

        if file.Mode != uint32(fsFile.Mode) {
            t.Errorf("File %s: expected mode %o, got %o", file.Path, fsFile.Mode, file.Mode)
        }
    }

    if len(expectedFiles) > 0 {
        t.Errorf("Files not found in database: %v", expectedFiles)
    }

    // Verify chunks
    chunks, err := repos.Chunks.List(ctx)
    if err != nil {
        t.Fatalf("Failed to list chunks: %v", err)
    }

    if len(chunks) == 0 {
        t.Error("No chunks found in database")
    }

    // The large file should create 10 chunks (10MB / 1MB chunk size),
    // plus chunks for the small files
    minExpectedChunks := 10 + 3
    if len(chunks) < minExpectedChunks {
        t.Errorf("Expected at least %d chunks, got %d", minExpectedChunks, len(chunks))
    }

    // Verify at least one blob was created and uploaded.
    // We can't list blobs directly, but we can check via snapshot blobs.
    blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
    if err != nil {
        t.Fatalf("Failed to get blob hashes: %v", err)
    }
    if len(blobHashes) == 0 {
        t.Error("Expected at least one blob to be created")
    }

    for _, blobHash := range blobHashes {
        // Check blob exists in mock S3
        exists, err := s3Client.BlobExists(ctx, blobHash)
        if err != nil {
            t.Errorf("Failed to check blob %s: %v", blobHash, err)
        }
        if !exists {
            t.Errorf("Blob %s not found in S3", blobHash)
        }
    }
}

func TestBackupDeduplication(t *testing.T) {
    // Create a temporary directory for the database
    tempDir := t.TempDir()
    dbPath := filepath.Join(tempDir, "test.db")

    // Create test filesystem with duplicate content
    testFS := fstest.MapFS{
        "file1.txt": &fstest.MapFile{
            Data:    []byte("Duplicate content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "file2.txt": &fstest.MapFile{
            Data:    []byte("Duplicate content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
        "file3.txt": &fstest.MapFile{
            Data:    []byte("Unique content"),
            Mode:    0644,
            ModTime: time.Now(),
        },
    }

    // Initialize the database
    ctx := context.Background()
    db, err := database.New(ctx, dbPath)
    if err != nil {
        t.Fatalf("Failed to create database: %v", err)
    }
    defer func() {
        if err := db.Close(); err != nil {
            t.Logf("Failed to close database: %v", err)
        }
    }()

    repos := database.NewRepositories(db)

    // Create mock S3 client
    s3Client := NewMockS3Client()

    // Run backup
    backupEngine := &BackupEngine{
        repos:    repos,
        s3Client: s3Client,
    }

    _, err = backupEngine.Backup(ctx, testFS, ".")
    if err != nil {
        t.Fatalf("Backup failed: %v", err)
    }

    // Verify deduplication
    chunks, err := repos.Chunks.List(ctx)
    if err != nil {
        t.Fatalf("Failed to list chunks: %v", err)
    }

    // Should have only 2 unique chunks (duplicate content + unique content)
    if len(chunks) != 2 {
        t.Errorf("Expected 2 unique chunks, got %d", len(chunks))
    }

    // Verify chunk references
    for _, chunk := range chunks {
        files, err := repos.ChunkFiles.GetByChunkHash(ctx, chunk.ChunkHash)
        if err != nil {
            t.Errorf("Failed to get files for chunk %s: %v", chunk.ChunkHash, err)
        }

        // The duplicate content chunk should be referenced by 2 files
        if chunk.Size == int64(len("Duplicate content")) && len(files) != 2 {
            t.Errorf("Expected duplicate chunk to be referenced by 2 files, got %d", len(files))
        }
    }
}

// BackupEngine performs backup operations
type BackupEngine struct {
    repos    *database.Repositories
    s3Client interface {
        PutBlob(ctx context.Context, hash string, data []byte) error
        BlobExists(ctx context.Context, hash string) (bool, error)
    }
}

// Backup performs a backup of the given filesystem
func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (string, error) {
    // Create a new snapshot
    hostname, _ := os.Hostname()
    snapshotID := time.Now().Format(time.RFC3339)
    snapshot := &database.Snapshot{
        ID:             types.SnapshotID(snapshotID),
        Hostname:       types.Hostname(hostname),
        VaultikVersion: "test",
        StartedAt:      time.Now(),
        CompletedAt:    nil,
    }

    // Create initial snapshot record
    err := b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        return b.repos.Snapshots.Create(ctx, tx, snapshot)
    })
    if err != nil {
        return "", err
    }

    // Track counters
    var fileCount, chunkCount, blobCount, totalSize, blobSize int64

    // Track which chunks we've seen to handle deduplication
    processedChunks := make(map[string]bool)

    // Scan the filesystem and process files
    err = fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }

        // Skip directories
        if d.IsDir() {
            return nil
        }

        // Get file info
        info, err := d.Info()
        if err != nil {
            return err
        }

        // Handle symlinks
        if info.Mode()&fs.ModeSymlink != 0 {
            // For testing, skip symlinks since fstest doesn't support them well
            return nil
        }

        // Create file record in a short transaction
        file := &database.File{
            Path:  types.FilePath(path),
            Size:  info.Size(),
            Mode:  uint32(info.Mode()),
            MTime: info.ModTime(),
            CTime: fileCTime(info), // platform-specific: birth time on macOS, inode change time on Linux
            UID:   1000,            // Default UID for test
            GID:   1000,            // Default GID for test
        }
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            return b.repos.Files.Create(ctx, tx, file)
        })
        if err != nil {
            return err
        }

        fileCount++
        totalSize += info.Size()

        // Read and process file in chunks
        f, err := fsys.Open(path)
        if err != nil {
            return err
        }
        defer func() {
            if err := f.Close(); err != nil {
                // Log but don't fail; we may already be unwinding an error
                fmt.Fprintf(os.Stderr, "Failed to close file: %v\n", err)
            }
        }()

        // Process file in chunks
        chunkIndex := 0
        buffer := make([]byte, defaultChunkSize)

        for {
            n, err := f.Read(buffer)
            if err != nil && err != io.EOF {
                return err
            }
            if n == 0 {
                break
            }

            chunkData := buffer[:n]
            chunkHash := calculateHash(chunkData)

            // Check if chunk already exists (outside of transaction)
            existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
            if existingChunk == nil {
                // Create new chunk in a short transaction
                err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                    chunk := &database.Chunk{
                        ChunkHash: types.ChunkHash(chunkHash),
                        Size:      int64(n),
                    }
                    return b.repos.Chunks.Create(ctx, tx, chunk)
                })
                if err != nil {
                    return err
                }
                processedChunks[chunkHash] = true
            }

            // Create file-chunk mapping in a short transaction
            err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                fileChunk := &database.FileChunk{
                    FileID:    file.ID,
                    Idx:       chunkIndex,
                    ChunkHash: types.ChunkHash(chunkHash),
                }
                return b.repos.FileChunks.Create(ctx, tx, fileChunk)
            })
            if err != nil {
                return err
            }

            // Create chunk-file mapping in a short transaction
            err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
                chunkFile := &database.ChunkFile{
                    ChunkHash:  types.ChunkHash(chunkHash),
                    FileID:     file.ID,
                    FileOffset: int64(chunkIndex * defaultChunkSize),
                    Length:     int64(n),
                }
                return b.repos.ChunkFiles.Create(ctx, tx, chunkFile)
            })
            if err != nil {
                return err
            }

            chunkIndex++
        }

        return nil
    })

    if err != nil {
        return "", err
    }

    // After all files are processed, create blobs for new chunks
    for chunkHash := range processedChunks {
        // Get chunk data (outside of transaction)
        chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
        if err != nil {
            return "", err
        }

        chunkCount++

        // In a real system, blobs would contain multiple chunks and be encrypted.
        // For testing, create a blob with a "blob-" prefix to differentiate.
        blobHash := "blob-" + chunkHash

        // For the test, use dummy data since we don't retain the original
        dummyData := []byte(chunkHash)

        // Upload to S3 as a blob
        if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
            return "", err
        }

        // Create blob entry in a short transaction
        blobID := types.NewBlobID()
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            blob := &database.Blob{
                ID:        blobID,
                Hash:      types.BlobHash(blobHash),
                CreatedTS: time.Now(),
            }
            return b.repos.Blobs.Create(ctx, tx, blob)
        })
        if err != nil {
            return "", err
        }

        blobCount++
        blobSize += chunk.Size

        // Create blob-chunk mapping in a short transaction
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            blobChunk := &database.BlobChunk{
                BlobID:    blobID,
                ChunkHash: types.ChunkHash(chunkHash),
                Offset:    0,
                Length:    chunk.Size,
            }
            return b.repos.BlobChunks.Create(ctx, tx, blobChunk)
        })
        if err != nil {
            return "", err
        }

        // Add blob to snapshot in a short transaction
        err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
            return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blobID, types.BlobHash(blobHash))
        })
        if err != nil {
            return "", err
        }
    }

    // Update snapshot with final counts
    err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
        return b.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID, fileCount, chunkCount, blobCount, totalSize, blobSize)
    })
    if err != nil {
        return "", err
    }

    return snapshotID, nil
}

func calculateHash(data []byte) string {
    h := sha256.New()
    h.Write(data)
    return fmt.Sprintf("%x", h.Sum(nil))
}

func generateLargeFileContent(size int) []byte {
    data := make([]byte, size)
    // Fill with a pattern that changes every chunk to avoid deduplication
    for i := 0; i < size; i++ {
        chunkNum := i / defaultChunkSize
        data[i] = byte((i + chunkNum) % 256)
    }
    return data
}

const defaultChunkSize = 1024 * 1024 // 1MB chunks
26
internal/snapshot/ctime_darwin.go
Normal file
@@ -0,0 +1,26 @@
package snapshot

import (
    "os"
    "syscall"
    "time"
)

// fileCTime returns the file creation time (birth time) on macOS.
//
// On macOS/Darwin, "ctime" refers to the file's birth time (when the file
// was first created on disk). This is stored in the Birthtimespec field of
// the syscall.Stat_t structure.
//
// This differs from Linux, where "ctime" means inode change time (the last
// time file metadata was modified). See ctime_linux.go for details.
//
// If the underlying stat information is unavailable (e.g. when using a
// virtual filesystem like afero.MemMapFs), this falls back to mtime.
func fileCTime(info os.FileInfo) time.Time {
    stat, ok := info.Sys().(*syscall.Stat_t)
    if !ok {
        return info.ModTime()
    }
    return time.Unix(stat.Birthtimespec.Sec, stat.Birthtimespec.Nsec).UTC()
}
|
||||
28
internal/snapshot/ctime_linux.go
Normal file
@@ -0,0 +1,28 @@
package snapshot

import (
	"os"
	"syscall"
	"time"
)

// fileCTime returns the inode change time on Linux.
//
// On Linux, "ctime" refers to the inode change time — the last time the
// file's metadata (permissions, ownership, link count, etc.) was modified.
// This is NOT the file creation time; Linux did not expose birth time until
// the statx(2) syscall was added in kernel 4.11, and Go's syscall package
// does not yet surface it.
//
// This differs from macOS/Darwin where "ctime" means birth time (file
// creation time). See ctime_darwin.go for details.
//
// If the underlying stat information is unavailable (e.g. when using a
// virtual filesystem like afero.MemMapFs), this falls back to mtime.
func fileCTime(info os.FileInfo) time.Time {
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return info.ModTime()
	}
	return time.Unix(stat.Ctim.Sec, stat.Ctim.Nsec).UTC()
}
133
internal/snapshot/ctime_test.go
Normal file
@@ -0,0 +1,133 @@
package snapshot

import (
	"os"
	"path/filepath"
	"testing"
	"time"
)

func TestFileCTime_RealFile(t *testing.T) {
	// Create a temporary file
	dir := t.TempDir()
	path := filepath.Join(dir, "testfile.txt")

	if err := os.WriteFile(path, []byte("hello"), 0644); err != nil {
		t.Fatal(err)
	}

	info, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}

	ctime := fileCTime(info)

	// ctime should be a valid time (not zero)
	if ctime.IsZero() {
		t.Fatal("fileCTime returned zero time")
	}

	// ctime should be close to now (within a few seconds)
	diff := time.Since(ctime)
	if diff < 0 || diff > 5*time.Second {
		t.Fatalf("fileCTime returned unexpected time: %v (diff from now: %v)", ctime, diff)
	}

	// ctime should not equal mtime exactly in all cases, but for a freshly
	// created file they should be very close
	mtime := info.ModTime()
	ctimeMtimeDiff := ctime.Sub(mtime)
	if ctimeMtimeDiff < 0 {
		ctimeMtimeDiff = -ctimeMtimeDiff
	}
	// For a freshly created file, ctime and mtime should be within 1 second
	if ctimeMtimeDiff > time.Second {
		t.Fatalf("ctime and mtime differ by too much for a new file: ctime=%v, mtime=%v, diff=%v",
			ctime, mtime, ctimeMtimeDiff)
	}
}

func TestFileCTime_AfterMtimeChange(t *testing.T) {
	// Create a temporary file
	dir := t.TempDir()
	path := filepath.Join(dir, "testfile.txt")

	if err := os.WriteFile(path, []byte("hello"), 0644); err != nil {
		t.Fatal(err)
	}

	// Get initial ctime
	info1, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}
	ctime1 := fileCTime(info1)

	// Change mtime to a time in the past
	pastTime := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	if err := os.Chtimes(path, pastTime, pastTime); err != nil {
		t.Fatal(err)
	}

	// Get new stats
	info2, err := os.Stat(path)
	if err != nil {
		t.Fatal(err)
	}
	ctime2 := fileCTime(info2)
	mtime2 := info2.ModTime()

	// mtime should now be in the past
	if mtime2.Year() != 2020 {
		t.Fatalf("mtime not set correctly: %v", mtime2)
	}

	// On macOS: ctime (birth time) should remain unchanged since birth time
	// doesn't change when mtime is updated.
	// On Linux: ctime (inode change time) will be updated to ~now because
	// changing mtime is a metadata change.
	// Either way, ctime should NOT equal the past mtime we just set.
	if ctime2.Equal(pastTime) {
		t.Fatal("ctime should not equal the artificially set past mtime")
	}

	// ctime should still be a recent time (the original creation time or
	// the metadata change time, depending on platform)
	_ = ctime1 // used for reference; both platforms will have a recent ctime2
	if time.Since(ctime2) > 10*time.Second {
		t.Fatalf("ctime is unexpectedly old: %v", ctime2)
	}
}

// mockFileInfo is a FileInfo without a *syscall.Stat_t (e.g. afero.MemMapFs);
// TestFileCTime_FallbackToMtime verifies the fallback to mtime in that case.
type mockFileInfo struct {
	name    string
	size    int64
	mode    os.FileMode
	modTime time.Time
	isDir   bool
}

func (m *mockFileInfo) Name() string       { return m.name }
func (m *mockFileInfo) Size() int64        { return m.size }
func (m *mockFileInfo) Mode() os.FileMode  { return m.mode }
func (m *mockFileInfo) ModTime() time.Time { return m.modTime }
func (m *mockFileInfo) IsDir() bool        { return m.isDir }
func (m *mockFileInfo) Sys() interface{}   { return nil } // No syscall.Stat_t

func TestFileCTime_FallbackToMtime(t *testing.T) {
	now := time.Now().UTC().Truncate(time.Second)
	info := &mockFileInfo{
		name:    "test.txt",
		size:    100,
		mode:    0644,
		modTime: now,
	}

	ctime := fileCTime(info)
	if !ctime.Equal(now) {
		t.Fatalf("expected fallback to mtime %v, got %v", now, ctime)
	}
}
454
internal/snapshot/exclude_test.go
Normal file
@@ -0,0 +1,454 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"path/filepath"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/require"
)

func setupExcludeTestFS(t *testing.T) afero.Fs {
	t.Helper()

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create test directory structure:
	// /backup/
	//   file1.txt (should be backed up)
	//   file2.log (should be excluded if *.log is in patterns)
	//   .git/
	//     config (should be excluded if .git is in patterns)
	//     objects/
	//       pack/
	//         data.pack (should be excluded if .git is in patterns)
	//   src/
	//     main.go (should be backed up)
	//     test.go (should be backed up)
	//   node_modules/
	//     package/
	//       index.js (should be excluded if node_modules is in patterns)
	//   cache/
	//     temp.dat (should be excluded if cache/ is in patterns)
	//   build/
	//     output.bin (should be excluded if build is in patterns)
	//   docs/
	//     readme.md (should be backed up)
	//   .DS_Store (should be excluded if .DS_Store is in patterns)
	//   thumbs.db (should be excluded if thumbs.db is in patterns)

	files := map[string]string{
		"/backup/file1.txt":                     "content1",
		"/backup/file2.log":                     "log content",
		"/backup/.git/config":                   "git config",
		"/backup/.git/objects/pack/data.pack":   "pack data",
		"/backup/src/main.go":                   "package main",
		"/backup/src/test.go":                   "package main_test",
		"/backup/node_modules/package/index.js": "module.exports = {}",
		"/backup/cache/temp.dat":                "cached data",
		"/backup/build/output.bin":              "binary data",
		"/backup/docs/readme.md":                "# Documentation",
		"/backup/.DS_Store":                     "ds store data",
		"/backup/thumbs.db":                     "thumbs data",
		"/backup/src/.hidden":                   "hidden file",
		"/backup/important.log.bak":             "backup of log",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}
func createTestScanner(t *testing.T, fs afero.Fs, excludePatterns []string) (*snapshot.Scanner, *database.Repositories, func()) {
	t.Helper()

	// Initialize logger
	log.Initialize(log.Config{})

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)

	repos := database.NewRepositories(db)

	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        64 * 1024,
		Repositories:     repos,
		MaxBlobSize:      1024 * 1024,
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"},
		Exclude:          excludePatterns,
	})

	cleanup := func() {
		_ = db.Close()
	}

	return scanner, repos, cleanup
}

func createSnapshotRecord(t *testing.T, ctx context.Context, repos *database.Repositories, snapshotID string) {
	t.Helper()
	err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snap := &database.Snapshot{
			ID:               types.SnapshotID(snapshotID),
			Hostname:         "test-host",
			VaultikVersion:   "test",
			StartedAt:        time.Now(),
			CompletedAt:      nil,
			FileCount:        0,
			ChunkCount:       0,
			BlobCount:        0,
			TotalSize:        0,
			BlobSize:         0,
			CompressionRatio: 1.0,
		}
		return repos.Snapshots.Create(ctx, tx, snap)
	})
	require.NoError(t, err)
}
func TestExcludePatterns_ExcludeGitDirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should have scanned files but NOT .git directory contents
	// Expected: file1.txt, file2.log, src/main.go, src/test.go, node_modules/package/index.js,
	// cache/temp.dat, build/output.bin, docs/readme.md, .DS_Store, thumbs.db,
	// src/.hidden, important.log.bak
	// Excluded: .git/config, .git/objects/pack/data.pack
	require.Equal(t, 12, result.FilesScanned, "Should exclude .git directory contents")
}

func TestExcludePatterns_ExcludeByExtension(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"*.log"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude file2.log but NOT important.log.bak (different extension)
	// Total files: 14, excluded: 1 (file2.log)
	require.Equal(t, 13, result.FilesScanned, "Should exclude *.log files")
}

func TestExcludePatterns_ExcludeNodeModules(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"node_modules"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude node_modules/package/index.js
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude node_modules directory")
}

func TestExcludePatterns_MultiplePatterns(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".git", "node_modules", "*.log", ".DS_Store", "thumbs.db", "cache", "build"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should only have: file1.txt, src/main.go, src/test.go, docs/readme.md, src/.hidden, important.log.bak
	// Excluded: .git/*, node_modules/*, *.log (file2.log), .DS_Store, thumbs.db, cache/*, build/*
	require.Equal(t, 6, result.FilesScanned, "Should exclude multiple patterns")
}

func TestExcludePatterns_NoExclusions(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should scan all 14 files
	require.Equal(t, 14, result.FilesScanned, "Should scan all files when no exclusions")
}

func TestExcludePatterns_ExcludeHiddenFiles(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{".*"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude: .git/*, .DS_Store, src/.hidden
	// Total files: 14, excluded: 4 (.git/config, .git/objects/pack/data.pack, .DS_Store, src/.hidden)
	require.Equal(t, 10, result.FilesScanned, "Should exclude hidden files and directories")
}

func TestExcludePatterns_DoubleStarGlob(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"**/*.pack"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude .git/objects/pack/data.pack
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude **/*.pack files")
}

func TestExcludePatterns_ExactFileName(t *testing.T) {
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"thumbs.db", ".DS_Store"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude thumbs.db and .DS_Store
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should exclude exact file names")
}

func TestExcludePatterns_CaseSensitive(t *testing.T) {
	// Pattern matching should be case-sensitive
	fs := setupExcludeTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"THUMBS.DB"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Case-sensitive matching: THUMBS.DB should NOT match thumbs.db
	// All 14 files should be scanned
	require.Equal(t, 14, result.FilesScanned, "Pattern matching should be case-sensitive")
}

func TestExcludePatterns_DirectoryWithTrailingSlash(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Some users might add trailing slashes to directory patterns
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"cache/", "build/"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude cache/temp.dat and build/output.bin
	// Total files: 14, excluded: 2
	require.Equal(t, 12, result.FilesScanned, "Should handle directory patterns with trailing slashes")
}

func TestExcludePatterns_PatternInSubdirectory(t *testing.T) {
	fs := setupExcludeTestFS(t)
	// Exclude .hidden file specifically in src directory
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"src/.hidden"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// Should exclude only src/.hidden
	// Total files: 14, excluded: 1
	require.Equal(t, 13, result.FilesScanned, "Should exclude specific subdirectory files")
}
// setupAnchoredTestFS creates a filesystem for testing anchored patterns
// Source dir: /backup
// Structure:
//
//	/backup/
//	  projectname/
//	    file.txt (should be excluded with /projectname)
//	  otherproject/
//	    projectname/
//	      file.txt (should NOT be excluded with /projectname, only with projectname)
//	  src/
//	    file.go
func setupAnchoredTestFS(t *testing.T) afero.Fs {
	t.Helper()

	fs := afero.NewMemMapFs()

	files := map[string]string{
		"/backup/projectname/file.txt":              "root project file",
		"/backup/otherproject/projectname/file.txt": "nested project file",
		"/backup/src/file.go":                       "source file",
		"/backup/file.txt":                          "root file",
	}

	testTime := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for path, content := range files {
		dir := filepath.Dir(path)
		err := fs.MkdirAll(dir, 0755)
		require.NoError(t, err)
		err = afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
		err = fs.Chtimes(path, testTime, testTime)
		require.NoError(t, err)
	}

	return fs
}

func TestExcludePatterns_AnchoredPattern(t *testing.T) {
	// Pattern starting with / should only match from root of source dir
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /projectname should ONLY exclude /backup/projectname/file.txt (1 file)
	// /backup/otherproject/projectname/file.txt should NOT be excluded
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern /projectname should only match at root of source dir")
}

func TestExcludePatterns_UnanchoredPattern(t *testing.T) {
	// Pattern without leading / should match anywhere in path
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"projectname"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// projectname (without /) should exclude BOTH:
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 2
	require.Equal(t, 2, result.FilesScanned, "Unanchored pattern should match anywhere in path")
}

func TestExcludePatterns_AnchoredPatternWithGlob(t *testing.T) {
	// Anchored pattern with glob
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/src/*.go"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /src/*.go should exclude /backup/src/file.go
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern with glob should work")
}

func TestExcludePatterns_AnchoredPatternFile(t *testing.T) {
	// Anchored pattern for exact file at root
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"/file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// /file.txt should ONLY exclude /backup/file.txt
	// NOT /backup/projectname/file.txt or /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 1
	require.Equal(t, 3, result.FilesScanned, "Anchored pattern for file should only match at root")
}

func TestExcludePatterns_UnanchoredPatternFile(t *testing.T) {
	// Unanchored pattern for file should match anywhere
	fs := setupAnchoredTestFS(t)
	scanner, repos, cleanup := createTestScanner(t, fs, []string{"file.txt"})
	defer cleanup()
	require.NotNil(t, scanner)

	ctx := context.Background()
	createSnapshotRecord(t, ctx, repos, "test-snapshot")

	result, err := scanner.Scan(ctx, "/backup", "test-snapshot")
	require.NoError(t, err)

	// file.txt should exclude ALL file.txt files:
	// - /backup/file.txt
	// - /backup/projectname/file.txt
	// - /backup/otherproject/projectname/file.txt
	// Total files: 4, excluded: 3
	require.Equal(t, 1, result.FilesScanned, "Unanchored pattern for file should match anywhere")
}
238
internal/snapshot/file_change_test.go
Normal file
@@ -0,0 +1,238 @@
package snapshot_test

import (
	"context"
	"database/sql"
	"testing"
	"time"

	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/log"
	"git.eeqj.de/sneak/vaultik/internal/snapshot"
	"git.eeqj.de/sneak/vaultik/internal/types"
	"github.com/spf13/afero"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// TestFileContentChange verifies that when a file's content changes,
// the old chunks are properly disassociated
func TestFileContentChange(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial file
	err := afero.WriteFile(fs, "/test.txt", []byte("Initial content"), 0644)
	require.NoError(t, err)

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan - should create chunks for initial content
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	t.Logf("First scan: %d files scanned", result1.FilesScanned)

	// Get file chunks from first scan
	fileChunks1, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks1, 1) // Small file = 1 chunk
	oldChunkHash := fileChunks1[0].ChunkHash

	// Get chunk files from first scan
	chunkFiles1, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles1, 1)

	// Modify the file
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/test.txt", []byte("Modified content with different data"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan - should create new chunks and remove old associations
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)
	t.Logf("Second scan: %d files scanned", result2.FilesScanned)

	// Get file chunks from second scan
	fileChunks2, err := repos.FileChunks.GetByPath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, fileChunks2, 1) // Still 1 chunk but different hash
	newChunkHash := fileChunks2[0].ChunkHash

	// Verify the chunk hashes are different
	assert.NotEqual(t, oldChunkHash, newChunkHash, "Chunk hash should change when content changes")

	// Get chunk files from second scan
	chunkFiles2, err := repos.ChunkFiles.GetByFilePath(ctx, "/test.txt")
	require.NoError(t, err)
	assert.Len(t, chunkFiles2, 1)
	assert.Equal(t, newChunkHash, chunkFiles2[0].ChunkHash)

	// Verify old chunk still exists (it's still valid data)
	oldChunk, err := repos.Chunks.GetByHash(ctx, oldChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, oldChunk)

	// Verify new chunk exists
	newChunk, err := repos.Chunks.GetByHash(ctx, newChunkHash.String())
	require.NoError(t, err)
	assert.NotNil(t, newChunk)

	// Verify that chunk_files for old chunk no longer references this file
	oldChunkFiles, err := repos.ChunkFiles.GetByChunkHash(ctx, oldChunkHash)
	require.NoError(t, err)
	for _, cf := range oldChunkFiles {
		file, err := repos.Files.GetByID(ctx, cf.FileID)
		require.NoError(t, err)
		assert.NotEqual(t, "/test.txt", file.Path, "Old chunk should not be associated with the modified file")
	}
}
// TestMultipleFileChanges verifies handling of multiple file changes in one scan
func TestMultipleFileChanges(t *testing.T) {
	// Initialize logger for tests
	log.Initialize(log.Config{})

	// Create in-memory filesystem
	fs := afero.NewMemMapFs()

	// Create initial files
	files := map[string]string{
		"/file1.txt": "Content 1",
		"/file2.txt": "Content 2",
		"/file3.txt": "Content 3",
	}

	for path, content := range files {
		err := afero.WriteFile(fs, path, []byte(content), 0644)
		require.NoError(t, err)
	}

	// Create test database
	db, err := database.NewTestDB()
	require.NoError(t, err)
	defer func() {
		if err := db.Close(); err != nil {
			t.Errorf("failed to close database: %v", err)
		}
	}()

	repos := database.NewRepositories(db)

	// Create scanner
	scanner := snapshot.NewScanner(snapshot.ScannerConfig{
		FS:               fs,
		ChunkSize:        int64(1024 * 16), // 16KB chunks for testing
		Repositories:     repos,
		MaxBlobSize:      int64(1024 * 1024), // 1MB blobs
		CompressionLevel: 3,
		AgeRecipients:    []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
	})

	// Create first snapshot
	ctx := context.Background()
	snapshotID1 := "snapshot1"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID1),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// First scan
	result1, err := scanner.Scan(ctx, "/", snapshotID1)
	require.NoError(t, err)
	// Only regular files are counted, not directories
	assert.Equal(t, 3, result1.FilesScanned)

	// Modify two files
	time.Sleep(10 * time.Millisecond) // Ensure mtime changes
	err = afero.WriteFile(fs, "/file1.txt", []byte("Modified content 1"), 0644)
	require.NoError(t, err)
	err = afero.WriteFile(fs, "/file3.txt", []byte("Modified content 3"), 0644)
	require.NoError(t, err)

	// Create second snapshot
	snapshotID2 := "snapshot2"
	err = repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
		snapshot := &database.Snapshot{
			ID:             types.SnapshotID(snapshotID2),
			Hostname:       "test-host",
			VaultikVersion: "test",
			StartedAt:      time.Now(),
		}
		return repos.Snapshots.Create(ctx, tx, snapshot)
	})
	require.NoError(t, err)

	// Second scan
	result2, err := scanner.Scan(ctx, "/", snapshotID2)
	require.NoError(t, err)

	// Only regular files are counted, not directories
	assert.Equal(t, 3, result2.FilesScanned)

	// Verify each file has exactly one set of chunks
	for path := range files {
		fileChunks, err := repos.FileChunks.GetByPath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, fileChunks, 1, "File %s should have exactly 1 chunk association", path)

		chunkFiles, err := repos.ChunkFiles.GetByFilePath(ctx, path)
		require.NoError(t, err)
		assert.Len(t, chunkFiles, 1, "File %s should have exactly 1 chunk-file association", path)
	}
}
70 internal/snapshot/manifest.go Normal file
@@ -0,0 +1,70 @@
package snapshot

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// Manifest represents the structure of a snapshot's blob manifest
type Manifest struct {
	SnapshotID          string     `json:"snapshot_id"`
	Timestamp           string     `json:"timestamp"`
	BlobCount           int        `json:"blob_count"`
	TotalCompressedSize int64      `json:"total_compressed_size"`
	Blobs               []BlobInfo `json:"blobs"`
}

// BlobInfo represents information about a single blob in the manifest
type BlobInfo struct {
	Hash           string `json:"hash"`
	CompressedSize int64  `json:"compressed_size"`
}

// DecodeManifest decodes a manifest from a reader containing compressed JSON
func DecodeManifest(r io.Reader) (*Manifest, error) {
	// Decompress using zstd
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("creating zstd reader: %w", err)
	}
	defer zr.Close()

	// Decode JSON manifest
	var manifest Manifest
	if err := json.NewDecoder(zr).Decode(&manifest); err != nil {
		return nil, fmt.Errorf("decoding manifest: %w", err)
	}

	return &manifest, nil
}

// EncodeManifest encodes a manifest to compressed JSON
func EncodeManifest(manifest *Manifest, compressionLevel int) ([]byte, error) {
	// Marshal to JSON
	jsonData, err := json.MarshalIndent(manifest, "", " ")
	if err != nil {
		return nil, fmt.Errorf("marshaling manifest: %w", err)
	}

	// Compress using zstd
	var compressedBuf bytes.Buffer
	writer, err := zstd.NewWriter(&compressedBuf, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(compressionLevel)))
	if err != nil {
		return nil, fmt.Errorf("creating zstd writer: %w", err)
	}

	if _, err := writer.Write(jsonData); err != nil {
		_ = writer.Close()
		return nil, fmt.Errorf("writing compressed data: %w", err)
	}

	if err := writer.Close(); err != nil {
		return nil, fmt.Errorf("closing zstd writer: %w", err)
	}

	return compressedBuf.Bytes(), nil
}
53 internal/snapshot/module.go Normal file
@@ -0,0 +1,53 @@
package snapshot

import (
	"git.eeqj.de/sneak/vaultik/internal/config"
	"git.eeqj.de/sneak/vaultik/internal/database"
	"git.eeqj.de/sneak/vaultik/internal/storage"
	"github.com/spf13/afero"
	"go.uber.org/fx"
)

// ScannerParams holds parameters for scanner creation
type ScannerParams struct {
	EnableProgress bool
	Fs             afero.Fs
	Exclude        []string // Exclude patterns (combined global + snapshot-specific)
	SkipErrors     bool     // Skip file read errors (log loudly but continue)
}

// Module exports backup functionality as an fx module.
// It provides a ScannerFactory that can create Scanner instances
// with custom parameters while sharing common dependencies.
var Module = fx.Module("backup",
	fx.Provide(
		provideScannerFactory,
		NewSnapshotManager,
	),
)

// ScannerFactory creates scanners with custom parameters
type ScannerFactory func(params ScannerParams) *Scanner

func provideScannerFactory(cfg *config.Config, repos *database.Repositories, storer storage.Storer) ScannerFactory {
	return func(params ScannerParams) *Scanner {
		// Use provided excludes, or fall back to global config excludes
		excludes := params.Exclude
		if len(excludes) == 0 {
			excludes = cfg.Exclude
		}

		return NewScanner(ScannerConfig{
			FS:               params.Fs,
			ChunkSize:        cfg.ChunkSize.Int64(),
			Repositories:     repos,
			Storage:          storer,
			MaxBlobSize:      cfg.BlobSizeLimit.Int64(),
			CompressionLevel: cfg.CompressionLevel,
			AgeRecipients:    cfg.AgeRecipients,
			EnableProgress:   params.EnableProgress,
			Exclude:          excludes,
			SkipErrors:       params.SkipErrors,
		})
	}
}
Some files were not shown because too many files have changed in this diff