sneak/vaultik

sneak 9f2d722734 Refresh docs: remove PROCESS.md, fix snapshot ID format, document pre-1.0 migration policy

2026-05-01 07:07:18 +02:00

5.6 KiB


Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:

Blobs - The actual backup data (content-addressed, encrypted)
Metadata - Snapshot information and manifests (partially encrypted)

Directory Structure

```
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
```

Blobs Directory (`blobs/`)

Structure

Path format: blobs/<first-2-chars>/<next-2-chars>/<full-hash>
Example: blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
Sharding: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
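The key layout can be sketched in Go as follows. The function name `blobKey`, the prefix handling, and the lowercase-hex validation are illustrative assumptions, not Vaultik's own code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// blobKey builds the object key blobs/<hash[0:2]>/<hash[2:4]>/<full-hash>
// under an optional repository prefix. Hypothetical helper for illustration.
func blobKey(prefix, hash string) (string, error) {
	// Assumption: hashes are 64-character lowercase hex SHA-256 digests.
	if len(hash) != 64 || strings.Trim(hash, "0123456789abcdef") != "" {
		return "", fmt.Errorf("not a lowercase hex SHA-256 digest: %q", hash)
	}
	// path.Join always uses forward slashes (as S3 keys need), skips an
	// empty prefix, and collapses a trailing slash on the prefix.
	return path.Join(prefix, "blobs", hash[0:2], hash[2:4], hash), nil
}
```

The first four hex characters give 256 × 256 = 65,536 possible shard prefixes. This is what keeps any single listing small.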

Content

What it contains: Packed collections of content-defined chunks from files
Format: Zstandard compressed, then Age encrypted
Encryption: Always encrypted with Age using the configured recipients
Naming: Content-addressed using SHA256 hash of the encrypted blob
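The name is the SHA-256 of the ciphertext, so anyone holding the bytes can recompute it. A downloaded blob can therefore be checked against its key without the Age private key. The sketch below assumes lowercase hex encoding. `blobName` and `verifyBlob` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// blobName returns the content address of a blob: the lowercase hex SHA-256
// of its final bytes, i.e. after zstd compression and Age encryption.
func blobName(encrypted []byte) string {
	sum := sha256.Sum256(encrypted)
	return hex.EncodeToString(sum[:])
}

// verifyBlob reports whether downloaded bytes match the blob's name.
// It only hashes ciphertext, so no decryption key is involved.
func verifyBlob(name string, encrypted []byte) bool {
	return blobName(encrypted) == name
}
```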

Why Encrypted

Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

Metadata Directory (`metadata/`)

Each snapshot has its own subdirectory named with the snapshot ID.

Snapshot ID Format

Format: <hostname>_<snapshot-name>_<RFC3339> (or <hostname>_<RFC3339> if no name was specified)
Example: laptop_home_2024-01-15T14:30:52Z
Components:
- Short hostname (the first dot and everything after it are stripped from the FQDN)
- Snapshot name from the configured `snapshots:` map (optional)
- RFC3339 UTC timestamp

Files in Each Snapshot Directory

`db.zst.age` - Encrypted Database

What it contains: Pruned binary SQLite database for this snapshot
Format: Binary SQLite → Zstandard compressed → Age encrypted
Encryption: Encrypted with Age
Purpose: Contains full file metadata, chunk mappings, and all relationships
Why encrypted: Contains sensitive metadata like file paths, permissions, and ownership

`manifest.json.zst` - Unencrypted Blob Manifest

What it contains: JSON list of all blob hashes referenced by this snapshot
Format: JSON → Zstandard compressed (NOT encrypted)
Encryption: NOT encrypted
Purpose: Enables pruning operations without requiring decryption keys
Structure:

```json
{
  "snapshot_id": "laptop_home_2024-01-15T14:30:52Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
```
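Because the manifest is plain JSON once decompressed, it maps directly onto a Go struct. The sketch below assumes the bytes have already been passed through a Zstandard decoder. The `Manifest` type, `parseManifest`, and the `blob_count` consistency check are illustrative assumptions rather than Vaultik's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Manifest mirrors the manifest.json layout shown above.
type Manifest struct {
	SnapshotID string    `json:"snapshot_id"`
	Timestamp  time.Time `json:"timestamp"` // RFC3339 string, decoded by time.Time
	BlobCount  int       `json:"blob_count"`
	Blobs      []string  `json:"blobs"`
}

// parseManifest decodes an already-decompressed manifest and checks that
// blob_count agrees with the number of listed hashes. No key is needed.
func parseManifest(data []byte) (*Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("decode manifest: %w", err)
	}
	if m.BlobCount != len(m.Blobs) {
		return nil, fmt.Errorf("blob_count %d does not match %d listed blobs", m.BlobCount, len(m.Blobs))
	}
	return &m, nil
}
```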

Why Manifest is Unencrypted

The manifest must be readable without the private key to enable:

Pruning operations - Identifying unreferenced blobs for deletion
Storage analysis - Understanding space usage without decryption
Verification - Checking blob existence without decryption
Cross-snapshot deduplication analysis - Finding shared blobs between snapshots

Besides the blob hashes, the manifest holds only the snapshot ID, its timestamp, and the blob count. It contains no file names, paths, or other file metadata.
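For example, cross-snapshot deduplication analysis reduces to a set intersection over two manifests' `blobs` arrays, with no decryption involved. The sketch below uses `sharedBlobs`, a hypothetical helper.

```go
package main

import "sort"

// sharedBlobs returns the hashes present in both blob lists, each reported
// once and sorted for stable output.
func sharedBlobs(a, b []string) []string {
	inA := make(map[string]struct{}, len(a))
	for _, h := range a {
		inA[h] = struct{}{}
	}
	seen := make(map[string]struct{})
	var shared []string
	for _, h := range b {
		if _, ok := inA[h]; !ok {
			continue
		}
		if _, dup := seen[h]; !dup {
			seen[h] = struct{}{}
			shared = append(shared, h)
		}
	}
	sort.Strings(shared)
	return shared
}
```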

Security Considerations

What's Encrypted

All file content (in blobs)
All file metadata (paths, permissions, timestamps, ownership in db.zst.age)
File-to-chunk mappings (in db.zst.age)

What's Not Encrypted

Blob hashes (in manifest.json.zst)
Snapshot IDs (directory names)
Blob count per snapshot (in manifest.json.zst)
Encrypted blob sizes (visible as object sizes in the bucket)

Privacy Implications

From the unencrypted data, an observer can determine:

When backups were taken (from snapshot IDs)
Which hostname created backups (from snapshot IDs)
How many blobs each snapshot references
Which blobs are shared between snapshots (deduplication patterns)
The size of each encrypted blob
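The first two points can be made concrete: hostname, snapshot name, and timestamp are recoverable from a bare directory name. The sketch below assumes the short hostname contains no underscore. An RFC3339 timestamp never contains one, so the last underscore always separates the timestamp. `parseSnapshotID` and `observed` are hypothetical names.

```go
package main

import (
	"errors"
	"strings"
	"time"
)

// observed is what anyone able to list the bucket learns from one
// snapshot directory name.
type observed struct {
	Host  string
	Name  string // empty for unnamed snapshots
	Taken time.Time
}

// parseSnapshotID splits <hostname>[_<snapshot-name>]_<RFC3339>.
func parseSnapshotID(id string) (observed, error) {
	i := strings.LastIndex(id, "_")
	if i < 0 {
		return observed{}, errors.New("no timestamp separator in snapshot ID")
	}
	taken, err := time.Parse(time.RFC3339, id[i+1:])
	if err != nil {
		return observed{}, err
	}
	// Assumes the hostname itself has no underscore; the name may.
	host, name, _ := strings.Cut(id[:i], "_")
	return observed{Host: host, Name: name, Taken: taken}, nil
}
```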

An observer cannot determine:

File names or paths
File contents
File permissions or ownership
Directory structure
Which chunks belong to which files

Consistency Guarantees

Blobs are immutable - Once written, a blob is never modified
Blobs are written before metadata - A snapshot's metadata is only written after all its blobs are successfully uploaded
Metadata is written atomically - db.zst.age and manifest.json.zst are each uploaded as a complete object, never as a partial file
Snapshots are marked complete in local DB only after metadata upload - Ensures consistency between local and remote state

Pruning Safety

The prune operation is safe because:

It only deletes blobs not referenced in any manifest
Manifests are unencrypted and can be read without keys
The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
Pruning will fail if these don't match, preventing accidental deletion of needed blobs
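A sketch of the prune decision follows. It assumes the caller has already listed the stored blob hashes (the last path segment of each `blobs/` key) and decoded every manifest. `planPrune` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// planPrune returns stored blob hashes that no manifest references. It
// refuses to plan anything when the newest local snapshot and the newest
// remote snapshot differ.
func planPrune(localLatest, remoteLatest string, manifests [][]string, stored []string) ([]string, error) {
	if localLatest != remoteLatest {
		return nil, fmt.Errorf("latest local snapshot %q != latest remote snapshot %q; refusing to prune", localLatest, remoteLatest)
	}
	referenced := make(map[string]struct{})
	for _, blobs := range manifests {
		for _, h := range blobs {
			referenced[h] = struct{}{}
		}
	}
	var unreferenced []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			unreferenced = append(unreferenced, h)
		}
	}
	sort.Strings(unreferenced)
	return unreferenced, nil
}
```

Returning a list instead of deleting directly keeps the safety check separate from the destructive step. This also allows a dry run to be shown first.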

Restoration Requirements

To restore from a backup, you need:

The Age private key - To decrypt blobs and database
The snapshot metadata - Both files from the snapshot's metadata directory
All referenced blobs - As listed in the manifest

The restoration process:

Download, decrypt, and decompress the snapshot database (db.zst.age) to recover the file structure
Download, decrypt, and decompress the required blobs
Reconstruct files from their chunks
Restore file metadata (permissions, timestamps, etc.)
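Reconstructing a file from its chunks can be sketched as follows. The sketch assumes the database maps each file to an ordered list of (blob hash, offset, length) entries that locate its chunks inside plaintext blobs. That layout, `chunkRef`, and `reassemble` are assumptions for illustration; the actual schema lives in the snapshot database.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkRef locates one chunk inside a decrypted, decompressed blob
// (hypothetical layout).
type chunkRef struct {
	BlobHash string
	Offset   int
	Length   int
}

// reassemble concatenates a file's chunks in order. blobs maps blob hash to
// plaintext blob bytes (already Age-decrypted and zstd-decompressed).
func reassemble(chunks []chunkRef, blobs map[string][]byte) ([]byte, error) {
	var out bytes.Buffer
	for i, c := range chunks {
		blob, ok := blobs[c.BlobHash]
		if !ok {
			return nil, fmt.Errorf("chunk %d: blob %s not downloaded", i, c.BlobHash)
		}
		if c.Offset < 0 || c.Length < 0 || c.Offset+c.Length > len(blob) {
			return nil, fmt.Errorf("chunk %d: range %d+%d outside blob of %d bytes", i, c.Offset, c.Length, len(blob))
		}
		out.Write(blob[c.Offset : c.Offset+c.Length])
	}
	return out.Bytes(), nil
}
```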
