Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:

  1. Blobs - The actual backup data (content-addressed, encrypted)
  2. Metadata - Snapshot information and manifests (partially encrypted)

Directory Structure

<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst

Blobs Directory (blobs/)

Structure

  • Path format: blobs/<first-2-chars>/<next-2-chars>/<full-hash>
  • Example: blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
  • Sharding: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
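
The path rule above can be sketched as a small helper. This is an illustrative Python sketch, not Vaultik's own code: the function name `blob_key` is hypothetical, and it assumes hashes are 64-character lowercase hexadecimal SHA-256 digests, as in the example. The returned key is relative to `<bucket>/<prefix>/`.

```python
import string

HEX_DIGITS = set(string.hexdigits.lower())  # the characters 0-9 and a-f


def blob_key(blob_hash: str) -> str:
    """Return the sharded object key for a blob hash (hypothetical helper)."""
    # Assumption: hashes are 64-character lowercase hex SHA-256 digests.
    if len(blob_hash) != 64 or not set(blob_hash) <= HEX_DIGITS:
        raise ValueError(f"not a lowercase hex SHA-256 digest: {blob_hash!r}")
    # First two characters, next two characters, then the full hash.
    return f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
```

Because the two shard levels are derived from the hash itself, the full key can be computed from a manifest entry alone, without listing the bucket.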

Content

  • What it contains: Packed collections of content-defined chunks from files
  • Format: Zstandard compressed, then Age encrypted
  • Encryption: Always encrypted with Age using the configured recipients
  • Naming: Content-addressed by the SHA-256 hash of the encrypted blob
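
Since the name is the hash of the encrypted bytes, a stored blob can be checked for corruption without any decryption key. A minimal sketch, assuming the name is the lowercase hex digest; both function names are hypothetical and not part of Vaultik:

```python
import hashlib


def blob_name_for(encrypted_blob: bytes) -> str:
    """Hex SHA-256 digest of the blob bytes exactly as stored (hypothetical helper)."""
    return hashlib.sha256(encrypted_blob).hexdigest()


def blob_is_intact(object_name: str, encrypted_blob: bytes) -> bool:
    """True if the stored bytes still hash to the name they are stored under."""
    return blob_name_for(encrypted_blob) == object_name
```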

Why Encrypted

Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

Metadata Directory (metadata/)

Each snapshot has its own subdirectory named with the snapshot ID.

Snapshot ID Format

  • Format: <hostname>_<snapshot-name>_<RFC3339> (or <hostname>_<RFC3339> if no name was specified)
  • Example: laptop_home_2024-01-15T14:30:52Z
  • Components:
    • Short hostname (the FQDN with everything after the first dot stripped)
    • Snapshot name from the configured snapshots: map (optional)
    • RFC3339 UTC timestamp
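
The format above can be illustrated with a short sketch. The function names and the host `laptop.example.com` are hypothetical, and the second-precision `Z`-suffixed timestamp is an assumption based on the example ID, not a confirmed detail of Vaultik's implementation.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed: whole seconds, UTC, "Z" suffix


def build_snapshot_id(fqdn: str, when: datetime, name: str = "") -> str:
    """Compose <hostname>_<snapshot-name>_<RFC3339>, omitting an empty name."""
    short_host = fqdn.split(".", 1)[0]  # drop everything after the first dot
    # `when` should be timezone-aware; it is converted to UTC before formatting.
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    parts = [short_host, name, stamp] if name else [short_host, stamp]
    return "_".join(parts)


def snapshot_time(snapshot_id: str) -> datetime:
    """Recover the UTC timestamp, which is the last underscore-separated field."""
    stamp = snapshot_id.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Splitting from the right is deliberate: the timestamp never contains an underscore, whereas a hostname or snapshot name might, so only the last field can be recovered unambiguously.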

Files in Each Snapshot Directory

db.zst.age - Encrypted Database

  • What it contains: Pruned binary SQLite database for this snapshot
  • Format: Binary SQLite → Zstandard compressed → Age encrypted
  • Encryption: Encrypted with Age
  • Purpose: Contains full file metadata, chunk mappings, and all relationships
  • Why encrypted: Contains sensitive metadata like file paths, permissions, and ownership

manifest.json.zst - Unencrypted Blob Manifest

  • What it contains: JSON list of all blob hashes referenced by this snapshot
  • Format: JSON → Zstandard compressed (NOT encrypted)
  • Encryption: NOT encrypted
  • Purpose: Enables pruning operations without requiring decryption keys
  • Structure:
{
  "snapshot_id": "laptop_home_2024-01-15T14:30:52Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
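
A manifest with this structure could be loaded and sanity-checked roughly as follows. This sketch assumes the file has already been Zstandard-decompressed by some other means, and that `blob_count` is meant to equal the length of `blobs`; the function name `load_manifest` is hypothetical.

```python
import json

REQUIRED_FIELDS = ("snapshot_id", "timestamp", "blob_count", "blobs")


def load_manifest(decompressed: bytes) -> dict:
    """Parse decompressed manifest JSON and check its basic shape."""
    manifest = json.loads(decompressed)
    for field in REQUIRED_FIELDS:
        if field not in manifest:
            raise ValueError(f"manifest is missing field {field!r}")
    # Assumption: blob_count mirrors the number of entries in "blobs".
    if manifest["blob_count"] != len(manifest["blobs"]):
        raise ValueError("blob_count does not match the number of listed blobs")
    return manifest
```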

Why Manifest is Unencrypted

The manifest must be readable without the private key to enable:

  1. Pruning operations - Identifying unreferenced blobs for deletion
  2. Storage analysis - Understanding space usage without decryption
  3. Verification - Checking blob existence without decryption
  4. Cross-snapshot deduplication analysis - Finding shared blobs between snapshots

Besides the snapshot ID, timestamp, and blob count, the manifest contains only blob hashes. It holds no file names, paths, or other sensitive file metadata.
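
As an illustration of the cross-snapshot analysis mentioned above, the sketch below counts how many manifests reference each blob, using only the `blobs` lists. The function names are hypothetical, and manifests are assumed to be already decompressed and parsed into dictionaries.

```python
from collections import Counter


def blob_reference_counts(manifests: list[dict]) -> Counter:
    """Count how many snapshots reference each blob hash."""
    counts: Counter = Counter()
    for manifest in manifests:
        # set() guards against a hash listed twice within one manifest.
        counts.update(set(manifest["blobs"]))
    return counts


def shared_blobs(manifests: list[dict]) -> set[str]:
    """Blob hashes referenced by more than one snapshot."""
    return {h for h, n in blob_reference_counts(manifests).items() if n > 1}
```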

Security Considerations

What's Encrypted

  • All file content (in blobs)
  • All file metadata (paths, permissions, timestamps, and ownership, stored in db.zst.age)
  • File-to-chunk mappings (in db.zst.age)

What's Not Encrypted

  • Blob hashes (in manifest.json.zst)
  • Snapshot IDs (directory names)
  • Blob count per snapshot (in manifest.json.zst)

Privacy Implications

From the unencrypted data, an observer can determine:

  • When backups were taken (from snapshot IDs)
  • Which hostname created backups (from snapshot IDs)
  • How many blobs each snapshot references
  • Which blobs are shared between snapshots (deduplication patterns)
  • The size of each encrypted blob

An observer cannot determine:

  • File names or paths
  • File contents
  • File permissions or ownership
  • Directory structure
  • Which chunks belong to which files

Consistency Guarantees

  1. Blobs are immutable - Once written, a blob is never modified
  2. Blobs are written before metadata - A snapshot's metadata is only written after all its blobs are successfully uploaded
  3. Metadata is written atomically - Both db.zst.age and manifest.json.zst are written as complete files
  4. Snapshots are marked complete in local DB only after metadata upload - Ensures consistency between local and remote state
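
The ordering in these guarantees can be sketched against an in-memory stand-in for the object store. Everything here is hypothetical (the function name, the dictionary-based `store` and `local_db`, and the `"complete"` marker); it only illustrates the sequence of writes, not how Vaultik performs them.

```python
def upload_snapshot(store: dict, local_db: dict, snapshot_id: str,
                    blobs: dict, db_bytes: bytes, manifest_bytes: bytes) -> None:
    """Write blobs first, then metadata, then mark the snapshot complete."""
    # 1. Blobs go first. An existing blob is never overwritten (immutability).
    for blob_hash, data in blobs.items():
        key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
        store.setdefault(key, data)
    # 2. Metadata is written only after every blob is in place,
    #    each file as one complete object.
    store[f"metadata/{snapshot_id}/db.zst.age"] = db_bytes
    store[f"metadata/{snapshot_id}/manifest.json.zst"] = manifest_bytes
    # 3. Only now is the snapshot marked complete in the local state.
    local_db[snapshot_id] = "complete"
```

With this ordering, an interrupted upload can leave unreferenced blobs behind, but never a manifest that points at blobs that were not uploaded.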

Pruning Safety

The prune operation is safe because:

  1. It only deletes blobs not referenced in any manifest
  2. Manifests are unencrypted and can be read without keys
  3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
  4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs
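
The rules above amount to a set difference guarded by a consistency check. A minimal sketch, assuming manifests are already parsed and that the local/remote comparison is done by snapshot ID (the source does not specify how the comparison works); `plan_prune` and its parameters are hypothetical names.

```python
def plan_prune(stored_hashes, manifests, latest_local_id, latest_remote_id):
    """Return the set of blob hashes that no manifest references."""
    # Refuse to plan anything if local and remote state disagree.
    if latest_local_id != latest_remote_id:
        raise RuntimeError("latest local and remote snapshots differ; refusing to prune")
    referenced = set()
    for manifest in manifests:
        referenced.update(manifest["blobs"])
    # Only blobs absent from every manifest are candidates for deletion.
    return set(stored_hashes) - referenced
```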

Restoration Requirements

To restore from a backup, you need:

  1. The Age private key - To decrypt blobs and database
  2. The snapshot metadata - Both files from the snapshot's metadata directory
  3. All referenced blobs - As listed in the manifest

The restoration process:

  1. Download, decrypt, and decompress the snapshot database (db.zst.age) to recover the file structure
  2. Download, decrypt, and decompress the required blobs
  3. Reconstruct files from their chunks
  4. Restore file metadata (permissions, timestamps, etc.)
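
The chunk reassembly step can be sketched as follows. The layout is an assumption made purely for illustration: this document does not describe the database schema, so the `ChunkRef` fields (blob hash, offset, length) and the function name are hypothetical, and blobs are assumed to be already decrypted and decompressed.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical location of one chunk inside a decrypted, decompressed blob."""
    blob_hash: str
    offset: int
    length: int


def reconstruct_file(chunks: list[ChunkRef], plaintext_blobs: dict[str, bytes]) -> bytes:
    """Concatenate a file's chunks in order from their containing blobs."""
    parts = []
    for ref in chunks:
        blob = plaintext_blobs[ref.blob_hash]
        piece = blob[ref.offset:ref.offset + ref.length]
        if len(piece) != ref.length:
            raise ValueError(f"chunk extends past the end of blob {ref.blob_hash}")
        parts.append(piece)
    return b"".join(parts)
```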