vaultik/docs/REPOSTRUCTURE.md
sneak fb220685a2 Fix manifest generation to not encrypt manifests
- Manifests are now only compressed (not encrypted) so pruning operations can work without private keys
- Updated generateBlobManifest to use zstd compression directly
- Updated prune command to handle unencrypted manifests
- Updated snapshot list command to handle new manifest format
- Updated documentation to reflect manifest.json.zst (not .age)
- Removed unnecessary VAULTIK_PRIVATE_KEY check from prune command
2025-07-26 02:54:52 +02:00

Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:

  1. Blobs - The actual backup data (content-addressed, encrypted)
  2. Metadata - Snapshot information and manifests (partially encrypted: the database dump is encrypted, the manifest is not)

Directory Structure

<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst

Blobs Directory (blobs/)

Structure

  • Path format: blobs/<first-2-chars>/<next-2-chars>/<full-hash>
  • Example: blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
  • Sharding: The two-level directory structure uses the first 4 characters of the hash, spreading blobs across up to 65,536 (256 × 256) prefixes for hex-encoded hashes so that no single directory holds too many objects
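
The path format above can be sketched as a small helper. This is a minimal illustration, not Vaultik's own code: the function name `blob_key` and the optional `prefix` parameter are assumptions made for the example.

```python
def blob_key(blob_hash: str, prefix: str = "") -> str:
    """Build the object key for a blob, sharded by the first four hash characters."""
    if len(blob_hash) < 4:
        raise ValueError("blob hash is too short to shard")
    # blobs/<first-2-chars>/<next-2-chars>/<full-hash>
    key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
    # Hypothetical: an optional repository prefix is prepended as-is.
    return f"{prefix.rstrip('/')}/{key}" if prefix else key


if __name__ == "__main__":
    print(blob_key("cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678"))
```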

Content

  • What it contains: Packed collections of content-defined chunks from files
  • Format: Zstandard compressed, then Age encrypted
  • Encryption: Always encrypted with Age using the configured recipients
  • Naming: Content-addressed using the SHA-256 hash of the encrypted blob
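
Because the name is derived from the encrypted bytes, a stored blob can be checked for corruption without any key. The sketch below assumes the name is the lowercase hex digest, as in the example path; `blob_name` and `matches_name` are illustrative names, not part of Vaultik.

```python
import hashlib


def blob_name(encrypted_blob: bytes) -> str:
    """Return the content address: SHA-256 of the encrypted blob, hex-encoded."""
    return hashlib.sha256(encrypted_blob).hexdigest()


def matches_name(object_name: str, encrypted_blob: bytes) -> bool:
    """Check a downloaded blob against its object name; no decryption is needed."""
    return blob_name(encrypted_blob) == object_name.lower()
```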

Why Encrypted

Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

Metadata Directory (metadata/)

Each snapshot has its own subdirectory named with the snapshot ID.

Snapshot ID Format

  • Format: <hostname>-<YYYYMMDD>-<HHMMSSZ>
  • Example: laptop-20240115-143052Z
  • Components:
    • Hostname (may contain hyphens)
    • Date in YYYYMMDD format
    • Time in HHMMSSZ format (Z indicates UTC)
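
Since the hostname may itself contain hyphens, a parser has to split the ID from the right. A minimal sketch, assuming the date and time parts never contain hyphens (the function name `parse_snapshot_id` is hypothetical):

```python
from datetime import datetime, timezone


def parse_snapshot_id(snapshot_id: str) -> tuple[str, datetime]:
    """Split '<hostname>-<YYYYMMDD>-<HHMMSSZ>' into hostname and UTC timestamp."""
    parts = snapshot_id.rsplit("-", 2)  # split from the right: hostnames may contain '-'
    if len(parts) != 3:
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    hostname, date_part, time_part = parts
    if not hostname or not time_part.endswith("Z"):
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    # Drop the trailing 'Z' and parse the remaining digits; 'Z' means UTC.
    taken = datetime.strptime(date_part + time_part[:-1], "%Y%m%d%H%M%S")
    return hostname, taken.replace(tzinfo=timezone.utc)
```

For example, `parse_snapshot_id("my-web-01-20240115-143052Z")` yields the hostname `my-web-01`.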

Files in Each Snapshot Directory

db.zst.age - Encrypted Database Dump

  • What it contains: Complete SQLite database dump for this snapshot
  • Format: SQL dump → Zstandard compressed → Age encrypted
  • Encryption: Encrypted with Age
  • Purpose: Contains full file metadata, chunk mappings, and all relationships
  • Why encrypted: Contains sensitive metadata such as file paths, permissions, and ownership

manifest.json.zst - Unencrypted Blob Manifest

  • What it contains: JSON object listing all blob hashes referenced by this snapshot
  • Format: JSON → Zstandard compressed (NOT encrypted)
  • Encryption: NOT encrypted
  • Purpose: Enables pruning operations without requiring decryption keys
  • Structure:
{
  "snapshot_id": "laptop-20240115-143052Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
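
A reader of the manifest might validate it along these lines. The sketch works on already-decompressed JSON bytes, since Zstandard decompression needs a library that is not assumed here. Treating `blob_count` as equal to the length of `blobs` is an assumption drawn from the field names, not a documented guarantee.

```python
import json

REQUIRED_FIELDS = ("snapshot_id", "timestamp", "blob_count", "blobs")


def parse_manifest(raw_json: bytes) -> dict:
    """Parse decompressed manifest JSON and run basic sanity checks."""
    manifest = json.loads(raw_json)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    for field in REQUIRED_FIELDS:
        if field not in manifest:
            raise ValueError(f"manifest is missing field {field!r}")
    # Assumption: blob_count should equal the number of listed hashes.
    if manifest["blob_count"] != len(manifest["blobs"]):
        raise ValueError("blob_count does not match the number of blob hashes")
    return manifest
```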

Why Manifest is Unencrypted

The manifest must be readable without the private key to enable:

  1. Pruning operations - Identifying unreferenced blobs for deletion
  2. Storage analysis - Understanding space usage without decryption
  3. Verification - Checking blob existence without decryption
  4. Cross-snapshot deduplication analysis - Finding shared blobs between snapshots

Apart from the snapshot ID, timestamp, and blob count, the manifest contains only blob hashes; it holds no file names, paths, or other file metadata.
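
As an illustration of the cross-snapshot analysis mentioned above, the following sketch counts, for each snapshot, how many of its blobs are shared with other snapshots. It assumes manifests have already been decompressed and parsed into dictionaries with the `snapshot_id` and `blobs` fields shown earlier; `sharing_report` is a hypothetical name.

```python
from collections import Counter


def sharing_report(manifests: list[dict]) -> dict[str, dict[str, int]]:
    """Per snapshot, count blobs shared with other snapshots vs. unique to it."""
    # Number of snapshots that reference each blob hash.
    refs = Counter(h for m in manifests for h in set(m["blobs"]))
    report = {}
    for m in manifests:
        hashes = set(m["blobs"])
        shared = sum(1 for h in hashes if refs[h] > 1)
        report[m["snapshot_id"]] = {"shared": shared, "unique": len(hashes) - shared}
    return report
```

Nothing in this analysis touches a private key, which is the point of leaving the manifest unencrypted.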

Security Considerations

What's Encrypted

  • All file content (in blobs)
  • All file metadata: paths, permissions, timestamps, and ownership (in db.zst.age)
  • File-to-chunk mappings (in db.zst.age)

What's Not Encrypted

  • Blob hashes (in manifest.json.zst)
  • Snapshot IDs (directory names and manifest.json.zst)
  • Snapshot timestamps (in manifest.json.zst)
  • Blob count per snapshot (in manifest.json.zst)

Privacy Implications

From the unencrypted data and the object listing, an observer with read access to the bucket can determine:

  • When backups were taken (from snapshot IDs)
  • Which hostname created backups (from snapshot IDs)
  • How many blobs each snapshot references
  • Which blobs are shared between snapshots (deduplication patterns)
  • The size of each encrypted blob

An observer cannot determine:

  • File names or paths
  • File contents
  • File permissions or ownership
  • Directory structure of the backed-up files
  • Which chunks belong to which files

Consistency Guarantees

  1. Blobs are immutable - Once written, a blob is never modified
  2. Blobs are written before metadata - A snapshot's metadata is only written after all its blobs are successfully uploaded
  3. Metadata is written atomically - Both db.zst.age and manifest.json.zst are written as complete files
  4. Snapshots are marked complete in local DB only after metadata upload - Ensures consistency between local and remote state
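
The ordering in these guarantees can be sketched against an in-memory stand-in for the object store. Everything here is hypothetical (`upload_snapshot`, the `store` dictionary, the `mark_complete` callback), and the relative order of the two metadata files is an arbitrary choice for the example; only the blobs-then-metadata-then-local-marker sequence comes from the list above.

```python
from typing import Callable


def upload_snapshot(
    store: dict[str, bytes],
    snapshot_id: str,
    blobs: dict[str, bytes],
    db_dump: bytes,
    manifest: bytes,
    mark_complete: Callable[[str], None],
) -> None:
    """Write a snapshot in the order: blobs, then metadata, then local marker."""
    # 1. Blobs first. Blobs are immutable, so an existing key is never overwritten.
    for key, data in blobs.items():
        store.setdefault(key, data)
    # 2. Metadata only after every blob has been stored; each file is written whole.
    base = f"metadata/{snapshot_id}"
    store[f"{base}/db.zst.age"] = db_dump
    store[f"{base}/manifest.json.zst"] = manifest
    # 3. The local database is updated last.
    mark_complete(snapshot_id)
```

If any step raises, later steps never run, so a manifest is never visible for blobs that failed to upload.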

Pruning Safety

The prune operation is safe because:

  1. It only deletes blobs not referenced in any manifest
  2. Manifests are unencrypted and can be read without keys
  3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
  4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs
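
The prune decision described above reduces to a set difference guarded by a consistency check. A minimal sketch, assuming manifests are already parsed and that the stored blob hashes come from listing the blobs/ prefix; `plan_prune` and its parameters are illustrative names, not Vaultik's API.

```python
def plan_prune(
    stored_blobs: set[str],
    manifests: list[dict],
    latest_local_id: str,
    latest_remote_id: str,
) -> set[str]:
    """Return the blob hashes that no manifest references."""
    # Refuse to prune when local and remote state disagree.
    if latest_local_id != latest_remote_id:
        raise RuntimeError("latest local and remote snapshots differ; refusing to prune")
    referenced: set[str] = set()
    for manifest in manifests:
        referenced.update(manifest["blobs"])
    # Only blobs absent from every manifest are candidates for deletion.
    return stored_blobs - referenced
```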

Restoration Requirements

To restore from a backup, you need:

  1. The Age private key - To decrypt the blobs and the database dump
  2. The snapshot metadata - Both files from the snapshot's metadata directory
  3. All referenced blobs - As listed in the manifest
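
Before starting a restore, the third requirement can be checked from the manifest alone. A small sketch, assuming a parsed manifest and a set of blob hashes obtained by listing the bucket (`missing_blobs` is a hypothetical helper):

```python
def missing_blobs(manifest: dict, available: set[str]) -> list[str]:
    """List blob hashes the manifest references that are absent from the store."""
    return sorted(set(manifest["blobs"]) - available)
```

An empty result means every blob the snapshot needs is present.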

The restoration process:

  1. Download, decrypt, and decompress the database dump to recover file metadata and chunk mappings
  2. Download, decrypt, and decompress the required blobs
  3. Reconstruct files from their chunks
  4. Restore file metadata (permissions, timestamps, etc.)
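
Step 3 can be illustrated with a deliberately simplified model. This document does not describe the database schema, so the `ChunkRef` layout below (blob hash, offset, length) is purely an assumption for the sketch, and `plain_blobs` stands for blobs that have already been decrypted and decompressed.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical chunk location: which blob, where in it, and how long."""
    blob_hash: str
    offset: int
    length: int


def reconstruct_file(chunk_refs: list[ChunkRef], plain_blobs: dict[str, bytes]) -> bytes:
    """Concatenate a file's chunks, in order, from decrypted blob payloads."""
    parts = []
    for ref in chunk_refs:
        blob = plain_blobs[ref.blob_hash]
        chunk = blob[ref.offset : ref.offset + ref.length]
        if len(chunk) != ref.length:
            raise ValueError(f"chunk extends past the end of blob {ref.blob_hash}")
        parts.append(chunk)
    return b"".join(parts)
```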