Vaultik S3 Repository Structure
This document describes the structure and organization of data stored in the S3 bucket by Vaultik.
Overview
Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:
- Blobs - The actual backup data (content-addressed, encrypted)
- Metadata - Snapshot information and manifests (partially encrypted)
Directory Structure
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
Blobs Directory (blobs/)
Structure
- Path format: blobs/<first-2-chars>/<next-2-chars>/<full-hash>
- Example: blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
- Sharding: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
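The path rule above can be sketched as a small helper. This is a minimal illustration, not Vaultik's own code: the function name `blob_key`, the lowercase-hex validation, and the way the repository prefix is joined are all assumptions.

```python
def blob_key(blob_hash: str, prefix: str = "") -> str:
    """Return the object key for a blob, sharded by the first four hash characters."""
    # Assumption: hashes are 64-character lowercase hex digests, as in the example above.
    if len(blob_hash) != 64 or any(c not in "0123456789abcdef" for c in blob_hash):
        raise ValueError("expected a 64-character lowercase hex SHA-256 digest")
    key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
    # Assumption: the optional repository prefix is joined with a single slash.
    cleaned = prefix.strip("/")
    return f"{cleaned}/{key}" if cleaned else key


example = "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678"
print(blob_key(example))
```

With two hex characters per level, each level has at most 256 entries, so the blobs are spread over up to 65,536 leaf directories.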
Content
- What it contains: Packed collections of content-defined chunks from files
- Format: Zstandard compressed, then Age encrypted
- Encryption: Always encrypted with Age using the configured recipients
- Naming: Content-addressed using SHA256 hash of the encrypted blob
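Because the name is the SHA256 of the encrypted blob, anyone holding the stored bytes can check a blob's integrity without the private key. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the compression and encryption steps are assumed to have already happened.

```python
import hashlib


def blob_name(encrypted_blob: bytes) -> str:
    """Name a blob by the SHA-256 of its final (compressed, then encrypted) bytes."""
    return hashlib.sha256(encrypted_blob).hexdigest()


def verify_blob(expected_name: str, stored_bytes: bytes) -> bool:
    """Check that the bytes fetched from storage still match the blob's name."""
    return blob_name(stored_bytes) == expected_name


# Stand-in bytes for illustration only; a real blob is Age ciphertext.
ciphertext = b"stand-in for age-encrypted bytes"
name = blob_name(ciphertext)
print(name, verify_blob(name, ciphertext))
```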
Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.
Metadata Directory (metadata/)
Each snapshot has its own subdirectory named with the snapshot ID.
Snapshot ID Format
- Format: <hostname>_<snapshot-name>_<RFC3339> (or <hostname>_<RFC3339> if no name was specified)
- Example: laptop_home_2024-01-15T14:30:52Z
- Components:
  - Short hostname (the FQDN with everything after the first dot stripped)
  - Snapshot name from the configured snapshots: map (optional)
  - RFC3339 UTC timestamp
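As an illustration of the format, the sketch below builds a snapshot ID, taking the short hostname to be the part of the FQDN before the first dot. The function names are hypothetical, and the second-level timestamp precision is assumed from the example above.

```python
from datetime import datetime, timezone


def short_hostname(fqdn: str) -> str:
    """Keep only the label before the first dot of the FQDN."""
    return fqdn.split(".", 1)[0]


def snapshot_id(fqdn: str, when: datetime, name: str = "") -> str:
    """Join hostname, optional snapshot name, and an RFC3339 UTC timestamp."""
    # `when` should be timezone-aware; it is converted to UTC before formatting.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = [short_hostname(fqdn)]
    if name:
        parts.append(name)
    parts.append(stamp)
    return "_".join(parts)


taken = datetime(2024, 1, 15, 14, 30, 52, tzinfo=timezone.utc)
print(snapshot_id("laptop.example.com", taken, "home"))
```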
Files in Each Snapshot Directory
db.zst.age - Encrypted Database
- What it contains: Pruned binary SQLite database for this snapshot
- Format: Binary SQLite → Zstandard compressed → Age encrypted
- Encryption: Encrypted with Age
- Purpose: Contains full file metadata, chunk mappings, and all relationships
- Why encrypted: Contains sensitive metadata like file paths, permissions, and ownership
manifest.json.zst - Unencrypted Blob Manifest
- What it contains: JSON list of all blob hashes referenced by this snapshot
- Format: JSON → Zstandard compressed (NOT encrypted)
- Encryption: NOT encrypted
- Purpose: Enables pruning operations without requiring decryption keys
- Structure:
{
  "snapshot_id": "laptop_home_2024-01-15T14:30:52Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
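A reader of the manifest might load and sanity-check it roughly as follows. This sketch starts from the already-decompressed JSON text (Zstandard decompression is left out to keep it standard-library only), and it assumes that blob_count is meant to equal the length of the blobs list; `parse_manifest` is a hypothetical name.

```python
import json


def parse_manifest(raw_json: str) -> dict:
    """Parse a decompressed manifest and check its internal consistency."""
    manifest = json.loads(raw_json)
    for field in ("snapshot_id", "timestamp", "blob_count", "blobs"):
        if field not in manifest:
            raise ValueError(f"manifest is missing field: {field}")
    # Assumption: blob_count is expected to match the number of listed hashes.
    if manifest["blob_count"] != len(manifest["blobs"]):
        raise ValueError("blob_count does not match the number of listed blobs")
    return manifest


sample = json.dumps({
    "snapshot_id": "laptop_home_2024-01-15T14:30:52Z",
    "timestamp": "2024-01-15T14:30:52Z",
    "blob_count": 2,
    "blobs": [
        "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
        "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ],
})
print(parse_manifest(sample)["snapshot_id"])
```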
Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
- Pruning operations - Identifying unreferenced blobs for deletion
- Storage analysis - Understanding space usage without decryption
- Verification - Checking blob existence without decryption
- Cross-snapshot deduplication analysis - Finding shared blobs between snapshots
Apart from the snapshot ID, timestamp, and blob count, the manifest contains only blob hashes. It holds no file names or other sensitive information.
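The analyses listed above reduce to set operations on the blob lists. The sketch below is illustrative only: it works on already-parsed manifests (dicts with snapshot_id and blobs fields), and both function names are hypothetical.

```python
from collections import Counter


def shared_blobs(manifest_a: dict, manifest_b: dict) -> set:
    """Blobs referenced by both snapshots, i.e. data deduplicated between them."""
    return set(manifest_a["blobs"]) & set(manifest_b["blobs"])


def reference_counts(manifests: list) -> Counter:
    """Count how many snapshots reference each blob hash."""
    counts = Counter()
    for manifest in manifests:
        # De-duplicate within a manifest so each snapshot counts at most once.
        counts.update(set(manifest["blobs"]))
    return counts


monday = {"snapshot_id": "laptop_2024-01-15T14:30:52Z", "blobs": ["aa", "bb"]}
tuesday = {"snapshot_id": "laptop_2024-01-16T14:30:52Z", "blobs": ["bb", "cc"]}
print(shared_blobs(monday, tuesday))
```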
Security Considerations
What's Encrypted
- All file content (in blobs)
- All file metadata (paths, permissions, timestamps, ownership in db.zst.age)
- File-to-chunk mappings (in db.zst.age)
What's Not Encrypted
- Blob hashes (in manifest.json.zst)
- Snapshot IDs (directory names)
- Blob count per snapshot (in manifest.json.zst)
Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob
An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files
Consistency Guarantees
- Blobs are immutable - Once written, a blob is never modified
- Blobs are written before metadata - A snapshot's metadata is only written after all its blobs are successfully uploaded
- Metadata is written atomically - Both db.zst.age and manifest.json.zst are written as complete files
- Snapshots are marked complete in local DB only after metadata upload - Ensures consistency between local and remote state
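The ordering described above can be sketched against an in-memory stand-in for the bucket. Everything here is hypothetical (the `upload_snapshot` function, the dict-based store and local database, and the choice to skip blobs that already exist); it only illustrates the sequence blobs, then metadata, then local completion.

```python
def upload_snapshot(store: dict, local_db: dict, snapshot_id: str,
                    blobs: dict, db_bytes: bytes, manifest_bytes: bytes) -> None:
    """Write blobs first, then metadata, then mark the snapshot complete locally."""
    # 1. Blobs. They are immutable, so an existing key is never overwritten
    #    (skipping existing keys is an assumption of this sketch).
    for blob_hash, data in blobs.items():
        key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
        if key not in store:
            store[key] = data
    # 2. Metadata, each file written whole, only after every blob is stored.
    base = f"metadata/{snapshot_id}/"
    store[base + "db.zst.age"] = db_bytes
    store[base + "manifest.json.zst"] = manifest_bytes
    # 3. Local completion marker, only after the metadata upload succeeded.
    local_db[snapshot_id] = "complete"


bucket, local = {}, {}
upload_snapshot(bucket, local, "laptop_2024-01-15T14:30:52Z",
                {"cafe" + "0" * 60: b"ciphertext"}, b"db", b"manifest")
print(list(bucket))
```

If an exception interrupts step 1, no metadata exists for the snapshot, so a reader of the bucket never sees a manifest that points at missing blobs.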
Pruning Safety
The prune operation is safe because:
- It only deletes blobs not referenced in any manifest
- Manifests are unencrypted and can be read without keys
- The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
- Pruning will fail if these don't match, preventing accidental deletion of needed blobs
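A rough sketch of the prune decision, under the rules listed above: refuse to proceed when the latest local and remote snapshot IDs differ, otherwise return the stored blobs that no manifest references. The function name and argument shapes are assumptions made for illustration.

```python
def prune_candidates(stored_blobs, manifests,
                     latest_local_id: str, latest_remote_id: str) -> set:
    """Return blob hashes that are stored but referenced by no manifest."""
    # Safety check: local and remote state must agree before anything is deleted.
    if latest_local_id != latest_remote_id:
        raise RuntimeError("latest local and remote snapshots differ; refusing to prune")
    referenced = set()
    for manifest in manifests:
        referenced.update(manifest["blobs"])
    return set(stored_blobs) - referenced


manifests = [{"blobs": ["aa", "bb"]}, {"blobs": ["bb", "cc"]}]
print(prune_candidates(["aa", "bb", "cc", "dd"], manifests, "snap-2", "snap-2"))
```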
Restoration Requirements
To restore from a backup, you need:
- The Age private key - To decrypt blobs and database
- The snapshot metadata - Both files from the snapshot's metadata directory
- All referenced blobs - As listed in the manifest
The restoration process:
- Download, decrypt, and decompress the database (db.zst.age) to understand file structure
- Download and decrypt the required blobs
- Reconstruct files from their chunks
- Restore file metadata (permissions, timestamps, etc.)
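The chunk-reassembly step can be illustrated with a deliberately simplified model. This document does not specify the database schema or the internal layout of a blob, so the sketch assumes each file is described by an ordered list of (blob hash, offset, length) references into already decrypted and decompressed blob bytes. All names and shapes here are hypothetical.

```python
def reconstruct_file(chunk_refs, plaintext_blobs: dict) -> bytes:
    """Concatenate a file's chunks in order from decrypted, decompressed blobs."""
    out = bytearray()
    for blob_hash, offset, length in chunk_refs:
        data = plaintext_blobs[blob_hash]
        chunk = data[offset:offset + length]
        # A short slice means the reference points past the end of the blob.
        if len(chunk) != length:
            raise ValueError(f"chunk extends past end of blob {blob_hash}")
        out += chunk
    return bytes(out)


# Toy data: two blobs and a file assembled from three chunk references.
blobs = {"aa": b"hello world", "bb": b"!!"}
refs = [("aa", 0, 5), ("bb", 0, 1), ("aa", 5, 6)]
print(reconstruct_file(refs, blobs))
```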