Vaultik S3 Repository Structure
This document describes the structure and organization of data stored in the S3 bucket by Vaultik.
Overview
Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:
- Blobs - The actual backup data (content-addressed, encrypted)
- Metadata - Snapshot information and manifests (partially encrypted)
Directory Structure
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
Blobs Directory (blobs/)
Structure
- Path format: blobs/<first-2-chars>/<next-2-chars>/<full-hash>
- Example: blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678
- Sharding: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects
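The path format above can be sketched as a small helper. This is an illustration only: the function name `blob_key` and the `prefix` parameter are hypothetical, and the check assumes blob hashes are 64 lowercase hex characters, as in the example.

```python
import re

# Assumption: blob hashes are 64 lowercase hex characters (SHA256),
# matching the example path shown above.
_HASH_RE = re.compile(r"[0-9a-f]{64}")


def blob_key(blob_hash: str, prefix: str = "") -> str:
    """Return the object key for a blob under the two-level sharded layout."""
    if not _HASH_RE.fullmatch(blob_hash):
        raise ValueError(f"not a SHA256 hex digest: {blob_hash!r}")
    # First two characters, then the next two, then the full hash.
    key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
    # `prefix` stands for the optional <prefix> in the directory structure.
    cleaned = prefix.strip("/")
    return f"{cleaned}/{key}" if cleaned else key
```

Calling `blob_key` with the example hash yields the example path, and a non-empty `prefix` is simply prepended.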
Content
- What it contains: Packed collections of content-defined chunks from files
- Format: Zstandard compressed, then Age encrypted
- Encryption: Always encrypted with Age using the configured recipients
- Naming: Content-addressed using SHA256 hash of the encrypted blob
Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.
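Because a blob is named after the SHA256 hash of its encrypted bytes, a name can be derived and checked without any key. The sketch below assumes the name is the standard lowercase hex digest; `blob_name` and `verify_blob` are hypothetical helper names, not Vaultik functions.

```python
import hashlib


def blob_name(encrypted_blob: bytes) -> str:
    """Derive a blob's name from its encrypted bytes (SHA256 hex digest)."""
    return hashlib.sha256(encrypted_blob).hexdigest()


def verify_blob(expected_name: str, encrypted_blob: bytes) -> bool:
    """Check that downloaded bytes still match the name they are stored under."""
    return blob_name(encrypted_blob) == expected_name
```

A mismatch from `verify_blob` would indicate that the stored object is corrupted or was stored under the wrong name.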
Metadata Directory (metadata/)
Each snapshot has its own subdirectory named with the snapshot ID.
Snapshot ID Format
- Format: <hostname>-<YYYYMMDD>-<HHMMSSZ>
- Example: laptop-20240115-143052Z
- Components:
- Hostname (may contain hyphens)
- Date in YYYYMMDD format
- Time in HHMMSSZ format (Z indicates UTC)
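Since the hostname may itself contain hyphens, a parser has to split the ID from the right. A minimal sketch, where `parse_snapshot_id` is a hypothetical helper and the field widths are taken from the format above:

```python
from datetime import datetime, timezone


def parse_snapshot_id(snapshot_id: str) -> tuple[str, datetime]:
    """Split a snapshot ID into (hostname, UTC timestamp)."""
    # Split from the right so hyphens inside the hostname are preserved.
    parts = snapshot_id.rsplit("-", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    hostname, date_part, time_part = parts
    # YYYYMMDD is 8 characters; HHMMSSZ is 7 including the trailing Z.
    if len(date_part) != 8 or len(time_part) != 7:
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    taken_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%SZ")
    # The trailing Z marks UTC, so attach the UTC timezone explicitly.
    return hostname, taken_at.replace(tzinfo=timezone.utc)
```

With this approach, an ID such as `my-laptop-20240115-143052Z` keeps `my-laptop` as the hostname.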
Files in Each Snapshot Directory
db.zst.age - Encrypted Database Dump
- What it contains: Complete SQLite database dump for this snapshot
- Format: SQL dump → Zstandard compressed → Age encrypted
- Encryption: Encrypted with Age
- Purpose: Contains full file metadata, chunk mappings, and all relationships
- Why encrypted: Contains sensitive metadata like file paths, permissions, and ownership
manifest.json.zst - Unencrypted Blob Manifest
- What it contains: JSON list of all blob hashes referenced by this snapshot
- Format: JSON → Zstandard compressed (NOT encrypted)
- Encryption: NOT encrypted
- Purpose: Enables pruning operations without requiring decryption keys
- Structure:
{
  "snapshot_id": "laptop-20240115-143052Z",
  "timestamp": "2024-01-15T14:30:52Z",
  "blob_count": 42,
  "blobs": [
    "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
    ...
  ]
}
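A reader of the manifest might validate it along these lines. This sketch starts from bytes that have already been Zstandard-decompressed (decompression needs a third-party library and is left out), and it assumes that `blob_count` equals the length of `blobs`; `load_manifest` is a hypothetical name.

```python
import json

REQUIRED_FIELDS = ("snapshot_id", "timestamp", "blob_count", "blobs")


def load_manifest(decompressed: bytes) -> dict:
    """Parse and sanity-check an already-decompressed manifest."""
    manifest = json.loads(decompressed)
    for field in REQUIRED_FIELDS:
        if field not in manifest:
            raise ValueError(f"manifest is missing field {field!r}")
    # Assumption: blob_count is the number of entries in the blobs list.
    if manifest["blob_count"] != len(manifest["blobs"]):
        raise ValueError("blob_count does not match the number of blobs")
    return manifest
```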
Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
- Pruning operations - Identifying unreferenced blobs for deletion
- Storage analysis - Understanding space usage without decryption
- Verification - Checking blob existence without decryption
- Cross-snapshot deduplication analysis - Finding shared blobs between snapshots
Apart from the snapshot ID, timestamp, and blob count, the manifest contains only blob hashes. It holds no file names or other sensitive information.
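As an illustration of the cross-snapshot analysis mentioned above, the following sketch counts how many manifests reference each blob. It assumes manifests are already parsed into dictionaries with a `blobs` list; the function names are hypothetical.

```python
from collections import Counter


def blob_reference_counts(manifests: list[dict]) -> Counter:
    """Count how many snapshots reference each blob hash."""
    counts: Counter = Counter()
    for manifest in manifests:
        # Use a set so a hash repeated within one manifest counts once.
        counts.update(set(manifest["blobs"]))
    return counts


def shared_blobs(manifests: list[dict]) -> set[str]:
    """Return the blob hashes referenced by more than one snapshot."""
    return {h for h, n in blob_reference_counts(manifests).items() if n > 1}
```

None of this requires the private key, since only hashes are compared.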
Security Considerations
What's Encrypted
- All file content (in blobs)
- All file metadata: paths, permissions, timestamps, and ownership (in db.zst.age)
- File-to-chunk mappings (in db.zst.age)
What's Not Encrypted
- Blob hashes (in manifest.json.zst)
- Snapshot IDs (directory names)
- Snapshot timestamp and blob count per snapshot (in manifest.json.zst)
Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob
An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files
Consistency Guarantees
- Blobs are immutable - Once written, a blob is never modified
- Blobs are written before metadata - A snapshot's metadata is only written after all its blobs are successfully uploaded
- Metadata is written atomically - Both db.zst.age and manifest.json.zst are written as complete files
- Snapshots are marked complete in local DB only after metadata upload - Ensures consistency between local and remote state
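The ordering described by these guarantees can be sketched against an in-memory stand-in for the bucket. Everything here is hypothetical: `store` is a plain dictionary rather than an S3 client, and `mark_complete` stands for whatever updates the local database.

```python
from typing import Callable


def upload_snapshot(
    store: dict[str, bytes],
    snapshot_id: str,
    blobs: dict[str, bytes],
    db_dump: bytes,
    manifest: bytes,
    mark_complete: Callable[[str], None],
) -> None:
    """Write blobs first, then metadata, then mark the snapshot complete."""
    # 1. Blobs are immutable: an existing key is never overwritten.
    for blob_hash, data in blobs.items():
        key = f"blobs/{blob_hash[0:2]}/{blob_hash[2:4]}/{blob_hash}"
        if key not in store:
            store[key] = data
    # 2. Metadata is written only after every blob is in place,
    #    each file as one complete object.
    store[f"metadata/{snapshot_id}/db.zst.age"] = db_dump
    store[f"metadata/{snapshot_id}/manifest.json.zst"] = manifest
    # 3. Local state is updated last, after the metadata upload.
    mark_complete(snapshot_id)
```

If any step raises, the later steps never run, so a manifest can only exist for a snapshot whose blobs were all uploaded.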
Pruning Safety
The prune operation is safe because:
- It only deletes blobs not referenced in any manifest
- Manifests are unencrypted and can be read without keys
- The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
- Pruning will fail if these don't match, preventing accidental deletion of needed blobs
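The checks above amount to a guarded set difference. A minimal sketch, assuming the stored blob hashes, the parsed manifests, and the two latest snapshot IDs have already been collected; `plan_prune` and its parameters are hypothetical names:

```python
def plan_prune(
    stored_blobs: set[str],
    manifests: list[dict],
    latest_local_snapshot: str,
    latest_remote_snapshot: str,
) -> set[str]:
    """Return the blob hashes that are safe to delete."""
    # Refuse to prune when local and remote state disagree.
    if latest_local_snapshot != latest_remote_snapshot:
        raise RuntimeError(
            "latest local snapshot does not match latest remote snapshot"
        )
    # A blob is kept if any manifest references it.
    referenced: set[str] = set()
    for manifest in manifests:
        referenced.update(manifest["blobs"])
    return stored_blobs - referenced
```

The function only plans deletions. Keeping the actual delete calls separate makes it straightforward to review or dry-run the result first.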
Restoration Requirements
To restore from a backup, you need:
- The Age private key - To decrypt blobs and database
- The snapshot metadata - Both files from the snapshot's metadata directory
- All referenced blobs - As listed in the manifest
The restoration process:
- Download and decrypt the database dump to understand file structure
- Download and decrypt the required blobs
- Reconstruct files from their chunks
- Restore file metadata (permissions, timestamps, etc.)
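Before a restore starts, the set of objects to download follows directly from the repository layout. The sketch below lists those keys for one snapshot; it does not decrypt anything or reconstruct files, and `objects_needed_for_restore` is a hypothetical helper.

```python
def objects_needed_for_restore(snapshot_id: str, blob_hashes: list[str]) -> list[str]:
    """List the object keys a restore of one snapshot has to download."""
    # Both metadata files from the snapshot's metadata directory.
    keys = [
        f"metadata/{snapshot_id}/db.zst.age",
        f"metadata/{snapshot_id}/manifest.json.zst",
    ]
    # Every blob listed in the manifest, under the sharded blobs/ layout.
    keys.extend(f"blobs/{h[0:2]}/{h[2:4]}/{h}" for h in blob_hashes)
    return keys
```

Checking that every returned key exists in the bucket is one way to confirm a snapshot is restorable before fetching any data.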