# Vaultik S3 Repository Structure

This document describes the structure and organization of data stored in the S3 bucket by Vaultik.

## Overview

Vaultik stores all backup data in an S3-compatible object store. The repository consists of two main components:

1. **Blobs** - The actual backup data (content-addressed, encrypted)
2. **Metadata** - Snapshot information and manifests (partially encrypted)

## Directory Structure

```
<bucket>/<prefix>/
├── blobs/
│   └── <hash[0:2]>/
│       └── <hash[2:4]>/
│           └── <full-hash>
└── metadata/
    └── <snapshot-id>/
        ├── db.zst.age
        └── manifest.json.zst
```

## Blobs Directory (`blobs/`)

### Structure
- **Path format**: `blobs/<first-2-chars>/<next-2-chars>/<full-hash>`
- **Example**: `blobs/ca/fe/cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678`
- **Sharding**: The two-level directory structure (using the first 4 characters of the hash) prevents any single directory from containing too many objects

### Content
- **What it contains**: Packed collections of content-defined chunks from files
- **Format**: Zstandard compressed, then Age encrypted
- **Encryption**: Always encrypted with Age using the configured recipients
- **Naming**: Content-addressed using SHA256 hash of the encrypted blob

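
The path format and naming rule above can be combined into a small helper. This is a minimal sketch, not Vaultik's own code: the function name `blob_key` is hypothetical, and the input is assumed to be the blob bytes exactly as stored (already compressed and encrypted).

```python
import hashlib


def blob_key(encrypted_blob: bytes) -> str:
    """Return the object key for a blob, relative to <bucket>/<prefix>/.

    Hypothetical helper: the name is the SHA-256 of the stored
    (compressed, then encrypted) bytes, sharded on its first four
    hex characters.
    """
    digest = hashlib.sha256(encrypted_blob).hexdigest()
    # First two characters, then the next two, then the full hash.
    return f"blobs/{digest[0:2]}/{digest[2:4]}/{digest}"


if __name__ == "__main__":
    print(blob_key(b"example ciphertext"))
```

Because the key is derived only from the stored bytes, writing the same blob twice targets the same key, which is what makes deduplication at the blob level possible.
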
### Why Encrypted
Blobs contain the actual file data from backups and must be encrypted for security. The content-addressing ensures deduplication while the encryption ensures privacy.

## Metadata Directory (`metadata/`)

Each snapshot has its own subdirectory named with the snapshot ID.

### Snapshot ID Format
- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
- **Example**: `laptop-20240115-143052Z`
- **Components**:
  - Hostname (may contain hyphens)
  - Date in YYYYMMDD format
  - Time in HHMMSSZ format (Z indicates UTC)

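
Since the hostname may itself contain hyphens, a snapshot ID has to be split from the right rather than the left. The sketch below illustrates this; `parse_snapshot_id` is a hypothetical name, not a function from the Vaultik code base.

```python
from datetime import datetime, timezone


def parse_snapshot_id(snapshot_id: str) -> tuple[str, datetime]:
    """Split '<hostname>-<YYYYMMDD>-<HHMMSSZ>' into hostname and UTC time."""
    # rsplit keeps any hyphens inside the hostname intact.
    # An ID with fewer than three parts raises ValueError on unpacking.
    hostname, date_part, time_part = snapshot_id.rsplit("-", 2)
    if not hostname:
        raise ValueError(f"missing hostname in snapshot ID: {snapshot_id!r}")
    # The trailing 'Z' is matched literally; the result is marked as UTC.
    taken_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%SZ")
    return hostname, taken_at.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_snapshot_id("laptop-20240115-143052Z"))
    print(parse_snapshot_id("web-01-prod-20240115-143052Z"))
```
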
### Files in Each Snapshot Directory

#### `db.zst.age` - Encrypted Database Dump
- **What it contains**: Complete SQLite database dump for this snapshot
- **Format**: SQL dump → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership

#### `manifest.json.zst` - Unencrypted Blob Manifest
- **What it contains**: JSON list of all blob hashes referenced by this snapshot
- **Format**: JSON → Zstandard compressed (NOT encrypted)
- **Encryption**: NOT encrypted
- **Purpose**: Enables pruning operations without requiring decryption keys
- **Structure**:
  ```json
  {
    "snapshot_id": "laptop-20240115-143052Z",
    "timestamp": "2024-01-15T14:30:52Z",
    "blob_count": 42,
    "blobs": [
      "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
      "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
      ...
    ]
  }
  ```

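
A reader of the manifest might sanity-check it before trusting it. The following is a rough sketch under stated assumptions: it takes the JSON bytes after Zstandard decompression (decompression is left out because it needs a third-party library), `load_manifest` is a hypothetical name, and the check for 64 lowercase hex characters is inferred from the example hashes above rather than from a specification.

```python
import json

HEX_DIGITS = set("0123456789abcdef")


def load_manifest(raw_json: bytes) -> dict:
    """Parse a decompressed manifest and check its internal consistency."""
    manifest = json.loads(raw_json)
    blobs = manifest["blobs"]
    # blob_count should agree with the length of the blob list.
    if manifest["blob_count"] != len(blobs):
        raise ValueError("blob_count does not match number of listed blobs")
    for blob_hash in blobs:
        # Assumption: hashes are 64 lowercase hex characters (SHA-256).
        if len(blob_hash) != 64 or not set(blob_hash) <= HEX_DIGITS:
            raise ValueError(f"malformed blob hash: {blob_hash!r}")
    return manifest


if __name__ == "__main__":
    raw = json.dumps({
        "snapshot_id": "laptop-20240115-143052Z",
        "timestamp": "2024-01-15T14:30:52Z",
        "blob_count": 2,
        "blobs": [
            "cafebabe1234567890abcdef1234567890abcdef1234567890abcdef12345678",
            "deadbeef1234567890abcdef1234567890abcdef1234567890abcdef12345678",
        ],
    }).encode()
    print(load_manifest(raw)["snapshot_id"])
```
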
### Why Manifest is Unencrypted
The manifest must be readable without the private key to enable:
1. **Pruning operations** - Identifying unreferenced blobs for deletion
2. **Storage analysis** - Understanding space usage without decryption
3. **Verification** - Checking blob existence without decryption
4. **Cross-snapshot deduplication analysis** - Finding shared blobs between snapshots

Apart from the snapshot ID, timestamp, and blob count, the manifest contains only blob hashes. It holds no file names or any other sensitive file-level information.

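
As an illustration of the cross-snapshot analysis mentioned above, the sketch below counts how many snapshots reference each blob, using only parsed manifests. The function names are hypothetical, and the manifests are assumed to be dictionaries with the `blobs` field shown in the structure example.

```python
from collections import Counter


def blob_reference_counts(manifests: list[dict]) -> Counter:
    """Count, per blob hash, how many snapshots reference it."""
    counts: Counter = Counter()
    for manifest in manifests:
        # set() guards against a hash listed twice in one manifest.
        counts.update(set(manifest["blobs"]))
    return counts


def shared_blobs(manifests: list[dict]) -> set[str]:
    """Blob hashes referenced by more than one snapshot."""
    return {h for h, n in blob_reference_counts(manifests).items() if n > 1}


if __name__ == "__main__":
    monday = {"snapshot_id": "laptop-20240115-143052Z", "blobs": ["aa", "bb"]}
    tuesday = {"snapshot_id": "laptop-20240116-143052Z", "blobs": ["bb", "cc"]}
    print(shared_blobs([monday, tuesday]))
```

No key material is involved at any point, which is the property the unencrypted manifest is meant to provide.
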
## Security Considerations

### What's Encrypted
- **All file content** (in blobs)
- **All file metadata** (paths, permissions, timestamps, ownership in db.zst.age)
- **File-to-chunk mappings** (in db.zst.age)

### What's Not Encrypted
- **Blob hashes** (in manifest.json.zst)
- **Snapshot IDs** (directory names)
- **Blob count per snapshot** (in manifest.json.zst)

### Privacy Implications
From the unencrypted data, an observer can determine:
- When backups were taken (from snapshot IDs)
- Which hostname created backups (from snapshot IDs)
- How many blobs each snapshot references
- Which blobs are shared between snapshots (deduplication patterns)
- The size of each encrypted blob

An observer cannot determine:
- File names or paths
- File contents
- File permissions or ownership
- Directory structure
- Which chunks belong to which files

## Consistency Guarantees

1. **Blobs are immutable** - Once written, a blob is never modified
2. **Blobs are written before metadata** - A snapshot's metadata is only written after all its blobs are successfully uploaded
3. **Metadata is written atomically** - Both db.zst.age and manifest.json.zst are written as complete files
4. **Snapshots are marked complete in local DB only after metadata upload** - Ensures consistency between local and remote state

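
The ordering in these guarantees can be illustrated with a toy in-memory store. Everything here is a hypothetical sketch: `MemoryStore`, `upload_snapshot`, and the `mark_complete` callback stand in for the real S3 client and local database, and the relative order of the two metadata files is an arbitrary choice the document does not specify.

```python
from typing import Callable


class MemoryStore:
    """Stand-in for an S3 bucket: a dict from object key to bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def upload_snapshot(
    store: MemoryStore,
    snapshot_id: str,
    blobs: dict[str, bytes],
    db_dump: bytes,
    manifest: bytes,
    mark_complete: Callable[[str], None],
) -> None:
    # 1. Blobs first. Existing keys are skipped, never overwritten.
    for key, data in blobs.items():
        if key not in store.objects:
            store.put(key, data)
    # 2. Metadata only after every blob upload has returned.
    prefix = f"metadata/{snapshot_id}/"
    store.put(prefix + "db.zst.age", db_dump)
    store.put(prefix + "manifest.json.zst", manifest)
    # 3. Local completion marker last.
    mark_complete(snapshot_id)


if __name__ == "__main__":
    store, completed = MemoryStore(), []
    upload_snapshot(store, "laptop-20240115-143052Z",
                    {"blobs/ca/fe/cafe": b"ciphertext"},
                    b"db", b"manifest", completed.append)
    print(list(store.objects), completed)
```

If any blob upload raises, the function exits before step 2, so no metadata for an incomplete snapshot reaches the store and the snapshot is never marked complete locally.
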
## Pruning Safety

The prune operation is safe because:
1. It only deletes blobs not referenced in any manifest
2. Manifests are unencrypted and can be read without keys
3. The operation compares the latest local DB snapshot with the latest S3 snapshot to ensure consistency
4. Pruning will fail if these don't match, preventing accidental deletion of needed blobs

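
The selection logic behind these points amounts to a set difference guarded by a consistency check. The sketch below is illustrative only: the function names are hypothetical, and the inputs (a listing of stored blob hashes, parsed manifests, and the two latest snapshot IDs) are assumed to have been gathered elsewhere.

```python
from typing import Iterable


def unreferenced_blobs(stored_hashes: Iterable[str], manifests: list[dict]) -> set[str]:
    """Blobs present in the store that no manifest references."""
    referenced: set[str] = set()
    for manifest in manifests:
        referenced.update(manifest["blobs"])
    return set(stored_hashes) - referenced


def plan_prune(
    stored_hashes: Iterable[str],
    manifests: list[dict],
    latest_local_id: str,
    latest_remote_id: str,
) -> set[str]:
    # Refuse to prune when local and remote state disagree.
    if latest_local_id != latest_remote_id:
        raise RuntimeError("latest local and remote snapshots differ; refusing to prune")
    return unreferenced_blobs(stored_hashes, manifests)


if __name__ == "__main__":
    manifests = [{"blobs": ["aa", "bb"]}, {"blobs": ["bb", "cc"]}]
    print(plan_prune(["aa", "bb", "cc", "dd"], manifests,
                     "laptop-20240116-143052Z", "laptop-20240116-143052Z"))
```
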
## Restoration Requirements

To restore from a backup, you need:
1. **The Age private key** - To decrypt blobs and database
2. **The snapshot metadata** - Both files from the snapshot's metadata directory
3. **All referenced blobs** - As listed in the manifest

The restoration process:
1. Download and decrypt the database dump to understand file structure
2. Download and decrypt the required blobs
3. Reconstruct files from their chunks
4. Restore file metadata (permissions, timestamps, etc.)
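
Step 3 of the restoration process can be pictured as concatenating byte ranges taken from decrypted blobs. The sketch below rests on an assumption this document does not confirm: that the database maps each file to an ordered list of chunk references of the form (blob hash, offset, length). `ChunkRef` and `reassemble` are hypothetical names.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical chunk location inside a decrypted, decompressed blob."""
    blob_hash: str
    offset: int
    length: int


def reassemble(chunks: list[ChunkRef], blobs: dict[str, bytes]) -> bytes:
    """Rebuild one file's content from its ordered chunk references."""
    out = bytearray()
    for ref in chunks:
        # A missing blob surfaces as KeyError.
        data = blobs[ref.blob_hash]
        piece = data[ref.offset:ref.offset + ref.length]
        if len(piece) != ref.length:
            raise ValueError(f"chunk extends past end of blob {ref.blob_hash}")
        out += piece
    return bytes(out)


if __name__ == "__main__":
    blobs = {"aa": b"hello world", "bb": b"!!"}
    refs = [ChunkRef("aa", 0, 5), ChunkRef("aa", 5, 6), ChunkRef("bb", 0, 1)]
    print(reassemble(refs, blobs))
```
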