# vaultik (ваултик)

WIP: pre-1.0, some functions may not be fully implemented yet

`vaultik` is an incremental backup daemon written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.

It includes table-stakes features such as:

* modern encryption (the excellent `age`)
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database, enables write-only
  incremental backups to destination
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
  down) even if the source system has many small files

## why

Existing backup software has one or more of these problems:

* Requires secrets (passwords, private keys) on the source system, which
  compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Creates one-blob-per-file, which results in excessive S3 operation counts
* Is slow

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.

My requirements are:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.

## design goals

1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, ChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
   to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.

## what

`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific SQLite
database of metadata is created, encrypted, and uploaded alongside the
blobs.

No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.

## how

1. **install**

   ```sh
   go install git.eeqj.de/sneak/vaultik@latest
   ```

1. **generate keypair**

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

1. **write config**

   ```yaml
   # Named snapshots - each snapshot can contain multiple paths
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       exclude:
         - '*.cache'  # Snapshot-specific exclusions
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

   # Global exclusions (apply to all snapshots)
   exclude:
     - '*.log'
     - '*.tmp'
     - '.git'
     - 'node_modules'

   age_recipients:
     - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
   s3:
     endpoint: https://s3.example.com
     bucket: vaultik-data
     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
     region: us-east-1
   backup_interval: 1h
   full_scan_interval: 24h
   min_time_between_run: 15m
   chunk_size: 10MB
   blob_size_limit: 1GB
   ```

1. **run**

   ```sh
   # Create all configured snapshots
   vaultik --config /etc/vaultik.yaml snapshot create

   # Create specific snapshots by name
   vaultik --config /etc/vaultik.yaml snapshot create home system

   # Silent mode for cron
   vaultik --config /etc/vaultik.yaml snapshot create --cron
   ```

---

## cli

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```

### environment

* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.

### command details

**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt
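
Durations such as `30d`, `6mo`, and `1y` use units that Go's `time.ParseDuration` does not accept. As a rough sketch of how such values could be parsed, the hypothetical helper `parseAge` below assumes that a day is 24 hours, a month is 30 days, and a year is 365 days. Those unit lengths and the function name are illustrative assumptions, not vaultik's confirmed behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseAge converts strings such as "30d", "6mo", or "1y" into a
// time.Duration. The unit lengths are assumptions for illustration:
// 1d = 24h, 1mo = 30d, 1y = 365d.
func parseAge(s string) (time.Duration, error) {
	const day = 24 * time.Hour
	units := []struct {
		suffix string
		length time.Duration
	}{
		{"mo", 30 * day},
		{"y", 365 * day},
		{"d", day},
	}
	for _, u := range units {
		if num, ok := strings.CutSuffix(s, u.suffix); ok {
			n, err := strconv.Atoi(num)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("invalid duration %q", s)
			}
			return time.Duration(n) * u.length, nil
		}
	}
	// Anything else (for example "12h" or "15m") goes to the standard library.
	return time.ParseDuration(s)
}
```

With these assumptions, `parseAge("6mo")` yields 180 days, and a value like `15m` still means fifteen minutes because it falls through to `time.ParseDuration`.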

**snapshot remove**: Remove a specific snapshot
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt

**snapshot prune**: Clean orphaned data from local database

**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories

**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs

**info**: Display system and configuration information

**store info**: Display S3 bucket configuration and storage statistics

---

## architecture

### s3 bucket layout

```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    ├── <snapshot_id>/
    │   ├── db.zst.age
    │   └── manifest.json.zst
```

* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
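
The sharded blob path can be derived from the hash alone. The following is a minimal sketch of that mapping; the function name `blobKey` and the way the configured prefix is joined are assumptions for illustration, not taken from the vaultik source.

```go
package main

import (
	"fmt"
	"path"
)

// blobKey returns the object key for a blob under the documented layout
// blobs/<aa>/<bb>/<full_blob_hash>, where <aa> and <bb> are the first and
// second pairs of hex characters of the hash. Prefix handling is assumed.
func blobKey(prefix, blobHash string) (string, error) {
	if len(blobHash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", blobHash)
	}
	return path.Join(prefix, "blobs", blobHash[0:2], blobHash[2:4], blobHash), nil
}
```

For example, with the prefix `host1/` a hash beginning `aa12` would land under `host1/blobs/aa/12/`.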

### blob manifest format

The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:

```json
{
  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0..."
  ]
}
```

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).

### local sqlite schema

```sql
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL
);

CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT NOT NULL UNIQUE,
    uncompressed INTEGER NOT NULL,
    compressed INTEGER NOT NULL,
    uploaded_at INTEGER
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL,
    total_size INTEGER NOT NULL,
    blob_size INTEGER NOT NULL,
    compression_ratio REAL NOT NULL
);

CREATE TABLE snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id)
);

CREATE TABLE snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id)
);
```
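
To make the `blob_chunks` columns concrete, the sketch below packs chunks into one in-memory blob and records each chunk's `offset` and `length`, skipping chunks it has already seen. It is a simplified illustration: the `packer` type is hypothetical, SHA-256 is an assumed hash function (this README does not name one), and compression, encryption, and upload are omitted.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
)

// blobChunk mirrors one row of the blob_chunks table (minus blob_hash,
// which is only known once the blob is finished).
type blobChunk struct {
	ChunkHash string
	Offset    int
	Length    int
}

// packer accumulates unique chunks into a single blob buffer.
type packer struct {
	buf  bytes.Buffer
	rows []blobChunk
	seen map[string]bool
}

func newPacker() *packer { return &packer{seen: make(map[string]bool)} }

// add appends the chunk unless an identical chunk was already packed, and
// reports whether the chunk was new. SHA-256 is an assumption here.
func (p *packer) add(chunk []byte) bool {
	sum := sha256.Sum256(chunk)
	hash := hex.EncodeToString(sum[:])
	if p.seen[hash] {
		return false
	}
	p.seen[hash] = true
	p.rows = append(p.rows, blobChunk{ChunkHash: hash, Offset: p.buf.Len(), Length: len(chunk)})
	p.buf.Write(chunk)
	return true
}
```

Recording offsets at pack time is what later lets a restore slice individual chunks back out of a decrypted, decompressed blob.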

### data flow

#### backup

1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support
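
The mtime/size check in step 2 can be sketched as a comparison between the indexed state and what the walk finds. The types and names below are hypothetical, and the maps stand in for the SQLite index and the directory walk.

```go
package main

import "sort"

// fileState holds the two fields compared against the local index.
type fileState struct {
	Mtime int64 // modification time; Unix seconds is an assumption
	Size  int64
}

// changedFiles returns, in sorted order, the paths in current that are new
// or whose mtime or size differs from the indexed state.
func changedFiles(index, current map[string]fileState) []string {
	var changed []string
	for p, cur := range current {
		old, ok := index[p]
		if !ok || old != cur {
			changed = append(changed, p)
		}
	}
	sort.Strings(changed)
	return changed
}
```

Only the paths returned here would need to be re-chunked; unchanged files can reuse the chunk list already stored in the index.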

#### restore

1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid
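
Steps 4 to 6 amount to slicing chunks out of blobs and concatenating them in `file_chunks` order. Here is a minimal in-memory sketch, assuming the blobs have already been downloaded, decrypted, and decompressed; the names are hypothetical and the maps stand in for the `blob_chunks` table and the fetched blobs.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkLocation says where a chunk lives inside a plaintext blob
// (the offset and length columns of blob_chunks).
type chunkLocation struct {
	BlobHash string
	Offset   int
	Length   int
}

// reconstruct concatenates a file's chunks in the given order. blobs maps a
// blob hash to its decrypted, decompressed contents.
func reconstruct(chunkHashes []string, locs map[string]chunkLocation, blobs map[string][]byte) ([]byte, error) {
	var out bytes.Buffer
	for _, h := range chunkHashes {
		loc, ok := locs[h]
		if !ok {
			return nil, fmt.Errorf("chunk %s not found in any blob", h)
		}
		blob, ok := blobs[loc.BlobHash]
		if !ok || loc.Offset < 0 || loc.Length < 0 || loc.Offset+loc.Length > len(blob) {
			return nil, fmt.Errorf("chunk %s: blob %s missing or too short", h, loc.BlobHash)
		}
		out.Write(blob[loc.Offset : loc.Offset+loc.Length])
	}
	return out.Bytes(), nil
}
```

A real restore would stream to disk instead of buffering whole files in memory; the buffer keeps the sketch short.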

#### prune

1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set
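
The prune steps are a set difference between stored blobs and referenced blobs. A minimal sketch, with a hypothetical function name and plain slices standing in for the manifest and bucket listings:

```go
package main

import "sort"

// orphanedBlobs returns, in sorted order, the stored blob hashes that no
// manifest references. Each element of manifests is one manifest's
// blob_hashes list.
func orphanedBlobs(manifests [][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, hashes := range manifests {
		for _, h := range hashes {
			referenced[h] = struct{}{}
		}
	}
	var orphans []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans)
	return orphans
}
```

Because the manifests are unencrypted, this comparison needs no private key.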

### chunking

* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
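
To illustrate the idea behind content-defined chunking, the sketch below cuts data wherever a rolling "gear" hash has its low bits all zero, bounded by minimum and maximum chunk sizes. This is a simplified gear-hash chunker, not FastCDC itself (it omits FastCDC's normalized chunking), and the table seed, function name, and parameters are all assumptions for illustration.

```go
package main

// gear holds one pseudo-random 64-bit value per byte value. It is filled
// deterministically (splitmix64 steps) so boundaries are reproducible.
var gear = func() [256]uint64 {
	var t [256]uint64
	x := uint64(0x9E3779B97F4A7C15)
	for i := range t {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		t[i] = z ^ (z >> 31)
	}
	return t
}()

// split cuts data where the rolling hash has its low maskBits bits equal to
// zero, never producing a chunk longer than maxSize and never cutting on
// the hash before minSize bytes. The returned slices alias data.
func split(data []byte, minSize, maxSize int, maskBits uint) [][]byte {
	mask := uint64(1)<<maskBits - 1
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		h = h<<1 + gear[b]
		n := i - start + 1
		if (n >= minSize && h&mask == 0) || n >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}
```

Because boundaries depend on the bytes themselves and not on fixed offsets, the same content tends to produce the same chunks even when data is inserted earlier in a file, which is what makes chunk-level deduplication effective.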

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted

### compression

* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency

---

## does not

* Store any secrets on the backed-up machine
* Require mutable remote metadata
* Use tarballs, restic, rsync, or ssh
* Require a symmetric passphrase or password
* Trust the source system with anything

## does

* Incremental deduplicated backup
* Blob-packed chunk encryption
* Content-addressed immutable blobs
* Public-key encryption only
* SQLite-based local and snapshot metadata
* Fully stream-processed storage

---

## requirements

* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)