Add custom types, version command, and restore --verify flag
- Add internal/types package with type-safe wrappers for IDs, hashes, paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, and Go version
- Add `--verify` flag to the restore command that checksums all restored files against expected chunk hashes, with a progress bar
- Remove fetch.go (dead code; functionality lives in restore)
- Clean up TODO.md, removing completed items
- Update all database and snapshot code to use the new custom types
README.md
WIP: pre-1.0, some functions may not be fully implemented yet

`vaultik` is an incremental backup daemon written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
It includes table-stakes features such as:

* does not create huge numbers of small files (to keep S3 operation counts
  down) even if the source system has many small files
## why

Existing backup software fails under one or more of these conditions:

* Creates one-blob-per-file, which results in excessive S3 operation counts
* Is slow

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.

My requirements are:

* Open source
* No passphrases or private keys on the source host
* Incremental
* Compressed
* Encrypted
* S3-compatible without an intermediate step or tool

Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.

## design goals

1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
   to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.

## what

`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific SQLite
database of metadata is created, encrypted, and uploaded alongside the
blobs.

No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.

## how

1. **install**

   ```sh
   go install git.eeqj.de/sneak/vaultik@latest
   ```

1. **generate keypair**

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

1. **write config**

   ```yaml
   # Named snapshots - each snapshot can contain multiple paths
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       exclude:
         - '*.cache' # Snapshot-specific exclusions
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

   # Global exclusions (apply to all snapshots)
   exclude:
     - '*.log'
     - '*.tmp'
     - '.git'
     - 'node_modules'

   age_recipients:
     - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj

   s3:
     # endpoint is optional if using AWS S3, but who even does that?
     endpoint: https://s3.example.com
     bucket: vaultik-data
     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
     region: us-east-1

   backup_interval: 1h
   full_scan_interval: 24h
   min_time_between_run: 15m
   chunk_size: 10MB
   blob_size_limit: 1GB
   ```

1. **run**

   ```sh
   # Create all configured snapshots
   vaultik --config /etc/vaultik.yaml snapshot create

   # Create specific snapshots by name
   vaultik --config /etc/vaultik.yaml snapshot create home system

   # Silent mode for cron
   vaultik --config /etc/vaultik.yaml snapshot create --cron
   ```

---

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```

### environment

* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.

### command details

**snapshot create**: Perform incremental backup of configured snapshots

* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup

**snapshot list**: List all snapshots with their timestamps and sizes

* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity

* `--deep`: Download and verify blob hashes (not just existence)

**snapshot purge**: Remove old snapshots based on criteria

* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot remove**: Remove a specific snapshot

* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt

**snapshot prune**: Clean orphaned data from local database

**restore**: Restore snapshot to target directory

* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories

**prune**: Remove unreferenced blobs from remote storage

* Scans all snapshots for referenced blobs
* Deletes orphaned blobs

**info**: Display system and configuration information

**store info**: Display S3 bucket configuration and storage statistics

---

## architecture

### s3 bucket layout

```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    ├── <snapshot_id>/
    │   ├── db.zst.age
    │   └── manifest.json.zst
```

* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning

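The sharding scheme above can be sketched in Go. This is an illustrative helper only, assuming stdlib `path.Join`; `blobObjectKey` is a hypothetical name, not vaultik's actual code.

```go
package main

import (
	"fmt"
	"path"
)

// blobObjectKey builds an object key under blobs/ using the documented
// two-level sharding: the first four hex characters of the blob hash
// become two directory levels. Hypothetical sketch, not vaultik's code.
func blobObjectKey(prefix, blobHash string) string {
	return path.Join(prefix, "blobs", blobHash[0:2], blobHash[2:4], blobHash)
}

func main() {
	// → host1/blobs/aa/12/aa1234567890abcdef
	fmt.Println(blobObjectKey("host1", "aa1234567890abcdef"))
}
```

`path.Join` also cleans an empty prefix, so the same helper works with or without a configured `prefix`.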
### blob manifest format

The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:

```json
{
  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0..."
  ]
}
```

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).

### local sqlite schema

```sql
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL,
    mode INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL
);

CREATE TABLE file_chunks (
    file_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    chunk_hash TEXT NOT NULL,
    PRIMARY KEY (file_id, idx),
    FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);

CREATE TABLE chunks (
    chunk_hash TEXT PRIMARY KEY,
    size INTEGER NOT NULL
);

CREATE TABLE blobs (
    id TEXT PRIMARY KEY,
    blob_hash TEXT NOT NULL UNIQUE,
    uncompressed INTEGER NOT NULL,
    compressed INTEGER NOT NULL,
    uploaded_at INTEGER
);

CREATE TABLE blob_chunks (
    blob_hash TEXT NOT NULL,
    chunk_hash TEXT NOT NULL,
    offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);

CREATE TABLE chunk_files (
    chunk_hash TEXT NOT NULL,
    file_id TEXT NOT NULL,
    file_offset INTEGER NOT NULL,
    length INTEGER NOT NULL,
    PRIMARY KEY (chunk_hash, file_id)
);

CREATE TABLE snapshots (
    id TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    vaultik_version TEXT NOT NULL,
    started_at INTEGER NOT NULL,
    completed_at INTEGER,
    file_count INTEGER NOT NULL,
    chunk_count INTEGER NOT NULL,
    blob_count INTEGER NOT NULL,
    total_size INTEGER NOT NULL,
    blob_size INTEGER NOT NULL,
    compression_ratio REAL NOT NULL
);

CREATE TABLE snapshot_files (
    snapshot_id TEXT NOT NULL,
    file_id TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, file_id)
);

CREATE TABLE snapshot_blobs (
    snapshot_id TEXT NOT NULL,
    blob_id TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (snapshot_id, blob_id)
);
```

### data flow

#### backup

1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support

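Step 2 (change detection against the index) can be sketched in Go. The `indexEntry` type and `changed` function are hypothetical names for illustration, assuming only the mtime/size comparison described above.

```go
package main

import "fmt"

// indexEntry mirrors the mtime/size fields the local index tracks per path.
type indexEntry struct {
	mtime int64 // unix seconds
	size  int64
}

// changed reports whether a walked file needs re-chunking: it is new,
// or its mtime or size differs from the indexed values. Hypothetical
// sketch of the incremental check, not vaultik's actual code.
func changed(index map[string]indexEntry, path string, mtime, size int64) bool {
	prev, ok := index[path]
	return !ok || prev.mtime != mtime || prev.size != size
}

func main() {
	index := map[string]indexEntry{"/etc/hosts": {mtime: 100, size: 42}}
	fmt.Println(changed(index, "/etc/hosts", 100, 42)) // false: unchanged
	fmt.Println(changed(index, "/etc/hosts", 150, 42)) // true: mtime bumped
	fmt.Println(changed(index, "/etc/new", 1, 1))      // true: not indexed
}
```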
#### restore

1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid

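Step 6 (reassembling a file from its ordered chunks) can be sketched as below. `reconstruct` is a hypothetical in-memory helper: it assumes chunks have already been fetched, decrypted, and decompressed, and only shows the concatenation-by-index logic.

```go
package main

import (
	"bytes"
	"fmt"
)

// reconstruct concatenates a file's chunks in index order. chunks maps
// chunk hash to plaintext chunk bytes; order is the ordered chunk-hash
// list from file_chunks. Hypothetical sketch, not vaultik's code.
func reconstruct(order []string, chunks map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	for _, h := range order {
		data, ok := chunks[h]
		if !ok {
			return nil, fmt.Errorf("missing chunk %s", h)
		}
		buf.Write(data)
	}
	return buf.Bytes(), nil
}

func main() {
	chunks := map[string][]byte{"c1": []byte("hello "), "c2": []byte("world")}
	out, err := reconstruct([]string{"c1", "c2"}, chunks)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // → hello world
}
```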
#### prune

1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set

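The mark-and-sweep at the heart of steps 2-4 is a set difference. A minimal sketch, with `orphans` as a hypothetical helper operating on already-listed hashes:

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns blobs present in storage but referenced by no snapshot
// manifest: the delete set for prune. Hypothetical sketch, not vaultik's code.
func orphans(stored []string, referenced map[string]bool) []string {
	var dead []string
	for _, h := range stored {
		if !referenced[h] {
			dead = append(dead, h)
		}
	}
	sort.Strings(dead) // deterministic ordering for logging/deletion
	return dead
}

func main() {
	referenced := map[string]bool{"aa12": true, "bb23": true}
	fmt.Println(orphans([]string{"aa12", "bb23", "cc34"}, referenced)) // → [cc34]
}
```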
### chunking

* Content-defined chunking using FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency

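The core idea of content-defined chunking is that cut points depend only on the bytes, so an insertion early in a file shifts boundaries locally instead of invalidating every fixed-size block after it. The toy sketch below uses a naive byte-wise rolling hash with min/max bounds purely to illustrate that idea; vaultik uses FastCDC, which is gear-based and considerably more refined, and `cutPoints` and its parameters are invented for this example.

```go
package main

import "fmt"

// cutPoints returns chunk end offsets using a toy rolling hash: cut when
// the low bits of the hash are zero, bounded by min/max chunk sizes.
// Illustrative only; NOT the FastCDC algorithm vaultik actually uses.
func cutPoints(data []byte, minSize, maxSize int, mask uint32) []int {
	var cuts []int
	var h uint32
	start := 0
	for i, b := range data {
		h = h*31 + uint32(b)
		n := i - start + 1
		if (n >= minSize && h&mask == 0) || n >= maxSize {
			cuts = append(cuts, i+1)
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing partial chunk
	}
	return cuts
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog, repeatedly")
	fmt.Println(cutPoints(data, 8, 32, 0xF)) // boundaries depend only on content
}
```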
### encryption

* Each blob encrypted independently
* Metadata databases also encrypted

### compression

* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency

### state tracking

* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode

---

## does not

* Require a symmetric passphrase or password
* Trust the source system with anything

---

## does

* Incremental deduplicated backup


---

## requirements

* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.