# vaultik (ваултик)

`vaultik` is an incremental backup tool written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials stored on the backed-up system, other than those needed to PUT
encrypted objects to storage (such as S3 API keys).

Features:

* modern encryption ([age](https://age-encryption.org/), X25519 + ChaCha20-Poly1305)
* content-defined chunking with deduplication (FastCDC)
* incremental backups (only changed files are re-chunked)
* multithreaded zstd compression at configurable levels
* content-addressed immutable storage
* local state tracking in SQLite (enables write-only incremental backups)
* no mutable remote metadata
* no plaintext file paths or metadata in remote storage
* packs small files into large blobs (keeps S3 operation counts down)
* backs up regular files, symlinks, empty directories, and file permissions
* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64

## why

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. `vaultik` is for environments where you don't want to
store backup decryption keys on your hosts, only public keys for
encryption.

Requirements that no existing tool meets in combination:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* S3-compatible without an intermediate step or tool

## install

```sh
go install git.eeqj.de/sneak/vaultik@latest
```

## quick start

1. **generate keypair**

```sh
age-keygen -o agekey.txt
grep 'public key:' agekey.txt
```

2. **write config** (see `config.example.yml` for all options)

```yaml
snapshots:
  system:
    paths:
      - /etc
      - /var/lib
    exclude:
      - '*.cache'
  home:
    paths:
      - /home/user/documents
      - /home/user/photos

exclude:
  - '*.log'
  - '*.tmp'
  - '.git'
  - 'node_modules'

age_recipients:
  - age1YOUR_PUBLIC_KEY_HERE

# Storage backend (pick one):
storage_url: "s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1"
# storage_url: "file:///mnt/backups"
# storage_url: "rclone://myremote/path/to/backups"

# For s3:// URLs, credentials are still required:
s3:
  access_key_id: ...
  secret_access_key: ...
```

3. **run**

```sh
# Back up all configured snapshots
vaultik --config /etc/vaultik.yml snapshot create

# Back up specific snapshots by name
vaultik --config /etc/vaultik.yml snapshot create home system

# Silent mode for cron
vaultik --config /etc/vaultik.yml snapshot create --cron

# Back up and clean up old snapshots + orphan blobs in one shot
vaultik --config /etc/vaultik.yml snapshot create --prune

# Daily cron: back up, keep last 4 weeks of snapshots
vaultik --config /etc/vaultik.yml snapshot create --cron --prune --keep-newer-than 4w
```

---

## cli

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>] [--skip-errors]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id|--all> [--dry-run] [--force] [--remote] [--json]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...] [--verify]
vaultik [--config <path>] prune [--force] [--json]
vaultik [--config <path>] info
vaultik [--config <path>] remote info [--json]
vaultik [--config <path>] store info
vaultik [--config <path>] database purge [--force]
vaultik version
```

### global flags

* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG` or `/etc/vaultik/config.yml`)
* `--verbose`, `-v`: Enable verbose output
* `--debug`: Enable debug output
* `--quiet`, `-q`: Suppress non-error output

### environment variables

* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `restore` and `verify --deep`)
* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
* `VAULTIK_INDEX_PATH`: Override local SQLite index path

### command details

**snapshot create**: Perform incremental backup of configured snapshots.
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--prune`: After backup, drop older snapshots of each backed-up name and
  remove orphaned blobs from remote storage. By default keeps only the latest
  snapshot per name; use `--keep-newer-than` for a rolling window.
* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
  this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
* `--skip-errors`: Skip file read errors (log them loudly but continue)

**snapshot list**: List all snapshots with their timestamps and sizes.
* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity.
* Default (shallow): checks that all blobs referenced in the manifest exist in storage
* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
  encrypted metadata database
* `--json`: Output results as JSON
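
A shallow verification is essentially a membership test: every blob hash listed in the manifest must appear in the storage listing. A minimal sketch of that check, with a hypothetical `missingBlobs` helper and made-up short hashes standing in for real ones:

```go
package main

import (
	"fmt"
	"sort"
)

// missingBlobs is a hypothetical helper: it returns every manifest entry
// that does not appear in the storage listing.
func missingBlobs(manifest []string, stored map[string]bool) []string {
	var missing []string
	for _, hash := range manifest {
		if !stored[hash] {
			missing = append(missing, hash)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative data only; real blob hashes are much longer.
	manifest := []string{"aa11", "bb22", "cc33"}
	stored := map[string]bool{"aa11": true, "cc33": true}

	missing := missingBlobs(manifest, stored)
	if len(missing) == 0 {
		fmt.Println("snapshot OK")
		return
	}
	fmt.Println("missing blobs:", missing)
}
```

Because this check only needs the unencrypted manifest and an object listing, it can run without the private key, unlike `--deep`.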

**snapshot purge**: Remove old snapshots based on criteria. Retention is
per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
latest globally).
* `--keep-latest`: Keep only the most recent snapshot of each name
* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
* `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
* `--force`: Skip confirmation prompt

**snapshot remove**: Remove a specific snapshot from the local database.
* `--remote`: Also remove snapshot metadata from remote storage
* `--all`: Remove all snapshots (requires `--force`)
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
* `--json`: Output result as JSON

**snapshot prune**: Clean orphaned data from the local database (files,
chunks, blobs not referenced by any snapshot).

**restore**: Restore files from a backup snapshot.
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
* Optional path arguments to restore specific files/directories (default: all)
* Preserves file permissions, timestamps, ownership (ownership requires root),
  symlinks, and empty directories
* `--verify`: After restoring, verify every file's chunk hashes match

**prune**: Remove unreferenced blobs from remote storage.
* Scans all snapshot manifests for referenced blobs, deletes any blob not referenced
* `--force`: Skip confirmation prompt
* `--json`: Output stats as JSON

**info**: Display system configuration, storage settings, encryption
recipients, and local database statistics.

**remote info**: Show detailed remote storage information including per-snapshot
metadata sizes, blob counts, and orphaned blob detection.
* `--json`: Output as JSON

**store info**: Display storage backend type and statistics.

**database purge**: Delete the local SQLite state database entirely. Remote
storage is unaffected; the next backup will do a full scan and re-deduplicate
against existing remote blobs.
* `--force`: Skip confirmation prompt

---

## storage backends

vaultik supports three storage backends, selected via the `storage_url` config field:

**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
object store. Credentials are read from `s3.access_key_id` and
`s3.secret_access_key` in the config file.

**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
a local or mounted filesystem. Useful for testing or backing up to a NAS.

**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
providers. Requires rclone to be configured separately (`rclone config`).

Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
still supported for backward compatibility. `storage_url` takes precedence if
both are set.
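
To make the URL shapes concrete, here is a rough sketch of how a `storage_url` value could be split into its parts with Go's standard `net/url` package. The `Location` type and `parseStorageURL` function are hypothetical names for this example and are not taken from vaultik's source.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Location is a hypothetical parsed form of a storage_url value.
type Location struct {
	Backend  string // "s3", "file", or "rclone"
	Name     string // bucket (s3) or remote name (rclone); empty for file
	Path     string // key prefix or filesystem path
	Endpoint string // s3 only, from the ?endpoint= query parameter
	Region   string // s3 only, from the ?region= query parameter
}

// parseStorageURL dispatches on the URL scheme.
func parseStorageURL(raw string) (Location, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Location{}, err
	}
	switch u.Scheme {
	case "s3":
		q := u.Query()
		return Location{
			Backend:  "s3",
			Name:     u.Host,
			Path:     strings.TrimPrefix(u.Path, "/"),
			Endpoint: q.Get("endpoint"),
			Region:   q.Get("region"),
		}, nil
	case "rclone":
		return Location{Backend: "rclone", Name: u.Host, Path: strings.TrimPrefix(u.Path, "/")}, nil
	case "file":
		return Location{Backend: "file", Path: u.Path}, nil
	default:
		return Location{}, fmt.Errorf("unsupported storage scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1",
		"file:///mnt/backups",
		"rclone://myremote/path/to/backups",
	} {
		loc, err := parseStorageURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", loc)
	}
}
```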

---

## architecture

### remote storage layout

```
<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age          # Encrypted binary SQLite database
        └── manifest.json.zst   # Unencrypted blob list (for pruning)
```

* Blobs are two-level directory sharded using the first 4 hex chars of the blob hash
* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
  containing all file metadata, chunk mappings, and relationships for the snapshot
* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
  pruning without the private key
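
The two-level sharding can be expressed in a few lines. In the sketch below, `blobKey` is a hypothetical helper name, and the use of SHA-256 to produce the example hash is an assumption for illustration; this README does not state which hash function vaultik uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey is a hypothetical helper that maps a hex blob hash to its storage
// key: blobs/<first 2 chars>/<next 2 chars>/<full hash>.
func blobKey(hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", hash)
	}
	return "blobs/" + hash[:2] + "/" + hash[2:4] + "/" + hash, nil
}

func main() {
	// ASSUMPTION: SHA-256 is used here only to get a realistic-looking hash.
	sum := sha256.Sum256([]byte("example encrypted blob"))
	key, err := blobKey(hex.EncodeToString(sum[:]))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key)
}
```

Sharding on hash prefixes keeps any single directory or key prefix from accumulating every blob, which matters most for the local filesystem backend.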

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
(e.g. `server1_home_2025-06-01T12:00:00Z`).

### data flow

**backup:**

1. Open local SQLite index, load known files and chunks into memory
2. Walk source directories, compare mtime/size/mode against index
3. For changed/new files: chunk using content-defined chunking (FastCDC)
4. For symlinks and directories: record metadata (no chunking)
5. For each chunk: hash, check dedup, add to blob packer
6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
7. Build snapshot metadata database, compress, encrypt, upload
8. Create unencrypted blob manifest for pruning support

**restore:**

1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
2. Open the binary SQLite database
3. Query files (optionally filtered by paths)
4. Download and decrypt required blobs
5. Extract chunks, reconstruct files
6. Restore permissions, timestamps, ownership, symlinks

**prune:**

1. List all snapshot manifests
2. Build set of all referenced blob hashes
3. List all blobs in storage
4. Delete any blob not in the referenced set
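
The prune steps amount to a set difference between stored blobs and referenced blobs. A minimal sketch, with manifests and the storage listing reduced to slices of hash strings and `orphanedBlobs` as a hypothetical function name:

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedBlobs is a hypothetical helper: it returns stored blob hashes that
// no manifest references.
func orphanedBlobs(manifests [][]string, stored []string) []string {
	// Steps 1 and 2: union of all hashes referenced by any snapshot.
	referenced := make(map[string]struct{})
	for _, manifest := range manifests {
		for _, hash := range manifest {
			referenced[hash] = struct{}{}
		}
	}
	// Steps 3 and 4: anything stored but not referenced is an orphan.
	var orphans []string
	for _, hash := range stored {
		if _, ok := referenced[hash]; !ok {
			orphans = append(orphans, hash)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Illustrative data only; real blob hashes are much longer.
	manifests := [][]string{
		{"aa11", "bb22"},
		{"bb22", "cc33"},
	}
	stored := []string{"aa11", "bb22", "cc33", "dd44"}
	fmt.Println("would delete:", orphanedBlobs(manifests, stored))
}
```

Because a blob shared by several snapshots stays in the referenced set until the last of them is gone, removing one snapshot never deletes data another snapshot still needs.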

### chunking and deduplication

* Content-defined chunking using the FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at file level (unchanged files skipped) and chunk level
  (identical chunks across files stored once)
* Multiple chunks packed into blobs to reduce object count
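
Chunk-level deduplication and blob packing can be illustrated together. The sketch below is deliberately simplified: chunks arrive pre-split (FastCDC itself is not implemented here), compression and encryption are omitted, and SHA-256 as the chunk hash is an assumption. `Packer` and its methods are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Packer is a hypothetical blob packer: it skips chunks it has already seen
// and groups new chunks into blobs of roughly `limit` bytes.
type Packer struct {
	limit   int
	seen    map[[sha256.Size]byte]bool
	pending []byte
	Blobs   [][]byte
}

func NewPacker(limit int) *Packer {
	return &Packer{limit: limit, seen: make(map[[sha256.Size]byte]bool)}
}

// Add stores a chunk unless an identical one was added before.
// It reports whether the chunk was new.
func (p *Packer) Add(chunk []byte) bool {
	sum := sha256.Sum256(chunk) // ASSUMPTION: SHA-256 as the chunk hash
	if p.seen[sum] {
		return false
	}
	p.seen[sum] = true
	p.pending = append(p.pending, chunk...)
	if len(p.pending) >= p.limit {
		p.Flush()
	}
	return true
}

// Flush closes the current blob; a real tool would compress, encrypt, and
// upload it at this point.
func (p *Packer) Flush() {
	if len(p.pending) == 0 {
		return
	}
	p.Blobs = append(p.Blobs, p.pending)
	p.pending = nil
}

func main() {
	p := NewPacker(8)
	for _, chunk := range []string{"aaaa", "bbbb", "aaaa", "cccc"} {
		fmt.Printf("chunk %q new: %v\n", chunk, p.Add([]byte(chunk)))
	}
	p.Flush()
	fmt.Println("blobs written:", len(p.Blobs))
}
```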

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only the public key is needed on the source host
* Each blob and each metadata database is encrypted independently
* Multiple recipients supported (encrypt to multiple keys)

### compression

* zstd compression at configurable level (1-19, default 3)
* Applied before encryption at the blob level

---

## configuration reference

See `config.example.yml` for a complete annotated example. Key fields:

| Field | Default | Description |
|-------|---------|-------------|
| `age_recipients` | (required) | Age public keys for encryption |
| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
| `exclude` | | Global exclude patterns (applied to all snapshots) |
| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
| `blob_size_limit` | `10GB` | Maximum blob size before splitting |
| `compression_level` | `3` | zstd compression level (1-19) |
| `hostname` | system hostname | Hostname used in snapshot IDs |
| `index_path` | `~/.local/share/.../index.sqlite` | Local SQLite index path |

---

## limitations

* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
  quarantine flags, SELinux labels, and other extended attributes are not
  backed up or restored.
* **No hard link detection.** Two hard links to the same inode are backed
  up as independent files. Content deduplication means the data is stored
  once, but the hard link relationship is lost on restore.
* **No sparse file support.** Sparse files are fully materialized during
  backup. A 100 GB sparse VM disk that is mostly zeros will consume the
  full (compressed) size in storage.
* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
  is available. There is no `--bwlimit` flag yet.
* **No parallel blob downloads during restore.** Blobs are fetched
  sequentially. Restore speed is bound by single-stream throughput.
* **Device nodes, named pipes, and sockets are silently skipped.** Only
  regular files, directories, and symlinks are backed up.
* **No database migrations.** If the local SQLite schema changes between
  versions, delete the local database (`vaultik database purge`) and run
  a full backup. Remote storage is unaffected.
* **Files that change during backup may be inconsistent.** There is no
  filesystem snapshot or freeze. If a file is modified between the scan
  and chunk phases, the backed-up copy may reflect a partial write.
* **Ownership restoration requires root.** File uid/gid are recorded
  and restored, but `chown` requires elevated privileges. Without root,
  files are restored with the current user's ownership.

---

## roadmap

Items for future releases:

* Error-condition tests (network failures, disk full, corrupted/missing blobs)
* Parallel blob downloads during restore
* Bandwidth limiting (`--bwlimit`)
* Security audit of encryption implementation
* Man pages and richer `--help` examples

---

## requirements

* Go 1.26 or later
* S3-compatible object storage (or local filesystem, or rclone remote)

## development workflow

All changes follow this workflow. No exceptions.

1. Create a feature branch off `main`.
2. Write tests.
3. Write the implementation.
4. Fix implementation errors until it compiles and tests pass.
5. Fix linting errors (`make lint`).
6. Update documentation and README as required by the change.
7. Format code (`make fmt`).
8. Run `make check` (lint + fmt-check + test). Fix any issues. Repeat until clean.
9. Commit on the branch.
10. Merge to `main`.
11. Push.

Do not commit directly to `main`. Do not skip steps.

Repository policies for AI agents are in [`AGENTS.md`](AGENTS.md).

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)