# vaultik (ваултик)

`vaultik` is an incremental backup tool written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.

Features:

* modern encryption ([age](https://age-encryption.org/), X25519 + ChaCha20-Poly1305)
* content-defined chunking with deduplication (FastCDC)
* incremental backups (only changed files are re-chunked)
* multithreaded zstd compression at configurable levels
* content-addressed immutable storage
* local state tracking in SQLite (enables write-only incremental backups)
* no mutable remote metadata
* no plaintext file paths or metadata in remote storage
* packs small files into large blobs (keeps S3 operation counts down)
* backs up regular files, symlinks, empty directories, and file permissions
* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64

## why

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. `vaultik` is for environments where you don't want to
store backup decryption keys on your hosts, only public keys for
encryption.

Requirements that no existing tool meets:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* s3 compatible without an intermediate step or tool

## install

```sh
go install git.eeqj.de/sneak/vaultik@latest
```

## quick start

1. **generate keypair**

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

2. **write config** (see `config.example.yml` for all options)

   ```yaml
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       exclude:
         - '*.cache'
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

   exclude:
     - '*.log'
     - '*.tmp'
     - '.git'
     - 'node_modules'

   age_recipients:
     - age1YOUR_PUBLIC_KEY_HERE

   # Storage backend (pick one):
   storage_url: "s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1"
   # storage_url: "file:///mnt/backups"
   # storage_url: "rclone://myremote/path/to/backups"

   # For s3:// URLs, credentials are still required:
   s3:
     access_key_id: ...
     secret_access_key: ...
   ```

3. **run**

   ```sh
   # Back up all configured snapshots
   vaultik --config /etc/vaultik.yml snapshot create

   # Back up specific snapshots by name
   vaultik --config /etc/vaultik.yml snapshot create home system

   # Silent mode for cron
   vaultik --config /etc/vaultik.yml snapshot create --cron

   # Back up and clean up old snapshots + orphan blobs in one shot
   vaultik --config /etc/vaultik.yml snapshot create --prune

   # Daily cron: back up, keep last 4 weeks of snapshots
   vaultik --config /etc/vaultik.yml snapshot create --cron --prune --keep-newer-than 4w
   ```

---

## cli

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>] [--skip-errors]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id|--all> [--dry-run] [--force] [--remote] [--json]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] snapshot cleanup
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...] [--verify]
vaultik [--config <path>] prune [--force] [--json]
vaultik [--config <path>] info
vaultik [--config <path>] remote info [--json]
vaultik [--config <path>] store info
vaultik [--config <path>] database purge [--force]
vaultik version
```

### global flags

* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG` or `/etc/vaultik/config.yml`)
* `--verbose`, `-v`: Enable verbose output
* `--debug`: Enable debug output
* `--quiet`, `-q`: Suppress non-error output

### environment variables

* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `restore` and `verify --deep`)
* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
* `VAULTIK_INDEX_PATH`: Override local SQLite index path
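
The config path precedence described above (`--config`, then `$VAULTIK_CONFIG`, then the default) can be sketched as follows. This is an illustration only: `resolveConfigPath` is a hypothetical helper, not a function taken from the vaultik source, and it assumes an empty flag value means "not given".

```go
package main

import (
	"fmt"
	"os"
)

// defaultConfigPath is the fallback documented under "global flags".
const defaultConfigPath = "/etc/vaultik/config.yml"

// resolveConfigPath is a hypothetical helper showing the documented
// precedence: explicit flag, then $VAULTIK_CONFIG, then the default.
// getenv is passed in so the lookup can be replaced in tests.
func resolveConfigPath(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	if env := getenv("VAULTIK_CONFIG"); env != "" {
		return env
	}
	return defaultConfigPath
}

func main() {
	// No flag given: falls back to the environment, then the default.
	fmt.Println(resolveConfigPath("", os.Getenv))
}
```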

### command details

**snapshot create**: Perform incremental backup of configured snapshots.
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--prune`: After backup, drop older snapshots of each backed-up name and
  remove orphaned blobs from remote storage. By default keeps only the latest
  snapshot per name; use `--keep-newer-than` for a rolling window.
* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
  this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
* `--skip-errors`: Skip file read errors (log them loudly but continue)
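
Durations such as `4w`, `30d`, `6mo`, and `1y` are not accepted by Go's `time.ParseDuration`, so a tool taking them needs its own parser. The sketch below shows one way to do that. `parseRetention` is a hypothetical name, and the unit lengths (a month as 30 days, a year as 365 days) are assumptions; vaultik's actual parser may use different rules.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const day = 24 * time.Hour

// units maps each suffix to a length. "mo" = 30 days and "y" = 365 days
// are assumptions made for this sketch.
var units = []struct {
	suffix string
	length time.Duration
}{
	{"mo", 30 * day},
	{"w", 7 * day},
	{"d", day},
	{"y", 365 * day},
}

// parseRetention converts strings like "4w" or "6mo" into a time.Duration.
func parseRetention(s string) (time.Duration, error) {
	for _, u := range units {
		if num, ok := strings.CutSuffix(s, u.suffix); ok {
			n, err := strconv.Atoi(num)
			if err != nil || n <= 0 {
				return 0, fmt.Errorf("invalid retention %q", s)
			}
			return time.Duration(n) * u.length, nil
		}
	}
	return 0, fmt.Errorf("invalid retention %q: unknown unit", s)
}

func main() {
	for _, s := range []string{"4w", "30d", "6mo", "1y"} {
		d, err := parseRetention(s)
		fmt.Println(s, d, err)
	}
}
```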

**snapshot list**: List all snapshots with their timestamps and sizes.
* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity.
* Default (shallow): checks that all blobs referenced in the manifest exist in storage
* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
  encrypted metadata database
* `--json`: Output results as JSON

**snapshot purge**: Remove old snapshots based on criteria. Retention is
per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
latest globally).
* `--keep-latest`: Keep only the most recent snapshot of each name
* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
* `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
* `--force`: Skip confirmation prompt
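
To make the per-name retention rule concrete, here is a minimal sketch of `--keep-latest` selection. The `Snapshot` type, its fields, and `purgeCandidates` are hypothetical names chosen for illustration; they are not taken from the vaultik source.

```go
package main

import "fmt"

// Snapshot is a hypothetical, simplified record for this sketch.
type Snapshot struct {
	ID      string
	Name    string
	Created int64 // Unix seconds
}

// purgeCandidates returns the IDs of every snapshot that is not the
// newest one of its name, in input order. The newest snapshot of each
// name is kept, so retention is per name rather than global.
func purgeCandidates(snaps []Snapshot) []string {
	latest := make(map[string]Snapshot)
	for _, s := range snaps {
		if cur, ok := latest[s.Name]; !ok || s.Created > cur.Created {
			latest[s.Name] = s
		}
	}
	var remove []string
	for _, s := range snaps {
		if latest[s.Name].ID != s.ID {
			remove = append(remove, s.ID)
		}
	}
	return remove
}

func main() {
	snaps := []Snapshot{
		{"h1", "home", 100},
		{"h2", "home", 300},
		{"s1", "system", 200},
	}
	// Only "h1" is older than another snapshot of the same name.
	fmt.Println(purgeCandidates(snaps))
}
```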

**snapshot remove**: Remove a specific snapshot from the local database.
* `--remote`: Also remove snapshot metadata from remote storage
* `--all`: Remove all snapshots (requires `--force`)
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
* `--json`: Output result as JSON

**snapshot prune**: Clean orphaned data from the local database (files,
chunks, blobs not referenced by any snapshot).

**snapshot cleanup**: Remove stale local snapshot records that have no
corresponding metadata in remote storage. These are typically left behind
by incomplete or interrupted backups. Does not touch remote storage.
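
The stale-record check that `snapshot cleanup` performs amounts to a set difference between local and remote snapshot IDs. A minimal sketch, assuming both sides are available as plain ID lists (`staleLocalSnapshots` is a hypothetical name, not the actual implementation):

```go
package main

import "fmt"

// staleLocalSnapshots returns the local snapshot IDs that have no
// matching remote metadata, preserving local order. It only computes
// the list; nothing is deleted here.
func staleLocalSnapshots(local, remote []string) []string {
	onRemote := make(map[string]bool, len(remote))
	for _, id := range remote {
		onRemote[id] = true
	}
	var stale []string
	for _, id := range local {
		if !onRemote[id] {
			stale = append(stale, id)
		}
	}
	return stale
}

func main() {
	local := []string{
		"server1_home_2025-06-01T12:00:00Z",
		"server1_home_2025-06-02T12:00:00Z",
	}
	remote := []string{"server1_home_2025-06-01T12:00:00Z"}
	fmt.Println(staleLocalSnapshots(local, remote))
}
```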

**restore**: Restore files from a backup snapshot.
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
* Optional path arguments to restore specific files/directories (default: all)
* Preserves file permissions, timestamps, ownership (ownership requires root),
  symlinks, and empty directories
* `--verify`: After restoring, verify every file's chunk hashes match

**prune**: Remove unreferenced blobs from remote storage.
* Scans all snapshot manifests for referenced blobs, deletes any blob not referenced
* `--force`: Skip confirmation prompt
* `--json`: Output stats as JSON

**info**: Display system configuration, storage settings, encryption
recipients, and local database statistics.

**remote info**: Show detailed remote storage information including per-snapshot
metadata sizes, blob counts, and orphaned blob detection.
* `--json`: Output as JSON

**store info**: Display storage backend type and statistics.

**database purge**: Delete the local SQLite state database entirely. Remote
storage is unaffected; the next backup will do a full scan and re-deduplicate
against existing remote blobs.
* `--force`: Skip confirmation prompt

---

## storage backends

vaultik supports three storage backends, selected via the `storage_url` config field:

**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
object store. Credentials are read from `s3.access_key_id` and
`s3.secret_access_key` in the config file.

**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
a local or mounted filesystem. Useful for testing or backing up to a NAS.

**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
providers. Requires rclone to be configured separately (`rclone config`).

Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
still supported for backward compatibility. `storage_url` takes precedence if
both are set.
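
The three URL forms can be taken apart with Go's standard `net/url` package. The sketch below is only an illustration of how a `storage_url` might be split into its parts. The `Backend` type and `parseStorageURL` are hypothetical names, and vaultik's real parsing may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Backend is a hypothetical holder for the parts of a storage_url.
type Backend struct {
	Scheme   string // "s3", "file", or "rclone"
	Location string // bucket (s3) or remote name (rclone); empty for file
	Path     string // key prefix or filesystem path
	Endpoint string // s3 only
	Region   string // s3 only
}

// parseStorageURL splits a storage_url into a Backend and rejects
// schemes other than the three documented ones.
func parseStorageURL(raw string) (Backend, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Backend{}, err
	}
	b := Backend{Scheme: u.Scheme, Location: u.Host}
	switch u.Scheme {
	case "s3":
		b.Path = strings.TrimPrefix(u.Path, "/")
		b.Endpoint = u.Query().Get("endpoint")
		b.Region = u.Query().Get("region")
	case "rclone":
		b.Path = strings.TrimPrefix(u.Path, "/")
	case "file":
		b.Path = u.Path // keep the leading slash: it is an absolute path
	default:
		return Backend{}, fmt.Errorf("unsupported storage scheme %q", u.Scheme)
	}
	return b, nil
}

func main() {
	b, err := parseStorageURL("s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1")
	fmt.Printf("%+v %v\n", b, err)
}
```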

---

## architecture

### remote storage layout

```
<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age          # Encrypted binary SQLite database
        └── manifest.json.zst   # Unencrypted blob list (for pruning)
```

* Blobs are two-level directory sharded using the first 4 hex chars of the blob hash
* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
  containing all file metadata, chunk mappings, and relationships for the snapshot
* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
  pruning without the private key
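
As an illustration of the sharding rule, the object key for a blob can be derived from its hex hash like this. `blobKey` is a hypothetical helper name, and the key is shown relative to the configured prefix.

```go
package main

import "fmt"

// blobKey builds the sharded key blobs/<aa>/<bb>/<full_blob_hash>,
// where <aa> and <bb> are the first and second pairs of hex characters
// of the hash.
func blobKey(hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", hash)
	}
	return "blobs/" + hash[:2] + "/" + hash[2:4] + "/" + hash, nil
}

func main() {
	key, err := blobKey("aabbccddeeff")
	fmt.Println(key, err)
}
```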

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
(e.g. `server1_home_2025-06-01T12:00:00Z`).
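
A short sketch of how such an ID could be assembled with Go's `time.RFC3339` layout. `snapshotID` and `exampleTime` are hypothetical names, and converting to UTC before formatting is an assumption based on the `Z` suffix in the example.

```go
package main

import (
	"fmt"
	"time"
)

// exampleTime matches the timestamp used in the example ID above.
var exampleTime = time.Date(2025, time.June, 1, 12, 0, 0, 0, time.UTC)

// snapshotID joins hostname, snapshot name, and an RFC 3339 timestamp
// with underscores. The time is converted to UTC first (an assumption).
func snapshotID(hostname, name string, t time.Time) string {
	return hostname + "_" + name + "_" + t.UTC().Format(time.RFC3339)
}

func main() {
	fmt.Println(snapshotID("server1", "home", exampleTime))
}
```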

### data flow

**backup:**

1. Open local SQLite index, load known files and chunks into memory
2. Walk source directories, compare mtime/size/mode against index
3. For changed/new files: chunk using content-defined chunking (FastCDC)
4. For symlinks and directories: record metadata (no chunking)
5. For each chunk: hash, check dedup, add to blob packer
6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
7. Build snapshot metadata database, compress, encrypt, upload
8. Create unencrypted blob manifest for pruning support
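
Step 2 is what makes backups incremental: a file is re-chunked only when its recorded metadata differs from what the walk sees. A minimal sketch of that comparison, using a hypothetical `FileMeta` type and an in-memory map standing in for the SQLite index:

```go
package main

import "fmt"

// FileMeta is a hypothetical, simplified view of what the index
// records per file for change detection.
type FileMeta struct {
	Size    int64
	ModTime int64 // Unix nanoseconds
	Mode    uint32
}

// needsRechunk reports whether a file must be chunked again: either it
// is not in the index yet, or its size, mtime, or mode has changed.
func needsRechunk(index map[string]FileMeta, path string, cur FileMeta) bool {
	prev, known := index[path]
	return !known || prev != cur
}

func main() {
	index := map[string]FileMeta{
		"/etc/hosts": {Size: 120, ModTime: 1000, Mode: 0o644},
	}
	fmt.Println(needsRechunk(index, "/etc/hosts", FileMeta{120, 1000, 0o644})) // unchanged
	fmt.Println(needsRechunk(index, "/etc/hosts", FileMeta{130, 2000, 0o644})) // modified
	fmt.Println(needsRechunk(index, "/etc/motd", FileMeta{10, 500, 0o644}))    // new file
}
```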

**restore:**

1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
2. Open the binary SQLite database
3. Query files (optionally filtered by paths)
4. Download and decrypt required blobs
5. Extract chunks, reconstruct files
6. Restore permissions, timestamps, ownership, symlinks

**prune:**

1. List all snapshot manifests
2. Build set of all referenced blob hashes
3. List all blobs in storage
4. Delete any blob not in the referenced set
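
The prune steps reduce to a set difference, which is why only the unencrypted manifests are needed and not the private key. A minimal sketch, assuming the manifests have already been downloaded and decoded into a map from snapshot ID to blob hashes (`orphanedBlobs` is a hypothetical name):

```go
package main

import "fmt"

// orphanedBlobs returns every stored blob hash that no manifest
// references, in the order the blobs were listed. It only computes the
// list; deleting the blobs is left to the caller.
func orphanedBlobs(manifests map[string][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, blobs := range manifests {
		for _, h := range blobs {
			referenced[h] = struct{}{}
		}
	}
	var orphans []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	return orphans
}

func main() {
	manifests := map[string][]string{
		"snap-a": {"blob1", "blob2"},
		"snap-b": {"blob2", "blob3"},
	}
	stored := []string{"blob1", "blob2", "blob3", "blob4"}
	fmt.Println(orphanedBlobs(manifests, stored))
}
```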

### chunking and deduplication

* Content-defined chunking using the FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at file level (unchanged files skipped) and chunk level
  (identical chunks across files stored once)
* Multiple chunks packed into blobs to reduce object count
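
Chunk-level deduplication follows from content addressing: a chunk's identity is the hash of its bytes, so identical chunks from different files map to the same entry. The sketch below illustrates the idea with an in-memory store. `ChunkStore` is a hypothetical type, and SHA-256 is an assumption, since the hash function is not named here. The chunking itself (FastCDC) is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkStore is a hypothetical in-memory stand-in for the dedup index.
type ChunkStore struct {
	chunks map[string][]byte
}

// NewChunkStore returns an empty store.
func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: make(map[string][]byte)}
}

// Add stores a chunk under the hex SHA-256 of its content (assumed
// hash). It returns the chunk ID and whether the chunk was new.
func (s *ChunkStore) Add(data []byte) (string, bool) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	if _, seen := s.chunks[id]; seen {
		return id, false
	}
	s.chunks[id] = append([]byte(nil), data...) // copy so callers can reuse data
	return id, true
}

func main() {
	store := NewChunkStore()
	_, first := store.Add([]byte("same bytes"))
	_, second := store.Add([]byte("same bytes"))
	fmt.Println(first, second, len(store.chunks))
}
```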

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only the public key is needed on the source host
* Each blob and each metadata database is encrypted independently
* Multiple recipients supported (encrypt to multiple keys)

### compression

* zstd compression at configurable level (1-19, default 3)
* Applied before encryption at the blob level

---

## configuration reference

See `config.example.yml` for a complete annotated example. Key fields:

| Field | Default | Description |
|-------|---------|-------------|
| `age_recipients` | (required) | Age public keys for encryption |
| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
| `exclude` | | Global exclude patterns (applied to all snapshots) |
| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
| `blob_size_limit` | `10GB` | Maximum blob size before splitting |
| `compression_level` | `3` | zstd compression level (1-19) |
| `hostname` | system hostname | Hostname used in snapshot IDs |
| `index_path` | `~/.local/share/.../index.sqlite` | Local SQLite index path |

---

## limitations

* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
  quarantine flags, SELinux labels, and other extended attributes are not
  backed up or restored.
* **No hard link detection.** Two hard links to the same inode are backed
  up as independent files. Content deduplication means the data is stored
  once, but the hard link relationship is lost on restore.
* **No sparse file support.** Sparse files are fully materialized during
  backup. A 100 GB sparse VM disk that is mostly zeros will consume the
  full (compressed) size in storage.
* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
  is available. There is no `--bwlimit` flag yet.
* **No parallel blob downloads during restore.** Blobs are fetched
  sequentially. Restore speed is bound by single-stream throughput.
* **Device nodes, named pipes, and sockets are silently skipped.** Only
  regular files, directories, and symlinks are backed up.
* **No database migrations.** If the local SQLite schema changes between
  versions, delete the local database (`vaultik database purge`) and run
  a full backup. Remote storage is unaffected.
* **Files that change during backup may be inconsistent.** There is no
  filesystem snapshot or freeze. If a file is modified between the scan
  and chunk phases, the backed-up copy may reflect a partial write.
* **Ownership restoration requires root.** File uid/gid are recorded
  and restored, but `chown` requires elevated privileges. Without root,
  files are restored with the current user's ownership.

---

## roadmap

Items for future releases:

* Error-condition tests (network failures, disk full, corrupted/missing blobs)
* Parallel blob downloads during restore
* Bandwidth limiting (`--bwlimit`)
* Security audit of encryption implementation
* Man pages and richer `--help` examples

---

## requirements

* Go 1.26 or later
* S3-compatible object storage (or local filesystem, or rclone remote)

## development workflow

All changes follow this workflow. No exceptions.

1. Create a feature branch off `main`.
2. Write tests.
3. Write the implementation.
4. Fix implementation errors until it compiles and tests pass.
5. Fix linting errors (`make lint`).
6. Update documentation and README as required by the change.
7. Format code (`make fmt`).
8. Run `make check` (lint + fmt-check + test). Fix any issues. Repeat until clean.
9. Commit on the branch.
10. Merge to `main`.
11. Push.

Do not commit directly to `main`. Do not skip steps.

Repository policies for AI agents are in [`AGENTS.md`](AGENTS.md).

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)