# vaultik (ваултик)

`vaultik` is an incremental backup tool written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. The backed-up system stores no private
keys or decryption secrets; the only credentials it needs are those required
to PUT objects to the destination store (such as S3 API keys).

## quickstart

```sh
# install
go install sneak.berlin/go/vaultik/cmd/vaultik@latest

# create a default config file (prints the path it wrote to)
vaultik config init

# generate an age keypair; keep the private key file somewhere safe and
# offline — you need it to restore, and the backed-up machine does not need it
age-keygen -o vaultik_backup_private_key.txt
grep 'public key' vaultik_backup_private_key.txt

# configure the encryption key and backup destination
vaultik config set age_recipients.0 age1YOUR_PUBLIC_KEY_HERE
vaultik config set storage_url "file:///Volumes/usbstick/mybackup"

# macOS only: grant your terminal app Full Disk Access first
# (System Settings → Privacy & Security → Full Disk Access), otherwise
# the backup will abort with a permission error on protected directories

# run your first backup (the default config backs up ~ and /Applications
# with sensible excludes)
vaultik snapshot create

# see what you have
vaultik snapshot list
```

Features:

* modern encryption ([age](https://age-encryption.org/), X25519 + ChaCha20-Poly1305)
* content-defined chunking with deduplication (FastCDC)
* incremental backups (only changed files are re-chunked)
* multithreaded zstd compression at configurable levels
* content-addressed immutable storage
* local state tracking in SQLite (enables write-only incremental backups)
* no mutable remote metadata
* no plaintext file paths or metadata in remote storage
* packs small files into large blobs (keeps S3 operation counts down)
* backs up regular files, symlinks, empty directories, and file permissions
* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64

## why

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. `vaultik` is for environments where you don't want to
store backup decryption keys on your hosts — only public keys for
encryption.

Requirements that no existing tool meets all at once:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* S3-compatible without an intermediate step or tool

## daily use

```sh
# verify a snapshot (shallow: checks all blobs exist)
vaultik snapshot verify <snapshot-id>

# deep verify (downloads and cryptographically verifies every blob)
VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...' vaultik snapshot verify --deep <snapshot-id>

# restore (requires the private key)
VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...' vaultik snapshot restore <snapshot-id> /tmp/restored

# daily cron job: back up, keep a 4-week rolling window of snapshots
# 0 3 * * * vaultik snapshot create --cron --prune --keep-newer-than 4w
```

---

## cli

### commands

```sh
vaultik [--config <path>] config init
vaultik [--config <path>] config edit
vaultik [--config <path>] config get <key>
vaultik [--config <path>] config set <key> <value>
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id|--all> [--dry-run] [--force] [--remote] [--json]
vaultik [--config <path>] snapshot cleanup
vaultik [--config <path>] snapshot restore <snapshot-id> <target-dir> [paths...] [--verify]
vaultik [--config <path>] prune [--force] [--json]
vaultik [--config <path>] info
vaultik [--config <path>] remote info [--json]
vaultik [--config <path>] remote nuke --force
vaultik [--config <path>] store info
vaultik [--config <path>] database purge [--force]
vaultik completion <bash|zsh|fish|powershell>
vaultik version
```

### global flags

* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG`, then platform config dir, then `/etc/vaultik/config.yml`)
* `--verbose`, `-v`: Enable verbose output
* `--debug`: Enable debug output
* `--quiet`, `-q`: Suppress non-error output (also suppresses startup banner)
* `--skip-errors`: Continue past per-file errors instead of aborting (applies to `snapshot create` and `snapshot restore`)

### environment variables

* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `snapshot restore` and `snapshot verify --deep`)
* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
* `VAULTIK_INDEX_PATH`: Override local SQLite index path

### shell completion

```sh
# zsh: load for the current session
source <(vaultik completion zsh)

# zsh: install permanently
vaultik completion zsh > "${fpath[1]}/_vaultik"

# bash: load for the current session
source <(vaultik completion bash)

# bash: install permanently (Linux)
vaultik completion bash > /etc/bash_completion.d/vaultik

# fish
vaultik completion fish > ~/.config/fish/completions/vaultik.fish
```

### command details

**`config init`**: Write a default config file with commented explanations for
every setting. Writes to the path from `--config`, `$VAULTIK_CONFIG`, or the
platform config directory (`~/Library/Application Support/vaultik/` on macOS,
`~/.config/vaultik/` on Linux, `/etc/vaultik/` as root). Refuses to overwrite an
existing file. Created with mode `0600` since it will contain credentials.

**`config edit`**: Open the config file in `$EDITOR` (falls back to `vi`).

**`config get`**: Print a config value addressed by dotted YAML path
(e.g. `vaultik config get storage_url`). Non-scalar values print as YAML.

**`config set`**: Set a scalar config value by dotted YAML path
(e.g. `vaultik config set compression_level 9`,
`vaultik config set storage_url "file:///mnt/backups"`). Comments and
formatting in the file are preserved; intermediate maps are created as
needed.

**`snapshot create`**: Perform an incremental backup of the configured snapshots.
* Optional snapshot names argument to create specific snapshots (default: all)
* On macOS, the terminal application running vaultik needs Full Disk Access
  (System Settings → Privacy & Security → Full Disk Access) to read
  TCC-protected directories; without it the backup aborts with a permission
  error that explains how to fix it
* `--cron`: Silent unless error (for crontab)
* `--prune`: After backup, drop older snapshots of each backed-up name and
  remove orphaned blobs from remote storage. By default keeps only the latest
  snapshot per name; use `--keep-newer-than` for a rolling window.
* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
  this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
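The duration strings accepted by `--keep-newer-than` can be illustrated with a small parser. This is a minimal sketch, not vaultik's actual parser: the function name `parseRetention`, the 30-day month, and the 365-day year are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// retentionUnits maps each suffix to a number of days. The 30-day month and
// 365-day year are assumptions made for this sketch.
var retentionUnits = []struct {
	suffix string
	days   int
}{
	{"mo", 30},
	{"y", 365},
	{"w", 7},
	{"d", 1},
}

// parseRetention converts strings such as "4w" or "6mo" into a time.Duration.
func parseRetention(s string) (time.Duration, error) {
	for _, u := range retentionUnits {
		num, ok := strings.CutSuffix(s, u.suffix)
		if !ok {
			continue
		}
		n, err := strconv.Atoi(num)
		if err != nil || n <= 0 {
			return 0, fmt.Errorf("invalid retention %q", s)
		}
		return time.Duration(n*u.days) * 24 * time.Hour, nil
	}
	return 0, fmt.Errorf("invalid retention %q: unknown unit", s)
}

func main() {
	for _, s := range []string{"4w", "30d", "6mo", "1y"} {
		d, err := parseRetention(s)
		fmt.Println(s, d, err)
	}
}
```

A snapshot would then be kept when `now.Sub(created)` is smaller than the parsed duration.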

**`snapshot list`**: Show every snapshot known to the destination
store with timestamps and three sizes per snapshot (compressed
remote size; total uncompressed chunk size; size of chunks newly
referenced by that snapshot). The uncompressed and "new chunk"
columns show `<remote only>` for snapshots not in the local index.
* `--json`: Output in JSON format

**`snapshot verify`**: Verify snapshot integrity.
* Default (shallow): checks that all blobs referenced in the manifest exist in storage
* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
  encrypted metadata database
* `--json`: Output results as JSON

**`snapshot purge`**: Remove old snapshots based on criteria. Retention is
per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
latest globally).
* `--keep-latest`: Keep only the most recent snapshot of each name
* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
* `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
* `--force`: Skip confirmation prompt

**`snapshot remove`**: Remove a specific snapshot from the local database.
Automatically cleans up local rows (files, chunks, blobs) that the removed
snapshot was the last referrer for — you don't need a separate prune step
after removal.
* `--remote`: Also remove snapshot metadata from remote storage
* `--all`: Remove all snapshots (requires `--force`)
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
* `--json`: Output result as JSON

**`snapshot cleanup`**: Remove stale local snapshot records that have no
corresponding metadata in remote storage. These are typically left behind
by incomplete or interrupted backups. Does not touch remote storage.

**`snapshot restore`**: Restore files from a backup snapshot.
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
* Optional path arguments to restore specific files/directories (default: all)
* Preserves file permissions, timestamps, ownership (ownership requires root),
  symlinks, and empty directories
* `--verify`: After restoring, verify every file's chunk hashes match

**`prune`**: Remove everything that is no longer referenced. Deletes orphaned
local database rows (files, chunks, blobs no longer referenced by any completed
snapshot) and deletes unreferenced blobs from remote storage. `snapshot
create --prune`, `snapshot remove`, and `snapshot purge` run the same
cleanup automatically; this command is the manual entry point for it.
* `--force`: Skip confirmation prompt
* `--json`: Output stats as JSON

**`info`**: Display system configuration, storage settings, encryption
recipients, and local database statistics.

**`remote info`**: Show detailed remote storage information including per-snapshot
metadata sizes, blob counts, and orphaned blob detection.
* `--json`: Output as JSON

**`remote nuke`**: Delete every snapshot's metadata and every blob from the
backup destination store, leaving the bucket prefix empty. Destructive and
irreversible.
* `--force`: Required to confirm destruction.

**`store info`**: Display storage backend type and statistics.

**`database purge`**: Delete the local SQLite state database entirely. Remote
storage is unaffected; the next backup will do a full scan and re-deduplicate
against existing remote blobs.
* `--force`: Skip confirmation prompt

---

## storage backends

vaultik supports three storage backends, selected via the `storage_url` config field:

**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
object store. Credentials are read from `s3.access_key_id` and
`s3.secret_access_key` in the config file.

**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
a local or mounted filesystem. Useful for testing or backing up to a NAS.

**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
providers. Requires rclone to be configured separately (`rclone config`).

Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
still supported for backward compatibility. `storage_url` takes precedence if
both are set.
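
The three URL forms above map cleanly onto Go's `net/url` parser. The following is a rough sketch of how a `storage_url` could be split into its parts; the `Backend` struct and `parseStorageURL` function are hypothetical names for illustration, and vaultik's own types may look different.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Backend is a hypothetical parsed form of storage_url.
type Backend struct {
	Kind     string // "s3", "file", or "rclone"
	Bucket   string // S3 bucket or rclone remote name; empty for file://
	Path     string // key prefix or filesystem path
	Endpoint string // S3 only, from the ?endpoint= query parameter
	Region   string // S3 only, from the ?region= query parameter
}

// parseStorageURL dispatches on the URL scheme and extracts the fields
// each backend needs.
func parseStorageURL(raw string) (Backend, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Backend{}, err
	}
	switch u.Scheme {
	case "s3":
		q := u.Query()
		return Backend{
			Kind:     "s3",
			Bucket:   u.Host,
			Path:     strings.TrimPrefix(u.Path, "/"),
			Endpoint: q.Get("endpoint"),
			Region:   q.Get("region"),
		}, nil
	case "file":
		return Backend{Kind: "file", Path: u.Path}, nil
	case "rclone":
		return Backend{Kind: "rclone", Bucket: u.Host, Path: strings.TrimPrefix(u.Path, "/")}, nil
	default:
		return Backend{}, fmt.Errorf("unsupported storage_url scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"s3://bucket/prefix?endpoint=host&region=us-east-1",
		"file:///path/to/backup",
		"rclone://remote/path",
	} {
		b, err := parseStorageURL(raw)
		fmt.Printf("%+v %v\n", b, err)
	}
}
```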

---

## architecture

### remote storage layout

```
<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age          # Encrypted binary SQLite database
        └── manifest.json.zst   # Unencrypted blob list (for pruning)
```

* Blobs are sharded into two directory levels using the first 4 hex chars of the blob hash (`<aa>` is chars 1-2, `<bb>` is chars 3-4)
* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
  containing all file metadata, chunk mappings, and relationships for the snapshot
* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
  pruning without the private key

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
(e.g. `server1_home_2025-06-01T12:00:00Z`).
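
As an illustration of the layout and ID format described above, here is a minimal sketch that builds object keys and snapshot IDs. The function names are hypothetical, and formatting the timestamp in UTC is an assumption based on the `Z` suffix in the example ID.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// blobKey returns <prefix>/blobs/<aa>/<bb>/<full_blob_hash>, where <aa> and
// <bb> are the first and second pairs of hex characters of the hash.
func blobKey(prefix, blobHash string) (string, error) {
	if len(blobHash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", blobHash)
	}
	return path.Join(prefix, "blobs", blobHash[0:2], blobHash[2:4], blobHash), nil
}

// snapshotID returns <hostname>_<snapshot-name>_<RFC3339-timestamp>.
// Using UTC is an assumption of this sketch.
func snapshotID(hostname, name string, t time.Time) string {
	return fmt.Sprintf("%s_%s_%s", hostname, name, t.UTC().Format(time.RFC3339))
}

// metadataKey returns <prefix>/metadata/<snapshot_id>/<file>.
func metadataKey(prefix, id, file string) string {
	return path.Join(prefix, "metadata", id, file)
}

// mustTime parses an RFC 3339 timestamp and panics on malformed input.
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	key, err := blobKey("backups", "a1b2c3d4e5f6")
	fmt.Println(key, err)

	id := snapshotID("server1", "home", mustTime("2025-06-01T12:00:00Z"))
	fmt.Println(id)
	fmt.Println(metadataKey("backups", id, "db.zst.age"))
}
```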

### data flow

**backup:**

1. Open local SQLite index, load known files and chunks into memory
2. Walk source directories, compare mtime/size/mode against index
3. For changed/new files: chunk using content-defined chunking (FastCDC)
4. For symlinks and directories: record metadata (no chunking)
5. For each chunk: hash, check dedup, add to blob packer
6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
7. Build snapshot metadata database, compress, encrypt, upload
8. Create unencrypted blob manifest for pruning support

**restore:**

1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
2. Open the binary SQLite database
3. Query files (optionally filtered by paths)
4. Download and decrypt required blobs
5. Extract chunks, reconstruct files
6. Restore permissions, timestamps, ownership, symlinks

**prune:**

1. List all snapshot manifests
2. Build set of all referenced blob hashes
3. List all blobs in storage
4. Delete any blob not in the referenced set
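
The prune steps amount to a set difference between the blobs in storage and the blobs named by any manifest. A minimal sketch, with manifests and the storage listing represented as plain string slices (an assumption for illustration; no private key is involved at any point):

```go
package main

import "fmt"

// orphanedBlobs returns every stored blob hash that no manifest references.
func orphanedBlobs(manifests [][]string, stored []string) []string {
	// Steps 1-2: build the set of referenced blob hashes.
	referenced := make(map[string]struct{})
	for _, manifest := range manifests {
		for _, hash := range manifest {
			referenced[hash] = struct{}{}
		}
	}
	// Steps 3-4: anything stored but not referenced is a deletion candidate.
	var orphans []string
	for _, hash := range stored {
		if _, ok := referenced[hash]; !ok {
			orphans = append(orphans, hash)
		}
	}
	return orphans
}

func main() {
	manifests := [][]string{
		{"aa11", "bb22"}, // snapshot 1
		{"bb22", "cc33"}, // snapshot 2 shares a blob with snapshot 1
	}
	stored := []string{"aa11", "bb22", "cc33", "dd44"}
	fmt.Println(orphanedBlobs(manifests, stored))
}
```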

### chunking and deduplication

* Content-defined chunking using the FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at file level (unchanged files skipped) and chunk level
  (identical chunks across files stored once)
* Multiple chunks packed into blobs to reduce object count
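
Chunk-level deduplication and blob packing can be sketched together. In the example below, a chunk is skipped if its hash has been seen before, and pending chunks are flushed as one blob once they reach a size threshold. The `packer` type is hypothetical, SHA-256 is an assumed hash function, and the flush step only records the chunk list where a real packer would compress, encrypt, and upload.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// packer collects unique chunks and groups them into blobs.
type packer struct {
	limit   int                 // flush threshold in bytes (hypothetical)
	seen    map[string]struct{} // hashes of chunks already stored
	pending [][]byte            // chunks waiting for the next blob
	size    int                 // total bytes in pending
	blobs   [][][]byte          // flushed blobs, each a list of chunks
}

func newPacker(limit int) *packer {
	return &packer{limit: limit, seen: make(map[string]struct{})}
}

// add stores the chunk unless an identical one was seen before.
// It reports whether the chunk was new.
func (p *packer) add(chunk []byte) bool {
	sum := sha256.Sum256(chunk)
	key := hex.EncodeToString(sum[:])
	if _, dup := p.seen[key]; dup {
		return false
	}
	p.seen[key] = struct{}{}
	p.pending = append(p.pending, chunk)
	p.size += len(chunk)
	if p.size >= p.limit {
		p.flush()
	}
	return true
}

// flush closes the current blob, if it holds any chunks.
func (p *packer) flush() {
	if len(p.pending) == 0 {
		return
	}
	p.blobs = append(p.blobs, p.pending)
	p.pending, p.size = nil, 0
}

func main() {
	p := newPacker(8)
	for _, c := range []string{"aaaa", "aaaa", "bbbb", "cc"} {
		fmt.Printf("%q new=%v\n", c, p.add([]byte(c)))
	}
	p.flush() // close the final, partially filled blob
	fmt.Println("blobs:", len(p.blobs))
}
```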

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only the public key is needed on the source host
* Each blob and each metadata database is encrypted independently
* Multiple recipients supported (encrypt to multiple keys)

### compression

* zstd compression at configurable level (1-19, default 3)
* Applied before encryption at the blob level

---

## configuration reference

Run `vaultik config init` to generate a fully commented config file.
Key fields:

| Field | Default | Description |
|-------|---------|-------------|
| `age_recipients` | (required) | Age public keys for encryption |
| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
| `exclude` | | Global exclude patterns (applied to all snapshots) |
| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
| `blob_size_limit` | `10GB` | Maximum blob size before splitting |
| `compression_level` | `3` | zstd compression level (1-19) |
| `hostname` | system hostname | Hostname used in snapshot IDs |
| `index_path` | platform data dir | Local SQLite index path |

---

## limitations

* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
  quarantine flags, SELinux labels, and other extended attributes are not
  backed up or restored.
* **No hard link detection.** Two hard links to the same inode are backed
  up as independent files. Content deduplication means the data is stored
  once, but the hard link relationship is lost on restore.
* **No sparse file support.** Sparse files are fully materialized during
  backup. A 100 GB sparse VM disk that is mostly zeros will consume the
  full (compressed) size in storage.
* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
  is available. There is no `--bwlimit` flag yet.
* **No parallel blob downloads during restore.** Blobs are fetched
  sequentially. Restore speed is bound by single-stream throughput.
* **Device nodes, named pipes, and sockets are silently skipped.** Only
  regular files, directories, and symlinks are backed up.
* **No database migrations.** If the local SQLite schema changes between
  versions, delete the local database (`vaultik database purge`) and run
  a full backup. Remote storage is unaffected.
* **Files that change during backup may be inconsistent.** There is no
  filesystem snapshot or freeze. If a file is modified between the scan
  and chunk phases, the backed-up copy may reflect a partial write.
* **Ownership restoration requires root.** File uid/gid are recorded
  and restored, but `chown` requires elevated privileges. Without root,
  files are restored with the current user's ownership.

---

## roadmap

Items still to do before / shortly after 1.0. Loosely ordered by
priority.

### correctness and operability

* **Security audit of the encryption implementation.** Pre-1.0
  blocker if we're advertising "secure" at the top of this README.
  age + zstd + content-defined chunking is mostly off-the-shelf
  pieces, but the seams (key handling, recipient parsing, manifest
  trust boundary, restore-time identity validation) need an outside
  read.
* **Error-condition tests.** Today's coverage is the happy path
  plus a few specific regressions. Need fault-injection coverage:
  network failures mid-blob, disk-full during restore, corrupted /
  truncated / missing blobs, partial uploads, kill -9 between
  manifest and db.zst.age writes.
* **Verify restored content end-to-end in CI.** The current
  integration test does this for a small synthetic snapshot but
  not at scale. A nightly job against a multi-GB representative
  snapshot would catch silent regressions in the chunker, packer,
  or restore planner.

### performance

* **Parallel blob downloads during restore.** Single-stream right
  now. With a fast S3 endpoint and a multi-core machine restore is
  bound by per-blob fetch + decrypt + decompress; running N of
  those in parallel against the disk cache would close most of the
  remaining gap. Needs to interact correctly with the locality
  planner and sweeper.
* **Bandwidth limiting (`--bwlimit`).** Both upload and download.
  Useful for backing up over a shared link. Tricky to make work
  correctly with the parallel-download story.
* **Restart of interrupted restore.** Today restore is restartable
  in the sense that re-running it overwrites partial output; it
  doesn't resume from where it stopped or skip already-present
  files. A `--resume` mode that checks targets before fetching
  blobs would matter for very large restores.

### usability

* **Man pages and richer `--help` examples.** Cobra generates
  basic help; man pages would be a separate target.
* **`--bwlimit` style human-readable size flags** across the
  command surface where they're currently raw integers.
* **`vaultik snapshot diff <a> <b>`** — show which files changed
  between two snapshots without restoring either.
* **Status reporting hook for `--cron`.** When a backup fails
  silently in cron, the user has no idea. A configurable
  webhook / email / `notify-send` hook on completion (success and
  failure) would close the loop.

### infrastructure

* **Cross-machine restore documentation.** The "restore from
  another host" workflow works but isn't documented as a
  first-class operation in this README. Worth a dedicated section
  once it's settled.
* **Schema migrations.** Currently nonexistent — pre-1.0 schema
  changes are handled by `vaultik database purge` plus a full
  re-scan. Post-1.0 we'll need a migration story to keep existing
  index databases usable across upgrades.
* **Storage backend coverage tests.** S3, file://, and rclone://
  all share the Storer interface but the rclone path is the least
  exercised in CI.

---

## output style

All user-facing output goes through helpers in `internal/ui` and conforms
to a uniform style. Color is enabled when stdout is a TTY and the
`NO_COLOR` environment variable is unset (https://no-color.org/).

Message classes:

| Class | Marker | Alignment | Use for |
|-------|--------|-----------|---------|
| Banner | none | column 0 | The startup line printed once per invocation |
| Begin | `》` (white) | column 0 | An operation is about to start (present-continuous verb) |
| Complete | `》` (green) | column 0 | An operation just finished (past-tense verb) |
| Info | `》` (white) | column 0 | Neutral status update |
| Notice | `》` (cyan) | column 0 | Important note that is not a warning |
| Warning | `⚠️  Warning:` (orange/yellow) | column 0 | Recoverable problem |
| Error | `🛑 ERROR:` (red) | column 0 | Operation aborted |
| Progress | `  》` (white) | column 2 | Heartbeat or per-item status during a long-running operation |
| Detail | `  》` (white) | column 2 | Continuation/sub-line of a preceding Complete (visually identical to Progress) |

Conventions:

* Messages are complete English sentences ending with a period.
* Fully qualify terms — say "backup destination store" instead of
  "storage", "snapshot source files enumeration" instead of "scan",
  "local index database" instead of "database".
* Every operation that emits a Complete also emits a corresponding
  Begin. Operations that print only a Begin (because completion is
  obvious from a later Begin) should be rare and intentional.
* Use natural verb tense to signal state: "Uploading" for Begin,
  "Uploaded" for Complete. Never write the words "begin" or "complete"
  in the body — the marker color already conveys that.
* All elapsed and remaining-time fields are explicitly scoped to their
  subject: write "blob upload elapsed: 30s, blob upload ETA: 03:15:00
  (est remain 14s)", never just "elapsed 30s, ETA 14s".
* "ETA" means an absolute clock time (when the operation will finish),
  not a remaining-duration. Use `ui.Time()` for the former and
  `ui.Duration()` for the latter, and label both.
* `ui.Time` formats same-day times as `HH:MM:SS` and other-day times as
  `YYYY-MM-DD HH:MM:SS`. No timezone — local time is implied.
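
The time conventions above can be illustrated with a short sketch. `formatClock` and `etaLine` are hypothetical stand-ins, not the actual `ui.Time()` and `ui.Duration()` helpers, and they omit colorization. The example uses a fixed UTC timestamp so the output is deterministic; the real helpers work in local time.

```go
package main

import (
	"fmt"
	"time"
)

// formatClock renders t as HH:MM:SS when it falls on the same calendar day
// as now, and as YYYY-MM-DD HH:MM:SS otherwise. No timezone is printed.
func formatClock(t, now time.Time) string {
	ty, tm, td := t.Date()
	ny, nm, nd := now.In(t.Location()).Date()
	if ty == ny && tm == nm && td == nd {
		return t.Format("15:04:05")
	}
	return t.Format("2006-01-02 15:04:05")
}

// etaLine scopes every time field to its subject and labels the ETA
// (an absolute clock time) separately from the remaining duration.
func etaLine(subject string, elapsed, remaining time.Duration, now time.Time) string {
	eta := now.Add(remaining)
	return fmt.Sprintf("%s elapsed: %s, %s ETA: %s (est remain %s)",
		subject, elapsed, subject, formatClock(eta, now), remaining)
}

// mustTime parses an RFC 3339 timestamp and panics on malformed input.
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustTime("2025-06-01T03:14:46Z")
	fmt.Println(etaLine("blob upload", 30*time.Second, 14*time.Second, now))
	fmt.Println(formatClock(now.AddDate(0, 0, 1), now)) // other-day form
}
```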

Value colorizers in `internal/ui` colorize specific value types
consistently. Compose messages from these helpers rather than embedding
ANSI escapes inline:

| Helper | Color | Use for |
|--------|-------|---------|
| `Hex` | cyan | Blob hashes, chunk hashes (truncated to 12 chars + `...`) |
| `Snapshot` | bold cyan | Snapshot IDs (untruncated) |
| `Path` | blue | Filesystem paths |
| `Size` | magenta | Byte counts (human-readable) |
| `Speed` | magenta | Bytes-per-second rates |
| `Duration` | yellow | Elapsed or remaining time |
| `Time` | yellow | Absolute clock times |
| `Count` | magenta | Integer counts with thousands separators |
| `Percent` | magenta | Percentages |
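
Two of the value formats in the table are easy to show without color. The sketch below uses hypothetical function names and assumes a comma as the thousands separator; the real `Hex` and `Count` helpers also apply color.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shortHex truncates a hash to its first 12 characters followed by "...".
// Hashes of 12 characters or fewer are returned unchanged.
func shortHex(h string) string {
	if len(h) <= 12 {
		return h
	}
	return h[:12] + "..."
}

// withThousands formats n with a separator between each group of three
// digits. The comma is an assumption of this sketch.
func withThousands(n int64) string {
	s := strconv.FormatInt(n, 10)
	neg := strings.HasPrefix(s, "-")
	if neg {
		s = s[1:]
	}
	var b strings.Builder
	for i, c := range s {
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte(',')
		}
		b.WriteRune(c)
	}
	if neg {
		return "-" + b.String()
	}
	return b.String()
}

func main() {
	fmt.Println(shortHex("a1b2c3d4e5f60718293a4b5c6d7e8f90"))
	fmt.Println(withThousands(1234567))
}
```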

When `NO_COLOR` is set or output is not a TTY, all helpers return plain
text and the marker prefixes (`》`, `Warning:`, `ERROR:`) are emitted without
ANSI escapes. The emoji prefixes on Warning and Error are always emitted
regardless of color setting (emoji are not color).
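
A rough sketch of the color decision, using only the standard library. The function names are hypothetical, and the character-device check is a common approximation of TTY detection (it also matches devices such as `/dev/null`), so the actual `internal/ui` code may well use a different test.

```go
package main

import (
	"fmt"
	"os"
)

// warnText is the Warning marker; the emoji is part of the text, not a color.
const warnText = "⚠️  Warning:"

// colorEnabled applies the rule: color only on a TTY with NO_COLOR unset.
func colorEnabled(isTTY, noColorSet bool) bool {
	return isTTY && !noColorSet
}

// stdoutIsTTY approximates TTY detection by checking for a character device.
func stdoutIsTTY() bool {
	fi, err := os.Stdout.Stat()
	if err != nil {
		return false
	}
	return fi.Mode()&os.ModeCharDevice != 0
}

// warningPrefix wraps the marker in ANSI yellow only when color is enabled.
func warningPrefix(color bool) string {
	if !color {
		return warnText
	}
	return "\x1b[33m" + warnText + "\x1b[0m"
}

func main() {
	_, noColor := os.LookupEnv("NO_COLOR")
	color := colorEnabled(stdoutIsTTY(), noColor)
	fmt.Println(warningPrefix(color), "Example message.")
}
```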

## requirements

* Go 1.26 or later
* S3-compatible object storage (or local filesystem, or rclone remote)

## development workflow

All changes follow this workflow. No exceptions.

1. Create a feature branch off `main`.
2. Write tests.
3. Write the implementation.
4. Fix implementation errors until it compiles and tests pass.
5. Fix linting errors (`make lint`).
6. Update documentation and README as required by the change.
7. Format code (`make fmt`).
8. Run `make check` (lint + fmt-check + test). Fix any issues. Repeat until clean.
9. Commit on the branch.
10. Merge to `main`.
11. Push.

Do not commit directly to `main`. Do not skip steps.

Repository policies for AI agents are in [`AGENTS.md`](AGENTS.md).

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)