
# vaultik (ваултик)

`vaultik` is an incremental backup tool written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. No private keys or decryption secrets
need to be stored on the backed-up system; the only credentials it holds
are those required to write to the object store (such as S3 API keys).

Features:

* modern encryption ([age](https://age-encryption.org/), X25519 + ChaCha20-Poly1305)
* content-defined chunking with deduplication (FastCDC)
* incremental backups (only changed files are re-chunked)
* multithreaded zstd compression at configurable levels
* content-addressed immutable storage
* local state tracking in SQLite (enables write-only incremental backups)
* no mutable remote metadata
* no plaintext file paths or metadata in remote storage
* packs small files into large blobs (keeps S3 operation counts down)
* backs up regular files, symlinks, empty directories, and file permissions
* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64

## why

Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. `vaultik` is for environments where you don't want to
store backup decryption keys on your hosts, only the public keys used for
encryption.

Requirements that, taken together, no existing tool met:

* open source
* no passphrases or private keys on the source host
* incremental
* compressed
* encrypted
* S3-compatible without an intermediate step or tool

## install

```sh
go install git.eeqj.de/sneak/vaultik@latest
```

## quick start

1. **generate keypair**

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

2. **write config** (see `config.example.yml` for all options)

   ```yaml
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       exclude:
         - '*.cache'
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

   exclude:
     - '*.log'
     - '*.tmp'
     - '.git'
     - 'node_modules'

   age_recipients:
     - age1YOUR_PUBLIC_KEY_HERE

   # Storage backend (pick one):
   storage_url: "s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1"
   # storage_url: "file:///mnt/backups"
   # storage_url: "rclone://myremote/path/to/backups"

   # For s3:// URLs, credentials are still required:
   s3:
     access_key_id: ...
     secret_access_key: ...
   ```

3. **run**

   ```sh
   # Back up all configured snapshots
   vaultik --config /etc/vaultik.yml snapshot create

   # Back up specific snapshots by name
   vaultik --config /etc/vaultik.yml snapshot create home system

   # Silent mode for cron
   vaultik --config /etc/vaultik.yml snapshot create --cron

   # Back up and clean up old snapshots + orphan blobs in one shot
   vaultik --config /etc/vaultik.yml snapshot create --prune

   # Daily cron: back up, keep last 4 weeks of snapshots
   vaultik --config /etc/vaultik.yml snapshot create --cron --prune --keep-newer-than 4w
   ```

---

## cli

### commands

```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>] [--skip-errors]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id|--all> [--dry-run] [--force] [--remote] [--json]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...] [--verify]
vaultik [--config <path>] prune [--force] [--json]
vaultik [--config <path>] info
vaultik [--config <path>] remote info [--json]
vaultik [--config <path>] store info
vaultik [--config <path>] database purge [--force]
vaultik version
```

### global flags

* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG` if set, otherwise `/etc/vaultik/config.yml`)
* `--verbose`, `-v`: Enable verbose output
* `--debug`: Enable debug output
* `--quiet`, `-q`: Suppress non-error output

### environment variables

* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `restore` and `snapshot verify --deep`)
* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
* `VAULTIK_INDEX_PATH`: Override local SQLite index path
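
The precedence between `--config`, `$VAULTIK_CONFIG`, and the built-in default can be sketched as follows. This is a minimal illustration, not vaultik's actual code: `resolveConfigPath` and its `getenv` parameter are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"os"
)

// defaultConfigPath is the fallback documented for --config.
const defaultConfigPath = "/etc/vaultik/config.yml"

// resolveConfigPath is a hypothetical helper (not vaultik's actual code).
// Precedence: --config flag, then $VAULTIK_CONFIG, then the default path.
func resolveConfigPath(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	if fromEnv := getenv("VAULTIK_CONFIG"); fromEnv != "" {
		return fromEnv
	}
	return defaultConfigPath
}

func main() {
	// With no flag given, the result depends on the caller's environment.
	fmt.Println(resolveConfigPath("", os.Getenv))
	// An explicit flag value always wins.
	fmt.Println(resolveConfigPath("/srv/backup/vaultik.yml", os.Getenv))
}
```

Passing the environment lookup as a function keeps the helper easy to test without touching the real process environment.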

### command details

**snapshot create**: Perform incremental backup of configured snapshots.
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--prune`: After backup, drop older snapshots of each backed-up name and
  remove orphaned blobs from remote storage. By default keeps only the latest
  snapshot per name; use `--keep-newer-than` for a rolling window.
* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
  this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
* `--skip-errors`: Skip file read errors (log them loudly but continue)

**snapshot list**: List all snapshots with their timestamps and sizes.
* `--json`: Output in JSON format

**snapshot verify**: Verify snapshot integrity.
* Default (shallow): checks that all blobs referenced in the manifest exist in storage
* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
  encrypted metadata database
* `--json`: Output results as JSON

**snapshot purge**: Remove old snapshots based on criteria. Retention is
per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
latest globally).
* `--keep-latest`: Keep only the most recent snapshot of each name
* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
* `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
* `--force`: Skip confirmation prompt
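
To make the per-name retention rule concrete, here is a rough sketch of how removal candidates could be selected. The `Snapshot` struct, both function names, and the demo data are assumptions made for illustration; vaultik's real types and selection logic may differ.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot is a hypothetical, minimal record; vaultik's real types may differ.
type Snapshot struct {
	ID      string
	Name    string
	Created time.Time
}

// removeAllButLatest mirrors --keep-latest: for each snapshot name, keep the
// newest snapshot and return every other one as a removal candidate.
func removeAllButLatest(snaps []Snapshot) []Snapshot {
	newest := make(map[string]Snapshot)
	for _, s := range snaps {
		if cur, ok := newest[s.Name]; !ok || s.Created.After(cur.Created) {
			newest[s.Name] = s
		}
	}
	var remove []Snapshot
	for _, s := range snaps {
		if newest[s.Name].ID != s.ID {
			remove = append(remove, s)
		}
	}
	return remove
}

// removeOlderThan mirrors --older-than: return snapshots created before now-age.
func removeOlderThan(snaps []Snapshot, now time.Time, age time.Duration) []Snapshot {
	cutoff := now.Add(-age)
	var remove []Snapshot
	for _, s := range snaps {
		if s.Created.Before(cutoff) {
			remove = append(remove, s)
		}
	}
	return remove
}

// Demo data (invented for this sketch).
var (
	demoNow    = time.Date(2025, 7, 1, 0, 0, 0, 0, time.UTC)
	thirtyDays = 30 * 24 * time.Hour
	demoSnaps  = []Snapshot{
		{"server1_home_2025-05-01T12:00:00Z", "home", time.Date(2025, 5, 1, 12, 0, 0, 0, time.UTC)},
		{"server1_home_2025-06-25T12:00:00Z", "home", time.Date(2025, 6, 25, 12, 0, 0, 0, time.UTC)},
		{"server1_system_2025-05-01T12:00:00Z", "system", time.Date(2025, 5, 1, 12, 0, 0, 0, time.UTC)},
	}
)

func main() {
	for _, s := range removeAllButLatest(demoSnaps) {
		fmt.Println("keep-latest would remove:", s.ID)
	}
	for _, s := range removeOlderThan(demoSnaps, demoNow, thirtyDays) {
		fmt.Println("older-than 30d would remove:", s.ID)
	}
}
```

Note that in the demo data the only `system` snapshot survives `--keep-latest` even though it is old, because it is the latest of its name.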

**snapshot remove**: Remove a specific snapshot from the local database.
* `--remote`: Also remove snapshot metadata from remote storage
* `--all`: Remove all snapshots (requires `--force`)
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
* `--json`: Output result as JSON

**snapshot prune**: Clean orphaned data from the local database (files,
chunks, blobs not referenced by any snapshot).

**restore**: Restore files from a backup snapshot.
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
* Optional path arguments to restore specific files/directories (default: all)
* Preserves file permissions, timestamps, ownership (ownership requires root),
  symlinks, and empty directories
* `--verify`: After restoring, verify every file's chunk hashes match

**prune**: Remove unreferenced blobs from remote storage.
* Scans all snapshot manifests for referenced blobs, deletes any blob not referenced
* `--force`: Skip confirmation prompt
* `--json`: Output stats as JSON

**info**: Display system configuration, storage settings, encryption
recipients, and local database statistics.

**remote info**: Show detailed remote storage information including per-snapshot
metadata sizes, blob counts, and orphaned blob detection.
* `--json`: Output as JSON

**store info**: Display storage backend type and statistics.

**database purge**: Delete the local SQLite state database entirely. Remote
storage is unaffected; the next backup will do a full scan and re-deduplicate
against existing remote blobs.
* `--force`: Skip confirmation prompt

---

## storage backends

vaultik supports three storage backends, selected via the `storage_url` config field:

**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
object store. Credentials are read from `s3.access_key_id` and
`s3.secret_access_key` in the config file.

**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
a local or mounted filesystem. Useful for testing or backing up to a NAS.

**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
providers. Requires rclone to be configured separately (`rclone config`).

Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
still supported for backward compatibility. `storage_url` takes precedence if
both are set.
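
As an illustration of how the three URL forms break down, the sketch below parses them with Go's standard `net/url` package. `StorageTarget` and `parseStorageURL` are hypothetical names, and treating the URL host as the bucket or rclone remote is an assumption read off the example URLs, not a description of vaultik's parser.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// StorageTarget is a hypothetical struct used only in this sketch.
type StorageTarget struct {
	Backend  string // "s3", "file", or "rclone"
	Remote   string // S3 bucket or rclone remote name; empty for file://
	Path     string // key prefix or filesystem path
	Endpoint string // S3 only
	Region   string // S3 only
}

// parseStorageURL splits a storage_url into its parts based on the scheme.
func parseStorageURL(raw string) (StorageTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return StorageTarget{}, fmt.Errorf("parse storage_url: %w", err)
	}
	switch u.Scheme {
	case "s3":
		q := u.Query()
		return StorageTarget{
			Backend:  "s3",
			Remote:   u.Host,
			Path:     strings.TrimPrefix(u.Path, "/"),
			Endpoint: q.Get("endpoint"),
			Region:   q.Get("region"),
		}, nil
	case "file":
		return StorageTarget{Backend: "file", Path: u.Path}, nil
	case "rclone":
		return StorageTarget{
			Backend: "rclone",
			Remote:  u.Host,
			Path:    strings.TrimPrefix(u.Path, "/"),
		}, nil
	default:
		return StorageTarget{}, fmt.Errorf("unsupported storage_url scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1",
		"file:///mnt/backups",
		"rclone://myremote/path/to/backups",
	} {
		target, err := parseStorageURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", target)
	}
}
```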

---

## architecture

### remote storage layout

```
<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age          # Encrypted binary SQLite database
        └── manifest.json.zst   # Unencrypted blob list (for pruning)
```

* Blobs are sharded into two directory levels using the first four hex
  characters of the blob hash (`<aa>` is characters 1-2, `<bb>` is characters 3-4)
* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
  containing all file metadata, chunk mappings, and relationships for the snapshot
* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
  pruning without the private key
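
The blob path layout shown above can be sketched in a few lines. `blobKey` is a hypothetical helper, and the hash in `main` is a shortened placeholder rather than a real blob hash; the hash algorithm itself is not specified here.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"path"
)

// blobKey builds blobs/<aa>/<bb>/<full_blob_hash> from a hex-encoded hash.
// Hypothetical helper for illustration; not vaultik's actual code.
func blobKey(blobHash string) (string, error) {
	if len(blobHash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", blobHash)
	}
	if _, err := hex.DecodeString(blobHash); err != nil {
		return "", fmt.Errorf("blob hash %q is not valid hex: %w", blobHash, err)
	}
	return path.Join("blobs", blobHash[:2], blobHash[2:4], blobHash), nil
}

func main() {
	// Shortened placeholder hash, for readability only.
	key, err := blobKey("3fa9c2d41b7e")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key)
}
```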

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
(e.g. `server1_home_2025-06-01T12:00:00Z`).
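
A snapshot ID in this format can be assembled with the standard `time.RFC3339` layout. The sketch below is illustrative only: `snapshotID` is a hypothetical function, and normalizing the timestamp to UTC is an assumption based on the `Z` suffix in the example.

```go
package main

import (
	"fmt"
	"time"
)

// exampleTime matches the timestamp used in the documented example ID.
var exampleTime = time.Date(2025, 6, 1, 12, 0, 0, 0, time.UTC)

// snapshotID joins hostname, snapshot name, and an RFC 3339 timestamp.
// Converting to UTC first is an assumption of this sketch.
func snapshotID(hostname, name string, created time.Time) string {
	return hostname + "_" + name + "_" + created.UTC().Format(time.RFC3339)
}

func main() {
	fmt.Println(snapshotID("server1", "home", exampleTime))
}
```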

### data flow

**backup:**

1. Open local SQLite index, load known files and chunks into memory
2. Walk source directories, compare mtime/size/mode against index
3. For changed/new files: chunk using content-defined chunking (FastCDC)
4. For symlinks and directories: record metadata (no chunking)
5. For each chunk: hash, check dedup, add to blob packer
6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
7. Build snapshot metadata database, compress, encrypt, upload
8. Create unencrypted blob manifest for pruning support
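
Step 2 of the backup flow (comparing mtime, size, and mode against the index) might look roughly like the sketch below. The `indexEntry` struct and both function names are hypothetical; the real index lives in SQLite and its schema is not shown here.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// indexEntry is a hypothetical in-memory view of one row of the local index.
type indexEntry struct {
	Size          int64
	Mode          uint32
	MTimeUnixNano int64
}

// entryFromInfo captures the three attributes compared during the scan.
func entryFromInfo(info fs.FileInfo) indexEntry {
	return indexEntry{
		Size:          info.Size(),
		Mode:          uint32(info.Mode()),
		MTimeUnixNano: info.ModTime().UnixNano(),
	}
}

// needsRechunk reports whether a path is new or differs from the index in
// size, mode, or modification time. File contents are not read at this stage.
func needsRechunk(known map[string]indexEntry, path string, cur indexEntry) bool {
	prev, ok := known[path]
	if !ok {
		return true // never seen before
	}
	return prev != cur
}

func main() {
	info, err := os.Lstat(".")
	if err != nil {
		fmt.Println("lstat:", err)
		return
	}
	cur := entryFromInfo(info)
	known := map[string]indexEntry{".": cur}
	fmt.Println(needsRechunk(known, ".", cur))     // metadata unchanged
	fmt.Println(needsRechunk(known, "./new", cur)) // path not in the index
}
```

Because only metadata is compared, a file whose contents change without altering size, mode, or mtime would be skipped by a check like this one.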

**restore:**

1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
2. Open the binary SQLite database
3. Query files (optionally filtered by paths)
4. Download and decrypt required blobs
5. Extract chunks, reconstruct files
6. Restore permissions, timestamps, ownership, symlinks

**prune:**

1. List all snapshot manifests
2. Build set of all referenced blob hashes
3. List all blobs in storage
4. Delete any blob not in the referenced set
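
The prune steps amount to a set difference between stored and referenced blob hashes. A minimal sketch, assuming each manifest has already been decompressed into a list of hashes (`orphanedBlobs` and the short hash strings are invented for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedBlobs returns the stored blob hashes that no manifest references,
// sorted for stable output. Hypothetical helper, not vaultik's actual code.
func orphanedBlobs(manifests [][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, manifest := range manifests {
		for _, hash := range manifest {
			referenced[hash] = struct{}{}
		}
	}
	var orphans []string
	for _, hash := range stored {
		if _, ok := referenced[hash]; !ok {
			orphans = append(orphans, hash)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	manifests := [][]string{
		{"aa11", "bb22"}, // snapshot 1
		{"bb22", "cc33"}, // snapshot 2 shares bb22 with snapshot 1
	}
	stored := []string{"aa11", "bb22", "cc33", "dd44"}
	fmt.Println(orphanedBlobs(manifests, stored))
}
```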

### chunking and deduplication

* Content-defined chunking using the FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at file level (unchanged files skipped) and chunk level
  (identical chunks across files stored once)
* Multiple chunks packed into blobs to reduce object count
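
To show why content-defined chunking helps deduplication, here is a simplified gear-hash chunker. It is written in the spirit of FastCDC but is not the FastCDC algorithm itself (which adds normalized chunking, among other refinements) and is not vaultik's implementation. All names, the tiny chunk sizes, and the use of SHA-256 to compare chunks are assumptions chosen for the demo.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// gearTable maps each byte value to a fixed pseudo-random 64-bit number
// (filled with splitmix64 so the table is identical on every run).
var gearTable = func() [256]uint64 {
	var table [256]uint64
	var x uint64
	for i := range table {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		table[i] = z ^ (z >> 31)
	}
	return table
}()

// split cuts data wherever the top `bits` bits of a rolling gear hash are all
// zero, but never before minSize bytes or after maxSize bytes.
func split(data []byte, minSize, maxSize int, bits uint) [][]byte {
	mask := ^uint64(0) << (64 - bits)
	var chunks [][]byte
	var hash uint64
	start := 0
	for i, b := range data {
		hash = (hash << 1) + gearTable[b]
		size := i - start + 1
		if (size >= minSize && hash&mask == 0) || size >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start, hash = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// sharedChunks counts chunks of b whose SHA-256 also occurs among chunks of a.
func sharedChunks(a, b [][]byte) int {
	seen := make(map[[32]byte]bool)
	for _, c := range a {
		seen[sha256.Sum256(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[sha256.Sum256(c)] {
			n++
		}
	}
	return n
}

// pseudoRandom returns n deterministic bytes from a xorshift64 generator.
func pseudoRandom(n int, seed uint64) []byte {
	out := make([]byte, n)
	for i := range out {
		seed ^= seed << 13
		seed ^= seed >> 7
		seed ^= seed << 17
		out[i] = byte(seed >> 32)
	}
	return out
}

func main() {
	original := pseudoRandom(64<<10, 1)
	// Insert 100 bytes at the front; fixed-size chunking would shift every boundary.
	edited := append(pseudoRandom(100, 2), original...)

	a := split(original, 256, 4096, 10)
	b := split(edited, 256, 4096, 10)
	fmt.Printf("original: %d chunks, edited: %d chunks, shared: %d\n",
		len(a), len(b), sharedChunks(a, b))
}
```

Because cut points depend on a short window of preceding bytes rather than on absolute offsets, the chunk boundaries of the edited input typically realign with the original shortly after the inserted bytes, so most chunks hash identically and would be stored once.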

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only the public key is needed on the source host
* Each blob and each metadata database is encrypted independently
* Multiple recipients supported (encrypt to multiple keys)

### compression

* zstd compression at configurable level (1-19, default 3)
* Applied before encryption at the blob level

---

## configuration reference

See `config.example.yml` for a complete annotated example. Key fields:

| Field | Default | Description |
|-------|---------|-------------|
| `age_recipients` | (required) | Age public keys for encryption |
| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
| `exclude` | | Global exclude patterns (applied to all snapshots) |
| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
| `blob_size_limit` | `10GB` | Maximum blob size before splitting |
| `compression_level` | `3` | zstd compression level (1-19) |
| `hostname` | system hostname | Hostname used in snapshot IDs |
| `index_path` | `~/.local/share/.../index.sqlite` | Local SQLite index path |

---

## limitations

* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
  quarantine flags, SELinux labels, and other extended attributes are not
  backed up or restored.
* **No hard link detection.** Two hard links to the same inode are backed
  up as independent files. Content deduplication means the data is stored
  once, but the hard link relationship is lost on restore.
* **No sparse file support.** Sparse files are fully materialized during
  backup. A 100 GB sparse VM disk that is mostly zeros will consume the
  full (compressed) size in storage.
* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
  is available. There is no `--bwlimit` flag yet.
* **No parallel blob downloads during restore.** Blobs are fetched
  sequentially. Restore speed is bound by single-stream throughput.
* **Device nodes, named pipes, and sockets are silently skipped.** Only
  regular files, directories, and symlinks are backed up.
* **No database migrations.** If the local SQLite schema changes between
  versions, delete the local database (`vaultik database purge`) and run
  a full backup. Remote storage is unaffected.
* **Files that change during backup may be inconsistent.** There is no
  filesystem snapshot or freeze. If a file is modified between the scan
  and chunk phases, the backed-up copy may reflect a partial write.
* **Ownership restoration requires root.** File uid/gid are recorded
  and restored, but `chown` requires elevated privileges. Without root,
  files are restored with the current user's ownership.

---

## roadmap

Items for future releases:

* Error-condition tests (network failures, disk full, corrupted/missing blobs)
* Parallel blob downloads during restore
* Bandwidth limiting (`--bwlimit`)
* Security audit of encryption implementation
* Man pages and richer `--help` examples

---

## requirements

* Go 1.26 or later
* S3-compatible object storage (or local filesystem, or rclone remote)

## development workflow

All changes follow this workflow. No exceptions.

1. Create a feature branch off `main`.
2. Write tests.
3. Write the implementation.
4. Fix implementation errors until it compiles and tests pass.
5. Fix linting errors (`make lint`).
6. Update documentation and README as required by the change.
7. Format code (`make fmt`).
8. Run `make check` (lint + fmt-check + test). Fix any issues. Repeat until clean.
9. Commit on the branch.
10. Merge to `main`.
11. Push.

Do not commit directly to `main`. Do not skip steps.

Repository policies for AI agents are in [`AGENTS.md`](AGENTS.md).

## license

[MIT](https://opensource.org/license/mit/)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)