
# vaultik (ваултик)

`vaultik` is an incremental backup daemon written in Go. It
encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. It requires no private
keys or decryption secrets on the backed-up system; the only credential
stored there is the remote storage API key.

It includes table-stakes features such as:

* modern authenticated encryption
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in a standard SQLite database
* inotify-based change detection
* streaming processing of all data, so it needs neither large amounts of RAM
  nor temp file storage
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote storage
* does not create huge numbers of small files (to keep S3 operation counts
  down) even if the source system has many small files

## what

`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.

No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally. All encrypted data is
streaming-processed and immediately discarded once uploaded. Metadata is
encrypted and pushed with the same mechanism.
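
The chunk-to-blob packing step described above can be sketched roughly as
follows. This is a minimal illustration, not vaultik's actual packer: the
`chunk` type and the `packChunks` function are hypothetical names, and the
sketch holds chunks in memory, whereas the real tool streams them.

```go
package main

import "fmt"

// chunk is a hypothetical in-memory chunk used only for this sketch.
type chunk struct {
	hash string
	data []byte
}

// packChunks groups chunks into blobs in order, starting a new blob whenever
// adding the next chunk would push the current blob past limit bytes.
// A single chunk larger than limit ends up in a blob of its own.
func packChunks(chunks []chunk, limit int) [][]chunk {
	var blobs [][]chunk
	var cur []chunk
	size := 0
	for _, c := range chunks {
		if len(cur) > 0 && size+len(c.data) > limit {
			blobs = append(blobs, cur)
			cur, size = nil, 0
		}
		cur = append(cur, c)
		size += len(c.data)
	}
	if len(cur) > 0 {
		blobs = append(blobs, cur)
	}
	return blobs
}

func main() {
	chunks := []chunk{
		{"a", make([]byte, 4)},
		{"b", make([]byte, 4)},
		{"c", make([]byte, 4)},
	}
	for i, blob := range packChunks(chunks, 8) {
		fmt.Printf("blob %d holds %d chunk(s)\n", i, len(blob))
	}
}
```

Packing many small chunks into a few large blobs is what keeps the number of
S3 PUT operations low when the source system has many small files.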

## why

Existing backup software fails under one or more of these conditions:

* Requires secrets (passwords, private keys) on the source system, which
  compromises encrypted backups if the host system is compromised
* Depends on symmetric encryption, which is unsuitable for zero-trust
  environments
* Creates one blob per file, which results in excessive S3 operation counts

`vaultik` addresses these by using:

* Public-key-only encryption (via `age`), which requires no secrets (other
  than the remote storage API key) on the source system
* A local state cache for incremental detection, which does not require
  reading from or decrypting remote storage
* Content-addressed immutable storage, which allows efficient deduplication
* Storage of only large encrypted blobs of configurable size (1G by default),
  which reduces S3 operation counts and improves performance

## how

1. **install**

   ```sh
   go install git.eeqj.de/sneak/vaultik@latest
   ```

2. **generate keypair**

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

3. **write config**

   ```yaml
   source_dirs:
     - /etc
     - /home/user/data
   exclude:
     - '*.log'
     - '*.tmp'
   age_recipient: age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
   s3:
     # endpoint is optional if using AWS S3, but who even does that?
     endpoint: https://s3.example.com
     bucket: vaultik-data
     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
     region: us-east-1
   backup_interval: 1h      # only used in daemon mode, not for --cron mode
   full_scan_interval: 24h  # normally we use inotify to mark dirty, but
                            # every 24h we do a full stat() scan
   min_time_between_run: 15m  # again, only for daemon mode
   #index_path: /var/lib/vaultik/index.sqlite
   chunk_size: 10MB
   blob_size_limit: 10GB
   ```

4. **run**

   ```sh
   vaultik --config /etc/vaultik.yaml snapshot create
   ```

   ```sh
   vaultik --config /etc/vaultik.yaml snapshot create --cron # silent unless error
   ```

   ```sh
   vaultik --config /etc/vaultik.yaml snapshot daemon # runs continuously in foreground, uses inotify to detect changes
   ```

   TODO:

   * make sure daemon mode does not make a snapshot if no files have
     changed, even if the backup_interval has passed
   * in daemon mode, if enough time has passed since the last snapshot and we
     get an inotify event, schedule the next snapshot creation for 10 minutes
     after the mark-dirty event

---

## cli

### commands

```sh
vaultik [--config <path>] snapshot create [--cron] [--daemon]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] store info
# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags.  it should be
# 'vaultik restore snapshot <snapshot> --target <dir>'.  bucket and prefix are always
# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
```

### environment

* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file. If set, config file path doesn't need to be specified on the command line.

### command details

**snapshot create**: Perform incremental backup of configured directories
* Config is located at `/etc/vaultik/config.yml` by default
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob hashes (not just existence)

**store info**: Display S3 bucket configuration and storage statistics

**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure

**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs

**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename

**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes

---

## architecture

### chunking

* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted

### storage

* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
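
A sketch of how an `aa/bb/hash` key could be derived is shown below. Several
things here are assumptions: the `blobKey` function name is hypothetical, the
use of SHA-256 for blob addressing is a guess, and this README does not say
whether the hash covers the plaintext or the encrypted blob.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey returns prefix + "aa/bb/<hash>", where <hash> is the hex SHA-256
// of blob and aa, bb are its first two byte pairs. The hash function and
// exact layout are assumptions made for this sketch.
func blobKey(prefix string, blob []byte) string {
	sum := sha256.Sum256(blob)
	h := hex.EncodeToString(sum[:])
	return prefix + h[0:2] + "/" + h[2:4] + "/" + h
}

func main() {
	fmt.Println(blobKey("host1/", []byte("example blob contents")))
}
```

Since the key is derived from the content, uploading the same blob twice
targets the same object, which fits the immutable, append-only design.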

### state tracking

* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode

## does not

* Store any secrets on the backed-up machine
* Require mutable remote metadata
* Use tarballs, restic, rsync, or ssh
* Require a symmetric passphrase or password
* Trust the source system with anything

---

## does

* Incremental deduplicated backup
* Blob-packed chunk encryption
* Content-addressed immutable blobs
* Public-key encryption only
* SQLite-based local and snapshot metadata
* Fully stream-processed storage

---

## restore

`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:

* `VAULTIK_PRIVATE_KEY`
* The bucket

A complete restore therefore needs nothing beyond object storage and the private key.

---

## features

### daemon mode

* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)

### cron mode

* Single backup run
* Silent output unless errors
* Ideal for scheduled backups

### metadata integrity

* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems
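
A minimal sketch of the hash check, assuming the metadata and its recorded
SHA256 hash have both already been downloaded and decrypted. The
`verifyMetadata` function is a hypothetical name, and storing the hash as a
hex string is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifyMetadata recomputes the SHA-256 of metadata and compares it with the
// hex digest that was stored separately.
func verifyMetadata(metadata []byte, wantHex string) error {
	sum := sha256.Sum256(metadata)
	if hex.EncodeToString(sum[:]) != wantHex {
		return errors.New("metadata hash mismatch")
	}
	return nil
}

func main() {
	meta := []byte("abc")
	want := "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	fmt.Println(verifyMetadata(meta, want))
}
```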

### exclusion patterns

* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk
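
Glob exclusion during a directory walk might look like the sketch below. It
relies on the standard library's `filepath.Match`; the `excluded` function is
a hypothetical name, and matching patterns against the base name only (rather
than the full path) is an assumption about how vaultik interprets them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether the base name of path matches any glob pattern.
// Malformed patterns are ignored in this sketch.
func excluded(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, p := range patterns {
		if ok, err := filepath.Match(p, base); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*.log", "*.tmp"}
	for _, p := range []string{"/var/log/app.log", "/etc/hosts"} {
		fmt.Println(p, excluded(p, patterns))
	}
}
```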

## prune

Run `vaultik prune` on a machine with the private key. It:

* Downloads the most recent snapshot
* Decrypts metadata
* Lists referenced blobs
* Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.
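
The core of the steps above is a set difference between the keys present in
the bucket and the keys referenced by the snapshot metadata. A rough sketch,
using a hypothetical `orphanedBlobs` function and plain string keys in place
of real S3 listings:

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedBlobs returns, in sorted order, every key in bucketKeys that does
// not appear in referenced. These are the candidates for deletion.
func orphanedBlobs(bucketKeys []string, referenced map[string]struct{}) []string {
	var orphans []string
	for _, k := range bucketKeys {
		if _, ok := referenced[k]; !ok {
			orphans = append(orphans, k)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	bucket := []string{"aa/bb/1", "cc/dd/2", "ee/ff/3"}
	referenced := map[string]struct{}{"cc/dd/2": {}}
	fmt.Println(orphanedBlobs(bucket, referenced))
}
```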

---

## LICENSE

[MIT](https://opensource.org/license/mit/)

---

## requirements

* Go 1.24.4 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)

[https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2](https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2)